Representation with Insufficient Dimensions

Celebrity Nooz: Max Casella

As a kid, I had a friend who spoke with a New York accent. He didn't pronounce the "H" in words that begin with "Hu." You can hear the same thing when Trump speaks: "huge" becomes "yuge." Among our friends was a boy named Hugh. At one point my friend was reminiscing and asked me,

"You remember when Hugh XYZ'ed? That was so crazy."

To which I responded, "I never did that."

"No, Hugh did."

"No, I don't remember ever doing that. Seriously."

"No, Hugh did, Hugh."

"Not me, man, you're crazy."

At last, he added Hugh's last name, I caught on, and we had a good laugh.

I see this same pattern frequently in software and it goes like this:

1. Observing the data, an engineer notes that two concepts behave uniformly.
2. Not being a subject matter expert, they decide to represent both concepts as one and the same. After all, the parsimonious solution reduces space and complexity.
3. Some time later, data that violates their assumption appears and breaks their model.

The root cause is someone who is not a subject matter expert making assumptions about the problem domain. It often surfaces as tech debt, and the fix can be pretty high effort: moving from a lower-dimensional representation to a higher one is generally expensive. For a production system, in addition to adopting the more complex model in the software itself, the historical data must also be remediated.
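To make the pattern concrete, here is a minimal sketch in Python. The order and address domain, and every name in it (`OrderV1`, `OrderV2`, `remediate`), are my own hypothetical illustration, not taken from any real system. It assumes an engineer saw that billing and shipping addresses always matched and stored a single field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderV1:
    """Hypothetical low-dimensional model: one field stands in for two concepts."""
    order_id: int
    address: str  # assumed to be both the billing and the shipping address


@dataclass(frozen=True)
class OrderV2:
    """Hypothetical higher-dimensional model adopted after the assumption broke."""
    order_id: int
    billing_address: str
    shipping_address: str


def remediate(old: OrderV1) -> OrderV2:
    # The historical record never captured the distinction, so the backfill
    # can only copy the single value into both fields. If an old order really
    # had two different addresses, that information is already gone.
    return OrderV2(old.order_id, old.address, old.address)


history = [OrderV1(1, "12 Elm St"), OrderV1(2, "9 Oak Ave")]
migrated = [remediate(order) for order in history]
```

The software change is the easy part. The backfill is where the cost hides, because every historical row has to be rewritten on a guess.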

The lesson runs contrary to our intuition about parsimony. When in doubt, it is best to design for the higher-dimensional representation. Even if your SME tells you, "it'll never happen," given enough time it may. As long as the higher-dimensional model isn't prohibitively expensive, using it buys you the option to accommodate unknowns with minimal effort or risk.
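As a rough sketch of what "designing for the higher dimension" can look like when it is cheap, consider a hypothetical order record that keeps two address fields from day one. The `Order` class and its `ship_to` property are assumptions I made up for illustration. The second field is optional, so the common case stays as simple as the one-field version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    """Hypothetical model that keeps both concepts separate from the start."""
    order_id: int
    billing_address: str
    shipping_address: Optional[str] = None  # None means "same as billing"

    @property
    def ship_to(self) -> str:
        # Fall back to the billing address when no separate one was recorded.
        if self.shipping_address is not None:
            return self.shipping_address
        return self.billing_address


usual = Order(1, "12 Elm St")             # the case the SME says always holds
gift = Order(2, "9 Oak Ave", "3 Pine Rd")  # the case that "will never happen"
```

When the unusual order finally shows up, nothing has to be migrated: `usual.ship_to` still resolves to the billing address, and `gift.ship_to` resolves to the separate one.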
