APR 22, 2026

Taste is the rarest skill in machine learning

Opinion · Career · 6 min read

Technical skill is table stakes. This is uncomfortable to say as someone still building technical skill, but I think it is true and worth sitting with.

The field selects heavily for a specific type of intelligence: the ability to understand and implement complex mathematical machinery. Linear algebra, calculus, probability theory, optimization. These are real and hard and necessary. I am not diminishing them. I spend a lot of time on them.

But they are increasingly available. The tooling has abstracted away enormous amounts of implementation complexity. Hugging Face's Trainer class, PyTorch Lightning, Keras: these do not replace understanding, but they do mean that the gap between "knows the math" and "shipped something that works" has narrowed dramatically. More people can ship working ML systems now than at any point in the history of the field. The supply of technical competence is increasing. The supply of taste is not.
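To see how compressed that gap has become, here is a minimal sketch in Keras: a complete, trainable classifier in about a dozen lines. The synthetic data and tiny architecture are my own placeholders, not a recommendation.

```python
# A minimal sketch of how much modern tooling abstracts away.
# The data is synthetic and the architecture arbitrary; the point
# is how little code separates "knows the math" from "trained model".
import numpy as np
from tensorflow import keras

# Toy data: 1,000 examples, 20 features, a label that depends on two of them.
X = np.random.randn(1000, 20).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, validation_split=0.2)
```

None of this tells you whether the model is any good for its purpose. That question is exactly where the tooling stops helping.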

What do I mean by taste?

I mean the ability to look at a model's outputs and know something is wrong before you can articulate why. The ability to choose the right problem: not just to solve the one in front of you, but to question whether it is worth solving. The ability to design an evaluation that actually measures what you care about, not just what is easy to measure. The ability to know when a simpler model is better than a complex one, even when the complex one benchmarks higher.

These are judgment calls. They cannot be fully formalized. You develop them by building a lot of things, paying attention to what works and what does not, and caring about the quality of the output in a way that goes beyond the metric.

The evaluation problem

The evaluation problem is where I notice this most acutely.

In NLP, it is well documented that BLEU, a common metric for translation quality, can be gamed in ways that produce high scores but poor translations. A model can learn to maximize n-gram overlap with the reference translations without producing fluent, accurate language. The metric says "good." A human reader says "wrong."
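A minimal sketch of the failure, using NLTK's sentence-level BLEU on invented sentences (exact scores depend on the smoothing method, so treat the numbers as illustrative, not canonical):

```python
# A hedged illustration: a disfluent hypothesis stitched from
# reference n-grams can out-score a fluent, adequate paraphrase.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the agreement was signed by both parties yesterday".split()
smooth = SmoothingFunction().method1

# Fluent and adequate, but shares few exact n-grams with the reference.
paraphrase = "both sides signed the deal yesterday".split()

# Disfluent word salad that copies reference n-grams wholesale.
gamed = "the agreement was signed signed by both parties by both parties yesterday".split()

print(sentence_bleu([reference], paraphrase, smoothing_function=smooth))  # low
print(sentence_bleu([reference], gamed, smoothing_function=smooth))       # much higher
```

The stitched hypothesis wins because BLEU counts matching n-grams, not meaning: it keeps its precisions high at every n-gram order, while the paraphrase loses on everything above unigrams despite being the better translation.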

This happens everywhere in ML. Accuracy on imbalanced datasets. FID scores for image generation that do not capture semantic coherence. Reward hacking in reinforcement learning: the famous example is OpenAI's boat-racing demo, CoastRunners, where an agent learned to spin in circles collecting power-ups rather than finishing the race, achieving high reward while completely missing the point.
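The first of those failures takes about ten lines to reproduce. A hedged sketch with scikit-learn on synthetic data (the 99:1 class ratio is my own placeholder):

```python
# The imbalanced-accuracy failure on synthetic data: a "model" that
# always predicts the majority class reports ~99% accuracy while
# detecting nothing at all.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)  # ~1% positives

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
preds = clf.predict(X)

print(accuracy_score(y, preds))  # ~0.99: the metric says "good"
print(recall_score(y, preds))    # 0.0: it never finds a single positive
```

Anyone can compute both numbers. Knowing which one belongs on the dashboard is the taste part.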

These are not just technical failures. They are failures of judgment about what to measure and why. They are taste failures.

Useful ML vs. impressive ML

Taste is also what separates useful ML from impressive ML.

I have seen technically sophisticated models that no one uses because the output is subtly wrong in ways that erode trust over time. I have seen simple logistic regression models, transparently built and carefully evaluated, become load-bearing infrastructure for a team because people trust them. The technical complexity of the first did not matter. The judgment embedded in the second (what to optimize, how to present uncertainty, when the model should abstain from predicting) did.
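That last judgment call, abstention, is simple to implement and hard to calibrate. A hedged sketch: a logistic regression that answers only when its predicted probability clears a threshold. The 0.9 threshold and the synthetic data are placeholders; choosing them well is exactly the judgment I am describing.

```python
# A sketch of letting a simple model abstain when it is not confident.
# The 0.9 threshold and the synthetic data are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 4))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2_000)) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def predict_or_abstain(model, X, threshold=0.9):
    """Return class predictions, or None where confidence < threshold."""
    proba = model.predict_proba(X)
    confident = proba.max(axis=1) >= threshold
    preds = proba.argmax(axis=1)  # works here because classes are 0/1
    return [int(p) if ok else None for p, ok in zip(preds, confident)]

print(predict_or_abstain(model, X[:10]))  # e.g. [1, None, 0, ...]
```

The Nones are the honest part. A model that says "I don't know" at the right moments is often the one people come to trust.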

Can taste be taught?

I do not think taste can be taught directly. But I think it can be cultivated.

It requires exposure to a lot of different work, not just ML papers, but design criticism, architecture, film, writing. Anything that involves making judgment calls about what matters and why. It requires being willing to say "this is wrong" without always being able to justify it formally, and then sitting with that discomfort long enough to figure out why you thought so. It requires caring about things beyond the benchmark.

It also requires a certain resistance to the local optimum. The technically complex solution is satisfying in a way the simple solution is not, which makes it harder to ship the simple, elegant thing. Taste is partly the discipline to do it anyway.

I am still developing mine. I think that is the work. I do not think it ends.