The Two Cultures of Statistical Modeling

Reading Breiman’s 2001 Paper in 2026

Statistics
Machine Learning
Data Science
Philosophy of Science
Published

April 6, 2026

Why I Read This Paper

I kept seeing this paper mentioned everywhere: in lectures, in blog posts, in Twitter threads. “Breiman 2001” kept coming up like some kind of secret handshake between people who take statistics seriously. So I finally read it.

It was written in 2001. But honestly? It feels like it was written about today.

This post is my attempt to walk through the paper’s main ideas in plain language. I’m not an expert. I’m just someone who found this paper really thought-provoking and wanted to share what I learned.


1 The Setup: What Even Is a Statistical Model?

Breiman starts with a simple picture. You have some input data x, and you want to predict some output y. Between them sits a black box nature, doing something mysterious to connect the two.

\[ \mathbf{x} \longrightarrow \boxed{\text{???}} \longrightarrow \mathbf{y} \]

The question is: how do you figure out what’s inside that box?

According to Breiman, in 2001, basically all statisticians answered this question the same way. And he thought that was a big problem.

1.1 Two Very Different Answers

The data modeling culture says: assume nature works in a specific way (like linear regression), fit your data to that assumption, and draw conclusions. Clean, elegant, well-understood mathematically.

The algorithmic modeling culture says: we don’t know how nature works, so let’s just find whatever function predicts best. Don’t assume, just learn.

Breiman’s argument is simple: 98% of statisticians were doing the first thing, and he thought that was causing real problems.
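To make the contrast concrete, here is a toy sketch of my own (not from the paper), using only the Python standard library. The sine-wave “nature”, the data, and the k-nearest-neighbour helper are all invented for illustration: one culture assumes y = a + bx and fits it; the other assumes nothing and just averages nearby training points.

```python
import math
import random

# Toy illustration (not Breiman's): nature's black box is nonlinear.
random.seed(0)

def nature(x):
    return math.sin(x)

train = [(x, nature(x) + random.gauss(0, 0.1))
         for x in (random.uniform(0, 6.28) for _ in range(200))]
test = [(x, nature(x)) for x in (random.uniform(0, 6.28) for _ in range(200))]

# Data modeling culture: assume y = a + b*x, fit by least squares.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

# Algorithmic modeling culture: assume nothing, average the k nearest
# training points (a minimal stand-in for trees, forests, nets).
def knn(x, k=10):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

mse_linear = sum((a + b * x - y) ** 2 for x, y in test) / len(test)
mse_knn = sum((knn(x) - y) ** 2 for x, y in test) / len(test)
print(mse_knn < mse_linear)  # True: the assumption-free learner wins here
```

When the assumed form happens to match nature, the linear model wins instead; the point is only that the assumption, not the data, decides what the first culture can see.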


2 Breiman’s Background: He Wasn’t Just Theorizing

One thing that makes this paper different from a typical academic critique is that Breiman actually spent 13 years as a freelance consultant before going back to academia. He worked on real, messy, high-stakes prediction problems: ozone levels, chemical toxicity, speech, radar.

Two examples he shares:

Predicting LA ozone levels: They had 7 years of data, 450+ weather variables, and needed to predict next-day ozone 12 hours ahead. They used the best statistical tools available in the 1970s — linear regression, variable selection. It failed. The false alarm rate was too high. Breiman says he wishes he could go back and try it with modern tools.

Identifying chlorine in chemicals: Using mass spectra data, they needed to classify whether a compound contained chlorine. Standard methods struggled because the data had variable dimensions. Breiman built a decision tree with 1,500 binary questions. Result: 95% accuracy.

These two stories basically set up the whole paper. He learned from practice that you should use whatever works, not whatever fits a theoretical framework.

The principles he took from consulting
  1. Focus on finding a good solution — that is the goal
  2. Spend time with the data before building models
  3. Use whichever type of model works best for the problem
  4. Test your model on data it hasn’t seen before
  5. Computers are not just tools — they are partners
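Principle 4 in particular is easy to state and easy to forget. Here is a tiny stdlib-only sketch (my own, on made-up data) of why held-out data matters: a 1-nearest-neighbour predictor scores perfectly on its own training set while hiding its real error.

```python
import random

# Toy illustration of "test on data the model hasn't seen".
random.seed(0)
data = [(x, x + random.gauss(0, 0.2))
        for x in (random.uniform(0, 1) for _ in range(100))]
train, held_out = data[:70], data[70:]

def one_nn(query, pool):
    # Predict with the single nearest training point.
    return min(pool, key=lambda p: abs(p[0] - query))[1]

train_mse = sum((one_nn(x, train) - y) ** 2 for x, y in train) / len(train)
test_mse = sum((one_nn(x, train) - y) ** 2 for x, y in held_out) / len(held_out)
print(train_mse)              # 0.0: 1-NN memorizes its own training points
print(test_mse > train_mse)   # True: held-out data reveals the real error
```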

3 What’s Actually Wrong with Data Models?

This is where Breiman gets sharp. He argues that the statistical community’s love for data models was causing real harm.

3.1 The core problem

When you fit a data model, the conclusions you draw are about your model, not about reality. If your model is wrong, your conclusions are wrong. Sounds obvious, right? But he argues people kept ignoring this.

His example: someone ran a linear regression to check for gender discrimination in faculty salaries. The gender coefficient was significant at the 5% level. Case closed, proof of discrimination, right? Breiman’s point is that nobody checked whether a linear model was even appropriate for this data in the first place.

3.2 Goodness-of-fit tests don’t really work

You might say “but we test whether the model fits!” And Breiman says yeah, those tests are weaker than you think. He ran a simulation with a clearly non-linear dataset and standard tests kept saying “the linear model fits just fine” until the non-linearity became extreme.
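This is not Breiman’s exact simulation, but a crude stdlib sketch of the same trap, using R² as a stand-in for a formal goodness-of-fit check: the data are genuinely quadratic, yet the linear fit still “explains” most of the variance and looks fine on paper.

```python
import random

# Sketch (not Breiman's simulation): quadratic truth, linear fit.
random.seed(0)
pts = [(x, x + 0.15 * x * x + random.gauss(0, 0.5))
       for x in (random.uniform(0, 10) for _ in range(300))]

# Least-squares line through the data.
n = len(pts)
sx = sum(x for x, _ in pts)
sy = sum(y for _, y in pts)
sxx = sum(x * x for x, _ in pts)
sxy = sum(x * y for x, y in pts)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

# R-squared of the (wrong) linear model.
mean_y = sy / n
ss_res = sum((y - (a + b * x)) ** 2 for x, y in pts)
ss_tot = sum((y - mean_y) ** 2 for x, y in pts)
r2 = 1 - ss_res / ss_tot
print(r2 > 0.9)  # True: the wrong model still explains >90% of the variance
```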

William Cleveland, one of the people who invented residual analysis, admitted at a seminar that residual analysis can’t reliably detect problems in data with more than 4-5 dimensions. Most real datasets have way more than that.

The uncomfortable truth

A model that “passes” goodness-of-fit tests isn’t necessarily correct. It just means the tests couldn’t detect the problem. In high dimensions, many wrong models look fine on paper.

3.3 The multiplicity problem

Here’s another thing that bothered him. Say you have 30 variables and you want to find the best 5-variable linear model. There are around 140,000 possible combinations. Many of them will fit your data almost equally well but they’ll tell completely different stories about which variables matter.

Breiman called this the Rashomon Effect, named after a Kurosawa film where four witnesses describe the same event in completely contradictory ways. The effect is real in statistics too.

So which model is “correct”? Breiman’s uncomfortable answer is: there’s no good way to tell. And yet, each model would lead you to different conclusions about what drives the outcome.


4 The Algorithmic World Was Moving Fast

While statisticians were busy debating which parametric model to use, something was happening outside statistics departments.

In the mid-1980s, a community of computer scientists, physicists and engineers started working on prediction problems where it was obvious no standard model would work: handwriting recognition, speech, image classification. They didn’t care about the model. They cared about whether it worked.

This community eventually became what we now call machine learning.

Their theoretical foundation was different too. Vladimir Vapnik built a theory of how well algorithms generalize to new data, which led to Support Vector Machines, which at the time outperformed neural nets on many tasks.


5 Three Big Lessons from Machine Learning

Breiman identified three ideas from the ML world that he thought statisticians were ignoring.

5.1 1. The Rashomon Effect

Many models fit equally well but tell different stories. Aggregating them, as Random Forests do, reduces this instability and usually improves accuracy.

5.2 2. Simpler isn’t always better

The traditional idea is: simpler models are better, interpretable is good. Breiman says that for prediction accuracy, this is just not true. Complex models predict better.

The paper backs this up with a table comparing test-set error for Random Forests against single decision trees across a batch of benchmark datasets.

On some datasets the error is literally cut in half. The forest just predicts better. But it’s much harder to explain why it made a specific prediction. Breiman’s response: go for accuracy first, then figure out why.
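The mechanism behind that gain is aggregation. Here is a hand-rolled bagging sketch (my own, stdlib only, not a real Random Forest): average many high-variance 1-nearest-neighbour predictors, each fit to a bootstrap resample, and the combined predictor beats any single one.

```python
import math
import random

# Hand-rolled bagging sketch (not an actual Random Forest).
random.seed(1)

def nature(x):
    return math.sin(3 * x)

train = [(x, nature(x) + random.gauss(0, 0.5))
         for x in (random.uniform(0, 2) for _ in range(150))]
test = [(x, nature(x)) for x in (random.uniform(0, 2) for _ in range(200))]

def one_nn(pool, x):
    # High-variance base learner: nearest training point wins.
    return min(pool, key=lambda p: abs(p[0] - x))[1]

# Fix the bootstrap resamples up front, as bagging does.
boots = [[random.choice(train) for _ in train] for _ in range(50)]

def bagged(x):
    return sum(one_nn(b, x) for b in boots) / len(boots)

mse_single = sum((one_nn(train, x) - y) ** 2 for x, y in test) / len(test)
mse_bagged = sum((bagged(x) - y) ** 2 for x, y in test) / len(test)
print(mse_bagged < mse_single)  # True: aggregation cuts the error
```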

5.3 3. More variables can be a good thing

Old wisdom said: if you have too many variables, reduce them. Pick the important ones.

Breiman says the opposite. The Amit-Geman handwriting recognition system used thousands of small geometric features and ran over 1,000 shallow trees — and got 0.7% error on a massive test set, near human-level performance.

SVMs took this even further. Vapnik showed that if your data isn’t separable, you can increase the dimensionality by adding polynomial features until it is. And he proved it works with a clean theoretical bound on how well the model generalizes.
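Vapnik’s trick is easy to see in one dimension. In this toy example (mine, not Vapnik’s), the two classes cannot be split by any single threshold on x, but adding the polynomial feature x² makes them separable:

```python
# Toy version of the dimensionality trick: class 1 lives outside (-1, 1).
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]

def threshold_separable(features, labels):
    """Can one cut point put each class entirely on its own side?"""
    pairs = sorted(zip(features, labels))
    for i in range(1, len(pairs)):
        left = {lab for _, lab in pairs[:i]}
        right = {lab for _, lab in pairs[i:]}
        if len(left) == 1 and len(right) == 1 and left != right:
            return True
    return False

print(threshold_separable(xs, labels))                   # False in 1-D
print(threshold_separable([x * x for x in xs], labels))  # True with x**2
```

This is the intuition behind kernel methods: instead of changing the model family, change the space the data live in until a simple separator exists.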


6 Can a Black Box Give You Real Information?

The obvious objection to algorithmic models is: sure, they predict well, but you can’t understand what they’re doing. A doctor can read a logistic regression. Nobody can read a forest of 100 trees.

Breiman’s response: you’re asking the wrong question. The goal isn’t interpretability; it’s accurate information. And sometimes complex models give you more accurate information than simple ones, even about which variables matter.

He shows this with three examples from real medical and biological datasets.

6.1 Hepatitis survival

155 patients, 19 variables, survive or die.

Method                    Error    Variables called “important”
Logistic Regression       17.4%    7 and 11
Random Forest             12.3%    12 and 17
Always predict survival   20.6%    (baseline)

The random forest not only predicted better; it also pointed to completely different variables as important. When Breiman tested variables 12 and 17 individually, they were much more predictive than 7 and 11. The simpler model was literally pointing at the wrong things because it couldn’t fit the data properly.
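That “test each variable individually” check is easy to sketch. This is synthetic data of my own, not the hepatitis dataset: one variable genuinely drives the outcome, one is noise, and scoring each on its own one-variable prediction rule tells them apart.

```python
import random

# Synthetic stand-in (not the hepatitis data) for Breiman's check:
# score each variable by how well it predicts the outcome alone.
random.seed(3)
rows = []
for _ in range(300):
    strong = random.gauss(0, 1)   # genuinely drives the outcome
    weak = random.gauss(0, 1)     # pure noise
    y = 1 if strong + random.gauss(0, 0.3) > 0 else 0
    rows.append((strong, weak, y))

def solo_error(idx):
    # Error of the one-variable rule "predict 1 when the variable is > 0".
    return sum(((1 if r[idx] > 0 else 0) != r[2]) for r in rows) / len(rows)

print(solo_error(0) < solo_error(1))  # True: the real driver wins
```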

6.2 Liver disease

On another medical dataset, the random forest’s similarity measure found that patients with severe liver disease actually split into two distinct groups: high blood-test scores vs. low. Standard logistic regression never would have found this because it doesn’t look for subgroups.

6.3 4,682 genes, 81 patients

Classical statistics would say this is impossible: you can’t fit a model with more variables than observations. Random forests handled it fine, got low error, and produced a ranking of which genes mattered most.

The actual point

Better predictive accuracy doesn’t just mean better predictions. It often means the model found something real in the data that a weaker model missed entirely.


7 Why This Paper Feels Different in 2026

When Breiman wrote this, deep learning barely existed. Random forests were new. The idea that a computer could beat a human champion at Go still felt like science fiction.

Now look where we are. GPT-4, Claude, Gemini: language models that nobody fully understands but that clearly work. AlphaFold predicted protein structures that biologists spent decades trying to solve, using a black box. Explainable AI has become its own research field, basically formalizing exactly what Breiman argued. The Rashomon Effect is now an active area of study: Cynthia Rudin and others are building on it directly.

The 98/2 split he described in 2001 has completely flipped. Most ML researchers are clearly in the algorithmic culture now. The question has actually reversed: how do we make these incredibly powerful black boxes more interpretable?

But his warning still applies in both directions. Using a complex model just because it’s trendy is as wrong as using linear regression just because it’s familiar. The data and the problem should drive the choice.

He ended the paper with this line, and I think it still holds:

The roots of statistics, as in science, lie in working with data and checking theory against data.


References

Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–231. https://doi.org/10.1214/ss/1009213726
