Modeling March Madness 2020, a study in data aesthetics

This was prepared by a team including Mariah Samano, Haley Daarstad, Angel Le, Quinn Downey, Simon Hutton, and me, Dan.

A few months ago, we started on the journey of predicting the results of March Madness. Then the apocalypse came and everything was cancelled. It turns out the end of the world is boring, so we continued the journey by projecting what the brackets would have been if current events had not intervened.

So for those of you playing along at home, we project Michigan to defeat Wisconsin in the national final, those teams having defeated Gonzaga and Villanova, respectively, in the Final Four. This is bracket 4, the Orange Bracket. Go Beavs!

The bigger question for us is how we got here, not so much the picks themselves. What we thought would be an objective process turned on human decisions, intuition, and political choices. Each bracket presented here was produced using a machine learning model called a random forest (10,000 trees per run, which is just an arbitrary large, round number). We used the Lunardi bracketology from April 14 (he continues to update it; we could easily have used any version). Each accuracy rating is derived from an 80/20 train/validate split. Of course the games did not take place; these are merely projections.
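A minimal sketch of that setup, assuming scikit-learn (the features and labels below are synthetic stand-ins, not the team's actual data):

```python
# Sketch of the basic workflow: a random forest with 10,000 trees,
# validated on a random 80/20 train/validate split.
# The features and labels here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_games = 300                       # roughly the size of the base data set
X = rng.normal(size=(n_games, 5))   # five season-level features per game
y = (X[:, 0] + rng.normal(scale=0.5, size=n_games) > 0).astype(int)  # 1 = favorite wins

# 80/20 train/validate split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=10_000)  # arbitrary large, round number
model.fit(X_train, y_train)

print(f"holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.1%}")
```

The accuracy figure printed at the end is the same kind of 80/20 holdout rating quoted for each bracket below.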

The data and code needed to cook your own brackets can be found in this GitHub repo:

Note: you can’t directly replicate these brackets, as you will not have the same 80/20 split that informed our work, but you can produce many very similar ones.
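If you want runs that do repeat, a general scikit-learn option (not necessarily what the repo does) is to pin `random_state`:

```python
# Pinning random_state makes the 80/20 split reproducible; without it,
# every run draws a different split and can yield a different bracket.
# The data here is a stand-in, not the actual tournament set.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Same seed -> identical splits across runs.
a = train_test_split(X, y, test_size=0.2, random_state=42)
b = train_test_split(X, y, test_size=0.2, random_state=42)
assert all((x == z).all() for x, z in zip(a, b))

# Omit random_state and the splits generally differ from run to run.
```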

Here is a good starting point: four brackets, four methods. Which one do you think would be best for winning big money? Which one is most plausible?

You can think of our bracket options like “Goldilocks and the Three Bears”: a piping-hot bracket, a hot bracket, a too-cold bracket, and one that was just right.

The first bracket (Figure 1) was built without any advanced basketball metrics. It came out wild, with multiple upsets and an underdog in the final.

Its features: total rebounds, total points scored for the season, total points against for the season, season turnovers, and team losses. No fancy stats, weights, z-scores, or normalizations. Let the machine learn from ten years of games; internally, it was 76.2% accurate.
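One plausible way to turn those season totals into per-game features, shown here as an assumption rather than the repo's actual code, is to difference the two teams' raw stats for each matchup:

```python
# Raw, unnormalized matchup features from season totals:
# one row per game, feature = team A's totals minus team B's.
# Team names and numbers are illustrative, not real 2019-20 stats.
import pandas as pd

teams = pd.DataFrame({
    "team": ["Gonzaga", "Michigan"],
    "total_reb": [1450, 1380],
    "pts_for": [3100, 2850],
    "pts_against": [2500, 2600],
    "turnovers": [380, 410],
    "losses": [2, 9],
}).set_index("team")

def matchup_features(team_a, team_b):
    """Raw stat differences: no weights, z-scores, or normalization."""
    return (teams.loc[team_a] - teams.loc[team_b]).to_frame().T

print(matchup_features("Gonzaga", "Michigan"))
```

Stacking one such row per historical game gives the kind of feature matrix a random forest can learn from directly.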

The second bracket (Figure 2) attempts to use basketball analytics, but fails to produce any outliers within the bracket.

This bracket is based on an extremely smooth model built on z-scores of Dean Oliver’s four factors. It was 69.8% accurate on the last three years of NCAA tournament games.
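The four factors can be computed from box-score totals and then z-scored across the field. The formulas below follow Oliver's standard definitions, but the team rows and numbers are invented for illustration:

```python
# Z-scoring Dean Oliver's four factors across a field of teams.
# Formulas are the standard four-factor definitions; the box-score
# numbers are made up for illustration.
import pandas as pd

box = pd.DataFrame({
    "fgm": [900, 850, 820], "fga": [1900, 1850, 1800],
    "tpm": [250, 300, 200], "ftm": [450, 400, 500],
    "fta": [600, 550, 650], "tov": [380, 410, 350],
    "orb": [350, 300, 320], "opp_drb": [900, 880, 920],
}, index=["Team A", "Team B", "Team C"])

factors = pd.DataFrame({
    "efg": (box.fgm + 0.5 * box.tpm) / box.fga,                 # effective FG%
    "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),  # turnover rate
    "orb_pct": box.orb / (box.orb + box.opp_drb),               # off. rebound rate
    "ft_rate": box.ftm / box.fga,                               # free-throw rate
})

# z-score each factor across the field of teams
z = (factors - factors.mean()) / factors.std()
print(z.round(2))
```

Feeding these standardized columns to the model, instead of raw totals, is what "smooths" a bracket: every team's inputs sit on the same scale, so extreme teams stop looking extreme.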

The third bracket (Figure 3) produces a reasonable Final Four, but almost no upsets, which is not how the tournament realistically plays out.

This bracket is based on a hyper-smooth, Oliver-four-factor-like model: z-score everything. About 72% accurate.

The fourth and final bracket (Figure 4) is the interesting one. It produced upsets, but not too many, and a realistic, interesting Final Four.

Bracket D uses no basketball data at all: just wins, losses, and strength of schedule. About 72% accurate.

Figure 1. Green Final Four — As you can see, this bracket is wild: the final four is a 1, a 16, a 2, and a 15 seed.

Figure 2. Blue Final Four — 1 Gonzaga def. 2 Kansas; 2 Duke def. 1 Baylor. Final: Gonzaga def. Duke. Aside from a 1:16 upset, a very conservative bracket ending in a 1-2-2-1 final four.

Figure 3. Red Final Four — 2 Kansas def. 1 Gonzaga; 1 Baylor def. 1 Villanova. Final: Kansas def. Baylor. While this bracket doesn’t include any huge swings, it does include an early, but not first-round, exit for Virginia.

Figure 4. Orange Final Four — This bracket uses no basketball information, just wins/losses and SOS for each team. It is a satisfying bracket, with plenty of well-chosen upsets and an interesting but not overly provocative Final Four, despite calling a 1:16 upset.

A few interesting questions:

  1. How do you know if the industry standard is coherent?

The sampling process produces large swings from individual random selections. In one simulation, Oregon lost in a 5–12 upset. Much like the Virginia problem, it is clear that in a world with major outliers and only about 300 base data points, the selection of training material should be regarded as political and important.

The 80/20 standard is used regularly in this field, but there isn’t a great reason why it shouldn’t be 70/30, or even 50/50 for a really thorough validation. Eighty is round, and large enough that we could hope to get every fifth case right. We generally accepted a 70%+ success rate at predicting classes on the 20% holdout. If you train on 16 games and then pick 4 more, going three for four on those next games seems good. This is a binary classification problem known to produce odd results (a bunch of upsets every year); fit much tighter than this and overfitting is lurking in the shadows.
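The arbitrariness of the ratio is easy to see empirically: repeat the split at 80/20, 70/30, and 50/50 and watch the validation accuracy move with both the ratio and the random draw. A sketch on synthetic data:

```python
# Repeating train/validate at several split ratios shows how arbitrary
# 80/20 is: the accuracy rating moves with both the ratio and the draw.
# Synthetic data stands in for the real game set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)

for test_size in (0.2, 0.3, 0.5):
    scores = []
    for seed in range(10):  # ten different random splits per ratio
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(Xtr, ytr)
        scores.append(accuracy_score(yte, clf.predict(Xte)))
    print(f"{1 - test_size:.0%}/{test_size:.0%} split: "
          f"mean {np.mean(scores):.1%}, spread {np.ptp(scores):.1%}")
```

On a data set this small, the spread across splits at a single ratio can rival the accuracy differences quoted between the four models above, which is exactly why the split deserves scrutiny.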

  2. Are the more conservative brackets better?

Generally, professional pickers end up with a consistent pattern: two low-numbered seeds and two 1 seeds. Better-seeded teams do tend to advance, but it is not common for two or more 1 seeds to make the Final Four. This pattern does not fit with reality.

The most recent 11 Final Four seeding patterns (table not reproduced here).

These “good” brackets with multiple 1 seeds have little basis in reality. In those eleven years, we have seen two 1 seeds once, three once, and zero once. Your best bet is exactly one of them. Brackets that include multiple 1 seeds aren’t just boring, they are wrong.

People would reject the green bracket (Figure 1) out of hand, but it is clear in many models that Virginia was going to lose that game. SFA was a perfect trap for Baylor in Dallas, as some brackets also predicted, as was Winthrop for Wisconsin. We notice that the conservative outcome is itself a fiction. Or, as legendary coach Herm Edwards once argued in a press conference: this is why we play the games.

Takeaway

  1. Formulas matter; this isn’t a black box. The representation of machine learning as a black box that produces answers is itself political. We pulled back the curtain and showed that the outcomes are fairly easily manipulated. Deep learning can be pretty shallow.

Associate Professor of Social Media. Oregon State University. Read my book: Selling Social Media (Bloomsbury Academic), 2018.
