March Madness First Round Upsets

March Madness is almost upon us! Are you ready to fill out your brackets?

It’s hard for your bracket to do well if your picks lose in the first round. Although some approaches to bracket selection start with the Final Four and work backwards, we’ll start with the first round. How many upsets should you pick in the first round to maximize your chance of success? Read on to find out!

March is one of my favorite months, for many reasons. One of them is filling out NCAA tournament brackets with my kids.

In filling out a bracket, there’s always some tension between picking the teams you want to win, versus analyzing the matchups dispassionately. There’s also the challenge of competing against the crowd. If your picks are similar to everyone else’s picks, your bracket’s performance is likely to be average. You want to stand out from the crowd, but you need to pick upsets intelligently to maximize your chance of success.

This post will examine the historical data about first round upsets, and give some high-level suggestions about how many upsets to pick. We’ll look at this topic in more detail in the next few posts, prior to the start of the Big Dance.

Odds of Picking a Perfect Bracket

Let’s get one thing out of the way. You’re not going to pick a perfect bracket.

The odds of picking a perfect bracket are really, really low.

Not counting the play-in games, there are a total of 63 games played in the tournament. There are 32 games in the first round, 16 in the second, 8 in the third, 4 in the regional finals, 2 in the Final Four, and then the National Championship game.

That means there are 2^{63} = 9,223,372,036,854,775,808 possible brackets not including the play-in games. In case you’re wondering how to say that big number, it’s approximately 9.2 quintillion.

If each game were a 50-50 tossup, then the odds of getting a correct bracket would be 1 out of 9.2 quintillion. Of course, the games are not tossups, since the seeding is an attempt to use the teams’ records and expert judgment to rank the teams. The whole point of the seeding is to make it more likely that the better teams will survive to later in the tournament.
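If you want to check the arithmetic yourself, here is a short Python sketch. It only uses the round sizes listed above and the simplifying assumption that every game is a fair coin toss; the variable names are my own.

```python
# Games per round, excluding the play-in games.
games_per_round = [32, 16, 8, 4, 2, 1]
total_games = sum(games_per_round)

# Each game has two possible winners, so the number of distinct
# brackets is 2 raised to the number of games.
possible_brackets = 2 ** total_games

# Chance of a perfect bracket if every game were a fair coin toss.
p_perfect = 0.5 ** total_games

print(total_games)
print(f"{possible_brackets:,}")
print(f"{p_perfect:.3e}")
```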

According to DePaul University Professor Jeff Bergen, if you factor in the likelihood of higher seeds advancing, the odds of picking a perfect bracket are more like 1 in 128 billion.

Skill and Luck in Bracket Selection

Clearly, randomness (also called luck) is going to play a big part in how your bracket does. How then should you make decisions in filling out your bracket?

There’s always a temptation to go with gut instinct in making emotional decisions, including bracket selections. But if you want to make better quality decisions, you need to use the scientific method wherever possible as a guide.

It’s called March Madness, but it’s important to have a method to it (with apologies to Shakespeare).

The right way to make decisions when randomness is involved is to use probability and statistics. You can look at the problem in two complementary ways: either maximize your expected outcome or minimize your risk of having a bad outcome. You can trade off these two approaches to arrive at a decision that feels best based upon your personal preferences.

The scientific approach isn’t necessarily incompatible with “gut instinct”. If you have a strong opinion about a pick, try to identify as precisely as possible why you feel that way. Maybe it’s based on your views about how a particular player on one team will match up defensively against the leading scorer of the opposing team. Then, see if you can find data on how analogous situations have played out in the past. This will force you to articulate your opinion and look for data that either supports or refutes it, which is really what the scientific approach is all about.

Using statistics doesn’t guarantee your outcome will always work out for the best. Sometimes gut instinct, guesswork and raw luck will, just by sheer chance, result in a better outcome than the scientific approach.

On the other hand, if your objective is to make the best possible decision based upon what you can know in advance, then statistics is the only way to go. Over the longer term, your decisions on average will do better than gut instinct and guesswork.

Statsketball Tournament 2018

Similar to last year, This is Statistics and the American Statistical Association are challenging high school students and college undergraduates to use statistics to make better NCAA bracket selections. You can read about the challenge here. This year, the deadline to enter the challenge is Wednesday, March 14 at noon Eastern Time.

You can read about last year’s Statsketball winners here. This Wall Street Journal article [paywall] also described last year’s challenge, and the student winners.

One of the two contests is the Upset Challenge. In this challenge, you need to pick 32 winners of the first round games. Each correct pick receives 2 points, but correctly picked upsets receive bonus points based on the seed of the favored team.

As we will see in upcoming posts, the bonus points have a significant impact on how many upsets you should pick in the first round.

Getting Historical Data on First Round Games

There are many sources of historical NCAA tournament games. I obtained my data from the Washington Post’s NCAA Men’s Basketball Tournament History site. The site has game results going back to 1985, when the current tournament format was adopted. In particular, to get all of the first round games, you can use this search.

You can simply copy and paste the table into a spreadsheet, which is what I did. I then saved the spreadsheet as a CSV file for further analysis in Python.

The data set consists of 1056 games, with 33 years of tournament history and 32 games per year in the first round.
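A quick sanity check on the size of the data set is worth doing before any analysis. The sketch below just redoes the counting from the tournament format (4 regions, 8 first round matchups per region); the variable names are mine and nothing here reads the actual CSV file.

```python
first_year, last_year = 1985, 2017
years = last_year - first_year + 1

regions = 4
matchups_per_region = 8  # 1-16, 2-15, ..., 8-9

games_per_year = regions * matchups_per_region
total_games = years * games_per_year

# Each seed plays exactly one first round game per region per year,
# so every seed's wins plus losses should add up to this number.
games_per_seed = years * regions

print(years, games_per_year, total_games, games_per_seed)
```

If the row count of your own CSV file differs from `total_games`, something was probably lost in the copy and paste.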

First Round Upsets

This table shows the historical record and win percentage by seed in the first round of all NCAA men’s tournaments going from 1985 to 2017. Remember, there are 4 regions, with 16 seeds in each region.

Seed   Wins-Losses   Win Fraction
1      132-0         1.000000
2      124-8         0.939394
3      111-21        0.840909
4      106-26        0.803030
5      85-47         0.643939
6      83-49         0.628788
7      81-51         0.613636
8      67-65         0.507576
9      65-67         0.492424
10     51-81         0.386364
11     49-83         0.371212
12     47-85         0.356061
13     26-106        0.196970
14     21-111        0.159091
15     8-124         0.060606
16     0-132         0.000000
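The win fractions in this table are simply wins divided by games played. Here is a minimal sketch that recomputes them from the win-loss records; the records are copied from the table, and the dictionary layout is just my choice for illustration. It also checks that seed s and seed 17 - s are mirror images, since they play each other.

```python
# First round records by seed, 1985-2017, as (wins, losses).
records = {
    1: (132, 0), 2: (124, 8), 3: (111, 21), 4: (106, 26),
    5: (85, 47), 6: (83, 49), 7: (81, 51), 8: (67, 65),
    9: (65, 67), 10: (51, 81), 11: (49, 83), 12: (47, 85),
    13: (26, 106), 14: (21, 111), 15: (8, 124), 16: (0, 132),
}

win_fraction = {
    seed: wins / (wins + losses)
    for seed, (wins, losses) in records.items()
}

# Seed s always plays seed 17 - s, so one team's wins are the other's losses.
mirrored = all(records[s][0] == records[17 - s][1] for s in records)

for seed in sorted(win_fraction):
    print(f"{seed:>2}  {win_fraction[seed]:.6f}")
print("records are consistent:", mirrored)
```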

First and Sixteenth Seeds

As you can see, no first seed has ever lost in the first round. Therefore, we are going to exclude the 1-16 matchup from further analysis.

Of course, if the NCAA Tournament continues with the same format for another 30, 50 or 100 years, eventually a sixteenth seed will advance. But unless you think the tournament committee has made a huge mistake in the seeding, don’t pick the sixteenth seed to advance.
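One rough way to put a number on "eventually" is to ask how large the per-game chance of a 16 seed win could be while still being consistent with zero wins in 132 games. The sketch below rests on an assumption of mine, not on anything in the data source: every 1-16 game is independent and has the same upset probability. The function name is made up for illustration.

```python
games_observed = 132  # 1-16 games from 1985 to 2017, all won by the 1 seed
confidence = 0.95

# Largest per-game upset probability p for which seeing zero upsets in
# 132 games still happens 5% of the time: (1 - p) ** 132 = 0.05.
p_upper = 1 - (1 - confidence) ** (1 / games_observed)


def chance_of_at_least_one(p_per_game, years, games_per_year=4):
    """Chance of at least one 16 seed win over the given number of years."""
    return 1 - (1 - p_per_game) ** (years * games_per_year)


print(f"upper bound on per-game chance: {p_upper:.4f}")
for horizon in (30, 50, 100):
    print(horizon, round(chance_of_at_least_one(p_upper, horizon), 3))
```

Keep in mind that `p_upper` is a pessimistic bound for the 1 seeds, not an estimate, so the printed chances are closer to a ceiling than a forecast.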

Here’s a plot of the historical win percentage by seed, looking only at seeds 2 through 15.

[Figure: NCAA Tournament First Round Win Percentage by Seed]

Eighth and Ninth Seeds

At the other end of the spectrum, notice that the eighth and ninth seed first round win frequencies are both roughly 50%.

This means that the 8-9 matchups look like coin tosses in the historical data, with only a slight bias in favor of the eighth seed.
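To see how weak that bias is, you can run an exact two-sided binomial test of the eighth seeds' 67 wins in 132 games against a fair coin. This is a sketch using only the standard library, and it assumes the games are independent; the helper name is mine.

```python
from math import comb

wins, games = 67, 132  # eighth seed record in the first round


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two-sided exact test against a fair coin: add up every outcome that is
# at least as far from the expected 66 wins as the observed 67.
expected = games / 2
observed_gap = abs(wins - expected)
p_value = sum(
    binom_pmf(k, games)
    for k in range(games + 1)
    if abs(k - expected) >= observed_gap
)

print(f"p-value: {p_value:.3f}")
```

A p-value this close to 1 means a record like 67-65 is about as ordinary as it gets for a fair coin.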

Upsets per Year

Let’s define an upset as any game in which the higher seed (the team with the lower seed number) loses. Of course, it’s not really correct to view a ninth seed victory as an upset, since those games historically look like toss-ups.
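In code, that definition boils down to comparing seed numbers. The sketch below counts upsets per year from CSV rows. The column names (year, winner_seed, loser_seed) are hypothetical, and the sample rows are toy data for illustration only, not real tournament results.

```python
import csv
import io
from collections import Counter

# Toy rows for illustration only; these are NOT real tournament results.
# The column names are assumed, not taken from the Washington Post table.
SAMPLE_CSV = """year,winner_seed,loser_seed
1,1,16
1,9,8
1,12,5
2,2,15
2,8,9
"""


def upsets_per_year(csv_file):
    """Count games per year in which the winner had the larger seed number."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        year = int(row["year"])
        counts[year] += 0  # make sure years without upsets still appear
        if int(row["winner_seed"]) > int(row["loser_seed"]):
            counts[year] += 1
    return dict(counts)


result = upsets_per_year(io.StringIO(SAMPLE_CSV))
print(result)
```

With a real file, you would pass an open file handle instead of the `io.StringIO` wrapper, after renaming the columns to match your own spreadsheet.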

The previous table shows the total number of upsets over the 33-year historical period, grouped by seed. Let’s look at the historical data grouped by year instead.

Count     33
Mean      8.09
Std Dev   2.45
Min       3
25%       7
Median    8
75%       10
Max       13

So, on average there are roughly 8 upsets per year (again, including the 8-9 matchup results in the upset category). However, what we really want is the distribution of upsets by seed within a given year.

Seed           2          3          4          5          6          7          8
count  33.000000  33.000000  33.000000  33.000000  33.000000  33.000000  33.000000
mean    0.242424   0.636364   0.787879   1.424242   1.484848   1.545455   1.969697
std     0.501890   0.652791   0.599874   0.867118   1.003781   0.904534   1.185455
min     0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
25%     0.000000   0.000000   0.000000   1.000000   1.000000   1.000000   1.000000
50%     0.000000   1.000000   1.000000   1.000000   1.000000   1.000000   2.000000
75%     0.000000   1.000000   1.000000   2.000000   2.000000   2.000000   3.000000
max     2.000000   2.000000   2.000000   3.000000   4.000000   4.000000   4.000000

Since a picture is worth a thousand words, let’s visualize the number of first round upsets by seed, by year.

[Figure: First Round NCAA Tournament Upsets]

The chart makes it easy to understand the overall variability of upsets from year to year, both in terms of the overall number and by seed.

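We can also summarize the above data in a small table, where each entry is the number of years (out of 33) in which a given seed suffered that many first round upsets.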

        Upsets per Year
Seed    0    1    2    3    4
2      26    6    1    0    0
3      15   15    3    0    0
4      10   20    3    0    0
5       4   15   10    4    0
6       5   13   10    4    1
7       3   14   12    3    1
8       3   10    9    7    4

This frequency table will be useful for making specific decisions about how many first round upsets to pick.
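As a sketch of how you might use the frequency table, the code below turns each row into an average number of upsets per year and into the share of years with at least one upset. The counts are copied from the table; the dictionary layout and variable names are my own.

```python
# Number of years (out of 33) with 0, 1, 2, 3 or 4 upsets, by seed.
upset_counts = {
    2: [26, 6, 1, 0, 0],
    3: [15, 15, 3, 0, 0],
    4: [10, 20, 3, 0, 0],
    5: [4, 15, 10, 4, 0],
    6: [5, 13, 10, 4, 1],
    7: [3, 14, 12, 3, 1],
    8: [3, 10, 9, 7, 4],
}

mean_upsets = {}
share_with_upset = {}
for seed, counts in upset_counts.items():
    years = sum(counts)
    # Weighted average: number of upsets times how often that number occurred.
    mean_upsets[seed] = sum(k * c for k, c in enumerate(counts)) / years
    # Share of years in which this seed lost at least one first round game.
    share_with_upset[seed] = 1 - counts[0] / years

for seed in upset_counts:
    print(seed, round(mean_upsets[seed], 2), round(share_with_upset[seed], 2))
```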

How Many Upsets to Pick?

In summary, here are some preliminary conclusions based upon the above data and analysis. We’ll try to develop a more precise framework in a later post.

Second Seed Upsets

Although a second seed does occasionally lose in the first round (and in 2012, two second seeds lost), it’s rare. Don’t pick the fifteenth seed to advance, unless you have a very strong view that the tournament committee has made a mistake in either the second or the fifteenth seeds. If you do, however, the historical data say you’re unlikely to be correct.

Third and Fourth Seed Upsets

Things are more promising for calling upsets in the third and fourth seed games. On average, there are about 1.4 upsets among these 8 first round games in a given year (47 upsets over 33 years). Often the third seeds and the fourth seeds each suffer one upset, but there are rarely more than 2 upsets overall in these 8 games.

The data suggest that you should pick an upset among the third and fourth seeds as a group. You should also try hard to identify another upset, for the seed which you didn’t pick in the first upset. In other words, if you already picked a third seed upset, try to pick a fourth seed upset, and vice versa.

Fifth Seed Upsets

Of the 33 tournaments, 29 featured at least one fifth seed upset in the first round, and 10 featured exactly two. You should definitely look to pick one upset. It’s probably reasonable to pick a second upset in this category if you have strong views about the matchup.

Sixth and Seventh Seed Upsets

The data tell a similar story for the sixth and seventh seed games. You should try to pick one sixth seed and one seventh seed upset in the first round. If you have strong views about particular matchups, it’s reasonable to look for additional potential upsets. However, you should keep in mind that the overall number of upsets (excluding the 8-9 games) rarely exceeds 8 in a given year.

In summary, among the second through seventh seeds, the historical data suggest you should aim to pick 5 or 6 upsets, and venture beyond that only if you have high conviction about a few additional games.
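The "5 or 6" figure lines up with the historical averages. Here is a small sketch that adds up the first round losses of the second through seventh seeds from the win-loss table and divides by the 33 years of data; the variable names are mine.

```python
# First round losses by seed, 1985-2017, from the win-loss table.
losses_by_seed = {2: 8, 3: 21, 4: 26, 5: 47, 6: 49, 7: 51}
years = 33

# Average number of upsets per tournament for each seed.
avg_upsets_by_seed = {
    seed: losses / years for seed, losses in losses_by_seed.items()
}

# Average number of upsets per tournament across seeds 2 through 7.
avg_upsets_total = sum(losses_by_seed.values()) / years

for seed, avg in avg_upsets_by_seed.items():
    print(seed, round(avg, 2))
print("seeds 2-7 combined:", round(avg_upsets_total, 2))
```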

The 8-9 Matchups

As mentioned above, a ninth seed victory isn’t really an upset. There are years where all the eighth seeds advance, and years where all the ninth seeds advance.

For regular bracket selection, your goal is just to get as many teams as possible from your bracket into the second round. For that purpose, you should analyze the 8-9 games strictly on the merits of the matchups. In contrast, the Statsketball Upset Challenge awards bonus points for correctly picking an upset, defined as the lower seed beating the higher seed. With the possibility of bonus points, you have somewhat greater incentive to pick the ninth seed, even if the game is a true toss-up. We’ll study the impact of Upset Challenge bonus points in a future post.
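To get a feel for how a bonus shifts the decision, here is a sketch of the expected points from picking the eighth seed versus the ninth seed. The 2 base points come from the contest description above. The bonus value is a placeholder I made up, since this post doesn't give the actual bonus formula, and the win probabilities are just the historical frequencies.

```python
BASE_POINTS = 2  # points for any correct pick, per the contest description

# Historical first round win frequencies in the 8-9 games.
p_eight = 67 / 132
p_nine = 65 / 132

# Placeholder only: the real Upset Challenge bonus is not specified here.
HYPOTHETICAL_BONUS = 1.0


def expected_points(p_correct, bonus=0.0):
    """Expected points from a pick that is correct with probability p_correct."""
    return p_correct * (BASE_POINTS + bonus)


pick_eight = expected_points(p_eight)
pick_nine = expected_points(p_nine, bonus=HYPOTHETICAL_BONUS)

# Bonus at which both picks have the same expected points:
# p_nine * (BASE_POINTS + b) = p_eight * BASE_POINTS
break_even_bonus = BASE_POINTS * (p_eight / p_nine - 1)

print(round(pick_eight, 3), round(pick_nine, 3), round(break_even_bonus, 3))
```

Under these assumptions, any bonus above the break-even value tips the expected points in favor of the ninth seed, though that says nothing about the risk side of the trade-off.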

How To Do This Yourself

If you’d like to see how I analyzed the NCAA historical data and got the results in this post, see this Jupyter notebook with the Python code and all the details. Feel free to download the notebook to your computer to run the code yourself, or modify it to run your own analysis.

In upcoming posts, we will incorporate other historical data, such as conference, coach experience and other team factors to improve our analysis. Try to think of other ways to incorporate additional historical data to improve your upset predictions.
