Welcome!

Welcome to Practically Predictable. If you want to learn sports analytics, this site is meant for you!

Sports analytics has come a long way from the publication of Moneyball in 2003 (and the film’s 2011 release) to the recent World Series. Although analytics first gained widespread attention in baseball, it has also become important in basketball, the NFL, English Premier League football, tennis and many other sports. The increasing use of analytics has changed sports and sports journalism.

This site will teach you how to:

  • access the huge amount of sports data available on the internet;
  • create charts and graphs to get insight and tell interesting stories about the data; and
  • make useful sports predictions supported by the data.

Whether your goal is to improve your fantasy roster, pick better NCAA March Madness brackets, or just learn about the exciting field of sports analytics, I hope you find this site helpful and informative.

Why this Site?

There’s already an enormous amount of sports analytics content out there. Why did I decide to start this site, and how does it aim to be useful and different?

This site focuses on teaching the practical steps needed to create sports analytics. Some sites have great sports journalism, but aren’t in the business of teaching you how the site’s proprietary analytics work.

As an example, consider ESPN’s Basketball Power Index (BPI). We are now about 20% through the 2017-18 NBA season, and according to ESPN, the Celtics currently have around a 16% chance to win the NBA Finals, while the Warriors have around a 62% chance. How did ESPN compute these probabilities? According to ESPN, BPI depends mainly upon offensive and defensive ratings, computed from historical game scores. These scores are adjusted, however, for:

  • strength of opponent;
  • team pace;
  • game location (home or away);
  • rest and distance travelled between games; and
  • some other miscellaneous factors.

ESPN’s analysts have computer code which calculates BPI for every NBA team after every game. Then, based upon each team’s BPI and the remaining NBA schedule, ESPN projects win probabilities by simulating 10,000 NBA season outcomes.
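To make the idea of simulating a season concrete, here is a minimal sketch in Python. It is not ESPN's method: the team names, the ratings, the logistic formula in `win_probability` and its `scale` parameter are all assumptions made up for illustration. It plays out a small schedule 10,000 times and reports how often each team finishes with the best record.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic link: chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def simulate_best_record(ratings, schedule, n_sims=10_000, seed=42):
    """Estimate how often each team finishes with the most wins."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_counts = Counter()
    for _ in range(n_sims):
        wins = {team: 0 for team in ratings}
        for team_a, team_b in schedule:
            p = win_probability(ratings[team_a], ratings[team_b])
            if rng.random() < p:
                wins[team_a] += 1
            else:
                wins[team_b] += 1
        top = max(wins.values())
        leaders = [team for team, w in wins.items() if w == top]
        # Ties for the best record are broken at random.
        best_counts[rng.choice(leaders)] += 1
    return {team: best_counts[team] / n_sims for team in ratings}

# Made-up ratings for three hypothetical teams.
ratings = {"A": 8.0, "B": 4.0, "C": 0.0}
# Every ordered pair of teams meets 10 times (60 games in total).
schedule = [(a, b) for a in ratings for b in ratings if a != b] * 10

probs = simulate_best_record(ratings, schedule)
for team, p in probs.items():
    print(f"Team {team}: best record in {p:.1%} of simulated seasons")
```

A real projection would replace the made-up ratings with ones estimated from game data, and would simulate the playoffs as well as the regular season.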

It’s nice that ESPN tells us which factors go into computing BPI. One issue, however, is that we don’t have any real insight into how these factors interact. Which of the adjustment factors are most important? Unfortunately, we can’t easily figure out how much a change in one factor affects the projections, even if the direction of the change seems clear.

Since ESPN hasn’t shown us the detailed rules it uses to construct BPI, we can’t replicate its results. Finally, other than this, I couldn’t find any evidence from ESPN about the quality of NBA predictions based upon BPI.

Create Your Own Sports Analytics

Here’s the exciting part. If you know how to get the necessary data, how to code, and a certain amount of probability and statistics, you can create your own sports analytics.

Most of the data needed to calculate something similar to ESPN’s BPI can be accessed, for free, using web scraping. This site will teach you how to access basketball and other sports data.
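As a small taste of what web scraping involves, here is a sketch that pulls rows out of an HTML table using only Python's standard library. The `TableParser` class is a hypothetical helper, and the standings in `SAMPLE_HTML` are invented placeholder data, not real results. On a real site you would first download the page (for example with `urllib.request.urlopen`) and then feed its text to the parser.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# Invented sample page standing in for a downloaded standings table.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>12</td><td>3</td></tr>
  <tr><td>Team B</td><td>9</td><td>6</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

# The first row holds the column names; the rest are data rows.
header, *body = parser.rows
standings = [dict(zip(header, row)) for row in body]
for entry in standings:
    print(entry)
```

Each row comes back as a dictionary keyed by column name, with the values still stored as text, so numbers need converting before any arithmetic.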

Being able to create your own analytics is great for several reasons:

  • Learn by doing: The best way to truly understand which analytics are useful is to play with the data and statistics yourself.
  • Understand uncertainty: A lot of sports predictions out there don’t have any context to let you know how confident you should be about the prediction. For instance, if the Celtics beat the Warriors in the NBA Finals, was ESPN’s BPI wrong, or was it just luck? If you do your own analysis, you can estimate how uncertain your predictions are and hopefully make more informed decisions.
  • Deepen your understanding of the game: Whatever sport you study, looking at data and crunching the numbers yourself will add to your insight. Successful analytics are grounded in a solid understanding of the game and its rules, but learning analytics can also help improve your knowledge of the game.
  • Be creative: Maybe you will discover something original!

The discussion of ESPN’s BPI above was not meant to pick on ESPN or BPI specifically. It’s just a well-known and useful example. We will explore a number of commonly used analytics across various sports and learn how we can build our own versions using publicly available data.

But there is another reason I want to help you learn sports analytics.

A Fun Way to Learn Probability, Statistics and Machine Learning

Sports analytics is a great setting for people to learn probability, statistics and machine learning in a fun and practical way. Probability and statistics show up naturally in sports when you try to understand the impact of skill versus luck. Sports analytics also involves data science and machine learning methods to obtain, visualize and analyze data, and to make predictions. All of these steps also require coding skills to process large amounts of sports data and to tie together the various parts of the analytics.

The mathematics level on this site will generally be high school algebra and pre-calculus. (Sometimes we will link to more advanced mathematical content if you want to explore further.) The great thing about using computers to learn probability and statistics is you can use simulation, rather than advanced math, to understand the main results.
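Here is a sketch of what using simulation in place of advanced math can look like. Suppose a team has a 60% chance of winning any single game (an assumed number), and that games are independent of each other (also an assumption). How likely is that team to win a best-of-seven series? The function `simulate_series` simply plays the series many times and counts. For comparison, `exact_series` computes the same answer with a counting formula. Both function names are illustrative.

```python
import random
from math import comb

def simulate_series(p_game, wins_needed=4, n_sims=100_000, seed=1):
    """Estimate the chance of winning a series by playing it many times."""
    rng = random.Random(seed)
    series_won = 0
    for _ in range(n_sims):
        wins = losses = 0
        # Keep playing until one side reaches the required number of wins.
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < p_game:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
    return series_won / n_sims

def exact_series(p_game, wins_needed=4):
    """Exact answer: win the final game after losing k of the earlier games."""
    q = 1.0 - p_game
    return sum(
        comb(wins_needed - 1 + k, k) * p_game ** wins_needed * q ** k
        for k in range(wins_needed)
    )

p = 0.6  # assumed single-game win probability
simulated = simulate_series(p)
exact = exact_series(p)
print(f"simulated: {simulated:.3f}   exact: {exact:.3f}")
```

The two numbers should agree closely, at about 0.71. A 60% favorite in each game is a bigger favorite over a full series, and the simulation shows that without any formula.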

One of my main motivations for starting this site was to inspire my own middle- and high-school-aged children to take their math and coding knowledge to the next level. I hope that this site also kindles your interest in math and coding, and that it helps you to learn more about them in a fun and practical way.

Python

The code examples on this site are in Python. This site is not going to teach you Python from scratch. Rather, we will focus on showing you how to use Python to do useful and cool things. In upcoming posts, we will recommend some ways to install the right version of Python on your computer and learn the language if you aren’t already familiar with it.

Python is a great choice for learning sports analytics. First, Python is easy for people to write and to read. Second, you can write useful programs with relatively few lines of code. It’s no accident that Python has recently become the most popular introductory programming language at many top U.S. universities. Here is a great post from a professor at George Washington University about why she uses Python to teach her undergraduate students the computational skills to be successful students and future STEM professionals.

Python is also very popular among researchers and professionals. Companies such as Google, Facebook, Instagram, YouTube, Disney, Pixar, IBM and Dropbox all use Python. According to the Institute of Electrical and Electronics Engineers (IEEE), Python is now the top-ranked programming language. Also, a recent survey of professional data scientists by Kaggle (a subsidiary of Google) ranked Python as the most commonly used tool overall.

Another reason for Python’s popularity is its enormous ecosystem of libraries. With a few simple commands, you can import powerful tools for data analysis, visualization and web scraping. Also, Python has hundreds of libraries for specialized applications, such as game development, cryptography, computational biology and astrophysics. If you learn Python, you will not outgrow it.

Time learning Python is time well spent.

Going Forward

You may have noticed that I haven’t yet mentioned which sports we will cover. I am personally most interested in basketball, baseball and tennis. Baseball and basketball both have a relatively long history of analytics and a lot of available data, which makes them good candidates for starting to learn analytics. We will include tennis and other sports over time. I hope to include American football and English Premier League football down the road.

There are a few great academic and professional sports analytics sites that assume more advanced math than this site. A number of those sites also have code written in the specialized and powerful R statistics programming language. Occasionally, we will link to interesting advanced articles and teach the content at the appropriate mathematical level using Python.

Another interest I have is board games and card games. Games of chance are a fun way to learn the basics of probability. We will write code to model dice rolls and card shuffles and learn about probability. Games are also useful for learning how to analyze strategy. Strategy is relevant in sports too, of course: coaches and players can make choices to adapt to events in the game. In upcoming posts, we will start thinking about probability and strategy in the popular board game of Risk.
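As a preview, here is a minimal sketch of modeling dice rolls and a card shuffle with Python's `random` module. The dice part assumes the standard Risk battle rule: the attacker rolls up to three dice, the defender up to two, the highest dice are compared in pairs, and ties go to the defender. The function names and the card labels are illustrative choices.

```python
import random

def risk_roll(rng, attack_dice=3, defend_dice=2):
    """Play one battle roll; return (attacker_losses, defender_losses)."""
    attack = sorted((rng.randint(1, 6) for _ in range(attack_dice)), reverse=True)
    defend = sorted((rng.randint(1, 6) for _ in range(defend_dice)), reverse=True)
    attacker_losses = defender_losses = 0
    # Compare highest against highest, then second against second.
    for a, d in zip(attack, defend):
        if a > d:
            defender_losses += 1
        else:  # ties go to the defender
            attacker_losses += 1
    return attacker_losses, defender_losses

def estimate_outcomes(n_rolls=100_000, seed=7):
    """Estimate how often each outcome occurs by rolling many times."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_rolls):
        outcome = risk_roll(rng)
        counts[outcome] = counts.get(outcome, 0) + 1
    return {outcome: n / n_rolls for outcome, n in counts.items()}

outcomes = estimate_outcomes()
for (a_lost, d_lost), share in sorted(outcomes.items()):
    print(f"attacker loses {a_lost}, defender loses {d_lost}: {share:.1%}")

# Shuffling a 52-card deck and dealing a five-card hand.
deck = [rank + suit for suit in "SHDC" for rank in "A23456789TJQK"]
rng = random.Random(7)
rng.shuffle(deck)
hand = deck[:5]
print("hand:", hand)
```

With three attacking dice against two defending dice, the attacker should come out slightly ahead on average, which is the kind of result we can then use to think about strategy.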

This site will include occasional technical guides. Several upcoming posts of this type will cover topics such as setting up Python on your computer. These posts will be somewhat long and are intended for beginners. I hope they help you get started if you’re not familiar with coding in Python.

I hope you enjoy learning about sports analytics and these other topics with me. Please leave a comment below or contact me if you have interests in particular topics, suggestions or questions!
