In this guide, I will show you how to set up a powerful Python environment to do sports analytics on your computer. If you follow these steps, you will have all the libraries you need to do web scraping, data analysis and visualization. Future posts on this site will assume that you have installed the necessary libraries.
I’ve written this post for Python beginners. If you have some experience with Python, have it working on your computer already, and know how to install the necessary libraries, feel free to keep doing what works for you.
Introduction
Depending on which computer you have, there are several ways to set up Python. In fact, many Mac and Linux machines come with a version of Python pre-installed. If you want to use Python for web scraping, analytics and visualization, however, you need something more robust than the bare-bones Python that comes with your computer.
One of the most powerful aspects of Python is the huge number of available libraries. The tricky part of setting up Python is making sure that you’ve installed all the libraries you need, and that the software versions are compatible so everything works together.
The Python language is always evolving. Libraries also evolve over time, so there is no guarantee that the latest version of a library will work with the latest version of Python or other libraries you might want to use.
Use Python 3
You’ll notice that I referred to different versions of Python. There are two major flavors of Python in existence: Python 2 and Python 3. This site is only going to use Python 3.
Python 2.7, the last version of Python 2, will not be maintained after 2020. So, if you’re new to Python, it doesn’t make sense to learn Python 2 at this point.
Python 2 has been around a long time, and a lot of software was written using it. At this point, most libraries have versions which work with either Python 2 or Python 3. However, some libraries only work with one version of Python or the other. It would be a nightmare if you had to check each library version for the correct Python version.
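If you’re not sure which flavor of Python a given prompt is running, a quick check like the one below can tell you. This is only a minimal sketch: the helper name is_python3 is my own invention for illustration, not part of Python or Anaconda.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3.

    is_python3 is a hypothetical helper name used only for this example.
    """
    return version_info[0] == 3


# sys.version starts with the version number, e.g. "3.6.3 ..."
print("Running Python", sys.version.split()[0])
print("Is this Python 3?", is_python3())
```

You can also pass a version tuple yourself, for example is_python3((2, 7, 18)), to see how the check behaves for Python 2.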
The Anaconda Python Distribution
Fortunately, the people at Anaconda, Inc. have a convenient solution that’s become very popular.
Anaconda is a free distribution of Python that includes hundreds of the most popular libraries. So, if you want to install a particular version of Python with certain libraries, you can let Anaconda create an environment for you with all the things you need ready to go. Anaconda keeps track of which library versions work with which Python versions, so you don’t have to.
Anaconda also makes it easy to set up multiple Python environments on one computer. This is important, since different projects will often require different libraries. For example, a video game needs very different libraries compared to a sports analytics project. If each project has its own environment, you can safely add libraries to one project without breaking any other projects.
It would also be a pain if you had to update all of your projects each time some library or Python version changed. With each project in its own environment, you can leave old projects running prior versions of the software, and new Python or library versions won’t break those existing projects. Although there are a number of tools out there which provide this sort of Python environment control, Anaconda is one of the most popular and easiest to use.
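Once you have more than one environment, it helps to be able to ask Python which one it is running in. The sketch below is one way to do that, assuming the environment was activated by conda, which sets the CONDA_DEFAULT_ENV variable when it activates an environment. The function name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset.

    active_conda_env is a hypothetical helper. It relies on the
    CONDA_DEFAULT_ENV variable that conda sets on activation.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


# The interpreter path and prefix show which Python installation is in use.
print("Interpreter:", sys.executable)
print("Environment folder:", sys.prefix)
print("Conda environment:", active_conda_env())
```

If the last line prints None, Python was probably started outside of an activated conda environment.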
Benefits of Using Anaconda
First, Anaconda works on the three major personal computing platforms: Mac, Windows and Linux.
Also, Anaconda works with the R language, a specialized statistical programming language. R is widely used in data science, including sports analytics. With Anaconda, you can work with Python and R on one consistent platform.
Since Anaconda has a very large user community, you can easily find tutorials or answers to technical questions by searching online, for instance at StackOverflow.
Finally, the basic Anaconda distribution is free and open source. (Anaconda is a freemium product, and Anaconda, Inc. offers related commercial products that we will not use on this site.)
The remainder of this post will show you how to install Anaconda, create a Python environment and install the libraries we’re going to use for sports analytics.
Installing Anaconda
There are separate Anaconda installation instructions available for Mac, Linux and Windows.
Download the Python 3 Graphical Installer and start the installation process. If the installer asks whether you want to make Anaconda your default Python installation, select yes. Accept the default response in any other dialog boxes that appear.
Creating a Python Environment
Anaconda comes with a graphical user interface called Navigator. If you are comfortable using the command line on your computer (e.g., Terminal on Mac or Linux), it’s very easy to do everything from the command line instead. Anaconda’s documentation includes an introduction to its conda command line tool, and there are many tutorials online. For this post, we will focus on Navigator for people who aren’t as comfortable with the command line.
Open the Anaconda Navigator application and select Environments in the left-hand sidebar. The window should look similar to this:
Let’s create a brand new environment for Python 3. Click Create at the bottom of the window. The following dialog box will appear:
Choose a name for your new environment, make sure Python 3.6 is selected and click Create.
For sports analytics coding, you could choose a name like “sports_py36” to help you remember the purpose of the environment and the major Python version. Over time, as you create a lot of environments for different projects, it’s helpful to have them clearly named. Anaconda also makes it easy to clone environments, so when Python 3.7 is released, you can clone your Python 3.6 environment to create “sports_py37”, upgrade Python to version 3.7 in the new environment, and off you go.
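For readers who prefer the command line, the Navigator steps above correspond roughly to a few conda commands. The sketch below only builds and prints those commands and does not run them. The helper function names are my own, and the environment names are the examples from this post.

```python
import shlex


def conda_create(env_name, python_version):
    """Command to create a new environment with a given Python version."""
    return ["conda", "create", "--name", env_name, "python=" + python_version]


def conda_clone(new_env, existing_env):
    """Command to clone an existing environment under a new name."""
    return ["conda", "create", "--name", new_env, "--clone", existing_env]


def conda_set_python(env_name, python_version):
    """Command to change the Python version inside an environment."""
    return ["conda", "install", "--name", env_name, "python=" + python_version]


def as_shell(parts):
    """Join command parts into one line you could paste into a terminal."""
    return " ".join(shlex.quote(part) for part in parts)


# These helpers are hypothetical; they only format text.
for command in (
    conda_create("sports_py36", "3.6"),
    conda_clone("sports_py37", "sports_py36"),
    conda_set_python("sports_py37", "3.7"),
):
    print(as_shell(command))
```

Running this prints three lines: conda create --name sports_py36 python=3.6, then conda create --name sports_py37 --clone sports_py36, then conda install --name sports_py37 python=3.7.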
Installing Packages
After you create the new environment, it will only have a few basic packages installed, which appear in the right pane of the window as shown below. One of those packages should be “python”, which is the Python interpreter. As of the time of this post, Anaconda installs Python 3.6.3.
To install packages, set the drop-down menu at the top-center to Not Installed. Now select “anaconda”. The Anaconda Navigator window should look similar to the screen shot below.
Click Apply. The Install Packages dialog box shown below will pop up with a list of more than 200 packages (libraries) to install.
Select Apply to proceed with the installation.
The package installation will take a while. After the installation finishes, make sure Not Installed is still selected, and choose the following 11 additional packages:
arrow
basemap
colorcet
csvkit
humanize
line_profiler
mypy
nb_conda
pandas-datareader
scrapy
tqdm
You can select the above packages more quickly by using the search box in the upper right of the window. Just type the first few letters of the package you want, then click the check box to the left of the package name. After you’ve selected the packages, you can sort the table by the check box (click in the table header above the check boxes). The Navigator window should look similar to the following screen shot.
Click Apply, then select Apply again in the Install Packages dialog box to confirm that you want to install these additional packages and their many dependencies.
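If you would rather install the 11 additional packages from the command line, a single conda install command can list them all. The sketch below just builds that command as text. It assumes your environment is named “sports_py36” and that every package is available from the channels your conda installation is configured to use, which may not be true on every setup.

```python
# The 11 additional packages listed in this post.
EXTRA_PACKAGES = [
    "arrow", "basemap", "colorcet", "csvkit", "humanize", "line_profiler",
    "mypy", "nb_conda", "pandas-datareader", "scrapy", "tqdm",
]


def conda_install(env_name, packages):
    """Build a conda install command (hypothetical helper; text only)."""
    return "conda install --name {} {}".format(env_name, " ".join(packages))


# Print the command instead of running it, so you can review it first.
print(conda_install("sports_py36", EXTRA_PACKAGES))
```

Printing the command rather than running it lets you check the environment name before anything is installed.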
You can now set the drop-down menu to Installed to see the full list of packages in your new Python 3 environment.
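As an extra check, you can ask Python itself whether the additional packages can be found. The sketch below is one way to do that, run from inside the new environment. The import names in the table are my assumption, since a package’s import name sometimes differs from its install name (for example, pandas-datareader is imported as pandas_datareader), and the function names are hypothetical.

```python
import importlib.util

# Install name -> import name. The import names are assumptions.
IMPORT_NAMES = {
    "arrow": "arrow",
    "basemap": "mpl_toolkits.basemap",
    "colorcet": "colorcet",
    "csvkit": "csvkit",
    "humanize": "humanize",
    "line_profiler": "line_profiler",
    "mypy": "mypy",
    "nb_conda": "nb_conda",
    "pandas-datareader": "pandas_datareader",
    "scrapy": "scrapy",
    "tqdm": "tqdm",
}


def is_importable(module_name):
    """Return True if Python can locate the module in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module spec is unset.
        return False


def missing_packages(import_names):
    """Return the sorted install names whose modules cannot be located."""
    return sorted(
        package
        for package, module in import_names.items()
        if not is_importable(module)
    )


missing = missing_packages(IMPORT_NAMES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, go back to Navigator, make sure the right environment is selected, and install them again.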
That’s it! You now have a Python environment ready to do sports analytics using the same tools that the professionals use. In upcoming technical guides, we’ll show you how to create and run code in this environment.