A Python project that aims to facilitate soccer analytics and serve as a starting point for soccer analytics projects.

CleKraus CleKraus Last update: Apr 26, 2024

Soccer analytics

soccer_analytics is a Python project that aims to facilitate analytics projects in soccer and to serve as a starting point for them.

  • Extensive number of helper functions for visualization and animation of soccer events
  • Calculation of relevant soccer KPIs for event data and tracking data
  • Pre-processed Wyscout event data and Metrica tracking data allow you to dive into the analyses immediately
  • Detailed tutorials in the form of notebooks that help you get started with this project and soccer analytics in general
  • Intended as a starting point for projects rather than a "hidden" library
  • Set up in a way so that functions are easily extendable
  • All plots and animations are created with plotly and are therefore easy to integrate into Dash dashboards
  • Supports Python 3.6 - 3.8
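
To illustrate why plotly output is easy to reuse in dashboards, here is a minimal sketch of a pitch outline expressed as plotly layout shapes, which are plain dictionaries. The function name `pitch_shapes` and the 105 x 68 m pitch size are assumptions made for this example; the project's own helper functions may be organized quite differently.

```python
def pitch_shapes(length=105.0, width=68.0, line_color="white"):
    """Return plotly-style layout shapes (plain dicts) outlining a soccer pitch.

    Hypothetical helper for illustration only, not part of this project.
    Dimensions are in metres and assume a 105 x 68 m pitch.
    """
    line = {"color": line_color, "width": 2}
    half = length / 2
    mid = width / 2
    shapes = [
        # Outer boundary of the pitch
        {"type": "rect", "x0": 0, "y0": 0, "x1": length, "y1": width, "line": line},
        # Halfway line
        {"type": "line", "x0": half, "y0": 0, "x1": half, "y1": width, "line": line},
        # Centre circle with a radius of 9.15 m
        {"type": "circle", "x0": half - 9.15, "y0": mid - 9.15,
         "x1": half + 9.15, "y1": mid + 9.15, "line": line},
    ]
    # Penalty areas: 16.5 m deep and 40.32 m wide, one at each end
    for x0, x1 in ((0, 16.5), (length - 16.5, length)):
        shapes.append({"type": "rect", "x0": x0, "y0": mid - 20.16,
                       "x1": x1, "y1": mid + 20.16, "line": line})
    return shapes


shapes = pitch_shapes()
```

With plotly installed, the resulting list can be passed to a figure via `fig.update_layout(shapes=shapes)`, and the same figure object can then be handed to a Dash `dcc.Graph` component.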

Tutorial

This project includes a number of notebooks that serve as tutorials on how to use the helper functions and might be a good starting point for soccer analytics in general. The notebooks can be found here, and I recommend going through them in the following order:

  1. Exploratory analysis event data: This notebook gives you an overview of the pre-processed Wyscout data and runs a rudimentary exploratory analysis using pandas-profiling

  2. Goal kick analysis: In this notebook we identify the best teams in the Bundesliga with respect to goal kicks. Along the way we learn how to:

    • Use bar plots in plotly
    • Visualize events on a soccer field through graphs and animations
    • Draw heatmaps on a soccer field
  3. Passing analysis: We continue our journey by looking at passes between players and analyzing one match in more detail. Technically, we learn how to use the helper functions to:

    • Compute statistics efficiently
    • Draw position plots of players
    • Visualize passing lines and passing zones
  4. Expected goal model with logistic regression: While the previous notebooks were mostly about visualization, in this notebook we start looking into machine learning. Together we build an expected goal model using logistic regression and learn about the fundamentals of machine learning, e.g.:

    • Feature engineering
    • Multivariate analysis
    • Metrics
    • Model interpretation
  5. Challenges using gradient boosters: In this rather technical notebook we look into some of the challenges that often arise in real-life situations when using gradient boosters such as LightGBM or XGBoost, for example:

    • Overfitting
    • Feature interpretation
    • Monotonicity
    • Extrapolation
  6. Introduction to tracking data: In this notebook we start looking into tracking data provided by Metrica Sports. We learn about the fundamentals of working with tracking data, such as:

    • Visualizing tracking data in animations
    • Computing basic statistics based on tracking data
    • Adding helper tools to highlight certain aspects in animations
    • Deep-dive into packing
  7. Passing probability model: In this notebook we look at a pass probability model as proposed by Peralta et al. and Spearman et al. (see papers below) and apply it to a first use case. More precisely, this notebook includes:

    • Deep-dive into the pass probability model to get a feeling for how the model works
    • Usage of the pass probability model to try to distinguish ground passes from air passes
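
As a small taste of the feature engineering step mentioned for the expected goal notebook, the sketch below computes two features commonly used in expected goal models: the distance to the goal and the opening angle between the posts as seen from the shot location. This is not taken from the notebooks. The function name `shot_features`, the metre-based coordinates on a 105 x 68 m pitch, and the attacking direction toward `x = pitch_length` are all assumptions for illustration.

```python
import math

GOAL_WIDTH = 7.32  # metres, standard goal width


def shot_features(x, y, pitch_length=105.0, pitch_width=68.0):
    """Return (distance, angle) for a shot taken at position (x, y).

    Hypothetical helper, assuming metre coordinates with the attacked goal
    centred at (pitch_length, pitch_width / 2).
    """
    dx = pitch_length - x       # distance to the goal line
    dy = y - pitch_width / 2    # lateral offset from the goal centre
    distance = math.hypot(dx, dy)
    # Angle (in radians) subtended by the two posts at the shot location
    angle = math.atan2(GOAL_WIDTH * dx,
                       dx ** 2 + dy ** 2 - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


# Shot from the penalty spot versus a shot from the same depth but far wider
central = shot_features(94.0, 34.0)
wide = shot_features(94.0, 54.0)
```

Features like these could then be fed into a logistic regression; a wider opening angle and a shorter distance would typically be expected to go along with a higher scoring probability.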

Examples

Tracking data visualization

Event visualisation

Heatmap

Passing map

Polar charts

Installation

If you are new to Python and soccer analytics, I recommend downloading the Anaconda distribution and following the instructions under Conda.

Conda

  1. Open the Anaconda Prompt and cd to the project folder
  2. Create a new conda environment "soccer_analytics"
    conda create -n soccer_analytics python=3.7
  3. Activate the conda environment
    conda activate soccer_analytics
  4. Install all required packages
    pip install -r requirements.txt
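
Since the project states support for Python 3.6 - 3.8, it can be worth confirming that the activated environment uses a matching interpreter before running the notebooks. The following is a minimal sketch of such a check; the helper `is_supported` is a hypothetical name and not part of this project.

```python
import sys

SUPPORTED_MIN = (3, 6)  # lowest Python version named in this README
SUPPORTED_MAX = (3, 8)  # highest Python version named in this README


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper; defaults to the running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported():
    print("Warning: this interpreter is outside the Python 3.6 - 3.8 range.")
```

Running this inside the activated conda environment prints a warning only when the interpreter falls outside the stated range.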

Acknowledgements

Data sources

Event data: Wyscout

Tracking data: Metrica

Code

Great repository on pitch control and pitch impact by Laurie Shaw here

Papers

Physics-Based Modeling of Pass Probabilities in Soccer by Spearman et al. here

Seeing in to the future: using self-propelled particle models to aid player decision-making in soccer by Peralta et al. here
