Showing posts from May, 2024

Simple PWHL xG Model

Introduction

This document describes an expected goals model for the PWHL. Important features: the model is a Multivariate Adaptive Regression Splines model (a “MARS model”); and the variables used to predict goals are shot distance, shot angle, and shot rebound status.

That’s not many variables, obviously. There isn’t much PWHL data available (72 regular season games) so I used only the most important variables for an xG model.

Data for the model were pulled from the PWHL’s API (using functions that I posted on GitHub). There are anomalies and errors in the data. The steps I took to “fix” the data are set out in painful detail below. Most people will have no interest in those details and can skip them. However, if you plan to use data from the PWHL’s API then you must be aware of the issues with the “raw” data.

Basic Setup

Load the packages and the raw play-by-play data (this assumes the data are saved in the working directory).

#install.packages("tidymodels") #install.