WHKYHAC: D-Zone xContributions Viz

 

The Viz Launchpad Competition

WHKYHAC and Sportlogiq announced a visualization competition using data from the 2023 Season of the Professional Women’s Hockey Players Association. The entry deadline for the competition is July 2, 2023. Prior to that deadline I’ll post some “trial” data visualizations here.

Today’s Data Viz

OK, it’s time to start taking advantage of the detailed play-by-play data supplied by Sportlogiq. This viz introduces something I’m calling “expected contributions” (or xContributions) to team goals from events that occur in the D-Zone. It piggybacks off the familiar xGoal concept, as explained below.

Expected Contributions To Team Goals From The D-Zone

I believe most xGoal models compute the probability of a shot attempt turning into a goal based on how frequently similar shot attempts turned into goals in the past. The xContributions to team goals does something similar for events that occurred in the D-Zone.

For this viz I computed the proportion of successful D-Zone events that occurred within 20 seconds of the team scoring a goal at the other end of the ice. In effect, I assigned different values to successful D-Zone events that can send the puck to the other end of the ice and, potentially, into the opponent’s net. There were 21 different types of D-Zone events included in that calculation. It makes sense intuitively that a successful D-Zone block is less likely to result in a goal-for than, say, a successful controlled zone exit. The xContributions metric attempts to put a value on that difference.

Huge Caveat: The amount of data available from the 2023 Season (40 games) is not nearly enough to build this type of model. Think of this as more of a “concept piece” rather than something that’s reliable.

I’ve included all the code below so you can dig into the details if you’re interested.

Let’s look at some of the results and get to the viz. Here are the Top 25 single game scores from the 2023 Season.

PlayerxContributionGame
Micah Zandee-Hart0.8410SCO-SON 2023-03-04
Megan Keller0.8318ADI-SCO 2023-03-10
Laura Fortino0.7482HAR-SON 2023-03-05
Jincy Dunne0.7377ADI-SON 2022-11-06
Jincy Dunne0.7368ADI-SCO 2022-11-05
Micah Zandee-Hart0.7293ADI-SON 2023-02-26
Lee Stecklein0.7052HAR-SCO 2023-03-12
Megan Keller0.6978ADI-SCO 2022-12-09
Jincy Dunne0.6923ADI-SON 2022-12-10
Megan Keller0.6913HAR-SCO 2023-02-10
Renata Fast0.6909ADI-SON 2023-02-24
Megan Keller0.6812HAR-SCO 2023-03-12
Lee Stecklein0.6723ADI-HAR 2023-03-04
Laura Fortino0.6714HAR-SON 2022-11-05
Savannah Harmon0.6706HAR-SON 2022-10-16
Jocelyne Larocque0.6688ADI-SCO 2022-11-05
Claire Thompson0.6624ADI-SON 2023-02-26
Jincy Dunne0.6571ADI-SON 2023-02-26
Savannah Harmon0.6549HAR-SCO 2023-03-12
Laura Fortino0.6540ADI-HAR 2023-03-04
Ella Shelton0.6523ADI-SCO 2023-03-05
Emily Brown0.6481SCO-SON 2022-10-15
Emily Brown0.6294ADI-SON 2022-11-06
Megan Keller0.6266SCO-SON 2023-01-22
Jessica DiGirolamo0.6247ADI-SON 2023-02-26

You can see that some players appear on the list multiple times. Here’s a breakdown of the number of Top 25 appearances.

PlayerTop 25 Finishes
Megan Keller5
Jincy Dunne4
Laura Fortino3
Emily Brown2
Lee Stecklein2
Micah Zandee-Hart2
Savannah Harmon2
Claire Thompson1
Ella Shelton1
Jessica DiGirolamo1
Jocelyne Larocque1
Renata Fast1

The model clearly likes what Megan Keller was doing in the D-Zone.

Now here’s the viz. It shows a player’s cumulative xContributions over the course of a single game. I selected the game in which Micah Zandee-Hart achieved the highest xContribiutions score in the 2023 Season.

The Code

Here’s the code for this data viz.

# SETUP ########################################################################

setwd("~/18_skaters/r_studio/whkyhac")

library(tidyverse)
library(stringr)
library(lubridate)
library(knitr)
library(kableExtra)

# LOAD DATA ####################################################################

raw_pbp_data <- read_csv("23_PBP_WHKYHAC_SPORTLOGIQ.csv", 
                         locale = locale(encoding = "ISO-8859-1"))

# EXPLORE DATA #################################################################

#print(str(raw_pbp_data))

# Players

player_names <- unique(raw_pbp_data$player)

# Events

event_names <- unique(raw_pbp_data$eventname)

event_outcomes <- unique(raw_pbp_data$outcome)

event_types <- unique(raw_pbp_data$type)

# Strength states

strength_states <- unique(raw_pbp_data$strengthstate)

# Plot successful / failed passes and dumps

plot_data_scep <- filter(raw_pbp_data, 
                         eventname == "controlledexit", 
                         type == "pass", 
                         outcome == "successful")
plot_scep <- ggplot(plot_data_scep) + 
        geom_point(aes(xadjcoord, 
                       yadjcoord)) + 
        theme_minimal()
#plot_scep # SUCCESSFUL CONTROLLED EXIT PASSES ARE LOCATED OUTSIDE D-ZONE

plot_data_fcep <- filter(raw_pbp_data, 
                         eventname == "controlledexit", 
                         type == "pass", 
                         outcome == "failed")
plot_fcep <- ggplot(plot_data_fcep) + 
        geom_point(aes(xadjcoord, yadjcoord)) + 
        theme_minimal()
#plot_fcep

plot_data_spo <- filter(raw_pbp_data, 
                        eventname == "pass", 
                        type == "outlet", 
                        outcome == "successful")
plot_spo <- ggplot(plot_data_spo) + 
        geom_point(aes(xadjcoord, yadjcoord)) + 
        theme_minimal()
#plot_spo

plot_data_spoob <- filter(raw_pbp_data, 
                          eventname == "pass", 
                          type == "outletoffboards", 
                          outcome == "successful")
plot_spoob <- ggplot(plot_data_spoob) + 
        geom_point(aes(xadjcoord, yadjcoord)) + 
        theme_minimal()
#plot_spoob

plot_data_sdi <- filter(raw_pbp_data, 
                        eventname == "dumpout", 
                        type == "ice", 
                        outcome == "successful")
plot_sdi <- ggplot(plot_data_sdi) + 
        geom_point(aes(xadjcoord, yadjcoord)) + 
        theme_minimal()
#plot_sdi

# CLEAN AND MANIPULATE DATA (AREAS OF INTEREST ONLY) ###########################

clean_pbp_data <- raw_pbp_data

# Fix name for Kristin O’Neill

clean_pbp_data$player <- str_replace_all(clean_pbp_data$player, "\\031", "’")

# Add game_id

clean_pbp_data$game_id <- paste(clean_pbp_data$game, clean_pbp_data$date)

# Add event_id

clean_pbp_data$event_id <- seq(1:length(clean_pbp_data$seasonstage))

# Add d-zone logical variable

clean_pbp_data <- mutate(clean_pbp_data, 
                         d_zone = ifelse(
                                 xadjcoord < -25, 
                                 TRUE, 
                                 FALSE))

# Add empty_net to opposing_goalie variable

clean_pbp_data$opposing_goalie <- ifelse(is.na(clean_pbp_data$opposing_goalie), "empty_net", clean_pbp_data$opposing_goalie)

# Tidy up (based on needs for this viz)

clean_pbp_data <- select(clean_pbp_data, c(30:31,1:29,32))
clean_pbp_data <- select(clean_pbp_data, -seasonstage,
                         -date,
                         -game,
                         -ishomegame,
                         -scoredifferential,
                         -teamskatersonicecount,
                         -opposingteamskatersonicecount,
                         -goalie,
                         -createsrebound,
                         -onetimer,
                         -shotaim,
                         -shottype)

# FUNCTIONS ####################################################################

get_event_types <- function(event_name) {
        
        working_data <- clean_pbp_data %>%
                filter(d_zone == TRUE,
                       playerprimaryposition == "D",
                       strengthstate == "evenStrength",
                       period < 4,
                       opposing_goalie != "empty_net",
                       eventname == event_name) %>%
                group_by(type) %>%
                summarise (sum_event_type = n())
        
        names(working_data)[1] <- "event_type"
        
        working_data$event_name <- event_name
        
        working_data <- select(working_data, c(3,1,2))
        
        return(working_data)
}

tag_successful_target_events <- function(event_name, event_type) {
        
        working_data <- clean_pbp_data %>%
                filter(d_zone == ifelse(
                        event_name == "controlledexit" & event_type == "pass",
                        FALSE, 
                        TRUE),
                       playerprimaryposition == "D",
                       strengthstate == "evenStrength",
                       period < 4,
                       opposing_goalie != "empty_net",
                       eventname == event_name,
                       type == event_type,
                       outcome == "successful")
        
        working_data$successful_target_event_id <- paste(event_name, event_type)
        
        return(working_data)
}

# IDENTIFY D-ZONE EVENTS OF INTEREST ###########################################

# Filtered for: even-strength; regulation time; not shooting on an empty net
# Only skaters who play at the defense position are included
# Only successful events are included
# No distinction between regular season and playoffs

d_zone_lpr_event_types <- get_event_types("lpr")

d_zone_pass_event_types <- get_event_types("pass")

d_zone_reception_event_types <- get_event_types("reception")

d_zone_puck_protection_types <- get_event_types("puckprotection")

d_zone_block_types <- get_event_types("block")

d_zone_check_event_types <- get_event_types("check")

d_zone_dumpout_event_types <- get_event_types("dumpout")

d_zone_controlled_exit_event_types <- get_event_types("controlledexit")

# Filter out rare events (< 200)

d_zone_lpr_event_types <- filter(d_zone_lpr_event_types, 
                                 sum_event_type >= 200)

d_zone_pass_event_types <- filter(d_zone_pass_event_types, 
                                  sum_event_type >= 200)

d_zone_reception_event_types <- filter(d_zone_reception_event_types, 
                                       sum_event_type >= 200)

d_zone_puck_protection_types <- filter(d_zone_puck_protection_types, 
                                       sum_event_type >= 200)

d_zone_block_types <- filter(d_zone_block_types, 
                             sum_event_type >= 200)

d_zone_check_event_types <- filter(d_zone_check_event_types, 
                                   sum_event_type >= 200)

d_zone_dumpout_event_types <- filter(d_zone_dumpout_event_types, 
                                     sum_event_type >= 200)

d_zone_controlled_exit_event_types <- filter(d_zone_controlled_exit_event_types, 
                                             sum_event_type >= 200)

# Combine events and assign ids

events <- bind_rows(d_zone_lpr_event_types,
                    d_zone_pass_event_types,
                    d_zone_reception_event_types,
                    d_zone_puck_protection_types,
                    d_zone_block_types,
                    d_zone_check_event_types,
                    d_zone_dumpout_event_types,
                    d_zone_controlled_exit_event_types)

events$target_event_id <- paste(events$event_name, events$event_type)

events <- select(events, c(4,1:3))

# TAG SUCCESSFUL TARGET EVENTS #################################################

# Loop through the events object to tag the target events in the play-by-play data

loop_length <- length(events$target_event_id)
loop_list <- list()

for (i in 1:loop_length) {
        
        loop_data <- tag_successful_target_events(as.character(events[i,2]), as.character(events[i,3]))
        
        loop_list[[i]] <- loop_data
}

tagged_successful_target_events <- bind_rows(loop_list)

# Join target_event_id to play-by-play data

tagged_successful_target_events <- select(tagged_successful_target_events,
                                          event_id,
                                          successful_target_event_id)

clean_pbp_data <- left_join(clean_pbp_data, 
                            tagged_successful_target_events, 
                            by = "event_id")

# IDENTIFY ELIGIBLE GOALS AND PRIOR EVENTS #####################################

# Filter the play-by-play data for eligible goals
# Find the time windows based on the times for eligible goals
# The time window starts 20 seconds before an eligible goal

eligible_goals <- clean_pbp_data %>%
        filter(strengthstate == "evenStrength",
               period < 4,
               opposing_goalie != "empty_net",
               goal == 1)

eligible_goal_ids <- eligible_goals$event_id

eligible_goals <- eligible_goals %>%
        group_by(game_id) %>%
        mutate(eligible_event_time_start = compiledgametime - 20) %>%
        select(game_id,
               event_id, 
               eligible_team = teamname, 
               eligible_event_time_start,
               eligible_event_time_end = compiledgametime) %>%
        ungroup() %>%
        select(-game_id)

# Join eligible goals to play-by-play data

clean_pbp_data$eligible_goal <- ifelse(
        clean_pbp_data$event_id %in% eligible_goal_ids, 
        TRUE, 
        FALSE )

# Identify eligible prior events

clean_pbp_data <- clean_pbp_data %>%
        left_join(eligible_goals,
                  by = "event_id")

clean_pbp_data <- clean_pbp_data %>%
        group_by(game_id) %>%
        fill(eligible_event_time_start, 
             eligible_event_time_end, 
             eligible_team,  
             .direction = "up") %>%
        ungroup()

clean_pbp_data <- clean_pbp_data %>%
        mutate(target_event_with_eligible_goal = ifelse(
                successful_target_event_id > 0 &
                eligible_team == teamname &
                compiledgametime >= eligible_event_time_start &
                compiledgametime <= eligible_event_time_end, 
                TRUE, 
                FALSE))

# SUMMARIZE DATA AND COMPUTE PROPORTIONS #######################################

# Get the total number of eligible target events

sum_eligible_target_events <- clean_pbp_data %>%
        group_by(successful_target_event_id) %>%
        summarise(sum = n())

sum_eligible_target_events <- sum_eligible_target_events %>%
        filter(successful_target_event_id > 0)

# Get the total number of eligible goals for each target event

sum_eligible_target_events_with_goals <- clean_pbp_data %>%
        filter(target_event_with_eligible_goal == TRUE) %>%
        group_by(successful_target_event_id) %>%
        summarise(goals = n())

# Compute the proportion of target events that lead to an eligible goal

target_event_proportion_data <- sum_eligible_target_events %>%
        left_join(sum_eligible_target_events_with_goals, 
                  by = "successful_target_event_id")

target_event_proportion_data <- target_event_proportion_data %>%
        mutate(proportion = goals / sum)

# ADD THE PROPORTIONS TO THE MAIN PLAY-BY-PLAY DATA ############################

clean_pbp_data <- clean_pbp_data %>%
        left_join(target_event_proportion_data, 
                  by = "successful_target_event_id")

clean_pbp_data$proportion[is.na(clean_pbp_data$proportion)] <- 0

# TAKE A LOOK AT SOME RESULTS ##################################################

# Find highest single game scores

top_game_score <- clean_pbp_data %>%
        filter(playerprimaryposition == "D") %>%
        group_by(game_id, 
                 player) %>%
        summarise(expected_contribution_to_goals = sum(proportion)) %>%
        arrange(desc(expected_contribution_to_goals)) %>%
        ungroup()

top_game_score$expected_contribution_to_goals <- round(top_game_score$expected_contribution_to_goals, 4)

# Put the top 25 game scores in fancy tables

top_25_table <- top_game_score %>%
        slice_head(n =25) %>%
        rename("Player" = player,
               "xContribution" = expected_contribution_to_goals,
               "Game" = game_id)

top_25_summary <- top_25_table %>%
        group_by(Player) %>%
        summarize(count = n()) %>%
        arrange(desc(count)) %>%
        rename("Top 25 Finishes" = count) %>%
        ungroup()

top_25_table <- top_25_table %>%
        select(c(2,3,1))

top_25_table <- kable(top_25_table, format = "html", 
                      align = c("l", "c", "c")) %>%
        kable_styling(position = "center",
                      full_width = FALSE)

#top_25_table

top_25_summary_table <- kable(top_25_summary, format = "html", 
                              align = c("l", "c")) %>%
        kable_styling(position = "center",
                      full_width = FALSE)

#top_25_summary_table

# PLOT DATA FOR A SINGLE GAME ##################################################

# Selected the game with the highest single game score
# Note: it would be easy to turn this into a function with game/team arguments

selected_game_id <- "SCO-SON 2023-03-04"
selected_team <- "Sonnet"

# Add cumulative expected contributions to team goals

plot_data_game <- clean_pbp_data %>%
        filter(game_id == selected_game_id,
               teamname == selected_team,
               playerprimaryposition == "D",
               strengthstate == "evenStrength",
               period < 4,
               opposing_goalie != "empty_net") %>%
        group_by(player) %>%
        mutate(cum_xcontribution = cumsum(proportion)) %>%
        rename("Defense" = player)

plot_data_game_goals <- clean_pbp_data %>%
        filter(game_id == selected_game_id,
               teamname == selected_team,
               goal == 1)

plot_data_game_goals <- plot_data_game_goals$compiledgametime

# Arrange players in descending order (used for the plot legend)

plot_limits <- plot_data_game %>%
        arrange(desc(cum_xcontribution))

plot_limits <- unique(plot_limits$Defense)

# Create a vector of period start / end times

plot_period_ends <-c(0, 1200, 2400, 3600)

# Max xContribution

plot_y_end <- max(plot_data_game$cum_xcontribution)

# Plot the single game data
        
plot_game <- ggplot() + 
        geom_step(data = plot_data_game, 
                  aes(x = compiledgametime, 
                      y = cum_xcontribution, 
                      group = Defense, 
                      colour = Defense),
                  alpha = 0.7,
                  linewidth = 2.5) +
        theme_minimal() +
        theme(axis.text.x = element_blank(),
              axis.title.x = element_blank(),
              panel.grid = element_blank(),
              plot.title.position = "plot",
              plot.title = element_text(size = 20,
                                        face = "bold"),
              plot.subtitle = element_text(size = 16),
              plot.caption = element_text(size = 12),
              plot.caption.position = "plot",
              axis.title.y = element_text(size = 15),
              legend.title = element_text(size = 17,
                                          face = "bold"),
              legend.text = element_text(size = 16)) +
        geom_segment(aes(x = plot_period_ends, 
                         y = 0, 
                         xend = plot_period_ends, 
                         yend = plot_y_end),
                     alpha = 0.3,
                     linewidth = 0.6) +
        geom_segment(aes(x = plot_data_game_goals, 
                         y = -0.005, 
                         xend = plot_data_game_goals, 
                         yend = plot_y_end),
                     alpha = 0.2,
                     linetype = 5) +
        geom_segment(aes(x = 0, 
                         y = 0, 
                         xend = 3600, 
                         yend = 0),
                     alpha = 0.3,
                     linewidth = 0.6) +
        geom_segment(aes(x = 0, 
                         y = plot_y_end, 
                         xend = 3600, 
                         yend = plot_y_end),
                     alpha = 0.3,
                     linewidth = 0.6) +
        geom_text(aes(x = plot_data_game_goals, 
                      y = -0.01), 
                  label = "G", 
                  size = 4, 
                  alpha = 0.5) +
        geom_text(aes(x = 200, 
                      y = -0.03), 
                  label = "Period 1 >>>", 
                  size = 4, 
                  alpha = 0.5) +
        geom_text(aes(x = 1400, 
                      y = -0.03), 
                  label = "Period 2 >>>", 
                  size = 4, 
                  alpha = 0.5) +
        geom_text(aes(x = 2600, 
                      y = -0.03), 
                  label = "Period 3 >>>", 
                  size = 4, 
                  alpha = 0.5) +
        labs(y = "Cumulative xContributions", 
             title = "Expected Contributions To Team Goals From Events In The D-Zone \n(Even Strength / Regulation Time)", 
             subtitle = paste("Game:", selected_game_id, " Team:", selected_team),
             caption = "Data by Sportlogiq, viz by 18 Skaters, #WHKYHAC") +
        scale_colour_viridis_d(limits = plot_limits)

#plot_game

# FURTHER WORK #################################################################

# Exclude goals where there is an intervening face-off?
# Add expected contributions to goals against?
# Build out contributions in other zones?
# GET WAY MORE DATA TO BUILD THE MODEL :)

The End

That’s it. I’ll post more data visualizations in the days leading up to the July 2 deadline for the Viz Launchpad Competition.

Mark (18 Skaters)