Friday, March 9, 2018

Evaluating my Shooter xG model

In my last post, I created a new expected goals model that includes a component for the shooter. That post focused only on building the model, so here I'll aggregate the numbers and test them (by "test" I mean how well they predict out-of-sample or future goals). All numbers in this piece are 5v5, score adjusted, and span the 2007-2016 seasons unless otherwise stated. I'll also only be looking at skaters and teams here; goalies deserve a post all to themselves.


I'll be testing the team numbers the way DTM did here. For each team and season, and for values of k from 5 to 75 in steps of 5 (5, 10, 15, etc.), I randomly select k games and place them into one bucket; the rest of the games (82-k) go into another bucket. I do this for every team-season and correlate the metric in the k-game bucket with goals in the other bucket. I then repeat this 250 times and average the correlations for each value of k using the Fisher z-transformation. I excluded the 2012 season from this analysis.
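Here is a minimal Python sketch of that split-half procedure. It was written for illustration and is not the code behind these graphs. It assumes each team-season is a list of per-game dicts with hypothetical keys `xgf`, `xga`, `gf` and `ga`. It uses xGF% in the k-game bucket to predict GF% in the other bucket. The r values from each iteration are averaged on the Fisher z scale (`math.atanh`) and then converted back with `math.tanh`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bucket_for_pct(games, f_key, a_key):
    """For% over a bucket of games, e.g. xGF / (xGF + xGA)."""
    f = sum(g[f_key] for g in games)
    a = sum(g[a_key] for g in games)
    return f / (f + a)


def split_half_r(team_seasons, k, n_iter=250, seed=0):
    """Average r between xGF% in k random games and GF% in the other games.

    team_seasons: list of team-seasons, each a list of per-game dicts with
    hypothetical keys 'xgf', 'xga' (predictor) and 'gf', 'ga' (target).
    """
    rng = random.Random(seed)
    z_values = []
    for _ in range(n_iter):
        predictor, target = [], []
        for games in team_seasons:
            shuffled = games[:]
            rng.shuffle(shuffled)
            bucket_k, bucket_rest = shuffled[:k], shuffled[k:]
            predictor.append(bucket_for_pct(bucket_k, "xgf", "xga"))
            target.append(bucket_for_pct(bucket_rest, "gf", "ga"))
        # Fisher z-transform this iteration's r so the iterations can be averaged.
        z_values.append(math.atanh(pearson(predictor, target)))
    # Back-transform the mean z to the correlation scale.
    return math.tanh(sum(z_values) / len(z_values))
```

Running `{k: split_half_r(team_seasons, k) for k in range(5, 80, 5)}` gives one point per k from 5 to 75. The z scale is used because r is bounded and skewed near ±1, while the z-transformed values are roughly normal and average more sensibly.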

I also chose to include the expected goals model over at Corsica (you can query team data on a game-by-game basis under Custom Query). One small issue is that Corsica's data is missing some games for the 2007 season, with some teams missing up to 10 games. For this reason I excluded every team-season with fewer than 80 games of Corsica data from this analysis entirely. This applies to 10 teams, so instead of 270 observations we have 260 (fwiw, I also ran my own data on the full 270 and, as you can probably guess, it doesn't really change anything).

Before I show any graphs, I'll define the less obvious stat names: xGF% is my standard xG model, corsica_xGF% is Manny's xG model from Corsica, sh_xGF% is the shooter xG model I developed in the last post, and wshF% is Weighted Shots.

** Note: Small mistake...should be "Out of Sample" instead of "Future" in the titles below

This is very interesting. My xG model and Manny's are nearly identical, and both are better than goals. Next up is the shooter xG model. The top two, by far, are Corsi and weighted shots. I'll be honest: I was skeptical about how good the xG models would be, but I still expected them to do better than this. CF% and wshF% are also clearly better than sh_xGF% here, which is not at all what DTM and Asmae found. Our numbers for Corsi and goals are identical, but sh_xGF% isn't even close to their xG model (they didn't include a comparison for a standard xG model).

I also decided to include a couple of other graphs for trying to predict GF60 and GA60:

I think both of these make sense. For predicting GF60, sh_xGF60 and wshF60 are the two best, with shooter xG pulling ahead after 30 games. Regular GF60 also does better than I thought it would. For GA60, it really just comes down to weighted shots and Corsi.

I do, however, think there is one issue with the previous method (selecting k random games) when it is applied to my shooter xG model. The easiest way to explain it is with the most extreme example. Let's say I'm at k=40 and, for some team, the split puts the last 40 games of the season in group one and the first 42 in group two. In that case we are trying to predict the GF% of the first 42 games with the sh_xGF% of the last 40. As we know, sh_xGF% accounts for the shooting talent of the player on each shot, and we derive that talent from how much better the player was at scoring goals than expected. This means that how the team did during the first 42 games affects the sh_xGF% of the last 40. If the team as a whole shot better than expected in its first 42, its players' multipliers will be higher in the last 40 (and vice versa). In a sense this is cheating, since what we are trying to predict nudged the predictor in its direction.
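Here is a toy Python example with one player that makes the leakage concrete. The multiplier is a hypothetical, simplified stand-in: season goals divided by season xG, with no regression. It is not the shooter-talent term from my last post. The point is only that any season-level talent estimate is built partly from the games we are trying to predict.

```python
def season_multiplier(shots):
    """Hypothetical, simplified shooting-talent multiplier: goals / xG over
    every shot passed in (the real model's talent estimate differs)."""
    goals = sum(s["goal"] for s in shots)
    xg = sum(s["xg"] for s in shots)
    return goals / xg


def sh_xg(shots, multiplier):
    """Shooter-adjusted xG: each shot's standard xG scaled by the multiplier."""
    return sum(s["xg"] * multiplier for s in shots)


# Four 0.25-xG shots in each part of the season.
first_42 = [{"xg": 0.25, "goal": g} for g in (1, 1, 1, 0)]  # 3 goals on 1.0 xG
last_40 = [{"xg": 0.25, "goal": g} for g in (1, 0, 0, 0)]   # 1 goal on 1.0 xG

full_season = season_multiplier(first_42 + last_40)  # 4 / 2.0 = 2.0
last_40_only = season_multiplier(last_40)            # 1 / 1.0 = 1.0

print(sh_xg(last_40, full_season))   # 2.0: inflated by the hot first 42 games
print(sh_xg(last_40, last_40_only))  # 1.0: uses only the last 40 games
```

The player's shots in the last 40 games are identical in both lines. The only difference is whether the goals he scored in the first 42 (the games being predicted) went into his multiplier.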

To account for this, I dropped the shuffle-and-repeat method and instead tested chronologically, as Micah Blake McCurdy did here. For every team and season, I use the first k games to predict the last 82-k games. The previous method was convenient because it gave a much bigger sample (250 random splits per team-season). A chronological split gives only one split per team-season, so I'm stuck with that. Here are the previous three graphs redone. These are bumpier than the earlier ones for two reasons: those were done from k=5 in steps of 5 while these were done from k=2 in steps of 2, and each point here comes from a single split rather than an average of 250.
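The chronological version is simpler because there is nothing random to repeat. Here is a minimal Python sketch. It again uses hypothetical per-game keys and assumes each team-season's games are already sorted by date.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bucket_for_pct(games, f_key, a_key):
    """For% over a bucket of games, e.g. xGF / (xGF + xGA)."""
    f = sum(g[f_key] for g in games)
    a = sum(g[a_key] for g in games)
    return f / (f + a)


def chronological_r(team_seasons, k):
    """r between xGF% in each team's first k games and GF% in its remaining games.

    team_seasons: list of team-seasons, each a list of per-game dicts in
    date order, with hypothetical keys 'xgf', 'xga', 'gf', 'ga'.
    """
    predictor, target = [], []
    for games in team_seasons:
        first_k, rest = games[:k], games[k:]
        predictor.append(bucket_for_pct(first_k, "xgf", "xga"))
        target.append(bucket_for_pct(rest, "gf", "ga"))
    # One fixed split per team-season, so there is nothing to average over.
    return pearson(predictor, target)
```

Something like `{k: chronological_r(team_seasons, k) for k in range(2, 82, 2)}` traces a curve from k=2 in steps of 2; the upper end of that range is an assumption. Every team-season contributes exactly one split, so each point rests on a single correlation across roughly 260 team-seasons.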

The only point of these graphs is to see how the shooter xG model changes relative to everything else (for the other metrics, the numbers produced by the previous method are superior). It doesn't seem to do as well. Tested chronologically, sh_xGF% is only slightly better than the standard xG models, and sh_xGF60 doesn't pull ahead of the other metrics like it did before. I'm not really sure whether this method does a better job than choosing randomly. I already explained the issue with the shuffle method, but its upside is a much bigger sample size. Still, it is something to think about. It also goes without saying that there isn't a way to run this test on DTM's and Asmae's xG.


We'll start by looking at how the individual metrics predict future G60. Below are the year-over-year correlations (r) for players with at least 400 minutes in both seasons:

Metric     Forwards  Defensemen
G60        0.372     0.313
ixG60      0.392     0.402
sh_ixG60   0.435     0.406
iCorsi60   0.372     0.384
iFen60     0.367     0.401

I think this aligns with what we'd expect. ixG60 predicts future G60 better than goals do (at least at the season level). sh_ixG60 is the best of the group and beats regular expected goals, more so for forwards because there is a bigger spread in shooting talent among them.
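Here is a minimal Python sketch of the year-over-year test above. It assumes a dict keyed by (player, season) with hypothetical fields `toi` (5v5 minutes) and raw counts such as `ixg` and `goals`. It would be run separately on forwards and on defensemen.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per60(count, toi):
    """Convert a raw count to a per-60-minute rate."""
    return 60.0 * count / toi


def yoy_r(stats, predictor, target="goals", min_toi=400):
    """r between predictor/60 in season s and target/60 in season s + 1.

    stats: dict keyed by (player, season) with hypothetical fields 'toi'
    and raw counts. Both seasons must have at least min_toi minutes.
    """
    x, y = [], []
    for (player, season), cur in stats.items():
        nxt = stats.get((player, season + 1))
        if nxt is None or cur["toi"] < min_toi or nxt["toi"] < min_toi:
            continue
        x.append(per60(cur[predictor], cur["toi"]))
        y.append(per60(nxt[target], nxt["toi"]))
    return pearson(x, y)
```

For example, `yoy_r(forward_stats, "ixg")` would give the ixG60 entry for forwards, where `forward_stats` is a hypothetical forwards-only subset of the data.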

Before we look at on-ice metrics, I think it's important to address a concern about using the shooter xG model for on-ice numbers. A point raised multiple times about DTM's numbers was the effect the "shooter talent" variable would have on a player's on-ice numbers. If you generally play with teammates who are good shooters, your on-ice xG goes up (and vice versa). It's easy to see why this is a problem: any given player's numbers can look better or worse based on whom he is put on the ice with. This of course applies to every number we use, but here it's an extra problem on top of all that. One could even argue that it affects the expected goals against of players who generally face the other team's top line (but that's another issue altogether).

I think the way to deal with this is fairly simple. For a player's on-ice numbers, we use the shooter xG model when the player himself took the shot and the standard xG model when one of his teammates did. If it helps, think of it as a piecewise function: the shooter xG model when the shot is taken by the given player, and the standard model otherwise. If the player himself is a good or bad shooter, we want that included in his numbers. We don't want to give him credit (or take any away) because he plays with good or bad shooters. That's the only difference; his against and off-ice numbers all use the shooter xG model. So for the "shots for" portion of sh_xGF%, and for sh_xGF60, it's the shooter xG model for shots taken by the player and the regular xG model otherwise.
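The blended on-ice calculation is easier to see in code. This Python sketch assumes each shot record carries hypothetical fields for the shooter, the standard-model xG (`xg`) and the shooter-model xG (`sh_xg`). Shots for use the shooter model only when the player himself shot; shots against always use it.

```python
def on_ice_xgf(player, shots_for):
    """Blended on-ice xG for: shooter model for the player's own shots,
    standard model for his teammates' shots.

    shots_for: shots by the player's team while he is on the ice, each a
    dict with hypothetical keys 'shooter', 'xg' (standard model) and
    'sh_xg' (shooter model).
    """
    return sum(s["sh_xg"] if s["shooter"] == player else s["xg"] for s in shots_for)


def on_ice_xga(shots_against):
    """On-ice xG against always uses the shooter model."""
    return sum(s["sh_xg"] for s in shots_against)


def sh_xgf_pct(player, shots_for, shots_against):
    """Blended sh_xGF% on a 0-100 scale."""
    xgf = on_ice_xgf(player, shots_for)
    xga = on_ice_xga(shots_against)
    return 100.0 * xgf / (xgf + xga)


def sh_xgf60(player, shots_for, toi):
    """Blended on-ice xG for per 60 minutes of on-ice TOI."""
    return 60.0 * on_ice_xgf(player, shots_for) / toi


shots_for = [
    {"shooter": "A", "xg": 0.25, "sh_xg": 0.5},   # A's own shot: shooter model
    {"shooter": "B", "xg": 0.25, "sh_xg": 0.75},  # teammate's shot: standard model
]
shots_against = [{"shooter": "X", "xg": 0.5, "sh_xg": 0.25}]
print(sh_xgf_pct("A", shots_for, shots_against))  # 75.0
```

B's shot counts as 0.25 for player A even though the shooter model rates it at 0.75. That is the teammate effect the blend is meant to strip out.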

Ok, now we can get started. For this analysis I also included the xG numbers from Corsica, like in the team section. I would've included them in the last table too, but I forgot to grab them; if anyone is willing to run the numbers I'll update it. Specifically, I'll be looking at how different relative metrics predict future goals. Below are the year-over-year correlations (r, not r^2) for forwards and defensemen with at least 400 minutes of TOI, for relative Corsi, goals, expected goals, and weighted shots. As a reminder, each stat is trying to predict its goal analog, so Rel CF60 predicts Rel GF60 and Rel xGA60 predicts Rel GA60:

Metric             Forwards  Defensemen
Rel GF%            0.262     0.077
Rel corsica_xGF%   0.296     0.198
Rel xGF%           0.32      0.176
Rel sh_xGF%        0.34      0.179
Rel CF%            0.371     0.232
Rel wshF%          0.381     0.216

Metric             Forwards  Defensemen
Rel GF60           0.488     0.139
Rel corsica_xGF60  0.511     0.243
Rel xGF60          0.512     0.238
Rel sh_xGF60       0.532     0.239
Rel CF60           0.533     0.268
Rel wshF60         0.559     0.261

Metric             Forwards  Defensemen
Rel GA60           0.129     0.088
Rel corsica_xGA60  0.181     0.175
Rel xGA60          0.166     0.151
Rel sh_xGA60       0.167     0.155
Rel CA60           0.135     0.141
Rel wshA60         0.159     0.146

Let's first look at the overall For% numbers. For forwards weighted shots is the best, and for defensemen it's Corsi. xGF% is a bit behind for both. For forwards sh_xGF% is a slight upgrade over regular xG, while for defensemen the two are basically the same. On the offensive side (F60), weighted shots is on top for forwards and Corsi is on top for defensemen, so overall we see the same trends as with For%. The defensive metrics (A60) are a little different: expected goals seems to be the best for both positions, and there doesn't seem to be any real difference between regular xG and shooter xG. In fact, the only place the shooter xG model does better is on the offensive side for forwards (and consequently in the overall For% numbers).
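For readers less familiar with relative stats, here is a minimal Python sketch of how Rel numbers like those above are computed. It assumes the usual on-ice minus off-ice definition: the team's number with the player on the ice minus its number with him off. The input fields are hypothetical. The same arithmetic applies whether the events are Corsi, goals, xG, or weighted shots.

```python
def for_pct(f, a):
    """For% on a 0-100 scale."""
    return 100.0 * f / (f + a)


def rate60(count, toi):
    """Events per 60 minutes."""
    return 60.0 * count / toi


def rel_metrics(on, off):
    """Relative (on-ice minus off-ice) For%, F60 and A60.

    on / off: dicts with hypothetical keys 'f' and 'a' (events for and
    against, e.g. Corsi or xG) and 'toi' (5v5 minutes) for the player's
    team with him on and off the ice.
    """
    return {
        "rel_pct": for_pct(on["f"], on["a"]) - for_pct(off["f"], off["a"]),
        "rel_f60": rate60(on["f"], on["toi"]) - rate60(off["f"], off["toi"]),
        "rel_a60": rate60(on["a"], on["toi"]) - rate60(off["a"], off["toi"]),
    }


print(rel_metrics({"f": 30, "a": 20, "toi": 60}, {"f": 40, "a": 40, "toi": 120}))
# {'rel_pct': 10.0, 'rel_f60': 10.0, 'rel_a60': 0.0}
```

Rel TM, mentioned next, replaces the off-ice baseline with one built from the player's teammates. That calculation is more involved and isn't sketched here.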

Ideally I'd be using Rel TM instead of plain Relative, as @EvolvingWild notes here. Thankfully Corsica has them (they take time to code and I'm lazy). Here are the numbers, all in one table, under the same constraints. Since they come from Corsica, I'm limited to Corsi, goals, and Manny's expected goals:
Metric              Forwards  Defensemen
RelT GF%            0.182     0.067
RelT corsica_xGF%   0.215     0.178
RelT CF%            0.268     0.216

RelT GF60           0.383     0.124
RelT corsica_xGF60  0.41      0.187
RelT CF60           0.428     0.238

RelT GA60           0.138     0.079
RelT corsica_xGA60  0.145     0.167
RelT CA60           0.142     0.148

I don't want to get too caught up in the numbers here. The only point I'm trying to make is that the trends I observed with Relative numbers also show up with Rel TM numbers.

Of course, we don't have to limit ourselves to year-over-year correlations. One could make the (good) argument that stat_x1 may beat stat_x2 with one year of data but lose to it given more years. To that point, I also ran correlations using 2 or 3 years to predict year x. For example, you can predict the 2010 season with just 2009, with 2008 and 2009, or with 2007-2009. I decided not to include any numbers from Corsica (so no Rel TM or Manny's xG model) because compiling them from the site would be annoying, and I'm not too concerned about leaving them out. The cutoff here is 400 minutes in the year we are trying to predict and 400*n over the predictor years, with n being the number of years. So to qualify for the 3-vs-1 test you need 1200 minutes over that three-year span. Here are the numbers ("F" = forwards, "D" = defensemen, and the number is how many years are used to predict):

Metric       F-1 yr.   F-2 yrs.  F-3 yrs.  D-1 yr.   D-2 yrs.  D-3 yrs.
Rel GF%      0.262     0.299     0.343     0.077     0.117     0.151
Rel xGF%     0.32      0.333     0.35      0.176     0.211     0.215
Rel sh_xGF%  0.34      0.358     0.375     0.179     0.213     0.22
Rel CF%      0.371     0.38      0.393     0.232     0.237     0.237
Rel wshF%    0.381     0.389     0.411     0.216     0.232     0.238

Not much to say here. For forwards, the trends we observed with one year stay the same. For defensemen, the other metrics close the gap on Corsi, and weighted shots in particular is on par with it for 2 and 3 years. For both positions, regular xG and shooter xG still lag behind Corsi and weighted shots after three years, which I didn't expect.
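Here is a minimal Python sketch of the multi-year test, using a plain per-60 stat for brevity. Two details are assumptions rather than details from this post. First, the predictor seasons are pooled by summing counts and TOI instead of averaging season rates. Second, a player only needs the pooled 400*n minutes, not 400 in each season. A Rel version would pool on-ice and off-ice counts separately before subtracting. The data is a dict keyed by (player, season) with hypothetical fields `toi` and raw counts.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_rate60(stats, player, seasons, key):
    """Sum counts and TOI over the given seasons; return (toi, rate per 60)."""
    rows = [stats[(player, s)] for s in seasons if (player, s) in stats]
    toi = sum(r["toi"] for r in rows)
    rate = 60.0 * sum(r[key] for r in rows) / toi if toi else None
    return toi, rate


def multi_year_r(stats, n, predictor, target, min_toi=400):
    """r between predictor/60 pooled over the n seasons before season x
    and target/60 in season x.

    Cutoffs follow the post: min_toi minutes in season x and min_toi * n
    minutes over the n predictor seasons.
    """
    x, y = [], []
    for (player, season), tgt in stats.items():
        if tgt["toi"] < min_toi:
            continue
        toi, rate = pooled_rate60(stats, player, range(season - n, season), predictor)
        if toi < min_toi * n:
            continue
        x.append(rate)
        y.append(60.0 * tgt[target] / tgt["toi"])
    return pearson(x, y)
```

Calling this with n = 1, 2 and 3 on the same data gives one row of the tables above. The sketch doesn't force the three versions onto the same set of target seasons; how the post handled that isn't stated.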

Here are the F60 and A60 stats:

Metric        F-1 yr.   F-2 yrs.  F-3 yrs.  D-1 yr.   D-2 yrs.  D-3 yrs.
Rel GF60      0.488     0.545     0.566     0.139     0.168     0.195
Rel xGF60     0.512     0.542     0.546     0.238     0.242     0.266
Rel sh_xGF60  0.532     0.562     0.565     0.239     0.243     0.26
Rel CF60      0.533     0.547     0.55      0.268     0.281     0.288
Rel wshF60    0.559     0.579     0.582     0.261     0.276     0.286

Metric        F-1 yr.   F-2 yrs.  F-3 yrs.  D-1 yr.   D-2 yrs.  D-3 yrs.
Rel GA60      0.129     0.117     0.138     0.088     0.089     0.076
Rel xGA60     0.166     0.166     0.195     0.151     0.161     0.158
Rel sh_xGA60  0.167     0.169     0.198     0.155     0.167     0.162
Rel CA60      0.135     0.129     0.146     0.141     0.116     0.108
Rel wshA60    0.159     0.147     0.165     0.146     0.125     0.113

On the offensive side for forwards, sh_xGF60 overtakes Corsi once two or more years are used, but it still can't beat wshF60. GF60 is on par with Corsi for 2 years and is actually the same as sh_xGF60 for 3 seasons. For defensemen, CF60 and wshF60 stay on top for all three. Goals stay well behind in last place, and sh_xGF60 shows no advantage over xGF60. On the defensive side we see the same thing as with one season: xGA and sh_xGA do the best for both positions, with Corsi and weighted shots behind.

Ok, there is a lot to digest from this section (or at least there is for me). So what are the takeaways? We've got a few:
  1. For forwards, sh_ixG60 is the best predictor of future G60, while for defensemen it's on par with regular ixG60.
  2. Instead of compiling on-ice numbers using only the shooter xG model, we use a blend of it and the standard xG model. For shots for, we use shooter xG for shots taken by the player and regular xG for shots by his teammates. For shots against, we use shooter xG.
  3. For forwards, weighted shots is the best predictor of next year's goals, with Corsi slightly behind it. sh_xGF% does better than xGF% but worse than Corsi.
  4. For defensemen, there is no difference between sh_xGF% and xGF%. CF% is the best predictor of next year's goals.
  5. For predicting future goals against, xGA and sh_xGA are similar, and both do the best for both positions.
  6. For CF%, xGF% (only Corsica's), and GF%, the relationships we observe for Relative numbers also hold for Rel TM.
  7. When using more than one year (two or three) to predict a single year, the same general trends hold with some changes, mainly that for forwards GF60 and sh_xGF60 gain an edge over CF60.

Discussion and Conclusion

The creation of (what I call) the shooter xG model is my attempt to replicate DTMAboutHeart and Asmae's expected goals model that took shooting talent into account. The last post focused on my methodology, which differed from theirs. This post tested the model to see how well it did when the numbers are compiled into metrics for teams and skaters. I observed a few differences. One is in the in-season, out-of-sample testing at the team level. Our values for CF% and GF% are similar, but they recorded much higher correlations for their xG model. Their model was able to outperform CF%; mine clearly wasn't.

At the skater level, the nature of our tests was very different, so it's hard to compare. In their piece, they showed that for in-season, out-of-sample predictions of G60, their xG model did better than G60 and iCF60. I found the same result when predicting the next season's G60, so I guess our results are similar. For on-ice results, they ran in-season, out-of-sample tests of GF% for different raw metrics (regular CF%, etc.). I instead predicted a given season x with n previous seasons using relative numbers. To be honest, I don't agree with their methodology, but that's beside the point. Given the large differences in our testing, I don't think there is a basis for comparing our on-ice skater tests.

There are a few possible reasons the results of my model and DTM's and Asmae's model differ. I'm talking strictly about the team-level results; as I said before, I don't think I can make a fair comparison at the skater level:
  1. While I agree philosophically with how I accounted for shooter talent, the fact that our methodologies differed could play a role.
  2. Their model was trained and built only on even-strength data, while mine was built on data from all situations. Since all the numbers in this analysis are 5v5, this may be why their model does better.
  3. This is a smaller concern, but my model was trained and tested on two more seasons (2015 and 2016), which could have an effect.
  4. It could be the result of the different algorithms and inputs used to build our xG models. They used a logistic regression (I used a GBM), and I also used more features in my model.
Beyond that, there isn't much else to say. Without the code and raw data used to create and test their model, this is the best I can do. This isn't meant as a criticism; I'm just being honest that there isn't more I can do. Either way, this is how my model performs. If you have any suggestions, comments, or criticisms, let me know.

** For those interested, the code and the data used for the project can be found here.
