
Friday, January 27, 2017

Rink Bias for SOG and Misses

Sadly, because the NHL still records its data in a shitty manner, there are some errors to be found. We see this in the location of recorded events and even in the actual number of events (http://objectivenhl.blogspot.com/2010/03/shot-recording-bias-part-n.html). For example, as discussed in that link, some rinks may inflate or deflate the number of shots on goal (or, really, saves). Every arena has its own trackers, and they all have their own biases, so this is to be expected. This has been looked at previously by Macdonald and Schuckers, but their methods are beyond my grasp, so I'll try to replicate the numbers in a simpler manner for both shots on goal and misses. All data used here is courtesy of Corsica.hockey. (Note: I think it goes without saying that all numbers used here are 5v5.)

The way the rink factors (I'm lifting the term "factors" from baseball) will be calculated is very simple. I'm not looking for a perfect method, just an easy way of getting a solid estimate. The factors tell us how much each rink over- or under-counts saves and misses. So if a factor is 1.2, the rink over-counts that statistic by a factor of 1.2, and to adjust the statistic in question we would multiply the events recorded in that rink by 1/1.2. The plan is to compare home numbers to road numbers, take multiple years into account, and regress (similar to this method). Comparing home to road numbers is meant to isolate the home rink effect, since it assumes that the road numbers are indicative of the "true" numbers we'd expect. This of course isn't strictly true, because the road numbers won't perfectly even out; there will be biases there too. But I think the road numbers should "mostly" even out, so I'll let this be for now.
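As a minimal sketch of the adjustment described above (the function name is hypothetical, not something from the original analysis), dividing a rink's recorded count by its factor is the same as multiplying by 1/factor:

```python
def adjust_count(recorded: float, rink_factor: float) -> float:
    """Undo a rink's recording bias.

    A factor of 1.2 means the rink records 1.2x the "true" count,
    so the recorded number is scaled by 1/1.2.
    """
    if rink_factor <= 0:
        raise ValueError("rink factor must be positive")
    return recorded / rink_factor


# A rink with a factor of 1.2 that records 120 events is treated
# as if about 100 events "really" happened there.
print(adjust_count(120, 1.2))
```

A factor of exactly 1 leaves the count untouched, which is why factors clustered around 1 barely change anything.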

Another problem is that both the home and road numbers are just a one-year sample (like looking at one year of a player's statistics). Therefore I'll take multiple years into account (if possible, of course; this doesn't apply to teams that changed arenas). How many? I chose three, because going beyond three years didn't really seem to add anything. So, for example, the rink factors for 2015-2016 will also take into account the two previous seasons, with all three weighted equally. The last problem is that, just as one year isn't indicative of the "true factor", each measurement will also contain some randomness. So each factor has to be regressed by some amount (for all we know, we are just measuring randomness).
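Here is a rough sketch of the multi-year and regression steps. Two things in it are my assumptions rather than the post's stated method: the seasons are combined by averaging the yearly factors (pooling the underlying counts would be an equally reasonable reading), and the regression uses the year-to-year correlation r as the shrinkage weight toward 1 (no bias). The function names and the sample factors are hypothetical.

```python
from statistics import mean


def pooled_factor(season_factors: list[float], max_seasons: int = 3) -> float:
    """Equal-weight average of the most recent raw factors (oldest first).

    Assumption: seasons are combined by averaging the yearly factors.
    Only seasons played in the same arena should be passed in.
    """
    if not season_factors:
        raise ValueError("need at least one season")
    return mean(season_factors[-max_seasons:])


def regress_factor(raw: float, r: float, prior: float = 1.0) -> float:
    """Shrink a raw factor toward `prior` (1.0 = no rink bias).

    Assumption: regressed = prior + r * (raw - prior), where r is the
    year-to-year correlation of the factor.
    """
    return prior + r * (raw - prior)


# Hypothetical raw factors for one arena, 2013-14 through 2015-16,
# regressed with the 3-year miss correlation reported in this post.
raw = pooled_factor([1.30, 1.25, 1.35])
print(round(regress_factor(raw, r=0.797), 3))
```

With a tiny r (as with the save factors), almost everything is pulled back to 1; with a large r (as with misses), most of the raw signal survives.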

Lastly, all numbers here will be score adjusted. Adjusting for score accounts for the fact that a team may have trailed or led more at home (or on the road), and to isolate any one effect we have to do our best to account for the others. This will be done similarly to how Micah Blake McCurdy laid it out here. The difference is that Micah calculates the coefficient for each state using the average events for both teams in that game state, whereas I used the overall average percentage rather than the average for that state. For example, here's the (5v5) Sv% by state for the away team:

Road Lead    Sv%
-3+          .9116
-2           .9085
-1           .9165
0            .923
1            .9225
2            .9236
3+           .9265
Average      .9202

So to arrive at the coefficient for each state, we just divide Average/Sv%. I did this because with shot metrics we care about what the other team does, since it affects both sides (the shots Team A gets are the shots Team B gives up). For something like Sv%, how many goals Team A gives up per shot is irrelevant to Team B, so I just related each state back to the overall average. With all that said, I think adjusting for score here doesn't really change anything.
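A small sketch of that coefficient calculation, using the road-team numbers from the table. It only computes the per-state coefficients (Average / state Sv%); how they are then applied to each state's events follows McCurdy's write-up and isn't reproduced here. The names are hypothetical.

```python
# 5v5 road-team Sv% by road lead, copied from the table above
ROAD_SV_BY_STATE = {
    "-3+": 0.9116,
    "-2": 0.9085,
    "-1": 0.9165,
    "0": 0.9230,
    "1": 0.9225,
    "2": 0.9236,
    "3+": 0.9265,
}
OVERALL_AVERAGE_SV = 0.9202


def state_coefficients(sv_by_state: dict[str, float], average: float) -> dict[str, float]:
    """Coefficient per score state = overall average / that state's Sv%."""
    return {state: average / sv for state, sv in sv_by_state.items()}


for state, coef in state_coefficients(ROAD_SV_BY_STATE, OVERALL_AVERAGE_SV).items():
    print(f"{state:>4}: {coef:.4f}")
```

States where the Sv% sits below the overall average get a coefficient above 1, and vice versa, so every state is pulled toward the same baseline.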

Shots on Goal

As Tore Purdy originally noted, when we say a rink may over/under count the number of shots, we really mean saves. Thankfully goals are impossible to miscount. Therefore he used sh% to better examine the issue. I'll be doing it in the same spirit but in a slightly different manner. I'm going to be using the ratio of saves to goals ( Sv%/(1-Sv%) ). The higher the ratio, the higher the Sv%.
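The save-to-goal ratio is just saves per goal allowed, and it rises monotonically with Sv%. A quick check with hypothetical numbers (the function name is mine):

```python
def save_goal_ratio(sv_pct: float) -> float:
    """Saves per goal allowed: Sv% / (1 - Sv%)."""
    if not 0 <= sv_pct < 1:
        raise ValueError("Sv% must be in [0, 1)")
    return sv_pct / (1 - sv_pct)


# Hypothetical: 920 saves on 1,000 shots (80 goals), so Sv% = .920
saves, goals = 920, 80
sv = saves / (saves + goals)
print(save_goal_ratio(sv), saves / goals)  # both about 11.5
```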

So for every team I calculated the cumulative ratio at home (meaning the ratio for both teams combined) and the cumulative ratio on the road (reminder: the numbers here are score adjusted). We then just divide Home by Away (Home/Away) to get our base "factor". If it's above 1, the ratio was higher at home than on the road, and vice versa if it's below 1. But, as discussed before, we should really use more than one year. Also, for all we know these numbers are completely random, so we need to check the repeatability of these factors. So for each season (where possible) I'll predict the save factor using the past season, the past 2 seasons, and the past 3. The correlation coefficients are below:

Years      Sv% Factor (r)
1 Year     .033
2 Years    .066
3 Years    .0985
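The base factor and the repeatability check can be sketched as follows. The function names and the toy data are hypothetical, and treating "predict using the past k seasons" as correlating the mean of the previous k yearly factors with the next season's factor is my reading of the procedure.

```python
from math import sqrt


def save_factor(home_saves, home_goals, road_saves, road_goals):
    """Home save/goal ratio divided by road save/goal ratio.

    Counts are cumulative for both teams in the team's home (or road) games.
    """
    return (home_saves / home_goals) / (road_saves / road_goals)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (r, not r^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def repeatability(team_factors, k):
    """r between the mean of the previous k seasons' factors and the next season's.

    team_factors maps team -> yearly factors (oldest first) in ONE arena;
    only seasons with k prior seasons in that arena contribute a pair.
    """
    xs, ys = [], []
    for factors in team_factors.values():
        for i in range(k, len(factors)):
            xs.append(sum(factors[i - k:i]) / k)
            ys.append(factors[i])
    return pearson_r(xs, ys)


# Toy data: perfectly stable rink biases give r = 1 for any k.
toy = {"A": [1.1] * 4, "B": [0.9] * 4, "C": [1.0] * 4}
print(repeatability(toy, 1), repeatability(toy, 3))
```

Real factors are far noisier than the toy data, which is exactly what the low r values for saves reflect.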

As you can see, the correlations here are pretty low. But based on the work by Schuckers and Macdonald (linked at the beginning), we expected this: as they noted, the effects for shots are, by and large, rather small. And it's important to note that a small correlation doesn't mean the effect doesn't exist. There is a relationship; it just has a lot of noise. Here are the 3-year regressed factors for last year:


Team Shot Factor
CGY 0.986
OTT 0.987
MIN 0.988
TOR 0.988
STL 0.990
COL 0.992
WPG 0.992
CBJ 0.994
BUF 0.997
S.J 0.997
NYI 0.998
DAL 0.999
L.A 1.000
PIT 1.000
DET 1.000
NYR 1.000
N.J 1.001
T.B 1.002
NSH 1.002
FLA 1.004
EDM 1.004
WSH 1.004
VAN 1.005
ARI 1.009
CAR 1.010
ANA 1.011
PHI 1.011
BOS 1.014
MTL 1.017
CHI 1.017

Most of the factors are rather small (especially compared to misses, as we'll see soon) and are closely centered around 1. I won't say it doesn't matter...but it kind of suggests that it mostly doesn't. It's also important to remember what these numbers mean: they measure how much each rink over- or under-counts saves. So when adjusting shots, these factors apply only to saves, not to total shots on goal.
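Because goals can't be miscounted, a rink-adjusted shots-on-goal total scales only the saves. A minimal sketch (hypothetical function name and numbers):

```python
def adjusted_shots_on_goal(saves: float, goals: float, shot_factor: float) -> float:
    """Rink-adjusted SOG: only saves are divided by the factor; goals stay as recorded."""
    return saves / shot_factor + goals


# Hypothetical: 1,017 recorded saves and 83 goals in a rink with a factor of 1.017
print(adjusted_shots_on_goal(1017, 83, 1.017))  # about 1,083 (1,000 saves + 83 goals)
```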


Misses

For misses, I calculated the cumulative ratio of misses to shots on goal at home and away for each team (score adjusted). The shots on goal here were rink adjusted as detailed in the last section. I then divided the home ratio by the away ratio to get each team's Miss home factor. What we see is that there is a lot more signal in these numbers. Here is how well each year's factor is predicted for each team by the past season, the past two seasons, and the past three (the numbers below are r, not r^2):

Years      Miss% Factor (r)
1 Year     .754
2 Years    .7912
3 Years    .797
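The miss factor has the same shape as the save factor, just with misses per shot on goal in place of saves per goal. A sketch, assuming the SOG inputs have already been rink adjusted and everything is score adjusted (the function name and numbers are hypothetical):

```python
def miss_factor(home_misses, home_sog_adj, road_misses, road_sog_adj):
    """Home misses-per-SOG ratio divided by the road ratio.

    SOG counts are assumed to be rink adjusted already, and all counts are
    cumulative for both teams in the team's home (or road) games.
    """
    return (home_misses / home_sog_adj) / (road_misses / road_sog_adj)


# Hypothetical: .50 misses per SOG at home vs .40 on the road
print(miss_factor(500, 1000, 400, 1000))  # about 1.25
```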

Here are the 3 year regressed factors for last year:

Team Miss Factor
CHI 0.803
N.J 0.846
PIT 0.857
DET 0.868
COL 0.909
FLA 0.922
VAN 0.922
WPG 0.926
CBJ 0.929
NSH 0.936
NYI 0.941
MTL 0.942
BOS 0.958
T.B 0.964
OTT 0.965
CGY 0.980
BUF 0.983
WSH 0.990
NYR 1.009
PHI 1.030
EDM 1.035
MIN 1.042
STL 1.064
S.J 1.097
ARI 1.154
ANA 1.179
DAL 1.213
L.A 1.223
TOR 1.243
CAR 1.244

As you can see, these are a lot more significant. They also align with what Schuckers and Macdonald found (see the misses section of their study, Table 8): Toronto, Dallas, Carolina, and L.A tend to over-count, and Chicago and N.J tend to under-count. Of course, a few teams differ between the factors they listed and mine. This seems to be mostly because their numbers cover the 2007-2013 seasons, while mine above only cover 2013-2016. Looking back at my factors from the period they used (I posted all the factors back to 2007-2008 at the end of this post), the numbers align more closely with those posted by Schuckers and Macdonald. For example, CBJ and BOS appear lower, and Chicago's factor is closer to .6 (N.J is the exception, as they report lower numbers for them than I do). It's possible the official recorders at each rink have shaped up a little since then.

Effect of Adjusting

It's nice seeing the raw factors, but a better way to see the effect of rink bias is to see it in action. So I calculated the rink-adjusted Sv% and Miss% for each goalie from 2007 until last season, adjusting every save they made, not just the home numbers (these numbers are also included in the Google Doc). To see the effect of rink adjusting, I looked at all goalie seasons in which the player played at least 30 games and calculated the difference between their raw and adjusted numbers. Below are the 10 largest differences for Sv%:




Goalie                   Season    Sv%      Adj. Sv%   Difference (Raw - Adj.)
MIKE.CONDON              2015-16   0.9138   0.9127      0.00109
JEAN-SEBASTIEN.GIGUERE   2010-11   0.9130   0.9121      0.00090
ANTERO.NIITTYMAKI        2009-10   0.9186   0.9179      0.00079
JOHAN.HEDBERG            2010-11   0.9184   0.9192     -0.00074
MARTIN.BRODEUR           2009-10   0.9248   0.9255     -0.00065
COREY.CRAWFORD           2015-16   0.9332   0.9326      0.00062
MIKE.SMITH               2009-10   0.9113   0.9107      0.00061
JONAS.GUSTAVSSON         2009-10   0.9143   0.9137      0.00061
JOSH.HARDING             2011-12   0.9205   0.9211     -0.00058
MARTIN.BRODEUR           2010-11   0.9124   0.9130     -0.00057
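The per-goalie adjustment behind these tables can be sketched as below, assuming each event is scaled by the factor of the rink where it was recorded (saves by the shot factor, misses by the miss factor) and that Miss% means misses / (misses + shots on goal). The post doesn't spell out the Miss% denominator, so treat that as an assumption; RinkSplit and goalie_rates are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RinkSplit:
    """One goalie's 5v5 events in a single rink (hypothetical structure)."""
    saves: float
    goals: float
    misses: float
    shot_factor: float
    miss_factor: float


def goalie_rates(splits: list[RinkSplit], adjust: bool = True) -> tuple[float, float]:
    """Return (Sv%, Miss%), rink adjusted unless adjust=False.

    Assumption: Miss% = misses / (misses + shots on goal).
    """
    saves = goals = misses = 0.0
    for s in splits:
        sf = s.shot_factor if adjust else 1.0
        mf = s.miss_factor if adjust else 1.0
        saves += s.saves / sf      # only saves can be miscounted on SOG
        goals += s.goals           # goals are never rescaled
        misses += s.misses / mf
    sog = saves + goals
    return saves / sog, misses / (misses + sog)


# Hypothetical season: half the games in a rink that over-counts misses
season = [RinkSplit(500, 40, 260, 1.010, 1.244), RinkSplit(450, 40, 180, 1.0, 1.0)]
print(goalie_rates(season, adjust=False), goalie_rates(season))
```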


As you can see, even at the extremes the differences are small. The largest, Mike Condon's, is only about .001 (a tenth of a percentage point), and the tenth largest (Brodeur's) is only a little over .0005. While I don't want to say these differences are irrelevant, they don't move the needle much. Let's now look at the 10 largest for Miss%:


Goalie                   Season    Miss%    Adj. Miss% Difference (Raw - Adj.)
CRISTOBAL.HUET           2009-10   0.2633   0.2971     -0.03377
NIKOLAI.KHABIBULIN       2008-09   0.2128   0.2447     -0.03196
JONATHAN.QUICK           2010-11   0.3065   0.2773      0.02930
JONATHAN.BERNIER         2014-15   0.3120   0.2835      0.02849
JONATHAN.QUICK           2013-14   0.3117   0.2834      0.02835
NIKOLAI.KHABIBULIN       2007-08   0.2070   0.2352     -0.02817
JONATHAN.QUICK           2014-15   0.2959   0.2690      0.02687
CAM.WARD                 2015-16   0.3025   0.2768      0.02570
CRISTOBAL.HUET           2008-09   0.2542   0.2799     -0.02565
ANTTI.NIEMI              2009-10   0.2653   0.2898     -0.02443


The differences here are a lot bigger. To put them into perspective, let's look at Cam Ward's numbers in the chart above. Last season, among goalies who played at least 30 games, the mean Miss% was .2811 and the standard deviation was .0173. That means Cam Ward's z-score last year was 1.24 ((.3025 - .2811)/.0173), which puts him at about the 89th percentile (.8925) in Miss% among goalies with 30+ games. After adjusting for rink, I repeated the same procedure. Ward's z-score this time around was -.196, which puts him at about the 42nd percentile. After adjusting, he went from near the top of the league to slightly below average. I know this is one of the more extreme examples, but I think it shows how large a role rink bias can play with misses.
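The z-score and percentile arithmetic for Ward can be checked with the standard library. Converting a z-score to a percentile this way assumes Miss% among these goalies is roughly normally distributed; the function name is hypothetical. Note that the unrounded z is about 1.237, which maps to roughly .892; the .8925 figure corresponds to the rounded z of 1.24.

```python
from statistics import NormalDist


def z_and_percentile(value: float, mean: float, sd: float) -> tuple[float, float]:
    """z-score and its standard-normal CDF (assumes a roughly normal Miss%)."""
    z = (value - mean) / sd
    return z, NormalDist().cdf(z)


# Raw 2015-16 Miss% for Cam Ward vs. the 30+ game mean and standard deviation
z, pct = z_and_percentile(0.3025, 0.2811, 0.0173)
print(f"raw: z = {z:.2f}, percentile = {pct:.3f}")

# After rink adjusting, the post reports z = -.196
print(f"adjusted: percentile = {NormalDist().cdf(-0.196):.3f}")
```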

Conclusion
To conclude, in this post I presented a very simple way of calculating a rink's bias in regard to shots on goal and misses. The results weren't new, as this had been established previously by Schuckers and Macdonald: while biases exist for both SOG and misses, they are a good deal greater for misses.

I'd also argue that the Cam Ward example shows how vital it is to adjust for rink bias in regard to misses. While I believe that both shots on goal and misses should be adjusted (well...ideally; for shots, it really doesn't matter), the errors for misses seem much greater and warrant intervention. I hope this post inspires more (and better) research into rink effects. I also hope to expand upon it in the near future.

Here's a Google Doc with all the numbers.

**All Data here is courtesy of Corsica.Hockey