The way the rink factors (I'm lifting the term "factors" from baseball) will be calculated is very simple. I'm not looking for a perfect way of doing this, just an easy way of getting a solid estimate. What the factors tell us is how much each rink over/under counts saves and misses. So if a factor is 1.2, that means the rink over counts that statistic by a factor of 1.2, and to adjust the statistic in question we would multiply the counts that occur in that rink by 1/1.2. So what we'll do is compare home numbers to road numbers, take multiple years into account, and regress (similar to this method). The comparison of home to road numbers is meant to isolate the home rink effect, as it assumes that the road numbers are indicative of the "true" numbers we'd expect. This of course isn't strictly true, because the road numbers won't perfectly even out; there will be biases there too. But I think the road numbers should "mostly" even out, so I'll let this be for now.
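To make that concrete, here's a minimal sketch of how a factor would be applied. The counts are made up; only the 1.2 factor comes from the example above:

```python
# A minimal sketch of applying a rink factor; the numbers here are hypothetical.
# If a rink over counts saves by a factor of 1.2, we scale its recorded saves by 1/1.2.
rink_factor = 1.2          # assumed factor for an over-counting rink
recorded_saves = 600       # hypothetical raw count recorded at that rink
adjusted_saves = recorded_saves * (1 / rink_factor)
print(round(adjusted_saves))  # 500
```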
Another problem is that both the home and road numbers are just a one-year sample (like looking at one year of a player's statistics). Therefore I'll take multiple years into account (if possible, of course; this doesn't apply to teams that changed arenas). How many? Well, I chose three. Why? Because going beyond three years didn't really seem to add anything. So, for example, the rink factors for 2015-2016 will take into account the two previous years (each year weighted the same). The last problem is that, just like one year isn't indicative of the "true" factor, each measurement will also contain some randomness. So each factor will have to be regressed a certain amount (because for all we know we are just measuring randomness).
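Here's a rough sketch of how the multi-year blending and regression could look. The equal weighting across up to three seasons follows the description above; the regression amount is an assumed placeholder parameter, not the exact amount used here:

```python
# A sketch of blending up to three seasons of raw home/road factors (equal weights)
# and regressing the result toward 1. The regression amount is an assumed placeholder.
def blended_factor(yearly_factors, regress_amount=0.5):
    """yearly_factors: raw factors for the most recent seasons, oldest first."""
    recent = yearly_factors[-3:]                   # use at most the last three years
    raw = sum(recent) / len(recent)                # each season weighted the same
    return 1 + (raw - 1) * (1 - regress_amount)    # pull the estimate back toward 1

print(round(blended_factor([1.10, 1.06, 1.08]), 3))  # ~1.04 with 50% regression
```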
Lastly, all numbers here will be score adjusted. Adjusting for score accounts for the fact that a team may have trailed or led more at home (or on the road). And in order to isolate any one effect we have to do our best to account for other possible ones. This will be done similarly to how Micah Blake McCurdy laid it out here. It differs in that while Micah calculates the coefficient for each state by using the average events for both teams in that game state, I used the overall average %, not just the average for that state. For example, here's the (5v5) Sv% by state for the away team:
Road Lead | Sv% |
-3+ | .9116 |
-2 | .9085 |
-1 | .9165 |
0 | .923 |
1 | .9225 |
2 | .9236 |
3+ | .9265 |
Average | .9202 |
So to arrive at the coefficient for each state we just divide Average/Sv%. I did this because with shot metrics we care what the other team does, as it affects both sides (how many shots Team A gets is how many shots Team B gives up). For something like Sv%, how many goals Team A gives up per shot is irrelevant to Team B. So I just related it back to the average. With all that said, I think adjusting for score here doesn't really change anything.
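For clarity, here's a short sketch of how those coefficients fall out of the table above (coefficient = Average/Sv% for each state):

```python
# Score-state coefficients from the away-team 5v5 Sv% table above:
# coefficient = overall average Sv% / Sv% in that state.
away_sv_pct_by_state = {
    "-3+": 0.9116, "-2": 0.9085, "-1": 0.9165, "0": 0.9230,
    "1": 0.9225, "2": 0.9236, "3+": 0.9265,
}
average_sv_pct = 0.9202

coefficients = {state: average_sv_pct / sv for state, sv in away_sv_pct_by_state.items()}
for state, coef in coefficients.items():
    print(state, round(coef, 4))   # e.g. "-3+" -> ~1.0094, "0" -> ~0.997
```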
Shots on Goal
As Tore Purdy originally noted, when we say a rink may over/under count the number of shots, we really mean saves. Thankfully goals are impossible to miscount. Therefore he used sh% to better examine the issue. I'll be doing it in the same spirit but in a slightly different manner. I'm going to be using the ratio of saves to goals ( Sv%/(1-Sv%) ). The higher the ratio, the higher the Sv%.
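In code, the ratio is just:

```python
# The save-to-goal ratio used here: Sv% / (1 - Sv%), i.e. saves per goal allowed.
def save_goal_ratio(sv_pct):
    return sv_pct / (1 - sv_pct)

print(round(save_goal_ratio(0.920), 2))  # 11.5
print(round(save_goal_ratio(0.930), 2))  # ~13.29 -- the higher the Sv%, the higher the ratio
```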
So for every team I calculated the cumulative ratio (meaning the ratio for both teams combined) at home and the cumulative ratio on the road (reminder: the numbers here are score adjusted). We then divide Home by Away (Home/Away) to get our base "factor". If it's above 1, the ratio was higher at home than on the road, and vice versa if it's below 1 (a short sketch of this calculation follows the correlation table below). But, as we discussed before, we should really use more than one year. And for all we know these numbers are completely random, so we'll need to check the repeatability of these "factors". So for each season (if possible of course) I'll predict the Save factor using the past season, the past 2 seasons, and the past 3. The correlation coefficients are below:
Years | Sv% Factor |
1 Year | .033 |
2 Years | .066 |
3 Years | .0985 |
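Here's the sketch of the base factor calculation mentioned above, with hypothetical counts (cumulative, both teams combined, score adjusted):

```python
# Base save factor: home save-to-goal ratio divided by road save-to-goal ratio.
# The counts below are hypothetical.
def save_goal_ratio(saves, goals):
    return saves / goals           # equivalent to Sv% / (1 - Sv%)

home_saves, home_goals = 1150, 95    # assumed cumulative counts in a team's home rink
away_saves, away_goals = 1120, 98    # assumed cumulative counts in that team's road games

save_factor = save_goal_ratio(home_saves, home_goals) / save_goal_ratio(away_saves, away_goals)
print(round(save_factor, 3))   # >1 means saves were recorded at a higher rate at home
```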
As you can see, the correlations here are pretty low. But based on the work by Schuckers and Macdonald (linked in the beginning) we expected this. As they noted, the effects for shots are, by and large, rather small. And it's important to note that just because the correlation is small doesn't mean the effect doesn't exist. There is a relationship; it just has a lot of noise. Here are the 3 year regressed factors for last year:
Team | Shot Factor |
CGY | 0.986 |
OTT | 0.987 |
MIN | 0.988 |
TOR | 0.988 |
STL | 0.990 |
COL | 0.992 |
WPG | 0.992 |
CBJ | 0.994 |
BUF | 0.997 |
S.J | 0.997 |
NYI | 0.998 |
DAL | 0.999 |
L.A | 1.000 |
PIT | 1.000 |
DET | 1.000 |
NYR | 1.000 |
N.J | 1.001 |
T.B | 1.002 |
NSH | 1.002 |
FLA | 1.004 |
EDM | 1.004 |
WSH | 1.004 |
VAN | 1.005 |
ARI | 1.009 |
CAR | 1.010 |
ANA | 1.011 |
PHI | 1.011 |
BOS | 1.014 |
MTL | 1.017 |
CHI | 1.017 |
Most of the factors are rather small (especially compared to misses, as we'll see soon) and closely centered around 1. I won't say it doesn't matter... but it kind of suggests that it mostly doesn't. It's also important to remember what these numbers mean: they are how much each rink over/under counts saves. So when adjusting shots, these factors only apply to saves, not to total shots on goal.
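To be explicit about that last point, here's a minimal sketch (with hypothetical counts) of adjusting shots on goal by applying the factor to saves only:

```python
# Adjusting shots on goal: the save factor is applied only to the saves recorded in that
# rink; goals are left untouched. The counts below are hypothetical.
shot_factor = 1.011
saves_in_rink = 1200
goals_in_rink = 100

adjusted_saves = saves_in_rink / shot_factor
adjusted_sog = goals_in_rink + adjusted_saves    # goals are assumed to be counted correctly
print(round(adjusted_sog, 1))
```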
Misses
For misses I calculated the cumulative ratio of misses to shots on goal at home and away for each team (score adjusted). Shots on goal here were rink adjusted as I detailed in the last section. I then divided the home ratio by the away ratio to get each team's Miss home factor (a short sketch of this calculation follows the correlation table below). And what we see is that there is a lot more signal in these numbers. Here are the correlations from predicting each year's factor for each team using the past season, the past two seasons, and the past three (the numbers below are r, not r^2):
Years | Miss% Factor |
1 Year | .754 |
2 Years | .7912 |
3 Years | .797 |
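Here's the sketch of the miss factor calculation mentioned above, again with hypothetical counts:

```python
# Miss factor: the (score-adjusted) ratio of misses to rink-adjusted shots on goal at
# home, divided by the same ratio on the road. The counts below are hypothetical.
home_misses, home_adj_sog = 520, 1300    # assumed cumulative home counts
away_misses, away_adj_sog = 470, 1310    # assumed cumulative road counts

miss_factor = (home_misses / home_adj_sog) / (away_misses / away_adj_sog)
print(round(miss_factor, 3))   # >1 means misses were recorded at a higher rate at home
```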
Here are the 3 year regressed factors for last year:
Team | Miss Factor |
CHI | 0.803 |
N.J | 0.846 |
PIT | 0.857 |
DET | 0.868 |
COL | 0.909 |
FLA | 0.922 |
VAN | 0.922 |
WPG | 0.926 |
CBJ | 0.929 |
NSH | 0.936 |
NYI | 0.941 |
MTL | 0.942 |
BOS | 0.958 |
T.B | 0.964 |
OTT | 0.965 |
CGY | 0.980 |
BUF | 0.983 |
WSH | 0.990 |
NYR | 1.009 |
PHI | 1.030 |
EDM | 1.035 |
MIN | 1.042 |
STL | 1.064 |
S.J | 1.097 |
ARI | 1.154 |
ANA | 1.179 |
DAL | 1.213 |
L.A | 1.223 |
TOR | 1.243 |
CAR | 1.244 |
As you can see these are a lot more significant. They also align with what Schuckers and Macdonald found (go to the misses section of their study, Table 8): Toronto, Dallas, Carolina, and L.A tend to over count, and Chicago and N.J tend to undercount. Of course, a few factors differ between their list and mine. This seems to mostly be due to the fact that their numbers cover the 2007-2013 seasons while mine above only take 2013-2016 into account. Looking back (I posted all the factors back to 2007-2008 at the end of this post) at the factors from the time period they used, the numbers align more closely with those posted by Schuckers and Macdonald. For example, CBJ and BOS appear lower and Chicago's factor is closer to .6 (the exception is N.J, for which they report lower numbers than mine). It's possible the official recorders at each rink have shaped up a little since then.
Effect of Adjusting
It's nice seeing the raw factors, but a better way to see the effect of rink bias is to see it in action. So I calculated the rink adjusted Sv% and Miss% for each goalie from 2007 until last season. That is, each save they made was adjusted, not just the home numbers (these numbers will also be included in the Google Docs; a short sketch of the calculation follows the table below). To see the effect of rink adjusting, I looked at all goalie seasons in which the player played at least 30 games. I then calculated the difference between their raw numbers and their adjusted ones. Below are the 10 largest differences for Sv%:
Goalie | Year | Sv% | Adjusted Sv% | Difference |
MIKE.CONDON | 20152016 | 0.9138 | 0.9127 | 0.00109 |
JEAN-SEBASTIEN.GIGUERE | 20102011 | 0.9130 | 0.9121 | 0.00090 |
ANTERO.NIITTYMAKI | 20092010 | 0.9186 | 0.9179 | 0.00079 |
JOHAN.HEDBERG | 20102011 | 0.9184 | 0.9192 | -0.00074 |
MARTIN.BRODEUR | 20092010 | 0.9248 | 0.9255 | -0.00065 |
COREY.CRAWFORD | 20152016 | 0.9332 | 0.9326 | 0.00062 |
MIKE.SMITH | 20092010 | 0.9113 | 0.9107 | 0.00061 |
JONAS.GUSTAVSSON | 20092010 | 0.9143 | 0.9137 | 0.00061 |
JOSH.HARDING | 20112012 | 0.9205 | 0.9211 | -0.00058 |
MARTIN.BRODEUR | 20102011 | 0.9124 | 0.9130 | -0.00057 |
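Here's the sketch of the per-goalie adjustment mentioned above. The shot factors are taken from the table in the Shots on Goal section; the per-rink save and goal counts are made up:

```python
# Building a goalie's rink-adjusted Sv%: every save is scaled by the factor of the rink
# it occurred in, goals are left as counted. Per-rink counts here are hypothetical;
# the factors come from the shot factor table above.
shots_faced_by_rink = {            # rink: (saves, goals) faced there
    "CAR": (400, 30),
    "TOR": (350, 28),
    "CHI": (300, 22),
}
shot_factors = {"CAR": 1.010, "TOR": 0.988, "CHI": 1.017}

adj_saves = sum(saves / shot_factors[rink]
                for rink, (saves, _goals) in shots_faced_by_rink.items())
goals = sum(g for _, g in shots_faced_by_rink.values())

adjusted_sv_pct = adj_saves / (adj_saves + goals)
print(round(adjusted_sv_pct, 4))
```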
As the table shows, even at the extremes the differences are small. The largest, Mike Condon, only moves by about .001 in Sv%, and the tenth largest (Brodeur) moves by a little over .0005. While I don't want to say these differences are irrelevant, they really don't move the needle too much. Let's now look at the 10 largest for Miss%:
Goalie | Year | Miss% | Adjusted Miss% | Difference |
CRISTOBAL.HUET | 20092010 | 0.2633 | 0.2971 | -0.03377 |
NIKOLAI.KHABIBULIN | 20082009 | 0.2128 | 0.2447 | -0.03196 |
JONATHAN.QUICK | 20102011 | 0.3065 | 0.2773 | 0.02930 |
JONATHAN.BERNIER | 20142015 | 0.3120 | 0.2835 | 0.02849 |
JONATHAN.QUICK | 20132014 | 0.3117 | 0.2834 | 0.02835 |
NIKOLAI.KHABIBULIN | 20072008 | 0.2070 | 0.2352 | -0.02817 |
JONATHAN.QUICK | 20142015 | 0.2959 | 0.2690 | 0.02687 |
CAM.WARD | 20152016 | 0.3025 | 0.2768 | 0.02570 |
CRISTOBAL.HUET | 20082009 | 0.2542 | 0.2799 | -0.02565 |
ANTTI.NIEMI | 20092010 | 0.2653 | 0.2898 | -0.02443 |
The differences here are a lot bigger. To put them into perspective, let's look at Cam Ward's numbers in the above chart. Last season, among goalies who played at least 30 games, the mean Miss% was .2811 and the standard deviation was .0173. That means Cam Ward's z score last year was 1.24 ( (.3025-.2811)/.0173 ), putting him around the 89th percentile in Miss% among goalies with 30+ games. After adjusting for rink I did the same procedure. Ward's z score this time around was -.196, which puts him at about the 42nd percentile. After adjusting, he went from near the top of the league to slightly below average. I know this is one of the more extreme examples, but I think it shows how large a role rink bias can play in regard to misses.
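For anyone who wants to reproduce that comparison, here's a quick sketch using a normal approximation for the percentiles (the adjusted distribution's mean and SD aren't listed here, so only the quoted z score is used for that side):

```python
# Z-scores against the league Miss% distribution for goalies with 30+ games,
# converted to percentiles with a normal approximation.
from statistics import NormalDist

mean_miss, sd_miss = 0.2811, 0.0173            # raw Miss% mean and SD quoted above

raw_z = (0.3025 - mean_miss) / sd_miss         # Cam Ward's raw Miss%
print(round(raw_z, 2), round(NormalDist().cdf(raw_z), 3))      # ~1.24, ~89th percentile

adjusted_z = -0.196                            # quoted above, vs. the adjusted distribution
print(round(NormalDist().cdf(adjusted_z), 3))  # ~42nd percentile
```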
Conclusion
To conclude, in this post I presented a very simple way of calculating a rink's bias in regard to shots on goal and misses. The results are nothing new, as this was established previously by Schuckers and Macdonald: while biases exist for both SOG and misses, the bias is a good deal greater for misses.
I'd also argue that the example of Cam Ward shows how important it is to adjust for rink bias in regard to misses. While I believe that both shots on goal and misses should be adjusted (well... ideally; for shots it really doesn't matter much), the errors for misses are much greater and warrant intervention. I hope that this post inspires more (and better) research into rink effects. I also hope to expand upon this post in the near future.
Here's a Google Docs with all the numbers.
**All Data here is courtesy of Corsica.Hockey