A couple of weeks ago a very interesting post was published by @gentleputsch. Please give it a read. Ok, I'm assuming you read it. What interested me the most, as you can probably tell by the title of my post, was the team effect on Sv%. It was bigger than I expected and kind of caught me off guard. With that said, I had a few thoughts on the post (the main part of his post on considering shorthanded Sv% is another thing in itself so I'll leave it out for now) and I want to rerun the numbers a little differently.
First off, running the data just on the past three years seems odd. For all we know, running it on the previous three-year stretch could bring back nothing. So we should really incorporate more data. Also, at first I didn't understand why he chose a three-year period, but after thinking about it I agree. If we look at one year, we'll end up with a lot of goalies who play 50 to 65 games, which doesn't leave much of a team sample. This is lessened for a two-year period and even better for a three-year one (I guess one could actually do it for 2 years but 3 seems like a better choice). With all that said, let's see the numbers.
Just in case some people don't remember, these numbers are 5v5. The data is from 2007-2008 until this year. I divided it up into three three-year periods: 2007-2010, 2010-2013, 2013-2016. To be included, a goalie needs to have played all three years with the same team. But instead of just choosing the goalie with the most minutes for each team, I chose goalies with at least 1250 total shots faced. It's arbitrary, I know. But I wanted to get at least ~50 games for each goalie. The total number of goalies is 84 (so choosing this over the most minutes doesn't make a real difference, we end up with basically the same players anyway). We then run a correlation between the player's save percentages and those of the team when he's not on the ice. The following numbers are r (not r^2).
n Low_Sv% Mid_Sv% High_Sv%
84 .25 .01 .264
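For anyone who wants to run this kind of check themselves, here is a minimal sketch of the correlation step in Python. The function name and the sample values are my own placeholders (made up for illustration, not taken from the War_On_Ice data); the idea is just to pair each goalie's Sv% with his team's Sv% when he's not in net and compute r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder values only (NOT real data): one entry per goalie.
goalie_high_sv = [0.820, 0.835, 0.841, 0.828]
team_high_sv_without_goalie = [0.815, 0.830, 0.826, 0.833]

print(round(pearson_r(goalie_high_sv, team_high_sv_without_goalie), 3))
```

With the real sample you would have 84 pairs per danger zone instead of four.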
The numbers here seem a bit different than what @gentleputsch published. Low_Sv% is about the same, but High_Sv% is lower and the Mid_Sv% effect is nonexistent. It's kind of odd that we get something for low but not mid. Running the numbers for 1 year and 2 year spans brings back just about the same numbers for mid and high Sv%; low was practically zero for those spans but for some reason gets a spike here. Either way Low_Sv% doesn't matter, because we already know that Low_Sv% means almost nothing. As I estimated in my last post, it needs ~13000 shots to regress it (and it's actually slightly higher because of the team effect). That means that once a goalie has logged about that many low danger shots, his observed number is about half talent. So we shouldn't even bother with it. With that and Mid_Sv%, which is in the clear, out of the way, we now have to deal with High_Sv%.
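To make the "about half his talent" point concrete, here is a rough sketch of how a regression amount is usually applied, assuming the standard shrinkage weight n / (n + k), where k is the number of shots needed to regress. The function name and the league-average and observed Sv% values below are hypothetical placeholders, not figures from the data.

```python
def regress_to_mean(observed_sv, shots, league_sv, regression_shots):
    """Shrink an observed Sv% toward the league average.

    The weight on the observed value is shots / (shots + regression_shots),
    so at shots == regression_shots the estimate sits halfway between
    the observed value and the league average.
    """
    weight = shots / (shots + regression_shots)
    return weight * observed_sv + (1 - weight) * league_sv

# Hypothetical inputs: a .980 observed Low_Sv% against an assumed .970 average.
estimate = regress_to_mean(
    observed_sv=0.980, shots=13000, league_sv=0.970, regression_shots=13000
)
print(round(estimate, 4))
```

With 13000 low danger shots faced and a regression amount of 13000, the weight is exactly one half, which is why it takes that many shots before the observed number tells us half the story.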
So the question here is: How do we account for the team effects? I think the first step is thinking about what makes up a player's High_Sv% (or to be fair Sv% in any zone). We know a player's own skill matters, some luck, and as we covered here team effects. So it's kind of like this:
Observed High_Sv%= Talent + Luck + Team Effects
With this in hand, we can estimate the spread of High_Sv% due to talent.
Observed var = Talent var + Luck var + Team var
The var here is variance. Basically the spread in observed High_Sv% is made up of the spread in talent, luck, and team effects. So let's get some numbers down (those familiar with Tom Tango's work will recognize this type of equation):
One Standard Deviation of observed High_Sv% = .0178 (I think it's important to remind everyone I'm doing these numbers over a three year period; for 1 or 2 years the observed SD would be higher).
Luck= p*(1-p)/ Avg # of Shots (this would be the binomial variance)
p here is the average High_Sv% over the period, which is .832. So.....
(.832)(.168)/798.5 = .000175 (this is the variance, one SD is therefore .0132)
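The luck term is easy to check by hand or in a couple of lines of Python. This is just a sketch of the binomial arithmetic above, using the .832 average High_Sv% and 798.5 average shots from this post; the function name is my own.

```python
from math import sqrt

def binomial_luck_variance(p, shots):
    """Variance of an observed save percentage due to binomial luck alone."""
    return p * (1 - p) / shots

luck_var = binomial_luck_variance(0.832, 798.5)
luck_sd = sqrt(luck_var)  # one SD of luck, about .0132

print(f"variance = {luck_var:.6f}, SD = {luck_sd:.4f}")
```

Note that the luck SD shrinks with the square root of the shot count, which is why the observed spread is wider over 1 or 2 years than over 3.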
Lastly, team here would just be our correlation multiplied by the observed spread. So we get one SD of Team effect is .0047.
So let's plug it in.
.0178^2 = Talent^2 + .0132^2 + .0047^2
Solving for Talent we get...... SD of Talent = .011
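Here is the same plug-and-solve step as a small Python sketch, in case anyone wants to swap in their own numbers. It assumes, as the equation above does, that talent, luck, and team effects are independent so their variances simply add. The function name is mine.

```python
from math import sqrt

def talent_sd(observed_sd, luck_sd, team_sd):
    """Solve Observed var = Talent var + Luck var + Team var for the talent SD."""
    talent_var = observed_sd ** 2 - luck_sd ** 2 - team_sd ** 2
    if talent_var < 0:
        raise ValueError("luck and team variance exceed the observed variance")
    return sqrt(talent_var)

observed_sd = 0.0178          # observed spread in High_Sv% over three years
luck_sd = 0.0132              # binomial luck SD
team_sd = 0.264 * 0.0178      # correlation times observed spread, about .0047

print(round(talent_sd(observed_sd, luck_sd, team_sd), 3))
```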
Now all these numbers are nice, but I don't think anyone really cares about them on their own. So why does this all matter? Because our estimate of the spread in High_Sv% talent was previously inflated: the team effects were baked into the numbers. Now that we have, presumably, gotten rid of them, we can get a better estimate (which I tried two weeks ago until I saw....) of how much we need to regress High_Sv%. Mid_Sv% is the same, Low_Sv% doesn't matter, but High_Sv% needs to be adjusted. As usual Tom Tango provides a solution (it seems like his new and old blogs are filled with dozens of ways to regress, each a little different).
So let's do it:
Talent SD^2/Observed SD^2= .378 (this would be r^2)
And now.............. (1-.378)/.378 *799 = 1314 Shots
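The last step in code form, as a sketch of the (1 - r^2)/r^2 * shots arithmetic above. The function name is my own, and I pass in the .378 ratio directly because it comes from the unrounded SDs (plugging in the rounded .011 and .0178 gives a slightly different ratio).

```python
def shots_to_regress(talent_share, avg_shots):
    """Shots to add at league average, given the talent share of observed variance.

    talent_share is Talent SD^2 / Observed SD^2 at a sample of avg_shots.
    """
    if not 0 < talent_share <= 1:
        raise ValueError("talent_share must be in (0, 1]")
    return (1 - talent_share) / talent_share * avg_shots

print(int(shots_to_regress(0.378, 799)))
```

The smaller the talent share, the more shots you have to add, so taking the team effect out of the talent spread pushes the regression amount up.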
My last estimate of how much to regress High_Sv% was 1121 shots. Now it's about 200 more. Makes sense to me.
At the end of the day, it seems like we were mistaking a little noise for signal in a goalie's High_Sv%. The SD of Talent is really smaller and we just have to regress a little more. I wonder what the numbers would look like for a goalie metric which instead of dividing up by danger zone assigns a probability to each shot (there are a couple of these models out there). I'd assume the team effect is smaller there.
***All Data courtesy (of the now defunct) War_On_Ice