Monday, February 27, 2017

A Few Things about Miss%

This post collects a few things on Miss%. DTMAboutHeart originally wrote about Miss%, and I wrote my own post in July. In that post I used only road numbers, noting that ideally I'd adjust for rink bias. I recently worked out some rink adjustments for misses, so in the first part of this post I'll use them to run a better analysis of Miss%. This isn't to say that my method of adjusting is perfect, but I think it does a good enough job (and it's definitely better than throwing away half the data). In the second part of this post I talk about a few things concerning saves and misses.

(**All Data courtesy of Corsica.hockey**)

Reliability

In DTM's original post he showed that Miss% had good repeatability. I kept it simple here. Using the adjusted data, I ran year-over-year correlations for goalies with 20+ and 40+ games in back-to-back seasons going back to 2007. I also included the corresponding values for Sv% from this piece by Emmanuel Perry. Here they are (numbers below are r^2 and 5v5 only):

             Miss%    Sv%
20+ Games:   .136     .042
40+ Games:   .231     .072

As you can see (and as I assume you already know), Miss% is easily more repeatable than Sv%.
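As a rough sketch of the year-over-year check described above (not the code actually used for this post), the idea is to pair each goalie's season with his next one, keep only pairs where both seasons meet the games cutoff, and square the Pearson correlation. The row layout, the function name and the toy numbers are all made up for illustration.

```python
def year_over_year_r2(rows, min_games):
    """rows: (goalie, season_start_year, games, miss_pct) tuples (hypothetical layout).

    Returns (r^2, number of back-to-back season pairs used).
    """
    by_key = {(g, yr): (gp, m) for g, yr, gp, m in rows}
    xs, ys = [], []
    for (g, yr), (gp, m) in by_key.items():
        nxt = by_key.get((g, yr + 1))  # the same goalie's following season
        if nxt is not None and gp >= min_games and nxt[0] >= min_games:
            xs.append(m)
            ys.append(nxt[1])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy), n

# Toy data, NOT real goalie numbers.
toy_rows = [
    ("A", 2014, 50, 0.27), ("A", 2015, 55, 0.28),
    ("B", 2014, 45, 0.29), ("B", 2015, 42, 0.30),
    ("C", 2014, 60, 0.26), ("C", 2015, 15, 0.31),  # second season too short
    ("D", 2014, 41, 0.28), ("D", 2015, 48, 0.27),
]
r2, n_pairs = year_over_year_r2(toy_rows, min_games=40)
```

Running the same function once with a 20-game cutoff and once with a 40-game cutoff, on Miss% and then on Sv%, would give the four cells of a table like the one above.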

Team Effects

I checked for this in my original post. The methodology is the same here, except now I'm not restricted to road data. The numbers are 5v5 only and span 2008-2009 to 2015-2016. To get a larger sample of shots for each goalie, I paired up the seasons: 2008-2009 with 2009-2010, 2010-2011 with 2011-2012, and so on. For each pair, I gathered each goalie's numbers by team and matched them with that team's numbers in that couplet. Then for each goalie I calculated his Miss% and the Miss% of his team when he wasn't on the ice. Finally, for each sample restriction, I ran a correlation between what the goalie did for that team (and that team only; any numbers the goalie accumulated for other teams were excluded) and what the team did without him.
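The with/without comparison described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code: it takes Miss% to be misses divided by unblocked shots, treats the sample cutoff as a count of unblocked shots, and uses made-up names, layouts and toy totals.

```python
def team_effect_pairs(goalie_rows, team_totals, min_shots):
    """Return (goalie Miss%, team-without-goalie Miss%) pairs.

    goalie_rows: (goalie, team, couplet, misses, unblocked_shots) tuples
    team_totals: {(team, couplet): (misses, unblocked_shots)}
    Both layouts are hypothetical.
    """
    pairs = []
    for goalie, team, couplet, g_miss, g_shots in goalie_rows:
        t_miss, t_shots = team_totals[(team, couplet)]
        # Team numbers with this goalie's own shots removed.
        rest_miss, rest_shots = t_miss - g_miss, t_shots - g_shots
        # The sample restriction applies to the goalie AND the team without him.
        if g_shots >= min_shots and rest_shots >= min_shots:
            pairs.append((g_miss / g_shots, rest_miss / rest_shots))
    return pairs

def r_squared(pairs):
    """Squared Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Toy totals, NOT real data.
toy_goalies = [
    ("A", "AAA", "2008-2010", 560, 2000),
    ("B", "AAA", "2008-2010", 300, 1000),
    ("C", "BBB", "2008-2010", 420, 1500),  # team without him has only 400 shots
    ("D", "BBB", "2008-2010", 100, 400),   # goalie himself is under the cutoff
]
toy_teams = {("AAA", "2008-2010"): (860, 3000), ("BBB", "2008-2010"): (520, 1900)}
toy_pairs = team_effect_pairs(toy_goalies, toy_teams, min_shots=500)
```

Calling r_squared on the pairs from the full data set, once per cutoff, would produce one r^2 per sample row.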

(You may note that in my original post I divided it up by danger zone and here I'm not. Ideally I would, but to do that there are a few things I need to account for. Hopefully I'll be able to update these numbers with those values in the near future.) The numbers below are r^2, and the sample restriction applies to both the goalie and the team without him:

Sample    n      r^2
500+      306    .038
750+      255    .036
1000+     194    .035

The column n is the number of goalies in that sample. The correlations here, while higher than what I previously found, are rather small. There is definitely a relationship, but it's not very big and not something that should be a big concern. Adjusting for shot quality (as is done on Corsica) should also cut down on this.

Value of a Miss

I've heard a few common things about goalies and misses (not referencing anyone in particular here). One that I feel obliged to address briefly is the idea that since a missed shot has no chance of going in the net, it shouldn't be looked at: a miss is not on goal, so it never had a chance of being a goal (misses go in 0% of the time). I don't see how this makes any sense at all. If a goalie can influence the number of shots that miss the net, that means a goalie can "force" shots wide. Shots that would normally end up hitting the net are now missing it. Is this not a positive thing?

The reason, I think, is that people are thinking too much in terms of shots on goal and not unblocked shots. Let's think of a player releasing a shot (and assume it won't be blocked). Two factors matter when a player releases a shot. What is the probability of it hitting the net? And if it does, what is the probability that it goes in? These two values are baked into any "Fenwick"-based expected goal model. Multiplying the two probabilities gives an expected goal amount for that shot. With Sv% we look only at the second factor: once it's on goal, how good is the goalie at stopping it? But the first factor matters too. The shot has to hit the net to go in. So if a goalie can affect the probability of a shot hitting the net, he thereby reduces the expected goal amount of that shot. Just as a goalie can influence whether a SOG goes in, a goalie can also influence whether an unblocked shot hits the net. This is important too and can't be ignored.
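To make the two-factor idea concrete, here is a minimal sketch that multiplies the two probabilities. The league-average inputs are the figures quoted elsewhere in this post (27.8% Miss%, 7.8% Sh%); the goalie who forces one extra percentage point of misses is purely hypothetical, as are the names.

```python
def expected_goals(p_on_net, p_goal_given_on_net):
    """xG of an unblocked shot = P(hits the net) * P(goes in | hits the net)."""
    return p_on_net * p_goal_given_on_net

LEAGUE_ON_NET = 0.722  # 1 - league-average Miss% of 27.8%
LEAGUE_SH_PCT = 0.078  # league-average Sh% on shots on goal

baseline = expected_goals(LEAGUE_ON_NET, LEAGUE_SH_PCT)

# Hypothetical goalie: same save ability, but 1 more point of shots go wide.
forced_wide = expected_goals(LEAGUE_ON_NET - 0.01, LEAGUE_SH_PCT)

saved_per_unblocked_shot = baseline - forced_wide
print(round(baseline, 4), round(forced_wide, 4), round(saved_per_unblocked_shot, 5))
```

The second factor never changes in this example, so the whole difference comes from the first factor, which is exactly the part Sv% cannot see.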

Saves and Misses

What's interesting about including misses in our evaluation of goalies is the consequence for saves. With Sv% an average save was worth about .078 goals (since league average Sh% is about 7.8%). But if we include misses and now look at Fenwick Sv%, then misses and saves are interchangeable. They both result in the same thing: no goal. So both a miss and a save are worth .056 goals (league average FSh% is about 5.6%) since they are in the same bucket.

Again, we can break this down into the probability of a shot hitting the net and, if it does, the probability of it going in. The average Miss% is about 27.8% (so 72.2% of unblocked shots hit the net). If a shot misses the net, the goalie in effect stopped a SOG, which carries a .078 value. But since the average shot hits the net only 72.2% of the time, he only stopped .722 SOGs: he was expected on average to get .278 misses, so he made .722 above average. And .722*.078 ~ .056 goals.

For shots on goal we used to have a save valued at .078. But on an average unblocked shot we expect only 72.2% to hit the net, so allowing a SOG is more than we would expect (.278 more). Stopping the SOG is worth .078, but we have to penalize the goalie for allowing the SOG in the first place. So we do .078-(.278*.078) ~ .056 goals.
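The two derivations above are the same arithmetic seen from different sides, which a few lines can check. The variable names are made up; the inputs are the league averages quoted in this post.

```python
MISS_PCT = 0.278  # league-average Miss% on unblocked shots
SH_PCT = 0.078    # league-average Sh% on shots on goal

# A miss: one shot kept off the net when .278 were expected,
# so .722 SOG prevented above average, each worth SH_PCT goals.
value_of_miss = (1 - MISS_PCT) * SH_PCT

# A save: worth SH_PCT, minus the penalty for allowing
# .278 more SOG than an average unblocked shot produces.
value_of_save = SH_PCT - MISS_PCT * SH_PCT

# League-average Fenwick Sh%: P(on net) * P(goal | on net).
fenwick_sh_pct = (1 - MISS_PCT) * SH_PCT

print(round(value_of_miss, 4), round(value_of_save, 4), round(fenwick_sh_pct, 4))
```

All three come out to about .056, which is why a miss and a save land in the same bucket under raw Fenwick Sv%.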

With all that said, I don't think saves and misses are, on average, equal in terms of goals saved. The reason is simple. Imagine I said three shots happened in the past 5 minutes. One resulted in a goal, one in a save, and one in a miss. If you had to guess where each shot most likely came from and the circumstances surrounding it, I'd imagine you would expect the goal to have come from closest to the net, followed by the save and then the miss (and the goal to be the most likely to have been a rebound or a rush shot). So if we had to guess how dangerous each shot likely was, the order would be: Goal->Save->Miss. In other words, we can infer from the outcome that a save is likelier to have been a more dangerous shot than a miss.

We can look at this. Using the PBP files graciously shared by Emmanuel Perry and his expected goal model, we can calculate the average expected goal amount for a save and a miss. This is good because it gives us an unbiased view of how dangerous the shot was without knowing the outcome. What we see is that the average save has an xG of .0569 and the average miss has .0488. The difference here is ~.008 goals. (Note: this is a tangent, but adding up the total xG of every shot and calculating the xG per shot gives .0586, which suggests that this model slightly overrates the probability of a shot going in.)
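For anyone who wants to repeat this kind of comparison, the averaging step might look something like the sketch below. The (outcome, xG) layout, the outcome labels and the toy values are assumptions for illustration and do not reflect the actual format of the PBP files.

```python
from collections import defaultdict

def mean_xg_by_outcome(shots):
    """shots: iterable of (outcome, xg) for unblocked shots (hypothetical layout)."""
    totals = defaultdict(lambda: [0.0, 0])  # outcome -> [sum of xG, shot count]
    for outcome, xg in shots:
        totals[outcome][0] += xg
        totals[outcome][1] += 1
    return {outcome: s / n for outcome, (s, n) in totals.items()}

def xg_per_shot(shots):
    """Total xG divided by total unblocked shots, for a rough calibration check."""
    shots = list(shots)
    return sum(xg for _, xg in shots) / len(shots)

# Toy shots, NOT real data.
toy_shots = [
    ("SAVE", 0.06), ("SAVE", 0.04),
    ("MISS", 0.03), ("MISS", 0.05),
    ("GOAL", 0.20),
]
averages = mean_xg_by_outcome(toy_shots)
overall = xg_per_shot(toy_shots)
```

Comparing the overall xG per shot with the actual goals per unblocked shot is the calibration check mentioned in the note above.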

So saves tend to be more dangerous shots, which means the two aren't "equal". But any model that uses a shot quality component already has this factored in, so as long as one controls for quality this doesn't matter. So, yes, misses are less dangerous shots, but this is something most models already account for (like DTM's model and Adj.Fsv% on Corsica). This would only be an issue with raw Fenwick Sv%. As an aside, I don't think the value of a save and a miss is as simple as this, but that's for another post.

Conclusion

To recap, using rink-adjusted numbers (as opposed to only road numbers), I corroborated the findings that goalie Miss% is more repeatable than Sv% and that team effects on Miss% are minimal. I then followed that up with some assorted thoughts on saves and misses.

