Fooled By Grittiness: Estimating Shooting Talent

I guess the first question here is: what do I mean when I say shooting talent? That's a fair question. To start off, let's think about shot quality. Shot quality is the probability of a shot going in, and it is made up of a lot of different things: the distance of the shot from the goal, the angle, the type of shot, whether it's a rebound, whether it's a rush shot, the score, etc. Another, sometimes overlooked, component is the actual shooting talent of the shooter in question. In hockey parlance: "How good is his shot?" But how do we measure it? (Dtmaboutheart tried to take this into account in his expected goals model by regressing sh%, but I disagree with that method.)

I think the best way to start answering this is to think about what makes up a player's shooting percentage, rather than looking at raw sh%, because raw sh% isn't exactly shooting talent. A player could have a high sh% because of the quality of shots he takes or because of his actual shooting ability. Roughly, I think it breaks down like this:

Observed Sh% = Shot Quality* + Shooter Talent + Randomness

*Where shot quality means the factors from the first paragraph (distance, angle, shot type, rebounds, rush shots, score, etc.)

So here's what we need to do: estimate the shot quality for each player, regress for randomness, and what's left should be our best estimate of a player's shooting talent. The way to do this is, I think, simple. First we'll estimate shot quality using the expected goals model developed by Emmanuel Perry over at Corsica.hockey. Then we just compute:

Shot Multiplier = Goals / Expected Goals
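As a minimal sketch, the multiplier is just a ratio of actual to expected goals. The function name and the season totals below are hypothetical, chosen only to mirror the two-player comparison that follows (same expected goals, different actual goals); they are not Corsica data.

```python
def shot_multiplier(goals: float, expected_goals: float) -> float:
    """Ratio of actual goals to expected goals (G / xG).

    Above 1.0: the shooter finished better than the expected goals
    model predicted. Below 1.0: worse.
    """
    if expected_goals <= 0:
        raise ValueError("expected_goals must be positive")
    return goals / expected_goals


# Hypothetical season totals for two players with identical xG.
player_a = shot_multiplier(goals=12, expected_goals=10.0)
player_b = shot_multiplier(goals=8, expected_goals=10.0)
print(f"A: {player_a:.2f}, B: {player_b:.2f}")
```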

Let's think about this by comparing two players. Both have the same expected goals, but one scores more actual goals. Why is one player doing better than the other? Is it shooting talent or just randomness? If it's something real, we would expect the measure to show some repeatability: if a player consistently scores more goals than we would expect, something is probably going on. But how much? This is obviously a noisy measurement, so we need to be careful about whether we are measuring anything real or just pure randomness. So we'll run a regression.

First we have to split up forwards and defensemen. Those are two different positions, and we should expect forwards to have more shooting talent. I'm going to run three regressions for each group. The first is a regular year-over-year regression, limited to players with 500+ TOI in both seasons. But one year isn't the best judge: there is bound to be a good deal of randomness, and it'll probably take a few years to get a better read. So my other two regressions compare 2 years vs. 2 years and 3 years vs. 3 years, with cutoffs of 1000 and 1500 minutes respectively. These cutoffs are arbitrary and one can play around with them a little, but they still leave us with a fine sample size for the analysis. There are obviously better ways to conduct this analysis, but this is a quick and easy one. So here you go (all numbers are 5v5 data from 2007-08 through 2015-16):
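The repeatability check described above can be sketched as follows: compute each player's multiplier in the first window and the second window, keep only players who clear the TOI cutoff in both, and take the Pearson correlation. This is a minimal sketch under assumptions; the record layout, the names `PlayerWindow` and `split_half_r`, and the four toy players are hypothetical, and the author's actual pipeline isn't described beyond the cutoffs.

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerWindow:
    """Hypothetical record: one player's 5v5 totals over one window."""
    name: str
    toi: float  # 5v5 minutes in the window
    goals: float
    expected_goals: float

    @property
    def multiplier(self) -> float:
        return self.goals / self.expected_goals


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_half_r(first, second, min_toi):
    """Correlate window-1 and window-2 multipliers for players who clear
    min_toi in BOTH windows (500 for 1 vs 1, 1000 for 2 vs 2, ...)."""
    later = {p.name: p for p in second}
    pairs = [
        (p.multiplier, later[p.name].multiplier)
        for p in first
        if p.name in later and p.toi >= min_toi and later[p.name].toi >= min_toi
    ]
    xs, ys = zip(*pairs)
    return len(pairs), pearson_r(xs, ys)


# Toy data; player D misses the cutoff in window 1 and is dropped.
year1 = [PlayerWindow("A", 900, 12, 10.0), PlayerWindow("B", 800, 8, 10.0),
         PlayerWindow("C", 700, 10, 10.0), PlayerWindow("D", 300, 5, 2.0)]
year2 = [PlayerWindow("A", 950, 11, 10.0), PlayerWindow("B", 850, 9, 10.0),
         PlayerWindow("C", 600, 10, 10.0), PlayerWindow("D", 900, 1, 2.0)]

n, r = split_half_r(year1, year2, min_toi=500)
print(f"n = {n}, r = {r:.2f}")
```

Requiring the cutoff in both windows, rather than in total, keeps a player with one tiny sample from dragging the correlation around.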

Window	n (F/D)	TOI (min)	Forwards (r)**	Defensemen (r)**
1 yr vs. 1 yr	1974/1213	500+	.17	.04
2 yrs vs. 2 yrs	1333/784	1000+	.30	.10
3 yrs vs. 3 yrs	782/452	1500+	.39	.13

** r is the correlation coefficient between the two periods

So let's see what we've got. First off, the "n" column on the left gives the sample size for each regression, forwards first and then defensemen. What we see is simple and intuitive: there is a good deal of randomness in the data, and we see more of a signal for forwards than for defensemen. For defensemen, there doesn't seem to be much there. Even with three years of data, it still regresses 87% to the mean (1 - .13). That doesn't mean it doesn't matter at all, but it doesn't seem to make much of a difference (especially when you take into account the number of goals you would expect these guys to score anyway).

Forwards are obviously a different story. As you can see, a signal does come through. There is still a fair amount of noise (especially for one year), but all things considered, it's not bad. We just have to be careful with our data and regress it. Given a few years, we can start to estimate a player's shot multiplier.
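The "regresses X% to the mean" figures follow from the usual rule of thumb that, with a split-half correlation r, you keep a fraction r of a player's observed deviation and discard 1 - r. That rule is an assumption on my part (it holds cleanly when both windows have similar spread), but it reproduces the 87% quoted for defensemen. A short sketch using the r values from the table above:

```python
# r values from the table above (5v5, 2007-08 through 2015-16).
R_BY_WINDOW = {
    "1 vs 1": {"F": 0.17, "D": 0.04},
    "2 vs 2": {"F": 0.30, "D": 0.10},
    "3 vs 3": {"F": 0.39, "D": 0.13},
}


def fraction_regressed(r: float) -> float:
    """Share of an observed deviation to discard: 1 - r."""
    return 1.0 - r


for window, rs in R_BY_WINDOW.items():
    f = fraction_regressed(rs["F"])
    d = fraction_regressed(rs["D"])
    print(f"{window}: forwards regress {f:.0%}, defensemen {d:.0%}")
```

The 3 vs 3 row gives 61% for forwards and 87% for defensemen.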

That's not all, though. There's still one question: what do we regress to? As Eric T. first showed, we can use coaches' decisions to find the right mean to regress to, because players who average more TOI per game tend to shoot for a higher percentage. That makes sense: coaches know how to evaluate talent, and will therefore give more ice time to more skilled players. But we can't just plot TOI/G vs. Sh% (as he notes). Most of the players given more ice time will be the more talented ones, but there's also the problem of coaches riding the hot hand. For example, imagine a player with average shooting talent who goes off (like many do) for part of a season and starts scoring a ton of goals. The coach may well bump him up a line and play him more. So getting lucky and exceeding one's talent gets reflected in TOI/G: some of the players with high TOI/G are guys who enjoyed a decent amount of luck and are now being "rewarded" with more ice time. As a result, the relationship will look more extreme than it actually is.
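To see why the hot hand inflates the TOI/G relationship, here is a small simulation. Every number in it is made up: all simulated forwards share the same true multiplier (1.0), luck is Gaussian noise, and a hypothetical coach hands out ice time partly based on year-1 results. Because luck drives both year-1 results and TOI, TOI correlates strongly with the year-1 multiplier but hardly at all with the year-2 multiplier.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(2016)  # fixed seed so the run is reproducible
N_PLAYERS = 2000

# Every simulated forward has the SAME true multiplier (1.0); only luck differs.
year1 = [1.0 + rng.gauss(0, 0.25) for _ in range(N_PLAYERS)]
year2 = [1.0 + rng.gauss(0, 0.25) for _ in range(N_PLAYERS)]

# Hypothetical coach: ice time rises with year-1 results, plus unrelated noise.
toi_y1 = [12.0 + 4.0 * (m - 1.0) + rng.gauss(0, 1.0) for m in year1]

r_same_year = pearson_r(toi_y1, year1)  # inflated by rewarded luck
r_next_year = pearson_r(toi_y1, year2)  # near zero: the luck did not persist
print(f"TOI vs year-1 multiplier: r = {r_same_year:.2f}")
print(f"TOI vs year-2 multiplier: r = {r_next_year:.2f}")
```

In this toy world there is no real talent signal at all, so any TOI/G relationship in the same season is pure rewarded luck.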

What we should do instead is plot TOI/G in year 1 vs. Shot Multiplier in year 2, because year 2 performance is a better estimate of the player's talent. In year 2, the players who got "lucky" and were given more ice time will on average regress toward the mean, so the bias won't be there anymore (remember, this also works the other way for players who got unlucky and were given less ice time). So let's look at it:
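A minimal sketch of that fit: pair each forward's year-1 TOI/G with his year-2 multiplier and run an ordinary least-squares line through the points. The data points are hypothetical, picked so the line crosses 1.0 at 13 minutes to echo the text; they are not the real fitted line, whose coefficients aren't reproduced here. `toi_mean` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical forwards: year-1 TOI/G paired with the SAME player's
# year-2 shot multiplier (not year 1), which sidesteps the hot-hand bias.
toi_per_game_y1 = [10.0, 12.0, 14.0, 16.0]
multiplier_y2 = [0.85, 0.95, 1.05, 1.15]

intercept, slope = fit_line(toi_per_game_y1, multiplier_y2)


def toi_mean(toi_per_game: float) -> float:
    """Mean shot multiplier to regress toward, given a player's TOI/G."""
    return intercept + slope * toi_per_game


print(f"mean at 13 min: {toi_mean(13.0):.2f}, at 18 min: {toi_mean(18.0):.2f}")
```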

This is just for forwards. I'll skip posting the graph for defensemen, as it's just a flat line, which is in line with what we know so far. The graph for forwards, on the other hand, shows a clear upward slope. Players who get about 13+ minutes would be expected to regress to a higher mean, and those below 13 minutes to a lower one. For any TOI/G, all you have to do is plug it into the fitted equation to get the expected mean.

There are a few steps here, so let's look at a working example with two players at the extremes: Steven Stamkos and Tanner Glass over the past three years.

Player	G60	ixG60	Shot Multiplier
Steven Stamkos	1.23	.86	1.46
Tanner Glass	.23	.40	.60

So Stamkos scores about one more goal per 60 than Glass, while his expected goals (ixG60) come in about .4 per 60 below what he actually scores, so his multiplier is pretty big. Glass, on the other hand, is projected to score a little under double the goals he actually did, resulting in a small multiplier. Let's now regress them.
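The arithmetic behind that paragraph, as a quick sketch using the per-60 figures from the table. Dividing the rounded per-60 numbers gives roughly 1.43 for Stamkos and about 0.575 for Glass rather than the table's 1.46 and .60; I'm assuming the table was computed from unrounded totals, which the post doesn't state.

```python
# Per-60 figures from the table above (5v5, three seasons).
players = {
    "Steven Stamkos": {"g60": 1.23, "ixg60": 0.86},
    "Tanner Glass": {"g60": 0.23, "ixg60": 0.40},
}

for name, p in players.items():
    gap = p["g60"] - p["ixg60"]  # goals above (+) or below (-) expected, per 60
    multiplier = p["g60"] / p["ixg60"]  # shot multiplier = G / xG
    print(f"{name}: gap {gap:+.2f} per 60, multiplier {multiplier:.2f}")
```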

Player	TOI-Mean	Regressed Multiplier
Steven Stamkos	1.05	1.21
Tanner Glass	.86	.75

The first column is the mean multiplier we would expect based on each player's TOI, and is therefore the mean he should regress to. The second column regresses his multiplier toward that mean using the numbers from the first table (this is three years of data, and r = .39 for forwards, so they are regressed 61% to the mean). Let's look at the final result:
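A minimal sketch of that regression step: keep a fraction r of the gap between the observed multiplier and the TOI-based mean. `regress_to_mean` is an illustrative name, not from the original analysis. With the table's inputs it gives 1.21 for Stamkos, matching, and about .76 for Glass versus the table's .75; I'd guess that small difference comes from rounding in the published inputs.

```python
def regress_to_mean(observed: float, mean: float, r: float) -> float:
    """Shrink an observed value toward a mean.

    Keeps a fraction r of the deviation and discards the rest (1 - r),
    so r = 0.39 regresses 61% of the way to the mean.
    """
    return mean + r * (observed - mean)


R_THREE_YEAR_FORWARDS = 0.39  # 3 vs. 3 forwards row of the first table

stamkos = regress_to_mean(observed=1.46, mean=1.05, r=R_THREE_YEAR_FORWARDS)
glass = regress_to_mean(observed=0.60, mean=0.86, r=R_THREE_YEAR_FORWARDS)
print(f"Stamkos {stamkos:.2f}, Glass {glass:.2f}")
```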

Player	Projected G60 (Regressed Multiplier x ixG60)
Steven Stamkos	1.04
Tanner Glass	.30

So the last step is just to multiply expected goals by the multiplier. Remember, the multiplier is how much we would expect a player to exceed or fall short of his expected goals. So we would expect Stamkos's G60 to be higher than his ixG60 by a factor of 1.21, and Glass's to be 75% of his ixG60. I'd like to add that this uses only three years of data; if we used each player's career data and regressed that, we would get a better estimate of his shot multiplier. Either way, relative to their ixG60, Stamkos goes up .18 goals per 60 and Glass goes down .1. These are meaningful differences. These two players are at the extremes, so they show some of the biggest differences, but the adjustment is still worth looking at for everyone else.
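The final step as a short sketch: projected G60 is the regressed multiplier times ixG60, and the change is measured against ixG60. The inputs are the regressed multipliers and ixG60 values from the tables above; `projected_g60` is an illustrative name.

```python
def projected_g60(regressed_multiplier: float, ixg60: float) -> float:
    """Projected goals per 60 = regressed shot multiplier x expected goals per 60."""
    return regressed_multiplier * ixg60


for name, mult, ixg60 in [("Steven Stamkos", 1.21, 0.86), ("Tanner Glass", 0.75, 0.40)]:
    proj = projected_g60(mult, ixg60)
    print(f"{name}: projected G60 {proj:.2f} ({proj - ixg60:+.2f} vs. ixG60)")
```

This reproduces the 1.04 and .30 in the table, and the +.18 and -.10 changes quoted in the text.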

Conclusion

As expected goals measures become more popular, it's important to remember to account for shooting talent. Sh% reflects both shot quality (location and the like) and actual shooting talent. I would say the best way to measure talent at present is to compare actual results to expected goals, regress, and use coaches' judgement to set the mean. It's important to be careful with this; while it won't capture everything, and for a lot of players won't mean much, it's a better way of getting the full picture.

***All Data Courtesy of Corsica.Hockey

Wednesday, April 13, 2016
