Fooled By Grittiness: April 2016

Just over five years ago, Eric Tulsky, who now works for the Carolina Hurricanes, published this article on assists. As you can see, he advocates dropping the secondary assist because of how much randomness it contains (to be fair, I don't know his stance since publishing that). It's an older article, so I figured I might as well update the numbers and the analysis and see what we find.

I'm going to use numbers from the past six seasons (2010-2016). I'll do a simple year-over-year analysis like Eric did, but split between forwards and defensemen (I tried splitting forwards into centers and wingers, but there was virtually no difference). To qualify for the analysis, a player needed at least 400 minutes in both years.

Repeatability

n	Position	A160	A260	A60
1402	Forwards	0.36	0.19	0.43
836	Defensemen	0.20	0.03	0.24

Here are the year-over-year correlation coefficients for each metric (A160 is first assists per 60, A260 is secondary assists per 60, and A60 is total assists per 60). I would say this all makes sense. First assists are more repeatable than secondary assists for both positions, and more so for forwards than for defensemen, for whom secondary assists are virtually random. This is reflected in the total assists numbers: for defensemen the total is only slightly more repeatable than first assists alone, while for forwards we see a little bit more there.
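For anyone who wants to reproduce this kind of number, here is a minimal sketch of a year-over-year correlation with the 400-minute cutoff. The field names (`toi`, `a1_60`) and the sample rows are my own placeholders, not Corsica's column names or real data.

```python
from math import sqrt

MIN_TOI = 400  # minutes required in BOTH seasons, per the cutoff described above


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_over_year_r(pairs, metric):
    """pairs: list of (season_1, season_2) dicts for the same player."""
    kept = [(a, b) for a, b in pairs
            if a["toi"] >= MIN_TOI and b["toi"] >= MIN_TOI]
    return pearson_r([a[metric] for a, _ in kept],
                     [b[metric] for _, b in kept])


# Made-up player-season pairs, purely illustrative (not real data).
pairs = [
    ({"toi": 900, "a1_60": 0.9}, {"toi": 950, "a1_60": 0.8}),
    ({"toi": 700, "a1_60": 0.5}, {"toi": 650, "a1_60": 0.6}),
    ({"toi": 1100, "a1_60": 1.1}, {"toi": 1000, "a1_60": 0.7}),
    ({"toi": 300, "a1_60": 2.0}, {"toi": 800, "a1_60": 0.1}),  # dropped: under 400 minutes
]
r = year_over_year_r(pairs, "a1_60")
```

Running the same function separately on forwards and defensemen, and on each assist metric, would give one row of the table per position.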

Of course this isn't it. Let's split up between players who stayed on the same team versus those who switched teams. As Eric T. noted, teammates play a role in assists, so numbers on players who change teams might be closer to the truth. So let's see:

n	Repeatability	A160	A260	A60
1020	Forwards-same	0.36	0.21	0.44
382	Forwards-Diff.	0.32	0.10	0.35
	Difference	-0.04	-0.11	-0.09
604	Defensemen-same	0.24	0.12	0.26
232	Defensemen-Diff.	0.06	0.02	0.11
	Difference	-0.18	-0.10	-0.15

You can see that in every category, players who stayed on the same team showed better persistence. For forwards, first assists go down only slightly. You really see it in secondary assists, which take a bit of a tumble, and as a result total assists grade out worse for forwards who change teams. Defensemen, on the other hand, lose a lot across the board. Secondary assists are virtually random for defensemen who change teams, and, surprisingly, first assists take a big hit too. I really didn't expect that much of a difference.
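The "Difference" rows are just the switched-team correlation minus the same-team correlation. A quick sketch that recomputes them from the table (the dictionary layout and the short keys `A1`, `A2`, `A` are my own shorthand):

```python
# Year-over-year r values copied from the table above.
repeatability = {
    "Forwards": {
        "same": {"A1": 0.36, "A2": 0.21, "A": 0.44},
        "diff": {"A1": 0.32, "A2": 0.10, "A": 0.35},
    },
    "Defensemen": {
        "same": {"A1": 0.24, "A2": 0.12, "A": 0.26},
        "diff": {"A1": 0.06, "A2": 0.02, "A": 0.11},
    },
}


def team_change_drop(position):
    """Switched-team r minus same-team r for each assist metric."""
    rows = repeatability[position]
    return {metric: round(rows["diff"][metric] - rows["same"][metric], 2)
            for metric in rows["same"]}


forwards_drop = team_change_drop("Forwards")
defensemen_drop = team_change_drop("Defensemen")
```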

One final thing, though: let's look at what better predicts next year's A60, first assists or total assists.

n	Predictivity	A160	A60
1020	Forwards-same	0.40	0.44
382	Forwards-Diff.	0.33	0.35
1402	All Forwards	0.39	0.43

604	Defensemen-same	0.27	0.26
232	Defensemen-Diff.	0.08	0.11
836	All Defensemen	0.23	0.24

As you would imagine, total assists edge out first assists in every case except, oddly, defensemen who stay on the same team. I won't put much thought into that; the gap is only 0.01. The edge in each case is also really small. I would imagine that using only one year's worth of data contributes to this. After a few years of playing time, we can make a better estimate of a player's secondary assist numbers.

Conclusion

The numbers here are, overall, close to the ones shown by Eric T. five years back. I think the best bet when looking at assists is to focus primarily on primary assists. But secondary assists still matter a little. One year of total assists tells us more than one year of primary assists alone. Not by much, but there is something. And given a few years, we'll get a better read on a player's secondary assist "talent". Just looking at the leaderboards for secondary assists over the past few years will tell you that it means something. Secondary assists may contain a lot of randomness, and the gain over primary assists is minimal, but they shouldn't just be discarded as noise (just mostly noise).

**All data courtesy of Corsica.hockey

***Update:

This is a good article on secondary assists: http://fivethirtyeight.com/features/some-nhl-stars-get-more-assists-at-home-than-they-deserve/. The next step would probably be adjusting for rink bias. Also, this

@tangotiger SD of z-scores is 1.02, histogram suggests moderate skew/shift right pic.twitter.com/E2bf1jQYCH
— Michael Lopez (@StatsbyLopez) April 22, 2016

is a good chart showing how small the spread of talent is in secondary assists, although it would have been better to do separate charts for forwards and defensemen.

I guess the first question here is, what do I mean when I say shooting talent? That's a fair question. To start off, let's think about shot quality. Shot quality is the probability of a shot going in. It is made up of a lot of different things: the distance of the shot from the goal, the angle, the type of shot, whether it's a rebound, whether it's a rush shot, the score, etc. Another, sometimes overlooked, component is the actual shooting talent of the shooter in question. In hockey parlance, "How good is his shot?" But how do we measure this? (Dtmaboutheart tried to take this into account in his expected goals model here by regressing sh%, but I disagree with that method.)

I think the best way to start answering this is to think about what makes up a player's shooting percentage, rather than looking at sh% directly, because raw sh% isn't exactly shooting talent. A player could have a high sh% because of the quality of shots he takes or because of his actual shooting ability. I think it's roughly like this:

Observed Sh% = Shot Quality* + Shooter Talent + Randomness

*Where shot quality is made up of the factors I listed in the first paragraph

So what we need to do is estimate the shot quality for each player and regress for randomness; what's left should be our best estimate of a player's shooting talent. The way to do this is, I think, simple. First we'll estimate the quality using the expected goals model developed by Emmanuel Perry over at Corsica.hockey. Then we just compute:

Goals / Expected Goals = Shot Multiplier
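In code, the multiplier is a one-line ratio. A minimal sketch follows; the function name, the zero-xG guard, and the sample numbers are my own assumptions for illustration.

```python
def shot_multiplier(goals, expected_goals):
    """Ratio of actual goals to expected goals.

    Returns None when expected goals is zero, since the ratio is undefined there.
    """
    if expected_goals == 0:
        return None
    return goals / expected_goals


# Illustrative numbers only: 30 goals on 24.0 expected goals.
example = shot_multiplier(30, 24.0)
```

A value above 1 means the player scored more than the model expected, and a value below 1 means he scored less.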

Let's think about this by comparing two players. Both have the same expected goals, but one scores more actual goals. Why is one player doing better than the other? Is it because of shooting talent or just randomness? If it's something real, we would expect this measure to show some repeatability. If a player consistently scores more goals than we would expect, something is probably going on. But how much? This is obviously a sensitive measurement, so we need to be careful about whether we are measuring anything real or just pure randomness. So we'll run a regression.

First we have to split up forwards and defensemen. Those are two different positions, and we should obviously expect forwards to have more talent here. I'm going to run three regressions for each group. The first is a regular year-over-year regression, limited to players who have 500+ TOI in both seasons. But one year isn't the best judge for this. There is bound to be a good deal of randomness, and it'll probably take a few years to get a better read. So my other two regressions will be 2 years vs. 2 years and 3 years vs. 3 years, with cutoffs of 1000 and 1500 minutes respectively. These numbers are arbitrary and one can play around with them a little, but they still leave us with a fine sample size for the analysis. There are obviously better ways to conduct this analysis, but this is a quick and easy one. So here you go (the numbers are 5v5 data from 2007-2008 through 2015-2016).

n	TOI	Forwards (r)**	Defensemen (r)**
1974/1213	500+	.17	.04
1333/784	1000+	.30	.10
782/452	1500+	.39	.13

** r being the correlation coefficient

So let's see what we've got here. First off, the "n" column on the left is the sample size for that particular regression, listed as forwards then defensemen. What we see is simple and intuitive. There is a good deal of randomness in the data, and we see more of a signal for forwards than we do for defensemen. For defensemen, there doesn't seem to be much there. Even after three years it still regresses 87% to the mean. That doesn't mean it doesn't matter at all, but it doesn't seem to make much of a difference (especially when you take into account the number of goals you would expect these guys to score anyway).
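The 87% figure comes from treating 1 minus the correlation as the share to regress toward the mean, which is the usual shortcut. A small sketch applying that to every row of the table (the dictionary layout is my own):

```python
# Correlations copied from the table above: (forwards r, defensemen r) per TOI cutoff.
r_by_cutoff = {
    "500+": (0.17, 0.04),
    "1000+": (0.30, 0.10),
    "1500+": (0.39, 0.13),
}


def pct_regressed(r):
    """Percent of the way to regress toward the mean, using 1 - r."""
    return round((1 - r) * 100)


regression_pct = {
    cutoff: {"forwards": pct_regressed(f), "defensemen": pct_regressed(d)}
    for cutoff, (f, d) in r_by_cutoff.items()
}
```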

Forwards are obviously a different story. As you can see, we do get a signal through. There is still a fair amount of noise (especially for one year), but all things considered, it's not bad. We have to be careful with our data and regress it. Given a few years, we can start to make an estimate of a player's shot multiplier.

That's not it, though. There's still one question: what do we regress to? As first shown in this article by Eric T., we can use coaches' decisions to find the right mean to regress to, because players who average more TOI per game tend to shoot at a higher percentage. Makes sense. Coaches know how to evaluate talent, and will therefore give more ice time to players who are more skilled. But we can't just plot TOI/G vs. Sh% (as he notes). I would say most of the players given more ice time will be the more talented ones, but there's also the problem of coaches riding the hot hand. For example, imagine a player with average shooting talent who goes off (like many do) for a portion of one year and starts scoring a ton of goals. It's possible that the coach will then bump him up a line and play him more. So getting lucky and exceeding one's talent will be reflected in TOI/G, because some of the players with high TOI/G will be guys who experienced a decent portion of luck and are now getting "rewarded" with more ice time. The relationship will therefore look more extreme than it actually is.

What we should do is plot TOI/G in year 1 vs. Shot Multiplier in year 2. This is because year 2 performance is a better estimate of the player's talent. In year 2, those players who got "lucky" and were given more ice time will on average regress towards the mean, so the bias won't be there anymore (remember, this also works the other way with players who got unlucky and were given less ice time). So let's look at it:

This is just for forwards. I'll spare posting the graph for defensemen, as it's just a flat line. That's in line with what we know so far. The graph for forwards shows a clear upward slope. Players who get about 13+ minutes would be expected to regress to a higher mean, and those below 13 minutes to a lower one. For any TOI/G, all you have to do is plug it into the equation to figure out what we would expect.
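The fitted equation from the graph isn't reproduced in the text, so here is only a sketch of how such a line could be fit and then used: an ordinary least-squares fit of year-2 shot multiplier on year-1 TOI/G. All the data points below are invented for illustration, and the resulting slope and intercept are not the ones from the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented forwards: TOI per game in year 1, shot multiplier in year 2.
toi_per_game_year1 = [8.0, 10.0, 12.0, 14.0, 16.0]
multiplier_year2 = [0.85, 0.92, 0.98, 1.03, 1.10]

slope, intercept = fit_line(toi_per_game_year1, multiplier_year2)


def mean_to_regress_to(toi_per_game):
    """Expected shot multiplier for a forward with this much ice time."""
    return intercept + slope * toi_per_game
```

Using year-1 ice time against year-2 results is what keeps the hot-hand bias out of the fit.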

OK, there's a lot here, so let's look at a working example. Let's compare two players at the extremes: Steven Stamkos and Tanner Glass over the past three years.

Player	G60	ixG60	Shot Multiplier
Steven Stamkos	1.23	.86	1.46
Tanner Glass	.23	.40	.60

So Stamkos scores about a goal more per 60 than Glass, but his expected goals come in about .4 goals per 60 below what he actually scores, so his multiplier is pretty big. Glass, on the other hand, is projected to score a little under double the number of goals he actually did, resulting in a small multiplier. Let's now regress them.

Player	TOI-Mean	Regressed Multiplier
Steven Stamkos	1.05	1.21
Tanner Glass	.86	.75

The first column is the mean multiplier we would expect based on the player's TOI, which is what he should regress to. The second column regresses his multiplier towards that mean using the numbers discussed in the first chart (it's three years of data, so forwards are regressed 61% to the mean). Let's look at the final result.
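The regression step can be written out as mean + r x (observed - mean), with r = .39 for three years of forward data. A minimal sketch using the rounded numbers from the tables (the function name is my own):

```python
R_FORWARDS_3YR = 0.39  # 3-year vs. 3-year correlation for forwards, i.e. 61% regression


def regress_to_mean(observed, mean, r):
    """Keep a share r of the gap between the observed value and the mean."""
    return mean + r * (observed - mean)


# Observed multiplier and TOI-based mean, as rounded in the tables above.
stamkos = regress_to_mean(1.46, 1.05, R_FORWARDS_3YR)
glass = regress_to_mean(0.60, 0.86, R_FORWARDS_3YR)
```

With these rounded inputs Stamkos comes out at about 1.21 and Glass at about .76; the .75 in the table presumably comes from using the unrounded multiplier.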

Player	Regressed Multiplier x ixG60
Steven Stamkos	1.04
Tanner Glass	.30

So the last step is just to multiply expected goals by the multiplier. Remember, the multiplier is how much we would expect a player to exceed or fall short of his expected goals. So we would expect Stamkos's G60 to be higher than his expected G60 by a factor of 1.21, and Glass's to be 75% of his expected G60. I'd like to add that this is based on only three years of data. If we used players' career data and regressed that, we would get a better estimate of their shot multipliers. Either way, Stamkos goes up .18 goals per 60 from his expected G60 and Glass goes down .1. These are meaningful differences. These two players are at the extremes, so they show some of the biggest differences, but it's still worth looking at.
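Putting the last step into code, as a sketch with the rounded values from the tables (the function and variable names are my own):

```python
def projected_g60(ixg60, regressed_multiplier):
    """Expected goals per 60 scaled by the regressed shot multiplier."""
    return ixg60 * regressed_multiplier


stamkos_proj = projected_g60(0.86, 1.21)
glass_proj = projected_g60(0.40, 0.75)

# Change relative to the unadjusted expected goals per 60.
stamkos_change = stamkos_proj - 0.86
glass_change = glass_proj - 0.40
```

Rounded to two decimals, this gives 1.04 for Stamkos (up .18) and .30 for Glass (down .10), matching the figures above.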

Conclusion

As expected goal measures become more popular, it's important to remember "shot talent". Sh% includes both location measures and actual shooting talent. I would say the best way to measure that talent at present is to compare expected goals to results, regress, and use coaches' judgement to pick the mean. It's important to be careful with this, and while it won't capture everything and for a lot of players won't mean much, it's a better way of getting the full picture.

***All Data Courtesy of Corsica.Hockey

Friday, April 15, 2016

To Secondary Assist or not to Secondary Assist

Wednesday, April 13, 2016

Estimating Shooting Talent