## Tuesday, April 29, 2014

### When does Small Sample Size stabilize - Hitters K%

As Aaron has touched on already, early in the season it's easy to get wrapped up in a poor start and think the sky is falling and forget that there's still a lot of season left for things to turn around. So, when do we reach the point where what we see is what we get? Is it a third of the way through the season, as we've heard the broadcast team say so many times already this season? Is it a month? Do we wait until the All-Star break? Well, at the risk of sounding like a cop-out, it depends. Thankfully, people much smarter than us have done the research and found the following stabilization points for various rate stats. You can't argue with science, right?

Baseball Prospectus has an excellent post that gets into the messy, mathy stuff, if you're so inclined. If you don't have a B-P subscription (what's wrong with you?), FanGraphs has a more concise post on the same research. Here's the gist, lifted directly from FanGraphs:

Hitters:

• 60 PA: Strikeout rate
• 120 PA: Walk rate
• 240 PA: HBP rate
• 290 PA: Single rate
• 1610 PA: XBH rate
• 170 PA: HR rate
• 910 AB: AVG
• 460 PA: OBP
• 320 AB: SLG
• 160 AB: ISO
• 80 BIP: GB rate
• 80 BIP: FB rate
• 600 BIP: LD rate
• 50 FBs: HR per FB
• 820 BIP: BABIP

Pitchers:

• 70 BF: Strikeout rate
• 170 BF: Walk rate
• 640 BF: HBP rate
• 670 BF: Single rate
• 1450 BF: XBH rate
• 1320 BF: HR rate
• 630 BF: AVG
• 540 BF: OBP
• 550 AB: SLG
• 630 AB: ISO
• 70 BIP: GB rate
• 70 BIP: FB rate
• 650 BIP: LD rate
• 400 FB: HR per FB
• 2000 BIP: BABIP

Generally speaking, until a player reaches the stabilization point for a given rate stat, we can expect him to "regress" toward his career average going forward. With those numbers in mind, I'm starting a series that will look at these stats as players reach the thresholds listed above to see what, if any, meaningful numbers pop out.
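
For the mathy-inclined, here's a minimal sketch of what "regressing" toward a baseline can look like in code. It assumes a simple weighted blend in which the stabilization point is treated as the sample size where the observed rate and the baseline get equal weight. That's a common simplification, not necessarily the definition used in the research cited above, and the function name and the sample numbers are made up for illustration.

```python
def regress_rate(observed_rate, sample_size, baseline_rate, stabilization_point):
    """Blend an observed rate with a baseline rate (e.g. a career average).

    ASSUMPTION: the stabilization point is treated as the sample size at
    which the observed rate and the baseline get equal weight. This is a
    simplification for illustration only.
    """
    weight = sample_size / (sample_size + stabilization_point)
    return weight * observed_rate + (1 - weight) * baseline_rate


# Hypothetical hitter: 30% K rate over 60 PA, 20% career K rate,
# with the 60 PA strikeout-rate threshold from the list above.
estimate = regress_rate(0.30, 60, 0.20, 60)
print(f"Estimated K rate going forward: {estimate:.1%}")
```

With the sample size equal to the threshold, the estimate lands halfway between the observed rate and the baseline. As the sample grows, the observed rate gets more of the weight.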

Today we'll look at K% for hitters, as seven Astros have reached 60 PA and another four are between 50 and 60.
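
If you want to check which stats are fair game for a given hitter, a quick lookup does the job. This is just a sketch: it uses the PA-based hitter thresholds from the FanGraphs list above, and the dictionary and function names are my own invention.

```python
# PA-based hitter thresholds, copied from the list above.
HITTER_PA_THRESHOLDS = {
    "Strikeout rate": 60,
    "Walk rate": 120,
    "HR rate": 170,
    "HBP rate": 240,
    "Single rate": 290,
    "OBP": 460,
    "XBH rate": 1610,
}


def stabilized_stats(plate_appearances):
    """Return the stats whose threshold this many PA has reached."""
    return [
        stat
        for stat, threshold in HITTER_PA_THRESHOLDS.items()
        if plate_appearances >= threshold
    ]


print(stabilized_stats(60))   # ['Strikeout rate']
print(stabilized_stats(125))  # ['Strikeout rate', 'Walk rate']
```

At 60 PA, strikeout rate is the only one on the board, which is why we're starting there.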

| Player | 2013 K% | 2014 K% | Difference | Career | MLB Average During Career |
|---|---|---|---|---|---|
| Altuve | 12.7% | 7.8% | -4.9% | 11.9% | 19.5% |
| Dominguez | 16.3% | 20.2% | 3.9% | 16.6% | 19.5% |
| Fowler | 21.3% | 22.3% | 1.0% | 22.3% | 18.8% |
| Carter | 36.2% | 35.9% | -0.3% | 34.8% | 19.3% |
| Castro | 26.5% | 27.8% | 1.3% | 23.5% | 19.3% |
| Villar | 29.5% | 28.2% | -1.3% | 29.1% | 20.0% |
| Presley | 20.0% | 19.1% | -0.9% | 19.3% | 19.3% |
| Krauss | 30.8% | 30.5% | -0.3% | 30.7% | 20.0% |
| Grossman | 24.3% | 30.9% | 6.6% | 25.4% | 20.0% |
| Springer | 27.3% | 28.3% | 1.0% | 28.3% | 20.8% |
| Guzman | 24.8% | 33.3% | 8.5% | 21.7% | 19.0% |

I threw career and MLB average rates in there for context. Also, since he has no history at the major league level, Springer's 2013 number is from the minors. As we can see, most everyone is within about one percentage point of last season and pretty close to their career average, with a few exceptions.
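
To make "a few exceptions" concrete, here's a small sketch that recomputes the Difference column from the 2013 and 2014 rates in the table and flags the big movers. The 3-point cutoff is an arbitrary number I picked for illustration, not anything from the research, and the names are hypothetical.

```python
# (2013 K%, 2014 K%) for each hitter, copied from the table above.
K_RATES = {
    "Altuve": (12.7, 7.8),
    "Dominguez": (16.3, 20.2),
    "Fowler": (21.3, 22.3),
    "Carter": (36.2, 35.9),
    "Castro": (26.5, 27.8),
    "Villar": (29.5, 28.2),
    "Presley": (20.0, 19.1),
    "Krauss": (30.8, 30.5),
    "Grossman": (24.3, 30.9),
    "Springer": (27.3, 28.3),
    "Guzman": (24.8, 33.3),
}

# ASSUMPTION: 3 percentage points is an arbitrary cutoff for "notable".
NOTABLE_CHANGE = 3.0


def notable_changes(rates, cutoff):
    """Return {player: change in K%} for changes at or beyond the cutoff."""
    changes = {}
    for player, (last_season, this_season) in rates.items():
        diff = round(this_season - last_season, 1)
        if abs(diff) >= cutoff:
            changes[player] = diff
    return changes


print(list(notable_changes(K_RATES, NOTABLE_CHANGE)))
# ['Altuve', 'Dominguez', 'Grossman', 'Guzman']
```

Those four are the ones worth a closer look.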

Altuve stands out for very positive reasons. He's cut way back on swings on pitches outside the zone this season, which is good as he's seeing fewer first-pitch strikes. Getting ahead in the count early is forcing pitchers to throw more hittable pitches in the zone.

Dominguez is on the other side of the pendulum. He's swinging at more pitches outside the zone and fewer pitches inside the zone, and he's making less contact on all of them. He's seeing a high percentage of first-pitch strikes, getting behind early, and hasn't been able to make adjustments during the at-bat.

Grossman is particularly worrisome. After seeming to figure things out when he rejoined the club in July last season, he's gotten very passive. He's swinging at fewer pitches and making less contact when he does swing. It was often noted after his return last year that he was more aggressive at the plate; it would seem that his struggles once again relate to being too passive.

Which leaves us with Guzman. Guzman's K% has gone up every season he's been in the big leagues. This season his contact rate is actually up, but he's swinging at far fewer pitches, particularly in the zone. Opposing pitchers have already clued in to this, and they're pumping in 71% strikes on the first pitch, putting Guzman behind from the get-go.

Anonymous said...

Sometimes, stats are just descriptions of more basic phenomena, and more insight is gained by not using stats. For example, if a batter utilizes a new approach, gains a new skill or refines an existing skill, stat watchers will miss the new phenomenon as they focus on the expected "regression" to the old player.

The Batguy said...

I don't disagree, but you still need to reach a decent number of plate appearances before you can have a real idea of the effects of that change.

That's also why I went a little deeper on the four players I highlighted to see if anything changed in their underlying plate discipline numbers. Grossman and Guzman, in particular, are taking more pitches but their lack of aggression is putting them in difficult counts where the pitcher gains the advantage.

Anonymous said...

True. To me, your statement using stats on Grossman and Guzman is an example where stats provide interesting insights or points of discussion, both because it is insightful by nature and because it doesn't involve the mystical regression concept.

The Batguy said...

I don't think regression needs to be mystical. At its core, the concept simply means that all players have an inherent talent level that, given enough opportunities, they will reach. Our friends over at The Crawfish Boxes have a good post on that, which just went up today.

But, like you said, if a player has made some significant change in their approach, that could result in a shift in talent level, in which case regression could be a little fuzzy.

(Not Hank) Aaron said...

Jose Bautista is a good example. Between 2009 and 2010, Bautista changed his approach and his batted ball profile changed significantly. He started hitting more balls in the air, and more of those were going for home runs. At the beginning of the season, you could chalk those up to SSS and expect regression. That's why understanding when stats stabilize is so important. Because those results continued after the rates stabilized, you can reasonably take them as a change in true talent level, which he has confirmed over the last few years.

Anonymous said...

Oh, I expect lots of stats people expected Bautista to regress, even after the period of stat stabilization. However, people with knowledge who watched him knew that the Bautista they were observing had nothing in common with the historical statistical record.

And, to the Batguy: I agree that that is a higher understanding of regression. What I might quibble with, in some instances, is an assumption of "inherent ability." I also think you are being charitable to many who use the concept of regression. Many people use regression as an almost mythical force drawing phenomena to an asserted norm. It's not. It is nothing more than someone's expectation of what is probable.

The Batguy said...

I think both statheads and scouts would agree with some sort of inherent ability. Statheads point to, well, stats. Scoutheads (new term!) will say things like "projectable" or compare a player's style to that of another, well-established player. Both are giving their idea of how a player should be expected to perform.

As to how others use regression, I have no control over that. For myself, I don't think of it as mythical or mysterious. It's simply another tool to use to analyze a player's production; using past performance to help get a reasonable expectation of future performance.

Really, the concept of regression isn't new. How many times, before statistical analysis got so big, did we hear that someone's "due" or "hot"? Regression's just another way to say a guy probably won't keep this up, good or bad, for long.