Over the summer, I've been taking an incredibly informative online course called Sabermetrics 101. It was put together by a Boston University professor, and I'd recommend it to anyone looking to get an in-depth peek at the world of Sabermetrics. Even if you don't want to commit that kind of time, the course is organized into different tracks, or aspects of Sabermetric knowledge, so you can pick and choose what you want to learn. For me, the most valuable parts of the course were the comp sci lessons, which focus on making database queries in MySQL, and doing data analysis/creating graphics in RStudio. What follows is ostensibly informative and hopefully interesting, but is really just me working through the R language.
The basic premise of what I'm trying to do is to look at the basic events of the game and see how the per-game rates shift over time, in order to see how the game has changed, and possibly gain some understanding of what watching a game was like in past eras. All stats are per game, even if they don't say they're per game, as I just noticed on my first graph. Whoops. The stats were gathered by averaging per-game team totals for the entire league each year. Statistics are from the Lahman Database, accessed through the BU SQL Sandbox.
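For the curious, here's a rough sketch of what that averaging step might look like. It's in Python rather than the SQL and R from the course, so treat it as an illustration and not the actual query I ran. The field names (`yearID`, `teamID`, `G`, `SO`) mirror the Lahman Teams table, but the rows, the team IDs, and the `league_rate_per_game` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical team-season rows (made-up numbers, NOT real Lahman data).
# Keys mirror Lahman "Teams" columns: yearID, teamID, G (games), SO (strikeouts).
rows = [
    {"yearID": 1968, "teamID": "AAA", "G": 162, "SO": 972},
    {"yearID": 1968, "teamID": "BBB", "G": 160, "SO": 800},
    {"yearID": 1969, "teamID": "AAA", "G": 162, "SO": 810},
    {"yearID": 1969, "teamID": "BBB", "G": 162, "SO": 891},
]


def league_rate_per_game(rows, stat):
    """Average each team's per-game rate for `stat` across the league, by year."""
    by_year = defaultdict(list)
    for row in rows:
        # One per-game rate per team-season.
        by_year[row["yearID"]].append(row[stat] / row["G"])
    # League figure for a year = plain mean of that year's team rates.
    return {year: sum(rates) / len(rates) for year, rates in sorted(by_year.items())}


so_per_game = league_rate_per_game(rows, "SO")
for year, rate in so_per_game.items():
    print(year, round(rate, 2))
```

Averaging team rates and dividing league totals by league games give slightly different answers when teams play different numbers of games. The sketch assumes the first reading.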
For my first graph I went basic and obvious. Everyone knows strikeout rates have been rising for a long time, and it's pretty damn obvious in this graph. The data is spotty, unstable, and partially non-existent up until the '20s, but the general increasing trend of K's per game begins there. The most interesting thing to be gleaned here is the hump that peaks around 1968, the Year of the Pitcher, and how the general increasing trend was reversed for a time, famously by the changes to the mound and strike zone after the '68 season, but just as significantly by the leagues' addition of four teams in '69.
From a fan's perspective, it's interesting to note that the current rate of K's per game is nearly double the prevailing rate from the 1920s to the 1940s. This, obviously, is due in part to the increased emphasis on avoiding contact in pitching strategy, the improving talent of pitchers, and the expansion of bullpens, but it's interesting to wonder how much of the increase in K's has to do with the de-emphasis within baseball culture of how 'bad' strikeouts are, and the acceptance of trading more K's for more power.
Moving on, I got real tricky with my R skillz by graphing homers per game, and changing the color of the line to green. I also changed the thickness of the line. Yessir, if you need the thickness of the line in your graph changed, I'm qualified. Anyway, I got another line that showed a trend that I expected. As they should, HRs per game jump at the start of the '20s with the emergence of Babe Ruth, the 'live' ball, and the game's new emphasis on power. The two things I found interesting were the huge dip in dingers during the war years, which I will discuss further with my final graph, and the volatility of HRs per game during the early PED years.
The huge peak in the late '80s is 1987, in which McGwire hit 49, as did Andre Dawson. George Bell and Dale Murphy also hit over 40. What's interesting is how early this outlier season comes, a full half-decade or more before the beginning of the Steroid Era, depending on who you ask. In this season, as well as '86, Steroid Era numbers are replicated in an environment in which steroids were only just entering the scene. Even Jose Canseco, who is widely known to have disseminated steroids throughout baseball, and supplied drugs to McGwire himself, acknowledges that Mark's first campaign was done clean.
So, assuming that 1987 was a fairly clean year, this season shows that a pre-steroid league was capable, in that run environment, of creating Steroid Era home run numbers. Even more interesting is how the home run rate dipped back down for a few years, even as steroids became more prevalent in baseball. Now I know that home runs aren't a comprehensive measure of offensive production, and I know I should research this a lot more in depth but... meh. I'll blindly state my claim. What am I getting at? Given that a steroid-free league replicated Steroid Era home run rates, and that those rates began consistently in 1994, the year after the league expanded, I'd argue that Steroid Era home run rates were in large part caused by the depletion of pitching talent due to expansion, in a league whose top (clean) hitters were already capable of pushing the rate to around a homer per game.
Obviously, there were some insane, 'roid-fueled outlier performances. Barry Bonds could have hit a Wiffle ball out of Pac Bell Park in 2001. Yet the continuation of around a homer-per-game rate in the post-PED era and the HR rates in the '86 and '87 seasons further support my point that expansion, a new generation of power-first hitters, and smaller ballparks were the root cause of the general upward shift in home run rates. Man, I should write more about this, and get some evidence and stuff.
Complete Games = Red Line, Home Runs = Blue Line
If I have to change the font back to Trebuchet one more time, Blogger, I swear to god, I'll turn this car around. Anyway, two lines. TWO LINES! And more! Two vertical ones as well! The basic conception of this one was to graph something that was going to be on roughly the same scale as home runs per game. Complete games seemed like a good candidate. And an interesting question arose: when, as a fan, did it become more likely for you to go to a game and see a home run than a complete game? There are, in fact, many answers to this question, as the lines intersect multiple times in the '30s and '40s, but the first answer is 1929, and the last is 1946 (as shown by my expertly placed text). What is most interesting about this graph, however, is the incredibly steady, slow decline of the rate of complete games. Since complete games are one statistic influenced in large part by managers, this steady decline has some intriguing implications. The fact that there are no major peaks, valleys, or deviations on the CG line reflects how difficult it is for managers to change their strategies. If a manager deviates too far from the norm either way, he will feel pressure from the press, his owner, his team's fans, and his team. Keep his starters in a lot, and he's wearing down the rotation. Always pull his starters early, and he's not getting full value from them. This public perception is based entirely on precedent, not on what gives the team the best chance to win, but as the curve shows, the league as a whole has never deviated from the general trend of pulling starters gradually earlier as time advances.
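I found those crossover years by eyeballing the graph, but the same question can be answered in a few lines of code. Here's a minimal sketch in Python (not the R I used for the graphs), assuming two per-game series lined up by year. The `crossover_years` helper is hypothetical, the rates are invented, and the "years" are just placeholder indexes, so none of this reproduces the real 1929 and 1946 answers.

```python
def crossover_years(years, a, b):
    """Return the years where series `a` flips between above `b` and not above `b`."""
    crossings = []
    for i in range(1, len(years)):
        was_above = a[i - 1] > b[i - 1]
        is_above = a[i] > b[i]
        if was_above != is_above:
            # The two lines crossed somewhere between years[i - 1] and years[i].
            crossings.append(years[i])
    return crossings


# Made-up per-game rates for illustration only (NOT the real league numbers).
years = [1, 2, 3, 4, 5, 6, 7]  # placeholder season indexes, not calendar years
hr = [0.30, 0.35, 0.45, 0.40, 0.41, 0.50, 0.55]  # home runs per game
cg = [0.50, 0.48, 0.44, 0.43, 0.42, 0.41, 0.40]  # complete games per game

flips = crossover_years(years, hr, cg)
first, last = flips[0], flips[-1]
print(flips, first, last)
```

The first flip is the first season in which homers outnumber complete games, and the last flip is where the lines cross for good.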
Here I've got the different types of hits per game, in an effort to see the differences in watching a game in different eras. Starting on the left, I think the most interesting aspect is the huge jump in total hits, along with power numbers, from the Deadball to the Live Ball Era. This graph really shows how big of a change occurred between these two eras, as not only did homers emerge as a common occurrence, but the entire rate of hits per game increased by almost two, resulting in a hits-per-game rate never again seen in history. The second thing I found interesting was the disappearance of home runs during the War. Though the hits per game during the War Era is about the same as in the following Integration Era, the number of homers per game is much lower. This says something interesting about talent in the majors. During the war, the league lost many of its best players, depleting the pitching and hitting talent. Yet the number of hits per game is about the same as that in the Integration Era. Why? I would argue that the talent of both batters and pitchers got worse at about the same rate, resulting in similar hits per game. The worse war-time hitters, however, did not have the raw power tools of the elite hitters that were off in the war. This shows how different hitting skills were distributed, at least in the '40s, among the player population. The hit tool was distributed incrementally, with bad players having bad hit tools, mediocre players having mediocre hit tools, etc. This is what allowed hits per game to be sustained during the War Era. The power tool, however, was concentrated among a small group of the best players in MLB, and when these players went off to the war, the home runs dried up. For further proof, notice how the number of singles and doubles per game during the War Era is higher than in either the Live Ball or Integration Eras.
This demonstrates how the second-rate players during the war years were able to hit for average well against inferior pitching, but simply did not have the raw power ability to hit many home runs.
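A quick note on how a breakdown like this can be built: hit totals usually come as hits, doubles, triples, and homers, so singles have to be backed out by subtraction. Below is a small Python sketch of that arithmetic (again, not my actual R code). The `hit_types_per_game` helper is hypothetical and the season totals are invented round numbers.

```python
def hit_types_per_game(hits, doubles, triples, homers, games):
    """Split total hits into per-game rates by hit type."""
    # Singles are whatever is left once extra-base hits are removed.
    singles = hits - doubles - triples - homers
    return {
        "1B": singles / games,
        "2B": doubles / games,
        "3B": triples / games,
        "HR": homers / games,
        "H": hits / games,
    }


# Made-up team-season totals for illustration only (NOT real data).
rates = hit_types_per_game(hits=1400, doubles=250, triples=40, homers=110, games=160)
for hit_type, rate in rates.items():
    print(hit_type, round(rate, 3))
```

The four hit types add back up to total hits per game, which makes for an easy sanity check on a stacked graph.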
Thanks for reading. I hope my ramblings didn't distract from the interesting data on display. I didn't even get to everything I wanted, like the loss of a full single per game over time, but this is getting way too long, so I guess that's it for now.