Batting Stats 5: wOBA

Here it is: wOBA. I’ve been building up to this for a while and I’m excited to get going. For the full details on wOBA, check out FanGraphs’ page. The short version is that wOBA takes On Base Percentage and weights each of its components by the run expectancy of that event in each league year. It’s a very good measure of a player’s offensive contribution in a specific league setting. It does not normalize the data across years and ballparks; we will get to some stats that do that later on. But it does acknowledge that a double is worth more than a walk, and it quantifies how much more.

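The formula takes the wOBA weights that we calculated in Run Table 2, multiplies each one by the number of walks, hit-by-pitches, singles, doubles, etc. that a player accumulated during a season, and divides the total by the player’s at bats, walks (less intentional walks), sacrifice flies and hit-by-pitches: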

, round((r.wobaBB*b.bb + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
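To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. It follows the SQL expression above term by term, including using all walks in the numerator; the FanGraphs definition, as I understand it, weights only unintentional walks (BB minus IBB), so that could be one thing to check when numbers disagree. The class names, the `woba` function and the weight values are assumptions made up for illustration; the real weights come from Run Table 2 and change with every league year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Weights:
    """wOBA weights for one league year (hypothetical container)."""
    bb: float
    hbp: float
    single: float
    double: float
    triple: float
    hr: float


@dataclass
class BattingLine:
    """One player season, using the same column names as the SQL."""
    ab: int
    h: int
    d: int
    t: int
    hr: int
    bb: int
    ibb: int
    hp: int
    sf: int


def woba(w: Weights, b: BattingLine) -> Optional[float]:
    # Singles are not stored directly: hits minus extra-base hits.
    singles = b.h - b.d - b.t - b.hr
    numerator = (w.bb * b.bb + w.hbp * b.hp + w.single * singles
                 + w.double * b.d + w.triple * b.t + w.hr * b.hr)
    denominator = b.ab + b.bb - b.ibb + b.sf + b.hp
    if denominator == 0:
        # No qualifying plate appearances; SQL would fail or return NULL here.
        return None
    return round(numerator / denominator, 3)


# Placeholder weights and a made-up batting line, for illustration only.
example_weights = Weights(bb=0.69, hbp=0.72, single=0.89,
                          double=1.27, triple=1.62, hr=2.10)
example_line = BattingLine(ab=500, h=150, d=30, t=5, hr=20,
                           bb=50, ibb=5, hp=5, sf=5)
print(woba(example_weights, example_line))
```

The zero-denominator guard is an addition of the sketch; whether the SQL version needs one depends on whether the batting table contains rows with no plate appearances.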

I’ve already plugged this formula into the table and I’m getting results.  Before I start comparing my results to the game and pulling my hair out, I am going to establish success criteria.  I mentioned in the Run Environment posts that our formulae contain constants that I don’t fully understand.  Moreover, I don’t know if the game uses the same constants.  So, to expect a perfect match between my database and the game may be unreasonable.

Let’s think about what a reasonable margin of difference would be. Any difference of less than 5 points (.005) is negligible. I can’t really make a distinction between two players with wOBAs of .320 and .325. They’re essentially equal. Can I say the same thing about 20 points? No. There’s clearly a difference between .320 and .340. What about 10? Iffy. If I want to give myself as much wiggle room as possible without losing faith in my metrics, I can’t go higher than 10. Let’s set 10 as the absolute limit and see what we get. I am going to take a sample of 35 player years and see how close I get.

So, that’s 21 where the difference is less than 10 points, 7 where the difference is between 10 and 20 points, and 7 where the difference is greater than 20. Not perfect: 14 of 35, or 40%, fall outside the range I set for myself. However, that’s still better than I feared, and actually better than it looks. For example, Dan Wasielewski’s 2012 season is off by 20 points. But that 2012 season only had 6 plate appearances. I am totally OK having wacky numbers for very small sample sizes. The greatest difference (.029) similarly came from a season with barely over 100 plate appearances.
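The bucketing itself is easy to script. The sketch below is one way to do it in Python, assuming each comparison is a pair of (my wOBA, game wOBA). The function names and the sample pairs are made up for illustration and are not the 35 player years from my sample; treating a difference of exactly 20 points as part of the middle band is also an assumption.

```python
from collections import Counter


def bucket(mine: float, game: float) -> str:
    # Work in whole "points" (thousandths) to avoid float noise at the edges.
    diff_points = round(abs(mine - game) * 1000)
    if diff_points < 10:
        return "under 10"
    if diff_points <= 20:  # assumption: exactly 20 counts as the middle band
        return "10 to 20"
    return "over 20"


def share_outside(counts: dict) -> float:
    # Fraction of comparisons that miss the 10-point tolerance.
    total = sum(counts.values())
    return (total - counts.get("under 10", 0)) / total


# Made-up (mine, game) pairs, just to show the tally.
sample_pairs = [(0.320, 0.325), (0.320, 0.340), (0.301, 0.330)]
print(Counter(bucket(m, g) for m, g in sample_pairs))

# The counts reported above: 21 / 7 / 7 out of 35.
print(share_outside({"under 10": 21, "10 to 20": 7, "over 20": 7}))
```

A natural extension would be to carry plate appearances along with each pair, so that tiny samples like a 6 PA season can be filtered out before tallying.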

Without digging into how to determine my own constants, and without being able to pick Markus’s or Matt’s brains on how they derive these formulae, I think I’m pretty OK with my stats. Until I decide to relentlessly pursue perfection, that is…
