Batting Stats 9: wRAA

Weighted Runs Above Average (wRAA) takes Weighted Runs Created (wRC) and puts it in the context of comparing it to the average player.  Below, we see one of our test subjects, Jorge Acosta.

In 2015, Acosta accumulated 56.9 wRC.  Is that good?  As with all counting stats, it depends.  Like other counting stats, the number of plate appearances matters.  (I know I have AB shown here, but bear with me.) A player who racks up 50 RBIs over a full season isn’t terribly impressive.  One who does it in 200 plate appearances is amazing.  Same with wRC.  Fangraphs gives us some general guidelines on what’s good, great, average, etc. over the course of a full season.  Still, that’s a number that is devoid of any context.  How does that stack up against other players in the league?

That’s where wRAA comes in.  League average wRAA is always 0.  So, looking at Acosta with his -7.3 wRAA, we see that his 56.9 wRC is not impressive at all: 7% below league average.  In 2015, Acosta played like a scrub.

We derive wRAA thusly:
(wOBA-lg_woba)/wOBAscale)*PA

It does, and should, look very similar to wRC.

My testing paramaters are based on the idea that there’s generally around 10 points separating the middle categories of poor to above average.  In order that my stats stay close to the game’s, I started off with a threshold of 3 points.  All of my results were within that range, so I tightened up a bit.  I changed my “good” zone to 1.0.  Here are my results:

Again, I’m really happy with how this turned out.  I don’t think I need to do any tweaking or further investigation.  Ready to move on!

Our script with the addition of wRAA:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , @woba := round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    , round(((@woba-r.woba)/r.wOBAscale)*@PA,1) as wRAA
    , round((((@woba-r.woba)/r.wOBAscale)+(lro.totr/lro.totpa))*@PA,1) as wRC
    /* NOT yet modified for OOTP and MySQL
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
      INNER JOIN vLeagueRunsPerOut lro ON b.year=lro.year AND b.league_id=lro.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

Batting Stats 8: wRC

Weighted Runs Created (wRC) is a counting stat that sums the weighted total offensive production of a player in terms of runs.  It tells you how many runs a player produced through his offense in a given time period.  It’s based on the same linear weights as wOBA.  In fact, it’s derived completely from wOBA.  Essentially, it takes the difference between the player’s wOBA and the league wOBA divided by the wOBAscale, plus the league runs per PA, and multiplies it by the player’s plate appearances:

round((((@woba-lg_woba)/wOBAscale)+(lg_.totr/lg_.totpa))*PA,1)

The results I got in my testing were pretty good.  This makes some sense as my wOBA results are also pretty good, and this stat is derived from that one.  Still, it’s a bit of a relief.  I decided that my testing threshold for this stat would be 3.5.  Fangraphs suggests that increments of 10 wRC separate the awful from the poor, and the average from the above average.  I figure that by keeping my threshold well below half of that, I can be reasonably confident that my stats will be in the right neighborhood as those the game generates.

I’m really pretty pleased with this.  Note that this particular sample is pretty representative, and doesn’t contain any players that are at the top of the league.  The fringy cases may still be problematic, but I am good to move on.

The script as it stands now:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , @woba := round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    , round((((@woba-r.woba)/r.wOBAscale)+(lro.totr/lro.totpa))*@PA,1) as wRC
    /* NOT yet modified for OOTP and MySQL
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
      INNER JOIN vLeagueRunsPerOut lro ON b.year=lro.year AND b.league_id=lro.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

Batting Stats 7: wOBA Corrected

I think I found one of the culprits: Intentional Walks.  Our two guys, Sink and McDonough, have IBB’s in that season well over the average.  I queried the database to find their IBB’s as well as league average for those seasons:

SELECT b.player_id
  , concat(p.first_name, " ", p.last_name) as player 
  , l.abbr
  , b.year
  , b.league_id as league 
  , b.ibb
  , x.lg_avg_ibb
  , b.ibb-x.lg_avg_ibb as diff
FROM (  	
	SELECT b.year
	 , b.league_id
	 , round(avg(b.ibb),0) as lg_avg_ibb
	FROM CalcBatting b
	INNER JOIN players p ON b.player_id-p.player_id
	WHERE b.ab>200 AND p.position<>1 
	GROUP BY b.year, b.league_id
	) as x
  INNER JOIN CalcBatting b ON x.year=b.year AND x.league_id=b.league_id
  INNER JOIN players p ON b.player_id=p.player_id
  INNER JOIN leagues l ON b.league_id=l.league_id
 WHERE b.player_id IN (2574, 472) AND b.year=2015;

You’ll see that both were well above average:

Sink is about 1 standard deviation above average while McDonough is off the charts at almost 6.

I took a look at the 10 highest IBB variances to see if we can say with any certainty that we’re handling IBB’s incorrectly:

Looking at this, I think it’s pretty clear that if nothing else, IBB is at least contributing to the differences in wOBA by weighting it too highly.

It turns out that it’s not necessarily the weighting, but rather for formula for wOBA itself.  I removed IBB from the formula and recalculated wOBA for these 10 player-seasons:

Every single one of them improved.  Not yet perfect, but now that I have the outliers at least within the medium range, I feel comfortable moving on for real.  So, here’s the revised script for CalcBatting with the corrected formula for wOBA:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    /* NOT yet modified for OOTP and MySQL
    , round((([woba]-r.lgwoba)/r.wobascale)*[PA],1) as wRAA
    , round(((([woba]-r.lgwoba)/r.wobascale)+(l.totr/l.totpa))*[PA],0) as wRC
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

Batting Stats 5: wOBA

Here it is- wOBA.  I’ve been building up to this for a while and I’m excited to get going.  For the full details on wOBA, check out Fangraphs‘ page.  The short version is that wOBA takes On Base Percentage and weights each of the components based on the run expectancy for each of the component events in each league year.  It’s a very good measure of the offensive contribution of a player in a specific league setting.  It does not normalize the data across years and ballparks – we will get to some stats that do that later on.  But it does acknowledge that a double is worth more than a walk – and it quantifies how much more.

The formula takes the woba weights that we calculated in Run Table 2 and multiplies them by the number of walks, singles, doubles, etc. that a player accumulated during a season:

, round((r.wobaBB*b.bb + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba

I’ve already plugged this formula into the table and I’m getting results.  Before I start comparing my results to the game and pulling my hair out, I am going to establish success criteria.  I mentioned in the Run Environment posts that our formulae contain constants that I don’t fully understand.  Moreover, I don’t know if the game uses the same constants.  So, to expect a perfect match between my database and the game may be unreasonable.

Let’s think about what a reasonable margin of difference would be.  Any difference less than 5 points is negligible.  I can’t really make a distinction between two players with wOBAs of .320 and .325.  They’re essentially equal.  Can I say the same thing about 20 points?  No.  There’s clearly a difference between .320 and .340.  What about 10?  Iffy.  If I want to give myself as much wiggle room as possible without losing faith in my metrics, I can’t go higher than 10.  Let’s set 10 as the absolute limit and see what we get.   I am going to take a sample of 35 player years and see how close I get:

So, that’s 21 where the difference is less than 10 points.  7 where the difference is between 10 and 20 points, and 7 where the difference is greater than 20.  Not perfect – 40% outside of the range I set for myself.  However, still better than I feared, and actually better than it looks.  For example, Dan Wasielewski’s 2012 season is off by 20 points.  But that 2012 season only had 6 plate appearances.  I am totally OK having wacky numbers for very small sample sizes.  The greatest difference (.029), similarly came from a season with barely over 100 plate appearances.

Without digging in to how to determine my own constants and without being able to pick Markus’s or Matt’s brains on how they derive these formulae, I think I’m pretty OK with my stats.  Until I decide to relentlessly pursue perfection, that is…

Batting Stats 4: BABIP

We’ll look at one last easy stat before the hard stuff.

BABIP is a very interesting stat because it acts as a kind of sanity check on all of the other stats a player produces.  It shows Batting Average on Balls in Play and average in most professional leagues is around .300.  A player with a career average well above that generally hits the ball hard.  A player with a career BABIP well below that mark often makes weak contact.  That’s an OK measure in itself, but it serves a much better purpose. If, during the course of a season, a player is putting up exceptional numbers, checking his BABIP against his career average will give you a sense of whether he’s getting lucky and likely to regress or if he’s turned a corner and upped his game.

The formula for BABIP is pretty straightforward:
(H - HR) / (AB - K - HR + SF)

The whole Megillah up to this point, then, is:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    /* NOT yet modified for OOTP and MySQL
    , round((r.wobaBB*nz(b.bb,0) + r.wobahb*nz(b.hbp,0) + r.woba1b*(b.h-b.[2b]-b.[3b]-b.hr) + r.woba2b*b.[2b] + r.woba3b*b.[3b] + r.wobahr*b.hr)/(b.ab+nz(b.bb,0)-nz(b.ibb,0)+nz(b.sf,0)+nz(b.hbp,0)),3) as woba
    , round((([woba]-r.lgwoba)/r.wobascale)*[PA],1) as wRAA
    , round(((([woba]-r.lgwoba)/r.wobascale)+(l.totr/l.totpa))*[PA],0) as wRC
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

 

Batting Stats 2: Correction

I’ve been shown the light!  Matt Arnold, friendly neighborhood OOTP dev, reminded me that the calculation for OBP isn’t quite what I had remembered.  In fact, the denominator isn’t Plate Appearances.  Rather, it’s PA minus Sacrifice Hits minus Catcher Interference.  Having made that correction, OBP now ties with the game.  Our new code is:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as 'OBP+'
    /* NOT YET CONVERTED TO OOTP AND MYSQL
    , round((b.h+[b.2b]+2*b.[3b]+3*b.hr)/b.ab,3) as slg
    , [obp]+[slg] as ops
    , [slg]-[avg] as iso
    , round((r.wobaBB*nz(b.bb,0) + r.wobahb*nz(b.hbp,0) + r.woba1b*(b.h-b.[2b]-b.[3b]-b.hr) + r.woba2b*b.[2b] + r.woba3b*b.[3b] + r.wobahr*b.hr)/(b.ab+nz(b.bb,0)-nz(b.ibb,0)+nz(b.sf,0)+nz(b.hbp,0)),3) as woba
    , round((([woba]-r.lgwoba)/r.wobascale)*[PA],1) as wRAA
    , round(((([woba]-r.lgwoba)/r.wobascale)+(l.totr/l.totpa))*[PA],0) as wRC
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

Batting Stats 2: Easy Calculated Stats

I’ll be adding to the scripts started yesterday working towards a complete script for all batting stats.  I’ll post the relevant bits of the code I’m working on in the main body and post the complete script as it stands beneath the fold.

I’ll be tackling the easy calculated stats in this post, starting with batting average.  Easy and not worth a discussion.

Next are K% and BB%.  These stats measure the percentage a player’s plate appearances end in a strikeout or walk, respectively.

I made an interesting discovery while coding these easy stats.  As PA is a calculated stat, I was not looking forward to repeating the definition of PA (AB+BB+IBB+HP+SF+SH). A minor annoyance, but for a reason that promises larger annoyances down the road.  I thought about how, in Wyers’s Run Value tables, we used variables that were defined and used in expressions in the same statement.  I figured it was worth a try.  I created a view with PA defined as @PA := AB+BB+IBB+HP+SF+SH AS PA and used @PA as the denominator in the expressions. I got an error telling me I couldn’t do that. Figures. But why? It worked for Wyers – so what was the difference? The only one I could think of was that he used his variables in creating a table and I used mine creating a view. So, I switched from a view to a table. It worked! Here are the relevant lines of code:

DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

   SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb+b.ibb)/@PA,3) as bbrate
   FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON      b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
   WHERE b.ab<>0 AND b.split_id=1
   ORDER BY b.player_id, b.year

 

On Base Percentage is next and, since I’ll be using that stat again in this statement, I decided to define it as a variable as well: @OBP := round((b.h + b.bb + b.hp)/@PA,3) AS obp. This worked nicely, giving me identical results to when done without a variable.

Next up is OBP+.  Not a very often used stat, and not present in OOTP, but I like it.  It presents a player’s OBP in relation to the rest of the league average which is set to 100.  So, a player with a 103 OBP+ is 3% higher than league average.  wOBA is always relative to what the current league OBP is, so for any nuanced understanding of a player’s wOBA you need to have an understanding of what the current league OBP is.  Don’t get me wrong, wOBA is one of the first offensive stats I look at, but for a quick glance at the rate a player gets on base vs the rest of the world, OBP+ is great.

The formula for OBP+ is: 100 * (On Base Percentage / League On Base Percentage).

Remember, we calculated League wOBA in the Intermediate Run Values Table and referenced it in the table RunValues2, so we can call that up pretty easily. We’ll have to add another join to our statement, and that will give us:

DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/@PA,3) AS obp
    , round(100*(@OBP/r.woba),0) as 'OBP+'
FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

I’m going to stop here for now as I’ve hit a little snag.  My calculations for OBP aren’t matching up 100% against the ones that the game gives us.  Only off a couple points for full-season samples but much more with smaller ones.  I’d like to at least understand what’s happening, if not correct it, before moving on much further.  I posted about it here, and maybe the community can help me figure out the issue.  Spot checking a few players in both my database and the game, evaluating it in Excel has shown me that my database is calculating the formula correctly.  So, maybe the game is using a different formula?

Continue reading “Batting Stats 2: Easy Calculated Stats”

Batting Stats 1: The Easy Stuff

The goal here is to create a table (or view) for all offensive stats for each player-year-stint-split.  Let’s go back and parse that briefly:  Stats are collected for each player who accumulates them.  Each player gets his own row.  For each year that a player accumulates stats, a new row of data is created for that player.  For each team that a player plays in a given year (stint), a new row of data is created for that player.  Stats are accumulated and placed into three splits for each player-year-stint: Overall, vs. Left, and vs. Right.

I will most likely create views for each split, but for the moment I am going to focus only on Overall (split_id=1).

As this view is where I have gotten tripped up in the past, I’m going to take it slowly here, checking against game data periodically.  So, easy bit first: Let’s get the counting stats and the more traditional rate stats out of the way:

#Calculated batting stats for OOTP
    CREATE OR REPLACE VIEW CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove this
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , b.ab+b.bb+b.sh+b.sf+b.hp as PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
        FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

We’ll pick a random player from in-game and make sure we line up.  Let’s pick player #14 from 2015 and compare to OOTP.  Not expecting any problems on the easy stuff.  Still, just to be sure.

Output from calcbatting
Frank Garcia

So far so good.  Next post will finish up with the standard stats and some of the easier advanced stats.

Tables 9: Individual Batting Stats, States, and Team Affiliations

Three small tables (at least as far as column count goes) that don’t have impact on my first goal of producing a lot of statistics.  However, I do see some use for them down the road.

Creating players_individual_batting_stats table

This one pairs batters and pitchers (‘player_id’ and ‘opponent_id’, respectively) and sums the results of each AB in terms of Hits and HR.  Significantly, this is not Plate Appearances and, therefore, does not include Walks, Sacrifice Hits, or Sacrifice Flies.

CREATE TABLE `players_individual_batting_stats` (
  `player_id` int(11) NOT NULL, #Batter
  `opponent_id` int(11) NOT NULL, #Pitcher
  `ab` smallint(6) DEFAULT NULL,
  `h` smallint(6) DEFAULT NULL,
  `hr` smallint(6) DEFAULT NULL,
  PRIMARY KEY (`player_id`,`opponent_id`),
  INDEX `pibs_ix1` (`opponent_id`)    
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

Creating states table

Logically, this table belongs further up with nations and cities.  It simply defines states, which reside in nations.  Cities defines cities, which reside in states, which reside in nations.  Nations reside in continents, but I have not imported that table and don’t see the need to.

CREATE TABLE `states` (
  `state_id` int(11) NOT NULL,
  `nation_id` int(11) NOT NULL,
  `name` varchar(50) DEFAULT NULL,
  `abbreviation` varchar(50) DEFAULT NULL,
  `population` int(11) DEFAULT NULL,
  `main_language_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`state_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

Creating team_affiliations table

This is essentially a join table that establishes a relationship with an affiliate team.  The parent team (i.e. the Major League team) is the ‘affiliated_team_id’ and the minor league team is the ‘team_id’ team.

Actually, it could be reversed – due to an oversight I didn’t include this data in the dump – just set up the table.  It won’t be used until all the stats are done anyway – and it will be used to help identify useful players in trade negotiations.  For example, let’s say I am close to getting a deal done with Seattle and want to find one more pitching prospect from their farm system.  I’d use this table to find players on teams affiliated with Seattle.  Will have to come back to this one later, obviously.

CREATE TABLE `team_affiliations` (
  `team_id` int(11) NOT NULL,
  `affiliated_team_id` int(11) NOT NULL,
  PRIMARY KEY (`team_id`,`affiliated_team_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Tables 8: Player Career Pitching Stats

Same breakdown here as the previous career stats tables, with indexes on player_id, year, split_id, stint, and team_id.  Similar to the batting stats table, we have a couple calculated statistics here: WPA (though I am not sure how this works for pitchers…is it per AB?), and WAR.  We also have a column that I haven’t been able to identify: ‘li’.

I will come back to this in another section, but this is the table we’ll reference when aggregating data for Run Environments as we don’t have a reliable way to group league history pitching data by subleague or of calculating outs.  On this table, outs are conveniently recorded in the ‘outs’ column.

CREATE TABLE `players_career_pitching_stats` (
  `pcps_id` int(11) NOT NULL AUTO_INCREMENT,    
  `player_id` int(11) NOT NULL,
  `year` smallint(6) NOT NULL,
  `team_id` int(11) NOT NULL,
  `game_id` int(11) DEFAULT NULL,
  `league_id` int(11) DEFAULT NULL,
  `level_id` smallint(6) DEFAULT NULL,
  `split_id` smallint(6) NOT NULL,
  `ip` smallint(6) DEFAULT NULL,
  `ab` smallint(6) DEFAULT NULL,
  `tb` smallint(6) DEFAULT NULL,
  `ha` smallint(6) DEFAULT NULL,
  `k` smallint(6) DEFAULT NULL,
  `bf` smallint(6) DEFAULT NULL,
  `rs` smallint(6) DEFAULT NULL,
  `bb` smallint(6) DEFAULT NULL,
  `r` smallint(6) DEFAULT NULL,
  `er` smallint(6) DEFAULT NULL,
  `gb` smallint(6) DEFAULT NULL,
  `fb` smallint(6) DEFAULT NULL,
  `pi` smallint(6) DEFAULT NULL,
  `ipf` smallint(6) DEFAULT NULL,
  `g` smallint(6) DEFAULT NULL,
  `gs` smallint(6) DEFAULT NULL,
  `w` smallint(6) DEFAULT NULL,
  `l` smallint(6) DEFAULT NULL,
  `s` smallint(6) DEFAULT NULL,
  `sa` smallint(6) DEFAULT NULL,
  `da` smallint(6) DEFAULT NULL,
  `sh` smallint(6) DEFAULT NULL,
  `sf` smallint(6) DEFAULT NULL,
  `ta` smallint(6) DEFAULT NULL,
  `hra` smallint(6) DEFAULT NULL,
  `bk` smallint(6) DEFAULT NULL,
  `ci` smallint(6) DEFAULT NULL,
  `iw` smallint(6) DEFAULT NULL,
  `wp` smallint(6) DEFAULT NULL,
  `hp` smallint(6) DEFAULT NULL,
  `gf` smallint(6) DEFAULT NULL,
  `dp` smallint(6) DEFAULT NULL,
  `qs` smallint(6) DEFAULT NULL,
  `svo` smallint(6) DEFAULT NULL,
  `bs` smallint(6) DEFAULT NULL,
  `ra` smallint(6) DEFAULT NULL,
  `cg` smallint(6) DEFAULT NULL,
  `sho` smallint(6) DEFAULT NULL,
  `sb` smallint(6) DEFAULT NULL,
  `cs` smallint(6) DEFAULT NULL,
  `hld` smallint(6) DEFAULT NULL,
  `ir` double DEFAULT NULL,
  `irs` double DEFAULT NULL,
  `wpa` double DEFAULT NULL,
  `li` double DEFAULT NULL,
  `stint` smallint(6) NOT NULL,
  `outs` smallint(6) DEFAULT NULL,
  `war` double DEFAULT NULL,
  PRIMARY KEY (`pcps_id`),
  INDEX `pcps_ix1` (`league_id`),
  INDEX `pcps_ix2` (year),
  INDEX `pcps_ix3` (`player_id`),
  INDEX `pcps_ix4` (`team_id`),
  INDEX `pcps_ix5` (`split_id`)  
) ENGINE=InnoDB DEFAULT CHARSET=latin1;