Batting Stats 11: wRC+ Revisited

The results that we got from testing the wRC+ stat weren’t great.  In testing some other stuff, I realized that we’re getting duplicate entries for batters on the CalcBatting table:

Clearly the same player, same year, stint, team, and stats.  Well, all stats except for wRC+.  The issue is not being able to return a unique subleague for each row.  To deal with this, we’re going to make a couple of adjustments:

First, we’re going to change the sub_league_history_batting table to mirror the structure of the sub_league_history_pitching table.  This doesn’t directly solve the problem, but on reflection, I didn’t like how this table was calling data from a table it was sending data to.  It seems like a circular reference to me, even if the data is static.

DROP TABLE IF EXISTS sub_league_history_batting;
CREATE TABLE IF NOT EXISTS sub_league_history_batting AS

SELECT
       year
     , league_id
     , sub_league_id
     , slg_PA
     , slg_r
     
     FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , sum(pa) AS slg_PA
          , sum(r) AS slg_r
     FROM players_career_batting_stats AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id
         INNER JOIN players ON p.player_id=players.player_id
     WHERE p.split_id=1 AND players.position<>1
     GROUP BY year, league_id, sub_league_id
      ) AS x ;

Now we have the subleague data pulling from the game-generated players career batting stats table.  A quick check shows that subleague runs are unchanged by this, but PA counts have changed a tiny bit – less than 10 over tens of thousands.  Not sure why, but I think the raw data from players_career_batting_stats table is more accurate.

Next, we’re going to add a sub_league column to CalcBatting, but we are going to do it in a way that avoids circular references. We’re actually going to alter the players_career_batting_stats table to include a subleague.  Then, we’ll get fancy and write a trigger that adds subleague to new records while leaving old ones unchanged.  Let’s start with altering the existing records.

We’ll add the subleague column thusly:

ALTER TABLE players_career_batting_stats
ADD COLUMN sub_league_id INT AFTER league_id;

Then, populate it:

UPDATE players_career_batting_stats2 AS b
INNER JOIN team_relations AS t ON b.league_id=t.league_id AND b.team_id=t.team_id
SET b.sub_league_id=t.sub_league_id;

We’ll come back to the trigger in another post.  We include the sub_league_id column near the top of CalcBatting returning the sub_league column we just included above:

USE mystruggle;
#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint 
    , b.split_id 
    , b.team_id 
    , b.sub_league_id
    , b.g
    , b.ab
Etc, etc, etc

You’ll notice I did a little other cleanup as well; I removed the league_abbr and team_abbr from the table.  We don’t need these columns taking up space in the table when we can easily pull them in when we need them.

Finally, and the cause of the duplicate records problem, was adding a missing JOIN element between players_career_batting_stats and sub_league_history_batting.  I had joined on league and sub-league, but not year.  Adding year brought me back to normal.

After I made a few more adjustments that I will talk about in  a later post, I re-ran a test for wRC+, adjusting my Happy Zone down to 5 points.

  • 20 out of 30 are within 5 points of the game
  • 24 out of 30 are within 7 points of the game
  • 27 out of 30 are within 10 points of the game
  • 3 missed by more than 10, the highest being 14 points.

14 points is really a lot.  However, a 90% pass rate is really pretty good considering where I was before I straightened out the sub-league situation.

 

Pitching Stats 7: FIP-

FIP- does for FIP just what ERA- does for ERA: it scales it to 100 and accounts for park factors and league run environment.  I am still searching for a definitive formula- but I know that to begin with, I will need to calculate the league FIP for comparison purposes.  I’ll go back and adjust the sub-league-pitching-stats table to include it.

In fact, I am going one step further, and generating a FIP for each sub-league.  I am guessing that this will cause me to deviate a bit from the game’s generated scores, but in this case, I think this will lead me to more accurate results.

The revised sub-league-pitching-stats table now looks like this:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
     , round((adjHRA + adjBB + adjHP - adjK)/totIP+FIPConstant,2) AS slgFIP
     #FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
          , 13*sum(hra) AS adjHRA
          , 3*sum(bb) AS adjBB
          , 3*sum(hp) AS adjHP
          , 2*sum(k) AS adjK
          , f.FIPConstant
     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id
          INNER JOIN FIPConstant AS f ON p.year=f.year AND p.league_id=f.league_id
     GROUP BY year, league_id, sub_league_id
      ) AS x ;

The calculation for FIP- is exactly the same as ERA-:

FIP Minus = 100*((FIP + (FIP – FIP*(PF/100)) )/ AL or NL FIP)

We’ve already got all of the data points we need, so let’s plug it in and see what happens.

Pretty good. 25 of 30 within 5 points.  Two that were ridiculously off and 3 that are meh.  I can rely on this stat to be game equivalent 85% of the time; in the right ballpark 93% of the time; so ridiculously off that I will be able to spot it immediately 6% of the time.  I wouldn’t want my real life money riding on this, maybe, but it’s good enough for video games.

The script for CalcPitching table is now:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , @FIP := round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    , round(100*((@FIP + (@FIP - @FIP*(p.avg)))/slg.slgFIP),0) AS FIPminus
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 6: ERA+

This one may prove to be tricky, if only because there are a couple of ways to calculate it.  Baseball-Reference says they calculate it one way, Wikipedia says that bb-ref used to calculate it that way, but then they changed.  So, we may see some variation here.  Frankly, I’m not even sure why I’d want to use this counter-intuitive + stat anyway.  However, it’s in the game, and I’d like to be able to use it as a sanity check if nothing else.

Here’s the first way I am going to try it.  Defined by Wikipedia as the way bb-ref currently does the calculation:

ERA+ = 100 * (2 - (ERA/lgERA) * 1/ParkFactor)

No additional joins are needed for this, so we can just plug it in.  Let’s do it and check our results.

Awful.  Just awful.  1/3 of the result set was hugely off, and only about half was within 5 percent.

We’ll try the original recipe for this stat and see if we get better luck.  That one is:

ERA+ = 100 * (lgERA/ERA) * ParkFactor

And the results:

Much, much better.  24 within 5 points and only 1 more than 10.  And that one is also one that was way off on the first try.  I am good to keep this version and even to use it for evaluative purposes.  I think the difference between this and the game is probably down to park factors.  Here’s the CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 5: ERA-

It’s that time, once again, to try to deal with park adjusted stats.  Again, and against counsel, I will be pulling the park factors from the teams table rather than doing the calculations myself.  I got within spitting distance of a good result set for wRC+, so I am hoping for similar with these park-adjusted pitching stats.

First up is ERA-.  ERA- takes a pitcher’s ERA and puts it in the context of his league and his home park.  This makes it possible to compare players across eras and leagues, essentially normalizing the data.  100 is league average.  Every point below 100 is 1 percent better than average.

The formula is pretty straight-forward:
ERA Minus = 100*((ERA + (ERA – ERA*(PF/100)) )/ AL or NL ERA)

A few things have to happen in order to run this calc.  First, we’ll need sub-league ERA’s.  As mentioned in the first FIP post, we sort of do but really don’t have this on the league_history_table.  Better to roll our own from players_career_pitching_stats table.  We’ll do this in the same manner that we did it for batting- joining to the team relations table to get subleague.

Here’s how:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id
     GROUP BY year, league_id, sub_league_id
      ) AS x ;

Before we move on to the park factor, we have to make sure that we can associate a player’s team with his sub-league.  As usual, I’m sure that there’s a more elegant way to go about this than where I landed.  The problem I needed to solve was that sub-leagues do not have unique identifiers; they are uniquely identified only as composites of league_id and sub_league_id.  So, it’s not enough to refer to a sub-league as sub-league-1.  There are as many sub-league-1’s as there are leagues.  To make matters more complicated, the teams table does not carry a sub-league field.  That’s why we had to refer to the team_relations table.  Unfortunately, the team_relations table is the only table that contains all three necessary data points to pin down a team/sub-league relationship.  When I tried to let the database do the thinking for me by joining to it, it wasn’t consistently choosing the correct sub-league for each team.

I decided to add sub-league as a field to the already-crowded CalcPitching table.  It worked in testing, correctly pulling the right slgERA for each league-sub_league-year.  Like I said, I bet there’s a way to do this only with joins, but I wasn’t able to figure it out.  I am going to go back to the CalcBatting table and do the same thing.  Here’s the code for the new joins:

INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id

The next thing is to return the park factor for each pitcher-stint-year.  We’ll do this by joining to the teams table, then to the parks table:

INNER JOIN teams AS t ON i.team_id=t.team_id
INNER JOIN parks AS p ON t.park_id=p.park_id

With all that done, we’ve got to go back and define ERA as a variable so that we can reference it here without elaborating it.  Then, the formula is simple.  OOTP doesn’t track this stat either, so it’s hard to say with any certainty how well this works or how badly I’m getting bad results from using hard-coded park factors.  I did a quick sniff test, looking at ranges of ERA’s in my league and sniffing the ERA- stats for each.  It looks OK, I guess?

OOTP uses ERA+ instead, which seems to be more or less the same stat scaled up from 100 rather than down.  I will tackle that one next.

Here’s the full script for CalcPitching so far:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
      
FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

 

Pitching Stats 4: xFIP

xFIP is almost the same thing as FIP, just with something ‘xtra’.  The idea is that while pitchers are responsible for the 3 True Outcomes (HR, BB/HP, and K), home runs can also be subject to luck.  For example, a fence-scraper over the short right porch in Fenway might not be a home run through the marine layer at Dodger Stadium.  What does this tell us about the pitcher’s expected performance?

Well, to account for the vagaries of chance, xFIP takes all of a pitchers fly balls and multiplies them by the league average HR/FB rate.  Basically, it assumes a number of HR a pitcher would have given up based on the number of fly balls their opponents hit rather than the number of HR they actually did give up.

It feels like splitting hairs to me, but hey.  That’s baseball.  The formula for xFIP is just like FIP with that one change:
xFIP = ((13*(Fly balls * lgHR/FB%))+(3*(BB+HBP))-(2*K))/IP + constant

The constant is the same FIPConstant we calculated for FIP.  So, this one is pretty straight-forward, except that we need the HR/FB% for the league.  We’ll go back to our FIPConstant table and add it there for each league year.  Our FIPConstant table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals/fb_totals AS hr_fb_pct
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
                , sum(fb) as fb_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;

I added the formula above to the CalcPitching table and we’re done.  OOTP doesn’t track xFIP (at least in v18), so there’s nothing to compare it to.  This one’s done.

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 3: FIP – The Conclusion

I redid the FIPConstant table to pull summed data from the players_career_pitching_stats table.  That table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals
    , bb_totals
    , hp_totals
    , k_totals
    , er_totals
    , ip_totals
    , ipf_totals
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;

And how did it work?  Better.

9 within 0.05; 26 within 0.11.  I’m still curious as to why I’m not matching up even better.  I still have a lingering suspicion that HBP is behind this, but I am going to let it lie for now unless it comes back to bite me on other calculations.

Our CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

Pitching Stats 2: FIP – The False Start

FIP, or Fielding Independent Pitching, is based on the idea that pitchers are only in control of the “3 true outcomes” of a plate appearance: Strikeouts, Home Runs, and Free Passes (HBP and BB’s).  Everything else relies on defense which is largely beyond the pitcher’s control.  FIP is scaled, through the use of a constant, to a league’s ERA.

The formula to derive FIP is:

FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant

and the formula for the deriving the constant is similar:

FIP Constant = lgERA – (((13*lgHR)+(3*(lgBB+lgHBP))-(2*lgK))/lgIP)

We’re going to make a quick table to calculate the FIPConstant for each league year that we’ll reference when calculating FIP for each player stint.  Happily, the game gives us league ERA in the league_history_pitching_stats table, so we’ve been spared a step.  Because I am, apparently, not very good with the order of operations and parentheses, I have spent the last hour pulling my hair out trying to get a FIP Constant that looks reasonable.  In an attempt to save some of my last remaining hairs, I made a very inelegant table.  Behold my genius:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
    lhps_id
    , year
    , league_id
    , level_id
    , hra
    , bb
    , hp
    , k
    , @HRAdj := 13*hra AS Adjusted_HR
    , @BBAdj := 3*bb AS Adjusted_BB
    , @HPAdj := 3*hp AS Adjusted_HP
    , @KAdj  := 2*k AS Adjusted_K
    , @InnPitch := ((ip*3)+ipf)/3 AS InnPitch
    , era
    , era - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch) AS FIPConstant
FROM league_history_pitching_stats;

On the CalcPitching table, we’re adding FIP and disregarding left/right splits for the moment.  Our table script now looks like this:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round(i.er/@InnPitch,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

So, how did it go?  Not great.  I took a random sample from my database and compared it to the game’s generated stats.  I wanted my FIP calculations to be within .05 of the game’s.

While most were in the medium range, it seems that there’s something different in the way the game calculates FIP.  Our numbers are close enough that it can’t be a major difference.  I’m going to follow a hunch and guess that it’s Hit By Pitch.  I will remove HBP as a factor in both the FIPConstant and FIP calculations and see what that does to our results.

I got about a third of the way through the revised calcs when I noticed a problem with the FIPConstant table.  This table pulls data from the league_history_pitching_stats table.  The problem is there.  You see, as I mentioned in the table setup posts and then promptly forgot about, there are a couple of columns in the league_history tables that attempt to distinguish between subleagues but do not give any indication of which is which. (They are the team_id and game_id columns.)  What this does create two records for each league (one for each subleague) with different totals but no way to identify the subleague being referenced.  This is no good.

My new hunch is that HBP is not the issue.  The formula is probably fine, I will just have to change the FIPConstant table to sum data from players_career_pitching_stats.  I’m going to publish this post as a testament to my naiveté and get to work on the revised table.

Pitching Stats 1: The Easy Stuff

Here’s the same explanation of how the stats tables are organized as we used in the first Batting Stats post:

Stats are collected for each player who accumulates them.  Each player gets his own row.  For each year that a player accumulates stats, a new row of data is created for that player.  For each team that a player plays in a given year (stint), a new row of data is created for that player.  Stats are accumulated and placed into three splits for each player-year-stint: Overall, vs. Left, and vs. Right.

As we did for the batting stats, we’ll be creating a new table for all of the pitching stats together in one place; counting stats provided by the game and calculated stats that we’ll derive here.

We’re carrying over all of the counting stats, plus WPA and WAR.  The calculated stats we’re adding in this post fall in the category of Easy Stuff:

  • InnPitch – I set this as a variable to avoid having to elaborate every time. This is the IP integer plus the IPF (innings pitched fraction) x 0.33
    round(IP + (IPF * .33),1).
  • All of the “x9” stats: K/9, BB/9 etc.
  • WHIP
  • GB/FB – Ground Ball/Fly Ball outs
  • BABIP (see the batting post for more on this)
  • ERA

Here’s the code:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := round(i.ip + (i.ipf*.33),1) AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    
    
FROM players_career_pitching_stats AS i;

Batting Stats 9: wRAA

Weighted Runs Above Average (wRAA) takes Weighted Runs Created (wRC) and puts it in the context of comparing it to the average player.  Below, we see one of our test subjects, Jorge Acosta.

In 2015, Acosta accumulated 56.9 wRC.  Is that good?  As with all counting stats, it depends.  Like other counting stats, the number of plate appearances matters.  (I know I have AB shown here, but bear with me.) A player who racks up 50 RBIs over a full season isn’t terribly impressive.  One who does it in 200 plate appearances is amazing.  Same with wRC.  Fangraphs gives us some general guidelines on what’s good, great, average, etc. over the course of a full season.  Still, that’s a number that is devoid of any context.  How does that stack up against other players in the league?

That’s where wRAA comes in.  League average wRAA is always 0.  So, looking at Acosta with his -7.3 wRAA, we see that his 56.9 wRC is not impressive at all: 7% below league average.  In 2015, Acosta played like a scrub.

We derive wRAA thusly:
(wOBA-lg_woba)/wOBAscale)*PA

It does, and should, look very similar to wRC.

My testing paramaters are based on the idea that there’s generally around 10 points separating the middle categories of poor to above average.  In order that my stats stay close to the game’s, I started off with a threshold of 3 points.  All of my results were within that range, so I tightened up a bit.  I changed my “good” zone to 1.0.  Here are my results:

Again, I’m really happy with how this turned out.  I don’t think I need to do any tweaking or further investigation.  Ready to move on!

Our script with the addition of wRAA:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , @woba := round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    , round(((@woba-r.woba)/r.wOBAscale)*@PA,1) as wRAA
    , round((((@woba-r.woba)/r.wOBAscale)+(lro.totr/lro.totpa))*@PA,1) as wRC
    /* NOT yet modified for OOTP and MySQL
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
      INNER JOIN vLeagueRunsPerOut lro ON b.year=lro.year AND b.league_id=lro.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

Batting Stats 8: wRC

Weighted Runs Created (wRC) is a counting stat that sums the weighted total offensive production of a player in terms of runs.  It tells you how many runs a player produced through his offense in a given time period.  It’s based on the same linear weights as wOBA.  In fact, it’s derived completely from wOBA.  Essentially, it takes the difference between the player’s wOBA and the league wOBA divided by the wOBAscale, plus the league runs per PA, and multiplies it by the player’s plate appearances:

round((((@woba-lg_woba)/wOBAscale)+(lg_.totr/lg_.totpa))*PA,1)

The results I got in my testing were pretty good.  This makes some sense as my wOBA results are also pretty good, and this stat is derived from that one.  Still, it’s a bit of a relief.  I decided that my testing threshold for this stat would be 3.5.  Fangraphs suggests that increments of 10 wRC separate the awful from the poor, and the average from the above average.  I figure that by keeping my threshold well below half of that, I can be reasonably confident that my stats will be in the right neighborhood as those the game generates.

I’m really pretty pleased with this.  Note that this particular sample is pretty representative, and doesn’t contain any players that are at the top of the league.  The fringy cases may still be problematic, but I am good to move on.

The script as it stands now:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , @woba := round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    , round((((@woba-r.woba)/r.wOBAscale)+(lro.totr/lro.totpa))*@PA,1) as wRC
    /* NOT yet modified for OOTP and MySQL
    , ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
    */
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
      INNER JOIN vLeagueRunsPerOut lro ON b.year=lro.year AND b.league_id=lro.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year