Pitching Stats 7: FIP-

FIP- does for FIP just what ERA- does for ERA: it puts the stat on a scale where 100 is league average and accounts for park factors and league run environment.  I am still searching for a definitive formula, but I know that to begin with, I will need to calculate the league FIP for comparison purposes.  I’ll go back and adjust the sub-league-pitching-stats table to include it.

In fact, I am going one step further, and generating a FIP for each sub-league.  I am guessing that this will cause me to deviate a bit from the game’s generated scores, but in this case, I think this will lead me to more accurate results.

The revised sub-league-pitching-stats table now looks like this:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
     , round((adjHRA + adjBB + adjHP - adjK)/totIP+FIPConstant,2) AS slgFIP
     #FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
          , 13*sum(hra) AS adjHRA
          , 3*sum(bb) AS adjBB
          , 3*sum(hp) AS adjHP
          , 2*sum(k) AS adjK
          , f.FIPConstant
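     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id AND p.league_id=t.league_id
          INNER JOIN FIPConstant AS f ON p.year=f.year AND p.league_id=f.league_id
     # Qualify the grouping columns: year, league_id and sub_league_id exist in more than one joined table
     GROUP BY p.year, p.league_id, t.sub_league_id, f.FIPConstant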
      ) AS x ;

The calculation for FIP- is exactly the same as ERA-:

FIP Minus = 100*((FIP + (FIP - FIP*(PF/100))) / AL or NL FIP)
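As a rough check on the arithmetic, here is a minimal Python sketch of the formula above. The function name and the sample numbers are made up for illustration, and the park factor is assumed to be on a scale where 100 is neutral, as in the formula.

```python
def fip_minus(fip, park_factor, league_fip):
    """FIP- per the formula above; park_factor uses 100 = neutral."""
    pf = park_factor / 100.0
    park_adjusted = fip + (fip - fip * pf)
    return 100.0 * park_adjusted / league_fip

# Hypothetical pitcher: 3.60 FIP in a sub-league with a 4.00 FIP.
neutral = fip_minus(3.60, 100, 4.00)   # neutral park
hitters = fip_minus(3.60, 105, 4.00)   # hitter-friendly park
print(round(neutral, 1), round(hitters, 1))
```

A park factor above 100 pulls the number down, which credits the pitcher for working in a tougher environment.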

We’ve already got all of the data points we need, so let’s plug it in and see what happens.

Pretty good. 25 of 30 within 5 points.  Two that were ridiculously off and 3 that are meh.  I can rely on this stat to be game equivalent 83% of the time; in the right ballpark 93% of the time; so ridiculously off that I will be able to spot it immediately 7% of the time.  I wouldn’t want my real life money riding on this, maybe, but it’s good enough for video games.
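For anyone who wants to redo the percentages, this small Python sketch works them out from the counts reported above (25 close, 3 middling, 2 far off). The variable and function names are mine.

```python
# Counts reported above for the 30-row sample.
within_5, meh, way_off = 25, 3, 2
sample = within_5 + meh + way_off

def share(count, total=sample):
    """Percentage of the sample, to one decimal place."""
    return round(100 * count / total, 1)

print(share(within_5))        # matches the game closely
print(share(within_5 + meh))  # in the right ballpark
print(share(way_off))         # obviously wrong
```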

The script for CalcPitching table is now:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , @FIP := round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    , round(100*((@FIP + (@FIP - @FIP*(p.avg)))/slg.slgFIP),0) AS FIPminus
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 6: ERA+

This one may prove to be tricky, if only because there are a couple of ways to calculate it.  Baseball-Reference says they calculate it one way, Wikipedia says that bb-ref used to calculate it that way, but then they changed.  So, we may see some variation here.  Frankly, I’m not even sure why I’d want to use this counter-intuitive + stat anyway.  However, it’s in the game, and I’d like to be able to use it as a sanity check if nothing else.

Here’s the first way I am going to try it.  Defined by Wikipedia as the way bb-ref currently does the calculation:

ERA+ = 100 * (2 - (ERA/lgERA) * 1/ParkFactor)

No additional joins are needed for this, so we can just plug it in.  Let’s do it and check our results.

Awful.  Just awful.  1/3 of the result set was hugely off, and only about half was within 5 points.

We’ll try the original recipe for this stat and see if we get better luck.  That one is:

ERA+ = 100 * (lgERA/ERA) * ParkFactor
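To see how differently the two recipes behave, here is a small Python sketch that runs both on the same inputs. It assumes the park factor is a decimal where 1.00 is neutral; the function names and sample ERAs are invented.

```python
def era_plus_current(era, league_era, park_factor):
    """The version Wikipedia attributes to bb-ref's current method."""
    return 100.0 * (2.0 - (era / league_era) * (1.0 / park_factor))

def era_plus_original(era, league_era, park_factor):
    """The original recipe: league ERA over pitcher ERA, times park factor."""
    return 100.0 * (league_era / era) * park_factor

# Hypothetical pitchers in a 4.00 ERA league, neutral park (factor 1.00).
for era in (2.00, 4.00, 6.00):
    print(era,
          round(era_plus_current(era, 4.00, 1.00)),
          round(era_plus_original(era, 4.00, 1.00)))
```

The two versions agree for a league-average pitcher and drift apart the further the ERA sits from league average.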

And the results:

Much, much better.  24 within 5 points and only 1 off by more than 10.  And that one is also one that was way off on the first try.  I am good to keep this version and even to use it for evaluative purposes.  I think the difference between this and the game is probably down to park factors.  Here’s the CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 5: ERA-

It’s that time, once again, to try to deal with park adjusted stats.  Again, and against counsel, I will be pulling the park factors from the game’s own tables (the parks table, reached through the teams table) rather than doing the calculations myself.  I got within spitting distance of a good result set for wRC+, so I am hoping for similar with these park-adjusted pitching stats.

First up is ERA-.  ERA- takes a pitcher’s ERA and puts it in the context of his league and his home park.  This makes it possible to compare players across eras and leagues, essentially normalizing the data.  100 is league average.  Every point below 100 is 1 percent better than average.

The formula is pretty straight-forward:
ERA Minus = 100*((ERA + (ERA - ERA*(PF/100))) / AL or NL ERA)
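A quick Python sketch of that formula follows. It also shows that the nested parentheses collapse to ERA * (2 - PF/100), which is easier to eyeball. The function names and the sample pitcher are hypothetical, and the park factor is assumed to use 100 as neutral.

```python
def era_minus(era, park_factor, league_era):
    """ERA- as written above; park_factor uses 100 = neutral."""
    pf = park_factor / 100.0
    return 100.0 * (era + (era - era * pf)) / league_era

def era_minus_simplified(era, park_factor, league_era):
    """Same formula after collecting terms: ERA * (2 - PF/100)."""
    return 100.0 * era * (2.0 - park_factor / 100.0) / league_era

# Hypothetical pitcher: 3.00 ERA, park factor 98, sub-league ERA 4.20.
a = era_minus(3.00, 98, 4.20)
b = era_minus_simplified(3.00, 98, 4.20)
print(round(a), round(b))
```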

A few things have to happen in order to run this calc.  First, we’ll need sub-league ERAs.  As mentioned in the first FIP post, we sort of do but really don’t have this on the league_history_pitching_stats table: it carries two records per league with no way to tell which sub-league is which.  Better to roll our own from the players_career_pitching_stats table.  We’ll do this in the same manner that we did it for batting, joining to the team_relations table to get the sub-league.

Here’s how:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
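     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id AND p.league_id=t.league_id
     # Qualify the grouping columns so they cannot be ambiguous between the joined tables
     GROUP BY p.year, p.league_id, t.sub_league_id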
      ) AS x ;

Before we move on to the park factor, we have to make sure that we can associate a player’s team with his sub-league.  As usual, I’m sure that there’s a more elegant way to go about this than where I landed.  The problem I needed to solve was that sub-leagues do not have unique identifiers; they are uniquely identified only as composites of league_id and sub_league_id.  So, it’s not enough to refer to a sub-league as sub-league-1.  There are as many sub-league-1’s as there are leagues.  To make matters more complicated, the teams table does not carry a sub-league field.  That’s why we had to refer to the team_relations table.  Unfortunately, the team_relations table is the only table that contains all three necessary data points to pin down a team/sub-league relationship.  When I tried to let the database do the thinking for me by joining to it, it wasn’t consistently choosing the correct sub-league for each team.

I decided to add sub-league as a field to the already-crowded CalcPitching table.  It worked in testing, correctly pulling the right slgERA for each league-sub_league-year.  Like I said, I bet there’s a way to do this only with joins, but I wasn’t able to figure it out.  I am going to go back to the CalcBatting table and do the same thing.  Here’s the code for the new joins:

INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
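To illustrate why the join needs both league_id and sub_league_id, here is a toy demonstration. It runs through Python's built-in sqlite3 module rather than MySQL purely so that it is self-contained. The sub_league_era table is a stand-in for sub_league_history_pitching, and every id and ERA in it is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy stand-ins for the real tables; all ids and ERAs are invented.
cur.execute("CREATE TABLE team_relations (team_id INT, league_id INT, sub_league_id INT)")
cur.execute("CREATE TABLE sub_league_era (year INT, league_id INT, sub_league_id INT, slgERA REAL)")
cur.executemany("INSERT INTO team_relations VALUES (?,?,?)",
                [(1, 100, 0), (2, 100, 1), (3, 101, 0)])
cur.executemany("INSERT INTO sub_league_era VALUES (?,?,?,?)",
                [(2017, 100, 0, 4.10), (2017, 100, 1, 3.90), (2017, 101, 0, 4.50)])

# Joining on sub_league_id alone matches every league's sub-league 0.
loose = cur.execute(
    "SELECT slg.slgERA FROM team_relations r "
    "JOIN sub_league_era slg ON r.sub_league_id = slg.sub_league_id "
    "WHERE r.team_id = 1 ORDER BY slg.slgERA").fetchall()

# Joining on league_id AND sub_league_id pins down exactly one row.
strict = cur.execute(
    "SELECT slg.slgERA FROM team_relations r "
    "JOIN sub_league_era slg ON r.league_id = slg.league_id "
    "AND r.sub_league_id = slg.sub_league_id "
    "WHERE r.team_id = 1").fetchall()

print(loose)   # more than one candidate row
print(strict)  # a single row
```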

The next thing is to return the park factor for each pitcher-stint-year.  We’ll do this by joining to the teams table, then to the parks table:

INNER JOIN teams AS t ON i.team_id=t.team_id
INNER JOIN parks AS p ON t.park_id=p.park_id

With all that done, we’ve got to go back and define ERA as a variable so that we can reference it here without elaborating it.  Then, the formula is simple.  OOTP doesn’t track this stat either, so it’s hard to say with any certainty how well this works or how much the hard-coded park factors are skewing the results.  I did a quick sniff test, looking at ranges of ERAs in my league and sniffing the ERA- stats for each.  It looks OK, I guess?

OOTP uses ERA+ instead, which seems to be more or less the same stat scaled up from 100 rather than down.  I will tackle that one next.

Here’s the full script for CalcPitching so far:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
      
FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

 

Pitching Stats 4: xFIP

xFIP is almost the same thing as FIP, just with something ‘xtra’.  The idea is that while pitchers are responsible for the 3 True Outcomes (HR, BB/HP, and K), home runs can also be subject to luck.  For example, a fence-scraper over the short right porch in Fenway might not be a home run through the marine layer at Dodger Stadium.  What does this tell us about the pitcher’s expected performance?

Well, to account for the vagaries of chance, xFIP takes all of a pitcher’s fly balls and multiplies them by the league average HR/FB rate.  Basically, it assumes a number of HR a pitcher would have given up based on the number of fly balls their opponents hit rather than the number of HR they actually did give up.

It feels like splitting hairs to me, but hey.  That’s baseball.  The formula for xFIP is just like FIP with that one change:
xFIP = ((13*(Fly balls * lgHR/FB%))+(3*(BB+HBP))-(2*K))/IP + constant

The constant is the same FIPConstant we calculated for FIP.  So, this one is pretty straight-forward, except that we need the HR/FB% for the league.  We’ll go back to our FIPConstant table and add it there for each league year.  Our FIPConstant table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals/fb_totals AS hr_fb_pct
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
                , sum(fb) as fb_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;
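Here is a small Python sketch of how the league HR/FB rate feeds into xFIP, next to plain FIP for the same pitcher. All of the totals, the pitcher's line, and the 3.10 constant are invented for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant):
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def xfip(fb, lg_hr_per_fb, bb, hbp, k, ip, constant):
    expected_hr = fb * lg_hr_per_fb  # home runs "expected" from fly balls
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical league totals: same idea as hra_totals/fb_totals in the table.
league_hr_fb = 1500 / 15000

# Hypothetical line: 200 IP, 30 HR, 50 BB, 5 HBP, 180 K, 200 fly balls.
f = fip(30, 50, 5, 180, 200, 3.10)
x = xfip(200, league_hr_fb, 50, 5, 180, 200, 3.10)
print(round(f, 3), round(x, 3))
```

This made-up pitcher allowed more home runs than his fly balls would predict at the league rate, so his xFIP comes out lower than his FIP.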

I added the formula above to the CalcPitching table.  OOTP doesn’t track xFIP (at least in v18), so there’s nothing to compare it to, and this one’s done.

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 3: FIP – The Conclusion

I redid the FIPConstant table to pull summed data from the players_career_pitching_stats table.  That table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals
    , bb_totals
    , hp_totals
    , k_totals
    , er_totals
    , ip_totals
    , ipf_totals
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;

And how did it work?  Better.

9 within 0.05; 26 within 0.11.  I’m still curious as to why I’m not matching up even better.  I still have a lingering suspicion that HBP is behind this, but I am going to let it lie for now unless it comes back to bite me on other calculations.

Our CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

Pitching Stats 2: FIP – The False Start

FIP, or Fielding Independent Pitching, is based on the idea that pitchers are only in control of the “3 true outcomes” of a plate appearance: Strikeouts, Home Runs, and Free Passes (HBP and BB’s).  Everything else relies on defense, which is largely beyond the pitcher’s control.  FIP is scaled, through the use of a constant, to a league’s ERA.

The formula to derive FIP is:

FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant

and the formula for deriving the constant is similar:

FIP Constant = lgERA - (((13*lgHR)+(3*(lgBB+lgHBP))-(2*lgK))/lgIP)
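The two formulas fit together like this. The Python below is only a sketch with invented league totals, but it shows the point of the constant: once it is added, the league's own FIP lands on the league ERA.

```python
def fip_core(hr, bb, hbp, k, ip):
    """The part of FIP before the constant is added."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip

def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)

# Hypothetical league totals, invented for illustration.
constant = fip_constant(lg_era=4.00, lg_hr=2000, lg_bb=7000,
                        lg_hbp=700, lg_k=16000, lg_ip=20000)

# By construction, the league's own FIP equals the league ERA.
league_fip = fip_core(2000, 7000, 700, 16000, 20000) + constant
print(round(constant, 3), round(league_fip, 2))
```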

We’re going to make a quick table to calculate the FIPConstant for each league year that we’ll reference when calculating FIP for each player stint.  Happily, the game gives us league ERA in the league_history_pitching_stats table, so we’ve been spared a step.  Because I am, apparently, not very good with the order of operations and parentheses, I have spent the last hour pulling my hair out trying to get a FIP Constant that looks reasonable.  In an attempt to save some of my last remaining hairs, I made a very inelegant table.  Behold my genius:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
    lhps_id
    , year
    , league_id
    , level_id
    , hra
    , bb
    , hp
    , k
    , @HRAdj := 13*hra AS Adjusted_HR
    , @BBAdj := 3*bb AS Adjusted_BB
    , @HPAdj := 3*hp AS Adjusted_HP
    , @KAdj  := 2*k AS Adjusted_K
    , @InnPitch := ((ip*3)+ipf)/3 AS InnPitch
    , era
    , era - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch) AS FIPConstant
FROM league_history_pitching_stats;

On the CalcPitching table, we’re adding FIP and disregarding left/right splits for the moment.  Our table script now looks like this:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

So, how did it go?  Not great.  I took a random sample from my database and compared it to the game’s generated stats.  I wanted my FIP calculations to be within .05 of the game’s.

While most were in the medium range, it seems that there’s something different in the way the game calculates FIP.  Our numbers are close enough that it can’t be a major difference.  I’m going to follow a hunch and guess that it’s Hit By Pitch.  I will remove HBP as a factor in both the FIPConstant and FIP calculations and see what that does to our results.

I got about a third of the way through the revised calcs when I noticed a problem with the FIPConstant table.  This table pulls data from the league_history_pitching_stats table, and the problem is there.  You see, as I mentioned in the table setup posts and then promptly forgot about, there are a couple of columns in the league_history tables that attempt to distinguish between subleagues but do not give any indication of which is which. (They are the team_id and game_id columns.)  What this does is create two records for each league (one for each subleague) with different totals but no way to identify the subleague being referenced.  This is no good.

My new hunch is that HBP is not the issue.  The formula is probably fine, I will just have to change the FIPConstant table to sum data from players_career_pitching_stats.  I’m going to publish this post as a testament to my naiveté and get to work on the revised table.

Pitching Stats 1: The Easy Stuff

Here’s the same explanation of how the stats tables are organized as we used in the first Batting Stats post:

Stats are collected for each player who accumulates them.  Each player gets his own row.  For each year that a player accumulates stats, a new row of data is created for that player.  For each team that a player plays in a given year (stint), a new row of data is created for that player.  Stats are accumulated and placed into three splits for each player-year-stint: Overall, vs. Left, and vs. Right.

As we did for the batting stats, we’ll be creating a new table for all of the pitching stats together in one place; counting stats provided by the game and calculated stats that we’ll derive here.

We’re carrying over all of the counting stats, plus WPA and WAR.  The calculated stats we’re adding in this post fall in the category of Easy Stuff:

  • InnPitch – I set this as a variable to avoid having to elaborate every time. This is the IP integer plus the IPF (innings pitched fraction) x 0.33
    round(IP + (IPF * .33),1).
  • All of the “x9” stats: K/9, BB/9 etc.
  • WHIP
  • GB/FB – Ground Ball/Fly Ball outs
  • BABIP (see the batting post for more on this)
  • ERA
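Here is a small Python sketch of the innings conversion and a couple of the rate stats built on it. It compares the 0.33 approach described above with treating each extra out as exactly a third of an inning, which is the form the FIP posts switch to in the SQL: ((3*ip)+ipf)/3. The sample line is invented.

```python
def innings_approx(ip, ipf):
    """The approach described above: each extra out counts as 0.33 innings."""
    return round(ip + ipf * 0.33, 1)

def innings_exact(ip, ipf):
    """Alternative: each extra out is exactly one third of an inning."""
    return (3 * ip + ipf) / 3

# Hypothetical line: 200 and 2/3 innings, 70 earned runs, 180 strikeouts.
ip, ipf, er, k = 200, 2, 70, 180
approx = innings_approx(ip, ipf)
exact = innings_exact(ip, ipf)
print(approx, exact)
print(round(9 * er / approx, 2), round(9 * er / exact, 2))  # ERA either way
print(round(9 * k / exact, 1))                               # K/9
```

The gap between the two is small for a full season of innings, but it is a rounding error that does not need to be there.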

Here’s the code:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := round(i.ip + (i.ipf*.33),1) AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra-i.sh+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    
    
FROM players_career_pitching_stats AS i;

Run Environment 2: Run Values Part 1

I am not a statistician by any stretch of the imagination.  Nor am I a sabermetrician.  I can barely follow along with the conversation threads with the big brains on Tango’s website.  As a result, I have to take some things on faith.  Sometimes I feel good about that, sometimes not.  The things I am taking on faith for this first Run Values view are giving me heartburn.  Here’s what’s happening:

In the previous view, we established a run environment for each league year.  That is, we determined how many runs were scored for every plate appearance and for every out.  In this next view, we are adding a run value to the runs per out for every non-out batting outcome.  For example, let’s say that in 2009 the RperOut for my league was .172.  That’s basically saying that, no matter the outcome, stepping up to the plate in 2009 was worth at least .172 runs on average.  That RperOut establishes a baseline to which successful outcomes will add value.  How much value?  Here’s where the heartburn comes in.

The value added is a constant for each event.  Across all years and leagues.  A walk is worth 0.14 runs more than an out in every year, league, planet, park, etc.  It’s difficult to accept for real world historical MLB and it’s even harder to accept in OOTP where the baseball environment can be much different.  I have been looking, and will continue to look, for an explanation of how these constants were derived.  Haven’t found one yet and I’m anxious to move forward, so I will accept them for now.  However, when I start getting wacky results for my OOTP stats, this is the first place I am going to look.

It’s to do with Run Expectancy, and this Tango post and this article talk about it.  I was having some trouble parsing them, so I am leaving it for now, with some misgivings.  So to recap, in this view we are adding an expected run value to the runs per out for every non-out batting outcome.  Here’s that view:

CREATE OR REPLACE VIEW vRunValues AS
SELECT l.year
, l.league_id
, l.rperout
, l.rperout+0.14 AS runBB
, l.rperout+0.14+0.025 AS runHB
, l.rperout+0.14+0.155 AS run1b
, l.rperout+0.14+0.155+0.3 AS run2b
, l.rperout+0.14+0.155+0.3+0.27 AS run3b
, 1.4 AS runHR
, 0.2 AS runSB
, (2*l.RperOut)+0.075 AS runCS
FROM vLeagueRunsPerOut AS l;
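To make the stacking of constants easier to follow, here is the same arithmetic as a Python sketch for a single league year, using the .172 runs-per-out figure from the 2009 example. The function name and dictionary layout are mine; the constants are the ones in the view.

```python
def run_values(r_per_out):
    """Mirror of the vRunValues columns for a single league year."""
    run_bb = r_per_out + 0.14
    run_hb = run_bb + 0.025
    run_1b = run_bb + 0.155
    run_2b = run_1b + 0.3
    run_3b = run_2b + 0.27
    return {
        "runBB": run_bb, "runHB": run_hb, "run1b": run_1b,
        "run2b": run_2b, "run3b": run_3b,
        "runHR": 1.4, "runSB": 0.2,
        "runCS": 2 * r_per_out + 0.075,
    }

values = run_values(0.172)  # the 2009 example figure from the text
for name, value in values.items():
    print(name, round(value, 3))
```

Each hit type builds on the one before it, so a change in RperOut shifts everything from the walk through the triple by the same amount, while runHR and runSB stay fixed.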

Run Environment 1: League Stats

Here’s where things start to get fun. I’m going to be setting up a number of views that will allow me to calculate some of the more advanced statistics. The ones that I am most interested in are Weighted On-Base Average (wOBA), Weighted Runs Created (wRC), Weighted Runs Above Average (wRAA), and Weighted Runs Created+ (wRC+). If this blog ever becomes a book or major motion picture, I will come back and fill in some detail about what these stats are and why they are meaningful. Until then, I will leave the links to do the explaining.

I am basing all of this Run Environment work on a couple of posts and formulas posted by Tom Tango on his blog.

wOBA is the gatekeeper to all of these other stats, and it can’t be derived from a player’s individual stats alone.  It requires some baseline statistics about the run environment in which a batter plays.  The first step is to determine, for each league year, the number of Runs, Outs, Plate Appearances, Runs per Out, and Runs per PA.  I don’t have a good way, yet, to separate out sub_leagues, and Tango doesn’t seem to care at this point, so I leave it alone.

The other thing that Tango does and that I have decided to omit is limiting the summed pitching stats to those accumulated by players whose primary position is pitcher.  My experience is that non-pitchers pitch so rarely in OOTP that it is not worth creating a new view to determine primary position.

I create a view, as such:

CREATE OR REPLACE VIEW vLeagueRunsPerOut AS
SELECT p.year
, p.league_id
, SUM(p.r)/sum(p.outs) AS "RperOut"
, sum(p.r) AS "totR"
, sum(p.outs) AS "totOuts"
, sum(p.outs)+sum(p.ha)+sum(p.bb)+ sum(p.iw)+ sum(p.sh)
   + sum(p.sf) AS "totPA"
, round(sum(p.r)/(sum(p.outs)+sum(p.ha)+sum(p.bb)+ sum(p.iw)+ sum(p.sh)
   + sum(p.sf)),8) AS "RperPA"
FROM players_career_pitching_stats AS p
GROUP BY p.year, p.league_id;
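For a feel of what the view produces, here is a Python sketch that applies the same sums to two made-up pitcher rows from one league year. It copies the view's plate-appearance expression column for column; the rows and the function name are invented.

```python
def league_run_environment(rows):
    """rows: dicts holding the pitching columns that the view sums."""
    def total(col):
        return sum(r[col] for r in rows)
    tot_r, tot_outs = total("r"), total("outs")
    # Same plate-appearance expression as the view, column for column.
    tot_pa = (tot_outs + total("ha") + total("bb") + total("iw")
              + total("sh") + total("sf"))
    return {"RperOut": tot_r / tot_outs, "totR": tot_r,
            "totOuts": tot_outs, "totPA": tot_pa,
            "RperPA": round(tot_r / tot_pa, 8)}

# Two invented pitcher rows from the same league year.
rows = [
    {"r": 80, "outs": 600, "ha": 190, "bb": 50, "iw": 3, "sh": 5, "sf": 4},
    {"r": 40, "outs": 300, "ha": 90, "bb": 30, "iw": 1, "sh": 2, "sf": 3},
]
env = league_run_environment(rows)
print(env)
```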

Tables 8: Player Career Pitching Stats

Same breakdown here as the previous career stats tables, with indexes on league_id, year, player_id, team_id, and split_id.  Similar to the batting stats table, we have a couple of calculated statistics here: WPA (though I am not sure how this works for pitchers…is it per AB?), and WAR.  We also have a column that I haven’t been able to identify: ‘li’.

I will come back to this in another section, but this is the table we’ll reference when aggregating data for Run Environments as we don’t have a reliable way to group league history pitching data by subleague or of calculating outs.  On this table, outs are conveniently recorded in the ‘outs’ column.

CREATE TABLE `players_career_pitching_stats` (
  `pcps_id` int(11) NOT NULL AUTO_INCREMENT,    
  `player_id` int(11) NOT NULL,
  `year` smallint(6) NOT NULL,
  `team_id` int(11) NOT NULL,
  `game_id` int(11) DEFAULT NULL,
  `league_id` int(11) DEFAULT NULL,
  `level_id` smallint(6) DEFAULT NULL,
  `split_id` smallint(6) NOT NULL,
  `ip` smallint(6) DEFAULT NULL,
  `ab` smallint(6) DEFAULT NULL,
  `tb` smallint(6) DEFAULT NULL,
  `ha` smallint(6) DEFAULT NULL,
  `k` smallint(6) DEFAULT NULL,
  `bf` smallint(6) DEFAULT NULL,
  `rs` smallint(6) DEFAULT NULL,
  `bb` smallint(6) DEFAULT NULL,
  `r` smallint(6) DEFAULT NULL,
  `er` smallint(6) DEFAULT NULL,
  `gb` smallint(6) DEFAULT NULL,
  `fb` smallint(6) DEFAULT NULL,
  `pi` smallint(6) DEFAULT NULL,
  `ipf` smallint(6) DEFAULT NULL,
  `g` smallint(6) DEFAULT NULL,
  `gs` smallint(6) DEFAULT NULL,
  `w` smallint(6) DEFAULT NULL,
  `l` smallint(6) DEFAULT NULL,
  `s` smallint(6) DEFAULT NULL,
  `sa` smallint(6) DEFAULT NULL,
  `da` smallint(6) DEFAULT NULL,
  `sh` smallint(6) DEFAULT NULL,
  `sf` smallint(6) DEFAULT NULL,
  `ta` smallint(6) DEFAULT NULL,
  `hra` smallint(6) DEFAULT NULL,
  `bk` smallint(6) DEFAULT NULL,
  `ci` smallint(6) DEFAULT NULL,
  `iw` smallint(6) DEFAULT NULL,
  `wp` smallint(6) DEFAULT NULL,
  `hp` smallint(6) DEFAULT NULL,
  `gf` smallint(6) DEFAULT NULL,
  `dp` smallint(6) DEFAULT NULL,
  `qs` smallint(6) DEFAULT NULL,
  `svo` smallint(6) DEFAULT NULL,
  `bs` smallint(6) DEFAULT NULL,
  `ra` smallint(6) DEFAULT NULL,
  `cg` smallint(6) DEFAULT NULL,
  `sho` smallint(6) DEFAULT NULL,
  `sb` smallint(6) DEFAULT NULL,
  `cs` smallint(6) DEFAULT NULL,
  `hld` smallint(6) DEFAULT NULL,
  `ir` double DEFAULT NULL,
  `irs` double DEFAULT NULL,
  `wpa` double DEFAULT NULL,
  `li` double DEFAULT NULL,
  `stint` smallint(6) NOT NULL,
  `outs` smallint(6) DEFAULT NULL,
  `war` double DEFAULT NULL,
  PRIMARY KEY (`pcps_id`),
  INDEX `pcps_ix1` (`league_id`),
  INDEX `pcps_ix2` (year),
  INDEX `pcps_ix3` (`player_id`),
  INDEX `pcps_ix4` (`team_id`),
  INDEX `pcps_ix5` (`split_id`)  
) ENGINE=InnoDB DEFAULT CHARSET=latin1;