FIP and xFIP

Fielding independent pitching (FIP) is an alternative pitching statistic to ERA. Because ERA relies so much on the quality of the fielding behind the pitcher, and on the scorekeeper’s subjective opinion about what constitutes an error and what does not, it is not the best stat available to evaluate pitching performance. In 1999, Voros McCracken attempted to determine why Greg Maddux’s ERA ballooned from 2.22 in 1998 to 3.57 in 1999. McCracken figured out that ERA is not predictive from year to year, because batting average on balls in play (BABIP) fluctuates wildly from season to season, even when the pitcher is pitching the same way. Balls that were caught one year may start dunking in for hits the next, and inevitably some of those base runners will come around to score.

What McCracken determined was that once a ball was put into play, a pitcher no longer had control over what happened to it. Because of this, it is unfair to reward or penalize the pitcher for what happens after the ball is put into play. McCracken developed a formula that credits the pitcher with only the events that he can reasonably influence. FIP is determined by using the formula:

FIP = (13 x HR + 3 x (BB + HBP - IBB) - 2 x K) / IP + 3.2

Here HR is home runs allowed, BB is walks, HBP is hit by pitches, IBB is intentional walks, K is strikeouts, and IP is innings pitched.
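As a rough illustration, the formula can be written as a small Python function. This is only a sketch: the function name `fip` and the season line in the example are invented for illustration, and the 3.2 constant is simply the one quoted in this post.

```python
def fip(hr, bb, hbp, ibb, so, ip, constant=3.2):
    """Fielding independent pitching, per the formula quoted in this post.

    ip must be true decimal innings (167 and 2/3 innings is 167.667,
    not the box-score shorthand 167.2).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs are weighted 13, unintentional walks plus HBP 3, strikeouts -2.
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * so) / ip + constant


# Hypothetical season line, made up purely to exercise the function.
example = fip(hr=20, bb=50, hbp=5, ibb=3, so=180, ip=200.0)
print(round(example, 2))
```

Because strikeouts carry a negative weight, a pitcher who misses a lot of bats can post a low FIP even with an ordinary walk rate.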

This formula takes away anything that could reasonably be attributed to luck, and credits or debits the pitcher with only what he could control (walks, strikeouts, home runs). J.C. Bradbury sums up McCracken’s findings: “The noise of earned runs generated on balls put in play, which were randomly turned into hits or outs by the fielders, actually hindered the identification of the pitcher’s true ability. It turns out that the real reason Greg Maddux is so good is that, though he is not an overpowering strikeout pitcher, he rarely walks batters or gives up home runs. This makes DIPS (FIP) a valuable tool for disentangling responsibility for preventing runs.”

FIP and Defense Independent Pitching Statistics (DIPS, a similar formula) fail to account for park factors. A home run over the Green Monster in Fenway Park could be a routine fly ball in 29 other stadiums. Since not every ballpark has the same dimensions, it is unwise to use raw FIP to compare pitchers across different teams, as some of their home runs allowed may be attributable to their ballpark.

There is, however, a version of FIP that is less sensitive to park and league effects. It is usually called expected FIP (xFIP). Because unadjusted FIP does not account for the varying dimensions of major league ballparks, it is unfair to weight actual home runs as heavily as it does. xFIP addresses this by replacing home runs allowed with fly balls allowed, multiplied by the rate at which fly balls leave the park (about 11 percent). The formula is:

xFIP = (13 x (FB x 0.11) + 3 x (BB + HBP - IBB) - 2 x K) / IP + 3.2
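The same idea can be sketched in Python. The function name `xfip` and the example numbers are hypothetical, and the sketch assumes xFIP uses the same 3.2 constant as FIP and the 11 percent home-run-per-fly-ball rate given above.

```python
def xfip(fb, bb, hbp, ibb, so, ip, hr_per_fb=0.11, constant=3.2):
    """xFIP sketch: like FIP, but home runs are replaced by expected home runs.

    hr_per_fb and constant are assumptions taken from this post; both are
    parameters so other values can be tried.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    expected_hr = fb * hr_per_fb  # fly balls times the rate they leave the park
    return (13 * expected_hr + 3 * (bb + hbp - ibb) - 2 * so) / ip + constant


# Hypothetical season line, invented for illustration.
print(round(xfip(fb=200, bb=50, hbp=5, ibb=3, so=180, ip=200.0), 2))
```

Two pitchers with identical walks, strikeouts, and fly balls get the same xFIP here even if one of them happened to allow more actual home runs.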

By replacing home runs allowed with fly balls allowed (and the rate at which they leave the ballpark), xFIP takes a great stride toward park-adjusting a pitching statistic. xFIP is a far more accurate depiction of a pitcher’s performance than ERA and should be used accordingly. FIP and xFIP are more indicative of a pitcher’s actual performance and also more predictive of a pitcher’s future performance.

Filed under Baseball sabermetrics


“If you don’t know how to credit the fielder for what happens after a ball is put into play, you also, by definition don’t know how to debit the pitcher.  And therefore, you would never be able to say with any real certainty how good any pitcher was.” –Michael Lewis

Now that we have established the major problem with using errors to evaluate fielders, we must analyze every statistic that errors affect. This includes ERA. In the interest of full disclosure, ERA is probably the least flawed of the “traditional statistics”. It is far more indicative of performance than wins and losses. A pitcher with an ERA of 5 has not pitched as well as a pitcher with an ERA of 3 over the course of the season. The higher ERA may be slightly inflated because his defense was lousy, but rarely by as many as 2 runs per game.

The main problem with ERA is that it depends almost exclusively on the use of errors. A pitcher has very little influence on what happens to the ball once it is put into play. A pitcher’s ERA is as much a reflection on the players around him as it is a reflection on him.

It is also possible, although rare, for a pitcher to have a great win-loss record and a sparkling ERA and still not have had a good season. In 2008, Boston Red Sox pitcher Daisuke Matsuzaka had an 18-3 record with a 2.90 ERA. Both of these numbers suggest a Cy Young caliber season. However, a deeper look at his peripherals, the numbers that don’t show up on a traditional stat sheet, tells a much different story. His walk rate of 5.05 per nine innings is incredibly high, meaning he was pitching far too often with men on base. His batting average on balls in play (BABIP) of .267 was unsustainably low. A pitcher can often sustain a stretch of lucky play like this, getting by on smoke and mirrors. Inevitably, however, his peripherals regress towards the mean. Those balls in play that were outs will start dunking in for hits, and many of the men he issued free passes to will come around to score. Incredibly, Matsuzaka’s regression never arrived in 2008. He managed to ride his luck out all season. His 2008 ERA was not indicative of his actual performance, and certainly not predictive of his future performance: a forgettable 2009 campaign in which the regression hit hard.
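The two peripherals mentioned above are easy to compute. Here is a minimal sketch, assuming the common definitions BB/9 = 9 x BB / IP and BABIP = (H - HR) / (AB - K - HR + SF). The function names are made up for illustration. The 94 walks and 167.2 innings are my recollection of Matsuzaka’s 2008 line rather than figures from this post, so treat them as assumptions, and the BABIP inputs are entirely hypothetical.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings such as "167.2" (167 and 2/3) to a decimal."""
    whole, _, outs = ip_notation.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError("the digit after the point must be 0, 1, or 2 outs")
    return int(whole) + extra_outs / 3


def walks_per_nine(bb, ip):
    """Walks allowed per nine innings pitched."""
    return 9 * bb / ip


def babip(h, hr, ab, so, sf):
    """Batting average on balls in play (common definition, assumed here)."""
    return (h - hr) / (ab - so - hr + sf)


# Assumed 2008 inputs (not taken from the post); they reproduce the 5.05 quoted above.
print(round(walks_per_nine(94, innings_to_float("167.2")), 2))

# Entirely hypothetical batting-against line, for illustration only.
print(round(babip(h=150, hr=15, ab=600, so=120, sf=5), 3))
```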

Statistics such as BABIP, WHIP, and more specifically FIP and xFIP are far more indicative of a pitcher’s performance than ERA. Perhaps more importantly, these statistics are more predictive of a pitcher’s future performance, which is very useful for General Managers and fantasy managers alike.

Filed under Traditional Stats

The problem with Errors

“It is, without exception, the only major statistic in sports which is a record of what an observer thinks should have been accomplished…A talent for avoiding obvious failure is no great trait in a big league baseball player; the easiest way to not make an error is to be too slow to reach the ball in the first place.  You have to do something right to get an error, even if the ball is hit right at you, then you were standing in the right place to begin with.”- Bill James

To understand why pitching statistics are so flawed, beyond the obvious flaws with win-loss record, we must first understand why fielding statistics are flawed. Because pitching statistics like earned runs, ERA, and even WHIP are based so heavily on what a fielder does, if the statistics used to rate fielders are flawed (and they are), then it must follow that pitching statistics are also flawed. The error is currently the cornerstone of all fielding statistics. A player’s fielding percentage is determined by the share of his chances that he converts without making an error, and a pitcher’s ERA is determined by the number of runs that he gives up without the help of an error. According to J.C. Bradbury, “errors are the most commonly used statistics to judge fielders and it’s the third line in the box score after runs and hits. But the difference between an error and a hit is a highly subjective decision which an official game score keeper determines…A line drive to the gap that bounces off a fast outfielder’s glove may be scored an error, while that same ball played by a slow outfielder, who doesn’t come within ten feet of the ball may be scored a hit.”

The problem with errors is that a player is often penalized for not making a play on a ball that other players would not have even gotten to. The poster child for the problem with errors is Derek Jeter. Jeter has three Gold Gloves, awarded to him in large part because of his outstanding fielding percentages. The untold story, however, is that Jeter’s range is consistently amongst the worst in the majors. To put it another way, Jeter doesn’t make many errors because he doesn’t get to enough balls to make the play in the first place. A shortstop with superior range, like Elvis Andrus, will make more errors because he has so many more opportunities to make them. Jeter’s patented jump throw is compensation for his slow first step and limited range. However, to Jeter’s credit, his UZR and range factor have both improved in each of the last two seasons. I don’t have the statistics to back it up, but I bet Michael Kay’s calls of “past a diving Jeter” and “Jeter with the jump throw” have become less frequent as Jeter’s advanced numbers have gone up.
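To make the range argument concrete, here is a small sketch comparing two shortstops. Both stat lines are invented and do not belong to Jeter, Andrus, or any real player, and the function names are hypothetical. It assumes the usual definition of fielding percentage, (putouts + assists) / (putouts + assists + errors).

```python
def fielding_percentage(putouts, assists, errors):
    """Share of total chances handled without an error."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("player has no chances")
    return (putouts + assists) / chances


def plays_made(putouts, assists, errors):
    """Outs the fielder actually took part in; errors are not plays made."""
    return putouts + assists


# Hypothetical shortstops playing the same number of innings.
limited_range = {"putouts": 220, "assists": 360, "errors": 8}
superior_range = {"putouts": 250, "assists": 430, "errors": 18}

for name, line in (("limited range", limited_range),
                   ("superior range", superior_range)):
    print(name, round(fielding_percentage(**line), 3), plays_made(**line))
```

In this made-up example the rangier shortstop has the lower fielding percentage and more than twice the errors, yet he converts 100 more balls into outs, which is exactly the kind of value that fielding percentage hides.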

Admittedly, fielding statistics are still not perfect. However, there are several marked improvements over errors and fielding percentage. Statistics like ultimate zone rating (UZR and UZR/150), range factor, and error runs above average (ErrR) are an enormous step in the right direction.

Filed under Traditional Stats

Win-Loss Record

“To many fans, then, the primary way to measure a starting pitcher’s success is his won-lost record. Any pitcher worth his salt should win more than he loses, and a 20-win season is the hallmark of excellence. Except that there are two parts to winning a game: having your team score runs and preventing your opponent from doing so. In theory, pitchers can affect only half the equation by preventing runs. But since defense makes up a significant portion of run prevention, pitchers actually influence a fair bit less than half the equation” –Keith Woolner

A starting pitcher’s goal, whenever he takes the mound, is to get his team a win that day. When a pitcher successfully holds his opponent to fewer runs than his team scored that day, he is credited with a win. When he is unsuccessful, he is charged with a loss. Of course, as with any other statistic, it is not that simple. A pitcher may get staked to a big lead, allow five runs, and still earn a win. On the other hand, a pitcher may give up one run in a complete game and lose 1-0. Did the first pitcher pitch better than the second? Obviously not, but judged by win-loss record alone, the first pitcher was the more successful of the two.

Thankfully, many media members and fans are coming around to this way of thinking. The 2009 AL Cy Young voting was evidence of that. Had the same season played out in the early 1990s, CC Sabathia would have run away with the award. There is even debate about whether he would have won the award in 2009 had he won 20 games instead of 19. A pitcher on the Yankees, with their juggernaut offense, is often afforded more latitude, knowing that his team will score plenty of runs, whereas a pitcher for the Royals has to treat every pitch as if it’s Game 7 of the World Series. In 2009, Sabathia won 19 games with an ERA of 3.37 (not that ERA is a great statistic to use, but we’ll get into that later). Zack Greinke, on the other hand, won only 16 games for the Royals while sporting an otherworldly 2.16 ERA. Basically, wins and losses rely far too heavily on things that are outside the pitcher’s control to be taken seriously as a statistic.

Another problem with win-loss record is what happens after a starting pitcher exits the game. In today’s game very few pitchers can throw complete games with any kind of regularity, meaning that for a period of time the game is out of their hands completely. A pitcher may exit a game after a masterful 7 innings, having given up 0 earned runs and in line for a win (as happened to John Lackey against the Yankees the other night), only to have the bullpen come in and blow the game. Instead of picking up the win, the pitcher, through no fault of his own, now has a no-decision.

Instead of examining win-loss records, which are largely a matter of luck, it is far better to observe a pitcher’s peripheral statistics. A much better idea of a pitcher’s success can be garnered by observing his FIP or xFIP, WHIP, or even ERA.

Filed under Traditional Stats

Introduction to Advanced Pitching Metrics

Now that we have discussed advanced offensive metrics, it is important to note that pitching and defensive statistics are just as archaic and outdated as AVG and RBIs. For instance, ESPN columnist Keith Law drew the ire and scorn of the mainstream media for voting for Javier Vazquez to place second in the NL Cy Young voting, which may have cost Chris Carpenter the award. Columnists and fans alike pointed to Carpenter’s outstanding win-loss record and sparkling earned run average (ERA) as reasons that Carpenter deserved the award, or at least that he deserved to be ranked higher than Vazquez. This was especially hard to figure out because those same columnists and fans almost universally agreed that Zack Greinke deserved the AL Cy Young despite his low win total. It was almost as if they agreed to use sabermetrics in the American League but totally disregarded them in the National League.

In discrediting RBIs we discussed the fact that they are far too team dependent to be taken seriously. This is, essentially, the major flaw with every currently used pitching statistic too. Wins and losses, obviously, rely far too heavily on how much run support your offense provides for you, and also on how well your defense performs and how well the pitchers who relieve you do. ERA is a better indicator of success, but still depends way too much on your team’s defense.

Instead of looking at ERA or wins and losses, we should focus on things that the pitcher is directly responsible for. Some sabermetric statistics that are far better alternatives are: fielding independent pitching (FIP), expected FIP (xFIP), and walks plus hits per inning pitched (WHIP).

Filed under Baseball sabermetrics, Traditional Stats

Value Over Replacement Player (VORP)

The final strictly offensive statistic that I want to discuss is value over replacement player (VORP). VORP is one of the best statistics available to analyze a player’s offense and also serves as one of the best tools for player comparison. While most of the statistics that we have discussed so far have been fairly straightforward, VORP is admittedly more complicated. These numbers take some long calculations, but once they are determined, they can illustrate which players have the most value to a team or who contributed most to a team’s success. General Managers who are considering which free agents to sign or which players to trade for should be using VORP and its sister stat, wins above replacement (WAR).

In order to understand how to calculate a player’s value above a “replacement player”, we must first define what a replacement level player is.  Keith Woolner, of the Baseball Prospectus team of experts, defines replacement level as, “the expected level of performance a major league team will receive from one or more of the best available players who can be obtained with minimal expenditure of team resources to substitute for a suddenly unavailable player at the same position.”

Basically, “replacement level” describes the performance a team can reasonably expect if its normal starter gets injured or traded. For example, when Alex Rodriguez was injured in April of the 2009 season, the Yankees opted to promote Cody Ransom from within their system rather than sign an expensive free agent or make a trade for a high caliber third baseman. Obviously, the Yankees didn’t expect Ransom to match Rodriguez’s statistics; instead, they expected close to replacement level statistics from Ransom. As it turns out, Ransom wasn’t able to replicate even replacement level production, posting an abysmal -3.2 VORP, which basically means that he accounted for 3.2 fewer runs than a replacement level player would have. Rodriguez, on the other hand, produced a 52.3 VORP.

All of this raises the question: how do we determine “replacement level”?  Keith Woolner gives a step-by-step demonstration: “For a position with a replacement level of R percent (R=80 percent for most positions), subtract P points from the position’s average AVG/OBP/SLG, using the following formula (seen below).  Let’s simplify the math with an example.  Suppose we want to find replacement level for left field, where the league-average LF hits .270/.340/.430.  Left field is an 80 percent replacement-level position, so we plug R=80 percent into the formula and find that P is equal to 33 points.  Left field in this hypothetical league would have a replacement level of .237/.307/.397, which is 33 points below the position’s average AVG, OBP, and SLG.”
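The subtraction step in Woolner’s left field example can be sketched in a few lines of Python. The formula that turns R into P is not reproduced in this post, so the sketch simply takes P = 33 points as given. The function names are hypothetical, and rates are stored as whole “points” (thousandths) to avoid floating-point rounding.

```python
def replacement_line(avg_pts, obp_pts, slg_pts, p_pts):
    """Subtract P points from each of a position's average AVG/OBP/SLG."""
    return tuple(rate - p_pts for rate in (avg_pts, obp_pts, slg_pts))


def format_line(points):
    """Render points as a slash line, e.g. (237, 307, 397) -> '.237/.307/.397'."""
    return "/".join(f"{p / 1000:.3f}".lstrip("0") for p in points)


# League-average left fielder from the quoted example: .270/.340/.430, P = 33.
league_average_lf = (270, 340, 430)
replacement_lf = replacement_line(*league_average_lf, p_pts=33)
print(format_line(replacement_lf))
```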

By comparing the statistics of an established player, like Rodriguez, with those of a replacement level player, like Ransom, General Managers can make informed decisions about free agents and trades. The replacement level statistics can be converted into runs created (RC or RC/27) to determine RC over replacement level. For every 10 runs created above replacement level, a team gains about one win.
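The runs-to-wins rule of thumb can be applied directly to the Rodriguez and Ransom figures quoted earlier (52.3 and -3.2 VORP). This is a back-of-the-envelope sketch: the function name is made up, and it treats “10 runs is about one win” as exact.

```python
RUNS_PER_WIN = 10.0  # rule of thumb from this post, treated as exact here


def wins_gained(player_vorp, substitute_vorp, runs_per_win=RUNS_PER_WIN):
    """Approximate wins a team gains by playing one player instead of another."""
    run_gap = player_vorp - substitute_vorp
    return run_gap / runs_per_win


# VORP figures quoted above: Rodriguez 52.3, Ransom -3.2.
print(round(wins_gained(52.3, -3.2), 2))
```

By this rough measure the gap between the two was worth about five and a half wins, although the two VORP totals were compiled over different amounts of playing time.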

VORP is arguably the best available offensive statistic in use today. The biggest problem with VORP is that it only accounts for offense. This is fine if the players are similar on defense, or are both designated hitters, but if there is a wide gap in defense, we should look at WAR because it accounts for both offense and defense.

Filed under Baseball sabermetrics

Runs Created (RC and RC/27)

“With regard to an offensive player, the first key question is how many runs have resulted from what he has done with the bat and on the base paths.  Willie McCovey hit .270 in his career, with 353 doubles, 46 triples, 521 home runs and 1,345 walks—but his job was not to hit doubles, nor to hit singles, nor to hit triples, nor to draw walks or even hit home runs, but rather to put runs on the scoreboard.” –Bill James

The goal of any team is to win. You win by scoring more runs than your opponent. You do not automatically win because you had more hits than your opponent, or because you walked more times than your opponent, although these things help. The only way to win the game is to cross home plate more times than your opponent did. Using this philosophy, why aren’t players judged the same way? Why aren’t players judged by their ability to create runs, thereby contributing to the overall goal of the team? Using the runs created (RC) formula, we can measure a player’s success or failure through the most important lens in the game.

OBP, SLG, wOBA, OPS, and other advanced offensive metrics are all great indicators of how well a player contributed to his team. RC, however, more directly measures how well a player contributed to the team’s pursuit of victory. The run is the most important part of the game. It is the goal of every at-bat. RC can quantify how many runs a player is worth, and by proxy how many wins a player is worth. RC is calculated by multiplying OBP by total bases. RC/27 expresses runs created as a per-game rate: divide RC by the number of outs the player made, then multiply by 27 (the number of outs in a regulation game) to determine how many runs a player is worth per game.
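Here is a minimal sketch of the basic calculation. It assumes the common basic form RC = (H + BB) x TB / (AB + BB), which is roughly OBP times total bases, and the usual per-game form of RC/27, which divides by outs made (approximated as AB - H) and multiplies by 27. The function names and the batting line are invented for illustration.

```python
def runs_created(h, bb, tb, ab):
    """Basic runs created: times on base times total bases, per opportunity."""
    opportunities = ab + bb
    if opportunities == 0:
        raise ValueError("player has no plate appearances")
    return (h + bb) * tb / opportunities


def runs_created_per_27(h, bb, tb, ab):
    """Runs created per 27 outs, approximating outs made as AB - H."""
    outs = ab - h
    if outs <= 0:
        raise ValueError("player has made no outs")
    return runs_created(h, bb, tb, ab) / outs * 27


# Hypothetical batting line, made up for illustration only.
season = {"h": 165, "bb": 70, "tb": 290, "ab": 550}
print(round(runs_created(**season), 1))
print(round(runs_created_per_27(**season), 2))
```

The second number answers the question “how many runs per game would a lineup of nine of this hitter score?”, which is why the outs, not the games played, sit in the denominator.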

There is also a more technical version of runs created that takes every event into consideration.  In addition to OBP, technical RC considers hit by pitches, stealing and caught stealing, grounded into double plays, intentional walks, sacrifice hits and sacrifice flies.  The more advanced formula can be found below.

RC and RC/27 are widely considered to be among the most accurate measures of offensive contribution today. When team totals are plugged into the RC formula, even the basic version, it can approximate the number of runs the team actually scored to within 5 percent for the entire season. RC is also an integral part of value over replacement player (VORP) and wins above replacement (WAR), which are also useful for player comparisons.

Filed under Baseball sabermetrics