Introducing sERA: A New Way of Measuring Pitchers


The choice of Corbin Burnes over Zack Wheeler for the 2021 National League Cy Young Award sparked much controversy at the time, as voters grappled with Burnes having a record-setting FIP of only 1.63 over 167 innings, while Wheeler pitched 213.1 innings with a 2.78 ERA.

A simple way to evaluate the race would be to use wins above replacement (WAR), but the multiple versions of WAR disagreed on the winner. FanGraphs WAR (fWAR), which uses fielding independent pitching, had Burnes ahead by 0.3 WAR. Baseball-Reference WAR (bWAR), which uses a combination of runs allowed and a team defensive adjustment, had Wheeler ahead of Burnes by 2 WAR.

And RA9-WAR, which simply uses a pitcher’s runs allowed per nine innings without accounting for defense, had Walker Buehler as the winner, ahead of Wheeler by 0.7 WAR. Each of these stats and versions of WAR has its own flaws, and the result is a controversial decision and confusion for baseball fans. In a sport that can measure the run value of each at-bat, there is no reason for the value of pitching to be this complicated.

Problems With Current Metrics

At its essence, pitching is evaluated using two stats: earned run average (ERA) and fielding independent pitching (FIP). ERA attempts to measure the runs a pitcher is responsible for by taking their runs allowed, subtracting runs caused by errors, and expressing the remaining earned runs per nine innings. In doing so, ERA attempts to account for the effect of the defense on a pitcher, but it is not truly reflective of performance for a couple of reasons.

First, errors are subjective, since they are determined by an official scorer, and they punish players for failing to get outs in certain ways but not in others. For example, a misplayed routine ball is charged as an error, while a ball that a slow fielder never reaches is not.

Second, errors do not account for the runs prevented by a good defense. As a result, errors do not properly measure the effect of defense on pitchers, and ERA can be misleading beyond simply batted ball luck.

On the other end of the scale, FIP excludes batted ball data entirely, simply using a weighted combination of strikeouts, walks, and home runs to estimate a pitcher’s runs allowed per nine innings. FIP is a good stat for projecting future performance because batted ball results are subject to a lot of luck, but it isn’t always a great stat for measuring past performance. By excluding defense entirely, FIP does not have the subjectivity of ERA in determining the runs a pitcher is responsible for.

But by excluding all batted ball data, FIP doesn’t account for the fact that certain types of batted balls (ground balls, weak contact) are more likely to turn into outs than other types (line drives, fly balls). Pitchers who produce high ground ball rates, like Framber Valdez, are likely to outperform their FIP. There are a variety of other peripheral stats that are great at predicting future runs allowed by pitchers, but they are unable to measure past performance while accounting for defense properly.
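To make the blind spot concrete, here is a minimal sketch of FIP in its commonly published form. The hit-by-pitch term is part of that public formula even though I did not list it above, and the constant of 3.10 is only a placeholder assumption: the real constant is reset every season so that league FIP matches league ERA. Both stat lines are made up for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching in its commonly published form.

    `constant` is a placeholder assumption; the real value changes by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two made-up pitchers with identical strikeouts, walks, and home runs.
# One might be an extreme ground ball pitcher and the other a fly ball
# pitcher, but FIP has no input that could tell them apart.
ground_ball_pitcher = fip(hr=20, bb=50, hbp=5, k=200, ip=180)
fly_ball_pitcher = fip(hr=20, bb=50, hbp=5, k=200, ip=180)

print(f"{ground_ball_pitcher:.2f} {fly_ball_pitcher:.2f}")
```

Because nothing about batted balls enters the function, any difference in how often contact turns into outs has to show up somewhere other than FIP.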

A Way of Including Defense and Batted-Ball Luck

sERA, which stands for Statcast earned run average, is a stat that I developed that attempts to account for the defensive performance behind a pitcher while still including the batted ball luck that every pitcher is subject to. bWAR essentially does this in its calculation, but it uses overall team defense as opposed to the defense behind a particular pitcher, and it doesn’t come with a stat for runs allowed per nine innings.

The formula for sERA is quite simple. It takes a pitcher’s runs allowed and uses Statcast’s Runs Prevented stat for that pitcher to determine their expected number of runs assuming an average defense behind them. The leaderboard for pitchers with a minimum of 30 innings can be found here, and the leaderboard for those with a minimum of 100 innings can be found here.

Specifically, a pitcher’s runs prevented total from Statcast is added to their runs allowed to get their expected number of runs (xRuns), which is then scaled to nine innings. Looking at the charts below, we can see that Hector Neris was extremely unlucky with his defense, while Adam Wainwright greatly benefitted from the amazing Cardinals defense. The full list can be accessed here.

One thing that may stand out is that having lower or higher xRuns doesn’t necessarily lead to an sERA that is lower or higher than a pitcher’s ERA. For example, Joe Mantiply had a 3.40 ERA and -7 runs prevented, but a 3.86 sERA. This is because he allowed only 15 earned runs: while a scorekeeper determined that errors cost him 9 runs, his defense might have saved him 2 runs on other plays, for a net of -7. Since sERA is on a runs-allowed basis and factors in both good and bad fielding plays, it can’t be compared directly with ERA.
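The Mantiply numbers can be checked with a short sketch of the calculation. The function names are mine, and two inputs are inferred rather than quoted: the 39 2/3 innings is what a 3.40 ERA on 15 earned runs implies, and the 24 total runs allowed is the 15 earned runs plus the 9 runs attributed to errors.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings such as "39.2" (39 and 2/3) to a decimal."""
    whole, _, thirds = str(ip_notation).partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def s_era(runs_allowed, runs_prevented, innings):
    """Expected runs behind an average defense, scaled to nine innings."""
    # A negative runs prevented total (bad defense) lowers xRuns.
    x_runs = runs_allowed + runs_prevented
    return 9 * x_runs / innings


# Joe Mantiply, using the figures quoted above.
innings = innings_to_float("39.2")  # assumption: inferred from ERA and earned runs
earned_runs = 15
runs_allowed = earned_runs + 9      # 9 runs attributed to errors by the scorer
runs_prevented = -7

print(f"ERA:  {9 * earned_runs / innings:.2f}")
print(f"sERA: {s_era(runs_allowed, runs_prevented, innings):.2f}")
```

Under those assumptions this prints an ERA of 3.40 and an sERA of 3.86, matching the leaderboard values: 17 expected runs over 39 2/3 innings, compared with 15 earned runs over the same innings.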

The way we can use sERA is by converting it to the same scale as ERA+, a more standardized metric. ERA+ takes a pitcher’s ERA and expresses it relative to the league average, which is set to 100. For example, Chris Bassitt was 30% better than league average in ERA in 2021, so he has an ERA+ of 130. By doing the same thing with sERA to create sERA+, we can compare the two metrics. A list of the two metrics for pitchers with 30 innings can be found here, and one with 100 innings can be found here.

sERA+ vs ERA+ (minimum 100 IP)

Normally, ERA+ is adjusted for league (the NL did not have a DH in 2021) and ballpark (Coors is a lot easier to hit in than the Coliseum). For the sake of simplicity, I did not do this, which is why the values shown here may not match what you see on a site like Baseball-Reference. Additionally, the league average for each metric is taken from the sample of pitchers with at least 30 innings pitched, which makes the league average slightly lower than normal, as the worst pitchers are often not allowed to throw 30 MLB innings. With that said, the comparison between the two metrics is still valid, since neither is adjusted and both use the same sample for league average.
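As a rough sketch of the unadjusted version described above, the same helper can turn either ERA or sERA into a plus stat. Two things here are assumptions rather than something spelled out in this article: the conventional form of 100 times league rate divided by pitcher rate, and an innings-weighted league average. The sample rows are invented placeholders, not real pitchers.

```python
def league_rate(sample, min_ip=30):
    """Innings-weighted runs per nine over pitchers meeting the innings minimum.

    `sample` is a list of (runs, innings) pairs; runs can be earned runs
    (for ERA+) or xRuns (for sERA+).
    """
    qualified = [(runs, ip) for runs, ip in sample if ip >= min_ip]
    total_runs = sum(runs for runs, _ in qualified)
    total_ip = sum(ip for _, ip in qualified)
    return 9 * total_runs / total_ip


def plus_stat(pitcher_rate, league_avg_rate):
    """Unadjusted plus stat: 100 is league average, higher is better."""
    return 100 * league_avg_rate / pitcher_rate


# Invented sample: the 12-inning pitcher is excluded by the 30-inning cutoff.
sample = [(20, 60.0), (45, 90.0), (10, 12.0)]
league = league_rate(sample)  # 9 * 65 / 150 = 3.90

print(round(plus_stat(3.00, league)))
```

Feeding earned runs into `league_rate` gives the baseline for ERA+, while feeding xRuns gives the baseline for sERA+, so both metrics are measured against the same group of pitchers.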

Which Pitchers Were Affected the Most?

By looking at the difference between ERA+ and sERA+, we can see which pitchers’ ERAs were most misleading in the 2021 season. The list of pitchers with 30 innings is here, and 100 innings here. One pitcher to note is Marcus Stroman, who is moving from an exceptional Mets defense to a more modest Cubs one. His sERA+ was 20 points lower than his ERA+, which indicates that his run prevention may not hold up as well as a Cub next season. On the other side, the two Cy Young Award winners, Corbin Burnes and Robbie Ray, were the two pitchers with the largest negative difference between ERA+ and sERA+. This truly underscores how great each of their seasons was, and further validates the decision by the voters.
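The sign of the difference is easy to flip by accident, so here is a small sketch of the ranking. The names and values are invented placeholders, not entries from the actual leaderboards.

```python
def rank_by_gap(rows):
    """Sort (name, era_plus, s_era_plus) rows by ERA+ minus sERA+, largest first."""
    gaps = [(name, era_plus - s_era_plus) for name, era_plus, s_era_plus in rows]
    return sorted(gaps, key=lambda row: row[1], reverse=True)


# Invented placeholder rows.
rows = [
    ("Pitcher A", 120, 100),  # positive gap: ERA+ flatters him, defense helped
    ("Pitcher B", 140, 165),  # negative gap: he was better than his ERA+ shows
    ("Pitcher C", 95, 97),
]

for name, gap in rank_by_gap(rows):
    print(f"{name}: {gap:+d}")
```

A positive gap corresponds to a case like Stroman, whose ERA+ was helped by his defense, while a large negative gap corresponds to a case like Burnes or Ray.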

Limitations

Since sERA is a stat made for measuring past production and not predicting future production, I did not look to see if it correlates well with future ERA+. Additionally, sERA can only be looked at on a rate basis, as there is no sWAR stat to accompany it and measure the volume of a pitcher’s production.

How Should sERA Be Used?

When voting for end-of-year awards like the All-MLB team and the Cy Young Award, voters have a lot of ways to determine the winner. If they want to just measure the results on the field, voters can use a stat like RA9. If they want to eliminate luck entirely and purely measure a pitcher’s ability, they can use a peripheral stat such as SIERA. But since most voters use ERA to determine awards right now, I ask that they start using sERA, since it more accurately measures the help pitchers receive defensively.

Follow me on Twitter @LeoCardozoMLB for more of my content. Don’t forget to listen to our baseball podcast Cheap Seats Chatter! We’ll see ya there!
