MLB Predictions

Because starting pitching is so important to the outcome of a baseball game, I include this data whenever possible. Unlike many stat-based systems, my analysis uses no information other than the teams and starting pitchers, the date and location of the game, and the final score. One consequence is that a pitcher's listed record is his team's record in the games he starts, regardless of whether he actually got the decision.

Team ratings effectively measure the team's strength minus its starting pitching. Hence the offensive rating measures run production, and the defensive rating measures fielding and the bullpen.

Pitchers are rated mostly on their defensive contribution to the game (preventing the other team from scoring). The offensive rating for a pitcher is just an artifact of the run support that pitcher receives.

The model I have created may seem overly simplistic, but I believe it is quite elegant and robust. For example, it accounts for pitcher/hitter ballparks, and allows for pitchers to be traded without altering the ratings.

Game predictions are adjusted by the starting pitcher ratings (off, def, and hf) in a manner consistent with the formulas that use team ratings to generate predictions. Posted predictions include additional adjustments, such as those for scoring differences between parks.

Until further notice, only the Massey ratings will be updated for MLB, since the other models are not yet able to incorporate the pitcher data.

Making Predictions

A fascinating aspect of computer ratings is their ability to make predictions about games yet to be played. The MRR web site publishes results for three distinct rating systems, and the prediction pages list them side-by-side for easy comparison. Because of differences in the rating models, the predictions will not always agree.

Computer predictions, while insightful, should not be taken too seriously. Please read the disclaimer.

Prediction Pages

I post the predictions for many sports (MLB, NFL, NHL, NBA, college football, college basketball) on the web. Ideally, these are updated daily (or weekly in the case of football). Because of travel or lack of time, updates are sometimes delayed.

Because of my involvement with the Bowl Championship Series (BCS), I will not post predictions for 1999 College Football. They may still be obtained directly from the ratings via the formulas listed.

Understanding the Predictions

Beside each game, you will find the predictions of the three rating systems (Massey, Sauceda, and E-Ratings). Massey predicts the exact final score, while the other two predict only the point spread, which is listed beside the favored team. All three list, as a percentage, their confidence that the favored team will actually win the game. So if Team A has 67% beside it, then according to that rating system we expect Team A to win approximately 67% of games against its opponent, Team B. Consequently, barring ties, B has a 100 - 67 = 33% chance of winning the game.

Massey 3.0

These ratings are designed to measure the past and not to predict the future. However, the Off, Def, and H ratings can be used to predict not only the winner of a game but also the final score. The predictions will not always agree with the team rankings (i.e. the #3 team may be predicted to beat the #1 team). Hypothetical predictions follow the formulae below, where the upper sign applies when the game is at A's homefield and the lower sign when it is at B's:

Ascore = Aoff - Bdef +/- (Ah + Bh)/4
Bscore = Boff - Adef -/+ (Ah + Bh)/4

Let's look at an example. Suppose we have the following:


              Off             Def           H
             ----            ----        ----
Team A       26.4            -3.1         3.5
Team B       21.8             2.6         4.0

The predicted score if the game were played at A's homefield would be:

Ascore = 26.4 - 2.6 + (3.5 + 4.0)/4 = 25.675
Bscore = 21.8 - (-3.1) - (3.5 + 4.0)/4 = 23.025

If instead the game is at B, then reverse the signs on the homefield term:

Ascore = 26.4 - 2.6 - (3.5 + 4.0)/4 = 21.925
Bscore = 21.8 - (-3.1) + (3.5 + 4.0)/4 = 26.775

Of course if the game is at a neutral site then simply omit the homefield term altogether.

Ascore = 26.4 - 2.6 = 23.8
Bscore = 21.8 - (-3.1) = 24.9
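
The three cases above can be checked with a few lines of Python. This is only a minimal sketch of the formulae as written, not the code behind the posted predictions; the names TeamRating, predict_score, and the site argument are illustrative choices made here.

```python
from dataclasses import dataclass


@dataclass
class TeamRating:
    """Massey 3.0 ratings for one team (class name is illustrative)."""
    off: float      # offensive rating
    defense: float  # defensive rating ("Def" in the table)
    h: float        # homefield rating


def predict_score(a: TeamRating, b: TeamRating, site: str = "neutral"):
    """Return (Ascore, Bscore); site is "A", "B", or "neutral"."""
    # +1 when A is at home, -1 when B is at home, 0 at a neutral site.
    sign = {"A": 1.0, "B": -1.0, "neutral": 0.0}[site]
    home_term = sign * (a.h + b.h) / 4
    a_score = a.off - b.defense + home_term
    b_score = b.off - a.defense - home_term
    return a_score, b_score


# Ratings from the example table.
team_a = TeamRating(off=26.4, defense=-3.1, h=3.5)
team_b = TeamRating(off=21.8, defense=2.6, h=4.0)

for site in ("A", "B", "neutral"):
    a_score, b_score = predict_score(team_a, team_b, site)
    print(f"site={site}: Ascore={a_score:.3f}, Bscore={b_score:.3f}")
```

Using a single signed homefield term keeps the three cases in one function, so the home, away, and neutral predictions cannot drift apart.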

Obviously a team can't score a fraction of a point. A prediction has a decimal part because it represents the average score we might expect if the game could be played many times. It is OK to round to the nearest integer (or, in the case of football, to the nearest likely point total, since scores like 5 or 8 are quite rare).

If we know the predicted score, the margin of victory and over / under predictions are given by:

Margin = Ascore - Bscore
Over / Under = Ascore + Bscore

If Ascore > Bscore, team A is favored by that margin; if the margin comes out negative, team B is favored by Bscore - Ascore. So in the previous example we would favor A by (25.675 - 23.025) = 2.65 points at A's homefield, but would favor B by (26.775 - 21.925) = 4.85 points at B's homefield. The predicted Over / Under is always (25.675 + 23.025) = (26.775 + 21.925) = 48.7 points regardless of the site of the game.
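
The margin and Over / Under arithmetic can be sketched the same way. The function name margin_and_total and the "A" / "B" labels are illustrative; the input scores are the predicted scores from the example.

```python
def margin_and_total(a_score: float, b_score: float):
    """Return (favorite, margin, over_under) from two predicted scores."""
    diff = a_score - b_score
    if diff > 0:
        favorite = "A"
    elif diff < 0:
        favorite = "B"
    else:
        favorite = None  # dead even, no favorite
    # The margin is reported as a positive number beside the favorite.
    return favorite, abs(diff), a_score + b_score


# Predicted scores from the example: game at A, then game at B.
at_a = margin_and_total(25.675, 23.025)
at_b = margin_and_total(21.925, 26.775)
print(at_a)
print(at_b)
```

The homefield term is added to one score and subtracted from the other, so it cancels in the sum; that is why the Over / Under is the same at either site.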

Probabilities are computed from the ratings by considering the predicted margin of victory and assuming a normal distribution of possible game results. The standard deviation is estimated from previous games.
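
As a rough sketch of this step, suppose the actual margin is normally distributed around the predicted margin with standard deviation sigma. The probability that the favorite wins is then the normal CDF evaluated at margin / sigma. The function name win_probability and the value of SIGMA below are assumptions for illustration only; the real standard deviation is estimated from previous games and is not given here.

```python
import math


def win_probability(predicted_margin: float, sigma: float) -> float:
    """P(actual margin > 0), assuming actual ~ Normal(predicted_margin, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = predicted_margin / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


SIGMA = 14.0  # hypothetical placeholder, NOT a value from the ratings

p_a = win_probability(2.65, SIGMA)   # A favored by 2.65 at home
p_b = win_probability(-2.65, SIGMA)  # the same game from B's side
print(f"A wins: {p_a:.1%}, B wins: {p_b:.1%}")
```

Under this assumption the two probabilities always sum to 100%, which matches how the percentages are read on the prediction pages (barring ties).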

Massey 2.1

Version 2.1 also predicts the final score, but uses a slightly simpler formula because there is a single universal homefield constant, h. As before, the upper sign applies when the game is at A's homefield.

Ascore = Aoff - Bdef +/- h/2
Bscore = Boff - Adef -/+ h/2

universal homefield = 3.0

              Off             Def
             ----            ----
Team A       26.4            -3.1
Team B       21.8             2.6

The predicted score if the game were played at A's homefield would be:

Ascore = 26.4 - 2.6 + 3.0/2 = 25.3
Bscore = 21.8 - (-3.1) - 3.0/2 = 23.4

If instead the game is at B, then reverse the signs on the homefield term:

Ascore = 26.4 - 2.6 - 3.0/2 = 22.3
Bscore = 21.8 - (-3.1) + 3.0/2 = 26.4

Of course if the game is at a neutral site then simply omit the homefield term altogether.

Ascore = 26.4 - 2.6 = 23.8
Bscore = 21.8 - (-3.1) = 24.9
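
A minimal Python sketch of the version 2.1 formula follows. The function name predict_score_v21 and its argument names are illustrative; the default h is the universal homefield value of 3.0 used in the example.

```python
UNIVERSAL_HOMEFIELD = 3.0  # the universal homefield constant from the example


def predict_score_v21(a_off, a_def, b_off, b_def,
                      site="neutral", h=UNIVERSAL_HOMEFIELD):
    """Return (Ascore, Bscore); site is "A", "B", or "neutral"."""
    # +1 when A is at home, -1 when B is at home, 0 at a neutral site.
    sign = {"A": 1.0, "B": -1.0, "neutral": 0.0}[site]
    home_term = sign * h / 2
    a_score = a_off - b_def + home_term
    b_score = b_off - a_def - home_term
    return a_score, b_score


for site in ("A", "B", "neutral"):
    a_score, b_score = predict_score_v21(26.4, -3.1, 21.8, 2.6, site=site)
    print(f"site={site}: Ascore={a_score:.1f}, Bscore={b_score:.1f}")
```

Each team gets half of h, so the home team's total advantage in the margin is exactly h.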

Sauceda

On its own, the Sauceda rating system can predict the probability of a game's outcome, but not the margin of victory. However, since the ratings are linear, I do a best fit to determine the scaling adjustment that translates rating differences into predicted point margins.
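
One way such a best fit could look is a least-squares line through the origin relating rating differences to observed margins. This is only a guess at the general idea, not the actual fitting procedure; the function name fit_scale, the choice of a fit with no intercept, and the numbers below are all assumptions made up for illustration.

```python
def fit_scale(rating_diffs, margins):
    """Least-squares slope through the origin: margin ~= scale * rating_diff."""
    numerator = sum(d * m for d, m in zip(rating_diffs, margins))
    denominator = sum(d * d for d in rating_diffs)
    if denominator == 0:
        raise ValueError("need at least one nonzero rating difference")
    return numerator / denominator


# Made-up numbers purely for illustration, not real ratings or results.
past_rating_diffs = [0.5, -1.0, 2.0, 1.5]
past_margins = [3.0, -7.0, 13.0, 10.0]

scale = fit_scale(past_rating_diffs, past_margins)
# Translate a new (hypothetical) rating difference into a point margin.
predicted_margin = scale * 1.2
print(f"scale={scale:.2f}, predicted margin={predicted_margin:.2f}")
```

A fit through the origin keeps equally rated teams at a predicted margin of zero, which seems the natural choice when only a scaling adjustment is wanted.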

E-Ratings

On its own, the E-Rating system can predict the winner of the game and the score ratio. However, I have developed a scheme for translating this information into a probability of victory as well. The E-Ratings are on an exponential scale, so after taking logarithms and choosing an appropriate scale factor, it is also possible to estimate margins of victory.
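
The logarithm step might be sketched as follows. This assumes that the predicted score ratio is the ratio of the two teams' E-Ratings and that the margin is a constant times the log of that ratio; neither the exact formula nor the scale factor is given here, so the function name e_rating_margin, the ratings, and the scale value are hypothetical.

```python
import math


def e_rating_margin(e_a: float, e_b: float, scale: float) -> float:
    """Estimated margin for A over B: scale * ln(e_a / e_b)."""
    if e_a <= 0 or e_b <= 0:
        raise ValueError("exponential-scale ratings must be positive")
    # The log of the ratio is the difference of the logs.
    return scale * (math.log(e_a) - math.log(e_b))


SCALE = 7.0  # hypothetical scale factor, for illustration only

# Hypothetical ratings: A is rated 1.5 times as high as B.
margin = e_rating_margin(1.5, 1.0, SCALE)
print(f"A favored by {margin:.2f}")
```

Taking logarithms turns ratios into differences, so swapping the two teams simply flips the sign of the estimated margin.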

Kenneth Massey
June 2, 2002 Massey Ratings