Massey Ratings Description

Key

Rankings are listed in parentheses beside the ratings.


The content below is older and may be outdated or incomplete.

The current version (3.0) of the Massey Ratings is described informally below.

Introduction

This is a brief description of the Massey Rating model. To keep it short and accessible, many technical details are omitted. Since I began this hobby of computer ratings, the model has undergone several revisions. Overall, the Massey Ratings have been published since December 1995. Version 3.0 has been implemented since August 1999.

I believe that the Massey Ratings are the most scientific and full-featured system available for analyzing the performance of members of a competitive league. Future improvements are likely in an effort to further refine the model to accurately reflect the nature of sports.

Goals

The first challenge for any computer rating system is to account for the variability in performance. A team will not always play up to its full potential. Other random factors (officiating, bounce of the ball) may also affect the outcome of a game. The computer must somehow eliminate the "noise" which obscures the true strength of a team.

The second goal is to account for the differences in schedule. When there is a large disparity in schedule strength, win-loss records lose their significance. The computer must evaluate games involving mismatched opponents, as well as contests between well matched teams.

It is necessary to achieve a reasonable balance between rewarding teams for wins, convincing wins, and playing a tough schedule. This issue is difficult to resolve, and rating systems exist that are based on each of the extremes.

Inputs

Only the score, venue, and date of each game are used to calculate the Massey ratings. Stats such as rushing yards, rebounds, or field-goal percentage are not included. Nor are game conditions such as weather, crowd noise, day/night, or grass/artificial turf. Overtime games are not treated any differently. Finally, neither injuries nor psychological factors like motivation are considered. While none of these are analyzed explicitly, they may be implicitly manifested through the game scores.

Predictions

The Massey Ratings are designed to measure past performance, not necessarily to predict future outcomes.

Rating

The overall team rating is a merit-based quantity, and is the result of applying a Bayesian win-loss correction to the power rating.

Power

In contrast to the overall rating, the Power is a better measure of potential and is less concerned with actual wins and losses.

Offense / Defense

A team's Offense power rating essentially measures the ability to score points. This does not distinguish how points are scored, so good defensive play that leads to scoring will be reflected in the Offense rating. In general, the offensive rating can be interpreted as the number of points a team would be expected to score against an average defense.

Similarly, a team's Defense power rating reflects the ability to prevent its opponent from scoring. An average defense will be rated at zero. Positive or negative defensive ratings would respectively lower or raise the opponent's expected score accordingly.
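
For illustration only, here is a rough sketch in Python of how these numbers could be read. Treating the expected score as the offense rating minus the opposing defense rating is an assumption based on the interpretation above, not a published formula:

    def expected_points(offense_a, defense_b):
        """Rough expected score for team A's offense against team B's defense.

        Assumes the offense rating is the points expected against an average
        (zero-rated) defense, and that a positive defense rating lowers the
        opponent's expected score point for point.  Illustrative only.
        """
        return offense_a - defense_b

    # Example: an offense rated 30 facing a defense rated +4
    print(expected_points(30.0, 4.0))   # 26.0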

It should be emphasized that the Off/Def breakdown is simply a post-processing step, and as such has no bearing on the overall rating. A consequence of this is that the Off/Def ratings may not always match actual production numbers. A team that routinely wins close games may have somewhat inflated Off/Def ratings to reflect the fact that they are likely to play well when they have to. Winning games requires not just the ability to score points, but also teamwork, mental strength, and consistency. The Off/Def breakdown is simply an estimate of how much of a team's strength can be attributed to good offensive and defensive play respectively.

Home Advantage

Each team's home advantage is estimated based on the difference in performance when at home and on the road. Ratings and schedule strength both depend on where the games are played.

Schedule

The difficulty of each team's schedule is measured in the Sched column. It depends on the quality of each opponent, adjusted for the homefield advantage. More details are available. Note that the schedule strength only represents games played to date.

Standard Deviation

As was mentioned before, the Massey model will in some sense minimize the unexplained error (noise). Upsets will occur and it is impossible (and also counter-productive) to get an exact fit to the actual game outcomes. Hence, I publish an estimated standard deviation. About 68% of observed game results will fall within one standard deviation of the expected ("average") result.
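
For example, the role of the standard deviation can be illustrated numerically; the expected margin and standard deviation below are made-up numbers, and scipy is used only for the normal CDF:

    from scipy.stats import norm

    expected_margin = 7.0   # hypothetical expected result (points)
    sigma = 10.0            # hypothetical published standard deviation

    # About 68% of observed results should fall within one standard
    # deviation of the expected result, i.e. between -3 and +17 here.
    p = norm.cdf(expected_margin + sigma, expected_margin, sigma) \
        - norm.cdf(expected_margin - sigma, expected_margin, sigma)
    print(round(p, 3))      # 0.683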

Conference Ratings

Below the team ratings, you will find a listing of the leagues, conferences, and divisions. The win/loss records include only inter-conference games. For intra-conference games, the collective win/loss percentage is always 50%, so including them adds no information. A conference's rating is determined by averaging the ratings of its members.
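
The averaging itself is straightforward; here is a minimal sketch (the team ratings are made up):

    def conference_rating(member_ratings):
        """A conference's rating is the average of its members' ratings."""
        return sum(member_ratings) / len(member_ratings)

    print(round(conference_rating([8.2, 5.1, 3.7, -1.4]), 2))   # 3.9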

The Parity column measures how well matched the teams in a conference are. A value of 1 indicates perfect parity - there is complete balance from top to bottom. In contrast, a parity near 0 indicates that there is great disparity between the good and bad teams in the conference.

Preseason Ratings

Preseason ratings are typically derived as a weighted average of previous years' final ratings. As the current season progresses, their effect gets damped out completely. The only purpose preseason ratings serve is to provide a reasonable starting point for the computer. Mathematically, they guarantee a unique solution to the equations early in the season when not enough data is available yet.
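
As a sketch of the idea, a preseason rating might be computed as follows. The 0.7/0.3 weights and the two-year window are my own illustrative assumptions, not the published weighting:

    def preseason_rating(final_ratings, weights=(0.7, 0.3)):
        """Weighted average of previous years' final ratings, most recent year first."""
        return sum(w * r for w, r in zip(weights, final_ratings))

    print(round(preseason_rating([6.0, 2.0]), 2))   # 0.7*6.0 + 0.3*2.0 = 4.8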

Ratings Overview

In essence, each game "connects" two teams via an equation. As more games are played, eventually each team is connected to every other team through some chain of games. When this happens, the system of equations is coupled and a computer is necessary to solve them simultaneously.

The ratings are totally interdependent, so that a team's rating is affected by games in which it didn't even play. The solution therefore effectively depends on an infinite chain of opponents, opponents' opponents, opponents' opponents' opponents, etc. The final ratings represent a state of equilibrium in which each team's rating is exactly balanced by its good and bad performances.
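
To make the idea of games "connecting" teams concrete, here is a much-simplified linear analogue in Python: an ordinary least-squares fit to point margins. This is not the version 3.0 model described below, just an illustration of a coupled system of equations, and the games are made up:

    import numpy as np

    # (winner index, loser index, winner points, loser points) -- made-up games
    games = [(0, 1, 30, 14), (1, 2, 27, 24), (2, 0, 21, 10)]
    n_teams = 3

    # One equation per game: rating[i] - rating[j] ~= margin, plus one extra
    # row pinning the average rating at zero so the solution is unique.
    A = np.zeros((len(games) + 1, n_teams))
    b = np.zeros(len(games) + 1)
    for k, (i, j, pi, pj) in enumerate(games):
        A[k, i], A[k, j], b[k] = 1.0, -1.0, pi - pj
    A[-1, :] = 1.0

    ratings, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(ratings)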

Game Outcome Function (GOF)

Given the score of a game, GOF(pA,pB) assigns a number between 0 and 1 that estimates the probability that team A would win a rematch under the same conditions. Based on previous experience, it seems reasonable to distinguish between a 10-0 win and a 50-40 win. A close, high-scoring game is likely to have more variance and is less likely to be dominated by either team, while a low-scoring game may indicate a defensive struggle or poor game conditions, in which case a small deficit is more difficult to overcome. Sample GOF values are listed below:

A's points (pA)   B's points (pB)   GOF(pA,pB)
      30                29            0.5270
      10                 9            0.5359
      27                24            0.5836
      27                20            0.6924
      50                40            0.7292
      10                 0            0.8548
      30                14            0.8786
      45                21            0.9433
      45                14            0.9823
      30                 0            0.9920
      56                 3            0.9998

Each game score is plugged into a GOF that outputs the estimated probability that team A would win if the game were played again under the same conditions. This is independent of any other information since it involves only that one game in isolation. For example, it may be determined that the winner of a 30-14 game has an 88% chance of winning a rematch, while a 27-24 winner has only a 58% chance of winning again.

Notice that a diminishing returns principle is manifested in this GOF. There is some advantage to winning "comfortably," but limited benefit to running up the score. A team will not be penalized just for playing a weak opponent (although it becomes much harder to improve its rating by blowing someone out).

Calculate Ratings

Each team's gametime performance is assumed to be normally distributed about a certain mean (its rating). The probability that team A would defeat team B is then determined from the cumulative distribution function (CDF) associated with a normal random variable.

Let p = Prob(A beats B) = F(rA,rB,hA,hB), where rA,hA and rB,hB are the ratings and home advantages of teams A and B, respectively. F is a function of rA,rB,hA,hB that is based on the CDF of a normal random variable.

All the game scores are translated to a scale from 0 to 1 by the GOF. Let g = GOF(pA,pB), where pA and pB are the points actually scored by teams A and B in a particular game.

A nonlinear function of the teams' ratings is formed by multiplying terms that look like:

p^g * (1-p)^(1-g)

Here ^ denotes an exponent. Also note that 0 <= p,g <= 1. By maximizing the resulting function, maximum likelihood estimates (MLE) are obtained for the ratings and home advantages. The optimization problem may be solved with standard techniques such as Newton's method.
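
Here is a minimal sketch of that optimization in Python. The exact form of F, the unit variance, the weak regularizing prior, and the use of scipy's general-purpose quasi-Newton optimizer in place of Newton's method are all assumptions made for the sketch, and the g values are made up:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    # (home team index, visiting team index, g = GOF(pA, pB) for the home team)
    games = [(0, 1, 0.88), (1, 2, 0.58), (2, 0, 0.73)]
    n_teams = 3

    def neg_log_likelihood(x):
        r, h = x[:n_teams], x[n_teams:]          # ratings and home advantages
        nll = 0.0
        for i, j, g in games:
            # p = Prob(home team beats visitor); only the home team's
            # advantage is applied in this sketch, and the scale is fixed at 1.
            p = norm.cdf(r[i] + h[i] - r[j])
            nll -= g * np.log(p) + (1 - g) * np.log(1 - p)
        # Weak prior factor that also pins down the otherwise free constant.
        return nll + 0.01 * np.sum(x ** 2)

    x0 = np.zeros(2 * n_teams)
    result = minimize(neg_log_likelihood, x0)    # BFGS (quasi-Newton) by default
    print(result.x[:n_teams])                    # estimated ratings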

Preseason ratings may be implemented via prior distribution factors in the optimization function. Their importance diminishes as the season progresses, and they are negligible by the end of the year. A strong prior distribution must be used to compensate for the lack of sufficient single-season data for the home advantages.

Time weighting is a debatable practice; however, I believe that more recent games are generally better indications of a team's true strength. An exponential-decay time weighting is applied by premultiplying g by some weight w.
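
A sketch of such a weight (the 60-day half-life is a made-up number):

    def time_weight(days_ago, half_life=60.0):
        """Exponential-decay weight; more recent games get weights closer to 1."""
        return 0.5 ** (days_ago / half_life)

    w = time_weight(30)          # a game played 30 days ago
    print(round(w * 0.88, 3))    # w premultiplies that game's g value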

Bayesian Correction

The results obtained by the MLE will be predictive in nature since they are based entirely on the scores of games and contain no provision for teams that win but don't always win big. Other teams will tend to perform in a way that is highly correlated with the strength of their opponent. Differences in style, coaching philosophy, and performance in close games can easily be overlooked if we look at scores alone.

The MLE ratings are used to create a prior distribution, which encodes the estimate of a team's strength based on its game scores alone. The Bayesian correction is computed as an expected value using the actual wins and losses (and who they were against), combined with the prior distribution. This helps account for the possibility of correlated performances (a team playing up or down to its opponent).

The advantage of the Bayesian approach is that it rewards teams that win consistently, no matter how they do it. The more games a team wins, the more confident the computer can be that scores are not so important. Ratings are less likely to be negatively impacted by beating a poor team. Furthermore, games involving well-matched opponents will naturally be given priority in determining the overall ratings.
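
A heavily simplified numerical sketch of such an expected-value calculation follows. The prior width, the grid integration, the unit variance in the win probability, and the example schedule are all assumptions made for illustration:

    import numpy as np
    from scipy.stats import norm

    mle_rating, prior_sd = 5.0, 3.0       # prior centered on the score-based (MLE) rating
    opponent_ratings = [2.0, 4.0, 7.0]    # made-up opponents
    results = [1, 1, 0]                   # 1 = win, 0 = loss against each opponent

    # Posterior expected rating by brute-force integration over a grid.
    grid = np.linspace(mle_rating - 5 * prior_sd, mle_rating + 5 * prior_sd, 2001)
    prior = norm.pdf(grid, mle_rating, prior_sd)
    likelihood = np.ones_like(grid)
    for opp, won in zip(opponent_ratings, results):
        p_win = norm.cdf(grid - opp)      # Prob(win) at each candidate rating
        likelihood *= p_win if won else (1 - p_win)
    posterior = prior * likelihood
    print(np.sum(grid * posterior) / np.sum(posterior))   # corrected rating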

Postprocessing

After the ratings and home advantages have been determined, additional post-processing steps are taken, such as the offense/defense breakdown, schedule strengths, and conference ratings described above.

Output

My code implements a markup language that allows me to generate multiple web pages automatically.

