Description

This is a standard ratings system which considers margin of victory and a home court advantage. The idea is that the difference in rating between any pair of teams (plus a boost for the home team) gives a predicted scoring margin for a game between those teams. The predicted margin is then compared to the actual scoring margin when the teams play.

Of course, no set of ratings will get these actual scoring margins correct for all games, so for each game there is going to be some error. The definition of the least squares rating system, then, is that it is the set of team ratings which gives the minimum squared error.

To be more specific: let's say team A, with rating RA, is hosting team B, with rating RB, and that the home advantage is some number we'll call H. Then the predicted scoring margin for this game is

(predicted margin for A against B) = RA + H - RB

Note that this number can be positive, which means A is favored to win, or negative, which means B is favored to win. If A and B are meeting on a neutral court, we just leave out the term H above.
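The prediction rule above can be sketched as a small function; the name, signature, and sample numbers here are illustrative, not part of any particular rating package:

```python
def predicted_margin(r_home, r_away, home_adv, neutral=False):
    """Predicted scoring margin for the home team against the visitor.

    Positive means the home team is favored, negative means the
    visitor is. On a neutral court the home advantage term is dropped.
    """
    return r_home - r_away + (0.0 if neutral else home_adv)

# Example: home team rated 5.0, visitor rated 2.0, home advantage 3.0
predicted_margin(5.0, 2.0, 3.0)               # -> 6.0 (home favored by 6)
predicted_margin(5.0, 2.0, 3.0, neutral=True) # -> 3.0 (no home boost)
```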

Now the game is played and team A scores SA points while team B scores SB, so that

(actual margin for A against B) = SA - SB

Of course, when this number is positive it means A won, and negative means B won. The difference

(predicted) - (actual) = (RA + H - RB) - (SA - SB)

tells us whether our prediction overshoots, undershoots, or gets the margin just right. We want to minimize our error, so we want to penalize both overshooting and undershooting the scoring margin. One way to do this is to square the difference

(error) = (predicted - actual)^2

which means the error is zero if we predict the margin correctly, and greater than zero if we're wrong in either direction. And the more wrong we are, the more we are penalized.
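Putting the prediction and the squaring together, the per-game error works out like this (again, the function is just a sketch for illustration):

```python
def game_error(r_home, r_away, home_adv, home_score, away_score):
    """Squared error between predicted and actual margin for one game."""
    predicted = r_home + home_adv - r_away   # predicted margin for the home team
    actual = home_score - away_score         # actual margin for the home team
    return (predicted - actual) ** 2

# Predicted margin 6, actual margin 6: a perfect prediction, zero error.
game_error(5.0, 2.0, 3.0, 80, 74)  # -> 0
# Predicted margin 6, actual margin 10: off by 4, so the error is 16.
game_error(5.0, 2.0, 3.0, 80, 70)  # -> 16
```

Note that undershooting by 4 would give the same error of 16, which is exactly the symmetric penalty we want.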

And that's almost it. Now imagine we have assigned a rating to each team and a guess for the home advantage factor. We go through every game that has been played so far, compute each game's error by the formula above, and add them all up to get the total error. The result depends on which ratings we assigned, so we can try again with a second guess for the ratings (and home factor) and get a second error value. If the second error value is less than the first, then our second guess for the ratings is considered better than the first guess.

In reality, we don't have to do this in a hunt-and-peck kind of way. We can use some linear algebra to solve directly for the ratings values and home factor that give the lowest possible error. Note that this doesn't prove our ratings are the best possible ratings - they are just the best possible ratings given the definition of error above. Still, they do pretty well, in the same league as quality ratings like Massey, Pomeroy, and Sagarin, and a notch above ratings like RPI.
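The direct solve can be sketched in a few lines of numpy. Each game becomes one row of a linear system: +1 in the home team's column, -1 in the visitor's column, +1 in a final home-advantage column, with the actual margin on the right-hand side. The game data below is made up purely for illustration. Note that ratings are only determined up to an additive constant (adding 10 to every rating changes nothing), so the system is rank-deficient; `lstsq` handles this by returning the minimum-norm least squares solution.

```python
import numpy as np

# Hypothetical games: (home_team, away_team, home_score, away_score).
# A neutral-site flag could zero out the H column; omitted for brevity.
games = [
    (0, 1, 78, 70),
    (1, 2, 65, 68),
    (2, 0, 80, 75),
    (0, 2, 71, 66),
]
n_teams = 3

# One row per game; one column per team rating plus a column for H.
A = np.zeros((len(games), n_teams + 1))
b = np.zeros(len(games))
for i, (home, away, h_score, a_score) in enumerate(games):
    A[i, home] = 1.0       # +R_home
    A[i, away] = -1.0      # -R_away
    A[i, n_teams] = 1.0    # +H (home advantage)
    b[i] = h_score - a_score  # actual margin

# Solve for the ratings and home factor minimizing total squared error.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
ratings, home_adv = x[:n_teams], x[n_teams]
```

Since only rating *differences* matter, it's common to shift the solution afterward so the ratings average to zero, which makes them easier to read.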