What does the SPI rating represent?
The basic concept is that the SPI rating represents a team's overall skill level. The SPI ratings are intended to be forward-looking. They measure a team's relative likelihood of victory if a competitive match were to be held tomorrow. This concept may differ slightly from a retrospective or backward-looking ratings system. We aren't trying to reward or punish teams based on past results so much as we are trying to predict which teams will have the most success in the immediate future.
SPI ratings span from a theoretical minimum of 0 to a theoretical maximum of 100. A team with a rating of 100 would be a lock to beat every other national team; a team with a rating of 0 would be guaranteed to lose to every other national team. Technically speaking, the SPI rating reflects the percentage of possible points that a team would pick up if it played every other national team in the world exactly once on a neutral field. For instance, if Spain had a rating of 91 and were to play such a round-robin, our ratings might predict that it would win about 88 percent (scoring 3 points), draw 9 percent (scoring 1 point) and lose 3 percent (scoring none) of its games. Spain's overall rating would then be …
[(0.88 x 3) + (0.09 x 1) + (0.03 x 0)] / (1.00 x 3) = 2.73 / 3.00 = 0.91
… or 0.91, which we express as "91" (without the decimal place).
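The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the points-percentage idea; the function name `points_share` is hypothetical, and the win, draw and loss percentages are the example figures for Spain rather than real output from the ratings.

```python
def points_share(p_win: float, p_draw: float, p_loss: float) -> float:
    """Fraction of available points earned, with 3 for a win and 1 for a draw."""
    if abs(p_win + p_draw + p_loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Points earned per game divided by the 3 points available per game.
    return (3 * p_win + 1 * p_draw + 0 * p_loss) / 3


# Spain's hypothetical round-robin from the example above.
spain = points_share(0.88, 0.09, 0.03)
print(round(spain * 100))  # rating on the 0-100 scale
```

A team that wins every game gets a share of 1.0 (a rating of 100), and a team that loses every game gets 0.0, which matches the theoretical limits described above.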
As a general guideline, the following terms can be used to describe national teams:
• 85+: Elite
• 80-84: Very strong
• 75-79: Strong
• 70-74: Good
• 60-69: Competitive
• 50-59: Marginal
• 25-49: Weak
• 0-24: Very weak
What do the offensive and defensive ratings represent?
The offensive ("OFF") and defensive ("DEF") ratings reflect the average number of goals a team would be projected to score and concede, respectively, if it played an "average" international team. For instance, a team with an OFF rating of 2.00 and a DEF rating of 1.00 would be expected to beat an "average" opponent 2-1. Bear in mind that there are more than 200 national football teams in the world, so what we describe as "average" is a team that would be well below the standards of what is usually seen in World Cup competition (example: Lithuania or Canada -- teams that would rank in the 50s or 60s among international teams worldwide).
The OFF and DEF ratings for any two teams can be combined to create a prediction about the teams' chances to win or draw the game. This uses a statistical technique called multiple logit regression; basically, we examine a database of thousands of past games to see what happened when teams with similar OFF and DEF ratings faced one another.
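To make the idea of combining two teams' ratings concrete, here is a minimal sketch. It does not reproduce the logit regression described above, which is fitted to a database of past games. Instead it swaps in a simpler, commonly used stand-in: independent Poisson goal counts. The constant `AVG_GOALS` (what an "average" team scores against another average team) and the function names are assumptions made for illustration only.

```python
from math import exp, factorial

# Hypothetical: goals an "average" team scores (and concedes) against another average team.
AVG_GOALS = 1.3


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def match_probabilities(off_a, def_a, off_b, def_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) under independent Poisson scoring."""
    # Scale each attack by how leaky the opposing defense is relative to average.
    mean_a = off_a * def_b / AVG_GOALS
    mean_b = off_b * def_a / AVG_GOALS
    win_a = draw = win_b = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, mean_a) * poisson_pmf(j, mean_b)
            if i > j:
                win_a += p
            elif i == j:
                draw += p
            else:
                win_b += p
    total = win_a + draw + win_b  # renormalise the truncated score grid
    return win_a / total, draw / total, win_b / total


# A team rated OFF 2.00 / DEF 1.00 against an "average" opponent.
print(match_probabilities(2.0, 1.0, AVG_GOALS, AVG_GOALS))
```

Against an average opponent the scaling leaves the expected goals at 2.00 and 1.00, in line with the 2-1 example above. The resulting probabilities depend entirely on the Poisson assumption and should not be read as SPI output.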
How are the game-based ratings and player-based ratings combined to form the SPI rating?
The SPI ratings really consist of two distinct (though interrelated) ratings systems, which are combined to provide an overall assessment of team quality. One of these -- the game-based ratings -- reflects the team's results in recent international play. The other -- the player-based ratings -- reflects the performance of individual players in both international competition and club play (specifically, the "big four" leagues: English Premier, Spanish Primera, Italian Serie A, German Bundesliga -- plus the European Champions' League). The extent to which we weight one rating over the other depends upon the relative amount of data we have in each category. For instance, in the case of Russia (which has very few players in the big four leagues), the overall rating is based almost entirely on its game-level results. For a team such as England, whose roster is loaded with players who compete on elite teams in the Premiership or other major European leagues, the player-based ratings are more reliable and have more influence. In general, the game-based and player-based ratings are very strongly correlated with one another, although there are a few exceptions.
Do the teams with a high number of players in the English Premier League, the Spanish Primera Division, the Italian Serie A and the German Bundesliga have an advantage in this ratings system?
No. A player won't necessarily help his team's ratings merely because he plays in one of the big four leagues. Instead, he has to achieve some success in club play. The ratings are very carefully calibrated. In fact, such a player is equally likely to help or hurt his national team's overall rating as a result of his club play. If the player is playing well and/or is playing for a successful club team, he will most likely help his national team's rating. But if a player is performing poorly and/or competes for a mediocre club team, he may easily harm it.
How do you account for the fact that in some games -- certain friendlies, for example -- some elite teams don't play their "A team"?
For each game, we examine each team's lineup and use it to assign a "competitiveness coefficient" to the game. This figure reflects the degree of overlap between the lineup used for that game and the lineups used in competitions that we know to be important (World Cup, Confederations Cup, European Championships and World Cup qualifiers). For instance, in the 2009 Gold Cup final between the United States and Mexico, both teams were using players who rarely factor in the more important matches involving these teams. The competitiveness coefficient for the game was only 0.01, meaning it is assigned only 1/100th the weight of a game in the World Cup finals.
The competitiveness coefficient is multiplicative; in other words, both teams must field competitive lineups in order for the game to score highly. If Brazil plays Colombia, and Colombia is using an "A" lineup but Brazil is using a "B" lineup, the game will not be heavily weighted.
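The multiplicative behavior can be sketched as follows. The exact way lineup overlap is measured is not spelled out here, so `lineup_overlap` (share of a lineup drawn from a team's important-match players) is an assumption, as are the function names and the example numbers.

```python
def lineup_overlap(lineup: set[str], core_players: set[str]) -> float:
    """Share of a lineup made up of players the team uses in important matches."""
    if not lineup:
        raise ValueError("lineup must not be empty")
    return len(lineup & core_players) / len(lineup)


def competitiveness(overlap_a: float, overlap_b: float) -> float:
    """Multiply the two sides' overlaps, so one weak lineup drags the game down."""
    return overlap_a * overlap_b


# Hypothetical figures: Colombia fields its "A" lineup, Brazil a "B" lineup.
colombia = 1.0
brazil = 0.2
print(competitiveness(colombia, brazil))  # low weight despite Colombia's full-strength side
```

Because the two values are multiplied rather than averaged, a game scores highly only when both lineups are close to full strength.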
In some cases, lineups for the game are unavailable. In these instances, a default weighting is used based on the type of competition. By default, for example, a friendly match is weighted as being between one-sixth and one-fifth as important as a World Cup game.
How can you accurately predict soccer performance based on scores and stats without any scouting or watching every team play?
Soccer is a rich, wonderful and unpredictable sport, and indeed it would be quite a shame if a single number could tell us everything that we needed to know about a soccer team. Fortunately, ours do not. They merely reflect the relatively limited statistical information that is available in international soccer, and they do so in a way that is as fair and accurate as possible. In other words, they serve as a general guideline -- a starting point for debates about team quality. They are not intended to be the end point or to settle all arguments.
Further, keep in mind that even relatively large differences in ratings are hardly insurmountable. For instance, a team with an SPI rating of 70 would generally defeat a team with an SPI rating of 80 about 20 percent of the time, and would draw another 30 percent of the time.
Will every World Cup qualifier be ranked in the top 32?
Certainly not. On some continents, particularly Europe and South America, qualification is extremely competitive and a single upset or two could ruin a team's dream of advancing to South Africa. As a general rule of thumb, it takes between 30 and 40 competitive matches in order to get a very solid idea of team quality, so a couple of matches in which a team underachieves or is unlucky may not tell us all that much.
In addition, the World Cup qualification process is not designed to qualify the 32 strongest teams; instead, it seeks to create some balance among the different continents. For instance, whereas the World Cup draw will provide for between four and five slots for teams from Asia, we rate only one Asian team (Australia) among our initial top 32. By contrast, we have 19 European teams rated in our top 32 -- a group that will compete for only 13 available slots. Some European and South American teams -- probably including one or two very strong sides -- will lose the game of musical chairs.
Are more recent games treated differently from games from two or three years ago?
Recent games are weighted more heavily, but the extent to which this is true depends on the number of competitive games a particular team has played. For instance, in the case of Spain, which plays competitive matches very frequently, the algorithm sets a cutoff point of about 3.5 years: A game played 3.5 years ago is weighted at 0; a game played yesterday is weighted at 1, with a straight-line adjustment in between. (So, for example, a game played 1.5 years ago is weighted at about 0.6.)
For teams that play competitive games less frequently, a larger -- sometimes much larger -- window is used. We will reach back in excess of eight years to evaluate New Zealand's play, for instance. Although this is less than ideal -- very few of the players on New Zealand's roster from eight years ago remain with the team today -- our statistical tests nevertheless verify that this is a more reliable approach than trying to rate a team based on a handful of recent games. This is presumably because, for most international sides, soccer skill tends to be at least relatively consistent across time, related to factors such as population, wealth and soccer tradition.
In addition to this, there is a slight bonus given to games played within the past 100 days, in order to better reflect very near-term fluctuations in form.
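The straight-line weighting can be sketched as below. The function name `recency_weight` and its parameters are hypothetical; the 3.5-year default is the Spain example above, and the bonus for games in the past 100 days is left out because its size is not stated.

```python
def recency_weight(age_years: float, window_years: float = 3.5) -> float:
    """Linear decay from 1 (played today) to 0 (at or beyond the window)."""
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    # Clamp to [0, 1] so games older than the window get no weight.
    return max(0.0, min(1.0, 1.0 - age_years / window_years))


print(round(recency_weight(1.5), 2))       # Spain-style 3.5-year window
print(round(recency_weight(1.5, 8.0), 2))  # longer window for a team that plays rarely
```

With the 3.5-year window, a game from 1.5 years ago gets a weight of roughly 0.57, which rounds to the "about 0.6" quoted above. Widening the window, as is done for teams such as New Zealand, gives the same game more weight.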
How does the game's site affect a team's rating?
The result of each game is adjusted for home-field advantage, which is equivalent to about 0.57 goals in international soccer. Home field matters more in international soccer than in virtually any other sport. In addition, the algorithm distinguishes games played at neutral sites and splits the away team bonus evenly between the two teams.
What is the effect of margin of victory?
Our research has indicated that margin of victory is a much better predictor of future results than wins and losses taken by themselves. We do not, therefore, apply any diminishing returns factor to larger margins of victory. On the other hand, goals scored and allowed receive a very strict adjustment based on the quality of opponent. If a team beats Brazil 2-1, the win might be treated as the equivalent of a 4-0 victory against an average team. But if a team beats American Samoa 31-0 (as Australia did in 2001), it will get credit for having scored the equivalent of only about 2.5 goals.
What is more important: scoring goals or preventing them?
In games between two elite teams, the DEF rating -- that is, goal prevention -- tends to be slightly more important than the OFF rating. This is one reason strong defensive teams such as Italy have had more than their share of success in the World Cup.
What is the biggest difference between the ESPN SPI and FIFA's rankings?
What we see as the key differentiators between our ratings and FIFA's are as follows:
• SPI uses detailed "competitiveness coefficients" based on lineups and rosters to assess the true quality and importance of particular matches.
• SPI uses a flexible "assessment period" -- we don't need to go as far back in time to rate teams that play more frequently.
• SPI uses an advanced, iterative calculation of opponent strength -- similar to those used by superior college basketball and college football ratings systems -- rather than assigning arbitrary continent coefficients as other systems such as FIFA's do.
• SPI creates separate ratings for offense and defense.
• SPI accounts for goal differential.
• SPI accounts for home-field advantage.
• SPI's ratings are intended explicitly to be forward-looking and predictive; other ratings systems may have different objectives.
• SPI evaluates how a team's players performed in club play as well as international play.
Nate Silver is a contributor to ESPN.com and ESPN the Magazine, and is an author of Baseball Prospectus.