Tofu (thetofu) wrote,

Baseball and Chess

In sports, or any competition, we always want to know who is the best. We also want to know who has the best chance of winning the next game. Most of us want our favourite player or team to win, but knowing their real chances helps set expectations, whether the result brings heartache, joy, or a settled wager with a friend.
Statisticians and mathematicians have come up with many ways to determine outcomes and the strengths of players or teams. One way, in chess, is to compute a player's rating with a rating system. At Chesspark, we use a variation of the Glicko rating system. I have always wondered whether this rating system would work well with team sports. So, since I love baseball, I did some research (a Google search) on baseball and chess ratings. What follows are my first experiments in using Glicko ratings to determine game outcomes and find the strength of a team.

First, I could not find any information on baseball and Glicko. I did find some on Elo and baseball. There is a site that does Elo ratings on soccer (football) teams as well. Elo and Glicko are rating systems used in chess and other games involving two players. Team sports do not necessarily use these methods to predict the outcome of the next game. Baseball has a well-known method called the Pythagorean expectation. It estimates a team's winning percentage from what the team has done in previous games, specifically its runs scored and runs allowed. You can use this equation in other sports too. I would like to compare ratings against the Pythagorean expectation and also investigate combining the two.

I gathered data from http://www.retrosheet.org and wrote a Python script to parse the data and compute the ratings. I will release the code in upcoming blog posts. Using the script and data, we can calculate the final ratings of teams at the end of the season. The results are interesting.
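
For readers who have not seen the Pythagorean expectation before, here is a minimal sketch of it. It uses the classic form with an exponent of 2; the function name, the exponent default, and the run totals in the example are my own assumptions for illustration and are not taken from the script or the 2007 data.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Estimate a winning percentage (0..1) from runs scored and allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        return 0.5  # no runs either way: no information, call it even
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical run totals, NOT taken from the 2007 season data.
example = pythagorean_expectation(runs_scored=200, runs_allowed=150)
print(f"{example:.3f}")  # 0.640
```

A team that scores more than it allows gets an expectation above 50%, and the further apart the two totals are, the further the estimate moves from even.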

Glicko Rating results for the Major League Baseball 2007 Season:

====================== teams by rating ==========================
team                           rating  rd         total wins - losses 

New York Yankees               1580.0  28.465171  162   94   - 68 
Boston Red Sox                 1571.0  29.260524  162   96   - 66 
Cleveland Indians              1564.0  28.649712  162   96   - 66 
Los Angeles Angels of Anaheim  1562.0  28.888047  162   94   - 68 
Arizona Diamondbacks           1537.0  28.440955  162   90   - 72 
Colorado Rockies               1534.0  28.165939  163   90   - 73 
Detroit Tigers                 1533.0  28.574597  162   88   - 74 
Seattle Mariners               1532.0  28.210795  162   88   - 74 
Toronto Blue Jays              1529.0  28.210438  162   83   - 79 
Philadelphia Phillies          1529.0  28.447520  162   89   - 73 
San Diego Padres               1523.0  28.317698  163   89   - 74 
New York Mets                  1510.0  28.604631  162   88   - 74 
Los Angeles Dodgers            1505.0  28.615645  162   82   - 80 
Minnesota Twins                1495.0  28.312780  162   79   - 83 
Atlanta Braves                 1494.0  28.618743  162   84   - 78 
Chicago Cubs                   1494.0  28.402241  162   85   - 77 
Oakland Athletics              1488.0  28.395109  162   76   - 86 
Texas Rangers                  1487.0  28.586974  162   75   - 87 
Milwaukee Brewers              1474.0  28.445591  162   83   - 79 
St. Louis Cardinals            1464.0  28.453097  162   78   - 84 
Washington Nationals           1462.0  28.858994  162   73   - 89 
San Francisco Giants           1462.0  28.315478  162   71   - 91 
Chicago White Sox              1461.0  28.461948  162   72   - 90 
Kansas City Royals             1461.0  28.856424  162   69   - 93 
Baltimore Orioles              1460.0  28.501870  162   69   - 93 
Tampa Bay Rays                 1450.0  28.742325  162   66   - 96 
Houston Astros                 1448.0  28.453125  162   73   - 89 
Cincinnati Reds                1442.0  28.818955  162   72   - 90 
Florida Marlins                1438.0  28.412856  162   71   - 91 
Pittsburgh Pirates             1424.0  28.681253  162   68   - 94 


NOTE: This is a great example of how close in skill professional baseball teams are. Only 156 rating points separate the top team from the bottom one.
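
To give an idea of where rating and RD columns like the ones above can come from, here is a sketch of the standard Glicko-1 update for one rating period. This is the textbook formula, not necessarily the Chesspark variation or what my script does. The function names are assumptions for illustration, and the step that grows RD between rating periods is left out.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Attenuation factor: a less certain opponent counts for less."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected score against one opponent, discounted by the opponent's RD."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko_update(r, rd, results):
    """Return (new_rating, new_rd) after one rating period.

    results is a list of (opp_rating, opp_rd, score), score 1 = win, 0 = loss.
    """
    if not results:
        return r, rd
    d2_inv = 0.0  # accumulates 1 / d^2
    delta = 0.0   # accumulates g(RD_j) * (s_j - E_j)
    for r_opp, rd_opp, score in results:
        e = expected_score(r, r_opp, rd_opp)
        d2_inv += Q**2 * g(rd_opp) ** 2 * e * (1 - e)
        delta += g(rd_opp) * (score - e)
    denom = 1 / rd**2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1 / denom)


# Worked example from Glickman's description of the system:
# a 1500 player (RD 200) beats a 1400 and loses to a 1550 and a 1700.
new_r, new_rd = glicko_update(
    1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
print(round(new_r), round(new_rd, 1))
```

With those inputs the result should land close to a rating of 1464 and an RD of 151.4, which matches the published worked example. For baseball, each team would play the role of the player and each game would be one result.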


Now, let's take a game from the 2007 season and estimate the outcome via Glicko, then compare that with the Pythagorean expectation. I will use Atlanta Braves games since they are my favourite team. Let's take the 42nd game Atlanta played. This game was against the Boston Red Sox, who were a highly rated team all year.

NOTE: This determination is very simplistic and skips some steps; a more in-depth example may follow in other blog posts.

Boston had a record of 29 - 12 (winning percentage 70.7%)
Atlanta had a record of 25 - 17 (winning percentage 59.52%)

Glicko ratings:
2007-05-19 ATL 1529.0 vs. BOS 1675.0

Pythagorean expectation:
2007-05-19 ATL 52.562% vs. BOS 72.554%

By rating, Boston should win; they are the stronger team. The chance for Atlanta to win is about 47.7%. The outcome of the game was Boston 13, Atlanta 3. Boston crushed them. Before this, Atlanta had split a series with Washington, a very weak team.
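
For comparison, here is a sketch of the textbook Glicko formula for the expected outcome of a single game between two rated sides, which combines both RDs. This is not necessarily the calculation behind the percentage quoted above, which this post does not spell out. The function names are assumptions, and the RD of 30 for each team is a placeholder, since the RDs on that date are not listed here.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Attenuation factor: more uncertainty pulls the estimate toward 50%."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_probability(r_a, rd_a, r_b, rd_b):
    """Glicko expected outcome for side A against side B."""
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))


# Ratings from the game above. The RD values (30.0) are ASSUMED placeholders.
atl = win_probability(1529.0, 30.0, 1675.0, 30.0)
bos = win_probability(1675.0, 30.0, 1529.0, 30.0)
print(f"ATL {atl:.1%} vs. BOS {bos:.1%}")
```

The two probabilities always add up to 100%, and larger RDs move both of them toward an even game, so the number you get depends heavily on how uncertain the ratings are at that point in the season.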

By winning percentage, Boston's actual percentage (70.7%) is lower than its Pythagorean expectation (72.554%), while Atlanta's actual percentage (59.52%) is higher than its expectation (52.562%). In other words, Boston had been winning slightly less than its runs suggest and Atlanta noticeably more. Using that, we see that Boston is favoured to win unless luck comes into play.
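
The comparison above is just arithmetic on the records and percentages already quoted, and can be sketched like this. The helper name and the dictionary layout are only for illustration.

```python
def winning_percentage(wins, losses):
    """Actual winning percentage, on a 0..100 scale."""
    return 100.0 * wins / (wins + losses)


# (wins, losses, Pythagorean expectation in percent), all quoted in this post.
teams = {"BOS": (29, 12, 72.554), "ATL": (25, 17, 52.562)}

gaps = {}
for team, (wins, losses, pythag) in teams.items():
    actual = winning_percentage(wins, losses)
    gaps[team] = actual - pythag  # positive: winning more than runs suggest
    print(f"{team}: actual {actual:.2f}%, expected {pythag:.3f}%, gap {gaps[team]:+.2f}")

# BOS: actual 70.73%, expected 72.554%, gap -1.82
# ATL: actual 59.52%, expected 52.562%, gap +6.96
```

A negative gap means a team has won less often than its runs scored and allowed would suggest, and a positive gap means it has won more often.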

Both methods show Boston being the favourite, but 10 runs seems too much. :)

Determining the actual chance that Boston will win is a bit more complicated than the above example, with both methods. Still, it is easy to see that each method shows Boston as the stronger team that should win the game, and they did. You can also use other statistical methods to estimate how many runs both teams could score and use that to help determine the outcome. I want to find out whether I can use ratings to help or replace some of these.

I believe ratings are a great indicator of how strong a team is right now, and a simpler way of accounting for strength of schedule and other factors when working out who will win a series or finish first in the standings.

In posts to follow, I hope to start predicting the outcome of current games and to develop a system that makes this easier for me and others. I plan on using XMPP, PubSub and BOSH. I also want to investigate finding a pitcher's or batter's rating using Glicko. After that, who knows.


The baseball information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
Tags: baseball, chess, fun, glicko, ratings, sports, statistics