Friday, April 10, 2009
Revised CAIRO Playoff Odds through games of April 9
While tooling around the internets, I found a cool Monte Carlo simulator spreadsheet for the baseball season at a site called xlsSports. I've modified it to import the current standings and then run the season going forward, and I've set it up to use a weighted average of YTD results and 2009 projections to figure out the strength of the teams. I've also changed the basic Pythagorean theorem formula it uses to the more accurate PythagenPat formula. Both of those formulas use a team's runs scored and runs allowed to determine the strength of the team and calculate its winning percentage going forward.
Anyway, what this will let me do is run updated playoff odds for the six projection systems I used in the Diamond Mind Projection Blowout, as well as with the combined projections, whenever I feel like it. I'll create a page where I will keep these updated, but for now here's a sneak peek at the CAIRO version, run 10,000 times.
| System | CAIRO | | | | | | | | | |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| ALE | TAM | 94.2 | 67.8 | 804 | 695 | 38.9% | 29.1% | 68.0% | 121 | 67 |
| ALE | NYA | 93.7 | 68.3 | 867 | 724 | 35.1% | 29.8% | 65.0% | 122 | 62 |
| ALE | BOS | 92.1 | 69.9 | 843 | 739 | 25.4% | 28.6% | 54.0% | 118 | 67 |
| ALE | TOR | 77.4 | 84.6 | 690 | 717 | 0.5% | 1.6% | 2.0% | 105 | 50 |
| ALE | BAL | 73.5 | 88.5 | 801 | 870 | 0.1% | 0.4% | 0.5% | 100 | 48 |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| ALC | CLE | 84.4 | 77.6 | 810 | 808 | 38.6% | 1.3% | 39.9% | 112 | 53 |
| ALC | DET | 84.2 | 77.8 | 774 | 764 | 37.4% | 1.2% | 38.6% | 110 | 56 |
| ALC | MIN | 79.2 | 82.8 | 718 | 748 | 13.7% | 0.6% | 14.3% | 106 | 52 |
| ALC | KC | 76.2 | 85.8 | 717 | 835 | 6.5% | 0.3% | 6.8% | 103 | 50 |
| ALC | CHA | 74.1 | 87.9 | 739 | 782 | 3.7% | 0.2% | 3.8% | 102 | 46 |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| ALW | LAA | 86.2 | 75.8 | 768 | 729 | 41.9% | 2.5% | 44.4% | 112 | 57 |
| ALW | OAK | 85.4 | 76.6 | 767 | 752 | 35.7% | 2.6% | 38.3% | 110 | 58 |
| ALW | SEA | 81.7 | 80.3 | 721 | 728 | 17.6% | 1.6% | 19.2% | 114 | 52 |
| ALW | TEX | 76.2 | 85.8 | 820 | 881 | 4.7% | 0.4% | 5.1% | 103 | 47 |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| NLE | NYN | 91.5 | 70.5 | 842 | 778 | 42.4% | 16.3% | 58.7% | 118 | 62 |
| NLE | ATL | 91.3 | 70.7 | 800 | 730 | 40.7% | 16.3% | 57.1% | 119 | 63 |
| NLE | PHI | 86.0 | 76.0 | 834 | 798 | 13.9% | 10.4% | 24.3% | 113 | 54 |
| NLE | FLA | 79.3 | 82.7 | 777 | 836 | 2.6% | 2.4% | 5.0% | 105 | 53 |
| NLE | PIT | 74.2 | 87.8 | 799 | 903 | 0.3% | 0.5% | 0.8% | 102 | 50 |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| NLC | CHN | 96.3 | 65.7 | 845 | 730 | 77.2% | 7.7% | 85.0% | 124 | 69 |
| NLC | STL | 86.7 | 75.3 | 797 | 745 | 13.8% | 14.6% | 28.4% | 113 | 58 |
| NLC | MIL | 82.9 | 79.1 | 780 | 784 | 5.6% | 7.0% | 12.6% | 112 | 55 |
| NLC | CIN | 80.9 | 81.1 | 738 | 781 | 3.1% | 4.0% | 7.2% | 111 | 53 |
| NLC | HOU | 72.8 | 89.2 | 740 | 829 | 0.2% | 0.3% | 0.5% | 100 | 47 |
| NLC | WAS | 70.8 | 91.2 | 763 | 885 | 0.1% | 0.1% | 0.2% | 97 | 42 |
| Div | Team | W | L | RF | RA | Div% | WC% | PO% | Max | Min |
| NLW | LAN | 90.7 | 71.3 | 818 | 761 | 56.1% | 5.9% | 62.0% | 118 | 63 |
| NLW | SF | 84.7 | 77.3 | 764 | 746 | 18.0% | 5.6% | 23.6% | 112 | 56 |
| NLW | COL | 83.2 | 78.8 | 841 | 822 | 13.1% | 4.4% | 17.5% | 111 | 58 |
| NLW | ARI | 82.5 | 79.5 | 739 | 724 | 11.0% | 3.8% | 14.8% | 111 | 57 |
| NLW | SD | 75.8 | 86.2 | 729 | 820 | 1.8% | 0.6% | 2.4% | 101 | 49 |
RF: Runs for
RA: Runs against
Div%: Percentage of times the team won their division
WC%: Percentage of times the team won the wild card
PO%: Playoff % (Div% + WC%)
Max: High win total
Min: Low win total
One note: this is a blatant ripoff of Baseball Prospectus's various Playoff Odds Reports, except that I know what the input data is, so I'm more comfortable with it. If anyone sees anything that doesn't look right, let me know.
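For anyone curious what the PythagenPat step mentioned above looks like, here is a minimal Python sketch. It is not the spreadsheet's actual code: the function names are made up for illustration, and the 0.287 constant is the commonly cited value, assumed here rather than taken from the spreadsheet.

```python
def pythagenpat_exponent(runs_for, runs_against, games, k=0.287):
    """Exponent that scales with the run environment: (runs per game) ** k.

    k=0.287 is an assumed, commonly cited constant, not one confirmed
    by the spreadsheet.
    """
    return ((runs_for + runs_against) / games) ** k


def pythagenpat_win_pct(runs_for, runs_against, games=162, k=0.287):
    """Expected winning percentage from runs scored and runs allowed."""
    x = pythagenpat_exponent(runs_for, runs_against, games, k)
    return runs_for ** x / (runs_for ** x + runs_against ** x)


if __name__ == "__main__":
    # Tampa Bay's RF and RA from the table above.
    pct = pythagenpat_win_pct(804, 695)
    print(f"win pct {pct:.3f}, about {pct * 162:.1f} wins over 162 games")
```

The number this prints need not match the W column, since that column comes out of the full simulation and includes games already played.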
Comments
Color me confused. I thought the preseason projected order of finish for the AL East was NYY, Bos, TB. Now, after just three games, it’s TB, NY, Bos?
Now, after just three games, it’s TB, NY, Bos?
I think the teams are so evenly matched that games in hand are going to be pretty significant, although maybe I’m overweighting 2009. I’m using the formula of:
(Total 2009 MLB games played to date divided by 2430) times the team’s actual runs scored and runs allowed to date, pro-rated to 162 games, plus (2430 minus 2009 MLB games played to date, divided by 2430) times the team’s projected runs scored and runs allowed. The sum is the team’s revised runs scored/allowed.
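As a rough sketch of that weighting in Python (the function name, the 45 games played, and the 860 projected runs are all hypothetical; only the 21 runs in 3 games comes from this thread):

```python
TOTAL_MLB_GAMES = 2430  # 30 teams * 162 games / 2


def blended_runs(actual_runs, team_games, projected_runs, mlb_games_played):
    """Blend pro-rated actual runs with projected runs by season progress."""
    weight = mlb_games_played / TOTAL_MLB_GAMES  # share of the season played
    prorated = actual_runs / team_games * 162    # actual pace over 162 games
    return weight * prorated + (1 - weight) * projected_runs


if __name__ == "__main__":
    # Hypothetical inputs: 45 MLB games played so far, 860 projected runs.
    revised_rs = blended_runs(actual_runs=21, team_games=3,
                              projected_runs=860, mlb_games_played=45)
    print(round(revised_rs, 1))
```

The same function would be applied separately to runs allowed. Early in the season the weight on actual results is tiny, so the revised figure stays close to the projection.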
Tampa taking 2 out of 3 from Boston is probably non-trivial to both teams’ playoff chances in this type of methodology. Whether that’s true or not, I’m not really sure.
Looking at B Pro’s various odds, they seem to have swung pretty big as well. They had the Yankees projected to go 99-63 before the season started, and now their PECOTA playoff odds have them at 95 wins.
Then again, it just might be too early to run this type of thing, or at least take it too seriously.
Eh, they were all pretty close to start. I could see TB winning 2 of 3 against BOS and the Yanks losing 2 of 3 to BAL could swap some stuff around. TB got three tough games (on the road) out of the way, the Yanks lost to theoretically the weakest team in the division.
I could see TB winning 2 of 3 against BOS and the Yanks losing 2 of 3 to BAL could swap some stuff around. TB got three tough games (on the road) out of the way, the Yanks lost to theoretically the weakest team in the division.
Right. At this point, log5 would have said Boston and the Yankees should be 2-1, and Tampa Bay should be 1-2. So you’ve got a 1 game swing on all three teams already.
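For reference, the log5 estimate can be sketched in a few lines of Python. The winning percentages in the example are hypothetical talent levels, not CAIRO's numbers, and home field is ignored.

```python
def log5(team_pct, opp_pct):
    """Probability that a team with team_pct beats one with opp_pct."""
    num = team_pct * (1 - opp_pct)
    den = num + opp_pct * (1 - team_pct)
    return num / den


def expected_series_wins(team_pct, opp_pct, games=3):
    """Expected wins in a series, treating each game as independent."""
    return games * log5(team_pct, opp_pct)


if __name__ == "__main__":
    # Hypothetical talent levels: a .590 team hosting a .570 team.
    print(round(expected_series_wins(0.590, 0.570), 2))
```

Rounding the expected wins to whole games is what turns a slight favorite into a nominal 2-1 series.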
i thought we could just extrapolate. you mean the Yankees might not go 54-108 this year?
I think what’s troubling is that a pre-season projection of, say, 98 wins incorporates the assumption that the team will, at various points in the season, lose two out of three. I don’t recall anyone saying two weeks ago, “the Yankees are projected to win 98 games, but if they begin the season 1-2 (outscoring their opponent 21-18 over those 3 games), then they are only a 93-win team.” That makes no intuitive sense. I sense that we’re mixing apples and oranges here.
you mean the Yankees might not go 54-108 this year?
Apparently not, although 62-100 is still in play.
I don’t recall anyone saying two weeks ago, “the Yankees are projected to win 98 games, but if they begin the season 1-2 (outscoring their opponent 21-18 over those 3 games), then they are only a 93-win team.” That makes no intuitive sense. I sense that we’re mixing apples and oranges here.
And if the Yankees were projected to win 98 games, I’d agree. But they weren’t, they were projected to win 96 by CAIRO. So their going forward projection assuming 2009 tells us nothing about their ability would be 96/162 times the 159 remaining games, 94-65. 1-2 + 94-65 = 95-67. The Monte Carlo spreadsheet includes a higher standard deviation than my Diamond projections btw, to account for the greater volatility in a season, which is probably where 95-67 becomes 93.7-68.3.
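The arithmetic in that reply, spelled out as a small Python sketch (the function name is made up for illustration, and it assumes the games played so far carry no information about team strength):

```python
def rest_of_season_record(projected_wins, wins, losses, season_games=162):
    """Current record plus the preseason win rate applied to remaining games."""
    remaining = season_games - wins - losses
    future_wins = round(projected_wins / season_games * remaining)
    future_losses = remaining - future_wins
    return wins + future_wins, losses + future_losses


if __name__ == "__main__":
    # 96 projected wins, 1-2 start: 96/162 * 159 rounds to 94 more wins.
    print(rest_of_season_record(96, 1, 2))  # (95, 67)
```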
isn’t this just Bayes’ Theorem (it’s been 10 years, cut me some slack)?
isn’t this just Bayes’ Theorem (it’s been 10 years, cut me some slack)?
Possibly, but if it were just the Bayes theorem, then the volatility (standard deviation) of the posterior distribution would decrease with additional data (assuming the predictions that we started with were the prior distributions). I don’t necessarily see that, but I could be missing something.
Any system that has every team (except the Nationals) winning 100+ games at least once should probably be promptly ignored.
How much playing time was attributed to David Price?
There are 10,000 seasons. I’m sure with that many any number of fluky things can happen and pretty much any team could win over 100. That’s almost 2 orders of magnitude more games than have been played in real history.
I didn’t get to comment yesterday on Swisher when the comparisons were made with Giambi, but with his eyeblack on and his squinty eyes, he sort of looks to me like Don Mattingly. Hope he ends up hitting like the pre 1990 version.
The articles the first few days by Klapisch on how Tex and CC were a waste of money are now followed up by an article that praises Burnett for his guts.
What a ridiculous thing to say. Why would a player give a guy an interview after such BS?
Possibly, but if it were just the Bayes theorem, then the volatility (standard deviation) of the posterior distribution would decrease with additional data (assuming the predictions that we started with were the prior distributions). I don’t necessarily see that, but I could be missing something.
According to this article by Keith Woolner, the standard deviation for team wins even if we have perfect information (i.e., we nail every single player’s projection and playing time) is 6.3. When I run the projections through Diamond Mind, I get a standard deviation around 6.7, which is not really high enough given the fact that we don’t have perfect information. Ideally, team wins should have a standard deviation in the 8-9 area. This Monte Carlo simulator accounts for that by randomly modifying the teams’ talent level slightly during each iteration. So in some iterations, the Yankees are a 105 win team, in some they are an 80 win team, etc. That’s why you’re seeing a higher standard deviation than we saw in the Diamond Mind projections.
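Here is a rough Python sketch of why redrawing talent each iteration widens the spread. It is not the spreadsheet's method: the .580 talent level and the 0.035 talent standard deviation are assumptions picked only to illustrate the effect, and each game is treated as an independent coin flip.

```python
import random
import statistics


def simulate_win_totals(true_win_pct, talent_sd, seasons, games=162, seed=0):
    """Simulate season win totals, redrawing team talent each season."""
    rng = random.Random(seed)
    totals = []
    for _ in range(seasons):
        # Perturb the talent level for this iteration, clamped to a sane range.
        p = min(max(rng.gauss(true_win_pct, talent_sd), 0.05), 0.95)
        # Each game is an independent win with probability p.
        wins = sum(rng.random() < p for _ in range(games))
        totals.append(wins)
    return totals


if __name__ == "__main__":
    fixed = simulate_win_totals(0.580, 0.0, seasons=2000)    # known talent
    noisy = simulate_win_totals(0.580, 0.035, seasons=2000)  # uncertain talent
    print("SD with fixed talent:", round(statistics.pstdev(fixed), 1))
    print("SD with noisy talent:", round(statistics.pstdev(noisy), 1))
```

With talent held fixed, the spread should land near the pure coin-flip value of about 6.3 wins; adding talent noise pushes it higher, which is the effect described above.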
Any system that has every team (except the Nationals) winning 100+ games at least once should probably be promptly ignored.
Anyone is welcome to ignore anything I post. However, it’s a statistical fact that at this point in the season, just about every team has a non-zero chance to win 100 games. Obviously as we get farther into the season, most teams will regress towards their expected talent level and you’ll see fewer 100 win/100 loss teams.
How much playing time was attributed to David Price?
Around 140 innings.
There are 10,000 seasons. I’m sure with that many any number of fluky things can happen and pretty much any team could win over 100. That’s almost 2 orders of magnitude more games than have been played in real history.
Exactly.
Why would a player give a guy an interview after such BS?
Because it may actually be worse for them if they don’t? Some reporters I think (and I’m not saying Klapisch is one of them) will attack players if they don’t feel they are treated with respect. So without the interview, the article could instead be, “Burnett was lucky to get through the 4th, never mind get the win”.
However, it’s a statistical fact that at this point in the season, just about every team has a non-zero chance to win 100 games.
Sounds like a fun project. How many teams that, going into the season, had “no chance” of being that good finished with a winning pct above .615? .615 is basically 100 wins in a 162 game schedule, and I think that may be more interesting than setting the bar at 100 wins. That would be 94 wins in a 154 game schedule.
CAIRO *may* be able to tell us that. I know in the past you (SG) have given some projections for past players. But I’m not sure how well CAIRO would handle pre-Retrosheet players (though we could limit it to 1955 forward), or if you have a machine with enough horsepower to do it in a reasonable amount of time. If I were ambitious, I’d do it myself…I can say that about a lot of things.
Some reporters I think (and I’m not saying Klapisch is one of them) will attack players if they don’t feel they are treated with respect.
This couldn’t be more apparent in Abraham’s treatment of A-Rod: he despises A-Rod because A-Rod is indifferent to the press except for when it suits him. So Pete is on the offensive all the time. The sports media really is an absolute joke. You’ll have the occasional Rob Neyer or peripheral publication like BP, but otherwise, these guys are all-red-meat all-the-time, and self-important to boot. It’s disgusting.
Whoa, Kris Benson is making a start for the Rangers.
This Monte Carlo simulator accounts for that by randomly modifying the teams’ talent level slightly during each iteration. So in some iterations, the Yankees are a 105 win team, in some they are an 80 win team, etc. That’s why you’re seeing a higher standard deviation than we saw in the Diamond Mind projections.
Ahh, I see, so the Monte Carlo simulator is adding noise to the prior, which increases the SD of the prior relative to what was there in the predictions. Makes sense now.
For a sports reporter to call Sabathia and Teixeira a waste of money two games into a 162-game season is like a U.S. president not knowing where to find Canada or Mexico on a map. This goes beyond run-of-the-mill dunderheadedness. It’s double digit IQ, Ian (if you don’t trade for Eric Gagne your season is over) O’Connor type stupidity.
Whoa, Kris Benson is making a start for the Rangers.
Take that junk to the RLMB.
Using the Monte Carlo simulation with production volatility seems to make more sense than straight projections. This seemingly would account for the win/loss effect of Player_X outperforming while Player_Y underperforms. Even if their full year projections are exactly the same, their placement in the batting order etc. would cause run differentials leading to win/loss variability.
Obviously as we get farther into the season, most teams will regress towards their expected talent level and you’ll see fewer 100 win/100 loss teams.
I would imagine that if we “Monte Carlo’d” the 2008 season we could compare the day by day projections to the season ending actuals to determine at approx what point in the season the projection becomes reliable. Basically, when does Monte Carlo say that the mid year sample size of games completed is large enough to accurately project ending standings.
Using the Monte Carlo simulation with production volatility seems to make more sense than straight projections. This seemingly would account for the win/loss effect of Player_X outperforming while Player_Y underperforms. Even if their full year projections are exactly the same, their placement in the batting order etc. would cause run differentials leading to win/loss variability.
Yeah, I think so. I think what I will do going forward is run the 1000 iterations in Diamond Mind like I’ve done in the past to build a base of team expectations as far as runs scored/allowed, etc., and then feed that data into the Monte Carlo simulator and run 10,000 more iterations and then present both datasets.
I would imagine that if we “Monte Carlo’d” the 2008 season we could compare the day by day projections to the season ending actuals to determine at approx what point in the season the projection becomes reliable. Basically, when does Monte Carlo say that the mid year sample size of games completed is large enough to accurately project ending standings.
I like this idea a lot, but it may take me a bit to figure out how to rig the spreadsheet to do that. I’ll see what I can do over the next few weeks.
Projections, schmojections.
The Yanks aren’t gonna be in the post-season. I heard it on ESPN, and that’s all there is to it. See ya in 2010.
I’m not sure I understand the RS projection. At the moment the Yanks have averaged 7 runs per game. Now, I don’t think 7 is sustainable but I do think 5.5 is and that projects to 891. Am I being overly optimistic?
[14] What reminds me of Giambi with Swisher is that with Giambi I felt like every at bat went like this:
Called Strike
Swinging Strike
Foul
Ball
Ball
Ball
Foul
Foul
Ball
Swisher didn’t walk yesterday, but he just seems to have the same knack for making pitchers throw a lot of pitches.
It’s not really his appearance that reminds me of Jason, but he does seem to have the same energy, minus the expectations of being the team MVP.
Am I being overly optimistic?
Relative to preseason projections, yeah. However, there’s reason to think the Yankees have more upside offensively than projected.
1) I assumed Jorge Posada would only get about 400 PA. If he’s closer to 500 PA, that’s probably another 5-10 runs.
2) I assumed Alex Rodriguez would only get about 500 PA. If he’s back ahead of schedule he should get more than that.
3) Maybe Gardner outhits his poor projections?
4) Maybe Cano is closer to his 65% projections (.307/.349/.495) than his baseline (.296/.332/.465)? That’s around 10 more runs.
5) Maybe Matsui plays more than expected (400 PA) and exceeds his projections since they are partially skewed by his post-injury return last year?
Still, I wouldn’t read too much into a few high scoring games against Baltimore. They’re going to have 38 games against two of the better run prevention teams in baseball (Boston/Tampa Bay) and Toronto may be strong there as well.
6) Jeter may have been more hampered with small injuries in 07-08 than recognized.
so is the Royals broadcast streamed?
Delayed for ceremony? In KC? What, morituri te salutant?