Wednesday, October 17, 2007
CAIRO Projections v0.1
One area of baseball analysis that I've taken an interest in is projections. Some of the people I consider the best analysts out there have spent a lot of time and effort devising projection systems. With Dan Szymborski's ZiPS, Sean Smith's CHONE, Tango Tiger's Marcels, Nate Silver's PECOTA, and some others I'm surely forgetting, there's no shortage of systems out there.
I have a stubborn side to me, though. I have this burning need to understand how all these numbers work instead of just taking them as presented, so I've taken to trying to calculate a lot of stuff myself. That's why I decided to come up with my own projection system, which I've code-named CAIRO after my favorite bad baseball player. It will eventually stand for something, but I'm not quite sure what yet.
I don't expect CAIRO to be any better than any of the systems mentioned above, but at least now I don't have to wait for the others to come out. The other thing I want is to make this a reasonably open-source system. The heart of CAIRO is Tango Tiger's Marcel system, but with a few of my own tweaks. This is the methodology I am currently using.
1) Park adjust and league adjust the component stats for each season from 2003-2007. I am including 2006 and 2007 MLEs (major league equivalencies) but for now I've only projected people who appeared in the majors in 2007 except for a couple of Yankee farmhands.
2) Weight each season on a 5/4/3/2/1 scale (most recent season weighted most heavily) for batters. For pitchers I use 7/5/3/2/1, as I believe pitchers' most recent performance should be weighted a little more heavily, since pitchers are more likely to change their true talent level, both positively and negatively.
3) Add in a percentage of plate appearances or innings based on league average performance (regression towards the mean).
4) Adjust the appropriate components for the player's age (hits, xbh, HR, BB, K, SB)
5) Park-adjust the final stat line for the player's expected park/league. For pitchers I include an adjustment for the projected defense behind them.
6) This is an almost entirely objective system, but I have made/will make playing time adjustments for some players.
7) Last, wait for everyone to tell me how wrong the projections are.
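For anyone who wants to play along at home, here is a rough sketch of what the weighting and regression steps might look like in Python. This is not the actual CAIRO spreadsheet logic. The function name `weighted_rate` and the `regression_opps` parameter (how many league-average plate appearances or batters faced get added in) are my own placeholders for illustration, and the amounts used in the example are made up. It also skips the park, league and age adjustments entirely.

```python
from typing import Sequence

# Season weights from the methodology, most recent season first.
BATTER_WEIGHTS = (5, 4, 3, 2, 1)
PITCHER_WEIGHTS = (7, 5, 3, 2, 1)


def weighted_rate(
    events: Sequence[float],
    opportunities: Sequence[float],
    weights: Sequence[float],
    league_rate: float,
    regression_opps: float,
) -> float:
    """Weighted per-opportunity rate, regressed toward the league average.

    events and opportunities are listed most recent season first
    (e.g. HR and PA for 2007, 2006, ... 2003). Use 0 for a missing season.
    regression_opps is a hypothetical amount of league-average playing
    time to blend in; the real system's amount is not given here.
    """
    if not (len(events) == len(opportunities) == len(weights)):
        raise ValueError("events, opportunities and weights must be the same length")

    # Weight each season's totals, most recent counting the most.
    w_events = sum(w * e for w, e in zip(weights, events))
    w_opps = sum(w * o for w, o in zip(weights, opportunities))

    # Regression toward the mean: add league-average opportunities.
    denominator = w_opps + regression_opps
    if denominator == 0:
        raise ValueError("no playing time and no regression amount supplied")
    return (w_events + league_rate * regression_opps) / denominator


# Made-up example: a batter's HR and PA over five seasons, most recent first.
hr = [30, 25, 20, 0, 0]
pa = [600, 550, 500, 0, 0]
projected_hr_rate = weighted_rate(hr, pa, BATTER_WEIGHTS, league_rate=0.03, regression_opps=1000)
print(f"Projected HR per PA: {projected_hr_rate:.4f}")
```

The more playing time a player has, the less the league-average chunk matters, which is why guys with thin track records end up projected closer to average.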
I have included defensive projections using Zone Rating as well. I used a weighted average from 2002-2007 with regression and aging factored in. I included a lot of players who don't have significant playing time at specific positions so small sample size caveats apply, even with the regression that I added in.
This is all very much a work in progress and I expect to continue tweaking it during the offseason, so if anyone sees any numbers that look wrong or has any questions about the methodology feel free to let me know.
Since this is supposed to be a Yankee site, here are the Yankees' projections.
| NAME | AGE | TEAM | POS | G | PA | AB | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | DP | AVG | OBP | SLG |
| Alberto Gonzalez | 25 | NYA | ss | 75 | 259 | 237 | 60 | 11 | 1 | 4 | 28 | 18 | 34 | 0 | 0 | 0 | .255 | .311 | .363 |
| Alex Rodriguez | 32 | NYA | 3b | 157 | 686 | 580 | 174 | 27 | 1 | 42 | 124 | 89 | 127 | 19 | 4 | 17 | .300 | .402 | .570 |
| Andy Phillips | 31 | NYA | 1b | 63 | 198 | 180 | 48 | 9 | 1 | 4 | 24 | 13 | 34 | 1 | 1 | 4 | .266 | .310 | .403 |
| Angel Chavez | 26 | NYA | ss | 75 | 280 | 264 | 68 | 14 | 1 | 7 | 34 | 14 | 54 | 0 | 0 | 0 | .259 | .297 | .395 |
| Bobby Abreu | 34 | NYA | rf | 155 | 676 | 565 | 161 | 37 | 2 | 19 | 89 | 101 | 120 | 23 | 6 | 10 | .284 | .392 | .462 |
| Brett Gardner | 24 | NYA | cf | 83 | 331 | 297 | 78 | 11 | 3 | 2 | 31 | 31 | 64 | 0 | 0 | 0 | .262 | .332 | .341 |
| Bronson Sardinha | 25 | NYA | rf | 85 | 322 | 294 | 69 | 12 | 2 | 9 | 39 | 26 | 62 | 0 | 0 | 0 | .234 | .296 | .378 |
| Chris Basak | 29 | NYA | 3b | 69 | 239 | 215 | 56 | 11 | 1 | 6 | 30 | 21 | 42 | 0 | 0 | 0 | .263 | .329 | .411 |
| Derek Jeter | 34 | NYA | ss | 151 | 688 | 608 | 193 | 35 | 3 | 16 | 90 | 62 | 102 | 18 | 5 | 17 | .318 | .387 | .463 |
| Doug Mientkiewicz | 34 | NYA | 1b | 79 | 291 | 254 | 66 | 15 | 1 | 7 | 35 | 29 | 39 | 1 | 1 | 7 | .262 | .338 | .409 |
| Eric Duncan | 23 | NYA | 1b | 79 | 303 | 274 | 63 | 14 | 1 | 8 | 36 | 26 | 56 | 0 | 0 | 0 | .231 | .299 | .379 |
| Hideki Matsui | 34 | NYA | lf | 137 | 579 | 506 | 147 | 30 | 2 | 22 | 86 | 65 | 73 | 2 | 1 | 12 | .291 | .370 | .490 |
| Jason Giambi | 37 | NYA | dh | 107 | 422 | 339 | 87 | 15 | 0 | 22 | 66 | 70 | 82 | 1 | 0 | 6 | .256 | .398 | .498 |
| Johnny Damon | 34 | NYA | cf | 137 | 607 | 539 | 154 | 27 | 4 | 16 | 78 | 60 | 75 | 21 | 5 | 7 | .286 | .357 | .438 |
| Jorge Posada | 36 | NYA | c | 141 | 553 | 473 | 139 | 31 | 1 | 21 | 82 | 71 | 93 | 2 | 1 | 14 | .294 | .391 | .499 |
| Jose Molina | 33 | NYA | c | 60 | 195 | 179 | 45 | 10 | 0 | 4 | 22 | 10 | 39 | 1 | 0 | 4 | .248 | .281 | .369 |
| Kevin Reese | 30 | NYA | lf | 61 | 234 | 212 | 52 | 7 | 1 | 5 | 26 | 18 | 40 | 0 | 0 | 0 | .245 | .309 | .361 |
| Melky Cabrera | 23 | NYA | cf | 139 | 564 | 502 | 144 | 24 | 5 | 9 | 64 | 48 | 64 | 6 | 3 | 7 | .287 | .345 | .407 |
| Robinson Cano | 25 | NYA | 2b | 152 | 624 | 581 | 181 | 41 | 4 | 18 | 90 | 31 | 76 | 3 | 3 | 17 | .312 | .347 | .490 |
| Shelley Duncan | 28 | NYA | dh | 83 | 311 | 281 | 72 | 13 | 1 | 16 | 50 | 26 | 64 | 0 | 0 | 1 | .256 | .317 | .479 |
| Wil Nieves | 30 | NYA | c | 42 | 147 | 136 | 32 | 6 | 0 | 2 | 15 | 8 | 18 | 0 | 0 | 1 | .233 | .274 | .340 |
| Wilson Betemit | 26 | NYA | 1b | 109 | 308 | 272 | 71 | 14 | 1 | 12 | 44 | 30 | 70 | 1 | 1 | 4 | .263 | .332 | .451 |

| NAME | AGE | TEAM | ERA | G | W | L | IP | H | ER | HR | BB | SO |
| Andy Pettitte | 36 | NYA | 4.21 | 36 | 13 | 9 | 201 | 199 | 94 | 19 | 57 | 149 |
| Brian Bruney | 26 | NYA | 5.20 | 32 | 2 | 2 | 33 | 31 | 19 | 4 | 22 | 30 |
| Carl Pavano | 32 | NYA | 4.49 | 10 | 3 | 3 | 50 | 53 | 25 | 6 | 11 | 31 |
| Chase Wright | 25 | NYA | 4.76 | 17 | 3 | 3 | 51 | 57 | 27 | 6 | 23 | 30 |
| Chien-Ming Wang | 28 | NYA | 3.93 | 35 | 13 | 9 | 202 | 210 | 88 | 12 | 56 | 95 |
| Chris Britton | 25 | NYA | 3.79 | 46 | 4 | 2 | 57 | 57 | 24 | 5 | 19 | 39 |
| Colter Bean | 31 | NYA | 4.44 | 13 | 2 | 1 | 24 | 23 | 12 | 2 | 16 | 22 |
| Darrell Rasner | 27 | NYA | 4.77 | 13 | 3 | 2 | 45 | 51 | 24 | 6 | 13 | 26 |
| Edwar Ramirez | 27 | NYA | 3.73 | 39 | 4 | 2 | 55 | 45 | 23 | 6 | 25 | 70 |
| Ian Kennedy | 23 | NYA | 4.19 | 38 | 11 | 9 | 181 | 179 | 84 | 21 | 72 | 143 |
| Jeff Karstens | 25 | NYA | 5.82 | 13 | 2 | 4 | 51 | 61 | 33 | 9 | 18 | 27 |
| Jim Brower | 35 | NYA | 5.88 | 22 | 1 | 2 | 26 | 28 | 17 | 4 | 11 | 19 |
| Joba Chamberlain | 22 | NYA | 3.68 | 55 | 10 | 7 | 149 | 136 | 61 | 16 | 48 | 150 |
| Jose Veras | 27 | NYA | 4.29 | 27 | 2 | 2 | 38 | 38 | 18 | 4 | 15 | 29 |
| Kei Igawa | 28 | NYA | 5.72 | 25 | 6 | 9 | 138 | 158 | 88 | 29 | 46 | 102 |
| Kyle Farnsworth | 32 | NYA | 4.23 | 58 | 4 | 2 | 57 | 53 | 27 | 7 | 24 | 58 |
| Luis Vizcaino | 33 | NYA | 4.46 | 71 | 4 | 4 | 73 | 69 | 36 | 8 | 33 | 62 |
| Mariano Rivera | 38 | NYA | 2.74 | 66 | 6 | 2 | 75 | 67 | 23 | 4 | 15 | 69 |
| Matt DeSalvo | 27 | NYA | 6.82 | 15 | 2 | 4 | 53 | 63 | 40 | 8 | 37 | 26 |
| Mike Mussina | 39 | NYA | 4.26 | 32 | 11 | 8 | 173 | 177 | 82 | 19 | 40 | 129 |
| Philip Hughes | 22 | NYA | 3.73 | 35 | 11 | 6 | 157 | 152 | 65 | 14 | 50 | 124 |
| Roger Clemens | 45 | NYA | 3.69 | 22 | 7 | 4 | 103 | 96 | 42 | 9 | 30 | 85 |
| Ron Villone | 38 | NYA | 4.60 | 45 | 3 | 4 | 59 | 56 | 30 | 7 | 29 | 47 |
| Ross Ohlendorf | 25 | NYA | 4.77 | 16 | 4 | 3 | 64 | 73 | 34 | 9 | 15 | 39 |
| Sean Henn | 27 | NYA | 5.90 | 19 | 2 | 2 | 32 | 36 | 21 | 5 | 19 | 21 |
| Tyler Clippard | 23 | NYA | 5.81 | 19 | 4 | 5 | 81 | 92 | 52 | 16 | 35 | 58 |

| Player | Age | Team | LG | Pos | G | Innings | PO | A | E | DP | PM | CH | ZR | PM +/- | RS | RS/162 |
| Doug Mientkiewicz | 34 | NYY | AL | 1B | 86 | 669 | 626 | 53 | 4 | 63 | 119 | 137 | .870 | 4 | 3 | 7 |
| Jason Giambi | 37 | NYY | AL | 1B | 60 | 459 | 402 | 32 | 5 | 38 | 72 | 90 | .796 | -4 | -3 | -9 |
| Andy Phillips | 31 | NYY | AL | 1B | 66 | 428 | 342 | 45 | 4 | 41 | 80 | 94 | .846 | 1 | 1 | 2 |
| Wilson Betemit | 27 | NYY | AL | 1B | 36 | 255 | 88 | 63 | 3 | 20 | 69 | 84 | .816 | -1 | -1 | -5 |
| Shelley Duncan | 29 | NYY | AL | 1B | 31 | 218 | 51 | 62 | 3 | 18 | 61 | 73 | .834 | 1 | 1 | 6 |
| Robinson Cano | 26 | NYY | AL | 2B | 140 | 1213 | 320 | 397 | 13 | 100 | 367 | 441 | .833 | 4 | 3 | 4 |
| Wilson Betemit | 27 | NYY | AL | 2B | 11 | 84 | 53 | 13 | 0 | 7 | 18 | 22 | .813 | -1 | -1 | -9 |
| Alex Rodriguez | 33 | NYY | AL | 3B | 156 | 1337 | 115 | 273 | 16 | 31 | 299 | 394 | .758 | -3 | -3 | -3 |
| Wilson Betemit | 27 | NYY | AL | 3B | 41 | 288 | 23 | 63 | 4 | 9 | 64 | 81 | .786 | 0 | 0 | -2 |
| Johnny Damon | 35 | NYY | AL | CF | 117 | 980 | 284 | 16 | 3 | 3 | 280 | 318 | .881 | 0 | 0 | 0 |
| Melky Cabrera | 24 | NYY | AL | CF | 62 | 505 | 153 | 23 | 3 | 4 | 153 | 173 | .880 | 2 | 1 | 4 |
| Hideki Matsui | 34 | NYY | AL | LF | 105 | 892 | 204 | 6 | 4 | 1 | 199 | 238 | .833 | -7 | -6 | -10 |
| Melky Cabrera | 24 | NYY | AL | LF | 72 | 599 | 139 | 7 | 1 | 1 | 134 | 159 | .840 | -3 | -2 | -6 |
| Johnny Damon | 35 | NYY | AL | LF | 52 | 423 | 116 | 3 | 2 | 0 | 112 | 129 | .869 | 0 | 0 | 0 |
| Bobby Abreu | 34 | NYY | AL | RF | 128 | 1089 | 240 | 7 | 4 | 1 | 237 | 275 | .864 | -2 | -2 | -2 |
| Derek Jeter | 34 | NYY | AL | SS | 152 | 1300 | 225 | 373 | 15 | 85 | 369 | 457 | .806 | -11 | -8 | -9 |
| Alberto Gonzalez | 25 | NYY | AL | SS | 35 | 225 | 49 | 18 | 2 | 3 | 56 | 66 | .838 | -1 | 0 | 0 |
| Wilson Betemit | 27 | NYY | AL | SS | 18 | 114 | 22 | 21 | 2 | 5 | 29 | 36 | .793 | -2 | -1 | -19 |
The full spreadsheet is available here.
Update: Version 1.3 is now available. I added more minor league data and changed some of my pitching algorithms. Link is here.
It's important to remember that any projection system is inherently limited. We're dealing with athletes playing games, and their true talent can change in ways that can't be forecasted. In addition, fluke seasons happen, both good and bad. I think that on a team level projections are a useful tool for understanding probabilities, but at the end of the day that's all they are: probabilities, not predictions.