Creating Player Forecasts

Model Development:

Now I’m going to talk a bit about how these predictions came about. If you don’t care - skip to the Players to Watch section to see the players of interest.

The intent behind my model is to use metrics and advanced stats from the previous year to predict fantasy points for the current year. So for this year, I look at metrics from the 2024 season and use them to predict 2025 fantasy points. Rather than predicting season-long fantasy points, I predict fantasy points per IP for pitchers and fantasy points per game for batters. It is nearly impossible to predict how many games a player will play each season, so I remove this factor from the initial evaluation.

The metrics include a lot of Statcast data (exit velocity, fastball spin rate, etc.) along with some more descriptive statistics such as K% or pull%. Because I rely so heavily on Statcast data, the dataset only covers 2015 to the present.

Another key feature used in prediction is the percentile of the weighted average of the fantasy scores from the two years before the metrics year. So if the current player’s metrics are from 2023, this feature is the percentile of ((Fpoints22 * 0.7) + (Fpoints21 * 0.3)) across the entire dataset. This works as an indicator of the player’s performance prior to the current data, weighting more recent performances more heavily. Players who do not have data from the previous 2 years (because of injury or not being in MLB) are assigned values by an imputer. The imputer looks at entries that do have this previous data, computes an approximate formula, and then applies it to fill in the missing values.

Another engineered feature is K%/BB%, because it is the king of all predictive statistics for pitchers and works fairly effectively for batters as well. I also created features representing the yearly averages of each metric/statistic in an attempt to capture year-to-year variation. We don’t want the model to think every player in 2019 was prime Barry Bonds.
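To make the prior-performance feature concrete, here is a minimal sketch of the 0.7/0.3 weighting and the percentile step. The function names and the toy rows are hypothetical, and the percentile definition (share of values less than or equal to a given value) is an assumption, since the exact ranking method is not stated above.

```python
from bisect import bisect_right


def weighted_prior_score(fp_last_year, fp_two_years_ago, w_recent=0.7, w_older=0.3):
    """Weighted average of the two seasons before the metrics year."""
    return w_recent * fp_last_year + w_older * fp_two_years_ago


def percentile_ranks(values):
    """Percentile (0-100) of each value: share of all values <= that value."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


# Toy rows (hypothetical): (player, fantasy points per game last year, two years ago)
rows = [("A", 2.4, 2.0), ("B", 1.6, 1.9), ("C", 2.1, 2.6), ("D", 1.2, 1.0)]

scores = [weighted_prior_score(last, older) for _, last, older in rows]
pctl = dict(zip((name for name, _, _ in rows), percentile_ranks(scores)))
print(pctl)
```

In the real dataset the percentile would be taken across every player-season rather than four toy rows, and players with missing history would receive an imputed value before ranking.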

Once these features were created, the data was run through a prototypical ML workflow: train/test split, preprocessing, hyperparameter optimization, and model selection. Models are built separately for pitchers and batters because the feature sets are entirely different. The optimal pitcher model ended up being a LASSO (L1) regressor with an MSE of 0.44 (an RMSE of about 0.66, against an average of ~2 fantasy points per inning), and the optimal batter model was a Gradient Boosting regressor with an MSE of 0.33 (an RMSE of about 0.57, with a similar mean of ~2).
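The workflow described above might look roughly like the sketch below. It assumes scikit-learn (the library is not named in this post), uses synthetic data in place of the real feature set, and the hyperparameter grids are placeholders rather than the ones actually searched.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features and fantasy-points-per-IP target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 2.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "lasso": (Lasso(max_iter=10_000), {"model__alpha": [0.001, 0.01, 0.1]}),
    "gbr": (
        GradientBoostingRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [2, 3]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, scoring="neg_mean_squared_error", cv=3)
    search.fit(X_train, y_train)
    results[name] = {
        "cv_mse": -search.best_score_,  # used for model selection
        "test_mse": mean_squared_error(y_test, search.predict(X_test)),
    }

# Select on cross-validated error; the held-out test MSE is only reported.
best_name = min(results, key=lambda n: results[n]["cv_mse"])
print(best_name, results[best_name])
```

Selecting on the cross-validated score and reporting the held-out MSE separately keeps the test set out of the model choice.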

Here are the top 10 features in terms of model feature importance for the pitcher model:

	Feature	Importance
0	k_percent	0.143281
1	fastball_avg_speed	0.060041
2	whiff_percent	0.051201
3	KpBB%	0.046999
4	z_swing_miss_percent	0.042774
5	Fpoints_IP_two_prev_year	0.042523
6	f_strike_percent	0.042179
7	breaking_avg_break	0.035291
8	fastball_avg_spin	0.022523
9	offspeed_avg_speed	0.018227

And for the batter model:

	Feature	Importance
0	Fpoints_G_percentile	0.423210
1	xwoba	0.082674
2	xba	0.062978
3	xslg	0.051417
4	oz_contact_percent	0.036774
5	xobp	0.033210
6	age	0.025769
7	sprint_speed	0.021797
8	avg_best_speed	0.015279
9	exit_velocity_avg	0.014472

Both models end up slightly more reliant on K% than on most commonly used statistics - but strikeouts are critical (good for pitchers, bad for hitters) in fantasy. Also, having a Spencer Strider pitch 100 innings with 125 K’s and a 2.5 ERA is much more exciting than someone like Kyle Gibson soft-tossing groundballs for 175 IP and a 4 ERA. But that’s just my opinion. Also note the HIGH importance of the previous-year percentile feature, which is by far the top feature in the batter model (0.42).

Predictions

Now that the model is trained, it’s time to use it. We take the metrics from 2024 and predict our 2025 values. For pitchers I created two different scores - one that excludes “team stats”, called fpoints_proj_skill, and another that includes them, called fpoints_proj. Not that wins and losses don’t require skill - just sometimes they don’t. I use Razzball’s Steamer/Razzball projections to get games played for batters, along with innings pitched and all team counting stats for pitchers. From there we have our initial projections for 2025! The hitter projections are found here, pitcher projections here, and full projections here.
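As a rough illustration of how a predicted rate turns into a season total, the sketch below multiplies the per-IP (or per-game) prediction by the projected playing time and, for pitchers, adds points from team-dependent stats. The scoring weights in `TEAM_STAT_POINTS` are hypothetical placeholders rather than the league’s actual settings, and the way the two scores combine is an assumption based on the description above.

```python
# Hypothetical point values for team-dependent pitcher stats;
# substitute your own league's scoring settings.
TEAM_STAT_POINTS = {"W": 2.0, "L": -2.0}


def pitcher_projection(fp_per_ip, proj_ip, team_stats):
    """Return both pitcher scores for one season.

    fp_per_ip  : model-predicted fantasy points per inning (skill only)
    proj_ip    : projected innings pitched (e.g. from an external projection)
    team_stats : projected team counting stats, e.g. {"W": 12, "L": 8}
    """
    skill = fp_per_ip * proj_ip
    team = sum(TEAM_STAT_POINTS[stat] * count for stat, count in team_stats.items())
    return {"fpoints_proj_skill": skill, "fpoints_proj": skill + team}


def batter_projection(fp_per_game, proj_games):
    """Season total for a batter: predicted rate times projected games."""
    return fp_per_game * proj_games


print(pitcher_projection(2.0, 180, {"W": 12, "L": 8}))
print(batter_projection(2.0, 150))
```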

Positional Adjustment:

Every year I seem to forget to draft a certain position and end up with a very unsatisfactory option. This notion of positionality needs to be quantified in some way in order to truly capture a player’s impact on a fantasy team. The standard ESPN roster has the following positions:

C
1B
2B
3B
SS
OF * 3
UTIL (any batter)
P * 7 (SP or RP)
Bench * 3 (any)

We also play with a weekly starts cap (TBD but probably 12) and a limit on weekly additions to prevent pitcher spamming. I usually like to construct my roster with at most 3 RP and at least 2 of my 3 bench spots as pitchers.

To calculate this positional adjustment, I first get the mean and standard deviation of fantasy points for each position (including only the top 300 scorers). The distributions come out as below.

[Figure: Distribution of Fantasy Points by Position]

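From here the initial positional adjustment is calculated as:

$$ \text{pos\_adj} = \left( \frac{\mu - \mu_p}{\sigma_p} \right) \times \sigma \times 0.5 $$

where $\mu_p$ and $\sigma_p$ are the mean and standard deviation of fantasy points at position $p$, and $\mu$ and $\sigma$ are the overall mean and standard deviation. We standardize the difference between the overall mean and each position’s mean, rescale by the overall SD, and multiply by 0.5 just to reduce the effect a bit. Positions that score below the overall mean (such as C and 2B) therefore receive a positive adjustment, and positions that score above it receive a negative one, which matches the signs in the table of adjustment values.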
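The initial adjustment can be sketched in a few lines of Python. This uses the sign convention implied by the table of adjustment values (positions that score below the overall mean get a positive adjustment). The toy player list is hypothetical, and the use of the sample standard deviation and of a single top-N cut across all positions are both assumptions.

```python
from statistics import mean, stdev


def positional_adjustments(players, top_n=300, damp=0.5):
    """Initial pos_adj per position.

    players : list of (position, fantasy_points) tuples
    top_n   : only the top_n scorers overall are included
    damp    : the 0.5 factor that reduces the size of the effect
    """
    top = sorted(players, key=lambda p: p[1], reverse=True)[:top_n]
    all_points = [pts for _, pts in top]
    mu, sigma = mean(all_points), stdev(all_points)

    by_pos = {}
    for pos, pts in top:
        by_pos.setdefault(pos, []).append(pts)

    # (overall mean - position mean) / position SD, rescaled by overall SD.
    return {
        pos: (mu - mean(pts)) / stdev(pts) * sigma * damp
        for pos, pts in by_pos.items()
    }


# Toy data (hypothetical): two players per position.
players = [
    ("C", 280), ("C", 300),
    ("SP", 330), ("SP", 350),
    ("1B", 310), ("1B", 320),
]
adj = positional_adjustments(players)
print(adj)
```

In this toy example the catchers average below the overall mean and get a positive adjustment, the starting pitchers average above it and get a negative one, and first basemen sit exactly at the mean and get zero.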

Another adjustment is applied to each position to account for its importance. These are admittedly arbitrary values created by yours truly, however the effects are not very significant. They reduce the value of more abundant positions such as SP and OF, and also reduce the value of DH because there is no set DH spot on a team (a DH would have to be used in UTIL). A few more minor adjustments were also made to increase the value of catchers and further decrease the value of pitchers.

The resulting adjustment values are shown below:

	Pos	mean	std	pos_adj
0	1B	322.116217	52.852769	-6.087865
1	2B	296.160442	46.739445	10.415340
2	3B	308.411984	64.004061	1.642868
3	C	289.196370	30.518661	34.589502
4	DH	346.544905	119.092538	-27.275181
5	OF	308.809142	62.415301	2.477440
6	RP	276.259683	39.371023	6.624046
7	SP	335.029272	62.892789	-56.022887
8	SS	323.558628	61.526180	-5.959969

Lastly, a bonus of 5 points was added to players with multiple batter positions (not including DH), and a bonus of 20 was applied to anyone who can play catcher along with another position. These adjustments are then applied to each player, and the final rankings are created. If you are interested, please check them out here. One caveat - Ohtani does not have both his hitting and pitching predictions here. He is obviously #1 overall if he has both. Check out the next chapter if you want to see some of my takeaways.
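The eligibility bonuses can be expressed as a small helper. This is only a sketch: the function names are hypothetical, and it assumes the catcher bonus replaces the multi-position bonus rather than stacking with it, which the description above does not specify.

```python
# Fielding positions that count toward the multi-position bonus (DH excluded).
BATTER_POSITIONS = {"C", "1B", "2B", "3B", "SS", "OF"}


def eligibility_bonus(positions, multi_bonus=5, catcher_bonus=20):
    """Bonus points for multi-position eligibility."""
    fielding = set(positions) & BATTER_POSITIONS
    if len(fielding) < 2:
        return 0
    # Assumption: a catcher-eligible player gets 20 in place of 5, not 25.
    return catcher_bonus if "C" in fielding else multi_bonus


def final_score(projection, pos_adj, positions):
    """Projected points plus positional adjustment plus eligibility bonus."""
    return projection + pos_adj + eligibility_bonus(positions)


print(eligibility_bonus(["2B", "SS"]))   # middle infielder with two positions
print(eligibility_bonus(["C", "1B"]))    # catcher with a second position
print(final_score(400.0, 10.0, ["2B", "SS"]))
```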
