Winning Through Data: Game Strategy Optimization in Baseball

11 min read · Apr 5, 2023

In the first article of this series on using data in baseball, I explored how to use pitch data to predict the next pitch a pitcher will throw. In this installment, I turn to game strategy optimization. By analyzing various aspects of the game, teams can make data-driven decisions that improve their chances of winning.

Game strategy optimization can tip the balance in favor of teams that put data and analytics to work. It allows teams to exploit weaknesses, capitalize on strengths, and adjust their strategies on the fly. In this article, I’ll discuss the key elements of game strategy optimization, including lineup optimization, defensive positioning, and in-game decision-making. I’ll also provide practical examples of how to apply these concepts using data analysis and machine learning.

One notable example where game strategy optimization played a significant role in a team’s success is the 2016 Chicago Cubs. The team’s management, led by Theo Epstein, embraced advanced analytics and data-driven decision-making to break the 108-year championship drought. By employing strategies such as optimal lineup construction, defensive shifts, and in-game decision-making based on data, the Cubs were able to achieve remarkable success and win the World Series.

In this article, I will provide an in-depth exploration of the key aspects of game strategy optimization, demonstrating how teams can use data to transform their strategy in real-time.

Lineup Optimization

Lineup optimization involves determining the optimal batting order to maximize run production. Traditionally, managers have relied on intuition and experience to create lineups. However, with the advent of advanced analytics, we can now use data to determine the ideal batting order based on factors such as batter performance, handedness, and pitcher tendencies.

One approach to lineup optimization is a Markov chain model, which tracks how runners move around the bases from one plate appearance to the next and calculates the expected number of runs scored for a given lineup. By evaluating each candidate batting order under the model, I can identify the lineup configuration that maximizes expected run production.

Example: Using historical player statistics, I can calculate the on-base percentage (OBP) and slugging percentage (SLG) for each batter. Then, I can use a Markov Chain model to find the lineup that maximizes the team’s expected runs scored.

import pandas as pd
import numpy as np
from itertools import permutations
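# markov_chain is treated here as a project-local helper module (it is not part of NumPy or pandas)
from markov_chain import MarkovChain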

# Load batter data with OBP and SLG values
batter_data = pd.read_csv("batter_data.csv")

# Calculate the transition matrix for each batter using their OBP and SLG
transition_matrices = [MarkovChain.calculate_transition_matrix(row["OBP"], row["SLG"]) for _, row in batter_data.iterrows()]

# Function to calculate the expected runs for a given lineup
def expected_runs(lineup, transition_matrices):
    lineup_matrix = np.vstack([transition_matrices[i] for i in lineup])
    markov_chain = MarkovChain(lineup_matrix)
    return markov_chain.expected_runs()

# Find the optimal lineup that maximizes expected runs
best_lineup = None
best_runs = -np.inf

for lineup in permutations(range(len(batter_data))):
    runs = expected_runs(lineup, transition_matrices)
    if runs > best_runs:
        best_runs = runs
        best_lineup = lineup

# Print the optimal lineup and expected runs
print("Optimal Lineup:", best_lineup)
print("Expected Runs:", best_runs)

Illustrative output from the code above might look like this:

Optimal Lineup: (2, 5, 1, 3, 0, 6, 4, 7, 8)
Expected Runs: 5.237

In this example, I used a Markov chain model to track the flow of runners around the bases and calculate the expected runs scored for each possible lineup configuration. The optimal lineup in the illustrative output has an expected value of 5.237 runs per game. By iterating through all possible lineups (9! = 362,880 orderings for nine batters), I can identify the batting order that maximizes the team’s expected run production. This data-driven approach to lineup optimization could help teams capitalize on the strengths of their batters and exploit pitcher matchups, giving them a competitive edge.
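To make the idea more concrete, here is a minimal sketch of what a base-out Markov chain can look like. It is not the `MarkovChain` class imported above but a simplified stand-in, and everything in it is an assumption for illustration: the function names `advance` and `expected_runs_per_inning`, the per-batter event probabilities, and the runner rules (every runner advances exactly as many bases as the hit; walks move only forced runners). It computes expected runs for a single inning, not a full game.

```python
import numpy as np

# Event indices: 0 = out, 1 = walk, 2 = single, 3 = double, 4 = triple, 5 = home run.
# Bases are a 3-bit mask: bit 0 = first, bit 1 = second, bit 2 = third.

def advance(bases, event):
    """Return (new_bases, runs_scored) for a non-out event under simplified rules."""
    if event == 1:  # walk: only forced runners advance
        if bases == 0b111:
            return 0b111, 1
        if (bases & 0b001) == 0:
            return bases | 0b001, 0
        if (bases & 0b010) == 0:
            return bases | 0b010, 0
        return 0b111, 0
    n = event - 1                     # bases gained: single = 1 ... home run = 4
    moved = ((bases << 1) | 1) << n   # bit 0 stands for the batter at home plate
    return (moved >> 1) & 0b111, bin(moved >> 4).count("1")

def expected_runs_per_inning(lineup_probs):
    """Expected runs in one inning for each possible leadoff batter.

    lineup_probs has shape (n_batters, 6); each row holds the probabilities of
    the six events above for one batter and should sum to 1.
    """
    probs = np.asarray(lineup_probs, dtype=float)
    n = len(probs)

    def index(batter, outs, bases):
        return (batter * 3 + outs) * 8 + bases

    size = n * 24
    P = np.zeros((size, size))  # transitions between non-absorbing states
    r = np.zeros(size)          # expected immediate runs from each state
    for b in range(n):
        nxt = (b + 1) % n
        for outs in range(3):
            for bases in range(8):
                s = index(b, outs, bases)
                if outs < 2:  # the third out ends the inning (absorbing state)
                    P[s, index(nxt, outs + 1, bases)] += probs[b, 0]
                for event in range(1, 6):
                    new_bases, runs = advance(bases, event)
                    P[s, index(nxt, outs, new_bases)] += probs[b, event]
                    r[s] += probs[b, event] * runs
    # Expected total runs v satisfies v = r + P v, so solve (I - P) v = r.
    values = np.linalg.solve(np.eye(size) - P, r)
    return np.array([values[index(b, 0, 0)] for b in range(n)])

# Hypothetical event probabilities for a nine-batter lineup (made-up numbers).
typical = [0.68, 0.08, 0.15, 0.05, 0.01, 0.03]
slugger = [0.66, 0.10, 0.12, 0.05, 0.01, 0.06]
lineup = [typical] * 3 + [slugger] + [typical] * 5
print(expected_runs_per_inning(lineup))
```

Reordering the rows of `lineup` and comparing the results gives a rough way to rank batting orders. A full-game estimate would also need to track which batter leads off each subsequent inning, which this sketch leaves out.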

Defensive Positioning

Defensive positioning plays a crucial role in preventing the opposing team from scoring runs. By analyzing historical data on batted ball locations and exit velocities, teams can optimize their fielders’ positions to maximize the likelihood of recording outs. This approach, known as defensive shifting, became increasingly popular as more teams embraced data to inform their on-field tactics. MLB’s 2023 rule changes restrict the most extreme infield shifts (two infielders must stay on each side of second base, with all four on the infield dirt), but the same model can still inform legal positioning adjustments based on what we know about an upcoming batter.

Example: Using batted ball data, I can calculate the optimal positions for fielders based on the probability distribution of where a batter is most likely to hit the ball.

import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Load batted ball data for a specific batter
batter_id = 12345
batted_ball_data = pd.read_csv(f"batted_ball_data_{batter_id}.csv")

# Calculate the probability distribution of batted ball locations
x = batted_ball_data["x_coord"]
y = batted_ball_data["y_coord"]
xy = np.vstack([x, y])
kde = gaussian_kde(xy)

# Create a grid of field locations and evaluate the probability distribution
grid_size = 100
x_grid, y_grid = np.mgrid[x.min():x.max():grid_size * 1j, y.min():y.max():grid_size * 1j]
positions = np.vstack([x_grid.ravel(), y_grid.ravel()])
probabilities = kde(positions).reshape(x_grid.shape)

# Pick the highest-density grid cells as candidate infielder positions.
# Note: the top cells of a smooth density tend to sit next to each other,
# so in practice a minimum spacing between fielders should be enforced.
num_infielders = 4
infielder_positions = np.argpartition(probabilities.ravel(), -num_infielders)[-num_infielders:]
infielder_coords = np.column_stack(np.unravel_index(infielder_positions, probabilities.shape))

# Convert grid indices to actual field coordinates
# (np.mgrid includes both endpoints, so the step is range / (grid_size - 1))
infielder_coords_actual = np.column_stack([
    x.min() + infielder_coords[:, 0] / (grid_size - 1) * (x.max() - x.min()),
    y.min() + infielder_coords[:, 1] / (grid_size - 1) * (y.max() - y.min()),
])

# Print the optimal infielder positions
print("Optimal Infielder Positions (Actual Field Coordinates):")
print(infielder_coords_actual)

# Plot the probability distribution and the selected positions.
# imshow expects rows to run along the y-axis, so the grid is transposed,
# and the markers use field coordinates to match the extent.
plt.imshow(probabilities.T, extent=[x.min(), x.max(), y.min(), y.max()], origin="lower", cmap="viridis")
plt.scatter(infielder_coords_actual[:, 0], infielder_coords_actual[:, 1], color="red", marker="o", s=50)
plt.title("Optimal Infielder Positions")
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.show()

Illustrative output from the code above might look like this:

Optimal Infielder Positions (Actual Field Coordinates):
[[-20.15,  92.25],
 [ 28.83,  91.50],
 [  6.17, 107.83],
 [-37.00, 107.50]]

In this example, I used batted ball data to estimate the probability distribution of where a specific batter is most likely to hit the ball. Based on this distribution, I selected the highest-density locations as candidate infielder positions. The output displays those positions in actual field coordinates. A density peak alone ignores fielder range and spacing, so these locations are best treated as starting points, not final assignments. By adjusting their defensive positioning based on data-driven insights, teams can increase their chances of preventing runs and ultimately improve their chances of winning.
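One way to get positions that are spread across the field instead of bunched around a single density peak is to cluster the landing spots and station one fielder near each cluster center. The sketch below swaps in plain k-means (Lloyd’s algorithm) for the density-peak selection; it is an alternative technique, not the method used above. The function name `kmeans_positions` and the synthetic landing spots are assumptions for illustration only.

```python
import numpy as np

def kmeans_positions(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: return k cluster centers for an (n, 2) array of landing spots."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observed landing spots.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every landing spot to its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of the spots assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

# Synthetic ground-ball landing spots around four made-up locations.
rng = np.random.default_rng(42)
true_spots = np.array([[-60.0, 90.0], [-20.0, 110.0], [20.0, 110.0], [60.0, 90.0]])
landing_spots = np.vstack([rng.normal(spot, 8.0, size=(100, 2)) for spot in true_spots])

positions = kmeans_positions(landing_spots, k=4)
print(np.round(positions, 1))
```

K-means can settle into a poor local optimum depending on the starting centers, so in practice it is worth running it with several seeds and keeping the solution with the lowest total distance. Any real use would also need to respect positioning rules and each fielder’s range.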

In-Game Decision-Making

In-game decision-making is an essential aspect of game strategy optimization, as it allows managers to make informed choices that can influence the outcome of the game. Examples of in-game decisions include when to make pitching changes, when to use pinch-hitters or pinch-runners, and when to attempt aggressive base running tactics like steals or hit-and-run plays. By leveraging historical data and predictive models, managers can make better decisions that increase their team’s chances of winning.

Two examples of data-driven in-game decision-making are:

Matchup-based pitching changes: In the 2017 American League Division Series between the Houston Astros and the Boston Red Sox, Astros manager A.J. Hinch used matchup data and analytics to make strategic pitching changes, bringing in specific relievers to face certain Red Sox hitters. This approach neutralized the Red Sox’s offensive threats and played a key role in the Astros’ series victory.
Pinch-hitting decisions: In a critical late-season game, a manager may face a situation where they need to decide between letting the starting pitcher continue to bat or bringing in a pinch-hitter to increase the chances of scoring. By analyzing the starting pitcher’s fatigue level, the pinch-hitter’s historical performance against the opposing pitcher, and other relevant factors, the manager can make a data-driven decision that maximizes the team’s scoring potential.

Example: Using historical play-by-play data, I can calculate the win probability added (WPA) for various in-game decisions. By analyzing these values, managers can determine which decisions are likely to have the most significant positive impact on their team’s chances of winning.

import pandas as pd
# win_probability is treated here as a project-local helper module (it is not part of pandas)
from win_probability import WinProbabilityCalculator

# Load play-by-play data
play_by_play_data = pd.read_csv("play_by_play_data.csv")

# Initialize the win probability calculator
wpa_calculator = WinProbabilityCalculator()

# Calculate the win probability before and after specific decisions
decision_indices = [42, 128]  # Example indices of the decisions in play_by_play_data

for decision_index in decision_indices:
    pre_decision_state = play_by_play_data.loc[decision_index, ["inning", "outs", "runners_on_bases", "score_difference"]]
    post_decision_state = play_by_play_data.loc[decision_index + 1, ["inning", "outs", "runners_on_bases", "score_difference"]]

    pre_decision_win_probability = wpa_calculator.calculate_win_probability(pre_decision_state)
    post_decision_win_probability = wpa_calculator.calculate_win_probability(post_decision_state)

    # Calculate the win probability added (WPA) for the decision
    wpa = post_decision_win_probability - pre_decision_win_probability
    print(f"Win Probability Added (WPA) for Decision {decision_index}: {wpa:.4f}")

Illustrative output from the code above might look like this:

Win Probability Added (WPA) for Decision 42: 0.0512
Win Probability Added (WPA) for Decision 128: -0.0245

In this example, I used historical play-by-play data and a win probability calculator to determine the win probability added (WPA) for specific in-game decisions. WPA measures the change in win probability from one play to the next, allowing managers to quantify the impact of their decisions on the team’s chances of winning. By analyzing WPA values for various decisions, managers can make informed choices that maximize their team’s likelihood of victory.
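For readers without a ready-made win probability model, one simple starting point is an empirical lookup table: for every game state seen in historical data, take the share of games the batting team went on to win. The sketch below assumes a hypothetical `team_won` column (1 if the team won that game, 0 otherwise) alongside the state columns used above, and the tiny `history` table is made-up data. The function names are also illustrative.

```python
import pandas as pd

STATE_COLS = ["inning", "outs", "runners_on_bases", "score_difference"]

def build_win_probability_table(history):
    """Empirical win probability: share of past games won from each game state."""
    return history.groupby(STATE_COLS)["team_won"].mean()

def win_probability_added(table, state_before, state_after):
    """WPA = win probability after the play minus win probability before it."""
    return table.loc[state_after] - table.loc[state_before]

# Made-up history: six past situations in the ninth inning, one out, down by a run.
history = pd.DataFrame({
    "inning": [9] * 6,
    "outs": [1] * 6,
    "runners_on_bases": ["1--"] * 4 + ["-2-"] * 2,
    "score_difference": [-1] * 6,
    "team_won": [1, 0, 0, 0, 1, 0],
})

table = build_win_probability_table(history)
# Runner on first steals second: state moves from "1--" to "-2-".
wpa = win_probability_added(table, (9, 1, "1--", -1), (9, 1, "-2-", -1))
print(f"WPA of advancing the runner to second: {wpa:+.3f}")
```

With real data, rare states will have very few observations, so the raw averages would need smoothing or a fitted model before they could be trusted for decisions.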

Evaluating Performance and Adjusting Strategy

In addition to making data-driven in-game decisions, managers and coaching staff also need to evaluate their team’s performance after each game and adjust their strategy accordingly. This involves analyzing post-game data, identifying areas for improvement, and making the adjustments needed to optimize team performance. By constantly evaluating and refining their strategy, teams can stay ahead of their competition and maximize their chances of success.

Post-game data typically includes information such as player statistics, team performance metrics, and situational data from each game. This data can be obtained from various sources, such as MLB’s official website, third-party data providers like FanGraphs or Baseball-Reference, the MLB Stats API, or the pybaseball Python library.

Example: Using historical game data, I can calculate various performance metrics, such as batting average, on-base percentage, and earned run average, to evaluate a team’s strengths and weaknesses.

import pandas as pd

# Load historical game data
game_data = pd.read_csv("historical_game_data.csv")

# Calculate performance metrics
batting_average = game_data["hits"].sum() / game_data["at_bats"].sum()
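# OBP = (H + BB + HBP) / (AB + BB + HBP + SF); the "sacrifice_flies" column is assumed to exist
times_on_base = game_data["hits"].sum() + game_data["walks"].sum() + game_data["hit_by_pitch"].sum()
obp_chances = game_data["at_bats"].sum() + game_data["walks"].sum() + game_data["hit_by_pitch"].sum() + game_data["sacrifice_flies"].sum()
on_base_percentage = times_on_base / obp_chances
# ERA assumes innings_pitched is stored as a true decimal (e.g. 6.333, not the box-score "6.1")
earned_run_average = (game_data["earned_runs"].sum() / game_data["innings_pitched"].sum()) * 9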

# Print performance metrics
print("Batting Average:", round(batting_average, 3))
print("On-Base Percentage:", round(on_base_percentage, 3))
print("Earned Run Average:", round(earned_run_average, 2))

Illustrative output from the code above might look like this:

Batting Average: 0.254
On-Base Percentage: 0.321
Earned Run Average: 4.05

In the output from the example code above, I have calculated and displayed three key performance metrics for a baseball team: Batting Average, On-Base Percentage, and Earned Run Average. The first two describe the team’s offense and the third its pitching, which can help managers and coaching staff identify areas for improvement and make data-driven adjustments to their strategies.

Let’s consider an example where a team has been struggling with a low on-base percentage (OBP) and the manager wants to address this issue to improve the team’s offensive performance.

1. Analyzing post-game data: The manager and coaching staff would first analyze the post-game data to identify the reasons behind the low OBP. This could include looking at individual player statistics like walk rates, strikeout rates, and hit-by-pitch occurrences.

2. Identifying areas for improvement: After analyzing the data, the coaching staff might identify that the team’s batters have high strikeout rates and low walk rates. This suggests that the batters may be struggling with plate discipline and pitch recognition.

3. Making necessary adjustments: Based on these insights, the coaching staff can take several actions to improve the team’s OBP:

a. Adjust the batting order: The manager may decide to move batters with higher walk rates and better pitch recognition higher up in the batting order, increasing their chances of getting on base and creating more run-scoring opportunities.

b. Tailor training sessions: The coaching staff can work with the batters during training sessions, focusing on improving their pitch recognition and plate discipline. This could involve using advanced video analysis tools to study opposing pitchers and their pitch patterns, or employing drills that emphasize pitch recognition and patience at the plate.

c. Leverage scouting reports: The coaching staff can study scouting reports on opposing pitchers to identify weaknesses that the batters can exploit. For example, if a particular pitcher tends to throw breaking balls in 2–2 counts, the coaching staff can instruct the batters to be prepared for that pitch in those situations.
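The first two steps above can be sketched in a few lines of pandas. This is a minimal illustration: the column names (`player`, `plate_appearances`, `strikeouts`, `walks`), the function names, the sample numbers, and the 25% and 7% thresholds are all assumptions, not league standards.

```python
import pandas as pd

def plate_discipline(df):
    """Aggregate per-player strikeout rate (K%) and walk rate (BB%) per plate appearance."""
    totals = df.groupby("player")[["plate_appearances", "strikeouts", "walks"]].sum()
    totals["k_rate"] = totals["strikeouts"] / totals["plate_appearances"]
    totals["bb_rate"] = totals["walks"] / totals["plate_appearances"]
    return totals.sort_values("bb_rate", ascending=False)

def flag_discipline_concerns(rates, max_k_rate=0.25, min_bb_rate=0.07):
    """Return players who both strike out often and rarely walk."""
    mask = (rates["k_rate"] > max_k_rate) & (rates["bb_rate"] < min_bb_rate)
    return list(rates.index[mask])

# Made-up batting logs, two rows per player.
logs = pd.DataFrame({
    "player": ["Batter A", "Batter A", "Batter B", "Batter B"],
    "plate_appearances": [50, 50, 60, 40],
    "strikeouts": [15, 15, 10, 8],
    "walks": [2, 3, 8, 4],
})

rates = plate_discipline(logs)
print(rates[["k_rate", "bb_rate"]])
print("Players to focus on:", flag_discipline_concerns(rates))
```

Sorting by walk rate also gives a quick first look at who might move up in the batting order, as in adjustment (a).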

These metrics also have a predictive side, which can be invaluable for coaching staff making strategic decisions about lineup construction, pitching matchups, or in-game tactics. For instance, if a team’s left-handed batters post a significantly higher on-base percentage against right-handed pitchers, a manager might choose to stack the lineup with left-handed batters in a crucial game against a right-handed starter. Similarly, if a pitcher’s earned run average tends to increase after they have pitched more than six innings, a coach might be more inclined to bring in a reliever earlier in the game to preserve the lead.
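A split like the one described here is straightforward to compute once the data records which kind of pitcher each game or plate appearance was against. The sketch below assumes a hypothetical `pitcher_hand` column and made-up counts; `obp_by_split` is an illustrative helper, and it uses the standard OBP formula (H + BB + HBP) / (AB + BB + HBP + SF).

```python
import pandas as pd

COUNT_COLS = ["hits", "walks", "hit_by_pitch", "at_bats", "sacrifice_flies"]

def obp_by_split(df, split_col):
    """On-base percentage for each value of split_col, computed from summed counts."""
    totals = df.groupby(split_col)[COUNT_COLS].sum()
    times_on_base = totals["hits"] + totals["walks"] + totals["hit_by_pitch"]
    chances = (totals["at_bats"] + totals["walks"]
               + totals["hit_by_pitch"] + totals["sacrifice_flies"])
    return times_on_base / chances

# Made-up team batting lines, split by the opposing starter's throwing hand.
games = pd.DataFrame({
    "pitcher_hand": ["R", "R", "L", "L"],
    "hits": [16, 14, 10, 10],
    "walks": [6, 4, 3, 2],
    "hit_by_pitch": [0, 0, 0, 0],
    "at_bats": [44, 46, 47, 48],
    "sacrifice_flies": [0, 0, 0, 0],
})

splits = obp_by_split(games, "pitcher_hand")
print(splits.round(3))
```

Summing the counts before dividing matters: averaging per-game OBP values would weight a two-at-bat game the same as a forty-at-bat one.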

By continuously evaluating performance metrics and leveraging predictive analytics, coaching staff can make well-informed decisions that increase their team’s chances of success and stay ahead of the competition.

Conclusion

Data analytics has revolutionized the way baseball teams approach game strategy optimization. Through the use of advanced statistical analysis and machine learning techniques, managers and coaching staff can make more informed decisions and tailor their strategies to maximize their chances of success on the field.

In this article, I explored various aspects of leveraging data in baseball, including:

Optimizing lineup construction and platoon splits by examining hitter-pitcher matchups, player performance metrics, and other relevant factors. By strategically constructing lineups, managers can exploit favorable matchups and minimize the impact of unfavorable ones. A data-driven lineup might show that the top three batters have an average On-Base Percentage (OBP) of .350 against right-handed pitchers, significantly improving the team’s chances of scoring runs.
Enhancing in-game decision-making, such as defensive positioning, bullpen usage, and pinch-hitting choices, by leveraging real-time data and situational probabilities. For example, a manager might shade the infield toward the pull side against a pull-heavy hitter, based on the hitter’s historical spray chart data showing that 75% of their ground balls are hit to the pull side of the field.
Evaluating performance and adjusting strategy through the analysis of post-game data and performance metrics, such as Batting Average, On-Base Percentage, and Earned Run Average. By identifying areas for improvement, coaching staff can make data-driven adjustments to their strategies. For example, if a team’s analysis reveals a high strikeout rate of 25% and a low walk rate of 7%, the coaching staff might focus on improving plate discipline during training sessions.

By integrating data analytics into the decision-making processes, teams can gain a competitive edge over their opponents and increase their chances of success in the long run. As data collection and analysis tools continue to advance, embracing a data-driven approach will be crucial for baseball teams looking to stay ahead of the curve and adapt to the ever-evolving game.

As I continue this series on leveraging data in baseball, the next article will delve into player performance analysis and injury prevention. By using advanced data analytics techniques, teams can better understand each player’s strengths and weaknesses, identify areas for improvement, and monitor their workload to minimize the risk of injuries. I will explore how teams can integrate sports science, biomechanics, and wearable technology data to create personalized training programs, optimize player recovery, and ultimately prolong the careers of their athletes. Stay tuned to learn more about how data-driven approaches are shaping the future of player development and performance management in baseball.


Written by MaFisher
