Baseball Analytics Case Study

Public-data audit of player-development translation and roster-construction signals.

A reproducible Python case study using Statcast, FanGraphs, and a curated prospect cohort to examine where measurable inputs translated into MLB value and where they did not.

The project evaluates pitch-level vulnerabilities, prospect readiness, roster archetypes, baserunning, defense, and team-level value using public data, notebooks, reusable feature engineering, and written findings.

It is framed as an analytical audit, not a causal claim. The workflow is designed to make assumptions, uncertainty, and limitations visible.

Data: Statcast, FanGraphs batting/fielding data, and a curated 43-prospect cohort.
Methods: Vectorized pandas features, effect sizes, z-scores, regression, cross-validation, and notebooks.
Outputs: Twelve notebooks, generated figures, methodology notes, limitations, and a concise findings summary.
Posture: Observational public-data analysis, with no access to private models, coaching decisions, or internal data.
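As an illustration of the effect-size method listed above, here is a minimal sketch of Cohen's d with a pooled standard deviation. The function name, cohort labels, and chase-rate values are hypothetical and are not taken from the project.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (sample variance, ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)
    ) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Illustrative chase rates for two hypothetical cohorts (not project data).
translated = [0.24, 0.26, 0.22, 0.25, 0.23]
stalled = [0.31, 0.29, 0.33, 0.30, 0.32]

d = cohens_d(translated, stalled)
print(f"Cohen's d: {d:.2f}")
```

A negative value here only means the first cohort has the lower mean. With cohorts this small, the estimate is descriptive rather than conclusive.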

Analysis window: 2017-2026
Prospects in cohort: 43
Organizations compared: 21
Case-study notebooks: 12
Automated tests: 89
Summary findings: 8

Workflow

Collect Public Data

Use Statcast, FanGraphs, and curated prospect records to build a reproducible analysis base.

Engineer Features

Create pitch-type, count, velocity, plate-discipline, batted-ball, roster, baserunning, and fielding features.

Compare Cohorts

Evaluate player-development translation, organizational outcomes, and balanced roster archetypes.

Model Team Value

Test exploratory pressure, baserunning, and defense composites against public WAR outcomes.

Document Limits

Separate measured patterns from causality and account for sample size, public-data gaps, and ongoing seasons.

Analysis Areas

notebooks/01_translation_gap.ipynb

Translation Gap

Tools-to-production analysis for how measurable prospect inputs translated into MLB outcomes.

notebooks/02_pitch_diagnostics.ipynb

Pitch Diagnostics

Pitch-type and count-specific vulnerabilities across chase, whiff, and contact-quality signals.

notebooks/07_yankees_systemic.ipynb

Organization Comparison

Curated cohort comparison across target organizations, with sample-size caveats.

notebooks/08_yankees_case_studies.ipynb

Roster Case Studies

Baserunning, defense, home-run dependency, roster extremes, and construction tradeoffs.

notebooks/09_team_value_composite.ipynb

Team Value Composite

Exploratory pressure, baserunning, and defense composite tested against team WAR.
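One way such a composite could be assembled is to z-score each component and average the results. The sketch below assumes equal weights, hypothetical column names, and made-up team values. The project's actual weighting is not stated here.

```python
import pandas as pd


def zscore(series: pd.Series) -> pd.Series:
    """Standardize a series to mean 0 and unit (population) standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)


# Made-up team-level inputs; column names are assumptions for illustration.
teams = pd.DataFrame(
    {
        "team": ["A", "B", "C", "D"],
        "pressure": [0.12, 0.08, 0.15, 0.10],
        "baserunning": [4.0, -6.5, 1.2, -2.0],
        "defense": [10.0, -3.0, 5.5, -8.0],
    }
)

components = ["pressure", "baserunning", "defense"]
z = teams[components].apply(zscore)

# Equal-weight composite (an assumption, not the project's stated weighting).
teams["composite"] = z.mean(axis=1)
print(teams[["team", "composite"]])
```

Standardizing first keeps a component measured in runs from dominating one measured as a rate.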

notebooks/10_rice_comparison.ipynb

Ben Rice Comparison

Internal counterexample with a different readiness-gate profile, also used as a live validation case.

Data Analysis Proof Points

src/fire_fishman/

Reusable Data Helpers

Project code fetches and caches public data, then exposes reusable feature-engineering helpers for notebooks.
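A fetch-then-cache helper of this kind might look like the sketch below. The name `cached_frame`, the CSV format, and the stand-in fetcher are assumptions for illustration, not the contents of `src/fire_fishman/`.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def cached_frame(path: Path, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the CSV cached at `path` if it exists; otherwise fetch, save, and return."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    df = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df


# Stand-in for a real public-data download; records how often it is called.
calls = []


def fake_fetch() -> pd.DataFrame:
    calls.append(1)
    return pd.DataFrame(
        {"pitch_type": ["FF", "SL"], "release_speed": [95.1, 84.3]}
    )


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "statcast_sample.csv"
    first = cached_frame(target, fake_fetch)   # triggers the fetch
    second = cached_frame(target, fake_fetch)  # served from disk

print(f"fetch calls: {len(calls)}")
```

Passing the fetcher as a callable keeps the cache logic testable without network access. A real fetcher would wrap whichever public-data client the project uses.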

pitch-level features

Feature Engineering

Builds chase, whiff, velocity-tier, pitch-type, batted-ball, and count-context features from public data.
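The vectorized flags described above could be sketched as follows. The input columns (`description`, `zone`, `release_speed`, `strikes`) follow Statcast's public pitch-level schema. The function name, the simplified swing set, and the velocity cut points are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Simplified swing/whiff sets; bunt-related descriptions are ignored here.
SWING_DESCRIPTIONS = {
    "swinging_strike", "swinging_strike_blocked",
    "foul", "foul_tip", "hit_into_play",
}
WHIFF_DESCRIPTIONS = {"swinging_strike", "swinging_strike_blocked"}


def add_pitch_features(pitches: pd.DataFrame) -> pd.DataFrame:
    """Add boolean chase/whiff flags, a velocity tier, and a two-strike flag."""
    out = pitches.copy()
    out["is_swing"] = out["description"].isin(SWING_DESCRIPTIONS)
    out["is_whiff"] = out["description"].isin(WHIFF_DESCRIPTIONS)
    # Statcast zones 1-9 cover the strike zone; 11-14 are outside it.
    out["out_of_zone"] = out["zone"] > 9
    out["is_chase"] = out["is_swing"] & out["out_of_zone"]
    # Hypothetical tiers: [-inf, 90), [90, 95), [95, inf).
    out["velo_tier"] = pd.cut(
        out["release_speed"],
        bins=[-np.inf, 90, 95, np.inf],
        labels=["<90", "90-95", "95+"],
        right=False,
    )
    out["two_strikes"] = out["strikes"] == 2
    return out


# Tiny made-up sample, not real pitch data.
sample = pd.DataFrame(
    {
        "description": ["swinging_strike", "ball", "foul", "hit_into_play", "called_strike"],
        "zone": [13, 14, 5, 12, 2],
        "release_speed": [96.2, 84.0, 91.5, 88.1, 95.0],
        "strikes": [2, 0, 1, 2, 1],
    }
)

features = add_pitch_features(sample)
chase_rate = features["is_chase"].sum() / features["out_of_zone"].sum()
whiff_rate = features["is_whiff"].sum() / features["is_swing"].sum()
```

Grouping `features` by `pitch_type` or count before taking these ratios would give the pitch-type and count-specific views.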

readiness gates

Validation Framework

Uses readiness gates to compare prospect profiles first, then updates the evaluation with MLB outcomes.
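A gate check of this kind can be sketched as a set of threshold tests counted per player. The gate names, directions, thresholds, and prospect values below are entirely hypothetical. The project's actual gates are not listed here.

```python
import pandas as pd

# Hypothetical gates: metric -> (direction, threshold).
# "max" passes when the value is at or below the threshold, "min" at or above.
GATES = {
    "chase_rate": ("max", 0.30),
    "whiff_rate": ("max", 0.28),
    "hard_hit_rate": ("min", 0.40),
    "bb_rate": ("min", 0.08),
    "k_rate": ("max", 0.25),
}


def gates_passed(profiles: pd.DataFrame, gates: dict) -> pd.Series:
    """Count how many gates each row of `profiles` passes."""
    checks = {}
    for column, (direction, threshold) in gates.items():
        if direction == "max":
            checks[column] = profiles[column] <= threshold
        elif direction == "min":
            checks[column] = profiles[column] >= threshold
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return pd.DataFrame(checks, index=profiles.index).sum(axis=1)


# Made-up pre-debut profiles for two fictional prospects.
profiles = pd.DataFrame(
    {
        "chase_rate": [0.25, 0.35],
        "whiff_rate": [0.22, 0.30],
        "hard_hit_rate": [0.45, 0.42],
        "bb_rate": [0.10, 0.06],
        "k_rate": [0.20, 0.24],
    },
    index=["prospect_a", "prospect_b"],
)

counts = gates_passed(profiles, GATES)
print(counts)
```

Computing the gate count from pre-debut data only, before looking at MLB results, is what lets later outcomes act as a check rather than an input.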

notebooks/10_rice_comparison.ipynb

Ben Rice Check

Rice is the internal counterexample: lower-profile path, stronger readiness-gate profile, and better early validation.

notebooks/09_team_value_composite.ipynb

Model Sanity Checks

Regression and Bayesian checks test whether non-offensive components add descriptive value beyond offense.
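The regression side of this check can be illustrated with a nested-model comparison: fit offense alone, then offense plus the composite, and compare R². The sketch below uses synthetic data and plain least squares. It is not the notebook's model, and the variable names are assumptions.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - residuals.var() / y.var()


# Synthetic team-seasons: WAR driven mostly by offense, partly by the composite.
rng = np.random.default_rng(0)
n = 120
offense = rng.normal(size=n)
composite = rng.normal(size=n)
war = 2.0 * offense + 0.5 * composite + rng.normal(scale=0.5, size=n)

r2_base = r_squared(offense.reshape(-1, 1), war)
r2_full = r_squared(np.column_stack([offense, composite]), war)
delta_r2 = r2_full - r2_base

print(f"offense only: {r2_base:.3f}")
print(f"offense + composite: {r2_full:.3f}")
print(f"delta R^2: {delta_r2:.3f}")
```

In-sample R² cannot decrease when a predictor is added, so a positive delta alone is weak evidence. Comparing the two models on held-out folds is the stricter version of the same check.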

docs/limitations.md

Interpretation Discipline

Separates measured public-data patterns from causal claims, private decision-making, and scouting certainty.

Findings

Pitch-Type Signals

Offspeed chase, breaking-ball chase, and high-velocity whiff created clearer separation than aggregate plate-discipline metrics.

Live Validation Case

Ben Rice passed 5/5 readiness gates, broke out in 2025, and opened 2026 on OPS/SLG leaderboards.

Open Evaluations

Dominguez and Volpe remain live cases; the project flags why neither should be treated as closed.

Organization Comparison

The curated cohort shows Yankees outcomes lagging top comparison organizations, with modest-sample caveats.

Baserunning Decline

Public BsR (FanGraphs baserunning runs) moved from +7.6 in 2017 to -17.2 in 2024, for a cumulative -39.2 from 2018 through 2024.

Exploratory Composite

Pressure, baserunning, and defense added descriptive signal beyond offense, but not causal proof.

Scope And Limitations

Observational

The analysis identifies public-data patterns and missed value. It cannot fully observe internal decisions, coaching context, private models, or health.

Sample Size

Prospect cohorts and organizational comparisons are modest. Labels and live-player evaluations can change as seasons and careers evolve.

Public Metrics

Statcast and FanGraphs metrics are useful but incomplete proxies for player quality, fielding value, and organizational process.

Model Use

Regression and machine-learning checks support exploratory analysis. They are not scouting grades, causal models, or predictive guarantees.
