Changes from all commits (27 commits)
- 34c93da: initial commit (drbenvincent, Dec 5, 2025)
- 8d624ba: add new notebook to index so it renders in the docs (drbenvincent, Dec 5, 2025)
- f78eac2: Add notebook indexing guideline to documentation (drbenvincent, Dec 5, 2025)
- 7abd4f5: add cell tags (drbenvincent, Dec 5, 2025)
- 3e09253: add glossary entry + refs (drbenvincent, Dec 5, 2025)
- bed06c5: update README with event study functionality notes (drbenvincent, Dec 5, 2025)
- da9f77c: fix math formatting (drbenvincent, Dec 5, 2025)
- e2c59f4: add reference (drbenvincent, Dec 5, 2025)
- e7e0562: Refactor EventStudy to use patsy formula for FEs (drbenvincent, Dec 5, 2025)
- ff61e55: Clarify EventStudy formula and event-time dummies (drbenvincent, Dec 5, 2025)
- 9e5b790: Add check and warning for staggered adoption in EventStudy (drbenvincent, Dec 6, 2025)
- c4e678e: Add plot customization options to EventStudy (drbenvincent, Dec 6, 2025)
- 0915ebe: plot data with seaborn, not matplotlib (drbenvincent, Dec 6, 2025)
- 9cff7f0: re-render notebook (drbenvincent, Dec 6, 2025)
- 59595d2: run make uml (drbenvincent, Dec 6, 2025)
- e4f82a3: Add event study effect summary and reporting (drbenvincent, Dec 6, 2025)
- b9abbb8: Add more model examples to notebook + add time-varying predictors (drbenvincent, Dec 6, 2025)
- b606970: minor changes + re-run notebook (drbenvincent, Dec 6, 2025)
- deee019: update the homepage (index.md, not README.md) (drbenvincent, Dec 6, 2025)
- af32b60: Add note about dual README files in documentation (drbenvincent, Dec 6, 2025)
- 10690c6: Add tests for EventStudy effect_summary method (drbenvincent, Dec 6, 2025)
- b442e4d: re-run notebook (drbenvincent, Dec 6, 2025)
- 786a5f8: minor notebook edit (drbenvincent, Dec 6, 2025)
- 1d66f25: fix spelling error (drbenvincent, Dec 6, 2025)
- 308ee80: Document Event Study support in effect_summary (drbenvincent, Dec 6, 2025)
- 5bdca9e: Add comprehensive tests for reporting and synthetic data modules (drbenvincent, Dec 6, 2025)
- 15faa7e: Clarify indicator function and dummy variable usage in event study (drbenvincent, Dec 6, 2025)
7 changes: 7 additions & 0 deletions AGENTS.md
@@ -19,8 +19,15 @@

## Documentation

- **Dual README files**: The project has two files that must be kept in sync:
- `README.md` (GitHub landing page)
- `docs/source/index.md` (documentation website homepage)

When adding major new features to the Features table or making other content changes, update **both files**. The Features table lists all quasi-experimental methods supported by CausalPy.
- **Reporting statistics**: When adding new experiment types, update the "Experiment Support" table in `docs/source/knowledgebase/reporting_statistics.md` to document `effect_summary()` support status for the new experiment.
- **Structure**: Notebooks (how-to examples) go in `docs/source/notebooks/`, knowledgebase (educational content) goes in `docs/source/knowledgebase/`
- **Notebook naming**: Use pattern `{method}_{model}.ipynb` (e.g., `did_pymc.ipynb`, `rd_skl.ipynb`), organized by causal method
- **Notebook index**: New notebooks must be added to `docs/source/notebooks/index.md` under the appropriate `toctree` section for them to appear in the rendered documentation
- **MyST directives**: Use `:::{note}` and other MyST features for callouts and formatting
- **Glossary linking**: Link to glossary terms (defined in `glossary.rst`) on first mention in a file:
- In Markdown files (`.md`, `.ipynb`): Use MyST syntax `` {term}`glossary term` ``
1 change: 1 addition & 0 deletions README.md
@@ -79,6 +79,7 @@ CausalPy has a broad range of quasi-experimental methods for causal inference:
| Geographical lift | Measures the impact of an intervention in a specific geographic area by comparing it to similar areas without the intervention. Commonly used in marketing to assess regional campaigns. |
| ANCOVA | Analysis of Covariance combines ANOVA and regression to control for the effects of one or more quantitative covariates. Used when comparing group means while controlling for other variables. |
| Differences in Differences | Compares the changes in outcomes over time between a treatment group and a control group. Used in observational studies to estimate causal effects by accounting for time trends. |
| Event Study | Estimates dynamic treatment effects over event time (time relative to treatment). Extends difference-in-differences by estimating separate effects for each time period, enabling pre-trend validation and analysis of how causal effects evolve. |
| Regression discontinuity | Identifies causal effects by exploiting a cutoff or threshold in an assignment variable. Used when treatment is assigned based on a threshold value of an observed variable, allowing comparison just above and below the cutoff. |
| Regression kink designs | Focuses on changes in the slope (kinks) of the relationship between variables rather than jumps at cutoff points. Used to identify causal effects when treatment intensity changes at a threshold. |
| Interrupted time series | Analyzes the effect of an intervention by comparing time series data before and after the intervention. Used when data is collected over time and an intervention occurs at a known point, allowing assessment of changes in level or trend. |
2 changes: 2 additions & 0 deletions causalpy/__init__.py
@@ -19,6 +19,7 @@

from .data import load_data
from .experiments.diff_in_diff import DifferenceInDifferences
from .experiments.event_study import EventStudy
from .experiments.instrumental_variable import InstrumentalVariable
from .experiments.interrupted_time_series import InterruptedTimeSeries
from .experiments.inverse_propensity_weighting import InversePropensityWeighting
@@ -30,6 +31,7 @@
__all__ = [
"__version__",
"DifferenceInDifferences",
"EventStudy",
"create_causalpy_compatible_class",
"InstrumentalVariable",
"InterruptedTimeSeries",
217 changes: 217 additions & 0 deletions causalpy/data/simulate_data.py
@@ -440,11 +440,228 @@ def generate_multicell_geolift_data() -> pd.DataFrame:
return df


def generate_event_study_data(
n_units: int = 20,
n_time: int = 20,
treatment_time: int = 10,
treated_fraction: float = 0.5,
event_window: tuple[int, int] = (-5, 5),
treatment_effects: dict[int, float] | None = None,
unit_fe_sigma: float = 1.0,
time_fe_sigma: float = 0.5,
noise_sigma: float = 0.2,
predictor_effects: dict[str, float] | None = None,
ar_phi: float = 0.9,
ar_scale: float = 1.0,
seed: int | None = None,
) -> pd.DataFrame:
"""
Generate synthetic panel data for event study / dynamic DiD analysis.

Creates panel data with unit and time fixed effects, where a fraction of units
receive treatment at a common treatment time. Treatment effects can vary by
event time (time relative to treatment). Optionally includes time-varying
predictor variables generated via AR(1) processes.

Parameters
----------
n_units : int
Total number of units (treated + control). Default 20.
n_time : int
Number of time periods. Default 20.
treatment_time : int
Time period when treatment occurs (0-indexed). Default 10.
treated_fraction : float
Fraction of units that are treated. Default 0.5.
event_window : tuple[int, int]
Range of event times (K_min, K_max) for which treatment effects are defined.
Default (-5, 5).
treatment_effects : dict[int, float], optional
Dictionary mapping event time k to treatment effect beta_k.
Default creates effects that are 0 for k < 0 (pre-treatment)
and gradually increase post-treatment.
unit_fe_sigma : float
Standard deviation for unit fixed effects. Default 1.0.
time_fe_sigma : float
Standard deviation for time fixed effects. Default 0.5.
noise_sigma : float
Standard deviation for observation noise. Default 0.2.
predictor_effects : dict[str, float], optional
Dictionary mapping predictor names to their true coefficients.
Each predictor is generated as an AR(1) time series that varies over time
but is the same for all units at a given time. For example,
``{'temperature': 0.3, 'humidity': -0.1}`` creates two predictors.
Default None (no predictors).
ar_phi : float
AR(1) autoregressive coefficient controlling persistence of predictors.
Values closer to 1 produce smoother, more persistent series.
Default 0.9.
ar_scale : float
Standard deviation of the AR(1) innovation noise for predictors.
Default 1.0.
seed : int, optional
Random seed for reproducibility.

Returns
-------
pd.DataFrame
Panel data with columns:
- unit: Unit identifier
- time: Time period
- y: Outcome variable
- treat_time: Treatment time for unit (NaN if never treated)
- treated: Whether unit is in treated group (0 or 1)
- <predictor_name>: One column per predictor (if predictor_effects provided)

Example
--------
>>> from causalpy.data.simulate_data import generate_event_study_data
>>> df = generate_event_study_data(
... n_units=20, n_time=20, treatment_time=10, seed=42
... )
>>> df.shape
(400, 5)
>>> df.columns.tolist()
['unit', 'time', 'y', 'treat_time', 'treated']

With predictors:

>>> df = generate_event_study_data(
... n_units=10,
... n_time=10,
... treatment_time=5,
... seed=42,
... predictor_effects={"temperature": 0.3, "humidity": -0.1},
... )
>>> df.shape
(100, 7)
>>> "temperature" in df.columns and "humidity" in df.columns
True
"""
if seed is not None:
np.random.seed(seed)

# Default treatment effects: zero pre-treatment, gradual increase post-treatment
if treatment_effects is None:
treatment_effects = {}
for k in range(event_window[0], event_window[1] + 1):
if k < 0:
treatment_effects[k] = 0.0 # No anticipation
else:
# Gradual treatment effect that increases post-treatment
treatment_effects[k] = 0.5 + 0.1 * k

# Determine treated units
n_treated = int(n_units * treated_fraction)
treated_units = set(range(n_treated))

# Generate unit fixed effects
unit_fe = np.random.normal(0, unit_fe_sigma, n_units)

# Generate time fixed effects
time_fe = np.random.normal(0, time_fe_sigma, n_time)

# Generate predictor time series (if any)
# Each predictor is an AR(1) series that varies over time but is the same
# for all units at a given time
predictors: dict[str, np.ndarray] = {}
if predictor_effects is not None:
for predictor_name in predictor_effects:
predictors[predictor_name] = generate_ar1_series(
n=n_time, phi=ar_phi, scale=ar_scale
)

# Build panel data
data = []
for unit in range(n_units):
is_treated = unit in treated_units
unit_treat_time = treatment_time if is_treated else np.nan

for t in range(n_time):
# Base outcome: unit FE + time FE + noise
y = unit_fe[unit] + time_fe[t] + np.random.normal(0, noise_sigma)

# Add predictor contributions to outcome
if predictor_effects is not None:
for predictor_name, coef in predictor_effects.items():
y += coef * predictors[predictor_name][t]

# Add treatment effect for treated units in event window
if is_treated:
event_time = t - treatment_time
if (
event_window[0] <= event_time <= event_window[1]
and event_time in treatment_effects
):
y += treatment_effects[event_time]

row = {
"unit": unit,
"time": t,
"y": y,
"treat_time": unit_treat_time,
"treated": 1 if is_treated else 0,
}
# Add predictor values to the row
for predictor_name, series in predictors.items():
row[predictor_name] = series[t]

data.append(row)

return pd.DataFrame(data)
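
As a reading aid (an editorial addition, not part of the diff), the data-generating process implemented by `generate_event_study_data` can be written compactly. The notation is generic: $D_i$ is the treated indicator, $T^{*}$ the common treatment time, $x_{jt}$ the AR(1) predictors with coefficients $\gamma_j$, and $\beta_k$ the event-time effects (zero for $k < 0$ under the defaults).

```latex
% Outcome for unit i at time t, mirroring the loop above:
% unit FE + time FE + predictor terms + event-time treatment effect + noise
y_{it} = \alpha_i + \lambda_t
       + \sum_{j} \gamma_j \, x_{jt}
       + \sum_{k=K_{\min}}^{K_{\max}} \beta_k \, D_i \, \mathbf{1}[\, t - T^{*} = k \,]
       + \varepsilon_{it},
\quad
\alpha_i \sim \mathcal{N}(0, \sigma_{\text{unit}}^2), \;
\lambda_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2), \;
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)
```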


# -----------------
# UTILITY FUNCTIONS
# -----------------


def generate_ar1_series(
n: int,
phi: float = 0.9,
scale: float = 1.0,
initial: float = 0.0,
) -> np.ndarray:
"""
Generate an AR(1) autoregressive time series.

The AR(1) process is defined as:
x_{t+1} = phi * x_t + eta_t, where eta_t ~ N(0, scale^2)

Parameters
----------
n : int
Length of the time series to generate.
phi : float
Autoregressive coefficient controlling persistence. Values closer to 1
produce smoother, more persistent series. Must be in (-1, 1) for
stationarity. Default 0.9.
scale : float
Standard deviation of the innovation noise. Default 1.0.
initial : float
Initial value of the series. Default 0.0.

Returns
-------
np.ndarray
Array of length n containing the AR(1) time series.

Example
-------
>>> from causalpy.data.simulate_data import generate_ar1_series
>>> np.random.seed(42)
>>> series = generate_ar1_series(n=10, phi=0.9, scale=0.5)
>>> len(series)
10
"""
series = np.zeros(n)
series[0] = initial
innovations = np.random.normal(0, scale, n - 1)
for t in range(1, n):
series[t] = phi * series[t - 1] + innovations[t - 1]
return series
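
A brief usage sketch (editorial addition, not part of the diff) exercising the two new simulation helpers. Only arguments documented in the docstrings above are used; the particular values are illustrative.

```python
# Simulate an event-study panel with two AR(1) predictors and check that
# treated units drift above controls after the treatment time.
import numpy as np

from causalpy.data.simulate_data import (
    generate_ar1_series,
    generate_event_study_data,
)

df = generate_event_study_data(
    n_units=40,
    n_time=20,
    treatment_time=10,
    seed=123,
    predictor_effects={"temperature": 0.3, "humidity": -0.1},
)

# Post-treatment group means: the treated mean should exceed the control mean
post = df[df["time"] >= 10]
print(post.groupby("treated")["y"].mean())

# Standalone AR(1) series: the lag-1 autocorrelation should be close to phi
np.random.seed(0)
x = generate_ar1_series(n=500, phi=0.9, scale=0.5)
print(np.corrcoef(x[:-1], x[1:])[0, 1])
```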


def generate_seasonality(
n: int = 12, amplitude: int = 1, length_scale: float = 0.5
) -> np.ndarray:
2 changes: 2 additions & 0 deletions causalpy/experiments/__init__.py
@@ -14,6 +14,7 @@
"""CausalPy experiment module"""

from .diff_in_diff import DifferenceInDifferences
from .event_study import EventStudy
from .instrumental_variable import InstrumentalVariable
from .interrupted_time_series import InterruptedTimeSeries
from .inverse_propensity_weighting import InversePropensityWeighting
@@ -24,6 +25,7 @@

__all__ = [
"DifferenceInDifferences",
"EventStudy",
"InstrumentalVariable",
"InversePropensityWeighting",
"PrePostNEGD",
27 changes: 21 additions & 6 deletions causalpy/experiments/base.py
@@ -32,6 +32,7 @@
_compute_statistics_ols,
_detect_experiment_type,
_effect_summary_did,
_effect_summary_event_study,
_effect_summary_rd,
_effect_summary_rkink,
_extract_counterfactual,
@@ -148,18 +149,20 @@ def effect_summary(
relative: bool = True,
min_effect: float | None = None,
treated_unit: str | None = None,
include_pretrend_check: bool = True,
) -> EffectSummary:
"""
Generate a decision-ready summary of causal effects.

Supports Interrupted Time Series (ITS), Synthetic Control, Difference-in-Differences (DiD),
and Regression Discontinuity (RD) experiments. Works with both PyMC (Bayesian) and OLS models.
Automatically detects experiment type and model type, generating appropriate summary.
Regression Discontinuity (RD), and Event Study experiments. Works with both PyMC (Bayesian)
and OLS models. Automatically detects experiment type and model type, generating
appropriate summary.

Parameters
----------
window : str, tuple, or slice, default="post"
Time window for analysis (ITS/SC only, ignored for DiD/RD):
Time window for analysis (ITS/SC only, ignored for DiD/RD/EventStudy):
- "post": All post-treatment time points (default)
- (start, end): Tuple of start and end times (handles both datetime and integer indices)
- slice: Python slice object for integer indices
@@ -171,16 +174,19 @@
alpha : float, default=0.05
Significance level for HDI/CI intervals (1-alpha confidence level)
cumulative : bool, default=True
Whether to include cumulative effect statistics (ITS/SC only, ignored for DiD/RD)
Whether to include cumulative effect statistics (ITS/SC only, ignored for DiD/RD/EventStudy)
relative : bool, default=True
Whether to include relative effect statistics (% change vs counterfactual)
(ITS/SC only, ignored for DiD/RD)
(ITS/SC only, ignored for DiD/RD/EventStudy)
min_effect : float, optional
Region of Practical Equivalence (ROPE) threshold (PyMC only, ignored for OLS).
If provided, reports P(|effect| > min_effect) for two-sided or P(effect > min_effect) for one-sided.
treated_unit : str, optional
For multi-unit experiments (Synthetic Control), specify which treated unit
to analyze. If None and multiple units exist, uses first unit.
include_pretrend_check : bool, default=True
Whether to include parallel trends analysis in prose summary (Event Study only).
When True, checks if pre-treatment coefficient HDIs include zero.

Returns
-------
@@ -193,7 +199,16 @@
# Check if PyMC or OLS model
is_pymc = isinstance(self.model, PyMCModel)

if experiment_type == "rd":
if experiment_type == "event_study":
# Event Study: time-varying effects over event time
return _effect_summary_event_study(
self,
direction=direction,
alpha=alpha,
min_effect=min_effect,
include_pretrend_check=include_pretrend_check,
)
elif experiment_type == "rd":
# Regression Discontinuity: scalar effect, no time dimension
return _effect_summary_rd(
self,
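
Finally, a hedged sketch of the reporting flow this PR enables. Only the `effect_summary()` keywords documented in the docstring above come from the diff; the `EventStudy` constructor is not visible in this excerpt, so that step is left as a commented placeholder rather than guessed.

```python
# Hypothetical reporting flow for an Event Study experiment. The EventStudy
# constructor signature is NOT shown in this diff, so it appears only as a
# placeholder; the effect_summary() keywords match the docstring above.
import causalpy as cp
from causalpy.data.simulate_data import generate_event_study_data

df = generate_event_study_data(n_units=20, n_time=20, treatment_time=10, seed=42)

# result = cp.EventStudy(...)  # constructor arguments not shown in this excerpt

# summary = result.effect_summary(
#     alpha=0.05,                   # 95% HDI/CI
#     min_effect=0.1,               # ROPE threshold (PyMC models only)
#     include_pretrend_check=True,  # pre-treatment coefficient check (Event Study only)
# )
# print(summary)
```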