Changes from all commits (27 commits)
- 34c93da: initial commit (drbenvincent, Dec 5, 2025)
- 8d624ba: add new notebook to index so it renders in the docs (drbenvincent, Dec 5, 2025)
- f78eac2: Add notebook indexing guideline to documentation (drbenvincent, Dec 5, 2025)
- 7abd4f5: add cell tags (drbenvincent, Dec 5, 2025)
- 3e09253: add glossary entry + refs (drbenvincent, Dec 5, 2025)
- bed06c5: update README with event study functionality notes (drbenvincent, Dec 5, 2025)
- da9f77c: fix math formatting (drbenvincent, Dec 5, 2025)
- e2c59f4: add reference (drbenvincent, Dec 5, 2025)
- e7e0562: Refactor EventStudy to use patsy formula for FEs (drbenvincent, Dec 5, 2025)
- ff61e55: Clarify EventStudy formula and event-time dummies (drbenvincent, Dec 5, 2025)
- 9e5b790: Add check and warning for staggered adoption in EventStudy (drbenvincent, Dec 6, 2025)
- c4e678e: Add plot customization options to EventStudy (drbenvincent, Dec 6, 2025)
- 0915ebe: plot data with seaborn, not matplotlib (drbenvincent, Dec 6, 2025)
- 9cff7f0: re-render notebook (drbenvincent, Dec 6, 2025)
- 59595d2: run make uml (drbenvincent, Dec 6, 2025)
- e4f82a3: Add event study effect summary and reporting (drbenvincent, Dec 6, 2025)
- b9abbb8: Add more model examples to notebook + add time-varying predictors (drbenvincent, Dec 6, 2025)
- b606970: minor changes + re-run notebook (drbenvincent, Dec 6, 2025)
- deee019: update the homepage (index.md, not README.md) (drbenvincent, Dec 6, 2025)
- af32b60: Add note about dual README files in documentation (drbenvincent, Dec 6, 2025)
- 10690c6: Add tests for EventStudy effect_summary method (drbenvincent, Dec 6, 2025)
- b442e4d: re-run notebook (drbenvincent, Dec 6, 2025)
- 786a5f8: minor notebook edit (drbenvincent, Dec 6, 2025)
- 1d66f25: fix spelling error (drbenvincent, Dec 6, 2025)
- 308ee80: Document Event Study support in effect_summary (drbenvincent, Dec 6, 2025)
- 5bdca9e: Add comprehensive tests for reporting and synthetic data modules (drbenvincent, Dec 6, 2025)
- 15faa7e: Clarify indicator function and dummy variable usage in event study (drbenvincent, Dec 6, 2025)
7 changes: 7 additions & 0 deletions AGENTS.md
@@ -19,8 +19,15 @@

## Documentation

- **Dual README files**: The project has two files that must be kept in sync:
- `README.md` (GitHub landing page)
- `docs/source/index.md` (documentation website homepage)

When adding major new features to the Features table or making other content changes, update **both files**. The Features table lists all quasi-experimental methods supported by CausalPy.
- **Reporting statistics**: When adding new experiment types, update the "Experiment Support" table in `docs/source/knowledgebase/reporting_statistics.md` to document `effect_summary()` support status for the new experiment.
- **Structure**: Notebooks (how-to examples) go in `docs/source/notebooks/`, knowledgebase (educational content) goes in `docs/source/knowledgebase/`
- **Notebook naming**: Use pattern `{method}_{model}.ipynb` (e.g., `did_pymc.ipynb`, `rd_skl.ipynb`), organized by causal method
- **Notebook index**: New notebooks must be added to `docs/source/notebooks/index.md` under the appropriate `toctree` section for them to appear in the rendered documentation
- **MyST directives**: Use `:::{note}` and other MyST features for callouts and formatting
- **Glossary linking**: Link to glossary terms (defined in `glossary.rst`) on first mention in a file:
- In Markdown files (`.md`, `.ipynb`): Use MyST syntax `` {term}`glossary term` ``
1 change: 1 addition & 0 deletions README.md
@@ -79,6 +79,7 @@ CausalPy has a broad range of quasi-experimental methods for causal inference:
| Geographical lift | Measures the impact of an intervention in a specific geographic area by comparing it to similar areas without the intervention. Commonly used in marketing to assess regional campaigns. |
| ANCOVA | Analysis of Covariance combines ANOVA and regression to control for the effects of one or more quantitative covariates. Used when comparing group means while controlling for other variables. |
| Differences in Differences | Compares the changes in outcomes over time between a treatment group and a control group. Used in observational studies to estimate causal effects by accounting for time trends. |
| Event Study | Estimates dynamic treatment effects over event time (time relative to treatment). Extends difference-in-differences by estimating separate effects for each time period, enabling pre-trend validation and analysis of how causal effects evolve. |
| Regression discontinuity | Identifies causal effects by exploiting a cutoff or threshold in an assignment variable. Used when treatment is assigned based on a threshold value of an observed variable, allowing comparison just above and below the cutoff. |
| Regression kink designs | Focuses on changes in the slope (kinks) of the relationship between variables rather than jumps at cutoff points. Used to identify causal effects when treatment intensity changes at a threshold. |
| Interrupted time series | Analyzes the effect of an intervention by comparing time series data before and after the intervention. Used when data is collected over time and an intervention occurs at a known point, allowing assessment of changes in level or trend. |
2 changes: 2 additions & 0 deletions causalpy/__init__.py
@@ -19,6 +19,7 @@

from .data import load_data
from .experiments.diff_in_diff import DifferenceInDifferences
from .experiments.event_study import EventStudy
from .experiments.instrumental_variable import InstrumentalVariable
from .experiments.interrupted_time_series import InterruptedTimeSeries
from .experiments.inverse_propensity_weighting import InversePropensityWeighting
@@ -30,6 +31,7 @@
__all__ = [
"__version__",
"DifferenceInDifferences",
"EventStudy",
"create_causalpy_compatible_class",
"InstrumentalVariable",
"InterruptedTimeSeries",
217 changes: 217 additions & 0 deletions causalpy/data/simulate_data.py
@@ -440,11 +440,228 @@ def generate_multicell_geolift_data() -> pd.DataFrame:
return df


def generate_event_study_data(
n_units: int = 20,
n_time: int = 20,
treatment_time: int = 10,
treated_fraction: float = 0.5,
event_window: tuple[int, int] = (-5, 5),
treatment_effects: dict[int, float] | None = None,
unit_fe_sigma: float = 1.0,
time_fe_sigma: float = 0.5,
noise_sigma: float = 0.2,
predictor_effects: dict[str, float] | None = None,
ar_phi: float = 0.9,
ar_scale: float = 1.0,
seed: int | None = None,
) -> pd.DataFrame:
"""
Generate synthetic panel data for event study / dynamic DiD analysis.

Creates panel data with unit and time fixed effects, where a fraction of units
receive treatment at a common treatment time. Treatment effects can vary by
event time (time relative to treatment). Optionally includes time-varying
predictor variables generated via AR(1) processes.

Parameters
----------
n_units : int
Total number of units (treated + control). Default 20.
n_time : int
Number of time periods. Default 20.
treatment_time : int
Time period when treatment occurs (0-indexed). Default 10.
treated_fraction : float
Fraction of units that are treated. Default 0.5.
event_window : tuple[int, int]
Range of event times (K_min, K_max) for which treatment effects are defined.
Default (-5, 5).
treatment_effects : dict[int, float], optional
Dictionary mapping event time k to treatment effect beta_k.
Default creates effects that are 0 for k < 0 (pre-treatment)
and gradually increase post-treatment.
unit_fe_sigma : float
Standard deviation for unit fixed effects. Default 1.0.
time_fe_sigma : float
Standard deviation for time fixed effects. Default 0.5.
noise_sigma : float
Standard deviation for observation noise. Default 0.2.
predictor_effects : dict[str, float], optional
Dictionary mapping predictor names to their true coefficients.
Each predictor is generated as an AR(1) time series that varies over time
but is the same for all units at a given time. For example,
``{'temperature': 0.3, 'humidity': -0.1}`` creates two predictors.
Default None (no predictors).
ar_phi : float
AR(1) autoregressive coefficient controlling persistence of predictors.
Values closer to 1 produce smoother, more persistent series.
Default 0.9.
ar_scale : float
Standard deviation of the AR(1) innovation noise for predictors.
Default 1.0.
seed : int, optional
Random seed for reproducibility.

Returns
-------
pd.DataFrame
Panel data with columns:
- unit: Unit identifier
- time: Time period
- y: Outcome variable
- treat_time: Treatment time for unit (NaN if never treated)
- treated: Whether unit is in treated group (0 or 1)
- <predictor_name>: One column per predictor (if predictor_effects provided)

Example
--------
>>> from causalpy.data.simulate_data import generate_event_study_data
>>> df = generate_event_study_data(
... n_units=20, n_time=20, treatment_time=10, seed=42
... )
>>> df.shape
(400, 5)
>>> df.columns.tolist()
['unit', 'time', 'y', 'treat_time', 'treated']

With predictors:

>>> df = generate_event_study_data(
... n_units=10,
... n_time=10,
... treatment_time=5,
... seed=42,
... predictor_effects={"temperature": 0.3, "humidity": -0.1},
... )
>>> df.shape
(100, 7)
>>> "temperature" in df.columns and "humidity" in df.columns
True
"""
if seed is not None:
np.random.seed(seed)

# Default treatment effects: zero pre-treatment, gradual increase post-treatment
if treatment_effects is None:
treatment_effects = {}
for k in range(event_window[0], event_window[1] + 1):
if k < 0:
treatment_effects[k] = 0.0 # No anticipation
else:
# Gradual treatment effect that increases post-treatment
treatment_effects[k] = 0.5 + 0.1 * k

# Determine treated units
n_treated = int(n_units * treated_fraction)
treated_units = set(range(n_treated))

# Generate unit fixed effects
unit_fe = np.random.normal(0, unit_fe_sigma, n_units)

# Generate time fixed effects
time_fe = np.random.normal(0, time_fe_sigma, n_time)

# Generate predictor time series (if any)
# Each predictor is an AR(1) series that varies over time but is the same
# for all units at a given time
predictors: dict[str, np.ndarray] = {}
if predictor_effects is not None:
for predictor_name in predictor_effects:
predictors[predictor_name] = generate_ar1_series(
n=n_time, phi=ar_phi, scale=ar_scale
)

# Build panel data
data = []
for unit in range(n_units):
is_treated = unit in treated_units
unit_treat_time = treatment_time if is_treated else np.nan

for t in range(n_time):
# Base outcome: unit FE + time FE + noise
y = unit_fe[unit] + time_fe[t] + np.random.normal(0, noise_sigma)

# Add predictor contributions to outcome
if predictor_effects is not None:
for predictor_name, coef in predictor_effects.items():
y += coef * predictors[predictor_name][t]

# Add treatment effect for treated units in event window
if is_treated:
event_time = t - treatment_time
if (
event_window[0] <= event_time <= event_window[1]
and event_time in treatment_effects
):
y += treatment_effects[event_time]

row = {
"unit": unit,
"time": t,
"y": y,
"treat_time": unit_treat_time,
"treated": 1 if is_treated else 0,
}
# Add predictor values to the row
for predictor_name, series in predictors.items():
row[predictor_name] = series[t]

data.append(row)

return pd.DataFrame(data)
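
As a reading aid (an editorial addition, not part of the diff), the data-generating process implemented by `generate_event_study_data` can be written compactly. The notation is generic: $D_i$ is the treated indicator, $T^{*}$ the common treatment time, $x_{jt}$ the AR(1) predictors with coefficients $\gamma_j$, and $\beta_k$ the event-time effects (zero for $k < 0$ under the defaults).

```latex
% Outcome for unit i at time t, mirroring the loop above:
% unit FE + time FE + predictor terms + event-time treatment effect + noise
y_{it} = \alpha_i + \lambda_t
       + \sum_{j} \gamma_j \, x_{jt}
       + \sum_{k=K_{\min}}^{K_{\max}} \beta_k \, D_i \, \mathbf{1}[\, t - T^{*} = k \,]
       + \varepsilon_{it},
\quad
\alpha_i \sim \mathcal{N}(0, \sigma_{\text{unit}}^2), \;
\lambda_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2), \;
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)
```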


# -----------------
# UTILITY FUNCTIONS
# -----------------


def generate_ar1_series(
n: int,
phi: float = 0.9,
scale: float = 1.0,
initial: float = 0.0,
) -> np.ndarray:
"""
Generate an AR(1) autoregressive time series.

The AR(1) process is defined as:
x_{t+1} = phi * x_t + eta_t, where eta_t ~ N(0, scale^2)

Parameters
----------
n : int
Length of the time series to generate.
phi : float
Autoregressive coefficient controlling persistence. Values closer to 1
produce smoother, more persistent series. Must be in (-1, 1) for
stationarity. Default 0.9.
scale : float
Standard deviation of the innovation noise. Default 1.0.
initial : float
Initial value of the series. Default 0.0.

Returns
-------
np.ndarray
Array of length n containing the AR(1) time series.

Example
-------
>>> from causalpy.data.simulate_data import generate_ar1_series
>>> np.random.seed(42)
>>> series = generate_ar1_series(n=10, phi=0.9, scale=0.5)
>>> len(series)
10
"""
series = np.zeros(n)
series[0] = initial
innovations = np.random.normal(0, scale, n - 1)
for t in range(1, n):
series[t] = phi * series[t - 1] + innovations[t - 1]
return series
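
A brief usage sketch (editorial addition, not part of the diff) exercising the two new simulation helpers. Only arguments documented in the docstrings above are used; the particular values are illustrative.

```python
# Simulate an event-study panel with two AR(1) predictors and check that
# treated units drift above controls after the treatment time.
import numpy as np

from causalpy.data.simulate_data import (
    generate_ar1_series,
    generate_event_study_data,
)

df = generate_event_study_data(
    n_units=40,
    n_time=20,
    treatment_time=10,
    seed=123,
    predictor_effects={"temperature": 0.3, "humidity": -0.1},
)

# Post-treatment group means: the treated mean should exceed the control mean
post = df[df["time"] >= 10]
print(post.groupby("treated")["y"].mean())

# Standalone AR(1) series: the lag-1 autocorrelation should be close to phi
np.random.seed(0)
x = generate_ar1_series(n=500, phi=0.9, scale=0.5)
print(np.corrcoef(x[:-1], x[1:])[0, 1])
```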


def generate_seasonality(
n: int = 12, amplitude: int = 1, length_scale: float = 0.5
) -> np.ndarray:
2 changes: 2 additions & 0 deletions causalpy/experiments/__init__.py
@@ -14,6 +14,7 @@
"""CausalPy experiment module"""

from .diff_in_diff import DifferenceInDifferences
from .event_study import EventStudy
from .instrumental_variable import InstrumentalVariable
from .interrupted_time_series import InterruptedTimeSeries
from .inverse_propensity_weighting import InversePropensityWeighting
@@ -24,6 +25,7 @@

__all__ = [
"DifferenceInDifferences",
"EventStudy",
"InstrumentalVariable",
"InversePropensityWeighting",
"PrePostNEGD",
27 changes: 21 additions & 6 deletions causalpy/experiments/base.py
@@ -32,6 +32,7 @@
_compute_statistics_ols,
_detect_experiment_type,
_effect_summary_did,
_effect_summary_event_study,
_effect_summary_rd,
_effect_summary_rkink,
_extract_counterfactual,
@@ -148,18 +149,20 @@ def effect_summary(
relative: bool = True,
min_effect: float | None = None,
treated_unit: str | None = None,
include_pretrend_check: bool = True,
) -> EffectSummary:
"""
Generate a decision-ready summary of causal effects.

Supports Interrupted Time Series (ITS), Synthetic Control, Difference-in-Differences (DiD),
and Regression Discontinuity (RD) experiments. Works with both PyMC (Bayesian) and OLS models.
Automatically detects experiment type and model type, generating appropriate summary.
Regression Discontinuity (RD), and Event Study experiments. Works with both PyMC (Bayesian)
and OLS models. Automatically detects experiment type and model type, generating
appropriate summary.

Parameters
----------
window : str, tuple, or slice, default="post"
Time window for analysis (ITS/SC only, ignored for DiD/RD):
Time window for analysis (ITS/SC only, ignored for DiD/RD/EventStudy):
- "post": All post-treatment time points (default)
- (start, end): Tuple of start and end times (handles both datetime and integer indices)
- slice: Python slice object for integer indices
@@ -171,16 +174,19 @@
alpha : float, default=0.05
Significance level for HDI/CI intervals (1-alpha confidence level)
cumulative : bool, default=True
Whether to include cumulative effect statistics (ITS/SC only, ignored for DiD/RD)
Whether to include cumulative effect statistics (ITS/SC only, ignored for DiD/RD/EventStudy)
relative : bool, default=True
Whether to include relative effect statistics (% change vs counterfactual)
(ITS/SC only, ignored for DiD/RD)
(ITS/SC only, ignored for DiD/RD/EventStudy)
min_effect : float, optional
Region of Practical Equivalence (ROPE) threshold (PyMC only, ignored for OLS).
If provided, reports P(|effect| > min_effect) for two-sided or P(effect > min_effect) for one-sided.
treated_unit : str, optional
For multi-unit experiments (Synthetic Control), specify which treated unit
to analyze. If None and multiple units exist, uses first unit.
include_pretrend_check : bool, default=True
Whether to include parallel trends analysis in prose summary (Event Study only).
When True, checks if pre-treatment coefficient HDIs include zero.

Returns
-------
@@ -193,7 +199,16 @@
# Check if PyMC or OLS model
is_pymc = isinstance(self.model, PyMCModel)

if experiment_type == "rd":
if experiment_type == "event_study":
# Event Study: time-varying effects over event time
return _effect_summary_event_study(
self,
direction=direction,
alpha=alpha,
min_effect=min_effect,
include_pretrend_check=include_pretrend_check,
)
elif experiment_type == "rd":
# Regression Discontinuity: scalar effect, no time dimension
return _effect_summary_rd(
self,
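
Finally, a hedged sketch of the reporting flow this PR enables. Only the `effect_summary()` keywords documented in the docstring above come from the diff; the `EventStudy` constructor is not visible in this excerpt, so that step is left as a commented placeholder rather than guessed.

```python
# Hypothetical reporting flow for an Event Study experiment. The EventStudy
# constructor signature is NOT shown in this diff, so it appears only as a
# placeholder; the effect_summary() keywords match the docstring above.
import causalpy as cp
from causalpy.data.simulate_data import generate_event_study_data

df = generate_event_study_data(n_units=20, n_time=20, treatment_time=10, seed=42)

# result = cp.EventStudy(...)  # constructor arguments not shown in this excerpt

# summary = result.effect_summary(
#     alpha=0.05,                   # 95% HDI/CI
#     min_effect=0.1,               # ROPE threshold (PyMC models only)
#     include_pretrend_check=True,  # pre-treatment coefficient check (Event Study only)
# )
# print(summary)
```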