StaggeredDifferenceInDifferences

class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences

A class to analyse data from staggered adoption Difference-in-Differences settings.

This class implements the Borusyak, Jaravel, and Spiess (BJS, 2024) imputation estimator for staggered adoption settings. It fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
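The three steps above (fit on untreated observations, predict counterfactuals, take differences) can be illustrated with a small pure-Python sketch. This is not CausalPy's implementation: the function names `make_demo_panel` and `impute_effects` are hypothetical, the data are noise-free toy values, and the unit and time fixed effects are fitted by simple alternating averaging rather than by the `model` passed to the class.

```python
import math
from collections import defaultdict


def make_demo_panel():
    """Toy panel: y = unit effect + 0.5 * time + 2.0 * treated (no noise)."""
    unit_effect = {"A": 1.0, "B": 2.0, "C": 3.0}
    treatment_time = {"A": 2, "B": 3, "C": math.inf}  # C is never treated
    rows = []
    for unit, a in unit_effect.items():
        for t in range(5):
            treated = int(t >= treatment_time[unit])
            y = a + 0.5 * t + 2.0 * treated
            rows.append({"unit": unit, "time": t, "y": y, "treated": treated})
    return rows


def impute_effects(rows, n_iter=500):
    """Fit unit + time effects on untreated rows only, then impute y(0)."""
    untreated = [r for r in rows if r["treated"] == 0]
    alpha = defaultdict(float)  # unit fixed effects
    beta = defaultdict(float)   # time fixed effects
    for _ in range(n_iter):
        # Update unit effects holding time effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for r in untreated:
            sums[r["unit"]] += r["y"] - beta[r["time"]]
            counts[r["unit"]] += 1
        for u in sums:
            alpha[u] = sums[u] / counts[u]
        # Update time effects holding unit effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for r in untreated:
            sums[r["time"]] += r["y"] - alpha[r["unit"]]
            counts[r["time"]] += 1
        for t in sums:
            beta[t] = sums[t] / counts[t]
    out = []
    for r in rows:
        y_hat0 = alpha[r["unit"]] + beta[r["time"]]  # imputed counterfactual
        tau_hat = r["y"] - y_hat0 if r["treated"] == 1 else None
        out.append({**r, "y_hat0": y_hat0, "tau_hat": tau_hat})
    return out
```

Because the toy data contain no noise and the never-treated unit is observed in every period, the imputed effect for each treated observation recovers the constant effect built into `make_demo_panel`.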

Assumptions

This estimator requires the following identifying assumptions:

  1. Absorbing treatment: Once a unit receives treatment, it must remain treated in all subsequent periods. Treatment cannot be reversed or temporarily suspended. This is validated at runtime.

  2. Parallel trends: In the absence of treatment, treated and control units would have followed parallel outcome trajectories.

  3. No anticipation: Units do not change their behavior in anticipation of future treatment.
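Of these assumptions, only absorbing treatment can be checked mechanically from the data. The sketch below shows one way such a check could look. The helper `is_absorbing` is hypothetical and is not the validation routine CausalPy runs; it assumes rows are dicts with `unit`, `time`, and a 0/1 `treated` field.

```python
def is_absorbing(rows):
    """Return True if no unit ever switches from treated (1) back to untreated (0)."""
    by_unit = {}
    # Order observations by time so each unit's treatment path is chronological.
    for r in sorted(rows, key=lambda r: r["time"]):
        by_unit.setdefault(r["unit"], []).append(r["treated"])
    # Absorbing treatment means each unit's 0/1 sequence never decreases.
    return all(
        earlier <= later
        for seq in by_unit.values()
        for earlier, later in zip(seq, seq[1:])
    )


def demo_paths():
    """Two toy treatment paths: one absorbing, one with a reversal at time 3."""
    good = [{"unit": "A", "time": t, "treated": int(t >= 2)} for t in range(4)]
    bad = good[:3] + [{"unit": "A", "time": 3, "treated": 0}]
    return good, bad
```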

param data:

A pandas dataframe with panel data (unit x time observations).

type data:

pd.DataFrame

param formula:

A statistical model formula. Recommended: "y ~ 1 + C(unit) + C(time)" for unit and time fixed effects.

type formula:

str

param unit_variable_name:

Name of the column identifying units.

type unit_variable_name:

str

param time_variable_name:

Name of the column identifying time periods.

type time_variable_name:

str

param treated_variable_name:

Name of the column indicating treatment status (0/1). Defaults to "treated".

type treated_variable_name:

str, optional

param treatment_time_variable_name:

Name of the column containing unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.
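As a rough illustration of what inferring G_i from a 0/1 treatment column can mean, the sketch below takes each unit's first treated period as its treatment time and assigns a sentinel to units that are never treated. The helper `infer_treatment_time` is hypothetical; CausalPy's actual inference logic may differ in its details.

```python
import math


def infer_treatment_time(rows, never_treated_value=math.inf):
    """Map each unit to its first treated period, or a sentinel if never treated."""
    first_treated = {}
    units = set()
    for r in rows:
        units.add(r["unit"])
        if r["treated"] == 1:
            # Keep the earliest period in which the unit is observed as treated.
            previous = first_treated.get(r["unit"], r["time"])
            first_treated[r["unit"]] = min(previous, r["time"])
    return {u: first_treated.get(u, never_treated_value) for u in units}


def demo_rows():
    """Unit A is treated from period 2 onward; unit C is never treated."""
    rows = [{"unit": "A", "time": t, "treated": int(t >= 2)} for t in range(4)]
    rows += [{"unit": "C", "time": t, "treated": 0} for t in range(4)]
    return rows
```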

type treatment_time_variable_name:

str, optional

param never_treated_value:

Value that marks never-treated units in the treatment-time column. Defaults to np.inf.

type never_treated_value:

Any, optional

param model:

A model for the untreated outcome. Defaults to LinearRegression.

type model:

PyMCModel or RegressorMixin, optional

param event_window:

Tuple (min_event_time, max_event_time) to restrict event-time aggregation. If None, uses all available event-times.

type event_window:

tuple[int, int], optional

param reference_event_time:

Event-time index associated with plots (reserved for future use). Defaults to -1.

type reference_event_time:

int, optional

data_

Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.

Type:

pd.DataFrame

att_group_time_

Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t.

Type:

pd.DataFrame

att_event_time_

Event-time ATT estimates: ATT(e) for each event-time e = t - G.

Type:

pd.DataFrame
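The two aggregations can be sketched as simple group means over the per-observation effects. The function `aggregate_att` below is hypothetical and works on plain dicts rather than the DataFrames the class returns. It assumes that only treated observations carry a `tau_hat`, and that the bounds of `event_window` are inclusive; neither assumption is confirmed by this page.

```python
from collections import defaultdict


def aggregate_att(rows, event_window=None):
    """Average tau_hat by (cohort, calendar time) and by event-time e = t - G."""
    by_group_time = defaultdict(list)
    by_event_time = defaultdict(list)
    for r in rows:
        if r["tau_hat"] is None:  # skip untreated observations
            continue
        e = r["time"] - r["G"]
        by_group_time[(r["G"], r["time"])].append(r["tau_hat"])
        # Assumed inclusive window on event-time; None keeps every event-time.
        if event_window is None or event_window[0] <= e <= event_window[1]:
            by_event_time[e].append(r["tau_hat"])

    def mean(values):
        return sum(values) / len(values)

    att_gt = {k: mean(v) for k, v in sorted(by_group_time.items())}
    att_e = {k: mean(v) for k, v in sorted(by_event_time.items())}
    return att_gt, att_e


def demo_effects():
    """Toy per-observation effects for cohorts G=2 (units A, C) and G=3 (unit B)."""
    return [
        {"unit": "A", "G": 2, "time": 2, "tau_hat": 1.0},
        {"unit": "A", "G": 2, "time": 3, "tau_hat": 2.0},
        {"unit": "C", "G": 2, "time": 2, "tau_hat": 2.0},
        {"unit": "C", "G": 2, "time": 3, "tau_hat": 3.0},
        {"unit": "B", "G": 3, "time": 3, "tau_hat": 3.0},
        {"unit": "B", "G": 3, "time": 4, "tau_hat": 4.0},
    ]
```

Event-time averaging pools cohorts that are at the same distance from their own treatment date, which is why ATT(e) can differ from any single cohort's ATT(g, t).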

Notes

Panel Balance: This implementation supports both balanced and unbalanced panel data. While balanced panels (where each unit is observed in every time period) are common in staggered DiD applications, the imputation-based approach of Borusyak et al. (2024) can accommodate unbalanced panels. The key requirement is that treatment timing is well-defined for each unit, not that all units are observed in all periods. Unit and observation counts in the summary output are computed without assuming balanced panels.
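To make the distinction concrete, the sketch below counts units, periods, and observations directly from the rows instead of multiplying units by periods. The helper `panel_summary` is hypothetical and is not the code behind the class's summary output.

```python
def panel_summary(rows):
    """Count units, periods, and observed (unit, time) cells without assuming balance."""
    units = {r["unit"] for r in rows}
    times = {r["time"] for r in rows}
    observed = {(r["unit"], r["time"]) for r in rows}
    return {
        "n_units": len(units),
        "n_periods": len(times),
        "n_obs": len(observed),
        # A panel is balanced when every unit appears in every period.
        "balanced": len(observed) == len(units) * len(times),
    }


def demo_panels():
    """A balanced 2 x 3 panel and the same panel with one observation dropped."""
    balanced = [{"unit": u, "time": t} for u in ("A", "B") for t in range(3)]
    unbalanced = balanced[:-1]
    return balanced, unbalanced
```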

Example

>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.

Methods

StaggeredDifferenceInDifferences.__init__(...)

StaggeredDifferenceInDifferences.algorithm()

Run the experiment algorithm: fit model, predict counterfactuals, and aggregate effects.

StaggeredDifferenceInDifferences.effect_summary(*)

Generate a decision-ready summary of causal effects for Staggered Difference-in-Differences.

StaggeredDifferenceInDifferences.fit(*args, ...)

StaggeredDifferenceInDifferences.get_plot_data(...)

Recover the data of an experiment along with the prediction and causal impact information.

StaggeredDifferenceInDifferences.get_plot_data_bayesian([...])

Get plotting data for Bayesian model.

StaggeredDifferenceInDifferences.get_plot_data_ols()

Get plotting data for OLS model.

StaggeredDifferenceInDifferences.input_validation()

Validate the input data and parameters.

StaggeredDifferenceInDifferences.plot(*args, ...)

Plot the model.

StaggeredDifferenceInDifferences.print_coefficients([...])

Ask the model to print its coefficients.

StaggeredDifferenceInDifferences.summary([...])

Print summary of main results.

Attributes

idata

Return the InferenceData object of the model.

supports_bayes

supports_ols

labels

__init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)

Return type:

None

classmethod __new__(*args, **kwargs)