StaggeredDifferenceInDifferences

class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences

A class to analyse data from staggered adoption Difference-in-Differences settings.

This class implements the Borusyak, Jaravel, and Spiess (BJS, 2024) imputation estimator for staggered adoption settings. It fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
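The fit-then-impute logic can be sketched in a few lines of NumPy and pandas. This is a minimal illustration, not causalpy's internal code. It assumes a long-format panel with hypothetical columns `unit`, `time`, `treated` and `y`, a two-way fixed-effects outcome model fitted by plain least squares (`np.linalg.lstsq`), and at least one untreated observation for every unit and every period so that all fixed effects are identified.

```python
import numpy as np
import pandas as pd


def impute_effects(
    df: pd.DataFrame,
    unit: str = "unit",
    time: str = "time",
    treated: str = "treated",
    outcome: str = "y",
) -> pd.DataFrame:
    """Imputation sketch: fit unit + time fixed effects on untreated rows only."""
    # Unit dummies (no intercept) plus time dummies with the first period dropped,
    # so the design is full rank when every unit and period has untreated rows.
    X = pd.concat(
        [
            pd.get_dummies(df[unit], prefix="u", dtype=float),
            pd.get_dummies(df[time], prefix="t", drop_first=True, dtype=float),
        ],
        axis=1,
    ).to_numpy()
    y = df[outcome].to_numpy(dtype=float)
    untreated = df[treated].to_numpy() == 0

    # Step 1: least-squares fit on untreated observations only
    beta, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

    # Step 2: predict the untreated (counterfactual) outcome for every observation
    out = df.copy()
    out["y_hat0"] = X @ beta

    # Step 3: effect = observed minus imputed, defined for treated rows only
    out["tau_hat"] = np.where(out[treated] == 1, y - out["y_hat0"], np.nan)
    return out
```

On a noise-free panel built as unit effect + time effect + a constant treatment effect, `tau_hat` recovers that constant for every treated row.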

Assumptions

This estimator requires the following identifying assumptions:

Absorbing treatment: Once a unit receives treatment, it must remain treated in all subsequent periods. Treatment cannot be reversed or temporarily suspended. This is validated at runtime.
Parallel trends: In the absence of treatment, treated and control units would have followed parallel outcome trajectories.
No anticipation: Units do not change their behavior in anticipation of future treatment.
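The absorbing-treatment requirement can be checked on a long-format panel by sorting each unit's rows by time and requiring that the 0/1 indicator never decreases. The sketch below is hypothetical (function name and default column names included) and is not the validation causalpy runs internally.

```python
import pandas as pd


def is_absorbing(
    df: pd.DataFrame, unit: str = "unit", time: str = "time", treated: str = "treated"
) -> bool:
    """Return True if no unit ever moves from treated (1) back to untreated (0)."""
    ordered = df.sort_values([unit, time])
    # A negative within-unit difference means a 1 -> 0 switch, i.e. a reversal
    steps = ordered.groupby(unit)[treated].diff()
    return not bool((steps < 0).any())
```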

Parameters:

data (pd.DataFrame): A pandas DataFrame with panel data (unit x time observations).
formula (str): A statistical model formula. Recommended: “y ~ 1 + C(unit) + C(time)” for unit and time fixed effects.
unit_variable_name (str): Name of the column identifying units.
time_variable_name (str): Name of the column identifying time periods.
treated_variable_name (str, optional): Name of the column indicating treatment status (0/1). Defaults to “treated”.
treatment_time_variable_name (str | None, optional): Name of the column containing the unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.
never_treated_value (Any, optional): Value marking never-treated units in the treatment-time column. Defaults to np.inf.
model (PyMCModel | RegressorMixin | None, optional): A model for the untreated outcome. Defaults to LinearRegression.
event_window (tuple[int, int] | None, optional): Tuple (min_event_time, max_event_time) restricting the event-time aggregation. If None, all available event times are used.
reference_event_time (int, optional): Event-time index associated with plots (reserved for future use). Defaults to -1.
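When treatment_time_variable_name is None, G_i has to be derived from the 0/1 indicator. Under absorbing treatment, one natural choice is the first period in which the unit is observed as treated, with never-treated units receiving never_treated_value. The helper below is a hypothetical sketch of that idea, assuming numeric time periods; it is not causalpy's inference code.

```python
import numpy as np
import pandas as pd


def infer_treatment_time(
    df: pd.DataFrame,
    unit: str = "unit",
    time: str = "time",
    treated: str = "treated",
    never_treated_value=np.inf,
) -> pd.Series:
    """Map each unit to G_i, the first period in which it is observed as treated."""
    # Earliest treated period per unit (cast to float so np.inf fits the dtype)
    first_treated = df.loc[df[treated] == 1].groupby(unit)[time].min().astype(float)
    # Units with no treated rows are never treated and receive the sentinel value
    all_units = pd.Index(df[unit].unique(), name=unit)
    return first_treated.reindex(all_units, fill_value=never_treated_value)
```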

data_

Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.

Type: pd.DataFrame

att_group_time_

Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t.

Type: pd.DataFrame
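Given a frame with the G, time and tau_hat columns described for data_, ATT(g, t) can be sketched as the average imputed effect over treated observations in each cohort-period cell. The equal weighting across units, the `time` column name and the `att` output column are assumptions for illustration; this page does not specify the weights or the exact layout of att_group_time_.

```python
import pandas as pd


def att_group_time(data: pd.DataFrame, time: str = "time") -> pd.DataFrame:
    """ATT(g, t): mean of tau_hat over treated rows in each cohort-period cell."""
    # Treated rows satisfy t >= G; never-treated rows (G = inf) drop out automatically
    treated_rows = data[data[time] >= data["G"]]
    return treated_rows.groupby(["G", time])["tau_hat"].mean().reset_index(name="att")
```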

att_event_time_

Event-time ATT estimates: ATT(e) for each event-time e = t - G.

Type: pd.DataFrame
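The event-time aggregation, including the optional event_window restriction, can be sketched the same way: compute e = t - G for ever-treated units and average tau_hat within each event time. This is a minimal sketch under stated assumptions: equal weights, never-treated units coded as np.inf, and only post-treatment event times (e >= 0) kept; this page does not say whether causalpy also reports pre-treatment event times.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def att_event_time(
    data: pd.DataFrame,
    time: str = "time",
    event_window: tuple[int, int] | None = None,
) -> pd.DataFrame:
    """ATT(e): mean of tau_hat by event time e = t - G, optionally within a window."""
    # Drop never-treated units, whose G is the non-finite sentinel
    ever_treated = data[np.isfinite(data["G"])].copy()
    ever_treated["event_time"] = ever_treated[time] - ever_treated["G"]
    # Keep post-treatment rows only (e >= 0), where tau_hat is an imputed effect
    post = ever_treated[ever_treated["event_time"] >= 0]
    if event_window is not None:
        min_e, max_e = event_window
        # Series.between is inclusive on both ends by default
        post = post[post["event_time"].between(min_e, max_e)]
    return post.groupby("event_time")["tau_hat"].mean().reset_index(name="att")
```

Here ATT(0) pools the first treated period of every cohort, which is why cohorts adopting at different calendar times contribute to the same event-time estimate.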

Notes

Panel Balance: This implementation supports both balanced and unbalanced panel data. While balanced panels (where each unit is observed in every time period) are common in staggered DiD applications, the imputation-based approach of Borusyak et al. (2024) can accommodate unbalanced panels. The key requirement is that treatment timing is well-defined for each unit, not that all units are observed in all periods. Unit and observation counts in the summary output are computed without assuming balanced panels.
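Counting units and observations without assuming balance amounts to counting distinct unit identifiers and rows separately rather than multiplying units by periods. The hypothetical helper below sketches this with pandas and also reports whether the panel happens to be balanced; it is not the code behind the summary output.

```python
import pandas as pd


def panel_counts(df: pd.DataFrame, unit: str = "unit", time: str = "time") -> dict:
    """Count units and observations without assuming every unit appears in every period."""
    n_units = df[unit].nunique()
    n_obs = len(df)
    n_periods = df[time].nunique()
    # Balanced means every (unit, time) pair occurs exactly once
    balanced = n_obs == n_units * n_periods and not df.duplicated(subset=[unit, time]).any()
    return {"n_units": n_units, "n_obs": n_obs, "balanced": bool(balanced)}
```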

Example

>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.

Methods

`StaggeredDifferenceInDifferences.__init__`(...)
`StaggeredDifferenceInDifferences.algorithm`()	Run the experiment algorithm: fit model, predict counterfactuals, and aggregate effects.
`StaggeredDifferenceInDifferences.effect_summary`(*)	Generate a decision-ready summary of causal effects for Staggered Difference-in-Differences.
`StaggeredDifferenceInDifferences.fit`(*args, ...)
`StaggeredDifferenceInDifferences.get_plot_data`(...)	Recover the data of an experiment along with the prediction and causal impact information.
`StaggeredDifferenceInDifferences.get_plot_data_bayesian`([...])	Get plotting data for Bayesian model.
`StaggeredDifferenceInDifferences.get_plot_data_ols`()	Get plotting data for OLS model.
`StaggeredDifferenceInDifferences.input_validation`()	Validate the input data and parameters.
`StaggeredDifferenceInDifferences.plot`(*args, ...)	Plot the model.
`StaggeredDifferenceInDifferences.print_coefficients`([...])	Ask the model to print its coefficients.
`StaggeredDifferenceInDifferences.summary`([...])	Print summary of main results.

Attributes

`idata`	Return the InferenceData object of the model.
`supports_bayes`
`supports_ols`
`labels`

__init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)

Parameters:

data (DataFrame)
formula (str)
unit_variable_name (str)
time_variable_name (str)
treated_variable_name (str)
treatment_time_variable_name (str | None)
never_treated_value (Any)
model (PyMCModel | RegressorMixin | None)
event_window (tuple[int, int] | None)
reference_event_time (int)
kwargs (dict)

Return type:

None

classmethod __new__(*args, **kwargs)