README.md
@@ -144,6 +144,25 @@ Regression discontinuity designs are used when treatment is applied to units acc
 > The data, model fit, and counterfactual are plotted (top). Frequentist analysis shows the causal impact with the blue shaded region, but this is not shown in the Bayesian analysis to avoid a cluttered chart. Instead, the Bayesian analysis shows shaded Bayesian credible regions of the model fits. The Frequentist analysis visualises the point estimate of the causal impact, but the Bayesian analysis also plots the posterior distribution of the regression discontinuity effect (bottom).

+### Regression kink designs
+
+Regression kink designs are used when the function assigning treatment has a change in gradient (a kink), rather than a jump, at a cutoff on a running variable, which is typically not time. By looking for a change in the gradient of the outcome at the precise point of the kink, we can make causal claims about the potential impact of the treatment.
> The data and model fit. The Bayesian analysis shows the posterior mean with credible intervals (shaded regions). We also report the Bayesian $R^2$ on the data along with the posterior mean and credible intervals of the change in gradient at the kink point.
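
To make "change in gradient at the kink point" concrete, here is a minimal sketch that estimates it with ordinary least squares on simulated data. It is not the package's API: the helper `fit_kink`, the piecewise-linear model, and the simulated numbers are all assumptions made for illustration, and a frequentist point estimate stands in for the Bayesian posterior described in the caption.

```python
import numpy as np


def fit_kink(x, y, kink_point):
    """Least-squares fit of y = b0 + b1*(x - k) + b2*(x - k)*[x >= k].

    b2 is the change in gradient at the kink point k (hypothetical helper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centred = x - kink_point
    treated = (x >= kink_point).astype(float)
    # Columns: intercept, slope below the kink, extra slope above the kink.
    design = np.column_stack([np.ones_like(x), centred, centred * treated])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


# Simulated data (assumption): slope 0.5 below the kink, 2.5 above it.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 201)
true_change = 2.0
y = 1.0 + 0.5 * x + true_change * np.clip(x, 0.0, None)
y = y + rng.normal(0.0, 0.05, x.size)

b0, b1, b2 = fit_kink(x, y, kink_point=0.0)
print(f"estimated change in gradient: {b2:.2f}")
```

The third coefficient plays the role of the quantity the caption calls the change in gradient; the model stays continuous at the kink, so there is no jump term.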
+
 ### Interrupted time series

Interrupted time series analysis is appropriate when you have a time series of observations which undergo treatment at a particular point in time. This kind of analysis has no control group and looks for the presence of a change in the outcome measure at or soon after the treatment time. Multiple predictors can be included.
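
As a rough illustration of the idea in this paragraph, the sketch below fits a linear trend to the pre-treatment observations only, extrapolates it as the counterfactual, and reads the impact off as observed minus counterfactual. The function name `interrupted_time_series_impact`, the linear trend, and the simulated series are assumptions for illustration; plain least squares is used here in place of the package's own models.

```python
import numpy as np


def interrupted_time_series_impact(t, y, treatment_time):
    """Fit a linear trend before treatment and extrapolate it afterwards."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    pre = t < treatment_time
    # np.polyfit returns the highest-order coefficient first.
    slope, intercept = np.polyfit(t[pre], y[pre], deg=1)
    counterfactual = intercept + slope * t
    impact = y - counterfactual
    return counterfactual, impact, ~pre


# Simulated series (assumption): a level shift of 3 units at t = 70.
rng = np.random.default_rng(1)
t = np.arange(100)
y = 10.0 + 0.1 * t + 3.0 * (t >= 70) + rng.normal(0.0, 0.2, t.size)

counterfactual, impact, post = interrupted_time_series_impact(t, y, treatment_time=70)
mean_post_impact = impact[post].mean()
print(f"mean post-treatment impact: {mean_post_impact:.2f}")
```

Because there is no control group, the estimate is only as good as the assumption that the pre-treatment trend would have continued unchanged.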
             filtered_data = self.data.query(f"{fmin} <= x <= {fmax}")
@@ -836,7 +836,7 @@ def __init__(
         self.score = self.model.score(X=self.X, y=self.y)

         # get the model predictions of the observed data
-        if self.bandwidth is not None:
+        if self.bandwidth is not np.inf:
             xi = np.linspace(fmin, fmax, 200)
         else:
             xi = np.linspace(
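
A possible review note on the changed condition: `self.bandwidth is not np.inf` is an identity check, so it is only guaranteed to be false when the caller passes the very same `np.inf` object. Other infinite values, such as `float("inf")` or `math.inf`, may be treated as finite bandwidths. The sketch below shows a value-based alternative; `has_finite_bandwidth` is a hypothetical helper, not something the diff defines.

```python
import math

import numpy as np


def has_finite_bandwidth(bandwidth):
    """Return True when the bandwidth should trigger data filtering.

    Hypothetical helper: compares by value, not by object identity.
    """
    return bool(np.isfinite(bandwidth))


candidates = {
    "np.inf": np.inf,
    'float("inf")': float("inf"),
    "math.inf": math.inf,
    "0.5": 0.5,
}
for label, value in candidates.items():
    # The identity column depends on which object was passed in;
    # the value column depends only on the number itself.
    print(label, "| identity check:", value is not np.inf,
          "| value check:", has_finite_bandwidth(value))
```

With a value-based check the default `bandwidth=np.inf` still means "use all data", and so does any other spelling of infinity.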
@@ -903,7 +903,7 @@ def plot(self):
             self.data,
             x=self.running_variable_name,
             y=self.outcome_variable_name,
-            c="k",  # hue="treated",
+            c="k",
             ax=ax,
         )

@@ -939,7 +939,7 @@ def plot(self):
             labels=labels,
             fontsize=LEGEND_FONT_SIZE,
         )
-        return (fig, ax)
+        return fig, ax

     def summary(self) -> None:
         """
@@ -957,6 +957,220 @@ def summary(self) -> None:
         self.print_coefficients()


+class RegressionKink(ExperimentalDesign):
+    """
+    A class to analyse sharp regression kink experiments.
+
+    :param data:
+        A pandas dataframe
+    :param formula:
+        A statistical model formula
+    :param kink_point:
+        A scalar threshold value at which there is a change in the first derivative of
+        the assignment function
+    :param model:
+        A PyMC model
+    :param running_variable_name:
+        The name of the predictor variable that the kink_point is based upon
+    :param epsilon:
+        A small scalar value which determines how far above and below the kink point to
+        evaluate the causal impact.
+    :param bandwidth:
+        Data outside of the bandwidth (relative to the discontinuity) is not used to fit
+        the model.
+    """
+
+    def __init__(
+        self,
+        data: pd.DataFrame,
+        formula: str,
+        kink_point: float,
+        model=None,
+        running_variable_name: str = "x",
+        epsilon: float = 0.001,
+        bandwidth: float = np.inf,
+        **kwargs,
+    ):
+        super().__init__(model=model, **kwargs)
+        self.expt_type = "Regression Kink"
+        self.data = data
+        self.formula = formula
+        self.running_variable_name = running_variable_name
+        self.kink_point = kink_point
+        self.epsilon = epsilon
+        self.bandwidth = bandwidth
+        self._input_validation()
+
+        if self.bandwidth is not np.inf:
+            fmin = self.kink_point - self.bandwidth
+            fmax = self.kink_point + self.bandwidth
+            filtered_data = self.data.query(f"{fmin} <= x <= {fmax}")
+            if len(filtered_data) <= 10:
+                warnings.warn(
+                    f"Choice of bandwidth parameter has led to only {len(filtered_data)} remaining datapoints. Consider increasing the bandwidth parameter.",  # noqa: E501
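
One thing a reviewer might flag in the bandwidth branch above: the `query` string filters on a column literally named `x`, even though the constructor accepts `running_variable_name`. The sketch below shows one way the filter could follow the configured column instead. `filter_by_bandwidth` and `min_points` are hypothetical names, and the boolean mask is a swapped-in technique, not what the diff does.

```python
import warnings

import numpy as np
import pandas as pd


def filter_by_bandwidth(data, running_variable_name, kink_point, bandwidth,
                        min_points=10):
    """Keep rows whose running variable lies within kink_point +/- bandwidth."""
    if not np.isfinite(bandwidth):
        # An infinite bandwidth means "use all of the data".
        return data
    fmin = kink_point - bandwidth
    fmax = kink_point + bandwidth
    # Series.between is inclusive on both ends by default.
    mask = data[running_variable_name].between(fmin, fmax)
    filtered = data[mask]
    if len(filtered) <= min_points:
        warnings.warn(
            f"Choice of bandwidth parameter has led to only {len(filtered)} "
            "remaining datapoints. Consider increasing the bandwidth parameter."
        )
    return filtered


# Example data (assumption): the running variable is called "dose", not "x".
df = pd.DataFrame({"dose": np.linspace(-1.0, 1.0, 101)})
df["y"] = 0.5 * df["dose"]

filtered = filter_by_bandwidth(df, "dose", kink_point=0.0, bandwidth=0.25)
unfiltered = filter_by_bandwidth(df, "dose", kink_point=0.0, bandwidth=np.inf)
print(len(df), len(filtered), len(unfiltered))
```

Indexing with a mask also avoids building a query string from floats, at the cost of departing from the `query` style used elsewhere in the class.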
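
The `epsilon` parameter in the `RegressionKink` docstring is described as how far above and below the kink point the causal impact is evaluated. One plausible reading, sketched below, is a finite-difference estimate: evaluate the fitted mean just left of, at, and just right of the kink, then subtract the left gradient from the right one. The names `gradient_change` and `example_model` are hypothetical, and the visible part of the diff does not show how the class actually uses `epsilon`.

```python
import numpy as np


def gradient_change(predict, kink_point, epsilon=0.001):
    """Finite-difference estimate of the change in gradient at kink_point."""
    mu_left = predict(kink_point - epsilon)
    mu_kink = predict(kink_point)
    mu_right = predict(kink_point + epsilon)
    gradient_left = (mu_kink - mu_left) / epsilon
    gradient_right = (mu_right - mu_kink) / epsilon
    return gradient_right - gradient_left


def example_model(x, kink_point=0.5):
    """Stand-in for a fitted model (assumption): slope 0.5, then 2.5."""
    return 1.0 + 0.5 * x + 2.0 * np.maximum(x - kink_point, 0.0)


change = gradient_change(example_model, kink_point=0.5)
print(f"change in gradient: {change:.3f}")
```

For a piecewise-linear mean the result does not depend on `epsilon`; for curved fits a small value keeps the two gradients local to the kink.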