SpilloverDiD survey_design= on HC1/CR1 via Binder TSL (Wave E.1)#468
igerber wants to merge 1 commit into
|
Overall assessment
✅ Looks good
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
ec641b8 to 15044a4
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment
✅ Looks good
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
15044a4 to a161725
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
a161725 to 9c9532e
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings remain in this re-review.
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
|
9c9532e to 0391c79
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
The Spillover Wave E.1 survey work looks coherent in this re-review, but the PR still leaves unmitigated P1 methodology/reporting issues in the
Executive Summary
Path to Approval
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
0391c79 to a04c96b
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
a04c96b to c0e8967
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
No findings. Affected methods are
Code Quality
No findings. The changed inference sites still route through
Performance
No findings. The earlier O(
Maintainability
No findings. The new
Tech Debt
No findings. The remaining deferred pieces from this initiative,
Security
No findings.
Documentation/Tests
|
c0e8967 to 5866913
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The changed Wave E.1 survey-design path is consistent with the registry's documented synthesis and with the cited source material at the level claimed by the PR: Binder-style design-based variance for smooth IF/regression-based modern DiD estimators, Gardner's untreated-first two-stage structure, and Butts's spillover DiD identification setting. (arxiv.org)
Executive Summary
Methodology
No findings. Affected methods are the Wave E.1 survey-design variance path, the survey-weighted scalar
No findings. The potentially controversial behaviors are explicitly documented rather than silent deviations:
Code Quality
No findings. The modified inference call sites still route through
Performance
No findings.
Maintainability
No findings. The changed survey helpers consistently propagate
Tech Debt
No findings. The remaining Wave E.2 Conley×survey and replicate-weight follow-ups are explicitly tracked in
Security
No findings.
Documentation/Tests
No further findings. The prior helper-vs-end-to-end test-description issue is resolved at
|
5866913 to fe7c2cb
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The changed Wave E.1
Executive Summary
Methodology
No findings. Affected methods are the new Wave E.1 survey-design variance path for
The potentially controversial behaviors are documented rather than silent: survey-weighted event-study shares,
Code Quality
No findings. The changed inference sites continue to route through
Performance
No findings. The new Binder helper avoids repeated per-PSU scans by using a single
Maintainability
No findings. The changed survey helper paths consistently preserve
Tech Debt
No findings. The remaining Conley×survey and replicate-weight follow-ups are explicitly tracked in
Security
No findings.
Documentation/Tests
No further findings. The prior prose mismatch around
Static review only:
|
fe7c2cb to cce7bed
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Affected methods are the new
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Path to Approval
|
cce7bed to 8c79065
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
8c79065 to bf28e10
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Affected methods reviewed:
Against the cited papers and the registry, the new survey path is internally coherent: survey weights are threaded through stage 1,
No other methodology findings.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The remaining Conley+survey, replicate-weight survey variance, and subpopulation-preservation follow-ups are explicitly tracked in
Security
No findings.
Documentation/Tests
No findings in the changed docs/tests. The prior docs/code mismatch around
Verification note: this was a static review only. I could not run the tests here because the environment is missing
|
Composes Gerber (2026, arXiv:2605.04124) Proposition 1 Binder Taylor Series
Linearization for IF representations of smooth functionals -- explicitly
derived for TwoStageDiD in the paper's Appendix -- with the Wave D Gardner
GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4)
applied to SpilloverDiD's ring-indicator stage-2 design. No reference
software combines all ingredients.
Mechanical composition: SpilloverDiD's per-obs Wave D IF
`psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with
survey weights threaded through gamma_hat solve, eps construction, and
bread inversion via Hajek normalization) is aggregated to PSU totals and
passed to the audited `_compute_stratified_meat_from_psu_scores` Binder
TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset`
with a `weights=` kwarg implementing WLS-FE via weighted bincount; the
`weights is None` path is bit-identical to Wave B/C/D unweighted bincount.
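
The PSU aggregation step described above can be illustrated with a small NumPy sketch. It is not the repository's `_compute_stratified_meat_from_psu_scores` (whose signature is not shown here); the function name `binder_tsl_meat_sketch` is hypothetical, and the sketch assumes with-replacement sampling, no FPC, and that strata with a single PSU are simply skipped.

```python
import numpy as np


def binder_tsl_meat_sketch(psi, strata, psu):
    """Stratified between-PSU meat (hypothetical helper, assumptions above).

    meat = sum_h n_h / (n_h - 1) * sum_j (z_hj - zbar_h)(z_hj - zbar_h)'
    where z_hj is the total of the per-observation IF rows in PSU j of stratum h.
    """
    psi = np.asarray(psi, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    k = psi.shape[1]
    meat = np.zeros((k, k))
    for h in np.unique(strata):
        in_h = strata == h
        # Map PSU labels inside this stratum to 0..n_h-1.
        _, psu_idx = np.unique(psu[in_h], return_inverse=True)
        n_h = int(psu_idx.max()) + 1
        if n_h < 2:
            continue  # assumption: singleton strata contribute nothing
        # Aggregate per-observation scores to PSU totals.
        totals = np.zeros((n_h, k))
        np.add.at(totals, psu_idx, psi[in_h])
        centered = totals - totals.mean(axis=0)
        meat += (n_h / (n_h - 1)) * centered.T @ centered
    return meat


psi_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
meat_demo = binder_tsl_meat_sketch(psi_demo, strata=[0, 0, 0, 0], psu=[0, 0, 1, 1])
```

In the demo the two PSU totals are 3 and 7, so the centered totals are -2 and 2 and the single-stratum meat is 2 * 8 = 16.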
Identification support correctness: under survey_design, stage-1 FE
support and connectivity are evaluated on the POSITIVE-WEIGHT portion
of Omega_0 (`omega_0_effective = omega_0_mask & (survey_weights > 0)`),
consistently for unsupported-period / unsupported-unit checks,
`_check_omega_0_connectivity`, and `stage1_n_obs`. Zero-weight rows are
outside the WLS estimating sample; treating them as identifying support
would silently corrupt point estimates and SEs when raw Omega_0
membership masquerades as positive-weight support.
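
A toy illustration of why the positive-weight mask matters, using made-up arrays (the `unit` array and all values are assumptions for the example; only the `omega_0_effective` expression comes from the text above):

```python
import numpy as np

# Hypothetical toy data: five rows, four units.
omega_0_mask = np.array([True, True, False, True, True])
survey_weights = np.array([1.5, 0.0, 2.0, 0.7, 0.0])
unit = np.array([0, 1, 1, 2, 3])

# Expression quoted in the commit message.
omega_0_effective = omega_0_mask & (survey_weights > 0)

# Units that look supported under raw Omega_0 membership but contribute
# no positive-weight row to the WLS stage-1 sample.
raw_units = set(unit[omega_0_mask].tolist())
effective_units = set(unit[omega_0_effective].tolist())
falsely_supported = sorted(raw_units - effective_units)
```

Here units 1 and 3 appear in raw Omega_0 but only through zero-weight rows, so a support check on the raw mask would miss them.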
Event-study scalar `att`: under survey_design, per-horizon shares are
SURVEY-WEIGHT TOTALS rather than raw observation counts. Using raw
n_obs_per_col shares on weighted WLS horizon coefficients would mix
unweighted aggregation with weighted horizons and target the wrong
estimand. The same survey-weight totals enter both `att` and
`Var(att) = w' V_subset w`, keeping the lincom variance consistent
with the point estimate.
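
The share rule above can be sketched as a plain linear combination. The function name `scalar_att_sketch` and the numbers are illustrative assumptions, not values from the PR:

```python
import numpy as np


def scalar_att_sketch(tau_k, V_subset, share_totals):
    """Share-weighted scalar att and its lincom variance w' V w."""
    w = np.asarray(share_totals, dtype=float)
    w = w / w.sum()  # normalize totals to shares
    tau_k = np.asarray(tau_k, dtype=float)
    att = float(w @ tau_k)
    var_att = float(w @ np.asarray(V_subset, dtype=float) @ w)
    return att, var_att


tau_k = [1.0, 3.0]                      # per-horizon effects (made up)
V_subset = np.diag([0.04, 0.16])        # their covariance block (made up)

# Survey-weight totals per horizon vs. raw observation counts.
att_survey, var_survey = scalar_att_sketch(tau_k, V_subset, [30.0, 10.0])
att_raw, var_raw = scalar_att_sketch(tau_k, V_subset, [10, 10])
```

With non-constant `tau_k` the two share rules give different point estimates (1.5 vs. 2.0 here), which is why the same weights must enter both `att` and its variance.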
Cluster-vs-PSU resolution: `cluster=<col> + survey_design.psu` warns
and uses PSU (TwoStageDiD parity). `cluster=<col> + survey_design`
without PSU injects cluster as the effective PSU via
`_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under
`nest=False`, cluster labels must be globally unique across strata
(raises if they repeat, matching the explicit-PSU resolver's contract).
The result's `cluster_name` reports the effective PSU label when PSU
wins, not the user-supplied cluster column.
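
A minimal sketch of the `nest` contract described above. The function `cluster_as_psu_sketch` is hypothetical and is not `_inject_cluster_as_psu`; the `nest=True` branch (relabeling by stratum and cluster pair) is an assumption based on the usual survey-software convention, since the text only specifies the `nest=False` behavior.

```python
import numpy as np


def cluster_as_psu_sketch(cluster, strata, nest):
    """Return effective PSU labels derived from a cluster column."""
    cluster = np.asarray(cluster)
    strata = np.asarray(strata)
    if nest:
        # Assumption: treat (stratum, cluster) pairs as distinct PSUs.
        pairs = np.stack([strata, cluster], axis=1)
        _, psu = np.unique(pairs, axis=0, return_inverse=True)
        return psu.ravel()
    # nest=False: cluster labels must be globally unique across strata.
    for c in np.unique(cluster):
        if np.unique(strata[cluster == c]).size > 1:
            raise ValueError(
                f"cluster label {c!r} appears in more than one stratum; "
                "use nest=True or make labels globally unique"
            )
    return cluster


ok = cluster_as_psu_sketch([1, 1, 2, 2], [0, 0, 1, 1], nest=False)
nested = cluster_as_psu_sketch([1, 1, 1, 1], [0, 0, 1, 1], nest=True)
```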
DOF: `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata ->
n_PSU - n_strata; PSU only -> n_PSU - 1; strata only -> n_obs - n_strata;
neither -> n_obs - 1) threaded through all four `safe_inference` call
sites (aggregate tau_total, per-ring delta_j, event-study per-event-time
tau_k / delta_jk, scalar att lincom).
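
The four-way branch reads naturally as a small function. This is a sketch of the rule as stated above, with a hypothetical name and signature; it is not the `ResolvedSurveyDesign.df_survey` property itself.

```python
def df_survey_sketch(n_obs, n_psu=None, n_strata=None):
    """Design degrees of freedom following the four branches listed above."""
    if n_psu is not None and n_strata is not None:
        return n_psu - n_strata      # PSU + strata
    if n_psu is not None:
        return n_psu - 1             # PSU only
    if n_strata is not None:
        return n_obs - n_strata      # strata only
    return n_obs - 1                 # neither


df_both = df_survey_sketch(n_obs=500, n_psu=40, n_strata=8)
```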
Survey-array subsetting: when `finite_mask` drops baseline-treated rows,
`survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc,
replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and
`survey_metadata` are recomputed after PSU injection so summary() /
to_dict() reflect the actual variance design.
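
Parallel subsetting can be sketched as below. The dictionary layout and the name `subset_design_sketch` are assumptions for illustration; the real `ResolvedSurveyDesign` is a structured object and recomputes its counts after PSU injection, which this sketch does not model.

```python
import numpy as np


def subset_design_sketch(arrays, finite_mask):
    """Apply one row mask to every survey array, then recount PSUs and strata."""
    finite_mask = np.asarray(finite_mask, dtype=bool)
    subset = {
        name: None if arr is None else np.asarray(arr)[finite_mask]
        for name, arr in arrays.items()
    }
    psu = subset.get("psu")
    strata = subset.get("strata")
    n_psu = None if psu is None else int(np.unique(psu).size)
    n_strata = None if strata is None else int(np.unique(strata).size)
    return subset, n_psu, n_strata


design = {
    "weights": [1.0, 2.0, 3.0, 4.0],
    "strata": [0, 0, 1, 1],
    "psu": [10, 11, 12, 13],
    "fpc": None,
}
sub, n_psu, n_strata = subset_design_sketch(design, [True, True, True, False])
```

Using a single mask for every array keeps rows aligned, so the counts reported afterwards describe the sample that actually enters the variance.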
Saturated `df_survey = 0` NaN-fail: when `lonely_psu="remove"` removes
all strata (singleton PSUs), the meat helper returns `(_, var_computed=False,
legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a
UserWarning matching "df_survey" so callers can `pytest.warns(match="df_survey")`.
This is a departure from TwoStageDiD (`two_stage.py:2003-2005`) which
currently NaN-fails silently; Wave E.1 surfaces the diagnostic per
`feedback_no_silent_failures`.
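
The warn-then-NaN behavior can be sketched with the standard `warnings` module. The function name and message text are hypothetical; the only detail taken from the text above is that the warning is a `UserWarning` whose message contains "df_survey".

```python
import math
import warnings


def meat_or_nan_sketch(meat_value, df_survey):
    """Return NaN with a visible warning when the design is saturated."""
    if df_survey <= 0:
        warnings.warn(
            "df_survey is 0 after lonely-PSU removal; variance is undefined "
            "and NaN is returned",
            UserWarning,
            stacklevel=2,
        )
        return math.nan
    return meat_value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    saturated = meat_or_nan_sketch(1.0, df_survey=0)
```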
Public surface restrictions:
- `vcov_type="conley" + survey_design=` raises NotImplementedError pointing
at planned Wave E.2 (Conley x survey product-kernel synthesis with
within-stratum Conley sandwich on PSU totals).
- Replicate-weight variance (BRR/Fay/JK1/JKn/SDR) raises NotImplementedError
per Gerber (2026) Appendix A (IF-reweighting shortcut does NOT apply
to TwoStageDiD-class estimators because gamma_hat is weight-sensitive;
correct support requires per-replicate full re-fit).
- Non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises ValueError
(the Binder TSL assumes probability weights).
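
The three restrictions listed above amount to input guards along these lines. This is a hedged sketch: the function name, argument names, messages, and the order of the checks are assumptions, not the library's code.

```python
def check_survey_options_sketch(vcov_type, weight_type, has_replicate_weights):
    """Reject option combinations that Wave E.1 does not support."""
    if vcov_type == "conley":
        raise NotImplementedError(
            "vcov_type='conley' with survey_design= is planned for Wave E.2"
        )
    if has_replicate_weights:
        raise NotImplementedError(
            "replicate-weight variance needs a per-replicate full re-fit"
        )
    if weight_type != "pweight":
        raise ValueError("Binder TSL assumes probability weights (pweight)")


# A supported combination passes silently.
check_survey_options_sketch("hc1", "pweight", has_replicate_weights=False)
```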
Implementation:
- `_compute_gmm_corrected_meat` extended with `survey_weights` +
`resolved_survey` kwargs at `diff_diff/two_stage.py`.
- New module-level helper `_compute_binder_tsl_meat` at
`diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores`
with implicit per-obs PSU synthesis for no-PSU survey designs (matches
`ResolvedSurveyDesign.df_survey` no-PSU branches).
- `_iterative_fe_subset` weighted path with bit-identical no-weights
fallback + positive-weight identification gate.
- `_inject_cluster_as_psu` honors `nest` (shared helper fix that also
benefits TwoStageDiD survey path; 18 existing TwoStageDiD survey tests
unaffected).
- `ResolvedSurveyDesign` gains a `nest` field propagated through all 5
construction sites in resolve()/subpopulation()/replicate-helpers.
- `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata`
fields at `diff_diff/results.py`.
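
For the weighted `_iterative_fe_subset` path in the list above, the core building block is a group mean computed with `np.bincount`. The sketch below shows one such step under assumed names; it is not the iterative FE routine, and it does not guard against empty or all-zero-weight groups.

```python
import numpy as np


def group_means_sketch(y, groups, n_groups, weights=None):
    """Per-group mean of y; weighted when `weights` is given."""
    y = np.asarray(y, dtype=float)
    if weights is None:
        # Unweighted branch kept separate so its arithmetic is unchanged.
        sums = np.bincount(groups, weights=y, minlength=n_groups)
        counts = np.bincount(groups, minlength=n_groups)
        return sums / counts
    w = np.asarray(weights, dtype=float)
    wsum = np.bincount(groups, weights=w * y, minlength=n_groups)
    wtot = np.bincount(groups, weights=w, minlength=n_groups)
    return wsum / wtot


groups = np.array([0, 0, 1])
plain = group_means_sketch([1.0, 3.0, 5.0], groups, 2)
weighted = group_means_sketch([1.0, 3.0, 5.0], groups, 2, weights=[1.0, 3.0, 2.0])
```

Keeping the `weights is None` branch as its own code path, rather than passing unit weights through the weighted branch, is one way to preserve identical floating-point results for existing callers.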
Tests: 24 new invariants across `TestSpilloverDiDWaveE1SurveyDesignHc1`
(17: bit-identity fallback, Binder TSL hand-check uniform + non-uniform
weights, lonely_psu, FPC e1/e2/e3, saturated NaN-fail with
`pytest.warns(match="df_survey")`, rejections (conley+survey, replicate,
non-pweight), fit idempotency, finite_mask subsetting,
no-PSU regressions (weights-only, weights+strata, cluster+survey-no-PSU,
cluster overlap with nest=False/True), zero-weight Omega_0 exclusion +
all-zero raises) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy`
(7: both is_staggered branches with df_survey lincom verification,
distinguishability between survey-share and sample-share lincom rules
via cohort-correlated weights + non-constant tau_k + manual reconstruction,
aggregate-vs-event-study parity, drift goldens, subset-path invariant).
Docs: REGISTRY new "Variance (Wave E.1)" subsection + restrictions block
update + survey-weighted share rule note; api/spillover.rst parallel
.. note:: block; references.rst adds Gerber (2026); CHANGELOG [Unreleased]
new bullet; TODO.md replaces deferred row with Wave E.2 +
replicate-weight follow-ups; llms.txt + README.md catalog entries appended.
Folds the `.gitignore` carryover (survey-did-paper-arxiv-v2/ entry).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bf28e10 to 967c6a3
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
The affected method set is
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- `survey_design=` for `vcov_type ∈ {hc1}` plus `cluster=<col>` (CR1), closing the Wave B/C/D `NotImplementedError` gate. Documented synthesis of Gerber (2026, arXiv:2605.04124) Proposition 1 (Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the Appendix) with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. Mechanical: Wave D Psi → PSU-aggregated → audited Binder TSL meat; survey weights enter via Hájek normalization at gamma_hat / eps / bread.
- `event_study=False` AND `event_study=True` branches; both `is_staggered=True` AND `is_staggered=False` paths audited per `feedback_cohort_loop_trigger_cache_both_branches`.
- `vcov_type="conley" + survey_design=` → `NotImplementedError` (Wave E.2 will compose Conley × survey product-kernel with within-stratum Conley sandwich on PSU totals); replicate-weight variance (BRR/Fay/JK1/JKn/SDR) → `NotImplementedError` (Gerber 2026 Appendix A notes the IF-reweighting shortcut does not apply because gamma_hat is weight-sensitive; a per-replicate full re-fit is needed); non-pweight → `ValueError`. Tracked in `TODO.md`.
- `_inject_cluster_as_psu` now honors `SurveyDesign.nest` (raises on cross-stratum cluster overlap under `nest=False`), matching the explicit-PSU resolver. Also benefits TwoStageDiD survey path (18 existing TwoStageDiD survey tests pass unchanged).
Methodology references
- `df_survey = 0` (singleton PSUs + `lonely_psu="remove"`): Wave E.1 surfaces a `UserWarning` matching `"df_survey"` and NaN-fails. TwoStageDiD's `_compute_gmm_variance` (`two_stage.py:2003-2005`) currently NaN-fails silently; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. Documented as a **Note:** in `docs/methodology/REGISTRY.md` SpilloverDiD "Variance (Wave E.1)" subsection.
- `docs/methodology/REGISTRY.md` SpilloverDiD section and `docs/api/spillover.rst`.
Validation
- `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 invariants: bit-identity fallback, Binder TSL hand-check uniform + non-uniform, lonely_psu, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, rejections (conley+survey, replicate, non-pweight), fit idempotency, finite_mask subsetting, no-PSU regressions (weights-only, weights+strata, cluster+survey-no-PSU, cluster overlap nest=False/True), zero-weight Omega_0 exclusion + all-zero raises) + `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7: both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via cohort-correlated weights + non-constant tau_k + manual reconstruction, aggregate-vs-event-study parity, drift goldens, subset-path invariant).
- `rtol=1e-12, atol=1e-14` per `feedback_assert_allclose_numerical_parity`.
Security / privacy
🤖 Generated with Claude Code