English · 简体中文
What this is. An end-to-end empirical-finance research project — when a stock is added to a major index, does its price actually move, is the move permanent, and does the effect survive scrutiny (especially in China)? — built solo in Python: a reproducible event-study pipeline, a matched-control design, an interactive research dashboard, ~1,190 tests, and an answer I report honestly, including where it comes back null and where my own identification strategy didn't hold up.
It is deliberately a descriptive study, not a causal-claims paper — see The honest version below.
- The question. Index-inclusion is a classic "demand shock" laboratory: when CSI 300 / S&P 500 reshuffles, passive funds must buy the new names. The textbook prediction is a price pop. I test whether it happens, whether it reverses, and which mechanism drives it — across two markets (CN + US).
- What I found. The announcement-window effect is real and robust in the US (US announce CAR[-1,+1] ≈ +1.3%, permutation p = 0.0002, holds under event-clustered SE) and marginal in China (p ≈ 0.03). But the effective-day window is null everywhere (p > 0.27), and 5 of 7 mechanism hypotheses are inconclusive. That pattern, a shrinking and mostly anticipated effect, is consistent with the disappearing index effect (Greenwood & Sammon, 2022), here replicated cross-market.
- What it demonstrates. Full-stack empirical research (event study, propensity-style matching with covariate balance, pseudo-event placebos, permutation tests, clustered SE, multiple-testing correction), a reproducible pipeline with automated quality gates, and, the part I care most about, knowing and stating the limits of the data rather than manufacturing significance.
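The headline statistic, CAR[-1,+1], can be illustrated with a minimal sketch. It assumes market-adjusted abnormal returns (stock return minus index return on the same day); the function name `car` and the toy return series are hypothetical and are not taken from the project's code or data.

```python
from statistics import mean


def car(stock_returns, market_returns, event_idx, start=-1, end=1):
    """Cumulative abnormal return over [start, end] around the event day.

    Market-adjusted AR: stock return minus market return on the same day.
    """
    lo, hi = event_idx + start, event_idx + end
    if lo < 0 or hi >= len(stock_returns):
        # Events without a complete window cannot be scored.
        raise ValueError("event window falls outside the return series")
    return sum(stock_returns[t] - market_returns[t] for t in range(lo, hi + 1))


# Toy daily returns for two hypothetical events (event day at index 2).
events = [
    {"stock": [0.001, 0.004, 0.012, 0.003, -0.002],
     "market": [0.000, 0.002, 0.003, 0.001, 0.000]},
    {"stock": [-0.002, 0.001, 0.008, -0.001, 0.002],
     "market": [0.001, 0.000, 0.002, 0.000, 0.001]},
]
cars = [car(e["stock"], e["market"], event_idx=2) for e in events]
print(f"mean CAR[-1,+1] = {mean(cars):.4f}")
```

The `ValueError` branch mirrors the reason events get dropped when a valid return window is missing; how the real pipeline handles that case is not shown here.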
A research project is only as good as what it admits. Three things I put up front rather than bury:
- My flagship identification design was not valid, and I say so. I built an HS300 regression-discontinuity (RDD) around the index-membership cutoff. On inspection the "running variable" is a fabricated rank index (evenly spaced 299.85 … 300.28), perfectly collinear with treatment, with zero overlap at the cutoff: mathematically not an RDD at all. I kept the full machinery for reproducibility but downgraded it to an appendix "design that failed identification" instead of presenting it as causal evidence. (why, in detail)
- The hypotheses are post-hoc / exploratory. The 7 mechanism hypotheses were formed after seeing the announce-vs-effective and CN-vs-US asymmetries; there is no pre-analysis plan. The main table reports only `evidence_tier = core` results; small-n / exploratory ones (e.g. H3 with n = 4) stay in the appendix, flagged.
- The data has real limits. US market-cap/weights are Yahoo approximations; ~39% of US announcement events are dropped for lack of valid window returns, and that drop is non-random (delisted / acquired tickers), i.e. a survivorship/selection bias I document explicitly (effective N = 371). (full limitations)
Putting this near the top is intentional: it's exactly the signal I'd want to see from a research hire.
The cross-market-asymmetry (CMA) pipeline emits a verdict per hypothesis on the real sample (index-inclusion-verdict-summary prints the same table). Verdict column is kept in the project's original notation; the right column is the plain-English reading.
| # | Mechanism hypothesis | Verdict | Writing tier | Reading (headline stat, n) |
|---|---|---|---|---|
| H1 | Information leakage / pre-run-up | 证据不足 | 正文 core | inconclusive — permutation p = 0.97 (n=455) |
| H2 | Passive-fund AUM gap (demand curve) | 证据不足 | 正文 core | inconclusive — US AUM ratio 13.5×, but effective CAR shows no decay (combined n=18) |
| H3 | Retail vs institutional structure | 支持 | 附录 supplementary | nominally supported, but n = 4, ~zero power → appendix only |
| H4 | Short-sale constraints | 证据不足 | 附录 supplementary | inconclusive — regression p = 0.60 (n=455) |
| H5 | Price-limit (涨跌停) rules | 证据不足 | 正文 core | inconclusive — limit-coef p = 0.43 (n=1096) |
| H6 | Index-weight predictability | 证据不足 | 附录 supplementary | inconclusive — heavy−light spread −0.016 (n=87) |
| H7 | Sector-structure differences | 支持 | 正文 core | supported — US sector spread 5.97, interaction p = 0.095 |
(证据不足 = insufficient evidence; 支持 = supported; 正文 = main text; 附录 = appendix.) Source of truth: `results/real_tables/cma_hypothesis_verdicts.csv` (narrative: `results/real_tables/research_summary.md`). A `--sensitivity` flag re-runs every verdict across significance thresholds (0.05 → 0.20) with Bonferroni/BH correction; details in `docs/sensitivity_workflow.md`.
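For readers unfamiliar with the two corrections, here is a minimal sketch of Bonferroni and Benjamini-Hochberg adjusted p-values, swept across thresholds in the 0.05 to 0.20 range. The function names and the toy p-values are illustrative assumptions; this is not the project's `--sensitivity` implementation.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """BH step-up adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_toy = [0.01, 0.04, 0.03, 0.20]  # toy p-values, not project results
bh = benjamini_hochberg(p_toy)
bonf = bonferroni(p_toy)
for alpha in (0.05, 0.10, 0.20):
    n_bh = sum(p <= alpha for p in bh)
    n_bonf = sum(p <= alpha for p in bonf)
    print(f"alpha={alpha:.2f}: BH keeps {n_bh}, Bonferroni keeps {n_bonf}")
```

Bonferroni controls the family-wise error rate and is the stricter of the two; BH controls the false discovery rate, so it typically keeps more hypotheses at the same threshold.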
Two findings (H5 price-limits, H2 demand) flipped from "supported" to "inconclusive" once I replaced free Yahoo data with licensed Tushare A-share data. I left the reversal in the record rather than quietly keeping the more flattering numbers.
The descriptive claim ("announce-window strong, effective-window null") is backed by four independent checks, all generated by the pipeline into the results/real_tables/robustness_car_permutation.csv family and results/real_figures/parallel_trends_aar_us_announce.png (one per market/window):
| Check | What it shows | US announce [-1,+1] | Effective windows |
|---|---|---|---|
| Daily AAR parallel trends | treated vs matched control overlap pre-event, diverge only in the window | clean pre-trend, day-0 jump | n/a |
| Pseudo-event-date placebo | real CAR sits in the tail of a placebo distribution | p = 0.005 | p > 0.29 |
| Permutation test (sign-flip, 5,000) | empirical significance under H₀ | p = 0.0002 | p > 0.27 |
| Event-clustered SE (CRV1, by date) | inference robust to same-day correlation | p = 0.0003 | not significant |
All three significance tests agree, and the effective-window null holds under every one — the cross-market "anticipated, mostly-gone" story, not a causal index-demand effect.
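The sign-flip permutation row in the table above can be sketched as follows. This is a minimal illustration under stated assumptions: a two-sided test of mean CAR = 0, 5,000 random sign flips, and an add-one correction on the p-value. The function name, the seed and the toy CARs are hypothetical; the project's own implementation may differ in any of these choices.

```python
import random


def sign_flip_pvalue(cars, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation p-value for H0: mean CAR = 0."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(cars)
    observed = abs(sum(cars) / n)
    hits = 0
    for _ in range(n_perm):
        # Under H0 each CAR is symmetric around zero, so its sign is arbitrary.
        flipped = sum(c if rng.random() < 0.5 else -c for c in cars)
        if abs(flipped / n) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


toy_cars = [0.013 + 0.001 * i for i in range(20)]  # toy values, all positive
print(sign_flip_pvalue(toy_cars))
```

With this convention the smallest attainable p-value at 5,000 draws is 1/5001, roughly 0.0002, so very small p-values are bounded by the number of permutations rather than by the data.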
The whole project is navigable through one local Flask dashboard (http://localhost:5001) — literature, sample, figures and verdicts in a single workflow.
There is no public hosted demo — run it locally (below).
```bash
make sync                  # install pinned deps from uv.lock (reproducible)
index-inclusion-dashboard  # then open http://localhost:5001
make rebuild               # 10 steps: re-run the full offline pipeline (events → CMA → figures → report)
make verdicts              # print the 7-hypothesis verdict table in the terminal
make ci                    # lint + type-check + coverage gate + project health checks
```

Dashboard modes: `/` (overview), `/?mode=brief` (3-min read), `/?mode=full` (everything), `/paper/<id>` (single-paper reader + source PDF).
The research is ~11k lines; the rest is the infrastructure that makes it reproducible and auditable end-to-end — built to the standard I'd want a research codebase held to.
- Deterministic, offline pipeline. `index-inclusion-rebuild-all` recomputes every result from `data/` in ~3 min with no network calls; the frozen verdict baseline reproduces unchanged on re-run, and a `pap-diff` drift audit confirms all 7 hypotheses stay put.
- Automated quality gates. A custom `doctor` framework runs 30 health checks (artifact freshness, schema contracts, chart registry, cross-document consistency) and a `paper-integrity` gate cross-checks that the README/paper numbers actually match the committed CSVs; `index-inclusion-paper-skeleton` regenerates the paper skeleton straight from the frozen artifacts. `make ci` is green.
- Tested. ~1,190 unit + integration tests (event study, matching + covariate balance, robustness, pipeline `main()` integration, dashboard rendering), lint (ruff) and `mypy` clean.
- Honest seeds & snapshots. All randomness is seeded; verdict baselines are snapshotted so any drift in conclusions is visible over time.
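To make the idea of a numbers-match-the-CSVs gate concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the CSV column names, the regular expression and the function names are hypothetical, and the real `paper-integrity` gate and the real CSV schema are not shown in this README.

```python
import csv
import io
import re

# Hypothetical CSV contents; the real schema may differ.
CSV_TEXT = """hypothesis,p_value
H1,0.97
H4,0.60
"""

README_TEXT = "H1 is inconclusive (permutation p = 0.97); H4 regression p = 0.60."


def quoted_p_values(text):
    """Extract every 'p = <number>' literal quoted in a document."""
    return {float(m) for m in re.findall(r"p\s*=\s*([0-9]*\.?[0-9]+)", text)}


def csv_p_values(csv_text, column="p_value", ndigits=2):
    """Read p-values from the CSV, rounded to the precision used in prose."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {round(float(row[column]), ndigits) for row in rows}


def unmatched_claims(doc_text, csv_text):
    """Return quoted p-values that have no counterpart in the CSV."""
    return sorted(quoted_p_values(doc_text) - csv_p_values(csv_text))


stale = unmatched_claims(README_TEXT, CSV_TEXT)
print("stale numbers:", stale)
```

A gate like this would fail the build whenever the returned list is non-empty, which is what keeps prose from drifting away from the committed artifacts.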
Event study (market-adjusted + market-model AR, Patell Z, BMP t) · propensity-style matched controls with Stuart-2010 SMD balance · long-window retention · pseudo-event placebos · sign-flip permutation tests · event-clustered (CRV1) SE · post-hoc power analysis (MDE) · Bonferroni/BH multiple-testing correction.
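As an illustration of the covariate-balance check in that list, here is a minimal sketch of a standardized mean difference (SMD). It assumes the convention of dividing by the treated-group standard deviation; a pooled standard deviation is another common choice, and this README does not state which one the project uses. The function name, the 0.1 threshold and the toy covariate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_treated - mean_control) / sd_treated (assumed convention)."""
    sd_treated = sqrt(variance(treated))  # sample SD of the treated group
    if sd_treated == 0:
        raise ValueError("treated covariate has zero variance")
    return (mean(treated) - mean(control)) / sd_treated


# Toy covariate (e.g. log market cap) for treated firms and matched controls.
treated = [10.0, 11.0, 12.0, 13.0]
control = [10.5, 11.5, 12.0, 12.5]

smd = standardized_mean_difference(treated, control)
balanced = abs(smd) < 0.1  # assumed threshold; other cutoffs are also used
print(f"SMD = {smd:.4f}, balanced = {balanced}")
```

Because the SMD is scale-free, the same threshold can be applied to every matching covariate, which is what makes it convenient for a balance table.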
```text
src/index_inclusion_research/
  analysis/        event study, regressions, RDD, cross-market asymmetry, robustness, power
  pipeline/        sample construction, matching (+ covariate balance)
  outputs/         figure & table builders
  dashboard/ web/  Flask app + templates/static (the interactive front-end)
  doctor/          project-health check framework
data/              raw/ + processed/
results/           event_study/, regressions/, figures/, tables/, real_*/, literature/
docs/              literature maps, methodology, limitations, identification roadmap (some in Chinese)
tests/             ~1,190 unit + integration tests
```
Deeper write-ups (several in Chinese): research delivery package · paper outline · limitations · identification roadmap · CLI reference (43 console scripts).
A solo build that takes an established question in the index-inclusion literature and implements it end-to-end — data, event study, matched-control design, robustness, and an interactive research front-end — with the goal of getting the process right (reproducibility, honest inference, clean code) rather than forcing a headline result. Licensed MIT.

