This repository documents a practical workflow for studying and modeling observational incompleteness in astronomy datasets.
It is based on coursework material and includes:
- assignment brief and report,
- notebooks for exploratory analysis and statistical modeling,
- source data used in the analysis.

Repository layout:
- `docs/`: assignment brief and submitted report (PDF files).
- `data/`: raw and derived datasets used by the notebooks and scripts.
- `notebooks/`: original Jupyter notebooks for exploratory analysis.
- `src/incompleteness/`: reusable analysis package (models, fitting, plotting, CLI).
- `scripts/main.py`: executable entry point with `main()` for full pipeline runs.
- `requirements.txt`: Python dependencies for running the notebooks.

Notebooks:
- `notebooks/galaxy_colours_SDSS.ipynb`: SDSS galaxy color analysis.
- `notebooks/IMF_distribution.ipynb`: initial mass function (IMF) distribution work.
- `notebooks/Introducing Markov Chain Monte Carlo.ipynb`: MCMC examples and inference.
- `notebooks/StAN distributions.ipynb`: distribution-focused assignment notebook.

Setup and usage:
- Create and activate a virtual environment:
  - `python3 -m venv .venv`
  - `source .venv/bin/activate`
- Install dependencies:
  - `pip install -r requirements.txt`
- Run the report-aligned pipeline starting from the raw data:
  - `PYTHONPATH=src python -m incompleteness --data data/SDSS_16_r-band_galaxy_data.csv --output results --n-boot 100 --n-bins 10 --output-table-format both`
  - alternative script wrapper: `python scripts/main.py --data data/SDSS_16_r-band_galaxy_data.csv --output results --n-boot 100`
- Optional custom redshift bin edges:
  - `PYTHONPATH=src python -m incompleteness --z-bin-edges "0.0,0.05,0.1,0.2,0.35,0.5"`
- Optional rerun from pre-cleaned data:
  - `PYTHONPATH=src python -m incompleteness --run-mode from-cleaned --cleaned-data results/intermediate/cleaned_data.parquet`
- Optional incompleteness fit mode with a free `m0` per bin:
  - `PYTHONPATH=src python -m incompleteness --incompleteness-mode fit-m0 --m0-min 15.0 --m0-max 21.0`
- Optional progress control for long runs:
  - default, with progress bars: `--progress`
  - quiet mode: `--no-progress`
- Main outputs include:
  - `eda/`: plots before and after filtering
  - `models/`: parameter-sweep model plots
  - `fits/`: per-redshift-bin fit overlays
  - `summaries/raw_summary.json` and `summaries/cleaned_summary.json`
  - `bin_fit_summary.{csv,parquet}`, `bin_confidence_summary.{csv,parquet}`, `bin_fit_quality_summary.{csv,parquet}`, `bin_absolute_summary.{csv,parquet}`
  - `fit_quality_aggregate.{json,csv,parquet}` (model-comparison summary)
  - `alpha_ci_combined.png` and `mstar_ci_combined.png` (report-style 90% CI panels)
  - `run_manifest.json`
  - fit overlays use Sturges histogram bins by default for report-style visualization
- Run the test suite:
  - `pytest`
- Optional: launch Jupyter for notebook exploration:
  - `jupyter notebook`
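
The `--z-bin-edges` option takes a comma-separated list of redshift edges, while `--n-bins 10` splits `[0, 0.5]` into ten bins. The sketch below shows one way such edges can be parsed and used to assign each galaxy to a bin. The function names are hypothetical, and both the equal-width default and the half-open bin convention (last edge inclusive, as in `np.histogram`) are assumptions rather than a description of the package's code.

```python
import numpy as np


def parse_bin_edges(spec: str) -> np.ndarray:
    """Parse a comma-separated edge string such as "0.0,0.05,0.1" (hypothetical helper)."""
    edges = np.array([float(s) for s in spec.split(",")], dtype=float)
    if edges.size < 2 or np.any(np.diff(edges) <= 0):
        raise ValueError("need at least two strictly increasing bin edges")
    return edges


def assign_redshift_bins(z, edges: np.ndarray) -> np.ndarray:
    """Return a bin index per galaxy, or -1 when z lies outside [edges[0], edges[-1]]."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    n_bins = len(edges) - 1
    # edges[i] <= z < edges[i + 1] maps to bin i
    idx = np.searchsorted(edges, z, side="right") - 1
    # Include the upper edge in the last bin, matching np.histogram.
    idx[z == edges[-1]] = n_bins - 1
    # Flag galaxies outside the binned redshift range.
    idx[(z < edges[0]) | (z > edges[-1])] = -1
    return idx


# Assumed equal-width default: 10 bins over [0, 0.5].
default_edges = np.linspace(0.0, 0.5, 10 + 1)
custom_edges = parse_bin_edges("0.0,0.05,0.1,0.2,0.35,0.5")
print(assign_redshift_bins([0.02, 0.05, 0.3, 0.5, 0.7], custom_edges))
```

With the custom edges above, the five example redshifts fall into bins 0, 1, 3 and 4, and the last value (0.7) is flagged as out of range.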
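
The fit overlays use Sturges histogram bins, i.e. k = ceil(log2(n)) + 1 bins for n galaxies in a redshift bin. A minimal sketch of that rule is below. The helper names are hypothetical; NumPy also accepts `bins="sturges"` in `np.histogram` directly, and the explicit version here just makes the bin count visible.

```python
import math

import numpy as np


def sturges_bin_count(n: int) -> int:
    """Sturges' rule: ceil(log2(n)) + 1 histogram bins for n samples."""
    if n < 1:
        raise ValueError("need at least one sample")
    return math.ceil(math.log2(n)) + 1


def sturges_histogram(values):
    """Histogram counts and edges using a Sturges bin count (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    k = sturges_bin_count(values.size)
    counts, edges = np.histogram(values, bins=k)
    return counts, edges


rng = np.random.default_rng(0)
mags = rng.uniform(14.0, 21.0, size=1000)  # synthetic magnitudes, illustration only
counts, edges = sturges_histogram(mags)
print(sturges_bin_count(1000), len(edges))  # 11 bins, so 12 edges
```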

Notes:
- Data files are kept in `data/` and loaded by the notebooks.
- Assignment context is stored in `docs/`.
- If you move files after cloning the repository, check the file paths used in the notebooks.
- Pipeline outputs are written to `results/`, including:
  - `bin_fit_summary.csv`
  - per-bin fit plots in `results/fits/`
  - redshift-evolution plots for `alpha` and `m_star`

Data filtering rules from the assignment:
- remove galaxies with `r_mag > 21`
- remove galaxies with `redshift < 0.01`
- remove galaxies with large redshift errors (`redshift_error > 0.001`; the threshold is configurable in code)
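
The three cuts above translate directly into a boolean mask. The pandas sketch below assumes the CSV exposes columns named exactly `r_mag`, `redshift` and `redshift_error` (taken from the rules, not checked against the data file); `apply_assignment_cuts` is a hypothetical name, and the default thresholds mirror the listed values.

```python
import pandas as pd


def apply_assignment_cuts(
    df: pd.DataFrame,
    r_mag_max: float = 21.0,
    z_min: float = 0.01,
    z_err_max: float = 0.001,
) -> pd.DataFrame:
    """Keep rows that survive the assignment cuts (hypothetical helper).

    Assumes columns named r_mag, redshift and redshift_error.
    Rows with NaN in any of these columns are dropped, because NaN comparisons are False.
    """
    keep = (
        (df["r_mag"] <= r_mag_max)            # remove r_mag > 21
        & (df["redshift"] >= z_min)           # remove redshift < 0.01
        & (df["redshift_error"] <= z_err_max) # remove redshift_error > 0.001
    )
    return df.loc[keep].reset_index(drop=True)


# Usage (path from the README; column names are an assumption):
# cleaned = apply_assignment_cuts(pd.read_csv("data/SDSS_16_r-band_galaxy_data.csv"))
```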

Modeling and fitting methods:
- Schechter luminosity function in magnitude form.
- Incompleteness-corrected luminosity model with `p(m) = min(exp(-(m - m0)), 1)`.
- Redshift splitting into 10 bins over `[0, 0.5]`.
- MLE fitting for each bin:
  - plain Schechter
  - incompleteness-corrected Schechter
- Bootstrap confidence intervals for `alpha` and `m_star`.
- Optional adaptive incompleteness (`m0 = 17.6` as the baseline and `m0 = 19.1` for `z > 0.3`), matching the report discussion.
- Fit diagnostics include AIC, BIC, KS and RMSE, plus per-bin winner labels for the plain vs incompleteness-corrected models.
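
For reference, the two models can be written as follows, using the standard magnitude form of the Schechter function with the README's `alpha` and `m_star`. Setting p(m) = 1 recovers the plain model. Normalising over a fitted magnitude range [m_lo, m_hi], so that the amplitude phi* cancels, is an assumption about how a per-bin likelihood can be set up, not a statement of what the package does.

```latex
\phi(m) = 0.4\ln 10\,\phi^{*}\left[10^{0.4(m^{*}-m)}\right]^{\alpha+1}\exp\!\left[-10^{0.4(m^{*}-m)}\right]

p(m) = \min\!\left(e^{-(m-m_0)},\,1\right)

\ln\mathcal{L}(\alpha, m^{*}) = \sum_{i=1}^{N}\ln\frac{p(m_i)\,\phi(m_i)}{\int_{m_{\mathrm{lo}}}^{m_{\mathrm{hi}}} p(m)\,\phi(m)\,\mathrm{d}m}
```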
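
A minimal sketch of per-bin MLE for the plain and incompleteness-corrected Schechter models, assuming the likelihood is normalised numerically over the fitted magnitude range and minimised with SciPy's Nelder-Mead. The function names, the starting point and the choice of optimiser are hypothetical; the package may parameterise or optimise the fit differently. Magnitudes passed in must lie inside `[m_lo, m_hi]`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

LN10 = np.log(10.0)


def log_schechter_mag(m, alpha, m_star):
    """ln of the magnitude-form Schechter function without phi* (it cancels)."""
    ln_x = 0.4 * LN10 * (m_star - m)  # ln of 10**(0.4 * (m_star - m))
    return np.log(0.4 * LN10) + (alpha + 1.0) * ln_x - np.exp(ln_x)


def log_completeness(m, m0):
    """ln p(m) for p(m) = min(exp(-(m - m0)), 1)."""
    return np.minimum(-(m - m0), 0.0)


def neg_log_like(params, m, m_lo, m_hi, m0=None):
    """Negative log-likelihood of magnitudes m; m0=None gives the plain model."""
    alpha, m_star = params

    def log_density(mm):
        out = log_schechter_mag(mm, alpha, m_star)
        if m0 is not None:
            out = out + log_completeness(mm, m0)
        return out

    # Normalise the model to a probability density over [m_lo, m_hi].
    norm, _ = quad(lambda mm: np.exp(log_density(mm)), m_lo, m_hi)
    if not np.isfinite(norm) or norm <= 0.0:
        return np.inf
    return -(np.sum(log_density(m)) - m.size * np.log(norm))


def fit_bin(m, m_lo, m_hi, m0=None, start=(-1.0, 17.0)):
    """Fit (alpha, m_star) for one redshift bin; returns the parameters and minimum NLL."""
    m = np.asarray(m, dtype=float)
    res = minimize(
        neg_log_like,
        x0=np.asarray(start, dtype=float),
        args=(m, m_lo, m_hi, m0),
        method="Nelder-Mead",
    )
    return res.x, res.fun


# Usage with a bin's magnitudes `mags` (hypothetical variable):
# params_plain, nll_plain = fit_bin(mags, 14.0, 21.0)
# params_corr, nll_corr = fit_bin(mags, 14.0, 21.0, m0=17.6)
```

Working in log space keeps the bright-end exponential cutoff from underflowing inside the sum; only the normalising integral is evaluated on the linear scale.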
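
Bootstrap intervals for `alpha` and `m_star` can be obtained by resampling a bin's galaxies with replacement and refitting each resample. The sketch below uses the simple percentile method with a 90% level to match the report-style CI panels; `bootstrap_ci` is a hypothetical name, the percentile method is an assumption, and `n_boot=100` mirrors the `--n-boot 100` example. In practice `estimator` would be a per-bin fitting function returning `(alpha, m_star)`.

```python
import numpy as np


def bootstrap_ci(data, estimator, n_boot=100, level=0.90, seed=None):
    """Percentile bootstrap CI for each component returned by estimator(data).

    Returns (point_estimate, lower, upper) as 1-D arrays.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    point = np.atleast_1d(np.asarray(estimator(data), dtype=float))
    draws = np.empty((n_boot, point.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample galaxies with replacement
        draws[b] = np.atleast_1d(np.asarray(estimator(data[idx]), dtype=float))
    tail = 100.0 * (1.0 - level) / 2.0  # 5th and 95th percentiles for level=0.90
    lower, upper = np.percentile(draws, [tail, 100.0 - tail], axis=0)
    return point, lower, upper
```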
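
AIC and BIC follow from each model's minimised negative log-likelihood (NLL), k free parameters and n galaxies in the bin: AIC = 2k + 2 NLL and BIC = k ln n + 2 NLL, with the lower score preferred. The sketch below turns these into a per-bin winner label. The helper names are hypothetical, and the parameter counts (k = 2 for `alpha` and `m_star` with a fixed `m0`, k = 3 in `fit-m0` mode) are assumptions; KS and RMSE are not covered here.

```python
import math


def information_criteria(nll: float, k: int, n: int) -> tuple[float, float]:
    """Return (AIC, BIC) from a minimised negative log-likelihood."""
    aic = 2.0 * k + 2.0 * nll
    bic = k * math.log(n) + 2.0 * nll
    return aic, bic


def winner_label(nll_plain, nll_corrected, n, k_plain=2, k_corrected=2, criterion="aic"):
    """Label the preferred model for one bin; ties go to the simpler plain model."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    which = 0 if criterion == "aic" else 1
    score_plain = information_criteria(nll_plain, k_plain, n)[which]
    score_corr = information_criteria(nll_corrected, k_corrected, n)[which]
    return "plain" if score_plain <= score_corr else "incompleteness-corrected"
```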