
BenjaminBenHarosh/observational_incompleteness_luminosity_function


Observational Incompleteness in Galaxy Data

This repository documents a practical workflow for studying and modeling observational incompleteness in astronomy datasets.

It is based on coursework material and includes:

  • assignment brief and report,
  • notebooks for exploratory analysis and statistical modeling,
  • source data used in the analysis.

Repository Structure

  • docs/ - assignment brief and submitted report PDF files.
  • data/ - raw and derived datasets used by notebooks and scripts.
  • notebooks/ - original Jupyter notebooks for exploratory analysis.
  • src/incompleteness/ - reusable analysis package (models, fitting, plotting, CLI).
  • scripts/main.py - executable entry point with main() for full pipeline runs.
  • requirements.txt - Python dependencies for the notebooks and the pipeline.

Included Analysis Notebooks

  • notebooks/galaxy_colours_SDSS.ipynb - SDSS galaxy color analysis.
  • notebooks/IMF_distribution.ipynb - initial mass function distribution work.
  • notebooks/Introducing Markov Chain Monte Carlo.ipynb - MCMC examples and inference.
  • notebooks/StAN distributions.ipynb - distribution-focused assignment notebook.

Quick Start

  1. Create and activate a virtual environment:
    • python3 -m venv .venv
    • source .venv/bin/activate
  2. Install dependencies:
    • pip install -r requirements.txt
  3. Run the report-aligned pipeline starting from the raw data:
    • PYTHONPATH=src python -m incompleteness --data data/SDSS_16_r-band_galaxy_data.csv --output results --n-boot 100 --n-bins 10 --output-table-format both
    • alternative script wrapper: python scripts/main.py --data data/SDSS_16_r-band_galaxy_data.csv --output results --n-boot 100
  4. Optional custom bin edges:
    • PYTHONPATH=src python -m incompleteness --z-bin-edges "0.0,0.05,0.1,0.2,0.35,0.5"
  5. Optional rerun from pre-cleaned data:
    • PYTHONPATH=src python -m incompleteness --run-mode from-cleaned --cleaned-data results/intermediate/cleaned_data.parquet
  6. Optional incompleteness fit mode with free per-bin m0:
    • PYTHONPATH=src python -m incompleteness --incompleteness-mode fit-m0 --m0-min 15.0 --m0-max 21.0
  7. Optional progress control for long runs:
    • --progress shows progress bars (default)
    • --no-progress runs quietly
  8. Main outputs include:
    • eda/ plots before/after filtering
    • models/ parameter-sweep model plots
    • fits/ per-redshift-bin fit overlays
    • summaries/raw_summary.json and summaries/cleaned_summary.json
    • bin_fit_summary.{csv,parquet}, bin_confidence_summary.{csv,parquet}, bin_fit_quality_summary.{csv,parquet}, bin_absolute_summary.{csv,parquet}
    • fit_quality_aggregate.{json,csv,parquet} (model-comparison summary)
    • alpha_ci_combined.png and mstar_ci_combined.png (report-style 90% CI panels)
    • run_manifest.json
    • fit overlays use Sturges histogram bins by default for report-style visualization
  9. Run the test suite:
    • pytest
  10. Optional: launch Jupyter for notebook exploration:
    • jupyter notebook
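Step 4 passes custom redshift bin edges as one comma-separated string, and the fit overlays default to Sturges histogram bins. The sketch below shows one way to handle both. The helper names parse_z_bin_edges and sturges_edges are hypothetical and are not the package's actual API. NumPy's histogram_bin_edges implements Sturges' rule directly.

```python
import numpy as np


def parse_z_bin_edges(text: str) -> np.ndarray:
    """Hypothetical parser for a string like "0.0,0.05,0.1,0.2,0.35,0.5"."""
    # float() tolerates surrounding whitespace; empty tokens (e.g. a trailing comma) are skipped.
    edges = np.array([float(tok) for tok in text.split(",") if tok.strip()], dtype=float)
    if edges.size < 2:
        raise ValueError("need at least two bin edges")
    if np.any(np.diff(edges) <= 0):
        raise ValueError("bin edges must be strictly increasing")
    return edges


def sturges_edges(values) -> np.ndarray:
    """Histogram edges from Sturges' rule: about log2(n) + 1 equal-width bins."""
    return np.histogram_bin_edges(np.asarray(values, dtype=float), bins="sturges")


edges = parse_z_bin_edges("0.0,0.05,0.1,0.2,0.35,0.5")
print(len(edges) - 1, "redshift bins")
```

Validating that the edges increase strictly catches a typo in the CLI string before any fitting starts.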

Reproducibility Notes

  • Data files live in data/ and are loaded by both the notebooks and the pipeline.
  • Assignment context is stored in docs/.
  • If you move files after cloning, update the file paths used inside the notebooks.
  • Pipeline outputs are written to the --output directory (results/ in the commands above), including:
    • bin_fit_summary.csv
    • per-bin fit plots in results/fits/
    • redshift-evolution plots for alpha and m_star

Assignment-Aligned Functionality Implemented

  • Data filtering rules from the assignment:
    • remove objects with r_mag > 21
    • remove objects with redshift < 0.01
    • remove objects with large redshift errors (redshift_error > 0.001, configurable in code)
  • Schechter luminosity function in magnitude form.
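  • Incompleteness-corrected luminosity model: the Schechter function weighted by the detection probability p(m) = min(exp(-(m - m0)), 1), which equals 1 for m <= m0 and falls off exponentially at fainter magnitudes.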
  • Redshift splitting into 10 bins over [0, 0.5].
  • MLE fitting for each bin:
    • plain Schechter
    • incompleteness-corrected Schechter
  • Bootstrap confidence intervals for alpha and m_star.
  • Optional adaptive incompleteness (m0=17.6 baseline and m0=19.1 for z > 0.3) matching the report discussion.
  • Fit diagnostics include AIC/BIC/KS/RMSE and per-bin winner labels for plain vs incompleteness-corrected models.
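The filtering rules and the default redshift binning can be sketched with NumPy. The function names apply_quality_cuts and assign_redshift_bins are hypothetical. The inputs are assumed to be arrays for the r_mag, redshift, and redshift_error columns named above. Equal-width bins are also an assumption, since the pipeline accepts custom edges through --z-bin-edges.

```python
import numpy as np

R_MAG_LIMIT = 21.0  # drop r_mag > 21
Z_MIN = 0.01        # drop redshift < 0.01
Z_ERR_MAX = 0.001   # drop redshift_error > 0.001 (configurable in the package)


def apply_quality_cuts(r_mag, redshift, redshift_error, z_err_max=Z_ERR_MAX):
    """Return a boolean mask of rows that pass all three cuts (NaNs fail every cut)."""
    r_mag = np.asarray(r_mag, dtype=float)
    redshift = np.asarray(redshift, dtype=float)
    redshift_error = np.asarray(redshift_error, dtype=float)
    return (r_mag <= R_MAG_LIMIT) & (redshift >= Z_MIN) & (redshift_error <= z_err_max)


def assign_redshift_bins(redshift, n_bins=10, z_lo=0.0, z_hi=0.5):
    """Equal-width bins over [z_lo, z_hi]; returns (edges, index), with -1 outside the range."""
    redshift = np.asarray(redshift, dtype=float)
    edges = np.linspace(z_lo, z_hi, n_bins + 1)
    # Digitizing against the interior edges gives 0..n_bins-1 and keeps z == z_hi in the last bin.
    idx = np.digitize(redshift, edges[1:-1])
    idx = np.where((redshift < z_lo) | (redshift > z_hi), -1, idx)
    return edges, idx
```

Returning a mask rather than a filtered copy makes it easy to count how many objects each cut removes before and after filtering.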
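In magnitude form the Schechter function is phi(m) = 0.4 ln(10) phi* 10^(0.4 (alpha + 1) (m* - m)) exp(-10^(0.4 (m* - m))). The sketch below fits it to one redshift bin by maximum likelihood. It assumes the fit is done in apparent magnitude over each bin's observed magnitude range. It also assumes the model is normalized numerically on a grid and that the negative log-likelihood is minimized with SciPy's Nelder-Mead. Because phi* cancels in the normalization, only alpha and m_star are free. The names schechter_mag, detection_prob, neg_log_like, and fit_bin are hypothetical, and the package's own implementation may differ. The incompleteness-corrected variant multiplies the model by p(m) before normalizing, so both fits share one likelihood.

```python
import numpy as np
from scipy.optimize import minimize

LN10 = np.log(10.0)


def schechter_mag(m, alpha, m_star):
    """Schechter function in magnitude form with phi* dropped (it cancels on normalization)."""
    x = 10.0 ** (0.4 * (m_star - m))
    return 0.4 * LN10 * x ** (alpha + 1.0) * np.exp(-x)


def detection_prob(m, m0):
    """p(m) = min(exp(-(m - m0)), 1): complete brighter than m0, exponential fall-off fainter."""
    return np.minimum(np.exp(-(m - m0)), 1.0)


def _trapezoid(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def neg_log_like(params, mags, m_lo, m_hi, m0=None, n_grid=2000):
    """Negative log-likelihood of mags under the (optionally incompleteness-weighted) model."""
    alpha, m_star = params
    grid = np.linspace(m_lo, m_hi, n_grid)
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        model = schechter_mag(grid, alpha, m_star)
        dens = schechter_mag(mags, alpha, m_star)
        if m0 is not None:
            model = model * detection_prob(grid, m0)
            dens = dens * detection_prob(mags, m0)
        norm = _trapezoid(model, grid)
        # Reject parameter values where the model overflows or vanishes.
        if not np.isfinite(norm) or norm <= 0 or not np.all(np.isfinite(dens)) or np.any(dens <= 0):
            return np.inf
        return -float(np.sum(np.log(dens / norm)))


def fit_bin(mags, m0=None, x0=(-1.0, 17.0)):
    """MLE for (alpha, m_star) in one bin; pass m0 for the incompleteness-corrected model."""
    mags = np.asarray(mags, dtype=float)
    res = minimize(neg_log_like, x0, args=(mags, mags.min(), mags.max(), m0), method="Nelder-Mead")
    return res.x, res.fun
```

Returning infinity for unphysical parameters keeps Nelder-Mead away from regions where the power law or the exponential overflows.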
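For the bootstrap confidence intervals, the usual approach is to resample each bin with replacement, refit, and read off percentiles of the refitted parameters. The percentile method is an assumption here, as is the helper name bootstrap_ci. The level of 0.90 matches the 90% CI panels, and n_boot plays the role of --n-boot.

```python
import numpy as np


def bootstrap_ci(data, estimator, n_boot=100, level=0.90, seed=0):
    """Percentile bootstrap CI; estimator may return a scalar or an array such as (alpha, m_star)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Refit the estimator on n_boot resamples drawn with replacement.
    stats = np.array([estimator(rng.choice(data, size=data.size, replace=True)) for _ in range(n_boot)])
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(stats, [tail, 100.0 - tail], axis=0)
    return lo, hi


lo, hi = bootstrap_ci(np.arange(100.0), np.mean, n_boot=200)
print(f"90% CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

In the luminosity-function setting, the estimator would be a per-bin fit that returns (alpha, m_star), so lo and hi each come back as two-element arrays.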
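The information criteria follow from the minimized negative log-likelihood NLL: AIC = 2k + 2 NLL and BIC = k ln(n) + 2 NLL, where k is the number of free parameters and n is the number of galaxies in the bin. The sketch below also encodes the adaptive m0 choice. Several details are assumptions: labelling the per-bin winner by lower BIC (the package may use AIC or a combination), testing z > 0.3 against the bin center, and the names adaptive_m0 and winner_label.

```python
import math


def adaptive_m0(z_bin_center, baseline=17.6, high_z=19.1, z_split=0.3):
    """Hypothetical helper: m0 = 17.6 by default and 19.1 for bins with z > 0.3."""
    return high_z if z_bin_center > z_split else baseline


def aic(nll, k):
    return 2.0 * k + 2.0 * nll


def bic(nll, k, n):
    return k * math.log(n) + 2.0 * nll


def winner_label(nll_plain, nll_incomp, n, k_plain=2, k_incomp=2):
    """Label the model with the lower BIC; use k_incomp=3 when m0 is also fitted (fit-m0 mode)."""
    if bic(nll_incomp, k_incomp, n) < bic(nll_plain, k_plain, n):
        return "incompleteness"
    return "plain"
```

With m0 fixed, both models have two free parameters, so the criteria reduce to comparing likelihoods. The penalty only matters when m0 is fitted as well.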
