LightningLM

Reference training pipeline for the LightningLM 0.1V model family. One architecture, four growth stages: a 2B dense seed grown into a 5B MoE, then a 9B MoE, and finally a 120B sparse mixture-of-experts trained with TurboQuant-PreTraining (TQP) on a single eight-GPU node.

The 120B model is publicly released on Hugging Face: LightningLM-0.1V-120B-MoE.

Paper

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling. Rohan Shravan. arXiv preprint arXiv:TBD, 2026.

The paper is a systems and experience report describing the full training pipeline this repository implements. It documents the three disciplines the work is organized around — reversibility, state-preserving growth, and single-node economics — and the failure modes the recipe is shaped to avoid.

Released model

| Stage | Parameters (stored / active) | Checkpoint |
|---|---|---|
| 120B sparse MoE | 118.67B / 5.93B (top-12 of 460 routed experts) | `LightningLM-0.1V-120B-MoE` |
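The stored and active counts in the table can be related by simple arithmetic. The sketch below rests on two assumptions that this README does not state: every parameter outside the routed experts (attention, embeddings, norms, router, any shared experts) is active for every token, and all 460 routed experts are the same size. Under those assumptions it solves two linear equations for the per-expert and always-active parameter counts; `split_params` is a hypothetical helper.

```python
# Hypothetical accounting model (assumption, not from the paper):
#   stored = shared + NUM_EXPERTS * per_expert
#   active = shared + TOP_K * per_expert
STORED_B = 118.67   # billions, from the table
ACTIVE_B = 5.93     # billions, from the table
NUM_EXPERTS = 460
TOP_K = 12

def split_params(stored, active, num_experts, top_k):
    """Solve the two equations above for (shared, per_expert)."""
    per_expert = (stored - active) / (num_experts - top_k)
    shared = active - top_k * per_expert
    return shared, per_expert

shared, per_expert = split_params(STORED_B, ACTIVE_B, NUM_EXPERTS, TOP_K)
print(f"per routed expert (summed over MoE layers): {per_expert:.3f}B")
print(f"always-active (non-expert) parameters:      {shared:.3f}B")
print(f"active fraction of stored parameters:       {ACTIVE_B / STORED_B:.1%}")
```

Under these assumptions each routed expert accounts for roughly 0.25B parameters summed across all MoE layers, about 2.9B parameters are always active, and the active set is about 5% of stored parameters. Shared experts or unequal expert sizes would shift this split.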

The 5B-MoE, 9B-MoE, and 2B-dense intermediate checkpoints from the same training lineage are also planned for public release.
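The table's "top-12 of 460 routed experts" means each token is dispatched to 12 experts chosen by a router. A minimal NumPy sketch of generic top-k routing follows. The gating convention (softmax over the selected logits only) and the `topk_route` helper are assumptions for illustration, not the LightningLM router, which may use a different normalization or add load-balancing terms.

```python
import numpy as np

def topk_route(logits, k):
    """Return (expert indices, gate weights) for each token.

    logits: (tokens, num_experts) router scores.
    Gates are a softmax over the k selected logits only (one common
    convention; others apply softmax over all experts before selecting).
    """
    # argpartition places the k largest logits in the first k slots (unordered).
    idx = np.argpartition(-logits, k - 1, axis=-1)[:, :k]
    top = np.take_along_axis(logits, idx, axis=-1)
    top = top - top.max(axis=-1, keepdims=True)  # numerical stability
    gates = np.exp(top)
    gates /= gates.sum(axis=-1, keepdims=True)
    return idx, gates

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 460))   # 4 tokens, 460 routed experts
idx, gates = topk_route(logits, k=12)
print(idx.shape, gates.shape)  # (4, 12) (4, 12)
```

Each token's output would then be the gate-weighted sum of its 12 selected experts' outputs, which is why only a small slice of the stored parameters is active per token.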

Quickstart

Install dependencies, run the health check, and launch the 2B seed stage:

bash scripts/setup_stable.sh
python3 scripts/doctor.py
NUM_GPUS=8 bash scripts/run_2b_stage.sh

Grow through the 5B and 9B stages, then launch the 120B TQP stage:

python3 -m lightninglm.growth.dense_to_moe \
  --src results/2b/checkpoint.pt \
  --dst results/5b/init_from_2b.pt \
  --strategy partition
NUM_GPUS=8 bash scripts/run_5b_stage.sh

python3 -m lightninglm.growth.depth_map \
  --src results/5b/checkpoint.pt \
  --dst results/9b/init_from_5b.pt \
  --mapping lightninglm_5b_to_9b
NUM_GPUS=8 bash scripts/run_9b_stage.sh

python3 scripts/build_120b_init.py \
  --src results/9b/checkpoint.pt \
  --dst results/120b/120b_init.pt \
  --config configs/train_120b_tqp.yaml \
  --ratio 0.5 --router_sigma 0.05 --seed 1337
NUM_GPUS=8 bash scripts/run_120b_tqp.sh
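The dense-to-MoE step above passes `--strategy partition`, which this README does not define. One common reading of partition-style growth is to split the dense FFN's hidden neurons into disjoint slices, one slice per expert, so that activating every expert with unit gate reproduces the dense output exactly. The NumPy sketch below illustrates that idea with a ReLU FFN; `partition_ffn` and all shapes are hypothetical, and this is not the `lightninglm.growth.dense_to_moe` implementation.

```python
import numpy as np

def dense_ffn(x, w_in, w_out):
    # x: (tokens, d_model); w_in: (d_model, d_ff); w_out: (d_ff, d_model)
    return np.maximum(x @ w_in, 0.0) @ w_out

def partition_ffn(w_in, w_out, num_experts):
    """Split the d_ff hidden neurons into num_experts disjoint slices."""
    d_ff = w_in.shape[1]
    assert d_ff % num_experts == 0, "d_ff must divide evenly across experts"
    size = d_ff // num_experts
    return [
        (w_in[:, i * size:(i + 1) * size], w_out[i * size:(i + 1) * size, :])
        for i in range(num_experts)
    ]

rng = np.random.default_rng(0)
d_model, d_ff, n_exp = 16, 64, 8
x = rng.standard_normal((5, d_model))
w_in = rng.standard_normal((d_model, d_ff))
w_out = rng.standard_normal((d_ff, d_model))

experts = partition_ffn(w_in, w_out, n_exp)
# ReLU acts per neuron, so summing all expert outputs recovers the dense FFN.
moe_all = sum(dense_ffn(x, wi, wo) for wi, wo in experts)
print(np.allclose(moe_all, dense_ffn(x, w_in, w_out)))  # True
```

Once routing is sparse, only the selected slices contribute, so exact equivalence with the dense seed holds only when every expert is active with unit gate.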

The full stage-by-stage workflow lives in docs/cookbook.md.
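The 5B to 9B step uses `lightninglm.growth.depth_map` with a named `--mapping`, whose contents are not described in this README. A common function-preserving way to add depth, shown here only as an assumption, is to copy source layers into new positions and zero the output projection of each copy, so every new residual block starts as an identity map. The mapping format, `grow_depth`, and the toy residual block are all hypothetical.

```python
import copy
import numpy as np

def block(x, layer):
    # Toy residual block: x + relu(x @ w_in) @ w_out
    return x + np.maximum(x @ layer["w_in"], 0.0) @ layer["w_out"]

def forward(x, layers):
    for layer in layers:
        x = block(x, layer)
    return x

def grow_depth(src_layers, mapping):
    """mapping[t] = (source index, is_copy). Copies get a zeroed w_out."""
    grown = []
    for src_idx, is_copy in mapping:
        layer = copy.deepcopy(src_layers[src_idx])
        if is_copy:
            layer["w_out"] = np.zeros_like(layer["w_out"])  # identity at init
        grown.append(layer)
    return grown

rng = np.random.default_rng(0)
d, h = 8, 32
src = [{"w_in": rng.standard_normal((d, h)), "w_out": rng.standard_normal((h, d))}
       for _ in range(3)]
# Hypothetical 3 -> 5 layer mapping: duplicate layers 0 and 2.
mapping = [(0, False), (0, True), (1, False), (2, False), (2, True)]
grown = grow_depth(src, mapping)

x = rng.standard_normal((4, d))
print(len(grown), np.allclose(forward(x, grown), forward(x, src)))  # 5 True
```

Zeroing only the output projection still lets gradients reach that projection, so each copied block can begin learning on the first step while the grown model initially computes the same function as the source.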

Documentation

Training cookbook — end-to-end stage-by-stage walkthrough
Data pipeline — shard preparation, tokenization, manifest generation
Tokenizer pipeline — building or adapting the included tokenizer
Runtime hot-config — operator-side controls for router balance, AON continuation, and 120B bring-up
Apache 2.0 license

Repository layout

lightninglm/      model code, training loop, data loading, OPUS, TQP, kernels, growth utilities
configs/          per-stage training and curriculum YAML configs
deepspeed/        DeepSpeed ZeRO configs (zero-1 for 120B TQP, zero-3 for smaller stages)
scripts/          launch scripts, setup, doctor, data and tokenizer tooling, 120B init, tensor hashing
manifests/        curriculum shard manifests (D1-D4 bulk pools, AON guaranteed pools)
tokenizer/        BrahmicTokenizer-131K artifacts and byte-level analysis tools
docs/             training cookbook, data pipeline, tokenizer pipeline, runtime hot-config
data/             local mount points for shard directories (.gitkept placeholders only)
requirements/     pinned dependency manifests
aws/              AWS-specific helpers
experiments/      per-team experiment history (preserved from the project's development)
tests/            test suite for the release pipeline
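The `deepspeed/` directory holds ZeRO-1 configs for the 120B TQP stage and ZeRO-3 configs for the smaller stages. Those files are not reproduced here; the sketch below only shows where that choice lives in a standard DeepSpeed JSON config (the `zero_optimization.stage` key). The batch sizes, the bf16 setting, and the `make_ds_config` helper are placeholders, not the repository's values.

```python
import json

def make_ds_config(zero_stage, micro_batch=1, grad_accum=1):
    """Build a minimal DeepSpeed config dict (placeholder values)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        "bf16": {"enabled": True},
        # Stage 1 shards optimizer states across ranks; stage 3 additionally
        # shards gradients and parameters.
        "zero_optimization": {"stage": zero_stage},
    }

zero1 = make_ds_config(zero_stage=1)  # as described for the 120B TQP stage
zero3 = make_ds_config(zero_stage=3)  # as described for the smaller stages
print(json.dumps(zero1, indent=2))
```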

Companion papers

The LightningLM 0.1V family relies on two companion papers, both implemented in this repository:

BrahmicTokenizer-131K (./tokenizer/) - the 131K tokenizer covering English and the major Brahmic scripts. arXiv:2605.29379.
Kronecker Embeddings (./lightninglm/models/) - byte-level structured embeddings that replace the standard 537M-parameter embedding table with a 33.6M Kronecker construction. arXiv:2605.29459.
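The 537M figure is consistent with a dense table of 131,072 rows by 4,096 columns, assuming a vocabulary of exactly 131,072 and a hidden size of 4,096 (neither is stated in this README). The sketch below checks that arithmetic and shows a generic Kronecker-factored embedding, in which each token's vector is the Kronecker product of one row from each of two small tables. It is not the byte-level construction from the companion paper, and the factor shapes (`v1`, `v2`, `d1`, `d2`) are hypothetical choices that do not reproduce the 33.6M count.

```python
import numpy as np

VOCAB, D_MODEL = 131_072, 4_096  # assumed, not stated in this README
print(f"dense table: {VOCAB * D_MODEL:,} parameters")  # 536,870,912

class KroneckerEmbedding:
    """Embedding row for token t = kron(A[t // v2], B[t % v2]).

    Hypothetical factorization: VOCAB = v1 * v2 and D_MODEL = d1 * d2.
    """

    def __init__(self, v1, v2, d1, d2, seed=0):
        rng = np.random.default_rng(seed)
        self.v2 = v2
        self.A = rng.standard_normal((v1, d1)) * 0.02
        self.B = rng.standard_normal((v2, d2)) * 0.02

    def num_params(self):
        return self.A.size + self.B.size

    def __call__(self, token_ids):
        a = self.A[token_ids // self.v2]  # (n, d1)
        b = self.B[token_ids % self.v2]   # (n, d2)
        # Kronecker product of two vectors = their outer product, flattened.
        return np.einsum("ni,nj->nij", a, b).reshape(len(token_ids), -1)

emb = KroneckerEmbedding(v1=512, v2=256, d1=64, d2=64)
ids = np.array([0, 1, 70_000, VOCAB - 1])
vecs = emb(ids)
print(vecs.shape, f"{emb.num_params():,}")  # (4, 4096) 49,152
```

A single Kronecker term is very restrictive, so practical variants typically sum several terms or add extra components; the companion paper describes the construction actually used.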

Citation

@article{shravan2026reversible,
  title  = {Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling},
  author = {Shravan, Rohan},
  journal = {arXiv preprint arXiv:TBD},
  year   = {2026},
  url    = {https://github.com/The-School-of-AI/LLM}
}

License

Apache 2.0. See LICENSE.

Contact

Issues and pull requests: github.com/The-School-of-AI/LLM/issues
Email: rshravan@theschoolofai.in
