Two end-to-end fine-tuning notebooks for large language models using LoRA/QLoRA and the Hugging Face ecosystem.
Fine-tunes tiiuae/falcon-180B on the Databricks Dolly-15k instruction dataset using LoRA + DeepSpeed across 8 GPUs.
Key steps:
- Format Dolly-15k samples into instruction/context/answer prompts
- Tokenise with packed sequences for efficiency
- Launch distributed training via torchrun --nproc_per_node 8
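
The first two key steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the `### Instruction:` / `### Context:` / `### Answer:` template and the helper names `format_dolly_prompt` and `pack_sequences` are assumptions made for this example, while `instruction`, `context`, and `response` are the column names in Dolly-15k. Packing is shown on lists of token ids so the sketch runs without a tokenizer.

```python
from typing import Dict, Iterable, List


def format_dolly_prompt(sample: Dict[str, str]) -> str:
    """Build an instruction/context/answer prompt from one Dolly-15k record.

    The section markers are a hypothetical template, not the notebook's own.
    """
    parts = [f"### Instruction:\n{sample['instruction'].strip()}"]
    context = sample.get("context", "").strip()
    if context:  # skip the context section when the record has none
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Answer:\n{sample['response'].strip()}")
    return "\n\n".join(parts)


def pack_sequences(
    tokenised: Iterable[List[int]], block_size: int, eos_id: int
) -> List[List[int]]:
    """Concatenate token lists, separated by EOS, and cut fixed-size blocks.

    Tokens left over after the last full block are dropped.
    """
    stream: List[int] = []
    for ids in tokenised:
        stream.extend(ids)
        stream.append(eos_id)
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


if __name__ == "__main__":
    record = {"instruction": "Name a colour.", "context": "", "response": "Blue."}
    print(format_dolly_prompt(record))
    print(pack_sequences([[1, 2, 3], [4, 5]], block_size=4, eos_id=0))
```

Packing removes padding tokens from every batch, which is where the efficiency gain comes from. The cost is that several samples share one block, so the EOS separator is what marks the boundary between them.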
Hardware: 8× A100 80GB (or equivalent)
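
A rough back-of-the-envelope check of the hardware line, as a sketch under stated assumptions: the parameter count is the nominal 180B from the model name, weights are held in bf16 (2 bytes each), and they are sharded evenly across the GPUs. The source names DeepSpeed but not the sharding stage, so even sharding is an assumption. Activations, LoRA adapter weights, and optimizer state are ignored.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


FALCON_PARAMS = 180e9  # nominal count taken from the model name (assumption)
NUM_GPUS = 8
GPU_MEMORY_GB = 80

total_gb = weight_memory_gb(FALCON_PARAMS, bytes_per_param=2)  # bf16 weights
per_gpu_gb = total_gb / NUM_GPUS  # assumes even sharding across all GPUs
headroom_gb = GPU_MEMORY_GB - per_gpu_gb  # left for activations, adapters, etc.

print(f"weights: {total_gb:.0f} GB total, {per_gpu_gb:.0f} GB per GPU")
print(f"headroom per GPU: {headroom_gb:.0f} GB")
```

Under these assumptions the frozen weights alone take more than half of each 80 GB card, which is why a single GPU or a smaller node is not enough for this notebook.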
Fine-tunes a sharded LLaMA-2 7B model using Hugging Face AutoTrain Advanced with QLoRA (4-bit) in a single-command workflow.
autotrain llm --train \
--project_name "FineTuning Llama-2" \
--model TinyPixel/Llama-2-7B-bf16-sharded \
--data_path timdettmers/openassistant-guanaco \
--use_peft --use_int4 \
--learning_rate 2e-4 \
--num_train_epochs 3

Setup:
pip install -r requirements.txt
huggingface-cli login --token YOUR_HF_TOKEN

License: MIT