Fine-tuning LLMs with Training Hub
TOC

- Background
- SFT vs OSFT
- Requirements
- Data Format
- Download Notebooks and Run Examples
  - Step 1 — Install Dependencies
  - Step 2 — Upload or Prepare Data
  - Step 3 — Open and Configure the Notebook
  - Step 4 — Execute Training
- Key Parameters
  - Common Parameters (SFT and OSFT)
  - OSFT-specific Parameters
- Multi-node Training

Background
`training_hub` is a Python library that provides a unified, high-level API for running Supervised Fine-Tuning (SFT) and Orthogonal Subspace Fine-Tuning (OSFT) on large language models. It abstracts away the complexity of distributed training configuration, memory management, and backend orchestration, letting you focus on experiment parameters.
Key benefits:
- Unified API: A single function call (`sft(...)` or `osft(...)`) handles single-GPU, multi-GPU, and multi-node training without changing your code.
- Automatic memory management: The `max_tokens_per_gpu` parameter caps GPU memory usage and automatically computes micro-batch size and gradient accumulation to maintain your target `effective_batch_size`.
- OSFT for continual learning: The `osft` function implements Nayak et al. (2025), arXiv:2504.07097, which restricts weight updates to orthogonal subspaces, preventing catastrophic forgetting without replay buffers or supplementary datasets.
- Production-ready: Built-in checkpointing, experiment tracking, and Liger kernel support for throughput efficiency.
SFT vs OSFT

- SFT (Supervised Fine-Tuning): standard fine-tuning on conversation data; by default only assistant responses contribute to the training loss. Use it when adapting a model to a new task or domain from scratch.
- OSFT (Orthogonal Subspace Fine-Tuning): restricts weight updates to subspaces orthogonal to the model's existing knowledge, so you can add new capabilities without catastrophic forgetting and without replay buffers or supplementary datasets. Use it for continual-learning scenarios where preserving prior behavior matters.
Requirements
- Alauda AI and Alauda AI Workbench must be installed in your cluster.
- A Workbench (Notebook) instance with:
- Access to install Python packages from the internet (or a configured internal PyPI mirror).
- GPU resources attached (at least one NVIDIA GPU).
- Sufficient shared storage for model checkpoints.
- A HuggingFace model (local path or model name resolvable from the instance).
- Training data in JSONL format (see Data Format below).
Data Format
Training data must be a JSON Lines (.jsonl) file where each line is a conversation:
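A single line might look like the following (the `messages`/`role`/`content` field names follow the common chat-dataset schema; verify against your training_hub version's expected format):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
```

Each line is one complete, self-contained JSON object; the file as a whole is not a JSON array.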
Supported role values: `system`, `user`, `assistant`, `pretraining`.
Masking behavior:
- SFT (default) — only assistant responses contribute to the training loss. Add `"unmask": true` to a sample to include all non-system content in the loss (pretraining style).
- OSFT — controlled via the `unmask_messages` parameter (`False` by default; set `True` for pretraining style).
Pre-processed datasets with `input_ids` and `labels` fields are also supported via `use_processed_dataset=True`.
Download Notebooks and Run Examples
Two comprehensive tutorial notebooks are provided. Download them to your Workbench instance and execute them cell by cell.
Step 1 — Install Dependencies
Open a terminal in your Workbench instance and install training-hub:
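For example (assuming the package is published on PyPI, or your internal mirror, under the name `training-hub`):

```shell
pip install training-hub
```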
Step 2 — Upload or Prepare Data
Place your .jsonl training file in a path accessible to the notebook, for example /data/train.jsonl.
Step 3 — Open and Configure the Notebook
Open the downloaded notebook in your Workbench instance. The key cells to configure are:
Select your model (both notebooks):
Bundled model presets cover Qwen 2.5 7B, Llama 3.1 8B, Phi 4 Mini, and generic 7B/small models.
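A model-selection cell might look like the sketch below. The preset keys and HuggingFace model IDs here are illustrative, not the notebooks' exact identifiers; substitute a local path if your model is already downloaded.

```python
# Map of illustrative preset names to HuggingFace model IDs.
# These are assumptions -- use the preset cell your notebook actually provides.
MODEL_PRESETS = {
    "qwen-2.5-7b": "Qwen/Qwen2.5-7B-Instruct",
    "llama-3.1-8b": "meta-llama/Llama-3.1-8B-Instruct",
    "phi-4-mini": "microsoft/Phi-4-mini-instruct",
}

# Either pick a preset or point at a local checkpoint directory.
model_path = MODEL_PRESETS["qwen-2.5-7b"]
```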
Set required paths (both notebooks):
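For example (paths are illustrative; point them at your own storage):

```python
# Illustrative paths -- adjust to your storage layout.
data_path = "/data/train.jsonl"        # JSONL training file from Step 2
ckpt_output_dir = "/data/checkpoints"  # use shared storage so checkpoints survive the session
```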
OSFT only — set the orthogonality ratio:
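A sketch of this cell, assuming the ratio is exposed as `unfreeze_rank_ratio` (verify the exact parameter name against your installed training_hub version):

```python
# Fraction of each weight matrix's spectrum left trainable (0.0-1.0).
# Lower values preserve more of the base model's prior knowledge;
# the 0.3 starting point is an assumption -- tune it for your task.
unfreeze_rank_ratio = 0.3
```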
Select distributed configuration:
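A single-node, multi-GPU configuration might look like this (the `nproc_per_node`/`nnodes`/`node_rank` names follow torchrun conventions; confirm them against your training_hub version):

```python
# One worker process per GPU on this node.
nproc_per_node = 2   # number of GPUs attached to the Workbench instance (illustrative)
nnodes = 1           # single-node training
node_rank = 0        # always 0 when nnodes == 1
```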
Step 4 — Execute Training
Run all cells in sequence. The final training cell calls either:
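A sketch of the two entry points, using only the parameters named in this guide; other arguments and their defaults may differ in your installed training_hub version:

```python
# Build keyword arguments shared by both entry points.
# Values are illustrative placeholders, not recommendations.
common_kwargs = {
    "model_path": "Qwen/Qwen2.5-7B-Instruct",  # or a local path
    "data_path": "/data/train.jsonl",
    "ckpt_output_dir": "/data/checkpoints",
    "effective_batch_size": 64,      # maintained via gradient accumulation
    "max_tokens_per_gpu": 8192,      # caps per-GPU memory usage
}

# OSFT additionally takes the orthogonality ratio and masking control.
osft_kwargs = {
    **common_kwargs,
    "unfreeze_rank_ratio": 0.3,  # assumed parameter name for the orthogonality ratio
    "unmask_messages": False,    # default SFT-style loss masking
}

# The final notebook cell then calls one of:
#   from training_hub import sft, osft
#   sft(**common_kwargs)
#   osft(**osft_kwargs)
```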
Checkpoints are written to `ckpt_output_dir` at the end of each epoch (configurable via `checkpoint_at_epoch`).
Key Parameters
Common Parameters (SFT and OSFT)

The parameters below are those used in this guide; consult the training_hub reference for the full, authoritative list.

| Parameter | Description |
|---|---|
| `model_path` | HuggingFace model name or local path of the model to fine-tune. |
| `data_path` | Path to the JSONL training file. |
| `ckpt_output_dir` | Directory where checkpoints are written. |
| `effective_batch_size` | Target effective batch size, maintained via gradient accumulation. |
| `max_tokens_per_gpu` | Caps GPU memory usage; micro-batch size and gradient accumulation are computed from it. |
| `checkpoint_at_epoch` | Controls checkpointing at the end of each epoch. |
| `use_processed_dataset` | Set `True` to train on pre-processed data with `input_ids` and `labels` fields. |
OSFT-specific Parameters

| Parameter | Description |
|---|---|
| `unfreeze_rank_ratio` | The orthogonality ratio: the fraction of each weight matrix left trainable (0.0–1.0). |
| `unmask_messages` | `False` by default; set `True` for pretraining-style loss over all non-system content. |
Multi-node Training
For multi-node jobs, run the notebook (or equivalent script) on every node simultaneously with matching `rdzv_id` and `rdzv_endpoint`, varying only `node_rank` per node:
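A sketch for a two-node job. The rendezvous keywords follow torchrun's elastic-launch conventions; confirm the exact argument names against your installed training_hub version, and note the host name and port below are placeholders:

```python
import os

# Run this on every node; only node_rank differs per node.
launch_kwargs = {
    "nnodes": 2,
    "nproc_per_node": 8,                               # GPUs per node (illustrative)
    "node_rank": int(os.environ.get("NODE_RANK", 0)),  # 0 on the first node, 1 on the second
    "rdzv_id": 42,                                     # identical on all nodes
    "rdzv_endpoint": "node0.example.com:29500",        # identical on all nodes; must be reachable
}

# The training call then passes these alongside the usual arguments, e.g.:
#   from training_hub import sft
#   sft(model_path=..., data_path=..., ckpt_output_dir=..., **launch_kwargs)
```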
All nodes must have network connectivity to the `rdzv_endpoint` before training begins.