Bug
_slice_mm_inputs_for_sample is defined with batch_idx as its 4th positional parameter, but the call at line 263 passes the index as sample_idx=sample_idx, a keyword name the function does not accept. This raises a TypeError on any multimodal training run where num_sub_seqs <= 1.
Affected file
src/llamafactory/data/collator.py, line 263
Root cause
# Function definition (line 43):
def _slice_mm_inputs_for_sample(
    mm_inputs: dict[str, Any],
    batch_imglens: list[int],
    batch_vidlens: list[int],
    batch_idx: int,  # <-- parameter name is batch_idx
    ...
) -> dict[str, Any]:

# Buggy call (line 263), wrong keyword name:
mm_inputs_for_sample = _slice_mm_inputs_for_sample(
    mm_inputs, batch_imglens, batch_vidlens, sample_idx=sample_idx  # ← TypeError here
)

# Correct call (line 281), passed positionally:
mm_inputs_for_subseq = _slice_mm_inputs_for_sample(
    mm_inputs, batch_imglens, batch_vidlens, sample_idx, ...  # ← correct
)
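The failure can be reproduced in isolation. The sketch below is not the project's code: slice_for_sample is a hypothetical stand-in that copies only the four parameter names shown above and omits the elided ones, so it illustrates the keyword mismatch rather than the real slicing logic.

```python
from typing import Any


def slice_for_sample(
    mm_inputs: dict[str, Any],
    batch_imglens: list[int],
    batch_vidlens: list[int],
    batch_idx: int,  # same parameter name as in the real definition
) -> dict[str, Any]:
    """Hypothetical stand-in: just echoes the index it received."""
    return {"batch_idx": batch_idx}


def call_with_wrong_keyword() -> str:
    """Mimic the buggy call and return the resulting error message."""
    try:
        slice_for_sample({}, [1], [0], sample_idx=0)  # keyword does not exist
    except TypeError as exc:
        return str(exc)
    return "no error"


error_message = call_with_wrong_keyword()

# Both corrected call styles reach the same parameter.
positional_result = slice_for_sample({}, [1], [0], 0)
keyword_result = slice_for_sample({}, [1], [0], batch_idx=0)
```

Python rejects the unknown keyword while binding arguments, before the function body runs, which is why the crash occurs regardless of the batch contents.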
Fix
Change line 264 (the argument line of the call that starts on line 263) from:
mm_inputs, batch_imglens, batch_vidlens, sample_idx=sample_idx
to either:
mm_inputs, batch_imglens, batch_vidlens, batch_idx=sample_idx
or positionally (matching the line 281 pattern):
mm_inputs, batch_imglens, batch_vidlens, sample_idx
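One possible way to guard against a regression is to check that a set of call arguments binds to the function's signature without executing it. The sketch below uses inspect.signature(...).bind from the standard library. slice_stub and binds are hypothetical names introduced here for illustration, and the stub lists only the four parameters shown in this report.

```python
import inspect
from typing import Any, Callable


def slice_stub(mm_inputs, batch_imglens, batch_vidlens, batch_idx):
    """Hypothetical stub with the same leading parameter names."""
    return None


def binds(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Return True if the arguments match func's signature.

    Signature.bind raises TypeError on a mismatch and never calls func.
    """
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


buggy_ok = binds(slice_stub, {}, [], [], sample_idx=0)       # original call shape
keyword_ok = binds(slice_stub, {}, [], [], batch_idx=0)      # first proposed fix
positional_ok = binds(slice_stub, {}, [], [], 0)             # second proposed fix
```

In a real test the stub would be replaced by the imported function and the argument list extended with whatever the elided parameters require.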
Impact
Any multimodal training run that hits the num_sub_seqs <= 1 branch (the common non-packing case) will crash with:
TypeError: _slice_mm_inputs_for_sample() got an unexpected keyword argument 'sample_idx'
Detected by pact-tool, a static analysis tool for Python.