
bug: TypeError in collator.py — _slice_mm_inputs_for_sample called with sample_idx= but param is batch_idx #10497

Description

@qizwiz

Bug

_slice_mm_inputs_for_sample defines batch_idx as its 4th positional parameter, but the call at line 263 passes that argument by keyword as sample_idx=sample_idx. The function has no parameter named sample_idx, so the call raises a TypeError on any multimodal training run that reaches the num_sub_seqs <= 1 branch.

Affected file

src/llamafactory/data/collator.py, line 263

Root cause

# Function definition (line 43):
def _slice_mm_inputs_for_sample(
    mm_inputs: dict[str, Any],
    batch_imglens: list[int],
    batch_vidlens: list[int],
    batch_idx: int,          # <-- parameter name is batch_idx
    ...
) -> dict[str, Any]:

# Buggy call (line 263) — wrong keyword name:
mm_inputs_for_sample = _slice_mm_inputs_for_sample(
    mm_inputs, batch_imglens, batch_vidlens, sample_idx=sample_idx  # ← TypeError here
)

# Correct call (line 281) — passes positionally:
mm_inputs_for_subseq = _slice_mm_inputs_for_sample(
    mm_inputs, batch_imglens, batch_vidlens, sample_idx, ...  # ← correct
)

Fix

Change the argument line of the call that starts at line 263 (that is, line 264) from:

mm_inputs, batch_imglens, batch_vidlens, sample_idx=sample_idx

to either:

mm_inputs, batch_imglens, batch_vidlens, batch_idx=sample_idx

or positionally (matching the line 281 pattern):

mm_inputs, batch_imglens, batch_vidlens, sample_idx
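
As a quick sanity check that the two proposed forms are interchangeable, here is a small sketch. The function is again a hypothetical stand-in that only reproduces the parameter names from the issue; the body and the argument values are assumptions.

```python
from typing import Any


def _slice_mm_inputs_for_sample(
    mm_inputs: dict[str, Any],
    batch_imglens: list[int],
    batch_vidlens: list[int],
    batch_idx: int,
) -> dict[str, Any]:
    # Hypothetical stand-in body: report which index the function received.
    return {"batch_idx": batch_idx}


sample_idx = 3

# Option 1: keep the keyword form, but use the parameter's real name.
by_keyword = _slice_mm_inputs_for_sample({}, [1], [0], batch_idx=sample_idx)

# Option 2: pass the index positionally, as the line 281 call does.
by_position = _slice_mm_inputs_for_sample({}, [1], [0], sample_idx)

print(by_keyword == by_position)
```

Both forms bind sample_idx to batch_idx. The keyword form has the side benefit of failing loudly if the parameter is ever renamed again, while the positional form matches the existing call at line 281.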

Impact

Any multimodal training run that hits the num_sub_seqs <= 1 branch (the common non-packing case) will crash with:

TypeError: _slice_mm_inputs_for_sample() got an unexpected keyword argument 'sample_idx'

Detected by pact-tool, a static analysis tool for Python.
