cannot reproduce siqa numbers

Hello @OmkarThawakar, I used the LLM360 Analysis repo to run the eval for the siqa task:

`python Analysis360/eval/harness/main.py --device cuda:0 --model=hf-causal-experimental --batch_size=auto:1 --model_args="pretrained=MBZUAI/MobiLlama-05B,trust_remote_code=True,dtype=bfloat16" --tasks=social_iqa --num_fewshot=0 --output_path=Analysis360-MobiLlama-05B.json`

It only gives an accuracy of 0.3327, which is close to random chance (about 0.333), since there are only three answer choices.

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|social_iqa|      0|none  |     0|acc   |0.3327|±  |0.0107|
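
To put a number on "close to random", here is a quick sanity check of my own (not part of the harness). It assumes uniform guessing over three choices as the chance baseline and uses the `acc` and `Stderr` values copied from the table above; the variable names are just for illustration.

```python
# Values copied from the results table above.
acc = 0.3327
stderr = 0.0107
num_choices = 3  # assumption: every example has exactly three answer choices

# Accuracy expected from uniform random guessing.
chance = 1.0 / num_choices

# Distance between the measured accuracy and chance, in standard errors.
z = (acc - chance) / stderr
within_one_stderr = abs(z) < 1.0

print(f"chance={chance:.4f} z={z:.2f} within_one_stderr={within_one_stderr}")
```

The measured accuracy sits well within one standard error of the chance baseline, so this run is statistically indistinguishable from guessing.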


Could you share how you ran the siqa evaluation? Thanks.

cannot reproduce siqa numbers #16
