Document Screenshot Embedding (DSE) - Qwen & Phi - Issue with training the model, says _save function doesn't support DSEModel #194

Description

@akashmadisetty

Hi,
I have tried both the Qwen and Phi versions of DSE, but I cannot train either model because saving the model fails. When trainer.train() reaches the checkpoint step, the trainer's _save function only supports EncoderModel, and changing that would mean changing the modeling file as well. I am running this without DeepSpeed. Is there a fix that can be applied?

I have attached the error I am getting.
Hoping this issue gets resolved soon!

2025-06-20 10:16:58,548 - INFO - Traceback (most recent call last):
2025-06-20 10:16:58,548 - INFO - File "/tevatron/examples/dse/train.py", line 91, in <module>
2025-06-20 10:16:58,548 - INFO - main()
2025-06-20 10:16:58,548 - INFO - File "/tevatron/examples/dse/train.py", line 84, in main
2025-06-20 10:16:58,548 - INFO - trainer.train()  # TODO: resume training
2025-06-20 10:16:58,549 - INFO - ^^^^^^^^^^^^^^^
2025-06-20 10:16:58,549 - INFO - File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2240, in train
2025-06-20 10:16:58,549 - INFO - return inner_training_loop(
2025-06-20 10:16:58,549 - INFO - ^^^^^^^^^^^^^^^^^^^^
2025-06-20 10:16:58,549 - INFO - File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2622, in _inner_training_loop
2025-06-20 10:16:58,549 - INFO - self._maybe_log_save_evaluate(
2025-06-20 10:16:58,549 - INFO - File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3102, in _maybe_log_save_evaluate
2025-06-20 10:16:58,549 - INFO - self._save_checkpoint(model, trial)
2025-06-20 10:16:58,549 - INFO - File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3199, in _save_checkpoint
2025-06-20 10:16:58,549 - INFO - self.save_model(output_dir, _internal_call=True)
2025-06-20 10:16:58,549 - INFO - File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3911, in save_model
2025-06-20 10:16:58,549 - INFO - self._save(output_dir)
2025-06-20 10:16:58,549 - INFO - File "/tevatron/src/tevatron/retriever/trainer.py", line 30, in _save
2025-06-20 10:16:58,549 - INFO - raise ValueError(f"Unsupported model class {self.model}")
2025-06-20 10:16:58,549 - INFO - ValueError: Unsupported model class DSEModel(
2025-06-20 10:16:58,549 - INFO - (encoder): PeftModel(
2025-06-20 10:16:58,549 - INFO - (base_model): LoraModel(
2025-06-20 10:16:58,549 - INFO - (model): Phi3VForCausalLM(
2025-06-20 10:16:58,549 - INFO - (model): Phi3VModel(
2025-06-20 10:16:58,549 - INFO - (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
2025-06-20 10:16:58,549 - INFO - (embed_dropout): Dropout(p=0.0, inplace=False)
2025-06-20 10:16:58,549 - INFO - (vision_embed_tokens): Phi3ImageEmbedding(
2025-06-20 10:16:58,549 - INFO - (drop): Dropout(p=0.0, inplace=False)
2025-06-20 10:16:58,549 - INFO - (wte): Embedding(32064, 3072, padding_idx=32000)
2025-06-20 10:16:58,549 - INFO - (img_processor): CLIPVisionModel(
2025-06-20 10:16:58,549 - INFO - (vision_model): CLIPVisionTransformer(
2025-06-20 10:16:58,549 - INFO - (embeddings): CLIPVisionEmbeddings(
2025-06-20 10:16:58,549 - INFO - (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
2025-06-20 10:16:58,549 - INFO - (position_embedding): Embedding(577, 1024)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
2025-06-20 10:16:58,550 - INFO - (encoder): CLIPEncoder(
2025-06-20 10:16:58,550 - INFO - (layers): ModuleList(
2025-06-20 10:16:58,550 - INFO - (0-23): 24 x CLIPEncoderLayer(
2025-06-20 10:16:58,550 - INFO - (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
2025-06-20 10:16:58,550 - INFO - (mlp): CLIPMLP(
2025-06-20 10:16:58,550 - INFO - (activation_fn): QuickGELUActivation()
2025-06-20 10:16:58,550 - INFO - (fc1): Linear(in_features=1024, out_features=4096, bias=True)
2025-06-20 10:16:58,550 - INFO - (fc2): Linear(in_features=4096, out_features=1024, bias=True)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
2025-06-20 10:16:58,550 - INFO - (self_attn): CLIPAttentionFA2(
2025-06-20 10:16:58,550 - INFO - (k_proj): lora.Linear(
2025-06-20 10:16:58,550 - INFO - (base_layer): Linear(in_features=1024, out_features=1024, bias=True)
2025-06-20 10:16:58,550 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=1024, out_features=16, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=16, out_features=1024, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (v_proj): lora.Linear(
2025-06-20 10:16:58,550 - INFO - (base_layer): Linear(in_features=1024, out_features=1024, bias=True)
2025-06-20 10:16:58,550 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=1024, out_features=16, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=16, out_features=1024, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (q_proj): lora.Linear(
2025-06-20 10:16:58,550 - INFO - (base_layer): Linear(in_features=1024, out_features=1024, bias=True)
2025-06-20 10:16:58,550 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=1024, out_features=16, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): Linear(in_features=16, out_features=1024, bias=False)
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,550 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,550 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,550 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,550 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (out_proj): lora.Linear(
2025-06-20 10:16:58,551 - INFO - (base_layer): Linear(in_features=1024, out_features=1024, bias=True)
2025-06-20 10:16:58,551 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=1024, out_features=16, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=16, out_features=1024, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (img_projection): Sequential(
2025-06-20 10:16:58,551 - INFO - (0): Linear(in_features=4096, out_features=3072, bias=True)
2025-06-20 10:16:58,551 - INFO - (1): GELU(approximate='none')
2025-06-20 10:16:58,551 - INFO - (2): Linear(in_features=3072, out_features=3072, bias=True)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (layers): ModuleList(
2025-06-20 10:16:58,551 - INFO - (0-31): 32 x Phi3DecoderLayer(
2025-06-20 10:16:58,551 - INFO - (self_attn): Phi3FlashAttention2(
2025-06-20 10:16:58,551 - INFO - (o_proj): lora.Linear(
2025-06-20 10:16:58,551 - INFO - (base_layer): Linear(in_features=3072, out_features=3072, bias=False)
2025-06-20 10:16:58,551 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=3072, out_features=16, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=16, out_features=3072, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (qkv_proj): lora.Linear(
2025-06-20 10:16:58,551 - INFO - (base_layer): Linear(in_features=3072, out_features=9216, bias=False)
2025-06-20 10:16:58,551 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=3072, out_features=16, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): Linear(in_features=16, out_features=9216, bias=False)
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,551 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,551 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (rotary_emb): Phi3SuScaledRotaryEmbedding()
2025-06-20 10:16:58,551 - INFO - )
2025-06-20 10:16:58,551 - INFO - (mlp): Phi3MLP(
2025-06-20 10:16:58,551 - INFO - (gate_up_proj): lora.Linear(
2025-06-20 10:16:58,551 - INFO - (base_layer): Linear(in_features=3072, out_features=16384, bias=False)
2025-06-20 10:16:58,551 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Linear(in_features=3072, out_features=16, bias=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Linear(in_features=16, out_features=16384, bias=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,552 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,552 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (down_proj): lora.Linear(
2025-06-20 10:16:58,552 - INFO - (base_layer): Linear(in_features=8192, out_features=3072, bias=False)
2025-06-20 10:16:58,552 - INFO - (lora_dropout): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Dropout(p=0.1, inplace=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_A): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Linear(in_features=8192, out_features=16, bias=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_B): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): Linear(in_features=16, out_features=3072, bias=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lora_embedding_A): ParameterDict()
2025-06-20 10:16:58,552 - INFO - (lora_embedding_B): ParameterDict()
2025-06-20 10:16:58,552 - INFO - (lora_magnitude_vector): ModuleDict(
2025-06-20 10:16:58,552 - INFO - (default): lora.dora.DoraLinearLayer()
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (activation_fn): SiLU()
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (input_layernorm): Phi3RMSNorm()
2025-06-20 10:16:58,552 - INFO - (resid_attn_dropout): Dropout(p=0.0, inplace=False)
2025-06-20 10:16:58,552 - INFO - (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
2025-06-20 10:16:58,552 - INFO - (post_attention_layernorm): Phi3RMSNorm()
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (norm): Phi3RMSNorm()
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - )
2025-06-20 10:16:58,552 - INFO - (cross_entropy): CrossEntropyLoss()
2025-06-20 10:16:58,552 - INFO - )
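
To make the failure mode concrete, here is a minimal, dependency-free sketch of what the traceback suggests is happening: a class check in the save path rejects DSEModel because it is not one of the supported classes. Everything in it is a hypothetical stand-in written for illustration (FakeEncoder, strict_save, relaxed_save, and the simplified EncoderModel and DSEModel classes); it is not the actual tevatron or transformers code, and the real _save may do more than this.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# tevatron or transformers classes.

class FakeEncoder:
    """Mimics the single method the save path relies on: save_pretrained()."""

    def __init__(self):
        self.saved_to = None

    def save_pretrained(self, output_dir):
        # A real encoder would write weights to disk; here we just record the path.
        self.saved_to = output_dir


class EncoderModel:
    """Simplified wrapper that holds an encoder."""

    def __init__(self, encoder):
        self.encoder = encoder


class DSEModel:
    """Also wraps an encoder, but is deliberately NOT a subclass of EncoderModel."""

    def __init__(self, encoder):
        self.encoder = encoder


def strict_save(model, output_dir, supported_classes=(EncoderModel,)):
    """Reject any model whose class is not listed, as the traceback suggests."""
    if not isinstance(model, supported_classes):
        raise ValueError(f"Unsupported model class {type(model).__name__}")
    model.encoder.save_pretrained(output_dir)


def relaxed_save(model, output_dir):
    """Assumed workaround: accept any wrapper whose .encoder can save itself."""
    encoder = getattr(model, "encoder", None)
    if encoder is None or not hasattr(encoder, "save_pretrained"):
        raise ValueError(f"Unsupported model class {type(model).__name__}")
    encoder.save_pretrained(output_dir)


if __name__ == "__main__":
    model = DSEModel(FakeEncoder())
    try:
        strict_save(model, "checkpoint-dir")
    except ValueError as err:
        print("strict check failed:", err)
    relaxed_save(model, "checkpoint-dir")
    print("relaxed check saved to:", model.encoder.saved_to)
```

In the real trainer, the analogous change would presumably be either adding DSEModel to the tuple of supported classes or having DSEModel inherit from EncoderModel. Whether that alone is enough depends on the installed tevatron version, so I would treat it as an assumption to verify rather than a confirmed fix.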
