Hi, thanks for your great work. I have a question.
Regarding the training setup of 40 A100 GPUs for 10,000 iterations, could you share the estimated runtime for a full experiment? I may have missed it, but I didn't see the training duration mentioned in the paper. This information is important for our follow-up research.