verl is a flexible, efficient, and production-ready reinforcement learning (RL) training framework designed for post-training of large language models (LLMs). It was open-sourced by ByteDance's Volcano Engine team as the open-source implementation of the HybridFlow paper. For weight synchronization, we suggest using dtensor_weight_loader, which gathers the full model parameters layer by layer to reduce peak memory usage.
We already support the DTensor weight loader; users only need to implement the corresponding dtensor_weight_loader for weight synchronization between FSDP and vLLM. With hf_weight_loader, by contrast, users can directly use models supported by Hugging Face without extra loading code.
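To make the layer-by-layer idea concrete, here is a minimal schematic sketch (not verl's actual implementation). It models shards as plain Python lists and the inference engine as a callback; in verl, the shards would be DTensor shards gathered into a full tensor before being handed to vLLM. All names here (`gather_layer`, `load_into_engine`) are hypothetical, chosen only for illustration.

```python
# Schematic sketch: why gathering one layer at a time keeps peak memory
# low compared to materializing the whole model at once.

def gather_layer(shards):
    """Concatenate one layer's shards into the full parameter."""
    full = []
    for shard in shards:
        full.extend(shard)
    return full

def dtensor_weight_loader(sharded_model, load_into_engine):
    """Gather and hand off one layer at a time, then free it.

    sharded_model:    dict mapping parameter name -> list of shards
    load_into_engine: callback that copies the full weight into the
                      inference engine (stand-in for vLLM's weight loading)
    """
    peak = 0
    for name, shards in sharded_model.items():
        full = gather_layer(shards)   # only ONE full layer exists at a time
        peak = max(peak, len(full))
        load_into_engine(name, full)
        del full                      # release before gathering the next layer
    return peak  # peak extra memory is one layer's size, not the model's

# Usage: two layers, each split into two shards.
model = {
    "layer0.weight": [[1.0, 2.0], [3.0, 4.0]],
    "layer1.weight": [[5.0], [6.0]],
}
loaded = {}
peak = dtensor_weight_loader(model, lambda n, w: loaded.__setitem__(n, w))
# peak is 4 (the largest single layer), even though the model holds 6 values.
```

The design point is that the transient buffer is bounded by the largest single layer rather than the full parameter count, which is what makes this approach preferable to gathering the entire state dict before loading.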