Pretraining and Finetuning MosaicML Models
We run large language model (LLM) pretraining and finetuning end-to-end on Paperspace by DigitalOcean's multinode machines with H100 GPUs. Four nodes of H100×8 GPUs provide up to 127 petaFLOPS of compute, enabling us to pretrain or finetune full-size state-of-the-art…
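The aggregate figure above can be sanity-checked with a quick back-of-the-envelope calculation. This sketch assumes the H100 SXM FP8 Tensor Core peak of roughly 3.96 petaFLOPS per GPU (a spec-sheet number, not measured training throughput):

```python
# Back-of-the-envelope check of the aggregate compute figure.
# Assumption: ~3.96 petaFLOPS per H100 (FP8 Tensor Core peak),
# which is theoretical hardware peak, not sustained throughput.
nodes = 4
gpus_per_node = 8
pflops_per_gpu = 3.96  # approximate H100 FP8 peak

total_gpus = nodes * gpus_per_node
total_pflops = total_gpus * pflops_per_gpu
print(f"{total_gpus} GPUs -> ~{total_pflops:.0f} petaFLOPS")
```

Real training runs sustain only a fraction of this peak (model FLOPS utilization for LLM training is typically well under 60%), so the number is best read as an upper bound on available compute.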