Introducing Checkpointless and Elastic Training on Amazon SageMaker HyperPod | Amazon Web Services
Today, we’re announcing two new AI model training features in Amazon SageMaker HyperPod: checkpointless training, which reduces the need for traditional checkpoint-based recovery by enabling peer-to-peer state recovery, and elastic training, which automatically scales AI workloads based on resource availability.
Checkpointless training: Checkpointless training eliminates disruptive checkpoint restart …