Meta's PyTorch team has unveiled Monarch, an open-source framework designed to simplify distributed AI workflows across multiple GPUs and machines. It replaces the traditional multi-controller approach, in which multiple copies of the same script run independently across machines, with a single-controller model: one script coordinates computation across an entire cluster, reducing the complexity of large-scale training and reinforcement learning tasks without changing how developers write standard PyTorch code.
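To make the contrast concrete, here is a minimal sketch of the single-controller idea using only Python's standard multiprocessing module. This is not Monarch's actual API; the worker loop and the squaring task are hypothetical stand-ins for real GPU work. The point is the shape of the program: one driver script hands out instructions, rather than identical copies of the script running on every machine.

```python
# Illustrative sketch only -- NOT Monarch's API. One controller process
# dispatches work; workers wait for instructions instead of each running
# their own copy of the training script (the multi-controller pattern).
import multiprocessing as mp


def worker_loop(rank: int, tasks: mp.Queue, results: mp.Queue) -> None:
    """A worker idles until the controller sends it something to do."""
    while True:
        task = tasks.get()
        if task is None:  # shutdown signal from the controller
            break
        results.put((rank, task * task))  # stand-in for real GPU work


if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    workers = [
        mp.Process(target=worker_loop, args=(rank, tasks, results))
        for rank in range(4)
    ]
    for w in workers:
        w.start()

    # Single-controller model: this one script decides what every worker
    # does, instead of four identical scripts coordinating among themselves.
    for task in range(8):
        tasks.put(task)
    for _ in range(8):
        print(results.get())

    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
```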
Intended to bring the simplicity of PyTorch to entire clusters, Monarch pairs a Python-based front end, which integrates with existing code and libraries such as PyTorch, with a Rust-based back end that provides performance, scalability, and robustness, the team said.
For Developers:

* Never use pickle for untrusted data: This cannot be emphasized enough.
* Never assume checkpoint files are safe: Checkpoint deserialization is vulnerable to supply chain attacks.
* Always use `weights_only=True` when using PyTorch's load functions (see the sketch after this list).
* Restrict deserialization to trusted classes only.
* Implement defense in depth: Don't rely on a single security measure.
* Consider alternative formats: Safetensors, ONNX, or other secure serialization formats should all be considered.
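A minimal sketch of this advice in practice, assuming PyTorch 2.4+ and the `safetensors` package; the file names and the state dict are illustrative, not from the source:

```python
import torch
from safetensors.torch import load_file, save_file

# Hypothetical state dict standing in for a real model checkpoint.
state = {"weight": torch.randn(4, 4)}
torch.save(state, "model.pt")

# weights_only=True restricts unpickling to tensors and primitive
# containers, refusing arbitrary objects that could execute code on load.
safe_state = torch.load("model.pt", weights_only=True)

# If a checkpoint legitimately needs a custom class, allowlist it
# explicitly (PyTorch 2.4+) rather than disabling the protection:
# torch.serialization.add_safe_globals([MyConfig])  # MyConfig is hypothetical

# Safetensors stores raw tensor data with no executable code paths at all.
save_file(state, "model.safetensors")
loaded = load_file("model.safetensors")
```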