Petals

by BigScience

Run large AI models collaboratively in a distributed BitTorrent-style network

Tags: Open Source · Neural Network · API · Python · Linux

About

Petals is an experimental platform for running large language models in a distributed fashion: participants contribute GPU resources, and the model's layers are partitioned across many machines in a peer-to-peer network. This BitTorrent-inspired approach pools the GPU memory of volunteers across the internet, making it possible to run models that would normally require a single expensive server.

The system is particularly interesting for models that are too large to fit on a single consumer GPU. By loading and running different "shards" (groups of layers) on different machines simultaneously, Petals enables inference on large models like BLOOM 176B through a collaborative network. Users connect to the network, load their portion of the model, and participate in serving inference requests.
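The sharding idea above can be illustrated with a minimal, self-contained sketch. This is not the Petals API; the `Peer` and `shard_across_peers` names are hypothetical, and a toy affine-plus-ReLU function stands in for a real transformer block. The point is the routing pattern: contiguous groups of layers live on different peers, and a client chains the peers in layer order to complete one forward pass.

```python
# Hypothetical sketch (illustrative names, NOT the Petals API): a model's layers
# are split into contiguous "shards", each hosted by a different peer, and a
# client routes activations through the peers in layer order.

def make_layer(scale, bias):
    # Toy stand-in for a transformer block: elementwise affine + ReLU.
    def layer(x):
        return [max(scale * v + bias, 0.0) for v in x]
    return layer

class Peer:
    def __init__(self, peer_id, layers):
        self.peer_id = peer_id
        self.layers = layers  # the shard of layers this peer hosts

    def forward(self, x):
        # Run the hosted shard sequentially on the incoming activations.
        for layer in self.layers:
            x = layer(x)
        return x

def shard_across_peers(layers, n_peers):
    """Assign contiguous groups of layers to peers, one shard per peer."""
    per_peer = (len(layers) + n_peers - 1) // n_peers
    return [Peer(i, layers[i * per_peer:(i + 1) * per_peer])
            for i in range(n_peers)]

layers = [make_layer(0.5, 0.1) for _ in range(12)]
peers = shard_across_peers(layers, n_peers=3)

x = [1.0, 2.0, 3.0]
for peer in peers:          # the client chains peers in layer order
    x = peer.forward(x)

# The distributed pass matches running every layer on one machine:
local = [1.0, 2.0, 3.0]
for layer in layers:
    local = layer(local)
assert x == local
```

Because each peer only needs memory for its own shard, no single machine has to hold the full model, which is the property that lets a swarm of consumer GPUs serve a model like BLOOM 176B.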

Petals is primarily a research project exploring distributed inference for large models, demonstrating that AI compute doesn't have to be centralized in large data centers. While not production-ready for commercial applications, it provides a fascinating glimpse into alternative AI infrastructure models and has informed research on model parallelism and efficient distributed inference.

Product Features

- Distributed LLM inference across peer network
- Support for BLOOM 176B and other large models
- BitTorrent-style model layer distribution
- Python client for connecting to the network
- Collaborative GPU sharing from any machine
- Low barrier to join as a compute contributor
- Interactive generation with low per-token latency
- Support for fine-tuning distributed across the network
- Research platform for distributed AI experiments
- Open-source with academic paper documentation
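The "BitTorrent-style" property in the list above refers to redundancy: several peers can serve the same block range, so a client can fail over when one drops mid-request. The sketch below is a hypothetical illustration of that routing idea, not the real Petals protocol; `Peer`, `route`, and the block-addition stand-in for computation are all invented for this example.

```python
# Hypothetical sketch of BitTorrent-style redundancy (NOT the Petals protocol):
# several peers may serve the same range of model blocks; if one fails, the
# client retries the same range on another replica before giving up.
import random

class PeerDown(Exception):
    pass

class Peer:
    def __init__(self, name, block_range, alive=True):
        self.name, self.block_range, self.alive = name, block_range, alive

    def forward(self, x):
        if not self.alive:
            raise PeerDown(self.name)
        lo, hi = self.block_range
        return x + (hi - lo)  # toy stand-in: each block adds 1 to the value

def route(x, swarm, n_blocks):
    """Route block-by-block through the swarm, failing over between replicas."""
    done = 0
    while done < n_blocks:
        # All live candidates whose shard starts at the next unserved block.
        candidates = [p for p in swarm if p.block_range[0] == done]
        random.shuffle(candidates)
        for peer in candidates:
            try:
                x = peer.forward(x)
                done = peer.block_range[1]
                break
            except PeerDown:
                continue  # replica was down; try the next one
        else:
            raise RuntimeError(f"no live peer serves blocks from {done}")
    return x

swarm = [
    Peer("a", (0, 4)), Peer("b", (0, 4), alive=False),  # replicas of blocks 0-3
    Peer("c", (4, 8), alive=False), Peer("d", (4, 8)),  # replicas of blocks 4-7
]
print(route(0, swarm, n_blocks=8))  # 8: all eight blocks applied exactly once
```

Replication is what keeps inference alive in a volunteer network where peers join and leave at will; a request only fails when no replica of some block range is reachable.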

About the Publisher

Petals was developed by researchers at HSE University, Yandex, and other institutions building on the BigScience ecosystem. It emerged as an experimental project to explore decentralized approaches to running AI models that challenge the assumption that large models require expensive centralized infrastructure. The project has been covered in major AI research venues and continues as a research initiative into distributed AI inference.