BLOOM
by BigScience
Multilingual open-source LLM trained on 46 languages by a global research collaboration
About
BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a 176-billion-parameter open-access multilingual language model trained by the BigScience Workshop — a collaborative research project involving over 1,000 researchers from 60+ countries and 250+ institutions. Released in 2022, it was at the time the largest open-source language model ever created.
What makes BLOOM unique is its multilingual focus: trained on 46 natural languages and 13 programming languages, it offers strong performance across languages that are severely underrepresented in most LLMs — including Arabic, Bengali, Swahili, Urdu, and numerous others. This makes it particularly valuable for applications serving non-English-speaking populations.
BLOOM's fully open training process — with published dataset, training code, and model weights — represents a landmark in AI transparency. The model demonstrated that the research community could collectively train frontier-class models outside of large technology companies, establishing a template for collaborative open AI development.
Product Features
- 176 billion parameter open-access model
- 46 natural languages including underrepresented ones
- 13 programming languages supported
- Open weights available on Hugging Face
- Text generation, summarization, translation, and more
- Instruction-tuned variants (BLOOMZ, mT0)
- Cross-lingual transfer capabilities
- Available via Hugging Face Inference API
- Fine-tunable on custom datasets
- Fully documented training process and data
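Because the weights are openly hosted on Hugging Face, the model can be loaded directly with the `transformers` library. Below is a minimal sketch using the real but much smaller `bigscience/bloom-560m` checkpoint — the full 176B model (`bigscience/bloom`) needs hundreds of gigabytes of memory and is usually run via the Inference API instead. The prompt and decoding settings are illustrative, not a recommendation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small BLOOM checkpoint; the full 176B "bigscience/bloom"
# requires multi-GPU hardware, so the 560M variant is used here.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

def generate(prompt: str, max_new_tokens: int = 30) -> str:
    """Greedy-decode a short continuation of `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# BLOOM was trained multilingually, so non-English prompts work natively.
print(generate("Ce matin, le soleil"))
```

The same two-line loading pattern applies to the instruction-tuned variants (e.g. `bigscience/bloomz`) by swapping the model identifier.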
About the Publisher
BigScience was a collaborative research workshop organized by Hugging Face, with compute support from GENCI, France's national high-performance computing agency, and IDRIS, the CNRS supercomputing center whose Jean Zay supercomputer trained the model. The project ran from May 2021 to May 2022 and involved over 1,000 volunteer researchers from academia and industry worldwide. Its goal was to demonstrate that open, collaborative, and transparent large-scale AI research was possible. BLOOM was the flagship output and has been downloaded millions of times, serving as a foundation for multilingual AI research globally.