
Open-weight models

We open-source both pre-trained and fine-tuned models. These models are not tuned for safety, because we want to empower users to test and refine moderation for their own use cases. For safer models, follow our guardrailing tutorial.

| Model | Open-weight | API | Description | Max Tokens | Endpoint |
|---|---|---|---|---|---|
| Mistral 7B | ✔️ (Apache 2.0) | ✔️ | The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of its release, it matched the capabilities of models up to 30B parameters. Learn more on our blog post. | 32k | open-mistral-7b (aka mistral-tiny-2312) |
| Mixtral 8x7B | ✔️ (Apache 2.0) | ✔️ | A sparse mixture-of-experts model. It leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated blog post. | 32k | open-mixtral-8x7b (aka mistral-small-2312) |
| Mixtral 8x22B | ✔️ (Apache 2.0) | ✔️ | A larger sparse mixture-of-experts model with a larger context window. It leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated blog post. | 64k | open-mixtral-8x22b |
| Codestral | ✔️ (MNPL) | ✔️ | A cutting-edge generative model designed and optimized specifically for code generation tasks, including fill-in-the-middle and code completion. | 32k | codestral-latest |
| Codestral Mamba | ✔️ | ✔️ | A Mamba 2 language model specialized in code generation. Learn more on our blog post. | 256k | codestral-mamba-latest |
| Mathstral | ✔️ | ✔️ | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our blog post. | 32k | NA |
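
The models with an API checkmark can also be queried through our API, and the Endpoint column gives the name to pass as the `model` parameter. Below is a minimal sketch, assuming the `mistralai` Python client (v1) and an API key in the `MISTRAL_API_KEY` environment variable:

```python
import os

from mistralai import Mistral  # assumes the v1 `mistralai` client: pip install mistralai

# The endpoint names from the table above ("open-mistral-7b", "open-mixtral-8x7b", ...)
# are passed as the `model` argument.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="open-mistral-7b",
    messages=[{"role": "user", "content": "In one sentence, what is a sparse mixture-of-experts model?"}],
)
print(response.choices[0].message.content)
```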

License

Each model's license is shown in the table above: Apache 2.0 for Mistral 7B and the Mixtral models, and the Mistral AI Non-Production License (MNPL) for Codestral.

Downloading

| Model | Download links | Features |
|---|---|---|
| Mistral-7B-v0.1 | Hugging Face, raw_weights (md5sum: 37dab53973db2d56b2da0a033a15307f) | 32k vocabulary size; Rope Theta = 1e4; with sliding window |
| Mistral-7B-Instruct-v0.2 | Hugging Face, raw_weights (md5sum: fbae55bc038f12f010b4251326e73d39) | 32k vocabulary size; Rope Theta = 1e6; no sliding window |
| Mistral-7B-v0.3 | Hugging Face, raw_weights (md5sum: 0663b293810d7571dad25dae2f2a5806) | Extended vocabulary to 32768 |
| Mistral-7B-Instruct-v0.3 | Hugging Face, raw_weights (md5sum: 80b71fcb6416085bcb4efad86dfb4d52) | Extended vocabulary to 32768; supports v3 Tokenizer; supports function calling |
| Mixtral-8x7B-v0.1 | Hugging Face | 32k vocabulary size; Rope Theta = 1e6 |
| Mixtral-8x7B-Instruct-v0.1 | Hugging Face, raw_weights (md5sum: 8e2d3930145dc43d3084396f49d38a3f) | 32k vocabulary size; Rope Theta = 1e6 |
| Mixtral-8x7B-v0.3 | Updated model coming soon! | Extended vocabulary to 32768; supports v3 Tokenizer |
| Mixtral-8x7B-Instruct-v0.3 | Updated model coming soon! | Extended vocabulary to 32768; supports v3 Tokenizer; supports function calling |
| Mixtral-8x22B-v0.1 | Hugging Face, raw_weights (md5sum: 0535902c85ddbb04d4bebbf4371c6341) | 32k vocabulary size |
| Mixtral-8x22B-Instruct-v0.1 / Mixtral-8x22B-Instruct-v0.3 | Hugging Face, raw_weights (md5sum: 471a02a6902706a2f1e44a693813855b) | 32768 vocabulary size |
| Mixtral-8x22B-v0.3 | raw_weights (md5sum: a2fa75117174f87d1197e3a4eb50371a) | 32768 vocabulary size; supports v3 Tokenizer |
| Codestral-22B-v0.1 | Hugging Face, raw_weights (md5sum: 1ea95d474a1d374b1d1b20a8e0159de3) | 32768 vocabulary size; supports v3 Tokenizer |
| Codestral-Mamba-7B-v0.1 | Hugging Face, raw_weights (md5sum: d3993e4024d1395910c55db0d11db163) | 32768 vocabulary size; supports v3 Tokenizer |
| Mathstral-7B-v0.1 | Hugging Face, raw_weights (md5sum: 5f05443e94489c261462794b1016f10b) | 32768 vocabulary size; supports v3 Tokenizer |
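
The Hugging Face repositories can also be fetched programmatically. Below is a minimal sketch, assuming the `huggingface_hub` package and the file layout of the Mistral-7B-Instruct-v0.3 repository (the listed file names are assumptions and may differ for other models); the `md5sum` helper can be used to check a downloaded raw_weights archive against the checksums in the table above:

```python
import hashlib
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Hypothetical local target directory; adjust as needed.
models_path = Path.home() / "mistral_models" / "7B-Instruct-v0.3"
models_path.mkdir(parents=True, exist_ok=True)

# Pull only the files needed for local inference. The file names below
# (params.json, consolidated.safetensors, tokenizer.model.v3) match the
# Mistral-7B-Instruct-v0.3 repository and may differ for other models.
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"],
    local_dir=models_path,
)


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 digest of a file, e.g. a downloaded raw_weights archive,
    for comparison against the checksums listed in the table above."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```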

Sizes

| Name | Number of parameters | Number of active parameters | Min. GPU RAM for inference (GB) |
|---|---|---|---|
| Mistral-7B-v0.3 | 7.3B | 7.3B | 16 |
| Mixtral-8x7B-v0.1 | 46.7B | 12.9B | 100 |
| Mixtral-8x22B-v0.3 | 140.6B | 39.1B | 300 |
| Codestral-22B-v0.1 | 22.2B | 22.2B | 60 |
| Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 |
| Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 |

How to run?

Check out mistral-inference, a Python package for running our models. You can install it with:

```bash
pip install mistral-inference
```

To learn more about how to use mistral-inference, take a look at the README and dive into the companion Colab notebook to get started.
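
For a quick preview, here is a minimal sketch of chat-style generation, assuming the weights and v3 tokenizer were downloaded to the directory used in the earlier download sketch and the chat-completion API described in the mistral-inference README (module paths have moved between releases, so adjust the imports to your installed version):

```python
from pathlib import Path

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

# Directory where the weights and tokenizer were downloaded (see the download sketch above).
models_path = Path.home() / "mistral_models" / "7B-Instruct-v0.3"

tokenizer = MistralTokenizer.from_file(str(models_path / "tokenizer.model.v3"))
model = Transformer.from_folder(str(models_path))

# Build a chat request and tokenize it with the v3 tokenizer.
request = ChatCompletionRequest(messages=[UserMessage(content="Write a haiku about open weights.")])
tokens = tokenizer.encode_chat_completion(request).tokens

# Greedy generation, capped at 64 new tokens.
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=64,
    temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```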
