Mistral - Ollama

Mistral is a 7B parameter model, distributed with the Apache license. It is available in both instruct (instruction following) and text completion.

The Mistral AI team has noted that Mistral 7B:

Outperforms Llama 2 13B on all benchmarks
Outperforms Llama 1 34B on many benchmarks
Approaches CodeLlama 7B performance on code, while remaining good at English tasks

Performance in details

We compared Mistral 7B to the Llama 2 family, and re-run all model evaluations ourselves for fair comparison.

The performance comparison between Mistral 7B and various Llama models was evaluated across multiple benchmarks using a standardized evaluation pipeline for accurate results. Mistral 7B significantly outperformed Llama 2 13B on all metrics and performed on par with the Llama 34B model (since Llama 2 34B was not released, the Llama 34B results were used). Additionally, Mistral 7B demonstrated superior performance in coding and reasoning tasks.

The benchmarks were grouped based on the following themes:

Commonsense Reasoning: 0-shot average from Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
World Knowledge: 5-shot average from NaturalQuestions and TriviaQA.
Reading Comprehension: 0-shot average from BoolQ and QuAC.
Math: Averaged results from 8-shot GSM8K (maj@8) and 4-shot MATH (maj@4).
Code: Averaged results from 0-shot Humaneval and 3-shot MBPP.
Aggregated Popular Results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGI Eval (English multiple-choice questions).

Run Mistral

ollama run mistral