Llama 3.1

The Llama 3.1 model family includes the following sizes:

  • 8B
  • 70B
  • 405B

According to Meta, Llama 3.1 405B is the first openly available model to rival top-tier AI models in general knowledge, steerability, mathematics, tool use, and multilingual translation.

The upgraded 8B and 70B models are multilingual, support an extended context length of 128K tokens, and offer improved tool use and stronger reasoning. These improvements suit Meta’s latest models to use cases such as long-form text summarization, multilingual conversational agents, and coding assistants.

Meta has also updated its license terms: developers may now use outputs from Llama models, including the 405B model, to improve other models.

Model Evaluations

Meta evaluated Llama 3.1’s performance on more than 150 benchmark datasets covering a wide range of languages. Extensive human evaluations were also conducted, comparing Llama 3.1 to competing models in real-world scenarios. Results show that the flagship 405B model is competitive with top foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet, across various tasks.

Additionally, Meta’s smaller models hold their own against both open and closed models with a comparable number of parameters.

Category     | Benchmark                    | Llama 3.1 8B | Llama 3 8B – April | Llama 3.1 70B | Llama 3 70B – April | Llama 3.1 405B
General      | MMLU                         | 73.0         | 65.3               | 86.0          | 80.9                | 88.6
             | MMLU PRO (5-shot, CoT)       | 48.3         | 45.5               | 66.4          | 63.4                | 73.3
             | IFEval                       | 80.4         | 76.8               | 87.5          | 82.9                | 88.6
Code         | HumanEval (0-shot)           | 72.6         | 60.4               | 80.5          | 81.7                | 89.0
             | MBPP EvalPlus (base, 0-shot) | 72.8         | 70.6               | 86.0          | 82.5                | 88.6
Math         | GSM8K (8-shot, CoT)          | 84.5         | 80.6               | 95.1          | 93.0                | 96.8
             | MATH (0-shot, CoT)           | 51.9         | 29.1               | 68.0          | 51.0                | 73.8
Reasoning    | ARC Challenge (0-shot)       | 83.4         | 82.4               | 94.8          | 94.4                | 96.9
             | GPQA (0-shot, CoT)           | 32.8         | 34.6               | 46.7          | 39.5                | 51.1
Tool Use     | API-Bank (0-shot)            | 82.6         | 48.3               | 90.0          | 85.1                | 92.3
             | BFCL                         | 76.1         | 60.3               | 84.8          | 83.0                | 88.5
             | Gorilla Benchmark API Bench  | 8.2          | 1.7                | 29.7          | 14.7                | 35.3
             | Nexus (0-shot)               | 38.5         | 18.1               | 56.7          | 47.8                | 58.7
Multilingual | Multilingual MGSM            | 68.9         | -                  | 86.9          | -                   | 91.6
Llama 3.1 Model Evaluations

This table summarizes the performance of the Llama 3.1 and Llama 3 models across various benchmarks and tasks.
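
To make the size of the changes easier to see, the short Python sketch below computes the point difference between the April Llama 3 8B release and Llama 3.1 8B for a handful of benchmarks. The scores are copied from the table; the names scores_8b and point_gains are illustrative only, and the selection of benchmarks is arbitrary.

```python
# Scores copied from the 8B columns of the table above.
# benchmark -> (Llama 3 8B, April release; Llama 3.1 8B)
scores_8b = {
    "MMLU": (65.3, 73.0),
    "HumanEval": (60.4, 72.6),
    "GSM8K": (80.6, 84.5),
    "MATH": (29.1, 51.9),
    "GPQA": (34.6, 32.8),
    "API-Bank": (48.3, 82.6),
}


def point_gains(scores):
    """Return the change in score (new minus old), rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


gains = point_gains(scores_8b)

# Print the benchmarks from largest gain to largest drop.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {gain:+.1f}")
```

In this subset the largest jumps are in tool use (API-Bank) and MATH, while GPQA is the one benchmark where the 8B score drops slightly, from 34.6 to 32.8.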

How to run Llama 3.1 using Ollama

Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes. Run Llama 3.1 using Ollama:

ollama run llama3.1
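
Once the model has been pulled, it can also be queried from a program rather than the interactive prompt. The following is a minimal Python sketch, assuming an Ollama server is running locally on its default port (11434) and exposing the /api/generate endpoint. The helper names build_generate_request and generate are made up for this example and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3.1"):
    """Build (without sending) a non-streaming request for /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server replies with a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]


# Example (needs a running Ollama server with llama3.1 already pulled):
# print(generate("Summarize the Llama 3.1 release in one sentence."))
```

Setting "stream" to False keeps the sketch short; a streaming client would instead read the reply line by line as the model generates it.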