


Meta Llama 3.1 405B

One of the largest open-source models, Llama 3.1 405B features a 128K context length, support for 8 languages, and powerful tool-calling capabilities. It offers an efficient, customizable alternative to GPT-4o and is available as a production-grade API.

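The serverless models in this catalog are served behind an OpenAI-compatible chat completions endpoint. As a minimal sketch, assuming the standard https://api.fireworks.ai/inference/v1/chat/completions URL and an API key you supply, a request can be assembled like this:

```python
import json

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(api_key, model, messages, max_tokens=256):
    """Assemble the URL, headers, and JSON body for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    return API_URL, headers, body

url, headers, body = build_chat_request(
    api_key="YOUR_FIREWORKS_API_KEY",
    model="accounts/fireworks/models/llama-v3p1-405b-instruct",
    messages=[{"role": "user", "content": "Name three uses of a 128K context window."}],
)
# Send with any HTTP client, e.g.:
#   import requests
#   response = requests.post(url, headers=headers, data=json.dumps(body))
```

The `model` field takes the full `accounts/...` path shown next to each model below.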

Featured Models

These models are deployed at industry-leading speeds and optimized for production tasks.

Language Models

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes. The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks. The 405B model is the most capable in the Llama 3.1 family and is served in FP8, closely matching the reference implementation.

405B Instruct: accounts/fireworks/models/llama-v3p1-405b-instruct (Serverless, context 131,072 tokens)

70B Instruct: accounts/fireworks/models/llama-v3p1-70b-instruct (Serverless, context 131,072 tokens)

8B Instruct: accounts/fireworks/models/llama-v3p1-8b-instruct (Serverless, context 131,072 tokens)

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.
accounts/fireworks/models/llama-v3-70b-instruct (Serverless, context 8,192 tokens)

Mixtral MoE 8x22B Instruct v0.1 is the instruction-tuned version of Mixtral MoE 8x22B v0.1 and has the chat completions API enabled.
accounts/fireworks/models/mixtral-8x22b-instruct (Serverless, context 65,536 tokens)

Mixtral MoE 8x7B Instruct is the instruction-tuned version of Mixtral MoE 8x7B and has the chat completions API enabled.
accounts/fireworks/models/mixtral-8x7b-instruct (Serverless, context 32,768 tokens)

Fireworks' latest and most performant function-calling model. Firefunction-v2 is based on Llama 3 and trained to excel at function calling as well as chat and instruction following. See the blog post for details: https://fireworks.ai/blog/firefunction-v2-launch-post
accounts/fireworks/models/firefunction-v2 (Serverless, context unknown)
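Function-calling models like Firefunction-v2 are driven through the OpenAI-style `tools` convention: the request body lists callable tools as JSON Schema, and the model decides whether to invoke one. A minimal sketch of such a body — the `get_weather` tool and its schema are hypothetical, invented for illustration:

```python
def build_tool_call_request(model, user_message, tools):
    """Build a chat completion body that offers the model a set of callable tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }

# Hypothetical tool definition in JSON Schema form.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_call_request(
    model="accounts/fireworks/models/firefunction-v2",
    user_message="What's the weather in Tokyo right now?",
    tools=[weather_tool],
)
```

If the model elects to call a tool, the response carries the chosen tool name and arguments rather than plain text; your application executes the tool and sends the result back in a follow-up message.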

Fireworks' open-sourced LLaVA vision-language model, trained on OSS-LLM-generated instruction-following data. See more details in the blog post: https://fireworks.ai/blog/firellava-the-first-commercially-permissive-oss-llava-model
accounts/fireworks/models/firellava-13b (context 4,096 tokens)

A 75/25 merge of chronos-13b-v2 and Nous-Hermes-Llama2-13b. This offers the imaginative writing style of Chronos while retaining coherency and capability. Outputs are long and feature exceptional prose.
accounts/fireworks/models/chronos-hermes-13b-v2 (context 4,096 tokens)

CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models, available as a 7-billion-parameter pretrained variant that specializes in code completion and code generation tasks, a 7-billion-parameter instruction-tuned variant for code chat and instruction following, and a 2-billion-parameter pretrained variant for fast code completion.

2B pretrained: accounts/fireworks/models/codegemma-2b (context 8,192 tokens)

7B pretrained: accounts/fireworks/models/codegemma-7b (context 8,192 tokens)

Code Llama is a collection of pretrained and fine-tuned large language models ranging in scale from 7 billion to 70 billion parameters, specializing in using both code and natural language prompts to generate code and natural language about code. Deployed variants:

13B base: accounts/fireworks/models/code-llama-13b (context 32,768 tokens)

13B Instruct: accounts/fireworks/models/code-llama-13b-instruct (context 32,768 tokens)

13B Python: accounts/fireworks/models/code-llama-13b-python (context 32,768 tokens)

34B base: accounts/fireworks/models/code-llama-34b (context 32,768 tokens)

34B Instruct: accounts/fireworks/models/code-llama-34b-instruct (context 32,768 tokens)

34B Python: accounts/fireworks/models/code-llama-34b-python (context 32,768 tokens)

70B base: accounts/fireworks/models/code-llama-70b (context 16,384 tokens)

70B Instruct: accounts/fireworks/models/code-llama-70b-instruct (context 4,096 tokens)

70B Python: accounts/fireworks/models/code-llama-70b-python (context 4,096 tokens)
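The base and Python-specialist Code Llama variants are prompt-completion models rather than chat models, so they are typically driven through a plain completions-style request. A minimal sketch, assuming an OpenAI-compatible completions body; the prompt is an illustrative fragment:

```python
def build_completion_request(model, prompt, max_tokens=128, temperature=0.2, stop=None):
    """Build a plain-text completion body; a low temperature suits code generation."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if stop:
        body["stop"] = stop
    return body

body = build_completion_request(
    model="accounts/fireworks/models/code-llama-13b-python",
    prompt='def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n',
    stop=["\ndef "],  # cut the completion off before the model starts a new function
)
```

The model continues the code from where the prompt ends, which is why the prompt deliberately stops mid-function.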

Image Models

All currently deployed image models.

Image generation model, produced by stability.ai.
accounts/fireworks/models/stable-diffusion-xl-1024-v1-0 (Serverless)

The most capable text-to-image model produced by stability.ai, with greatly improved performance in multi-subject prompts, image quality, and spelling abilities. The Stable Diffusion 3 API is provided by Stability and the model is powered by Fireworks. Unlike other models on the Fireworks playground, you'll need a Stability API key to use this model. To use the API directly, visit https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post
accounts/stability/models/sd3 (Serverless)

A 2-billion-parameter SD3 model with optimized performance that excels in areas where previous models struggled, such as photorealism and typography. The Stable Diffusion 3 API is provided by Stability and the model is powered by Fireworks. Unlike other models on the Fireworks playground, you'll need a Stability API key to use this model. To use the API directly, visit https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post
accounts/stability/models/sd3-medium (Serverless)

Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at playground.com.
accounts/fireworks/models/playground-v2-1024px-aesthetic (Serverless)

Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2.
accounts/fireworks/models/playground-v2-5-1024px-aesthetic (Serverless)

Image generation model. Distilled from Stable Diffusion XL 1.0 and 50% smaller.
accounts/fireworks/models/SSD-1B (Serverless)

Japanese Stable Diffusion XL (JSDXL) is a Japanese-specific SDXL model that is capable of taking prompts in Japanese and generating Japanese-style images.
accounts/fireworks/models/japanese-stable-diffusion-xl (Serverless)

A distilled, few-step version of Stable Diffusion 3, the newest image generation model from Stability AI, which equals or outperforms state-of-the-art text-to-image systems such as DALL-E 3 and Midjourney v6 in typography and prompt adherence, based on human preference evaluations. Stability AI has partnered with Fireworks AI, the fastest and most reliable API platform in the market, to deliver Stable Diffusion 3 and Stable Diffusion 3 Turbo. To use the API directly, visit https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post
accounts/stability/models/sd3-turbo (Serverless)
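Image models are invoked with a text prompt plus sampling parameters rather than chat messages. As a hedged sketch of a request for the SDXL model above — the endpoint path layout and the exact parameter names are assumptions for illustration; check the API reference before relying on them:

```python
def build_image_request(model, prompt, width=1024, height=1024, steps=30, seed=0):
    """Assemble an illustrative text-to-image request: endpoint path plus JSON body."""
    path = f"/inference/v1/image_generation/{model}"  # assumed path layout
    body = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,   # more denoising steps trades speed for quality
        "seed": seed,     # fixed seed makes generations reproducible
    }
    return path, body

path, body = build_image_request(
    model="accounts/fireworks/models/stable-diffusion-xl-1024-v1-0",
    prompt="a watercolor painting of a lighthouse at dawn",
)
```

Note that the `accounts/stability/...` SD3 models above are an exception: they require a Stability API key and follow Stability's own API, linked in their entries.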