Models Directory

Browse all 45 available language models and their capabilities

Free Models (17)

These models are available to all users without any subscription or pay-as-you-go charges.

liquid/lfm-7b

LFM-7B is a new best-in-class language model designed for exceptional chat capabilities, including in languages like Arabic and Japanese. Powered by the Liquid Foundation Model (LFM) architecture, it exhibits unique features like a low memory footprint and fast inference speed.

LFM-7B is the world’s best-in-class multilingual language model in English, Arabic, and Japanese.

See the launch announcement for benchmarks and more info.

Context: 32768 tokens

Max output: N/A tokens

liquid/lfm-3b

Liquid's LFM 3B delivers incredible performance for its size. It ranks first among 3B parameter transformers, hybrids, and RNN models. It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller.

LFM-3B is the ideal choice for mobile and other edge text-based applications.

See the launch announcement for benchmarks and more info.

Context: 32768 tokens

Max output: N/A tokens

mistralai/ministral-3b

Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.

Context: 32768 tokens

Max output: N/A tokens

mistralai/ministral-8b

Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.

Context: 128000 tokens

Max output: N/A tokens
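The entry above mentions sliding-window attention without giving details. As a rough illustration only, the sketch below builds a generic causal sliding-window mask in plain Python. The function name `sliding_window_mask` and the `window` parameter are hypothetical, and the sketch does not reproduce the interleaved pattern Ministral 8B actually uses, which this directory does not describe.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Generic causal sliding window (an illustrative assumption, not the
    model's real configuration): position i sees itself and the
    window - 1 positions before it.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count the (query, key) pairs that the mask allows."""
    return sum(sum(1 for allowed in row if allowed) for row in mask)


if __name__ == "__main__":
    mask = sliding_window_mask(seq_len=5, window=2)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print("attended pairs:", attended_pairs(mask))
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, which is the usual motivation for the memory savings the entry refers to.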

gryphe/mythomax-l2-13b

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Context: 4096 tokens

Max output: N/A tokens

amazon/nova-micro-v1

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.

Context: 128000 tokens

Max output: 5120 tokens

microsoft/phi-4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.

At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.

For more information, please see the Phi-4 Technical Report.

Context: 16384 tokens

Max output: N/A tokens

google/gemini-flash-1.5-8b

Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.

Click here to learn more about this model.

Usage of Gemini is subject to Google's Gemini Terms of Use.

Context: 1000000 tokens

Max output: 8192 tokens

mistralai/mistral-7b-instruct

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.

Context: 32768 tokens

Max output: 16384 tokens

meta-llama/llama-3.1-8b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 16384 tokens

Max output: 16384 tokens

meta-llama/llama-3-8b-instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 8192 tokens

Max output: 16384 tokens

mistralai/mistral-nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

It supports function calling and is released under the Apache 2.0 license.

Context: 131072 tokens

Max output: 16384 tokens

nousresearch/hermes-2-pro-llama-3-8b

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Context: 131072 tokens

Max output: 131072 tokens

openchat/openchat-7b

undi95/toppy-m-7b:nitro

amazon/nova-lite-v1

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.

With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.

Context: 300000 tokens

Max output: 5120 tokens

mistralai/pixtral-12b

The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.

Context: 32768 tokens

Max output: N/A tokens

Pro Models (28)

These models are available to Pro subscribers with unlimited usage included in the subscription.

thedrummer/unslopnemo-12b

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Context: 32768 tokens

Max output: N/A tokens

meta-llama/llama-3.1-70b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131072 tokens

Max output: 16384 tokens

nousresearch/hermes-3-llama-3.1-70b

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Hermes 3 70B is a competitive, if not superior, fine-tune of the Llama-3.1 70B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Context: 131072 tokens

Max output: 131072 tokens

deepseek/deepseek-chat

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

For model details, please visit the DeepSeek-V3 repo for more information, or see the launch announcement.

Context: 163840 tokens

Max output: N/A tokens

microsoft/phi-3.5-mini-128k-instruct

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.

The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context, and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with fewer than 13 billion parameters.

Context: 128000 tokens

Max output: N/A tokens

ai21/jamba-1-5-mini

mistralai/codestral-mamba

openai/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.

As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.

GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on common chat preference leaderboards.

Check out the launch announcement to learn more.

#multimodal

Context: 128000 tokens

Max output: 16384 tokens

anthropic/claude-3-haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.

See the launch announcement and benchmark results here

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

cognitivecomputations/dolphin-mixtral-8x22b

google/gemma-2-27b-it

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models.

Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Context: 8192 tokens

Max output: N/A tokens

mistralai/mixtral-8x7b-instruct

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.

Instruct model fine-tuned by Mistral. #moe

Context: 32768 tokens

Max output: 16384 tokens
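The entry above describes a sparse Mixture of Experts with 8 feed-forward experts. As a minimal sketch of what sparse routing means, the code below sends an input to only the highest-scoring experts and mixes their outputs. The names `moe_forward` and `router_logits`, the scalar "experts", and the default `top_k=2` are illustrative assumptions, not details taken from this directory or from Mistral's implementation.

```python
import math
from typing import Callable, List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    top_k: int = 2,  # assumption for illustration; not stated in the entry
) -> float:
    """Run only the top_k experts and combine them with renormalized weights."""
    # Rank expert indices by router score, highest first, and keep top_k.
    chosen = sorted(
        range(len(experts)), key=lambda i: router_logits[i], reverse=True
    )[:top_k]
    # Softmax over the selected scores only, so the weights sum to 1.
    weights = softmax([router_logits[i] for i in chosen])
    # Experts that were not selected are never evaluated (the "sparse" part).
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))


if __name__ == "__main__":
    # Eight toy experts: expert k multiplies its input by k.
    toy_experts = [lambda v, k=k: v * k for k in range(8)]
    logits = [0.0] * 8
    logits[3] = 5.0
    logits[6] = 5.0
    print(moe_forward(2.0, logits, toy_experts))
```

Because only a few experts run per input, the compute per token is much lower than the total parameter count suggests, which is why such models list a large total size while remaining practical to serve.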

mistralai/mistral-small-24b-instruct-2501

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.

The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.

Context: 32768 tokens

Max output: N/A tokens

gryphe/mythomist-7b

anthropic/claude-instant-1:beta

nvidia/llama-3.1-nemotron-70b-instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.

Usage of this model is subject to Meta's Acceptable Use Policy.

Context: 131072 tokens

Max output: 16384 tokens

deepseek/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.

It succeeds the DeepSeek V3 model and performs really well on a variety of tasks.

Context: 163840 tokens

Max output: N/A tokens

thedrummer/rocinante-12b

Rocinante 12B is designed for engaging storytelling and rich prose.

Early testers have reported:

Expanded vocabulary with unique and expressive word choices
Enhanced creativity for vivid narratives
Adventure-filled and captivating stories

Context: 32768 tokens

Max output: N/A tokens

eva-unit-01/eva-qwen-2.5-14b

mistralai/mistral-tiny

Note: This model is being deprecated. The recommended replacement is the newer Ministral 8B.

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Context: 32768 tokens

Max output: N/A tokens

mistralai/mistral-small

With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between Mistral NeMo 12B (mistralai/mistral-nemo) and Mistral Large 2 (mistralai/mistral-large), providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multilingual, supporting English, French, German, Italian, and Spanish.

Context: 32768 tokens

Max output: N/A tokens

qwen/qwen-turbo

Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.

Context: 1000000 tokens

Max output: 8192 tokens

qwen/qwen-plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.

Context: 131072 tokens

Max output: 8192 tokens

deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Other benchmark results include:

AIME 2024 pass@1: 72.6
MATH-500 pass@1: 94.3
CodeForces Rating: 1691

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context: 131072 tokens

Max output: 16384 tokens

deepseek/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

AIME 2024 pass@1: 70.0
MATH-500 pass@1: 94.5
CodeForces Rating: 1633

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Context: 131072 tokens

Max output: N/A tokens

qwen/qwen-2.5-coder-32b-instruct

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

Significant improvements in code generation, code reasoning, and code fixing.
A more comprehensive foundation for real-world applications such as code agents, enhancing coding capabilities while maintaining its strengths in mathematics and general competencies.

To read more about its evaluation results, check out Qwen 2.5 Coder's blog.

Context: 32768 tokens

Max output: 16384 tokens

mistralai/codestral-2501

Mistral's cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.

Learn more on their blog post: https://mistral.ai/news/codestral-2501/

Context: 262144 tokens

Max output: N/A tokens

meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Model Card

Context: 131072 tokens

Max output: 131072 tokens

Pro Metered Models (0)

These premium models are available on a pay-as-you-go basis with per-token pricing.
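Across all tiers, each entry in this directory closes with a `Context:` line and a `Max output:` line, where the latter may read `N/A`. The sketch below shows one way to read those two lines into a record and estimate how many tokens a reply could use. The names `ModelLimits`, `parse_limits`, and `output_budget` are hypothetical, and the budgeting rule (prompt and reply share the context window) is an assumption rather than documented behavior of this service.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLimits:
    model_id: str
    context_tokens: int
    max_output_tokens: Optional[int]  # None when the directory shows "N/A"


# Matches lines such as "Context: 32768 tokens" or "Max output: N/A tokens".
_FIELD = re.compile(r"^(Context|Max output):\s*(\S+)\s+tokens$")


def parse_limits(model_id: str, lines: List[str]) -> ModelLimits:
    """Extract the two limit fields from the text lines of one entry."""
    context: Optional[int] = None
    max_output: Optional[int] = None
    for line in lines:
        match = _FIELD.match(line.strip())
        if match is None:
            continue  # description text or blank line
        name, raw = match.groups()
        value = None if raw == "N/A" else int(raw)
        if name == "Context":
            context = value
        else:
            max_output = value
    if context is None:
        raise ValueError(f"no context length found for {model_id}")
    return ModelLimits(model_id, context, max_output)


def output_budget(limits: ModelLimits, prompt_tokens: int) -> int:
    """Estimate tokens left for the reply.

    Assumption: the prompt and the reply share one context window, so the
    reply is capped by whatever the prompt leaves free, and additionally
    by the listed max output when one is given.
    """
    remaining = max(limits.context_tokens - prompt_tokens, 0)
    if limits.max_output_tokens is None:
        return remaining
    return min(remaining, limits.max_output_tokens)


if __name__ == "__main__":
    entry = ["Context: 8192 tokens", "", "Max output: 16384 tokens"]
    limits = parse_limits("meta-llama/llama-3-8b-instruct", entry)
    print(limits, output_budget(limits, prompt_tokens=1000))
```

Under that assumption, an entry whose listed max output exceeds its context length (as in the usage example) is effectively bounded by the context window minus the prompt.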