ChatOutlines
This guide will help you get started with Outlines chat models. For detailed documentation of all ChatOutlines features and configurations, head to the API reference.
Outlines is a library for constrained language generation. It allows you to use large language models (LLMs) with various backends while applying constraints to the generated output.
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
ChatOutlines | langchain-community | ✅ | ❌ | ❌ |
Model features
Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
Setup
To access Outlines models, you'll need an internet connection to download the model weights from Hugging Face. Depending on the backend you choose, you also need to install that backend's dependencies (see the Outlines docs).
Credentials
There is no built-in auth mechanism for Outlines.
Installation
The LangChain Outlines integration lives in the langchain-community package and requires the outlines library:
%pip install -qU langchain-community outlines
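Because each backend pulls in different dependencies, it can help to check what is importable before instantiating a model. The following is a minimal sketch using only the standard library. The helper name `available_backends` and the backend-to-module mapping (`llama_cpp`, `vllm`, `mlx_lm`, `transformers`) are assumptions for illustration, not part of the ChatOutlines API, so verify them against the Outlines docs.

```python
from importlib.util import find_spec

# Assumed mapping from ChatOutlines backend names to the top-level module
# each one needs. These module names are illustrative guesses; check the
# Outlines docs for the authoritative requirements.
BACKEND_MODULES = {
    "llamacpp": "llama_cpp",
    "vllm": "vllm",
    "mlxlm": "mlx_lm",
    "transformers": "transformers",
}


def available_backends(mapping=BACKEND_MODULES):
    """Return the backend names whose assumed module can be imported."""
    return [
        name for name, module in mapping.items() if find_spec(module) is not None
    ]


print(available_backends())
```

An empty list suggests that no backend dependency is installed yet, in which case instantiating ChatOutlines would likely fail with an import error.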
Instantiation
Now we can instantiate our model object and generate chat completions. Only one of the following assignments is needed; pick the one that matches your backend:
from langchain_community.chat_models.outlines import ChatOutlines
# For llamacpp backend
model = ChatOutlines(model="TheBloke/phi-2-GGUF/phi-2.Q4_K_M.gguf", backend="llamacpp")
# For vllm backend (not available on Mac)
model = ChatOutlines(model="meta-llama/Llama-3.2-1B", backend="vllm")
# For mlxlm backend (only available on Mac)
model = ChatOutlines(model="mistralai/Ministral-8B-Instruct-2410", backend="mlxlm")
# For huggingface transformers backend
model = ChatOutlines(model="microsoft/phi-2") # defaults to transformers backend
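Since `vllm` is not available on Mac and `mlxlm` is only available on Mac, the backend can be chosen from the current platform. This is a small sketch under that assumption; `pick_backend` is a hypothetical helper, not part of ChatOutlines, and falling back to `transformers` elsewhere is a design choice made here because it is the default backend.

```python
import platform


def pick_backend(system=None):
    """Pick a backend name from the platform restrictions noted above.

    Hypothetical helper: `mlxlm` is Mac-only and `vllm` is unavailable on Mac,
    so use `mlxlm` on macOS and the default `transformers` backend elsewhere.
    """
    system = system or platform.system()
    return "mlxlm" if system == "Darwin" else "transformers"


backend = pick_backend()
print(backend)
```

The resulting string could then be passed as the `backend` argument when constructing `ChatOutlines`, together with a model identifier that the chosen backend can load.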
Invocation
from langchain_core.messages import HumanMessage
messages = [HumanMessage(content="What will the capital of Mars be called?")]
response = model.invoke(messages)
response.content
Streaming
ChatOutlines supports streaming of tokens:
messages = [HumanMessage(content="Count to 10 in French:")]
for chunk in model.stream(messages):
    print(chunk.content, end="", flush=True)
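When the full response is needed after streaming, the chunks can be accumulated while they are printed. The sketch below shows that pattern without loading a model: `FakeChunk` and `fake_stream` are hypothetical stand-ins for the chunks yielded by `model.stream(messages)`, modeling only the `.content` attribute used in the loop above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `.content` is modeled."""

    content: str


def fake_stream(tokens: Iterable[str]) -> Iterator[FakeChunk]:
    """Stand-in for `model.stream(messages)` so the sketch runs without a model."""
    for token in tokens:
        yield FakeChunk(content=token)


def collect(stream: Iterable[FakeChunk]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in stream:
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = collect(fake_stream(["un", ", ", "deux", ", ", "trois"]))
```

With a real model, `collect(model.stream(messages))` should behave the same way, since the helper relies only on each chunk's `.content` attribute.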
Chaining
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)
chain = prompt | model
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
Constrained Generation
ChatOutlines allows you to apply various constraints to the generated output: