Overview
SarvamLLMService provides chat completion capabilities using Sarvam’s API through an OpenAI-compatible interface. It supports streaming responses, function calling, and Sarvam-specific features such as wiki grounding and configurable reasoning effort levels.
Sarvam LLM API Reference
Pipecat’s API methods for Sarvam integration
Example Implementation
Function calling example with Sarvam
Sarvam Documentation
Official Sarvam documentation
Sarvam Platform
Access models and manage API keys
Installation
To use Sarvam LLM services, install the required dependencies.

Prerequisites
Sarvam Account Setup
Before using Sarvam LLM services, you need:

- Sarvam Account: Sign up at Sarvam
- API Key: Generate an API key from your account dashboard
- Model Selection: Choose from available models (sarvam-30b, sarvam-105b, etc.)
Required Environment Variables
SARVAM_API_KEY: Your Sarvam API key for authentication
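A typical setup might look like the following. The pipecat-ai package name and the sarvam extra are assumptions; verify both against your installation instructions:

```shell
# Install Pipecat with the Sarvam extra (package/extra names assumed; check your docs)
pip install "pipecat-ai[sarvam]"

# Export the API key generated from your Sarvam dashboard
export SARVAM_API_KEY="your-api-key-here"
```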
Configuration
- Sarvam API key, used for both OpenAI auth and the Sarvam subscription header.
- Sarvam OpenAI-compatible base URL. Override if using a different endpoint.
- Runtime-configurable model settings. See Settings below.
- Additional HTTP headers to include in every request.
Settings
Runtime-configurable settings are passed via the settings constructor argument using SarvamLLMService.Settings(...). They can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "sarvam-30b" | Sarvam model identifier. Supported models: sarvam-30b, sarvam-30b-16k, sarvam-105b, sarvam-105b-32k. |
| wiki_grounding | bool | None | Enable or disable the wiki grounding feature. Sarvam-specific parameter. |
| reasoning_effort | Literal["low", "medium", "high"] | None | Set the reasoning effort level. Sarvam-specific parameter. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values more creative. |
| max_tokens | int | NOT_GIVEN | Maximum tokens to generate. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | NOT_GIVEN | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | NOT_GIVEN | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
NOT_GIVEN values are omitted from the API request entirely, letting the Sarvam API use its own defaults. This is different from None, which would be sent explicitly.

Usage
Basic Setup
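A minimal sketch of constructing the service. The import path is an assumption based on Pipecat's usual layout; verify it against your Pipecat version:

```python
import os

# Import path assumed from Pipecat's service layout; adjust to your version.
from pipecat.services.sarvam.llm import SarvamLLMService

# The API key is used for both OpenAI auth and the Sarvam subscription header.
llm = SarvamLLMService(api_key=os.getenv("SARVAM_API_KEY"))
```

The service then slots into a Pipecat pipeline like any other LLMService.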
With Custom Settings
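A sketch passing runtime-configurable settings at construction time, using the parameters from the Settings table above (import path assumed):

```python
import os
from pipecat.services.sarvam.llm import SarvamLLMService  # import path assumed

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(
        model="sarvam-105b",
        wiki_grounding=True,        # Sarvam-specific: ground responses in wiki content
        reasoning_effort="medium",  # Sarvam-specific: "low" | "medium" | "high"
        temperature=0.7,
    ),
)
```

Parameters left at NOT_GIVEN are omitted from requests, so the Sarvam API applies its own defaults.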
Updating Settings at Runtime
Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
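A sketch of queueing the frame from an async context; the task variable is assumed to be a running Pipecat PipelineTask:

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame

# Inside an async handler, with `task` a running PipelineTask (assumed):
await task.queue_frames([
    LLMUpdateSettingsFrame(settings={"temperature": 0.3, "reasoning_effort": "high"})
])
```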
Notes
- OpenAI Compatibility: Sarvam’s API is OpenAI-compatible, allowing use of familiar patterns and parameters.
- Sarvam-Specific Features: The wiki_grounding and reasoning_effort parameters are unique to Sarvam and provide additional control over model behavior.
- Function Calling: Supports the OpenAI-style tool/function calling format. When using tool_choice, you must provide a non-empty tools list.
- Unsupported Parameters: Some OpenAI parameters are not supported by Sarvam’s API and are automatically removed from requests: stream_options, max_completion_tokens, service_tier.
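Because Sarvam accepts the OpenAI tool format, a tool definition can be built as plain data. The get_weather tool below is illustrative only, not part of Sarvam's API:

```python
# OpenAI-style tool definition (the schema itself is an illustrative example)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Note: when tool_choice is set, the tools list must be non-empty.
tool_choice = "auto"
```

In Pipecat, the matching Python handler is typically registered on the service (e.g. via llm.register_function), and the tools are attached to the context; check your Pipecat version for the exact wiring.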
Event Handlers
SarvamLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
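A sketch of registering a handler with the event_handler decorator inherited from LLMService; the callback argument names are assumptions, as the exact signature may vary by Pipecat version:

```python
# `llm` is an already-constructed SarvamLLMService instance (assumed).
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # Argument names are illustrative; check your Pipecat version's signature.
    print(f"Starting {len(function_calls)} function call(s)")
```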