Overview

SarvamLLMService provides chat completion capabilities using Sarvam's OpenAI-compatible API. It supports streaming responses, function calling, and Sarvam-specific features such as wiki grounding and configurable reasoning effort levels.

Sarvam LLM API Reference

Pipecat’s API methods for Sarvam integration

Example Implementation

Function calling example with Sarvam

Sarvam Documentation

Official Sarvam documentation

Sarvam Platform

Access models and manage API keys

Installation

To use Sarvam LLM services, install the required dependencies:
pip install "pipecat-ai[sarvam]"

Prerequisites

Sarvam Account Setup

Before using Sarvam LLM services, you need:
  1. Sarvam Account: Sign up at Sarvam
  2. API Key: Generate an API key from your account dashboard
  3. Model Selection: Choose from available models (sarvam-30b, sarvam-105b, etc.)

Required Environment Variables

  • SARVAM_API_KEY: Your Sarvam API key for authentication
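For example, the key can be exported in your shell before starting the app (the placeholder value below is illustrative):

```shell
# Set your Sarvam API key (generate it from your Sarvam account dashboard)
export SARVAM_API_KEY="your-api-key-here"
```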

Configuration

api_key
str
required
Sarvam API key, used both for OpenAI-style authentication and for the Sarvam subscription header.
base_url
str
default:"https://api.sarvam.ai/v1"
Sarvam OpenAI-compatible base URL. Override if using a different endpoint.
settings
SarvamLLMService.Settings
default:"None"
Runtime-configurable model settings. See Settings below.
default_headers
Mapping[str, str]
default:"None"
Additional HTTP headers to include in every request.

Settings

Runtime-configurable settings passed via the settings constructor argument using SarvamLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "sarvam-30b" | Sarvam model identifier. Supported models: sarvam-30b, sarvam-30b-16k, sarvam-105b, sarvam-105b-32k. |
| wiki_grounding | bool | None | Enable or disable the wiki grounding feature. Sarvam-specific parameter. |
| reasoning_effort | Literal["low", "medium", "high"] | None | Set the reasoning effort level. Sarvam-specific parameter. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| max_tokens | int | NOT_GIVEN | Maximum tokens to generate. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | NOT_GIVEN | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | NOT_GIVEN | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
NOT_GIVEN values are omitted from the API request entirely, letting the Sarvam API use its own defaults. This is different from None, which would be sent explicitly.
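The distinction can be illustrated with a minimal sketch of payload construction. The `_NOT_GIVEN` sentinel and `build_payload` helper below are illustrative names, not part of Pipecat's API; they only demonstrate how a "not supplied" sentinel differs from an explicit `None`:

```python
# Sentinel meaning "the caller did not supply this parameter at all".
# Distinct from None, which is a real value the caller chose to send.
_NOT_GIVEN = object()


def build_payload(**params):
    """Drop NOT_GIVEN entries entirely; keep explicit None values."""
    return {k: v for k, v in params.items() if v is not _NOT_GIVEN}


payload = build_payload(
    temperature=_NOT_GIVEN,   # omitted from the request; API default applies
    wiki_grounding=None,      # sent explicitly as null
    max_tokens=256,           # sent explicitly as 256
)
# payload == {"wiki_grounding": None, "max_tokens": 256}
```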

Usage

Basic Setup

import os
from pipecat.services.sarvam import SarvamLLMService

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
)

With Custom Settings

import os
from pipecat.services.sarvam import SarvamLLMService

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(
        model="sarvam-105b",
        temperature=0.7,
        max_tokens=1000,
        wiki_grounding=True,
        reasoning_effort="high",
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.sarvam.llm import SarvamLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=SarvamLLMSettings(
            temperature=0.3,
            reasoning_effort="medium",
        )
    )
)

Notes

  • OpenAI Compatibility: Sarvam’s API is OpenAI-compatible, allowing use of familiar patterns and parameters.
  • Sarvam-Specific Features: The wiki_grounding and reasoning_effort parameters are unique to Sarvam and provide additional control over model behavior.
  • Function Calling: Supports OpenAI-style tool/function calling format. When using tool_choice, you must provide a non-empty tools list.
  • Unsupported Parameters: Some OpenAI parameters are not supported by Sarvam’s API and are automatically removed from requests: stream_options, max_completion_tokens, service_tier.
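As a sketch of the function-calling constraint above, here is an OpenAI-style tool definition (the `get_weather` tool and its schema are hypothetical examples, not part of Sarvam's or Pipecat's API). Note that any request setting `tool_choice` must also carry a non-empty `tools` list:

```python
# Hypothetical tool definition in the OpenAI tools format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice is only valid alongside a non-empty tools list.
request_kwargs = {"tools": tools, "tool_choice": "auto"}
```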

Event Handlers

SarvamLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
| --- | --- |
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")