Overview

SarvamLLMService provides chat completion capabilities using Sarvam's OpenAI-compatible API. It supports streaming responses, function calling, and Sarvam-specific features such as wiki grounding and configurable reasoning effort levels.

Sarvam LLM API Reference

Pipecat’s API methods for Sarvam integration

Example Implementation

Function calling example with Sarvam

Sarvam Documentation

Official Sarvam documentation

Sarvam Platform

Access models and manage API keys

Installation

To use Sarvam LLM services, install the required dependencies:
pip install "pipecat-ai[sarvam]"

Prerequisites

Sarvam Account Setup

Before using Sarvam LLM services, you need:
  1. Sarvam Account: Sign up at Sarvam
  2. API Key: Generate an API key from your account dashboard
  3. Model Selection: Choose from available models (sarvam-30b, sarvam-105b, etc.)

Required Environment Variables

  • SARVAM_API_KEY: Your Sarvam API key for authentication
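For example, the key can be exported in your shell before starting the app (the placeholder value below is illustrative):

```shell
# Set your Sarvam API key (generate it from your Sarvam account dashboard)
export SARVAM_API_KEY="your-api-key-here"
```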

Configuration

api_key
str
required
Sarvam API key, used both for OpenAI-style authentication and for the Sarvam subscription header.
base_url
str
default:"https://api.sarvam.ai/v1"
Sarvam OpenAI-compatible base URL. Override if using a different endpoint.
settings
SarvamLLMService.Settings
default:"None"
Runtime-configurable model settings. See Settings below.
default_headers
Mapping[str, str]
default:"None"
Additional HTTP headers to include in every request.

Settings

Runtime-configurable settings passed via the settings constructor argument using SarvamLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "sarvam-30b" | Sarvam model identifier. Supported models: sarvam-30b, sarvam-30b-16k, sarvam-105b, sarvam-105b-32k. |
| wiki_grounding | bool | None | Enable or disable the wiki grounding feature. Sarvam-specific parameter. |
| reasoning_effort | Literal["low", "medium", "high"] | None | Set the reasoning effort level. Sarvam-specific parameter. |
| temperature | float | NOT_GIVEN | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| max_tokens | int | NOT_GIVEN | Maximum tokens to generate. |
| top_p | float | NOT_GIVEN | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| frequency_penalty | float | NOT_GIVEN | Penalty for frequent tokens (-2.0 to 2.0). Positive values discourage repetition. |
| presence_penalty | float | NOT_GIVEN | Penalty for new topics (-2.0 to 2.0). Positive values encourage the model to talk about new topics. |
NOT_GIVEN values are omitted from the API request entirely, letting the Sarvam API use its own defaults. This is different from None, which would be sent explicitly.
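The distinction can be illustrated with a minimal sketch of payload construction. The `_NOT_GIVEN` sentinel and `build_payload` helper below are illustrative names, not part of Pipecat's API; they only demonstrate how a "not supplied" sentinel differs from an explicit `None`:

```python
# Sentinel meaning "the caller did not supply this parameter at all".
# Distinct from None, which is a real value the caller chose to send.
_NOT_GIVEN = object()


def build_payload(**params):
    """Drop NOT_GIVEN entries entirely; keep explicit None values."""
    return {k: v for k, v in params.items() if v is not _NOT_GIVEN}


payload = build_payload(
    temperature=_NOT_GIVEN,   # omitted from the request; API default applies
    wiki_grounding=None,      # sent explicitly as null
    max_tokens=256,           # sent explicitly as 256
)
# payload == {"wiki_grounding": None, "max_tokens": 256}
```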

Usage

Basic Setup

import os
from pipecat.services.sarvam import SarvamLLMService

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
)

With Custom Settings

import os
from pipecat.services.sarvam import SarvamLLMService

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(
        model="sarvam-105b",
        temperature=0.7,
        max_tokens=1000,
        wiki_grounding=True,
        reasoning_effort="high",
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation using LLMUpdateSettingsFrame:
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.sarvam.llm import SarvamLLMSettings

await task.queue_frame(
    LLMUpdateSettingsFrame(
        delta=SarvamLLMSettings(
            temperature=0.3,
            reasoning_effort="medium",
        )
    )
)

Notes

  • OpenAI Compatibility: Sarvam’s API is OpenAI-compatible, allowing use of familiar patterns and parameters.
  • Sarvam-Specific Features: The wiki_grounding and reasoning_effort parameters are unique to Sarvam and provide additional control over model behavior.
  • Function Calling: Supports OpenAI-style tool/function calling format. When using tool_choice, you must provide a non-empty tools list.
  • Unsupported Parameters: Some OpenAI parameters are not supported by Sarvam’s API and are automatically removed from requests: stream_options, max_completion_tokens, service_tier.
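As a sketch of the function-calling constraint above, here is an OpenAI-style tool definition (the `get_weather` tool and its schema are hypothetical examples, not part of Sarvam's or Pipecat's API). Note that any request setting `tool_choice` must also carry a non-empty `tools` list:

```python
# Hypothetical tool definition in the OpenAI tools format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice is only valid alongside a non-empty tools list.
request_kwargs = {"tools": tools, "tool_choice": "auto"}
```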

Event Handlers

SarvamLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
| --- | --- |
| on_completion_timeout | Called when an LLM completion request times out |
| on_function_calls_started | Called when function calls are received and execution is about to start |
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")