Skip to main content

Documentation Index

Fetch the complete documentation index at: https://daily-docs-pr-4592.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Overview

InceptionLLMService provides access to Inception’s Mercury-2 diffusion-based reasoning model through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management with advanced reasoning capabilities.

Inception LLM API Reference

Pipecat’s API methods for Inception integration

Example Implementation

Complete example with function calling

Inception Labs

Access models and manage API keys

Installation

To use Inception services, install the required dependency:
uv add "pipecat-ai[inception]"

Prerequisites

Inception Account Setup

Before using Inception LLM services, you need:
  1. Inception Account: Sign up at Inception Labs
  2. API Key: Generate an API key from your account dashboard
  3. Model Selection: Access to Mercury-2, Inception’s diffusion-based reasoning model

Required Environment Variables

  • INCEPTION_API_KEY: Your Inception API key for authentication

Configuration

api_key
str
required
Inception API key for authentication.
base_url
str
default:"https://api.inceptionlabs.ai/v1"
Base URL for Inception API endpoint.
settings
InceptionLLMService.Settings
default:"None"
Runtime-configurable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using InceptionLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details. This service extends OpenAILLMService.Settings with Inception-specific parameters:
model
str
default:"mercury-2"
Model identifier to use. Defaults to “mercury-2”, Inception’s diffusion-based reasoning model.
reasoning_effort
Literal['instant', 'low', 'medium', 'high'] | None
default:"None"
Controls how much reasoning the model applies. Options are “instant”, “low”, “medium”, or “high”. When unset, the parameter is omitted and Inception’s server-side default applies.
realtime
bool | None
default:"None"
When True, reduces time to first diffusion block (TTFT) for faster initial response times.
For additional settings inherited from OpenAI, see OpenAI LLM Settings.

Usage

Basic Setup

import os
from pipecat.services.inception import InceptionLLMService

llm = InceptionLLMService(
    api_key=os.getenv("INCEPTION_API_KEY"),
)

With Custom Settings

from pipecat.services.inception import InceptionLLMService

llm = InceptionLLMService(
    api_key=os.getenv("INCEPTION_API_KEY"),
    settings=InceptionLLMService.Settings(
        model="mercury-2",
        reasoning_effort="instant",
        realtime=True,
        temperature=0.7,
        max_tokens=2048,
    ),
)

With Function Calling

from pipecat.services.inception import InceptionLLMService
from pipecat.services.llm_service import FunctionCallParams

async def get_weather(params: FunctionCallParams):
    await params.result_callback({"temperature": "75", "conditions": "sunny"})

llm = InceptionLLMService(
    api_key=os.getenv("INCEPTION_API_KEY"),
    settings=InceptionLLMService.Settings(
        reasoning_effort="low",
    ),
)

llm.register_function("get_weather", get_weather)

Notes

  • Inception does not support the "developer" message role. Use "system" instead.
  • The Mercury-2 model uses a diffusion-based reasoning approach, which can be controlled via the reasoning_effort parameter.
  • Setting realtime=True optimizes for lower time-to-first-token at the potential cost of reasoning depth.
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.