# Start voice session

> Prepare a real-time voice conversation with an AI Agent.

Returns a WebSocket URL and a ready-made `start` message. Open a WebSocket
connection to the returned `url`, send `start_message` as the first frame,
then stream audio back and forth.
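Preparing a session is a single authenticated `POST /voice` call. The sketch below uses only Python's standard library. It assumes the `https://api.datagrid.com/v1` server and the Bearer scheme listed in the OpenAPI spec on this page. `build_start_session_request`, `parse_retry_after`, and `start_session` are hypothetical helper names, not part of the `datagrid-ai` SDK. On a `429` it waits for the number of seconds in the `Retry-After` header, as the spec's rate-limit response recommends.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Dict, Optional

# Server URL and auth scheme taken from the OpenAPI spec on this page.
API_BASE = "https://api.datagrid.com/v1"


def build_start_session_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (without sending) the POST /voice request that prepares a session."""
    return urllib.request.Request(
        url=f"{API_BASE}/voice",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # BearerAuth security scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_retry_after(value: Optional[str], default: float = 1.0) -> float:
    """Interpret Retry-After as seconds; fall back to `default` if absent or not numeric."""
    if value is None:
        return default
    try:
        return float(value)
    except ValueError:
        return default  # e.g. an HTTP-date, which this sketch does not handle


def start_session(api_key: str, body: Dict[str, Any], max_attempts: int = 3) -> Dict[str, Any]:
    """Send POST /voice, retrying on 429; return the parsed VoiceSessionResponse."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(build_start_session_request(api_key, body)) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; anything else propagates.
            if err.code != 429 or attempt == max_attempts:
                raise
            time.sleep(parse_retry_after(err.headers.get("Retry-After")))
    raise AssertionError("unreachable")
```

The returned dictionary's `url` is what you open with your WebSocket client, and its `start_message` is serialized to JSON and sent as the first frame.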

This REST flow depends on Redis to issue a short-lived REST-to-WebSocket
handoff token. During a Redis incident, clients that can construct their
own `start` message may use the direct WebSocket flow below with a raw
API key.

You can also skip this endpoint and connect directly:
`wss://api.datagrid.com/ws/voice?token=YOUR_API_KEY`
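When you skip the REST endpoint, or it is unavailable during a Redis incident, the client has to build both the URL and the `start` frame itself. Below is a minimal sketch. It assumes the `start` payload accepts the same fields as the `POST /voice` request body; this page does not spell out the payload schema for the direct flow. `direct_ws_url`, `build_start_message`, and `prepare_session` are illustrative names, and `rest_call` stands in for whatever function performs the REST request.

```python
import json
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlencode

# Direct endpoint from this page; the raw API key travels in the `token` query parameter.
DIRECT_WS_BASE = "wss://api.datagrid.com/ws/voice"


def direct_ws_url(api_key: str) -> str:
    """Build the direct WebSocket URL, percent-encoding the key."""
    return f"{DIRECT_WS_BASE}?{urlencode({'token': api_key})}"


def build_start_message(**session_params: Any) -> str:
    """Build a `start` frame by hand.

    ASSUMPTION: the payload takes the same fields as the POST /voice body.
    Parameters set to None are dropped instead of being sent as nulls.
    """
    payload = {k: v for k, v in session_params.items() if v is not None}
    return json.dumps({"type": "start", "payload": payload})


def prepare_session(
    api_key: str,
    params: Dict[str, Any],
    rest_call: Callable[[str, Dict[str, Any]], Dict[str, Any]],
) -> Tuple[str, str]:
    """Return (url, first_frame), preferring the REST handoff and falling back to the direct flow."""
    try:
        response = rest_call(api_key, params)
        return response["url"], json.dumps(response["start_message"])
    except OSError:
        # Network and HTTP errors from urllib are OSError subclasses. Falling back on
        # every such error, not only Redis-related ones, is a simplification.
        return direct_ws_url(api_key), build_start_message(**params)
```

The fallback URL carries the raw API key rather than the short-lived handoff token that the REST flow issues.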

**WebSocket Protocol:**

Once connected, send a JSON message with `type: "start"` and the session parameters in a `payload` field.
The server responds with `type: "started"`, which contains the session and conversation IDs,
followed by `type: "ready"` once the agent is ready to receive audio.
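A small state machine can track this handshake so that no audio is sent before `ready` arrives. This is a sketch, and the class name is hypothetical. This page does not list the exact fields that carry the session and conversation IDs in `started`, so the class stores the whole message instead of picking out named keys. Treating an `error` received before `ready` as fatal is also an assumption.

```python
import json
from typing import Optional


class VoiceHandshake:
    """Track connected -> start_sent -> started -> ready for one connection."""

    def __init__(self) -> None:
        self.state = "connected"
        self.started_message: Optional[dict] = None

    def mark_start_sent(self) -> None:
        # `start` has to be the first frame on the socket.
        if self.state != "connected":
            raise RuntimeError("`start` must be the first frame")
        self.state = "start_sent"

    def on_server_message(self, raw: str) -> None:
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "started" and self.state == "start_sent":
            self.state = "started"
            self.started_message = message  # carries the session and conversation IDs
        elif kind == "ready" and self.state == "started":
            self.state = "ready"
        elif kind == "error" and self.state != "ready":
            self.state = "failed"  # ASSUMPTION: an error before `ready` ends the handshake

    @property
    def can_send_audio(self) -> bool:
        return self.state == "ready"
```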

**Audio Format:**
- Client → Server: 16-bit mono PCM at 16 kHz, base64-encoded
- Server → Client: 16-bit mono PCM at 24 kHz, base64-encoded
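At these rates, one second of client audio is 16,000 samples × 2 bytes = 32,000 bytes, and one second of server audio is 24,000 × 2 = 48,000 bytes. The sketch below slices outgoing PCM into frames before base64-encoding them and converts incoming chunks back to a duration. The 20 ms frame size is an illustrative choice rather than a documented requirement. This page also does not state the byte order of the 16-bit samples; little-endian is the common convention for PCM, but treat that as an assumption.

```python
import base64
from typing import List, Tuple

BYTES_PER_SAMPLE = 2         # 16-bit mono PCM
CLIENT_SAMPLE_RATE = 16_000  # client -> server
SERVER_SAMPLE_RATE = 24_000  # server -> client


def frame_client_audio(pcm: bytes, frame_ms: int = 20) -> List[str]:
    """Split raw 16 kHz mono PCM into base64 strings of `frame_ms` each.

    The last frame may be shorter if `pcm` does not divide evenly.
    """
    frame_bytes = CLIENT_SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    return [
        base64.b64encode(pcm[i : i + frame_bytes]).decode("ascii")
        for i in range(0, len(pcm), frame_bytes)
    ]


def decode_server_audio(b64: str) -> Tuple[bytes, float]:
    """Decode a base64 server chunk; return the PCM bytes and its duration in seconds."""
    pcm = base64.b64decode(b64)
    seconds = len(pcm) / (BYTES_PER_SAMPLE * SERVER_SAMPLE_RATE)
    return pcm, seconds
```

A 20 ms frame at 16 kHz is 320 samples, or 640 bytes, so one second of microphone audio becomes 50 frames.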

**Message Types:**
- Client: `start`, `audio`, `stop`, `interrupt`, `text`
- Server: `started`, `ready`, `audio`, `tool_call`, `interrupted`, `error`, `transcript`, `citation`, `ended`
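Checking `type` against these two lists catches messages sent in the wrong direction, such as a client trying to send `started`. Below is a sketch of that validation plus a simple handler table. The helper names are hypothetical, and the sketch reads only the `type` field because the other fields of each message are not documented here.

```python
import json
from typing import Callable, Dict

CLIENT_TYPES = frozenset({"start", "audio", "stop", "interrupt", "text"})
SERVER_TYPES = frozenset({
    "started", "ready", "audio", "tool_call", "interrupted",
    "error", "transcript", "citation", "ended",
})


def encode_client_message(message: dict) -> str:
    """Serialize an outgoing frame, rejecting types the client may not send."""
    if message.get("type") not in CLIENT_TYPES:
        raise ValueError(f"not a client message type: {message.get('type')!r}")
    return json.dumps(message)


def dispatch_server_message(raw: str, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Route an incoming frame to handlers[type] (if registered); return the type."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind not in SERVER_TYPES:
        raise ValueError(f"unexpected server message type: {kind!r}")
    handler = handlers.get(kind)
    if handler is not None:
        handler(message)
    return kind
```

Raising on an unknown server type is a deliberately strict choice for development. A long-lived client might log and ignore unrecognized types instead, in case new ones are added.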




## OpenAPI

````yaml post /voice
openapi: 3.0.3
info:
  version: 0.1.1
  title: Datagrid API
  description: Datagrid API
servers:
  - url: https://api.datagrid.com/v1
security:
  - BearerAuth: []
paths:
  /voice:
    post:
      tags:
        - Voice
      summary: Start voice session
      description: >
        Prepare a real-time voice conversation with an AI Agent.


        Returns a WebSocket URL and a ready-made `start` message. Open a
        WebSocket

        connection to the returned `url`, send `start_message` as the first
        frame,

        then stream audio back and forth.


        This REST flow depends on Redis to issue a short-lived REST-to-WebSocket

        handoff token. During a Redis incident, clients that can construct their

        own `start` message may use the direct WebSocket flow below with a raw

        API key.


        You can also skip this endpoint and connect directly:

        `wss://api.datagrid.com/ws/voice?token=YOUR_API_KEY`


        **WebSocket Protocol:**


        Once connected, send a JSON message with `type: "start"` and the session
        parameters in a `payload` field.

        The server responds with `type: "started"` containing the session and
        conversation IDs,

        followed by `type: "ready"` when the agent is ready to receive audio.


        **Audio Format:**

        - Client → Server: 16-bit mono PCM at 16kHz, base64-encoded

        - Server → Client: 16-bit mono PCM at 24kHz, base64-encoded


        **Message Types:**

        - Client: `start`, `audio`, `stop`, `interrupt`, `text`

        - Server: `started`, `ready`, `audio`, `tool_call`, `interrupted`,
        `error`, `transcript`, `citation`, `ended`
      operationId: Voice.startSession
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/VoiceSessionRequest'
      responses:
        '200':
          description: Voice session prepared
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/VoiceSessionResponse'
        '429':
          description: >-
            Rate limit exceeded. The request has been throttled because the rate
            limit for this endpoint has been reached. Check the `Retry-After`
            response header and retry after the specified number of seconds.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RateLimitError'
      x-codeSamples:
        - lang: JavaScript
          source: |-
            import Datagrid from 'datagrid-ai';

            const client = new Datagrid({
              apiKey: process.env['DATAGRID_API_KEY'], // This is the default and can be omitted
            });

            const voiceSessionResponse = await client.voice.startSession();

            console.log(voiceSessionResponse.agent_id);
        - lang: Python
          source: |-
            import os
            from datagrid_ai import Datagrid

            client = Datagrid(
                api_key=os.environ.get("DATAGRID_API_KEY"),  # This is the default and can be omitted
            )
            voice_session_response = client.voice.start_session()
            print(voice_session_response.agent_id)
components:
  schemas:
    VoiceSessionRequest:
      type: object
      properties:
        agent_id:
          type: string
          description: >-
            The ID of the agent to use for a direct-agent voice conversation.
            Ignored when voice_mode is orchestrator.
          nullable: true
        voice_mode:
          type: string
          description: >-
            Controls whether the session uses the voice orchestrator, with
            delegation tools, or talks directly to a specific agent. Defaults to
            orchestrator when agent_id is omitted and direct_agent when agent_id
            is provided.
          enum:
            - orchestrator
            - direct_agent
          nullable: true
        conversation_id:
          type: string
          description: >-
            The ID of an existing conversation to continue. If not provided, a
            new conversation will be created.
          nullable: true
        config:
          type: object
          description: >
            Override the agent config for this voice session.

            Only prompt overrides are supported — voice sessions always use
            Gemini Live,

            so LLM model, agent model, planning prompt, and tool settings are
            not applicable.
          nullable: true
          properties:
            system_prompt:
              type: string
              description: >-
                Directs your AI Agent's operational behavior during the voice
                session.
              nullable: true
            custom_prompt:
              type: string
              description: Custom instructions for the AI Agent during the voice session.
              nullable: true
        file_ids:
          type: array
          description: Array of file IDs to attach to the voice conversation.
          items:
            type: string
          nullable: true
        secret_ids:
          type: array
          description: Array of secret IDs to include in the context.
          items:
            type: string
          nullable: true
        knowledge_ids:
          type: array
          description: Array of knowledge IDs to make accessible to the agent.
          items:
            type: string
          nullable: true
        page_ids:
          type: array
          description: >-
            Array of page IDs to make accessible to the agent. The page and all
            knowledge under it will be accessible.
          items:
            type: string
          nullable: true
        user:
          description: Override user information for this voice session.
          nullable: true
          allOf:
            - $ref: '#/components/schemas/UserOverride'
        initial_context:
          type: string
          description: >-
            Optional context text for the voice session. When provided, the AI
            will start by briefly explaining this content before listening for
            user input.
          nullable: true
          maxLength: 2000
        initial_message:
          type: string
          description: >-
            Optional initial user message. When provided, the system greeting is
            skipped and the AI responds directly to this text (e.g. a suggested
            prompt). Takes precedence over initial_context.
          nullable: true
          maxLength: 2000
        ephemeral:
          type: boolean
          description: >-
            When true, the session is ephemeral and will not save messages to
            conversation history.
          default: false
          nullable: true
        voice_config:
          type: object
          description: Voice session configuration options.
          nullable: true
          properties:
            voice_preset:
              type: string
              description: >-
                Voice preset to use (e.g., 'sage', 'nova', 'spark'). If not
                provided, uses the agent's configured voice preset or the
                default.
              nullable: true
            silence_commit_ms:
              type: number
              description: >-
                Duration of silence (no agent audio) in milliseconds before
                auto-committing a segment. Default: 30000 (30 seconds).
              nullable: true
            segment_max_duration_ms:
              type: number
              description: >-
                Maximum duration in milliseconds of a buffered segment before
                force-commit. Default: 180000 (3 minutes).
              nullable: true
            silence_discard_ratio:
              type: number
              description: >-
                Discard a segment if this fraction (0-1) of its audio is
                silence. Default: 0.9 (90% silence threshold).
              nullable: true
            input_transcription:
              type: boolean
              description: 'Enable transcription of user input audio. Default: true.'
              nullable: true
            output_transcription:
              type: boolean
              description: 'Enable transcription of agent output audio. Default: true.'
              nullable: true
            silence_timeout:
              type: boolean
              description: >-
                When true, the server closes the connection after the client has
                been continuously sending audio frames below the
                speech-detection threshold for 60 seconds. Frames are still
                required — a fully muted microphone (no frames sent) pauses the
                countdown. Disabled by default.
              nullable: true
            silent_start:
              type: boolean
              description: >-
                When true, skip the launch greeting and start directly in
                listening mode. Disabled by default.
              nullable: true
    VoiceSessionResponse:
      type: object
      required:
        - object
        - url
        - agent_id
        - start_message
      properties:
        object:
          type: string
          description: Object type discriminator.
          enum:
            - voice.session
        url:
          type: string
          description: >-
            WebSocket URL to connect to. Includes the authentication token as a
            query parameter.
        agent_id:
          type: string
          description: >-
            The resolved agent ID. If no agent was specified in the request,
            this is the default agent.
        start_message:
          type: object
          description: >-
            Ready-made JSON message to send as the first WebSocket frame after
            connecting. Contains `type: "start"` and a `payload` with all
            session parameters.
    RateLimitError:
      type: object
      description: >-
        Returned when the rate limit is exceeded. Rate limits are enforced per
        teamspace, endpoint path, and HTTP method over a 60-second sliding
        window. Each endpoint may have its own limit — check the
        X-RateLimit-Limit response header for the effective value.
      required:
        - error
        - message
        - retryable
        - status_code
      properties:
        status_code:
          type: integer
          description: The HTTP status code (429).
        statusCode:
          type: integer
          deprecated: true
          description: Deprecated. Use status_code instead.
        error:
          type: string
          enum:
            - rate_limit_exceeded
          description: The error code identifying this as a rate limit error.
        message:
          type: string
          description: A human-readable error message.
        mitigation:
          type: string
          description: Suggested action to resolve the error.
        retryable:
          type: boolean
          description: Whether the request can be retried after a delay.
        details:
          type: object
          properties:
            reason:
              type: string
              description: A detailed explanation of why the rate limit was exceeded.
    UserOverride:
      type: object
      description: >-
        User information override for converse calls. All fields are optional;
        only the fields you provide override the default user information.
      properties:
        first_name:
          type: string
          description: Override the user's first name for this converse call.
          nullable: true
        last_name:
          type: string
          description: Override the user's last name for this converse call.
          nullable: true
        email:
          type: string
          description: Override the user's email for this converse call.
          nullable: true
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer

````
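The `VoiceSessionRequest` schema above has a few constraints worth checking client-side before sending:

- the `voice_mode` enum;
- the 2,000-character `maxLength` on `initial_context` and `initial_message`;
- the 0 to 1 range of `silence_discard_ratio`;
- the rule that `voice_mode` defaults to `orchestrator` without an `agent_id` and to `direct_agent` with one.

Below is a sketch with hypothetical helper names. The server remains the authority on validation.

```python
from typing import Any, Dict, Optional

MAX_TEXT = 2000  # maxLength of initial_context and initial_message


def build_voice_session_request(
    agent_id: Optional[str] = None,
    voice_mode: Optional[str] = None,
    initial_context: Optional[str] = None,
    initial_message: Optional[str] = None,
    voice_config: Optional[Dict[str, Any]] = None,
    **other: Any,
) -> Dict[str, Any]:
    """Assemble a VoiceSessionRequest body, checking a few schema constraints locally."""
    if voice_mode is not None and voice_mode not in ("orchestrator", "direct_agent"):
        raise ValueError(f"invalid voice_mode: {voice_mode!r}")
    for name, text in (("initial_context", initial_context), ("initial_message", initial_message)):
        if text is not None and len(text) > MAX_TEXT:
            raise ValueError(f"{name} exceeds {MAX_TEXT} characters")
    ratio = (voice_config or {}).get("silence_discard_ratio")
    if ratio is not None and not 0 <= ratio <= 1:
        raise ValueError("silence_discard_ratio must be between 0 and 1")
    body = {
        "agent_id": agent_id,
        "voice_mode": voice_mode,
        "initial_context": initial_context,
        "initial_message": initial_message,
        "voice_config": voice_config,
        **other,  # e.g. conversation_id, file_ids, ephemeral
    }
    # Omit unset fields so the server applies its documented defaults.
    return {k: v for k, v in body.items() if v is not None}


def effective_voice_mode(body: Dict[str, Any]) -> str:
    """Mirror the documented default: orchestrator without agent_id, else direct_agent."""
    if body.get("voice_mode"):
        return body["voice_mode"]
    return "direct_agent" if body.get("agent_id") else "orchestrator"
```

Because `agent_id` is ignored in orchestrator mode, passing both `agent_id` and `voice_mode="orchestrator"` is valid but the agent ID has no effect.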