Batch Predictions lets you submit a shared prompt and output schema once, then process many files asynchronously in a single batch job. Each item references an existing Datagrid file and produces one NDJSON result line keyed by your custom_id. A typical workflow:
  1. Upload or identify the files you want to process with the Files API.
  2. Call Create batch prediction with your shared prompt, output_schema, and item list.
  3. Poll Retrieve batch prediction or subscribe to a terminal-state webhook until the batch reaches completed, failed, expired, or cancelled.
  4. Call Retrieve batch prediction results and parse the NDJSON stream one line at a time.
  5. If needed, stop pending work with Cancel batch prediction.
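Step 3 can be handled with a small polling loop. This is a minimal Python sketch, not an official client. It assumes a hypothetical `retrieve(batch_id)` callable that wraps Retrieve batch prediction and returns the batch object as a dict with a `status` field. The 30-second interval and 25-hour timeout are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, Mapping

# Terminal statuses from the batch lifecycle.
TERMINAL_STATUSES = frozenset({"completed", "failed", "expired", "cancelled"})


def wait_for_batch(
    retrieve: Callable[[str], Mapping[str, Any]],
    batch_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 25 * 3600,  # a little longer than the 24h completion window
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll a batch until it reaches a terminal status.

    `retrieve` is a hypothetical callable wrapping Retrieve batch prediction;
    it must return the batch object as a dict containing "status".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        batch = retrieve(batch_id)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} is still {batch['status']!r}")
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable. In a real client, `retrieve` would issue the authenticated HTTP request, and a terminal-state webhook avoids polling altogether.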

Webhooks

You can subscribe to terminal batch lifecycle events with Webhooks. Batch prediction webhook event types are:
  • batch_prediction.completed
  • batch_prediction.failed
  • batch_prediction.expired
  • batch_prediction.cancelled
Webhook deliveries include the same batch prediction object returned by Retrieve batch prediction. Use the webhook as a notification that the batch reached a terminal state, then call the results endpoint when you need the full NDJSON result stream.
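A webhook receiver only needs to recognize the four terminal event types and hand the embedded batch object to the code that fetches results. The Python sketch below assumes a hypothetical envelope of the form `{"type": <event type>, "data": <batch object>}`; check the Webhooks reference for the real field names, and verify each delivery's signature before trusting it (not shown here).

```python
import json
from typing import Any, Dict, Optional

BATCH_TERMINAL_EVENTS = frozenset({
    "batch_prediction.completed",
    "batch_prediction.failed",
    "batch_prediction.expired",
    "batch_prediction.cancelled",
})


def terminal_batch_from_webhook(raw_body: bytes) -> Optional[Dict[str, Any]]:
    """Return the batch object from a terminal batch event, else None.

    The {"type": ..., "data": ...} envelope is an assumption, not the
    documented payload shape.
    """
    event = json.loads(raw_body)
    if event.get("type") not in BATCH_TERMINAL_EVENTS:
        return None  # some other event type; not handled here
    return event["data"]
```

When this returns a batch, call the results endpoint if you need the per-item NDJSON lines.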

Lifecycle

Status | Meaning
--- | ---
validating | The batch was accepted and Datagrid is validating file access, file types, and page references.
in_progress | Items are actively being processed.
finalizing | All item work is done and Datagrid is finishing the batch.
completed | The batch finished successfully. Individual items may still contain per-item errors in the results stream.
failed | Validation or processing failed before the batch could complete normally.
cancelling | A cancel request was accepted and remaining work is being stopped.
cancelled | The batch reached the terminal cancelled state. Completed items remain available in the results stream.
expired | The completion window elapsed before all items finished. Completed items remain available in the results stream.
completion_window is currently fixed to 24h.
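The lifecycle can be mirrored client-side as an enum, which keeps terminal-state checks in one place. This Python sketch is illustrative only. The `completion_deadline` helper assumes the 24h window is measured from batch creation, which is not stated explicitly above.

```python
from datetime import datetime, timedelta
from enum import Enum

# completion_window is currently fixed to 24h.
COMPLETION_WINDOW = timedelta(hours=24)


class BatchStatus(str, Enum):
    VALIDATING = "validating"
    IN_PROGRESS = "in_progress"
    FINALIZING = "finalizing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLING = "cancelling"
    CANCELLED = "cancelled"
    EXPIRED = "expired"

    @property
    def is_terminal(self) -> bool:
        # cancelling is not terminal: remaining work is still being stopped.
        return self in {BatchStatus.COMPLETED, BatchStatus.FAILED,
                        BatchStatus.CANCELLED, BatchStatus.EXPIRED}

    @property
    def results_may_be_partial(self) -> bool:
        # Only items that finished before cancellation or expiry have results.
        # A completed batch can still contain per-item errors.
        return self in {BatchStatus.CANCELLED, BatchStatus.EXPIRED}


def completion_deadline(created_at: datetime) -> datetime:
    """Assumed expiry time: creation time plus the completion window."""
    return created_at + COMPLETION_WINDOW
```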

Request limits and validation

  • A batch must contain between 1 and 5,000 items.
  • The JSON request body must be no larger than 100 MiB.
  • Each custom_id must be unique within the batch and 128 characters or fewer.
  • page is optional, 1-indexed, and only valid for paged file formats.
  • metadata can contain up to 16 string values. Keys must be 64 characters or fewer, and values must be 512 characters or fewer.
  • output_schema must be a valid JSON Schema Draft 2020-12 object with root type: "object". $defs, $ref, allOf, anyOf, not, oneOf, and patternProperties are not supported.
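Checking these limits before sending the request surfaces mistakes without a round trip. The Python sketch below is a pre-flight check only; the server remains authoritative. The item field `file_id` used in the example and the placement of `metadata` at the top level of the request body are assumptions. Whether `page` is valid for a given file format can only be checked server-side.

```python
import json
from typing import Any, Dict, List

# Limits as documented above.
MAX_ITEMS = 5_000
MAX_BODY_BYTES = 100 * 1024 * 1024  # 100 MiB
MAX_CUSTOM_ID_LEN = 128
MAX_METADATA_ENTRIES = 16
MAX_METADATA_KEY_LEN = 64
MAX_METADATA_VALUE_LEN = 512
UNSUPPORTED_SCHEMA_KEYWORDS = frozenset(
    {"$defs", "$ref", "allOf", "anyOf", "not", "oneOf", "patternProperties"}
)


def _unsupported_keywords(node: Any, path: str = "output_schema") -> List[str]:
    """Recursively find unsupported keywords. Deliberately conservative:
    a property literally named e.g. "not" is flagged too."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in UNSUPPORTED_SCHEMA_KEYWORDS:
                found.append(f"{path}.{key}")
            found.extend(_unsupported_keywords(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            found.extend(_unsupported_keywords(value, f"{path}[{i}]"))
    return found


def validate_batch_request(body: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []

    items = body.get("items", [])
    if not 1 <= len(items) <= MAX_ITEMS:
        problems.append(f"items: expected 1 to {MAX_ITEMS}, got {len(items)}")
    seen = set()
    for i, item in enumerate(items):
        cid = item.get("custom_id")
        if not isinstance(cid, str):
            problems.append(f"items[{i}].custom_id is missing")
        else:
            if len(cid) > MAX_CUSTOM_ID_LEN:
                problems.append(f"items[{i}].custom_id exceeds {MAX_CUSTOM_ID_LEN} characters")
            if cid in seen:
                problems.append(f"items[{i}].custom_id {cid!r} is duplicated")
            seen.add(cid)
        page = item.get("page")  # optional and 1-indexed
        if page is not None and (isinstance(page, bool) or not isinstance(page, int) or page < 1):
            problems.append(f"items[{i}].page must be an integer >= 1")

    metadata = body.get("metadata") or {}
    if len(metadata) > MAX_METADATA_ENTRIES:
        problems.append(f"metadata: at most {MAX_METADATA_ENTRIES} entries allowed")
    for key, value in metadata.items():
        if len(key) > MAX_METADATA_KEY_LEN:
            problems.append(f"metadata key {key[:20]!r}... exceeds {MAX_METADATA_KEY_LEN} characters")
        if not isinstance(value, str) or len(value) > MAX_METADATA_VALUE_LEN:
            problems.append(f"metadata[{key[:20]!r}] must be a string of at most {MAX_METADATA_VALUE_LEN} characters")

    schema = body.get("output_schema")
    if not isinstance(schema, dict) or schema.get("type") != "object":
        problems.append('output_schema: root must have "type": "object"')
    else:
        problems.extend(f"unsupported keyword at {p}" for p in _unsupported_keywords(schema))

    # Approximate size as compact UTF-8 JSON; your HTTP client may serialize differently.
    size = len(json.dumps(body, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
    if size > MAX_BODY_BYTES:
        problems.append(f"request body is {size} bytes; the limit is {MAX_BODY_BYTES}")
    return problems
```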

Idempotency

POST /v1/batch-predictions accepts an optional Idempotency-Key header.
  • Reusing the same key with the same request body replays the original response.
  • Reusing the same key with a different request body returns 409 Conflict.
  • Idempotency records expire after 24 hours.
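The practical pattern is to generate one key per logical create request and reuse it on every retry of that request. The Python sketch below assumes a hypothetical `send(body, key)` callable that POSTs to /v1/batch-predictions with an `Idempotency-Key` header, and a hypothetical `TransientError` raised for retryable failures such as timeouts.

```python
import uuid
from typing import Any, Callable, Mapping


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or 5xx responses."""


def create_with_retries(
    send: Callable[[Mapping[str, Any], str], Mapping[str, Any]],
    body: Mapping[str, Any],
    max_attempts: int = 3,
) -> Mapping[str, Any]:
    """Create a batch, retrying transient failures with one stable Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical request. If an earlier attempt reached the server,
    # a retry with the same key and body replays that response instead of
    # creating a second batch. Never reuse the key with a different body
    # (409 Conflict), and expect the server to forget it after 24 hours.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(body, key)
        except TransientError:
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```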

Results format

The results endpoint returns application/x-ndjson, not a JSON array. Split the body on newlines and JSON-parse each non-empty line independently.
{"object":"batch_prediction.result","batch_id":"bpred_123","custom_id":"drawing_001","status":"succeeded","output":{"architect_name":"Smith & Co."},"error":null}
{"object":"batch_prediction.result","batch_id":"bpred_123","custom_id":"drawing_002","status":"errored","output":null,"error":{"type":"https://api.datagrid.com/errors/prediction_failed","title":"Prediction Failed","status":422,"detail":"The model returned an invalid response."}}
Each line corresponds to one submitted item and preserves your custom_id. output is only populated when status is succeeded; error is populated when status is errored, canceled, or expired.
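A line-by-line parser for this format fits in a few lines of Python. This sketch uses only the fields shown in the example lines above and splits the stream into successful outputs and failures, both keyed by custom_id.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def parse_results(lines: Iterable[str]) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Split an NDJSON results stream into outputs and failures keyed by custom_id."""
    outputs: Dict[str, Any] = {}
    failures: Dict[str, Dict[str, Any]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        result = json.loads(line)  # each line is an independent JSON document
        if result["status"] == "succeeded":
            outputs[result["custom_id"]] = result["output"]
        else:
            # errored, canceled, or expired: the error object carries the details.
            failures[result["custom_id"]] = {"status": result["status"], "error": result["error"]}
    return outputs, failures
```

Because it accepts any iterable of lines, it can consume a streamed HTTP response directly (for example, `(raw.decode("utf-8") for raw in response)` with a `urllib` response) without loading the whole body into memory.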

Retention and cleanup

  • Terminal batch metadata remains retrievable after processing.
  • Retained result lines are currently eligible for cleanup 29 days after batch creation.
  • When that happens, results_url becomes null and the results endpoint returns 410 Gone.
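Clients that archive results should download them before the cleanup window and treat a null results_url or a 410 response as a distinct, non-retryable condition. The Python sketch below assumes `results_url` points at the results endpoint and uses Bearer authentication as a placeholder; use the scheme from the Authentication docs.

```python
import urllib.error
import urllib.request
from datetime import datetime, timedelta
from typing import Any, Mapping

# Retained result lines are currently eligible for cleanup 29 days after creation.
RESULT_RETENTION = timedelta(days=29)


class ResultsGone(Exception):
    """Retained results were cleaned up (results_url is null or HTTP 410 Gone)."""


def cleanup_eligible_at(created_at: datetime) -> datetime:
    """Earliest time results may be cleaned up; download them before this."""
    return created_at + RESULT_RETENTION


def results_url_or_raise(batch: Mapping[str, Any]) -> str:
    url = batch.get("results_url")
    if url is None:
        raise ResultsGone(f"results for {batch.get('id')} are no longer retained")
    return url


def open_results(url: str, api_key: str):
    """Open the NDJSON results stream; the caller must close the response.

    The Bearer scheme here is an assumption, not the documented auth method.
    """
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        return urllib.request.urlopen(request)
    except urllib.error.HTTPError as exc:
        if exc.code == 410:
            raise ResultsGone(url) from exc  # cleaned up; retrying will not help
        raise
```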