Dry Run
batchling allows users to declare that they want to launch a dry run of their batching.
This feature lets users debug and understand exactly what will happen once they ultimately disable the flag, giving them the transparency required to be confident in the library.
In practice, dry-run deactivates all provider submissions while keeping the internal batching path active (queueing, windowing, and per-queue grouping).
Put simply, it gives users an exact breakdown of what their batched inference run would have looked like for real.
Sample output:
╭────────────────────────────────────────────── batchling dry run summary ───────────────────────────────────────────────╮
│ Batchable Requests: 8 - Cache Hit Requests: 0 │
│ ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ │
│ ┃ provider ┃ endpoint ┃ model ┃ expected reques… ┃ expected batch… ┃ │
│ ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │
│ │ anthropic │ /v1/messages │ claude-haiku-4-5 │ 1 │ 1 │ │
│ │ doubleword │ /v1/responses │ openai/gpt-oss-20b │ 1 │ 1 │ │
│ │ gemini │ /v1beta/models/gemini-2.5-flash-… │ gemini-2.5-flash-lite │ 1 │ 1 │ │
│ │ groq │ /openai/v1/chat/completions │ llama-3.1-8b-instant │ 1 │ 1 │ │
│ │ mistral │ /v1/chat/completions │ mistral-medium-2505 │ 1 │ 1 │ │
│ │ openai │ /v1/responses │ gpt-4o-mini │ 1 │ 1 │ │
│ │ together │ /v1/chat/completions │ google/gemma-3n-E4B-it │ 1 │ 1 │ │
│ │ xai │ /v1/chat/completions │ grok-4-1-fast-non-reasoning │ 1 │ 1 │ │
│ └─────────────┴───────────────────────────────────┴─────────────────────────────┴──────────────────┴─────────────────┘ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Avoid partial counts
Dry-run exits as soon as the first intercepted request returns, which can lead
to partial totals if requests are awaited one by one. To let batchling see the
full request set before exit, schedule requests together and await them with
asyncio.gather.
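The pattern above can be sketched as follows. The request coroutine here is a hypothetical stand-in (`fake_request` is not part of batchling); only the `asyncio.gather` pattern itself is what the docs recommend.

```python
import asyncio

# Hypothetical stand-in for a batchling-intercepted request; a real run
# would call your provider SDK with batchling active instead.
async def fake_request(i: int) -> str:
    await asyncio.sleep(0.01)
    return f"response-{i}"

async def main() -> list[str]:
    # Awaiting requests one by one would let dry-run exit after the first
    # response returns; scheduling them together with asyncio.gather lets
    # the full request set be seen before exit.
    return await asyncio.gather(*(fake_request(i) for i in range(8)))

results = asyncio.run(main())
print(len(results))  # all 8 requests are counted
```

With this shape, all eight requests are in flight before any of them resolves, so the dry-run summary reflects the complete request set rather than a partial total.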
Activating dry run
Dry run is activated by setting a flag in the CLI or SDK:
- dry_run=True if using the SDK
- --dry-run if using the CLI
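As a minimal sketch of the SDK flag, the snippet below uses a hypothetical `submit_batch` entry point (not batchling's actual API); only the `dry_run=True` keyword itself comes from the docs above.

```python
# `submit_batch` is a hypothetical stand-in for whatever batchling entry
# point you call; only the dry_run=True flag is taken from the docs.
def submit_batch(requests: list[dict], *, dry_run: bool = False) -> str:
    if dry_run:
        # With the flag set, batchling would print the dry run summary
        # table instead of submitting anything to providers.
        return f"dry run: {len(requests)} request(s) would be batched"
    return "submitted"

print(submit_batch([{"model": "gpt-4o-mini"}], dry_run=True))
# -> dry run: 1 request(s) would be batched
```

The same toggle via the CLI would be the --dry-run flag on whatever command you already run.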
Next Steps
- See how the cache is saved and how long it is kept.