Reducing OpenAI Bills by 50% with the OpenAI Batch API

Sentiment classification, intent recognition, and theme tagging across thousands of social media comments are a natural fit for LLMs. However, if you call OpenAI's synchronous endpoints (like /v1/chat/completions) for millions of records, you'll run into two major roadblocks: high API costs and strict rate limits.

To run large-scale comment classification in production, consider moving to the OpenAI Batch API. By executing jobs asynchronously, you get a 50% discount on tokens, a separate and much higher rate limit pool, and a defined 24-hour completion window.
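To make the discount concrete, here is a rough back-of-the-envelope estimator. It is only a sketch: the function name `estimate_cost_usd`, the token counts, and the per-million-token prices are placeholder assumptions, not real OpenAI pricing, so substitute the current rates for your model.

```python
def estimate_cost_usd(n_requests, avg_input_tokens, avg_output_tokens,
                      input_price_per_1m, output_price_per_1m, batch=False):
    """Rough cost estimate. Prices are USD per 1M tokens."""
    input_cost = n_requests * avg_input_tokens / 1_000_000 * input_price_per_1m
    output_cost = n_requests * avg_output_tokens / 1_000_000 * output_price_per_1m
    total = input_cost + output_cost
    # The Batch API discount is modeled as a flat 50% off the synchronous price
    return total * 0.5 if batch else total


# Placeholder numbers for illustration only (NOT real pricing):
# 1M comments, ~120 prompt tokens and ~10 completion tokens each
sync_cost = estimate_cost_usd(1_000_000, 120, 10, 1.00, 4.00)
batch_cost = estimate_cost_usd(1_000_000, 120, 10, 1.00, 4.00, batch=True)
print(f"sync: ${sync_cost:.2f}, batch: ${batch_cost:.2f}")
# prints "sync: $160.00, batch: $80.00"
```

Because classification prompts are short and outputs are tiny, the input side usually dominates, so trimming the system prompt helps on top of the batch discount.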

The Batch Paradigm: Synchronous vs. Asynchronous

Traditional synchronous APIs make the caller wait for an HTTP response to every request. Under the Batch API pattern:

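You bundle your prompts into a single .jsonl (JSON Lines) payload file. A single batch accepts up to 50,000 requests and an input file of up to 200 MB, so larger workloads must be split across several files.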
You upload the payload file to OpenAI's files endpoint.
You trigger a Batch Job referencing that file ID.
OpenAI processes the batch asynchronously in a background queue within the 24-hour completion window (often finishing much sooner, though that is not guaranteed).
You download the output JSONL file containing all classified results.

Step 1: Compiling the JSONL Batch File

Each line in the JSONL batch file must represent an independent request with a unique custom_id so we can map results back to our database records:

import json
import pandas as pd

def compile_sentiment_batch_file(comments_csv: str, output_jsonl: str):
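    df = pd.read_csv(comments_csv)
    # Skip rows with no text: a NaN value would be written as invalid JSON
    df = df.dropna(subset=['comment_text'])
    
    with open(output_jsonl, 'w', encoding='utf-8') as f:
        for _, row in df.iterrows():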
            request_payload = {
                "custom_id": f"comment-{row['id']}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "temperature": 0.0,
                    "response_format": {"type": "json_object"},
                    "messages": [
                        {"role": "system", "content": "You are a movie sentiment analyst. Classify the comment as POSITIVE, NEGATIVE, or NEUTRAL. Return JSON: {\"sentiment\": \"value\"}"},
                        {"role": "user", "content": row['comment_text']}
                    ]
                }
            }
            f.write(json.dumps(request_payload) + "\n")
            
    print(f"Successfully compiled {len(df)} prompts into {output_jsonl}")
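Since a single batch is capped on request count (50,000 per batch at the time of writing; check the current documentation), a compiled file with millions of lines has to be split before upload. The following is a minimal sketch of one way to do that. The helper name `split_batch_file`, the chunk file naming, and the `MAX_REQUESTS_PER_BATCH` constant are illustrative assumptions. It also rejects duplicate custom_id values, and it does not check the separate file-size limit.

```python
import json
from pathlib import Path

# Assumption: documented per-batch request cap; verify against current docs
MAX_REQUESTS_PER_BATCH = 50_000


def split_batch_file(input_jsonl, output_dir, max_requests=MAX_REQUESTS_PER_BATCH):
    """Split a JSONL request file into chunk files of at most max_requests lines.

    Raises ValueError if two requests share the same custom_id.
    Returns the list of chunk file paths in order.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    seen_ids = set()
    chunk_paths = []
    chunk_file = None
    lines_in_chunk = 0
    try:
        with open(input_jsonl, "r", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                custom_id = json.loads(line)["custom_id"]
                if custom_id in seen_ids:
                    raise ValueError(f"Duplicate custom_id: {custom_id}")
                seen_ids.add(custom_id)
                # Start a new chunk file when none is open or the current one is full
                if chunk_file is None or lines_in_chunk >= max_requests:
                    if chunk_file is not None:
                        chunk_file.close()
                    path = out_dir / f"batch_part_{len(chunk_paths):04d}.jsonl"
                    chunk_file = open(path, "w", encoding="utf-8")
                    chunk_paths.append(path)
                    lines_in_chunk = 0
                chunk_file.write(line if line.endswith("\n") else line + "\n")
                lines_in_chunk += 1
    finally:
        if chunk_file is not None:
            chunk_file.close()
    return chunk_paths
```

Each returned path can then be uploaded and submitted as its own batch job, and the results merged afterwards by custom_id.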

Step 2: Uploading and Executing the Batch Job

Next, we write a Python script that uploads the compiled JSONL file and enqueues the batch job using the official OpenAI client:

from openai import OpenAI

def execute_openai_batch(file_path: str):
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 1. Upload the JSONL file to OpenAI
    print("Uploading file to OpenAI...")
    # Open the file in a context manager so the handle is closed after upload
    with open(file_path, "rb") as f:
        upload_response = client.files.create(
            file=f,
            purpose="batch"
        )
    file_id = upload_response.id
    print(f"Uploaded successfully. File ID: {file_id}")

    # 2. Trigger the Batch execution job
    print("Triggering batch job...")
    batch_job = client.batches.create(
        input_file_id=file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    print(f"Batch Job enqueued successfully. Job ID: {batch_job.id}")
    return batch_job.id

Step 3: Monitoring Batch Execution

To track execution, we can write a simple shell script (e.g. openai_batch_checker.sh) to query the job status programmatically until it finishes:

#!/usr/bin/env bash
# openai_batch_checker.sh

BATCH_ID=$1
if [ -z "$BATCH_ID" ]; then
    echo "Usage: ./openai_batch_checker.sh <batch_id>"
    exit 1
fi

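# Requires curl, jq, and OPENAI_API_KEY set in the environment
API_BASE="https://api.openai.com/v1"

while true; do
    # Fetch the batch object once per poll from the REST API
    BATCH_JSON=$(curl -sS "$API_BASE/batches/$BATCH_ID" \
        -H "Authorization: Bearer $OPENAI_API_KEY")
    STATUS=$(echo "$BATCH_JSON" | jq -r '.status')
    echo "Current Batch Status: $STATUS"
    
    if [ "$STATUS" == "completed" ]; then
        OUTPUT_FILE_ID=$(echo "$BATCH_JSON" | jq -r '.output_file_id')
        echo "Batch finished! Output File ID: $OUTPUT_FILE_ID"
        # Download results: the /content path returns the JSONL itself, not file metadata
        curl -sS "$API_BASE/files/$OUTPUT_FILE_ID/content" \
            -H "Authorization: Bearer $OPENAI_API_KEY" > batch_results.jsonl
        break
    elif [ "$STATUS" == "failed" ] || [ "$STATUS" == "expired" ] || [ "$STATUS" == "cancelled" ]; then
        echo "Batch ended without completing (status: $STATUS)."
        exit 1
    elif [ -z "$STATUS" ] || [ "$STATUS" == "null" ]; then
        # No status field usually means an API error (bad key, unknown batch ID)
        echo "Could not read batch status: $BATCH_JSON"
        exit 1
    fi
    
    # Poll status every 30 seconds
    sleep 30
done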
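Once batch_results.jsonl is on disk, each line has to be mapped back to a database record. The sketch below assumes each output line carries the original `custom_id`, a `response` object with `status_code` and `body` (a normal chat completion), and an `error` field, which is how I understand the Batch output format. It also assumes the `comment-` prefix and the `{"sentiment": ...}` schema used in Step 1. The function name `parse_batch_results` is hypothetical.

```python
import json

VALID_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def parse_batch_results(lines):
    """Map record IDs to sentiment labels from Batch API output lines.

    Returns (results, failures): results is {record_id: label}, failures is
    a list of custom_ids whose request errored or returned unusable output.
    """
    results, failures = {}, []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        custom_id = record.get("custom_id")
        response = record.get("response") or {}
        # Treat per-request errors and non-200 responses as failures to retry
        if not isinstance(custom_id, str) or record.get("error") \
                or response.get("status_code") != 200:
            failures.append(custom_id)
            continue
        try:
            content = response["body"]["choices"][0]["message"]["content"]
            label = json.loads(content)["sentiment"]
            if label not in VALID_LABELS:
                raise ValueError(f"unexpected label: {label!r}")
        except (KeyError, IndexError, TypeError, ValueError):
            failures.append(custom_id)
            continue
        # Strip the prefix added in Step 1 to recover the database ID
        results[custom_id.removeprefix("comment-")] = label
    return results, failures


# Small in-memory example shaped like two output lines
sample = [
    json.dumps({"custom_id": "comment-17", "error": None, "response": {
        "status_code": 200,
        "body": {"choices": [{"message": {"content": '{"sentiment": "POSITIVE"}'}}]}}}),
    json.dumps({"custom_id": "comment-18", "error": None, "response": {
        "status_code": 500, "body": {}}}),
]
results, failures = parse_batch_results(sample)
print(results, failures)
# prints "{'17': 'POSITIVE'} ['comment-18']"
```

In practice you would pass an open file handle instead of the sample list, since the function only needs an iterable of lines. The IDs collected in failures can be recompiled into a smaller follow-up batch. Note that `str.removeprefix` needs Python 3.9 or later.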

Key Batch API Cost Takeaways

50% Cost Savings: Batch requests are billed at half the price of the equivalent synchronous endpoints.
Separate Rate Limits: Batch jobs draw on their own quota (a per-model cap on enqueued tokens) rather than your synchronous rate limits, so background classification does not throttle your production web servers.
Ensure Idempotency: Map each request in your JSONL payload to a unique database primary key via the custom_id property. Output lines are not guaranteed to come back in input order, so join on custom_id when importing, and key your writes on it so that re-importing a results file is harmless.

By moving large background text classification jobs from synchronous endpoints to the OpenAI Batch API, you can cut your model spend roughly in half while keeping bulk workloads off your real-time rate limits.