
OpenAI’s New Flex Processing: A Cost-Effective Way to Handle AI Tasks

April 18, 2025

OpenAI is stepping up its game with a fresh offering called Flex processing. If you’re looking to save on AI costs, this might just be the thing for you. Flex is all about making AI more affordable, even if it means a bit of a trade-off with slower response times and occasional hiccups in availability. But let’s face it, not every task needs lightning speed, right?

Currently in beta, Flex is being tested with OpenAI’s o3 and o4-mini reasoning models. It’s aimed at lower-priority, non-urgent tasks—think model evaluations, data enrichment, and asynchronous workloads. By opting for Flex, you can cut your API costs in half. That’s a pretty sweet deal if speed isn’t your top concern.
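In practice, opting into Flex is a per-request choice. Here’s a minimal Python sketch assuming the official `openai` client and its `service_tier="flex"` parameter; the model name and prompt are placeholders, and the generous timeout reflects that Flex jobs can queue longer than standard ones.

```python
# Sketch: flagging a non-urgent request for Flex processing.
# Assumes the OpenAI Python client, where service_tier="flex"
# opts a request into the cheaper, slower tier.

def build_flex_request(model: str, prompt: str) -> dict:
    """Assemble keyword arguments for a low-priority Flex call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Flex trades latency for a ~50% price cut:
        "service_tier": "flex",
        # Flex requests may sit in a queue; allow a long timeout.
        "timeout": 900.0,
    }

# Usage (requires an API key; shown for illustration only):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     **build_flex_request("o3", "Enrich this product record: ...")
# )
```

Because availability isn’t guaranteed on Flex, production code would also want a retry-or-fallback path to the standard tier for requests that fail or time out.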

Let’s dive into the numbers a bit. For the o3 model, Flex pricing is set at $5 per million input tokens (that’s roughly 750,000 words) and $20 per million output tokens. Compare that to the standard rates of $10 and $40, and you can see the savings. The o4-mini model follows suit, with Flex costing $0.55 per million input tokens and $2.20 per million output tokens, down from $1.10 and $4.40.
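To make those rates concrete, here’s a small calculation using the per-million-token prices quoted above (the batch-job token counts are made up for illustration):

```python
# Per-million-token rates in USD (input, output), from the
# pricing quoted in the article.
RATES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}

def job_cost(model: str, tier: str, input_toks: int, output_toks: int) -> float:
    """Dollar cost of a job with the given token counts."""
    in_rate, out_rate = RATES[model][tier]
    return (input_toks * in_rate + output_toks * out_rate) / 1_000_000

# A hypothetical batch job: 2M input tokens, 500K output tokens on o3.
standard = job_cost("o3", "standard", 2_000_000, 500_000)  # $40.00
flexible = job_cost("o3", "flex", 2_000_000, 500_000)      # $20.00
```

Since Flex halves both the input and output rates, the total for any job is exactly half the standard price, whatever the input/output mix.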

This launch comes at a time when AI costs are on the rise, and competition is heating up. Google recently rolled out Gemini 2.5 Flash, a reasoning model that’s giving others a run for their money with its lower cost per input token.

OpenAI is also tightening up its security measures. Developers in tiers 1–3 of its usage hierarchy will need to complete a new ID verification process to access the o3 model. Verification also gates advanced features like reasoning summaries and streaming API support, ensuring only verified users can tap into them.

In essence, OpenAI’s Flex processing is a smart move to adapt to the growing demands and costs in the AI world. It offers a practical solution for those willing to trade a bit of speed for significant cost savings.
