On Thursday, OpenAI launched its latest model, GPT-5.3-Codex-Spark, on Cerebras Systems’ AI hardware, aiming to speed up its Codex code assistant. The deployment runs on Cerebras’ SRAM-based CS-3 accelerators and reaches output speeds of over 1,000 tokens per second. Just last month, OpenAI signed a $10 billion agreement with Cerebras to substantially expand its deployment of the company’s AI silicon.
Cerebras’ advantage lies in its wafer-scale architecture, which keeps model weights in ultra-fast on-chip memory (SRAM) — an approach the company claims is significantly faster than Nvidia’s newly introduced Rubin GPUs. For OpenAI, the result is markedly higher generation speed without a stated loss in code quality. Despite the impressive speed, OpenAI has not disclosed details such as the Spark model’s parameter count, in contrast to previous open releases like gpt-oss.
GPT-5.3-Codex-Spark’s 128,000-token context window lets it deliver prompt responses while handling substantial amounts of code during generation and editing. At such high output speeds, long agentic coding sessions can exhaust that window quickly, so the model is designed to perform targeted edits efficiently and to avoid unnecessary debug runs unless instructed otherwise.
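As a rough illustration of the budgeting involved, here is a minimal sketch of checking whether a session’s accumulated text still fits in a 128,000-token window. The ~4-characters-per-token ratio is a common rule of thumb, not a figure from the article, and the reserve size is an arbitrary example.

```python
CONTEXT_WINDOW = 128_000  # tokens, per the announcement
CHARS_PER_TOKEN = 4       # heuristic estimate, not an official tokenizer ratio


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(history: list[str], reserve_for_output: int = 4_000) -> bool:
    """True if the session history plus an output budget fits the window."""
    used = sum(estimate_tokens(chunk) for chunk in history)
    return used + reserve_for_output <= CONTEXT_WINDOW
```

In practice, an agent that tracks usage this way can decide when to summarize or truncate older turns rather than overrun the window mid-edit.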
While GPUs remain at the heart of OpenAI’s general compute strategy due to their cost-effectiveness, Cerebras systems are particularly suited to latency-sensitive workloads. As Cerebras scales up its capacity, OpenAI plans to adapt its larger models for the platform, extending this high-speed inference option.
OpenAI has made GPT-5.3-Codex-Spark available in preview for Codex Pro users and select partners via API.
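For partners with API access, a request would presumably look like any other OpenAI model call. The sketch below assembles a Chat Completions payload using only the standard library; the model identifier "gpt-5.3-codex-spark" is assumed from the announcement, the exact endpoint for the preview is not specified in the article, and sending the request requires a valid key in `OPENAI_API_KEY`.

```python
import json
import os
import urllib.request

# Assumed endpoint; the preview may instead use the Responses API.
API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(prompt: str, model: str = "gpt-5.3-codex-spark") -> dict:
    """Assemble the JSON payload for a single code-editing prompt."""
    return {
        "model": model,  # model name taken from the announcement, unverified
        "messages": [{"role": "user", "content": prompt}],
    }


def send_request(payload: dict) -> dict:
    """POST the payload; only works with a valid OPENAI_API_KEY set."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("Rename the variable `tmp` to `buffer` in utils.py")
```

Separating payload construction from transport keeps the request shape easy to inspect and test without making a network call.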