Evaluating Claude and Codex
Over the past few weeks, I’ve been working on a moderately sized PHP Laravel project originally built by contractors. My focus has been on cleaning up the codebase and building new functionality. Along the way, I used both Claude Code (3.7 Sonnet) and Codex CLI (o4-mini) to implement the changes.
This post compares the two based on my experience and explains which one I’ll be sticking with going forward.
Claude vs. Codex
Maturity
Claude feels more mature than Codex. While both offer similar features, Codex still runs into edge-case issues that Claude seems to avoid (this might change in the future, of course).
Examples:
- Codex frequently breaks due to API rate limits (200k tokens/minute). #88
- It crashes unexpectedly, like when referencing missing files (this happened a few times when I asked it to check screenshots). #382
I didn’t encounter such issues with Claude, even though I used it more heavily. That consistency makes Claude a more stable experience overall. It’s really annoying when Codex breaks halfway through a big multi-file refactor.
Coding Performance
When Codex does work, it performs similarly to Claude. I implemented equivalent features using both, and their code suggestions and analysis quality were often comparable.
This is backed by benchmarks too:
Interestingly, o4-mini even ranks higher than Claude 3.7 Sonnet in some of these tests.
Cost
Given how close their performance is, cost becomes a deciding factor for me. And Codex wins here, by far.
In my case:
- Claude was costing around €20/day.
- Codex was less than half of that (not counting OpenAI’s free daily credits).
Token cost comparison:
- Claude 3.7 Sonnet: $15 per 1M output tokens
- Codex o4-mini: $4.40 per 1M output tokens
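To put those two prices side by side, here is a rough sketch of the arithmetic. It only uses the output-token prices listed above; input tokens are ignored, and the 2M-token workload is a made-up number for illustration, not something I measured.

```python
# Output-token prices quoted above, in USD per 1M tokens.
PRICE_PER_MILLION_OUTPUT = {
    "claude-3.7-sonnet": 15.00,
    "o4-mini": 4.40,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Return the USD cost of generating `output_tokens` tokens with `model`."""
    return PRICE_PER_MILLION_OUTPUT[model] * output_tokens / 1_000_000


# Hypothetical workload: 2M output tokens in a day (an assumption, not a measurement).
tokens = 2_000_000
claude = output_cost("claude-3.7-sonnet", tokens)
codex = output_cost("o4-mini", tokens)
ratio = claude / codex

print(f"Claude: ${claude:.2f}, Codex: ${codex:.2f}, ratio: {ratio:.2f}x")
```

On output tokens alone, Claude comes out at roughly 3.4 times the price of o4-mini. My actual bills also depend on input tokens and on how much I used each tool, so treat this as a ballpark rather than an explanation of the daily totals.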
Conclusion
Between the two, Claude offers a slightly smoother and more stable experience. But Codex is catching up fast, and when the bugs are resolved, I’ll probably switch to it full-time. The cost savings are too significant to ignore.