
OpenAI’s GPT-5 Faces Early Criticism Over Logic Flaws


OpenAI’s latest large language model, GPT-5, has received a mixed reception following its release, Cointelegraph reports. While promoted by CEO Sam Altman as a major step forward, the model’s early public performance fell short of expectations.

Early User Feedback and Market Reaction

Soon after launch, social media posts highlighted instances of GPT-5 providing incorrect answers, including basic numerical errors and flawed logical reasoning. The prediction market Polymarket saw OpenAI’s odds of having the “best model by end of August” drop from 75% to 8%, later recovering to 24%.

Industry commentary was divided. The New Yorker wrote that GPT-5 “is the latest product to suggest that progress on large language models has stalled”, reviving discussion on the scalability limits of such systems. AI researcher Gary Marcus argued that the results undermine the notion that scaling alone can lead to artificial general intelligence.

Technical Rollout Challenges

GPT-5 introduced a routing system designed to switch between models of varying capability, such as GPT-5, GPT-5 mini and GPT-5 nano, to balance performance and operating costs. However, at launch this feature did not function as intended, potentially contributing to underwhelming benchmark results.
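OpenAI has not published the router’s internals, so the following is only an illustrative sketch of what tiered routing between model variants might look like. The complexity heuristic, thresholds, and function names are invented for illustration; only the model-tier names come from the article.

```python
# Hypothetical sketch of capability-based routing between GPT-5 tiers.
# The heuristic and thresholds below are assumptions, not OpenAI's logic.

def estimate_complexity(prompt: str) -> int:
    """Crude proxy: longer prompts and reasoning keywords score higher."""
    score = len(prompt.split())
    for keyword in ("prove", "step by step", "analyze", "derive"):
        if keyword in prompt.lower():
            score += 50  # treat reasoning cues as a strong signal
    return score

def route(prompt: str) -> str:
    """Pick the cheapest model tier judged adequate for the prompt."""
    score = estimate_complexity(prompt)
    if score < 20:
        return "gpt-5-nano"   # short, simple queries
    if score < 60:
        return "gpt-5-mini"   # moderate requests
    return "gpt-5"            # complex reasoning tasks

if __name__ == "__main__":
    print(route("What time is it in Tokyo?"))
    print(route("Analyze this contract step by step and derive the risks."))
```

The point of such a design is cost control: most traffic can be served by a cheap tier, while an estimator escalates only demanding prompts to the full model. A misbehaving estimator, as reportedly happened at launch, would send hard prompts to weak tiers and degrade perceived quality.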

OpenAI has since added the option for users to manually select between “auto”, “fast” and “thinking” modes. Paid subscribers also regained access to GPT-4o, with Altman pledging to provide notice before any future retirement of the model.

Social Media Experiment Findings

Separate research from Cornell University and the University of Amsterdam examined potential algorithmic changes to reduce division on social media platforms. AI-driven simulations indicated that chronological feeds lowered attention inequality but increased exposure to extreme content, while “bridging algorithms” reduced partisanship but raised attention inequality. Enhancing viewpoint diversity had no measurable effect.

AI Risk Debate

Ethereum founder Vitalik Buterin endorsed the book If Anyone Builds It, Everyone Dies, by Eliezer Yudkowsky and Nate Soares, which argues that superintelligent AI would inevitably develop goals misaligned with human interests and could not be contained. Buterin described the work as useful for understanding why some technologists view advanced AI as uniquely high risk.

Performance Improvements and Medical Applications

Despite initial criticism, GPT-5 has shown measurable gains over previous models. It produces fewer factual errors, demonstrates stronger reasoning, is less susceptible to jailbreak exploits and scores higher on formal benchmarks. Contrary to early reports of weak results, GPT-5 Pro achieved a score of 148 on the Mensa Norway test, with GPT-5 scoring 120 and GPT-5 Pro (Vision) scoring 138.

In medical testing, GPT-5 outperformed human doctors in reasoning (by 24.23%) and understanding (by 29.4%). Professor Derya Unutmaz of the Jackson Laboratory for Genomic Medicine reported significant research benefits when using the GPT-5 thinking mode in a month-long project on engineered cells for lymphoma treatment. The model accurately predicted experimental results and proposed refinements, reducing projected timelines from years to weeks.

Unutmaz described the implications for scientific research as transformative, enabling faster and more targeted discovery.

Kursiv also reports that OpenAI CEO Sam Altman has raised the alarm about a major gap in legal privacy protections.