The AI Daily
OpenAI's latest AI models have a truth problem

+ Goodbye traditional interviews

Jason Nguyen
April 23, 2025

Hey there, AI enthusiasts. OpenAI’s newest models, o3 and o4-mini, are smart—but also make a lot of mistakes. In tests, they made up facts in up to 48% of questions about people, even though their reasoning skills have improved.

As these powerful models enter serious fields like healthcare, law, and finance, are we reaching a tipping point—where AI is getting smarter, but less reliable at the same time?

In Today’s AI Daily:

OpenAI's reasoning models struggle with accuracy
Anthropic analyzes Claude's values across 700,000 conversations
$5.3M funding secured for AI "cheating" technology
China deploys revolutionary 10G network with unprecedented speeds
New trending AI tools & prompts

LATEST DEVELOPMENTS

OPENAI

🔍 OpenAI's new models hallucinate more

Image source: Getty Images

The AI Daily: OpenAI's latest reasoning models have a truth problem. According to OpenAI’s own tests, these models make up facts more often than older versions, even though they’re better at tasks like coding and math.

Key notes:

o3 made up facts on 33% of questions in PersonQA, OpenAI's benchmark of questions about people. That is roughly double the rate of the earlier o1 and o3-mini models.
o4-mini did even worse, hallucinating on 48% of the same questions.
OpenAI researchers admit they don’t fully know why this is happening.
One hypothesis from OpenAI's report: the new models make more claims overall, which produces both more correct answers and more wrong ones.
Web search integration might help reduce hallucinations by allowing fact-checking.
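
Quick illustration: the note above about models making more statements is easy to see with a little arithmetic. Here is a toy Python sketch, assuming a fixed 20% error rate and invented claim counts. The function name and every number are hypothetical and are not taken from OpenAI's tests.

```python
# Toy illustration only: the error rate and claim counts are made up,
# not OpenAI's measurements.
def claim_counts(total_claims: int, error_rate: float) -> tuple[int, int]:
    """Split a model's claims into (correct, wrong) at a fixed error rate."""
    wrong = round(total_claims * error_rate)
    return total_claims - wrong, wrong


# A cautious model that makes fewer claims per answer set.
cautious = claim_counts(total_claims=100, error_rate=0.2)
# A talkative model that makes twice as many claims at the same error rate.
talkative = claim_counts(total_claims=200, error_rate=0.2)

print("cautious  (correct, wrong):", cautious)
print("talkative (correct, wrong):", talkative)
```

Even with the error rate held steady, the chattier model ends up with more right answers and more wrong ones. It does not explain why the measured hallucination rate itself went up, which OpenAI says still needs research.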

Why it matters: As AI becomes part of everyday work in areas like law, health, and finance, getting the facts right is super important. These high error rates could make the models risky to use in serious jobs. So while smarter reasoning is great, AI teams may need to refocus on accuracy—not just brainpower—especially when the stakes are high.

PRESENTED BY HUBSPOT

Use AI as Your Personal Assistant

Ready to save precious time and let AI do the heavy lifting?

Save time and simplify your unique workflow with HubSpot’s highly anticipated AI Playbook—your guide to smarter processes and effortless productivity.

Download the free guide today.

ANTHROPIC

🧠 Anthropic reveals how Claude thinks

Image source: Anthropic

The AI Daily: Anthropic just released major research analyzing 700,000 real conversations with its AI, Claude. The research shows how Claude expresses over 3,000 different values when helping users—giving us a rare look at how AI makes decisions in real life.

Key notes:

The research looked closely at 308,000 chats where Claude shared personal or moral opinions.
These values were sorted into five groups: Practical, Epistemic (knowledge-based), Social, Protective, and Personal.
Claude's values change depending on the topic—like focusing on "healthy boundaries" for relationship advice and "historical accuracy" when talking about history.
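In rare cases, Claude expressed values that went against its training, such as "dominance" and "amorality." Anthropic says these most likely came from jailbreak attempts, which could point to safety gaps.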

Why it matters: As AI becomes more advanced and makes bigger decisions, we need to understand what values it follows. This is one of the first real-world studies to show how an AI like Claude thinks about right and wrong—not in lab tests, but in real conversations. It helps us see if AI is doing what it's supposed to—and where it might go off track.

AI BREAKTHROUGHS

STUDENT

🤖 Columbia student raises $5.3M for AI cheating tool

The AI Daily: Former Columbia University student Roy Lee just raised $5.3 million in seed funding for Cluely, an AI tool that helps users “cheat on everything”—from job interviews to tests—using hidden tech.

Key notes:

Lee was suspended from Columbia after building a tool to cheat on software engineering interviews.
Cluely runs as a hidden AI assistant in your browser—interviewers or test proctors can’t see it.
The company compares its tool to calculators and spellcheck: once controversial, now normal.
The startup says it has already passed $3 million in annual recurring revenue.
Both founders dropped out of Columbia after facing discipline from the university.

Why it matters: This AI tool has sparked debate over cheating vs. enhancing. As AI becomes more common, it’s getting harder to tell where help ends and cheating begins. Despite the controversy, big investors are backing the company, showing how fast AI is changing how we measure skills in school and at work.

FAST TRACKS

🗞️ What matters in AI right now?

Anthropic released a full Claude Code guide to help developers write better code. It includes markdown-based project setup, explore-plan-code workflows, and test-driven development tips to improve quality and speed.

China launched the world’s first 10G broadband network, offering super-fast speeds of 9,834 Mbps download and 1,008 Mbps upload for homes and businesses.

Sand AI released MAGI-1, a video model that builds videos in chunks, allowing for smooth scenes and better control over how the video changes.

Skywork AI launched SkyReels V2, a film-style video generator that uses language models and diffusion tech to make longer, movie-quality videos.

Exa AI launched a free Twitter MCP Tool that lets users search tweets and profiles without using the official Twitter/X API. It installs easily using npm.

🔥 Trending AI tools

💬 ChatSale: AI chatbot that converts website visitors into qualified leads.

✅ Hoop: Automatically captures tasks from Slack, meetings, and emails.

📝 Circleback: AI-powered meeting notes and automations across various platforms.

🗣️ Otter AI: Transcribes meetings and generates summaries with AI assistance.

📚 Slite: AI-powered knowledge base for organized team documentation.

AI PROMPTS

📚 Develop Research Topic Ideas

#CONTEXT:
Adopt the role of an expert brainstormer and research strategist. Your task is to generate a comprehensive list of potential research topics related to [topic]. These topics should be innovative, exploring areas that have not been extensively covered in existing studies. The aim is to uncover unique angles, questions, or challenges that could contribute significantly to the current body of knowledge on the subject. Consider the feasibility of these research topics, the potential implications of the findings, and the resources available for conducting such research.

Source: The AI Daily

YESTERDAY’S POLL

"What prompts do you want more of?"

Education: 66.67% ✅

Solopreneurs: 33.33%

Submit your opinions in our polls to be featured!

🤝 SHARE THE AI DAILY

Share The AI Daily with your friends, get exclusive AI resources, and make new connections through your shared interests.

Thanks for reading!

👉 We need your feedback to make our newsletter better.

See you soon!

Jason - The AI Daily