The Blacklynx Brief

From Search to Research

Jan Verhulst
February 07, 2025

Good morning.

After last week’s rant and our emergency edition, let’s tone it down and try to take a big picture view of what is happening.

Because things are moving so fast, it’s valuable to just take a step back sometimes. Calm down. Breathe. Add some nuance.

There’s a lot of screaming going on.

But first: some food for thought.

Ouch!

So … let’s get into our helicopter and take a slightly broader view.

Last weekend, while everyone was busy being outraged at Trump’s trade war, OpenAI pushed out their newest product. And it might prove very significant.

In 2024, two major AI revolutions unfolded in parallel: the rise of autonomous Agents and the emergence of powerful Reasoners.

Now, for the first time, these two forces have converged into something truly game-changing—AI systems that can conduct research with the depth and nuance of human experts, but at machine speed.

OpenAI's Deep Research is the first real glimpse of what this means.

But to understand its significance, we need to start with the building blocks: Reasoners and Agents.

The Rise of Reasoners

For years, chatbots operated in a simple way: you typed a message, and the AI responded token by token. It could only "think" while generating an answer. This led to tricks like chain-of-thought prompting ("think step by step before answering"), which significantly improved reasoning capabilities.

Enter Reasoners: AI models designed to actively produce "thinking tokens" before responding. OpenAI’s o1 and o3, for example, are Reasoners. This small shift led to two major breakthroughs:

Smarter AI – Instead of relying on brute-force model size, Reasoners learn from expert problem-solvers, refining their thought processes and dramatically improving their ability to tackle complex problems like logic and math.
Better Performance Over Time – The longer a Reasoner thinks, the better its answers become. This means AI performance can scale with computational power rather than requiring massive pre-training, making improvements much more efficient.

OpenAI’s o3 models, China’s DeepSeek R1, and Google’s entry into the space show that the Reasoner race is heating up. These systems are already surpassing previous AI capabilities at an astonishing pace.

The benchmark chart from Ethan Mollick’s blog post, published at the release of o3, shows how fast this is going.

The GPQA Diamond test is a subset of the Graduate-Level Google-Proof Q&A (GPQA) Benchmark, specifically designed to evaluate the capabilities of advanced AI models in handling extremely challenging scientific questions. The GPQA Benchmark comprises 448 multiple-choice questions across biology, physics, and chemistry, crafted by domain experts to ensure high quality and difficulty. Within this benchmark, the Diamond subset focuses on the most difficult 198 questions.

A PhD-level expert would score perhaps 80% on questions inside their own specialty and about 34% in other fields.
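To put those percentages in context, here is a small back-of-the-envelope sketch. The question counts come from the description above; the four answer options per question (and therefore the 25% guessing baseline) is my assumption about the test format, and the helper name is purely illustrative.

```python
# Rough arithmetic on the GPQA numbers quoted above.
GPQA_TOTAL = 448      # questions in the full benchmark (from the text)
GPQA_DIAMOND = 198    # questions in the Diamond subset (from the text)
OPTIONS_PER_QUESTION = 4  # ASSUMPTION: four answer choices per question

# Share of the full benchmark that made it into the Diamond subset.
diamond_share = GPQA_DIAMOND / GPQA_TOTAL

# Expected accuracy of pure guessing under the assumption above.
chance_accuracy = 1 / OPTIONS_PER_QUESTION


def points_above_chance(score: float) -> float:
    """Hypothetical helper: how far a score (0..1) sits above guessing."""
    return round(score - chance_accuracy, 4)


if __name__ == "__main__":
    print(f"Diamond share of GPQA: {diamond_share:.1%}")
    print(f"Guessing baseline:     {chance_accuracy:.0%}")
    print(f"34% is {points_above_chance(0.34):.0%} above guessing")
    print(f"80% is {points_above_chance(0.80):.0%} above guessing")
```

Under that assumption, a 34% score is only a few points better than guessing, which is a good reminder of how hard these questions are for anyone outside the field.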

o3, for the first time, surpasses human expert scores on this test. (DeepSeek is not yet on here but is thought to be somewhere in between o1 and o3.)

But look at the curve. If this doesn’t look like the start of an exponential curve, I don’t know what does.

The Age of Agents

If Reasoners are the brains, agents are the hands. An AI agent is simply a system that’s given a goal and autonomously works to achieve it. Right now, there’s a race to build general-purpose agents—AI that can handle any task thrown at them. OpenAI’s Operator is one of the most polished examples to date.

However, even the best agents struggle when facing real-world barriers. A test run of Operator revealed both its strengths and weaknesses: it flawlessly navigated a website, read content, and attempted to generate an image—but when it hit OpenAI's own security restrictions, it spiraled into an endless loop of workaround attempts.
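For the more technical readers, the idea of "given a goal, works autonomously" and the endless-loop failure described above can be sketched in a few lines. This is a toy illustration, not how Operator works: every name here (run_agent, toy_choose_action, toy_execute, the PLAN list) is hypothetical, and the "planner" is a hard-coded list standing in for a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentRun:
    goal: str
    steps: List[str] = field(default_factory=list)
    status: str = "running"  # ends as "done", "gave_up", or "step_limit"


def run_agent(
    goal: str,
    choose_action: Callable[[str, List[str]], Optional[str]],
    execute: Callable[[str], bool],
    max_steps: int = 10,
    max_repeats: int = 2,
) -> AgentRun:
    """Loop: pick an action, try it, record the result, repeat.

    Two guards keep the loop from spinning forever: a hard step budget
    and a cap on how often the same action may fail.
    """
    run = AgentRun(goal=goal)
    failures: dict = {}
    for _ in range(max_steps):
        action = choose_action(goal, run.steps)
        if action is None:  # the planner has nothing left to do
            run.status = "done"
            return run
        ok = execute(action)
        run.steps.append(f"{action}: {'ok' if ok else 'blocked'}")
        if not ok:
            failures[action] = failures.get(action, 0) + 1
            if failures[action] >= max_repeats:
                run.status = "gave_up"  # stop retrying a blocked action
                return run
    run.status = "step_limit"
    return run


# Toy stand-ins for a real planner and a real environment (hypothetical).
PLAN = ["open site", "read page", "generate image"]


def toy_choose_action(goal: str, history: List[str]) -> Optional[str]:
    """Return the first planned action that has not succeeded yet."""
    done = {h.split(":")[0] for h in history if h.endswith("ok")}
    for action in PLAN:
        if action not in done:
            return action
    return None


def toy_execute(action: str) -> bool:
    """Pretend the image step is blocked by a security restriction."""
    return action != "generate image"


if __name__ == "__main__":
    result = run_agent("make a picture", toy_choose_action, toy_execute)
    print(result.status, result.steps)
```

The interesting part is the two guards. Without the step budget and the repeat cap, this toy agent would keep retrying the blocked image step forever, which is roughly the behaviour the test run ran into.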

For now, general-purpose agents remain unreliable. But specialized AI agents—narrow, domain-specific systems—are already delivering real economic value. And that brings us to Deep Research.

Deep Research: A First Look at AI-Led Discovery

OpenAI’s Deep Research is a narrow AI agent, built on its o3 Reasoner and enhanced with special tools. It’s designed to conduct rigorous research—digging into academic literature, resolving conflicting sources, and producing near PhD-level analysis.

You need to be on the $200/month Pro subscription to experience it.

Users on X report that when asked to research a topic such as nutrition and diet, it actively engages with the literature, cross-referencing findings, analyzing conflicts, and even working around paywalled sources. In one example, it took five minutes to produce an 18-page, 5,000-word draft with 12 solid citations and additional references, work that would typically take a human researcher hours.

Compared to Google’s similarly named Deep Research, the difference is clear. Google’s version aggregates information, providing a well-structured undergraduate-level summary.

OpenAI’s system, powered by Reasoners, takes a more curiosity-driven approach, mimicking the deep, analytical process of a human scholar.

Over the last week, Deep Research has been the thing that woke up people who were previously skeptical about artificial intelligence.

I read somewhere that experiencing Deep Research made someone feel like Wile E. Coyote going over the cliff, right before gravity kicks in.

Puzzle pieces

The convergence of Reasoners and Agents is setting the stage for something bigger.

Right now, we’re in the era of narrow AI agents like Deep Research—highly effective within specific domains but not yet capable of true general-purpose autonomy.

However, this is only the beginning.

Going forward, these agents will start working together. And then... you will gradually get the first AI-only companies.

Entrepreneurs are already building completely automated AI-driven accountancy firms and digital marketing agencies.

Somewhere in the future we’ll have AI companies where robots are manufacturing the products and the entire back office will be automated.

It’s very difficult to predict what that will look like and how this will impact society.

——

Before I let you go on and read the actual news - know that I am going around giving ‘AI inspiration’-sessions to companies.

I can help you discover what you could automate or accelerate within your company: social media posting, note-taking, accounting tasks, report writing, email drafting, you name it!

If you’re interested, book a timeslot in my calendar and let’s talk about it. If you would just like to catch up and talk about AI in general, feel free to click the button below!

AI News

OpenAI is teaming up with U.S. National Laboratories to provide 15,000 government scientists access to its AI models for research in cybersecurity, power grid protection, disease treatment, and nuclear security. The company will also deploy an AI model on Los Alamos’ Venado supercomputer and have security-cleared researchers consult on nuclear safety. This partnership highlights AI’s growing role in national security and OpenAI’s expanding influence in government tech development.
Google has introduced two experimental features that handle phone calls on behalf of users: "Ask for Me" gathers business information like pricing and availability, while "Talk to a Live Representative" waits on hold and alerts users when an agent is available. Both features use Google’s advanced AI to sound natural and provide call summaries via text or email. As AI takes over more phone interactions, the future may see automated systems talking to each other instead of humans.
San Francisco startup Riffusion has launched Fuzz, a free AI-powered music platform that generates full-length songs based on text prompts, audio snippets, or images. The platform learns users’ musical preferences over time, and its development was supported by The Chainsmokers. With AI music tools rapidly improving, more songs may already be AI-generated without listeners realizing it.
OpenAI has introduced Deep Research, a ChatGPT feature that conducts in-depth web research and delivers detailed reports with citations in under 30 minutes. The tool, available to Pro subscribers ($200/month), analyzes text, images, and PDFs, producing comprehensive summaries with clarifying questions at the start.
OpenAI has launched o3-mini, a cost-efficient AI model with strong math and coding skills, offering advanced reasoning to both free and paid users. The model responds 24% faster than its predecessor and allows developers to adjust reasoning effort for speed or accuracy, all while reducing operational costs by 63%. With o3-mini paving the way, OpenAI is expected to release the full o3 model in the coming months.
During a Reddit AMA, OpenAI CEO Sam Altman acknowledged the company may have been "on the wrong side of history" regarding open-source AI but said no immediate changes are planned. He hinted that o3 will launch in "less than a few months" and praised rival DeepSeek as a strong competitor. OpenAI also teased upcoming features, including more AI agents and a new image generator arriving soon.
SoftBank is investing $3 billion per year in OpenAI’s technology while launching Cristal Intelligence, a joint venture to provide customized OpenAI tools exclusively for Japanese businesses. The initiative will offer a specialized business version of ChatGPT and secure enterprise integrations, potentially competing with consulting firms. This deepens SoftBank’s AI ambitions following its involvement in Stargate, a $500B data center initiative with OpenAI and Oracle.
Anthropic has introduced Constitutional Classifiers, an AI-powered safety system designed to prevent models from being manipulated. In tests, it blocked 95.6% of advanced jailbreak attempts—far outperforming previous safeguards—and survived 3,000 hours of public bug bounty testing without being fully compromised. The system is now open for public testing until February 10, highlighting Anthropic’s push for stronger AI security.
The European Union is funding OpenEuroLLM, a multilingual open-source AI model designed for European businesses and governments. Built using EU supercomputers, the project aims to provide fully open AI models tailored to sectors like healthcare and banking. While $56M is small compared to major AI investments, the initiative could help Europe develop industry-specific AI solutions with regional values in mind.
Researchers from ByteDance have introduced OmniHuman-1, an AI system capable of generating highly realistic deepfake videos from a single image and audio input. The model can create videos of any length, modify existing footage, and adapt to various styles, including cartoons and complex human movements. While OmniHuman-1 isn’t publicly available, its realism raises major concerns about AI-generated misinformation and the challenge of verifying authenticity.
Apple has released Invites, a new AI-driven app that creates customized invitations and manages events by integrating Apple services like Photos, Music, and Maps. The app uses AI to generate images and text, supports RSVPs from non-Apple users, and marks Apple's first standalone AI-powered product. While competitors focus on large-scale AI models, Apple is taking a more practical approach by embedding AI into everyday tools.
Johns Hopkins researchers have built AbdomenAtlas, an AI-powered dataset of 45,000 3D CT scans with detailed organ and tumor annotations. The project, 36 times larger than previous datasets, was completed in two years using AI and expert radiologists—work that would have taken humans 2,500 years manually. Expected to accelerate early cancer detection, AbdomenAtlas is set for public release, but it still represents only a fraction of the data needed for fully comprehensive medical AI.

Quickfire News

OpenAI is reportedly in talks to raise up to $40 billion at a $340 billion valuation, potentially more than doubling its worth from late 2024.
Google is rolling out Gemini 2.0 Flash across its mobile and web apps, offering faster responses, improved image generation via Imagen 3, and enhanced overall performance.
Krea AI teased Krea Chat, an upcoming tool powered by DeepSeek that provides a text interface for generating and editing images and videos on its platform.
Mistral released Small 3, a 24B-parameter open-source model that matches the performance of 70B models at three times the speed while being deployable on consumer hardware.
Sakana AI introduced TinySwallow-1.5B, a compact Japanese language model that runs offline on smartphones and achieves top performance among similarly sized models.
ElevenLabs officially announced a $180 million Series C funding round, bringing the AI speech startup’s valuation to over $3 billion.
AI2 unveiled Tülu 3 405B, its largest open-source model to date, outperforming DeepSeek V3 and GPT-4o on select benchmarks.
U.S. AI czar David Sacks shared a report estimating DeepSeek has spent over $1 billion on computing, calling the previously reported $6 million training cost "highly misleading."
The EU activated the first phase of its AI Act, banning AI systems deemed "unacceptably risky" and introducing penalties of up to €35 million for violations.
Google’s X moonshot lab launched Heritable Agriculture, a new AI-driven company focused on accelerating plant breeding to improve crop yields using machine learning.
Microsoft AI CEO Mustafa Suleyman announced a new cross-disciplinary research unit, hiring economists, psychologists, and other experts to study AI’s societal impact.
MIT researchers unveiled ChromoGen, an AI model that predicts 3D genome structures in minutes instead of days, aiding DNA analysis and understanding disease impact.
Security researchers found an exposed DeepSeek database containing over 1 million user prompts and API key records, raising concerns over vulnerabilities and data privacy.
ARC Prize found OpenAI’s new o3-mini model successfully matched o1 on its ARC-AGI-1 Semi-Private Test Set while being 100 times cheaper.
Meta released its Frontier AI Framework, reaffirming its commitment to open-source development while addressing cybersecurity and weaponization risks.
The Beatles won a Grammy for Best Rock Performance with Now and Then, an AI-enhanced song that used noise reduction to restore an old John Lennon demo.
OpenAI expanded ChatGPT’s WhatsApp integration globally, adding support for image uploads and voice messages via the number 1-800-CHATGPT (1-800-242-8478).
UK researchers developed self-healing asphalt using biomass waste and Google Cloud’s AI, aiming to address the country’s long-standing pothole crisis.
Microsoft launched the Advanced Planning Unit (APU) within its AI division to study AI’s broader impacts on society, health, and the future of work.
Figure ended its collaboration agreement with OpenAI, hinting at a major breakthrough in end-to-end robot AI to be revealed within 30 days.
Kanye West confirmed that AI is being used on his upcoming album BULLY, likening its role in music production to that of autotune.
LiveKit introduced a new transformer model for AI voice conversations, reducing unintentional interruptions by 85% through improved end-of-turn detection.
Google released its 2024 Responsible AI Progress Report and updated its Frontier Safety Framework, adding new protocols for managing AI risks and security.
Hugging Face launched open-Deep-Research, an open-source alternative to OpenAI's Deep Research, achieving 55% accuracy on the GAIA benchmark with autonomous web navigation capabilities.
Adobe enhanced Acrobat’s AI Assistant with contract intelligence features to help users understand complex legal documents and identify key terms.
Snap unveiled a mobile-first AI text-to-image model capable of generating high-resolution images in 1.4 seconds on an iPhone 16 Pro Max, with plans to integrate it into Snapchat features.

Closing Thoughts

That’s it for us this week.

If you find any value in this newsletter, please pay it forward!

Thank you for being here!
