AI agents and the 90% problem

Dec 5, 2025

Useful AI agents are still limited by their reliability on real-world tasks. But Chinese AI agents are pushing the limit.

5 Comments

Neural Foundry

Dec 5

The binary vs nonbinary task distinction you draw here is really sharp and explains so much about why agents haven't taken off yet. Most companies are throwing agents at nonbinary tasks, where they can be helpful but still need oversight, while users expect agents to nail binary tasks like booking flights or scheduling meetings, which demand near perfection. What's interesting is that China's willingness to ship imperfect agents might actually create a feedback loop where rapid iteration leads to faster improvement, even if the initial experience is rougher. Your radiologist example perfectly captures how AI can make experts more valuable rather than replacing them.

Hollis Robbins

Dec 8

Yes, figuring out which "last mile" problems to solve and which to ignore is the challenge ahead.

Synthetic Civilization

Dec 8

The binary vs nonbinary distinction is right, but the missing layer is governance. Most US apps aren’t designed to be “operated” by machine agents. The whole software stack assumes a human in the loop.

China is testing the opposite: OS-level frameworks where agents are first-class citizens. If the environment becomes machine-legible by default, the binary/nonbinary boundary shifts dramatically.

The agent revolution might not come from better agents but from re-architecting the environment to meet them halfway.

afra

Dec 17

I’ve heard a Hangzhou-based tech reviewer friend say some amazing things about the Doubao phone, so I’m really curious about it...

And this is such a comprehensive article, with so much clarity. Thank you for the piece, Kyle.

Rainbow Roxy

Dec 14

Hey, great read as always. You really hit the nail on the head with the gap between benchmark performance and real-world agentic adoption. I'm curious whether you think the 'mediocre performance' of agents like Comet is more about foundation models' limitations on complex, unstructured tasks, or whether it's an integration/user experience challenge that's harder to solve?