The binary vs nonbinary task distinction you draw here is really sharp and explains so much about why agents haven't taken off yet. Most companies are throwing agents at nonbinary tasks where they can be helpful but still need oversight, while users expect agents to nail binary tasks like booking flights or scheduling meetings, which demand near perfection. What's interesting is that China's willingness to ship imperfect agents might actually create a feedback loop where rapid iteration leads to faster improvement, even if the initial experience is rougher. Your radiologist example perfectly captures how AI can make experts more valuable rather than replacing them.
Yes, figuring out which "last mile" problems to solve and which to ignore is the challenge ahead.
The binary vs nonbinary distinction is right, but the missing layer is governance. Most US apps aren’t designed to be “operated” by machine agents. The whole software stack assumes a human in the loop.
China is testing the opposite: OS-level frameworks where agents are first-class citizens. If the environment becomes machine-legible by default, the binary/nonbinary boundary shifts dramatically.
The agent revolution might not come from better agents but from re-architecting the environment to meet them halfway.
I’ve heard a Hangzhou-based tech reviewer friend say some amazing things about the Doubao phone, so I’m really curious about it...
And this is such a comprehensive article, with so much clarity. Thank you for the piece, Kyle.
Hey, great read as always. You really hit the nail on the head with the gap between benchmark performance and real-world agentic adoption. I'm curious whether you think the 'mediocre performance' of agents like Comet is more about foundation-model limitations on complex, unstructured tasks, or whether it's an integration/user-experience challenge that's harder to solve?