By Meghna Sinha

The Era of Specialized AI - Part III

Part III: The Misplaced Urgency With AI Agents

Whether it is AI models or AI agents, the fundamentals are the same. The model or agent needs to be explainable, transparent, scalable, measurable and trackable, and tested for harmful risks and biases. What we have today is a handful of semi-transparent models with esoteric benchmarks that shed no light on how these models will perform on a unique set of use cases. There are no guarantees about how these models will scale in production or how they will behave over time as data and usage evolve. And minimal effort is put into testing for biases and risks. We are in a very early phase of model proliferation: these models, by themselves, are not going to lead to transformative changes and will certainly not lead to AGI any time soon.

What excites me most about having so many open-source and open-weight models is the ability to develop unified representations of businesses, operations, processes, and systems. Connecting data and models from various sources and modalities will be the real unlock in business applications. AI Agents are certainly part of this vision, but they are not really that useful without domain-specific integration.

It takes work and deep domain understanding to apply any AI model or agent to a business process at scale and demonstrate financial and operational impact. There is no shortcut to an intelligent system or agent scaling and adapting to a domain. For example, if you want an agent to handle the checkout system at a store, you need to give it time to learn and handle all the anomalies. Self-checkout is already an automated process, yet store employees routinely have to step in to approve something on the screen or fix user errors. If you want an autonomous AI agent working fully unassisted by humans, you will need to generate enormous amounts of synthetic data to test and fine-tune the model to operate within prescribed guardrails. AI scales on anomalies, not averages. This is not impossible, but it is also not feasible for every business, many of which already lack trained data scientists and machine learning engineers.
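
To make the testing point concrete, here is a minimal sketch of what evaluating an agent against synthetic anomaly cases could look like. It is my own illustration, not a method from this post: the `CheckoutCase` fields, the `checkout_agent` stand-in, and the anomaly rate are all hypothetical placeholders. The point is simply that performance should be reported on anomalies separately from routine cases, since an average hides exactly the failures that force humans to step in.

```python
# Hypothetical sketch: evaluating an agent on synthetic anomaly cases,
# reporting routine and anomaly accuracy separately.
import random
from dataclasses import dataclass

@dataclass
class CheckoutCase:
    description: str
    is_anomaly: bool        # e.g. age-restricted item, unscanned item, coupon misuse
    expected_action: str    # "complete" or "escalate_to_human"

def make_synthetic_cases(n: int, anomaly_rate: float = 0.3) -> list[CheckoutCase]:
    """Generate synthetic checkout scenarios, deliberately skewed toward anomalies."""
    cases = []
    for i in range(n):
        if random.random() < anomaly_rate:
            cases.append(CheckoutCase(f"anomaly case {i}", True, "escalate_to_human"))
        else:
            cases.append(CheckoutCase(f"routine case {i}", False, "complete"))
    return cases

def checkout_agent(case: CheckoutCase) -> str:
    """Stand-in for a real agent: handles routine cases well but misses
    some anomalies, which is exactly what evaluation should surface."""
    if case.is_anomaly and random.random() < 0.8:
        return "escalate_to_human"
    return "complete"

def evaluate(agent, cases: list[CheckoutCase]) -> dict:
    """Score routine and anomalous cases separately rather than averaging them."""
    routine = [c for c in cases if not c.is_anomaly]
    anomalies = [c for c in cases if c.is_anomaly]
    def acc(subset):
        return sum(agent(c) == c.expected_action for c in subset) / max(len(subset), 1)
    return {"routine_accuracy": acc(routine), "anomaly_accuracy": acc(anomalies)}

if __name__ == "__main__":
    random.seed(0)
    print(evaluate(checkout_agent, make_synthetic_cases(1000)))
```

In a real deployment the synthetic cases would come from domain experts and production logs rather than a random generator, but the split reporting is the part that matters.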

So, to those claiming this is the year of AI Agents: start by prioritizing evaluation and testing capabilities, measurement and trackability, and, most importantly, invest time in developing deep domain understanding. I too am extremely energized by what is possible with AI Agents, but I see us rushing ahead without building proper testing and evaluation capabilities. That, to me, is misplaced urgency: racing toward an agentic future without the systems and methods in place to monitor, track, scale, and manage agents. In this rush we are missing a unique opportunity to rethink our jobs, rethink human-to-machine interactions, and, most importantly, create the new jobs that will be needed in this Agentic AI future. After all, who is all this technological progress really for if we do not bring along the society we exist in?
