Generative AI is dead. Long live Agentic AI…maybe.

A vector robot in a 60s-style suit stands in front of a computer tablet.

Many software vendors, encouraged by not-entirely-reasonable market valuations, are doubling down on the hype around artificial intelligence. I am not usually in the business of making predictions, but I prophesy that in 2025, agentic artificial intelligence will be a major buzzword. As is customary with tech buzzwords, one can expect tidbits of actual novelty diluted within an ocean of inflated expectations.

Let’s start by clarifying a bit about what is at stake. Oversimplifying a bit[1], LLMs (large language models) are, at their core, text completion models. They take raw text as input and generate raw text as output. As those models are cleverly pre-trained over “teranormous” amounts of web materials, they can readily be used for a large variety of tasks (e.g., translation, summarization, idea generation, etc.). LLMs have, in fact, made obsolete the entire prior field of NLP (natural language processing).

Considering the current performance and price point of LLMs, it is evident that this technology has the potential to deliver substantial added value to any business employing white-collar workers. The fine print, however, is less evident. Here, agentic AI (or, more accurately, its vendors) proposes to bridge the gap between the raw capabilities of LLMs and the IT environments of those businesses.

Regarding the specifics, Erik Pounds[2] (Nvidia) proposed in October 2024 the following definition for agentic AI which, I believe, aptly captures what is generally understood under the banner of this new buzzword:

Agentic AI uses a four-step process for problem-solving:

- Perceive: AI agents gather and process data from various sources […]
- Reason: A large language model acts as the orchestrator. This step uses techniques like retrieval-augmented generation (RAG) […]
- Act: By integrating with external tools and software via application programming interfaces, agentic AI can quickly execute tasks […]
- Learn: Agentic AI continuously improves through a feedback loop, or “data flywheel” […]
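
To make this four-step loop concrete, here is a minimal sketch of what such an orchestration loop could look like. Everything in it is an assumption made for illustration: the `search_documents`, `llm`, and `call_api` helpers are stubs standing in for whatever retrieval layer, LLM endpoint, and business tools a given vendor would actually provide.

```python
# Minimal sketch of the perceive-reason-act-learn loop (all helpers are stubs).

def search_documents(query: str) -> list[str]:
    # Perceive: stub retrieval layer; a real agent would query documents,
    # databases, or other sources relevant to the task.
    return [f"document matching '{query}'"]

def llm(prompt: str) -> str:
    # Reason: stub LLM endpoint acting as the orchestrator.
    return f"decision derived from {len(prompt)} characters of context"

def call_api(command: str) -> str:
    # Act: stub for an external tool invoked via an API.
    return f"executed: {command}"

def agent_step(task: str, memory: list[str]) -> str:
    documents = search_documents(task)               # Perceive
    prompt = "\n".join([task, *memory, *documents])
    decision = llm(prompt)                           # Reason
    result = call_api(decision)                      # Act
    memory.append(f"{decision} -> {result}")         # Learn ("data flywheel")
    return result

memory: list[str] = []
print(agent_step("refresh the weekly sales report", memory))
```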

The grand vision of agentic AI is that it paves the way for a “fully digital employee” (my term, not Pounds’) functionally equivalent to a white-collar worker. With roughly one billion white-collar workers worldwide, it is not too difficult to understand why markets seem to be losing their heads over this prospect.

Upon closer inspection, we see that there are two sharply distinct, fundamental hurdles that agentic AI is trying to address: instrumentation and learning.

Instrumentation: The first, most obvious, hurdle is that the LLM can’t be leveraged in a vacuum. LLMs are software and, thus, IT plumbing of some kind is needed. This plumbing ensures that the LLM can retrieve relevant information from its environment and issue commands intended to carry out whatever task is expected of it. For IT departments, usually already drowning in years of backlog, devising this plumbing is a challenge of its own. However, LLMs themselves may alleviate the challenge.

Learning: As strange as it may appear, LLMs, for the most part, never learn anything after their inception. This is our second hurdle. All the LLM ever knows is either public information (hence part of the pretraining) or part of the prompt. There is almost[3] no in-between. After each completion, the LLM is reset to its original state. However, if the knowledge base supporting the prompt could be updated by the LLM itself, then this hurdle could, at least conceptually, be alleviated as well.
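
As a thought experiment, such a write-back could be as simple as letting the model flag facts worth keeping, which the surrounding plumbing then appends to the knowledge base feeding future prompts. The sketch below is purely illustrative: the `llm` stub and the “REMEMBER:” convention are assumptions, not a feature of any existing product.

```python
# Hypothetical sketch: a knowledge base that the LLM itself can update.

knowledge_base: list[str] = ["Customer X pays net-60."]

def llm(prompt: str) -> str:
    # Stub: a real model would answer the task and, by convention, emit
    # "REMEMBER: <fact>" lines for information worth keeping.
    return "Invoice approved.\nREMEMBER: Customer X switched to net-30 in 2025."

def complete_with_memory(task: str) -> str:
    prompt = "\n".join(knowledge_base + [task])
    answer = llm(prompt)
    # Write-back: persist flagged facts so that the *next* prompt benefits,
    # even though the model itself remains frozen.
    for line in answer.splitlines():
        if line.startswith("REMEMBER:"):
            knowledge_base.append(line.removeprefix("REMEMBER:").strip())
    return answer

print(complete_with_memory("Should we approve invoice #123?"))
print(knowledge_base)  # now contains the net-30 update
```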

If agentic AI were to solve these two hurdles, without resorting to LLMs beyond the ones we have now, then it would indeed pave the way for generic digital white-collar workers. This is, however, a very bold proposition, and despite the market enthusiasm, addressing the aforementioned hurdles may involve considerable effort.

On the instrumentation front, the proposition of having a digital agent directly interacting with screen and keyboard, like a human would, is attractive, primarily because it seemingly sidesteps the IT plumbing challenges mentioned earlier altogether. However, it is also the most monumentally over-engineered way to solve the challenge. To perceive the graphical user interface, dozens (hundreds?) of screen captures will have to be piped into the LLM for even the simplest interaction. In order to act, dozens (hundreds?) of commands (e.g., mouse commands) will also have to be issued.
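
A rough order-of-magnitude estimate, with the figures below being assumptions rather than measurements: under typical vision-model tokenization, a single screenshot costs on the order of a thousand tokens once encoded. A task requiring 100 screenshots would thus consume on the order of 100,000 input tokens for perception alone, before a single token of actual reasoning is produced, and this cost is incurred anew at every execution of the task.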

While I do not doubt that such a feat is already possible with present-day LLMs, I do question the practicality and maintainability of this approach. The processing of the images themselves represents a massive overhead in computing resources, but it is not the true showstopper: progress in computer hardware will probably, over time, drive this overhead well below the cost of a full-time employee.

The crux of the problem is this: unambiguously spelling out (via prompts) every minute aspect of the interaction(s) with the business apps required to carry out a task is a considerable effort. This is an effort that requires, at minimum, decent IT skills—if not a well-developed IT mindset. I very much doubt that this task can be accomplished by someone otherwise incapable of programming—or incapable of becoming an entry-level programmer within a few months. Moreover, as the IT landscape of any sizeable company is constantly changing, the adequacy of the prompts will have to be monitored. Furthermore, the prompts themselves will have to be routinely updated. Thus, this effort will be an ongoing one.

Is agentic AI going to truly alleviate the need for human digital talent, i.e., the IT backlog problem, considering that it comes with its own considerable requirements for human digital talent? I do not think so. This brings us back to the starting point: if human digital talent has to be brought in anyway, then let’s use this talent to tackle the IT plumbing itself head-on.

By exposing the relevant raw data (typically relational in nature) to the LLM, instead of channeling everything through the graphical user interface, the prompts themselves can be expected to shrink by orders of magnitude. Five-line SQL queries should be expected to replace five-page prompts. Moreover, the human operator could even be assisted by the LLM when it comes to writing those SQL queries.
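
As a hypothetical illustration (the table and column names below are invented for the example), an instruction that would otherwise demand pages of GUI-driving prompts, such as “tally the late deliveries,” collapses into a few lines of SQL once the relational data is exposed directly:

```python
# Hypothetical illustration: a short SQL query standing in for a long
# GUI-driving prompt. Table and column names are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, promised DATE, delivered DATE)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2025-01-10", "2025-01-09"),   # on time
    (2, "2025-01-12", "2025-01-20"),   # late
])

# The five-line query standing in for a five-page prompt.
late_count = conn.execute("""
    SELECT COUNT(*)
    FROM orders
    WHERE delivered > promised
""").fetchone()[0]
print(late_count)  # -> 1
```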

Naturally, juggling SQL queries—possibly performed against multiple, heterogeneous databases—requires instrumentation. Yet, this sort of instrumentation is vastly simpler than the one envisioned by agentic AI. It is so simple that, in fact, many IT departments will probably roll out tools of their own making for this very purpose—as they routinely do for minor utilities.

In time, software vendors themselves will probably adjust their own products to facilitate this sort of LLM-driven plumbing, although it’s not entirely clear which form it will take (doubling down on APIs is one option, text-based interfaces are another).

On the learning front, I am skeptical. Agentic AI is presented as a step toward general artificial intelligence, addressing one of the most fundamental limitations of LLMs: the lack of genuine learning capabilities. Yet, Pounds’ proposed solution, a “data flywheel” powered by retrieval-augmented generation (RAG), is nothing but an easy hack overlaid on top of an otherwise impressive piece of technology (the LLM itself).

It is conceivable to have the LLM issue commands to incrementally enrich and update its own “data flywheel.” It is also conceivable that the LLM could generate its own fine-tuning dataset by collapsing N-shot attempts into 1-shot attempts, and then issuing a command to trigger a fine-tuning phase.
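
A sketch of the second idea, with everything below being an assumption for illustration: several attempts are sampled for a task, a task-specific checker keeps only the verified successes, and the accumulated (prompt, completion) pairs would later feed a hypothetical fine-tuning run.

```python
# Hypothetical sketch: collapsing N-shot attempts into 1-shot training pairs.
# Both llm() and verify() are stubs standing in for a model endpoint and a
# task-specific checker; neither reflects an existing API.
import random

def llm(prompt: str) -> str:
    return f"attempt #{random.randint(1, 5)} for: {prompt}"  # stub model

def verify(prompt: str, completion: str) -> bool:
    return completion.startswith("attempt #1")               # stub checker

def collect_pair(prompt: str, n: int = 5) -> tuple[str, str] | None:
    for _ in range(n):                    # N-shot: sample several attempts
        completion = llm(prompt)
        if verify(prompt, completion):    # keep only a verified success
            return (prompt, completion)   # ...as a 1-shot training pair
    return None

pair = collect_pair("close ticket #42")
dataset = [pair] if pair else []          # may be empty if no attempt passed
print(dataset)                            # pairs destined for fine-tuning
```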

However, it’s not clear that LLMs, as they currently exist, represent a viable path toward such a feat. I strongly suspect that maintaining a healthy flywheel over time may prove challenging, and that this maintenance will, assuming it works at all, require a substantial amount of very human, technically inclined intelligence.

Here, we are touching on a fundamental limitation of the LLM paradigm as it presently exists. It’s unclear that this limitation can be lifted by merely adding stuff on top of LLMs. My gut feeling is that addressing this limitation will require rethinking the LLMs themselves. It might be a relatively minor change, as chain-of-thought turned out to be, or it might require a complete revamp of the whole thing[4].

Overall, while I remain enthusiastic about LLMs, I am not convinced that the hype around their spin-off, agentic AI, is warranted. I have little doubt that companies will roll out “agents” to mechanize various tasks, just as my own company, Lokad, has been doing for the last two years. However, if anything, this process has made us even more dependent on a talented, tech-savvy workforce. Furthermore, in those initiatives, the “agentic” parts were always the most pedestrian pieces. We struggled, and occasionally failed, to put LLM-powered pieces into production, but the “agentic” aspect was, at best, a very distant concern.


  1. Present-day LLMs operate over tokens, not Unicode characters, although this constraint might be lifted in the future. LLMs can also process input images, if said images are linearized (embedded) within the latent space of the context window.

  2. Curious readers are invited to check the source material at https://blogs.nvidia.com/blog/what-is-agentic-ai

  3. Fine-tuning is the process of taking a pre-trained model and continuing its training on a specialized dataset or for a specific task, thereby adapting the model based on private information. Fine-tuning, however, relies on the availability of a high-quality corpus, i.e., manual contributions from experts.

  4. The o1 model released by OpenAI in December 2024 elevates the chain-of-thought technique to the status of a first-class citizen, letting the LLM start with an inner monologue discussing the prompt before jumping into producing the final completion. This relatively modest variation on the existing LLMs is nevertheless delivering substantial improvements for certain classes of tasks, such as mathematics and programming.