
Why most LLM integration projects fail

Software Engineering, Product Management, Large Language Models · 5 min read


AI, and large language models (LLMs) in particular, are driving a must-have feature craze in the software industry. Many companies are currently seeking to retrofit their products with some form of LLM integration, allowing users to interact with the product via natural-language prompts. In some cases, this works very well, especially when the use case itself is language-based and therefore a natural fit for large language models.

A common example is customer support. LLMs that help users navigate help-centre databases to figure out how to achieve certain outcomes with a product have been implemented with great success.

Encouraged by these wins, engineers and product managers alike are eager to connect LLM-based agents to the existing APIs of SaaS applications, hoping to unlock smart automation and productivity gains. A common approach is to build an MCP server that exposes existing REST-style endpoints, with the expectation that this will deliver quick wins.
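
As a rough illustration, such a bridge can be only a few lines of code, which is part of its appeal. The sketch below assumes the official Python MCP SDK's FastMCP helper; the API base URL, endpoint, and payload shape are hypothetical stand-ins for an existing SaaS API.

```python
# Minimal sketch of the "wrap a REST endpoint as an MCP tool" pattern.
# Assumes the `mcp` Python SDK; endpoint and payload are hypothetical.
import requests
from mcp.server.fastmcp import FastMCP

API_BASE = "https://api.example.com/v1"  # hypothetical SaaS API

mcp = FastMCP("saas-bridge")

@mcp.tool()
def create_promotion(name: str, discount_percent: float) -> dict:
    """Create a promotion via the existing REST API."""
    resp = requests.post(
        f"{API_BASE}/promotions",
        json={"name": name, "discount_percent": discount_percent},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP-capable client
```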

While such integrations are often implemented quickly, the results are almost always disappointing once the initial excitement, driven by a few successful tests, wears off. The success rate of prompts yielding what the user actually intended, as well as the repeatability of those results, is often low. Long wait times caused by retry loops, where the LLM struggles to submit valid API requests, lead to a poor user experience. Even when requests succeed, hallucinations frequently make the feature less than useful.

Depending on organisational culture, some teams choose to ship anyway, eroding user trust and the perceived quality of their product just to be able to put an "AI button" into the UI and marketing material, often styled in some pinkish hue, of course. Product organisations with a strong focus on quality quickly arrive at the conclusion that features built like this do more harm than good; beyond simple demonstrations, they simply do not work well.

The exact number of failed LLM integration projects is unknown, but it is likely very large. Many of these projects are doomed from the start, usually due to a combination of limited experience, missing skills, and an overly optimistic view of what LLMs are capable of.

In this article, I will focus on three categories of issues that typically arise.

1. Your APIs are unsuitable for LLM interoperability

The success rate of turning prompts into API requests is directly related to the fit between the user's language and your API. APIs are often built around technical, or even internal, terms and keywords that users would never use themselves.

In addition, most existing APIs are designed to support web or mobile interaction patterns. They expose resources, endpoints, and actions that are tied together via rigid workflows implemented in a graphical user interface.

Every step in a GET or POST workflow is a potential weak link. Breakages are common and result in brittle interactions and leftover artefacts. While retry mechanisms can mitigate some of these issues, assuming idempotency is implemented correctly, the user experience suffers and costs from token churn quickly spiral.
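
To illustrate, here is a minimal sketch of such a retry loop. It assumes the API honours an Idempotency-Key header, a convention popularised by Stripe that many existing APIs do not implement at all; the endpoint is hypothetical.

```python
# Sketch: retrying a mutating request is only safe if the server
# deduplicates via an idempotency key. This example retries on any
# request failure; a real implementation would retry only transient
# errors (timeouts, 5xx).
import time
import uuid

import requests

def post_with_retry(url: str, payload: dict, attempts: int = 3) -> dict:
    key = str(uuid.uuid4())  # one key shared by all attempts
    for attempt in range(attempts):
        try:
            resp = requests.post(
                url,
                json=payload,
                headers={"Idempotency-Key": key},
                timeout=10,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("unreachable")
```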

Consider the following example:

"Create a promotion of 20% for Valentine's Day and attach it to all products belonging to the apparel and fr category."

For this workflow to succeed, the LLM has to:

  • Identify the correct endpoints, for example promotions and the product catalogue, and determine the correct sequence of calls
  • Execute a series of information-fetching requests to identify the relevant products
  • Correctly use foreign keys, such as the promotion ID returned by a previous request, when attaching a promotion to a product or category
  • Potentially iterate over every individual product if no bulk-action endpoint exists, resulting in slow execution
  • Resolve semantic ambiguities. For example, if products can belong to multiple categories, is the above prompt meant as an AND or an OR condition?

Even seemingly simple requests can explode into fragile, multi-step orchestration problems.
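
To make that concrete, here is a hedged sketch of the call chain the LLM would have to produce for the promotion example above. All endpoints and field names are hypothetical, any one call can fail and leave partial state behind, and note how the category loop silently commits to an OR interpretation.

```python
# Sketch of the orchestration behind the promotion prompt.
# Endpoints and field names are hypothetical.
import requests

API = "https://api.example.com/v1"  # hypothetical SaaS API

# Step 1: create the promotion and capture its foreign key.
promo = requests.post(
    f"{API}/promotions",
    json={"name": "Valentine's Day", "discount_percent": 20},
    timeout=10,
).json()

# Step 2: fetch products per category. Taking the union of both
# categories silently resolves the AND/OR ambiguity as OR.
products = []
for category in ("apparel", "fr"):
    resp = requests.get(
        f"{API}/products", params={"category": category}, timeout=10
    )
    products.extend(resp.json()["items"])

# Step 3: without a bulk endpoint, attach the promotion one product
# at a time; every iteration is another chance to fail mid-way.
for product in products:
    requests.post(
        f"{API}/products/{product['id']}/promotions",
        json={"promotion_id": promo["id"]},
        timeout=10,
    )
```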

2. The user lacks transparency into the actions of the LLM

Prompt-based interaction can be effective for simple actions and can lower the barrier to entry for complex applications. However, the real value of LLMs lies in more complex, multi-step workflows.

At the same time, mutating a large number of entities in an application can be intimidating due to the inherent risk of things going wrong. When users lack visibility into what the LLM is about to do, or what it has already done, trust quickly erodes.

LLMs will make mistakes. If those mistakes cannot be rectified easily, either by rolling back an entire transaction or via bulk-edit actions, the user is left with tedious manual cleanup work. This is not only frustrating, it also negates any potential time or efficiency gains promised by automation.
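
One mitigation is to pair every mutation with a recorded inverse so an entire run can be undone as a unit, and to surface that log to the user. The sketch below is purely illustrative and assumes each API wrapper has a known undo operation; a production system would need server-side transactions or a saga pattern.

```python
# Sketch: record an inverse for each mutation so a multi-step LLM
# run can be rolled back as a unit. Purely illustrative.
from typing import Callable

class ActionLog:
    def __init__(self) -> None:
        self._undo_stack: list[Callable[[], None]] = []

    def record(self, description: str, undo: Callable[[], None]) -> None:
        print(f"applied: {description}")  # user-visible audit trail
        self._undo_stack.append(undo)

    def rollback(self) -> None:
        while self._undo_stack:
            self._undo_stack.pop()()  # undo in reverse order

# Hypothetical usage, where delete_promotion is an API wrapper:
#   log = ActionLog()
#   promo = create_promotion("Valentine's Day", 20)
#   log.record("created promotion", lambda: delete_promotion(promo["id"]))
#   ...and on any failure: log.rollback()
```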

In information-gathering use cases, the problem manifests differently. For a prompt such as:

"What was my total revenue yesterday at my Newmarket location?"

the user should be able to inspect the data used to compute the result. Without this transparency, answers remain unverifiable, and trust remains low.
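
A hedged sketch of one way to provide that transparency: return the raw records alongside the computed answer, so the UI can let the user drill into them. The transaction shape and filters are hypothetical.

```python
# Sketch: return the answer together with the records it was
# derived from, so the user can verify it. Field names hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class Transaction:
    location: str
    day: date
    amount: float

def revenue_with_provenance(
    transactions: list[Transaction], location: str, day: date
) -> dict:
    used = [t for t in transactions if t.location == location and t.day == day]
    return {
        "answer": sum(t.amount for t in used),
        "source_records": used,  # surface these in the UI for inspection
    }
```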

3. Security and data protection issues

LLMs that interact with user-generated content, such as names, descriptions, and notes, are inherently exposed to prompt-injection attacks. These vulnerabilities can be exploited to leak personally identifiable information (PII) or internal data, especially when the same agent is capable of communicating with the outside world, for example by sending emails.
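
One partial mitigation, sketched below with a hypothetical send_email tool, is to gate any outward-facing tool behind explicit user confirmation, which limits what a successful injection can exfiltrate.

```python
# Sketch: require human approval before any tool that can move data
# outside the application runs. `send_email` is a hypothetical
# stand-in for a real integration.
from typing import Callable

def send_email(to: str, body: str) -> str:
    return f"email sent to {to}"  # stand-in implementation

OUTBOUND_TOOLS = {"send_email"}  # tools that can leak data externally

def run_tool(
    name: str, args: dict, confirm: Callable[[str, dict], bool]
) -> str:
    if name in OUTBOUND_TOOLS and not confirm(name, args):
        return "Cancelled: outbound actions require user approval."
    if name == "send_email":
        return send_email(**args)
    raise ValueError(f"unknown tool: {name}")
```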

Due to the additional attack surface LLMs introduce, users should be able to decide which information the model has access to: which entities, which fields, and which operations. In many cases, this implies that the LLM must be authenticated using a different account than the user themselves.

However, almost no application is designed with a fine-grained, end-to-end access-control system. Transitive or computed information does not automatically inherit the most restrictive permissions of its underlying data sources.

Even when roles and permissions exist, the LLM may still need awareness of information it cannot directly access in order to provide a satisfying user experience. Missing permissions naturally result in an inability to fulfil certain requests, but without clear user-facing explanations, such as "I am sorry, I cannot access this information", these failures appear arbitrary and confusing.
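
As a sketch of what such control could look like, the snippet below applies a per-entity field allowlist before any record reaches the model and turns a missing permission into an explicit explanation rather than a silent failure. All entity and field names are hypothetical.

```python
# Sketch: filter records to an allowlist before the model sees them,
# and make denials explicit. Entity and field names are hypothetical.
ALLOWED_FIELDS = {
    "product": {"id", "name", "category", "price"},
    # "customer" is deliberately absent: the LLM's service account
    # may not read customer records at all.
}

def filter_for_llm(entity: str, record: dict) -> dict:
    allowed = ALLOWED_FIELDS.get(entity)
    if allowed is None:
        # Explain the restriction so the refusal does not look arbitrary.
        raise PermissionError(f"I am sorry, I cannot access {entity} data.")
    return {key: value for key, value in record.items() if key in allowed}
```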

What are the solutions?

If many LLM integration efforts fail, it is rarely because LLMs themselves are fundamentally flawed. Instead, failure is usually rooted in APIs, workflows, security models, and product assumptions that were never designed for intent-driven, probabilistic systems in the first place.

In the next article, we will explore how to design products, APIs, and user experiences that allow LLM integrations to be reliable, auditable, secure, and actually useful.
