By Deepa Seetharaman, Supantha Mukherjee and Krystal Hu
SAN FRANCISCO/STOCKHOLM, Dec 16 – Last spring, CellarTracker, a wine-collection app, built an AI-powered sommelier to make unvarnished wine recommendations based on a person’s palate. The problem was that the chatbot was too nice.
“It’s just very polite, instead of just saying, ‘It’s really unlikely you’ll like the wine,’” CellarTracker CEO Eric LeVine said. It took six weeks of trial and error to coax the chatbot into offering an honest appraisal before the feature was launched.
Since ChatGPT exploded three years ago, companies big and small have leapt at the chance to adopt generative artificial intelligence and stuff it into as many products as possible. But so far, the vast majority of businesses are struggling to realize a meaningful return on their AI investments, according to company executives, advisors and the results of seven recent executive and worker surveys.
One survey of 1,576 executives conducted during the second quarter by research and advisory firm Forrester Research showed just 15% of respondents saw profit margins improve due to AI over the last year. Consulting firm BCG found that only 5% of 1,250 executives surveyed between May and mid-July saw widespread value from AI.
Executives say they still believe generative AI will eventually transform their businesses, but they are reconsidering how quickly that will happen within their organizations. Forrester predicts that in 2026 companies will delay about 25% of their planned AI spending by a year.
“The tech companies who have built this technology have spun this tale that this is all going to change quickly,” Forrester analyst Brian Hopkins said. “But we humans don’t change that fast.”
AI companies including OpenAI, Anthropic and Google are all doubling down on courting business customers in the next year. During a recent lunch with media editors in New York, OpenAI CEO Sam Altman said developing AI systems for companies could be a $100 billion market.
All this is happening against the backdrop of unprecedented tech investment in everything from chips to data centers to energy sources.
Whether these investments can be justified will be determined by companies’ ability to figure out how to use AI to boost revenue, fatten margins or speed innovation. Failing that, the infrastructure build-out could trigger a crash reminiscent of the dot-com bust of the early 2000s, some experts say.
THE ‘EASY’ BUTTON
Soon after ChatGPT’s launch, companies worldwide created task forces dedicated to finding ways to embrace generative AI, a type of AI that can create original content like essays, software code and images through text prompts.
One well-known issue with AI models is their tendency to please the user. This bias – what’s called “sycophancy” – encourages users to chat more, but can impair the model’s ability to give candid advice.
CellarTracker ran into this problem with its wine-recommendation feature, built on top of OpenAI’s technology, CEO LeVine said. The chatbot performed well enough when asked for general recommendations. But when asked about specific vintages, the chatbot remained positive – even if all signals showed a person was highly unlikely to enjoy them.
“We had to bend over backwards to get the models (any model) to be critical and suggest there are wines I might not like,” LeVine said.
Part of the solution was designing prompts that gave the model permission to say no.
Companies have also struggled with AI’s lack of consistency.
Jeremy Nielsen, general manager at North American railroad service provider Cando Rail and Terminals, said the company recently tested an AI chatbot for employees to study internal safety reports and training materials.
But Cando ran into a surprising stumbling block: the models couldn’t consistently and correctly summarize the Canadian Rail Operating Rules, a roughly 100-page document that lays out the safety standards for the industry.
Sometimes the models forgot or misinterpreted the rules; other times they invented them from whole cloth. AI researchers say models often struggle to recall what appears in the middle of a long document.
Cando has dropped the project for now, but is testing other ideas. So far the company has spent $300,000 on developing AI products.
“We all thought it’d be the easy button,” Nielsen said. “And that’s just not what happened.”
HUMANS MAKE A COMEBACK
Human-staffed call centers and customer service were supposed to be heavily disrupted by AI, but companies quickly learned there are limits to the amount of human interaction that can be delegated to chatbots.
In early 2024, Swedish payments company Klarna rolled out an OpenAI-powered customer service agent that it said could do the work of 700 full-time customer service agents.
In 2025, however, CEO Sebastian Siemiatkowski was forced to dial that back and acknowledge that some customers preferred to talk with humans.
Siemiatkowski said AI is reliable on simple tasks and can now do the work of about 850 agents, but more complex issues quickly get referred to human agents.
For 2026, Klarna is focused on building its second-generation AI chatbot, which it hopes to ship soon, but human beings will remain a big part of the mix.
“If you want to stay customer-obsessed, you can’t rely [entirely] on AI,” he said.
Similarly, U.S. telecommunications giant Verizon is leaning back into human customer service agents in 2026 after attempts to delegate calls to AI.
“I think 40% of consumers like the idea of still talking to a human, and they’re frustrated that they can’t get to a human agent,” said Ivan Berg, who leads Verizon’s AI-driven efforts to enhance service operations for business customers, in a Reuters interview this fall.
The company, which has about 2,000 frontline customer service agents, still uses AI to screen calls, get information on customers, and direct them to either self-service systems or to human agents.
Using AI to handle routine questions frees up agents to handle complex issues and try new things, such as making outbound calls and doing sales.
“Empathy is probably the key thing that’s holding us from having AI agents talk to customers holistically right now,” Berg said.
Shashi Upadhyay, president of product, engineering and AI at customer-service platform Zendesk, says AI excels in three areas: writing, coding and chatting. Zendesk’s clients rely on generative AI to handle between 50% and 80% of their customer-support requests. But, he said, the idea that generative AI can do everything is “oversold.”
THE ‘JAGGED FRONTIER’
Large language models are rapidly conquering complex tasks in math and coding, but can still fail at comparatively trivial tasks. Researchers call this contradiction in capabilities the “jagged frontier” of AI.
“It might be a Ferrari in math but a donkey at putting things in your calendar,” said Anastasios Angelopoulos, the CEO and cofounder of LMArena, a popular benchmarking tool.
Seemingly small issues can unexpectedly trip up AI systems.
Many financial firms rely on data compiled from a broad range of sources, all of which can be formatted very differently. These differences might prompt an AI tool to “read patterns that don’t exist,” said Clark Shafer, director at advisory firm Alpha Financial Markets Consulting.
Many companies are now looking into the potentially expensive, lengthy and complex process of reformatting their data to take advantage of AI, Shafer said.
Dutch technology investment group Prosus says one of its in-house AI agents is meant to answer questions about its portfolio, similar to what the group’s staff data analysts already do.
Theoretically, an employee could ask how often a Prosus-backed food-delivery firm was late to deliver sushi orders in Berlin last week.
But for now, the tool doesn’t always understand what neighborhoods are part of Berlin or what “last week” means, said Euro Beinat, head of AI for Prosus.
“People thought AI was magic. It’s not magic,” Beinat said. “There’s a lot of knowledge that needs to be encoded in these tools to work well.”
MORE HANDHOLDING
OpenAI is working on a new product for businesses and recently created internal teams, such as the Forward Deployed Engineering team, to work directly with clients to help them use OpenAI’s technology to tackle specific problems, a spokesperson said.
“Where we do see failure is people that jump in too big, they find that billion-dollar problem—that’s going to take a few years,” said Ashley Kramer, OpenAI’s head of revenue, during an onstage interview at Reuters Momentum AI conference in November.
Specifically, OpenAI is working with companies to find areas where AI can have a “high impact but maybe low lift at first,” said Kramer.
Rival AI lab Anthropic, which draws 80% of its revenue from business customers, is hiring “applied AI” experts who will embed with companies.
For AI companies to succeed, they will have to view themselves as “partners and educators, rather than just deployers of technology,” said Mike Krieger, Anthropic’s head of product, in an interview earlier this year.
An increasing number of startups, many founded by former OpenAI employees, are developing AI tools for specific sectors such as financial services or legal. These founders say companies will benefit from specialized models more than general-purpose or consumer tools like ChatGPT.
It’s a playbook that Writer, a San Francisco–based AI application startup, has been adopting. The company, which is now building AI agents for finance and marketing teams at large firms such as Vanguard and Prudential, puts its engineers on calls directly with clients to understand their workflows and co-build the agents.
“Companies need more handholding in actually making AI tools useful for them,” said May Habib, CEO of Writer.
(Reporting by Deepa Seetharaman and Krystal Hu in San Francisco and Supantha Mukherjee in Stockholm. Editing by Kenneth Li and Michael Learmonth.)