Most LLM Apps Are Bullshit, And That’s OK

Peter Gao
Sep 6, 2023

This past month, I’ve been hitting up the AI meetup circuit as we look to market our new ML product, Tidepool. Tidepool is a product analytics tool for LLM text interfaces, and it’s really most useful for companies building LLM products that already have users. (If you don’t have users submitting queries, there’s nothing for us to analyze.)

So I’ve been going around to demo days and hackathons, cocktail hours and happy hours, all over the Bay Area, trying to drum up potential customers. And what I’m finding is that… no one has any users.

In fact, the majority of attendees at these AI events don’t even have a product – they’re there for the vibes and to speculate on AI or what can be built with AI. Fewer still have products with any real traction. Even among current YC startups, where generative AI companies make up 80% of the batch, around 60% are still pivoting just weeks before Demo Day. After the initial rush of euphoria and investment, people are finding that it’s a lot harder to make AI work than they expected.

Part of the issue is that AI startups face fierce competition from incumbents who can match their capabilities. Unlike other technologies, generative AI is relatively easy for established companies to integrate. Without a technical moat, network effects and branding win. Why use an unproven “AI-first” customer service tool when trusted brands like Zendesk offer the same?

But it’s not easy for more established companies either. Over the past month, I’ve probably talked to 10-15 of these companies, each frantically trying to come up with a coherent AI initiative. These AI initiatives usually involve trying five or six different things, of which only one or two end up being successful.

It’s tempting at moments like this to dismiss generative AI as overhyped and any LLM app as valueless.

I’m personally less cynical. Some of this is just a symptom of a thriving ecosystem with a lot of interest. Most LLM apps are bullshit, and that’s ok. If we look back at the early days of mobile, everyone and their mother was out launching an app. Remember the I am Rich app? Most were completely useless, but there were a few that made billions of dollars and completely changed the way we interact with the world.

We’re in a similar “app store” moment for AI. The combination of a red-hot AI market and a cooling market for every other kind of startup means that engineers have every incentive to participate.

At the end of the day, most LLM apps will die. But some will survive and carve out a niche.

Some further takeaways from my month attending AI meetups:

Only a small proportion of things are working, but the things that are working are working really well

While only a small proportion of AI/LLM applications being developed today are actually delivering significant value, the applications that do work are working extremely well. For example, Ironclad, a legal contracts software platform, has seen 60% of its customers adopt its AI features.

Perhaps even more compellingly, for startups that do have an AI feature that is working well, people are coming to them BECAUSE of their AI feature. Superpowered was originally a vitamin at best. But its AI-powered meeting notes feature, which transcribes meetings and pulls out action items and bullet points, has made its product 5x more valuable and allowed its team to charge significantly more for it.

Try to start with a problem rather than a solution

When developing AI applications, it’s important to start with identifying a real problem or need, rather than just applying AI technology for its own sake. The most successful AI applications are enhancing existing products that already have strong product-market fit. AI then acts as an accelerator.

A great example of this is Tome, the slide deck creator. Tome existed before gen AI, and had racked up a loyal following as an alternative to Google Slides. Then they put the product on steroids by adding AI generation of imagery and text.

To draw an analogy again to the mobile era, the most successful mobile apps were always tapping into extant demand. Instagram would never have been possible without smartphones, but at the end of the day, it was tapping into an existing desire to take pictures and share them with your friends.

Try to have differentiation in your pitch

One of the problems right now is that even in areas that do have product-market fit, some of the market pull is getting drowned out by the simple fact that there are SO MANY companies trying to sell the same thing.

For example, customer support is a real problem. AI chat for support is a real, value-add solution. But currently, if you actually need a chat-for-support product, you don’t even know who to talk to, because 50 startups claiming to do this are reaching out to you on LinkedIn. There’s a paralysis of choice.

If you’re building a gen AI product right now, it behooves you to pick a greenfield subject, something that you truly could not do before the development of these models. As an analogy, the mobile apps that were ultimately the most impactful were apps like Lyft and Spotify that leveraged smartphones’ GPS and network connectivity to enable a new experience.

Certain use cases work better with Gen AI than others

If you’re an existing company with users, looking to incorporate AI into your product, some key use cases seem to generate value consistently:

Content generation for copy, descriptions: automating text content creation like copywriting, product descriptions, or API documentation. Since this was previously a purely manual task, the bar for improvement is low.

Conversational interfaces for complex products: many data analytics platforms like Looker have intricate UIs with steep learning curves. By adding an AI layer that lets users describe what they want in plain English, these tools become far more intuitive. Rather than navigating nested menus and constructing cryptic SQL queries, you can simply say “Show me a chart of weekly active users in Europe over the past year.”

Classifiers and traditional NLP that were previously inaccessible: enabling companies to tackle traditional NLP tasks like classification, entity extraction, and summarization without needing to build large ML engineering teams. Models like GPT-4 and Claude can handle text analysis and synthesis challenges fairly well with zero-shot prompting. This makes it feasible for companies to implement features like sentiment analysis, smart tagging, and text summarization that would have required significant data annotation and model training efforts previously.
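
To make the zero-shot point concrete, here is a rough Python sketch of what a sentiment classifier can look like when the "model" is just a prompt. Everything in it is hypothetical and for illustration only: `complete` stands in for whatever model API you actually call, and `fake_complete` is a keyword-based stub so the example runs without one. The names and prompt wording are my own assumptions, not any vendor's interface.

```python
from typing import Callable, Sequence


def build_zero_shot_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a prompt that asks for exactly one label and gives no examples."""
    label_list = ", ".join(labels)
    return (
        f"Classify the following text into exactly one of these labels: {label_list}.\n"
        "Reply with the label only.\n\n"
        f"Text: {text}\n"
        "Label:"
    )


def parse_label(reply: str, labels: Sequence[str], fallback: str = "unknown") -> str:
    """Map the model's free-text reply back onto one of the allowed labels."""
    cleaned = reply.strip().strip(".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    # The model answered with something outside the label set.
    return fallback


def classify(text: str, labels: Sequence[str], complete: Callable[[str], str]) -> str:
    """Zero-shot classification: one prompt in, one validated label out."""
    return parse_label(complete(build_zero_shot_prompt(text, labels)), labels)


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (a crude keyword rule)."""
    user_text = prompt.split("Text:", 1)[1].lower()
    return "Negative." if "refund" in user_text else "Positive"


if __name__ == "__main__":
    labels = ["positive", "negative"]
    print(classify("I want a refund, this is broken.", labels, fake_complete))
    print(classify("Love the new dashboard!", labels, fake_complete))
```

The thing to notice is what is missing: no labeled dataset, no training loop, no ML team. In a real product you would swap `fake_complete` for an actual model call and keep the parsing step, since models do not always reply with a clean label.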


In short, this is a period of extreme experimentation, and failures will far outnumber successes. But to dismiss all LLM applications as BS risks missing the golden nuggets hidden in the dirt.