The Problem at a 10,000-Foot View

It’s 2025…LLMs are all the rage. OpenAI, Anthropic, Google, and Mistral are all building and training LLMs that companies are leveraging to bring AI into their products. Code is shipping faster as GitHub Copilot, Cursor, Windsurf, and Aider become available to more devs.

While all of this is happening, observability tools like Datadog, Dynatrace, New Relic, and Sentry have stayed stagnant. But…customer issues are growing rapidly, in sync with that code velocity.

What happened? Simply put: these traditional observability tools fell behind, and they lack the core data models that enable the key functionality modern managers want. Modern managers want to quickly triage and fix bugs by gathering the 5 W’s surrounding a bug (see the sketch after this list):

  • Who was using the product?
  • What were they doing?
  • What did they expect to happen vs. what actually happened?
  • What device/platform/code revision were they on?
  • What code path, API endpoints, etc. did the user run through?
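
As a rough sketch of what such a core data model could look like, the 5 W’s map naturally onto a structured bug-context record. All names below are hypothetical illustrations, not an actual Sailfish AI (or any vendor’s) schema:

```typescript
// Hypothetical sketch of a bug-context record capturing the 5 W's.
// Field names are invented for illustration only.
interface BugContext {
  // Who was using the product?
  user: { id: string; accountTier?: string };
  // What were they doing?
  sessionActions: string[]; // e.g. ["opened checkout", "clicked Pay"]
  // What did they expect vs. what actually happened?
  expectedBehavior: string;
  actualBehavior: string;
  // What device/platform/code revision were they on?
  environment: {
    device: string; // e.g. "iPhone 15, iOS 18"
    platform: "web" | "ios" | "android";
    codeRevision: string; // git SHA or release tag
  };
  // What code path / API endpoints did they run through?
  trace: { endpoint: string; statusCode: number; timestamp: string }[];
}
```

With a record like this attached to every reported issue, a manager could count affected users or filter by code revision directly, instead of asking an analyst to reconstruct the data by hand.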

Modern managers need this information to properly triage bugs, yet they often lack the data to gauge how many customers a bug is impacting. A manager has to take a stab in the dark, making educated guesses about how big the impact is, or waste countless hours of their own time (or a data analyst’s) just to quantify the issue.

So while the manager is triaging a bug, they are gathering data for engineers who will, most likely, throw that data out. See, engineers love their own tools and hate tools made for CS (customer support) and PMs (project, product, and program managers); what they know is observability tools. But those observability tools are clunky: devs spend hours debugging issues because the platforms aren’t organized, the insights aren’t pointed, and the tools haven’t kept up with code explosion or microservices.

And upstream of all this, before the data even gets to the PM, customer support is trying, but failing, to gather the right data so downstream teams can do their work to fix the issue. Yet most of what they collect is also thrown out! The sequence is a game of telephone, and while it plays out, your customer is getting more frustrated because their issue isn’t getting fixed.

Once a code fix is deployed, the issue isn’t over! Engineers often forget to relay when the fix is released, so internal stakeholders lack insight into if and when an issue is actually fixed. Simply put: the entire bug-fixing ecosystem is extremely disorganized and heavily fragmented, and dev tools lack the data backbones to discover issues quickly. It’s not something an LLM alone can fix: you need the 5 W’s, along with an event timeline covering things like database object manipulations, the various system states, and much more, to solve these issues relatively fast.
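
To make the event-timeline point concrete, here is a minimal hypothetical sketch of what such timeline entries could look like, assuming one record type per event kind. Again, these names are invented for illustration, not a real product’s schema:

```typescript
// Hypothetical timeline entries; a real system would capture many more kinds.
type TimelineEvent =
  | { kind: "db_mutation"; table: string; operation: "insert" | "update" | "delete"; at: string }
  | { kind: "state_snapshot"; component: string; state: Record<string, unknown>; at: string }
  | { kind: "api_call"; endpoint: string; statusCode: number; at: string };

// Replaying events in timestamp order reconstructs what the user actually hit.
function replay(events: TimelineEvent[]): void {
  for (const e of [...events].sort((a, b) => a.at.localeCompare(b.at))) {
    console.log(`[${e.at}] ${e.kind}`);
  }
}
```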

It’s a complex problem – and it’s just a piece of Sailfish AI’s $1T Secret Plan…