2026-02-18

Introducing Faultline

How pasting screenshots into Claude during incidents led us to build an AI SRE agent.

We built an internal AI agent to help us debug infrastructure issues at Chatwoot. It got useful enough that we decided to open source it.

The problem

We had a string of incidents over the last few weeks. In most cases, alerting should have caught the problem earlier. We have been improving that. But the bigger issue was the investigation itself.

A typical example: DB CPU spikes to 94% on RDS. The on-call engineer gets paged. Now they need to figure out why. The checklist looks something like this:

  1. Open CloudWatch. Check CPUUtilization, ReadIOPS, WriteIOPS.
  2. Jump to RDS Performance Insights. Find the top wait event. Identify the offending query.
  3. Check if this query existed before or was recently introduced. git log, git blame.
  4. Open Sentry. Are users seeing errors? Which endpoints? Since when?
  5. Open New Relic. Check transaction response times, throughput, error rate. Look at the deployment markers — did something go out recently?
  6. Cross-reference timestamps across all of this.

That is six tools, ten tabs, and about 30 minutes before you have the full picture. And this is a well-understood scenario. A misconfigured load balancer or a networking issue across VPCs is worse because you do not even know which tool to open first.

How we started using AI for this

We started pasting screenshots from New Relic and Sentry into Claude during incidents. Just screenshots — not structured data, not API responses, just what was on screen. Claude would read the graphs, correlate the response time spike with a deploy 20 minutes prior, notice the query pattern in the Sentry trace, and point to the commit.

We did this for a few weeks. It kept working. We started leaning on it during every incident.

The realization was simple: the investigation pattern is the same every time. The tools are the same. The sequence of checks is the same. We were acting as the middleware — copying data from one tool, pasting it into another, asking the same questions. An agent with API access to these tools could skip all of that.
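To make "API access" concrete, the first item on that checklist is a single read-only call. A minimal sketch with boto3, using a placeholder region and instance name:

    from datetime import datetime, timedelta, timezone

    import boto3

    # Read-only check: the last hour of RDS CPU utilization, the same
    # numbers step 1 of the checklist reads off a CloudWatch dashboard.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "prod-db"}],  # placeholder
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,             # 5-minute buckets
        Statistics=["Average"],
    )

    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], round(point["Average"], 1))

Every step on the checklist has an equivalent call somewhere; the agent's job is to make them in the right order and read the results together.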

How it works

The agent connects to your tools via read-only API credentials. Currently supported:

  • PagerDuty — incidents, timelines, on-call schedules, escalations
  • New Relic — transactions, error rates, Apdex, NRQL queries
  • Sentry — issues, stack traces, release correlation, affected user counts
  • AWS — RDS Performance Insights, CloudWatch metrics
  • GitHub — blame, commit history, recent deploys, PR diffs

When you ask a question, the agent decides which tools to call and cross-references the responses. Under the hood it uses OpenAI's tool-calling — each integration is exposed as a set of tools the model can invoke. A single investigation typically makes 25-30 API calls across integrations.
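Here is a stripped-down sketch of that loop: one illustrative tool definition, a stub standing in for the real integration, and the dispatch that feeds results back to the model. The tool name and schema are examples, not Faultline's actual ones.

    import json
    from openai import OpenAI

    client = OpenAI()

    # Each integration is described to the model as a callable tool.
    TOOLS = [
        {
            "type": "function",
            "function": {
                "name": "cloudwatch_cpu",
                "description": "Fetch recent RDS CPUUtilization datapoints",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "db_instance": {"type": "string"},
                        "hours": {"type": "integer"},
                    },
                    "required": ["db_instance"],
                },
            },
        },
    ]

    def cloudwatch_cpu(db_instance: str, hours: int = 1) -> str:
        # Stub: the real version would call boto3 as in the earlier sketch.
        return json.dumps({"db_instance": db_instance, "avg_cpu": [61.2, 88.4, 94.1]})

    DISPATCH = {"cloudwatch_cpu": cloudwatch_cpu}

    messages = [{"role": "user", "content": "Why is CPU high on prod-db?"}]

    # Let the model request tools, run them, feed results back,
    # and repeat until it answers in plain text.
    while True:
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=messages,
            tools=TOOLS,
        ).choices[0].message
        messages.append(reply)
        if not reply.tool_calls:
            print(reply.content)
            break
        for call in reply.tool_calls:
            result = DISPATCH[call.function.name](**json.loads(call.function.arguments))
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )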

We are starting with read-only mode: the agent can query everything but change nothing. As we get more confident, we want it to suggest actions that a human approves before anything runs.
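One way to make read-only a property of the code rather than a promise is an allowlist in front of the tool dispatcher. A sketch, with illustrative tool names:

    # Every registered tool is tagged; anything not explicitly read-only is refused.
    READ_ONLY_TOOLS = {"cloudwatch_cpu", "sentry_issues", "github_blame"}

    class WriteAttemptError(Exception):
        pass

    def run_tool(name: str, args: dict, dispatch: dict) -> str:
        if name not in READ_ONLY_TOOLS:
            # Later: instead of refusing, queue the call for human approval.
            raise WriteAttemptError(f"{name} is not a read-only tool")
        return dispatch[name](**args)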

A note on the UX

Most AI tools in this space treat debugging as a single-player activity. You ask a question and get an answer, or the tool dumps a wall of information on you upfront. That works when you are debugging alone.

Incidents are not like that. One person knows the deploy history. Another knows the customer whose workload changed. Another fixed a similar issue last month. That context lives in people's heads, and a chat between one person and an AI misses all of it.

We wanted two things that we could not find in existing tools:

Private notes — team members can talk to each other inside the same investigation thread. The agent does not see these. An engineer writes "last time this happened it was the nightly sync job." Another confirms. Then someone points the agent in that direction. It re-investigates with the new context.

Visible tool calls — every API call the agent makes is shown in the thread. "Queried CloudWatch — CPUUtilization for RDS", "Queried GitHub — blame on handler.py:42". If it is looking at the wrong resource or the wrong time window, you see it and redirect.
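Both behaviors come down to how the thread is modeled: every entry is stored and rendered for humans, but only some of it is sent to the model. A simplified sketch with illustrative field names:

    from dataclasses import dataclass

    @dataclass
    class ThreadEntry:
        author: str   # "engineer", "agent", or "tool"
        kind: str     # "message", "private_note", or "tool_call"
        content: str

    def model_context(thread: list[ThreadEntry]) -> list[dict]:
        # The conversation the agent sees: private notes never leave the team.
        visible = [e for e in thread if e.kind != "private_note"]
        return [
            {
                "role": "assistant" if e.author == "agent" else "user",
                "content": f"[{e.kind}] {e.content}",
            }
            for e in visible
        ]

    def render_thread(thread: list[ThreadEntry]) -> None:
        # What humans see: everything, including each API call the agent made.
        for e in thread:
            print(f"{e.author:>8} | {e.kind:>12} | {e.content}")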

Where we see this going

Proactive investigation. Right now, someone has to ask a question. We want the agent to start investigating the moment a PagerDuty alert fires — pull the initial context, check the obvious things, and have a first pass ready before the on-call engineer even opens a browser.
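Mechanically, that is a webhook away. A rough sketch with Flask, assuming a PagerDuty V3 webhook subscription and a hypothetical start_investigation helper:

    from flask import Flask, request

    app = Flask(__name__)

    def start_investigation(incident_id: str, title: str) -> None:
        # Hypothetical: kick off the agent with the incident as initial context.
        ...

    @app.post("/webhooks/pagerduty")
    def pagerduty_webhook():
        payload = request.get_json(silent=True) or {}
        event = payload.get("event", {})
        # PagerDuty V3 webhooks label new incidents as "incident.triggered".
        if event.get("event_type") == "incident.triggered":
            incident = event.get("data", {})
            start_investigation(incident.get("id", ""), incident.get("title", ""))
        return "", 202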

Parallel sub-agents. Two engineers might have different theories — one thinks it is the database, the other thinks it is a recent deploy. Today the agent can only chase one at a time. We want it to investigate both in parallel and bring the results together.
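Structurally that is two investigation loops running concurrently and reporting back to the same thread. A sketch, assuming an async investigate coroutine that wraps the tool-calling loop:

    import asyncio

    async def investigate(hypothesis: str) -> str:
        # Hypothetical: run a scoped tool-calling loop for one theory
        # and return a short summary of what it found.
        return f"findings for: {hypothesis}"

    async def investigate_in_parallel(hypotheses: list[str]) -> list[str]:
        # Each theory gets its own sub-agent; results come back together
        # so the thread sees one combined summary.
        return await asyncio.gather(*(investigate(h) for h in hypotheses))

    # asyncio.run(investigate_in_parallel(["database contention", "recent deploy"]))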

Deeper service catalog. We already generate dependency graphs automatically. We want to go further — traffic flow, ownership, runbooks — so the agent has richer context before an investigation even starts.

More integrations. Datadog, GCP, Azure, and GitLab are next.

Why open source

An agent like this needs your AWS keys, your API tokens, and access to your codebase. You should be able to read every line of code that touches those credentials. Self-hosted, MIT licensed. We built Chatwoot the same way and it worked, so we are doing it again.

The core product will always be open source. If enough teams adopt it and need things like SSO, audit logs, or role-based access, we will build an enterprise edition around it.

Current state

We are running it on our own incidents at Chatwoot. It works. It finds the right resources, pulls the right metrics, and gives us a starting point for the investigation. It is not always right — sometimes it goes down the wrong path, sometimes it misses context that an experienced engineer would catch. That is expected. The conversational structure exists specifically so you can correct it and steer.

The stack: a Python agent with OpenAI for reasoning, a Vue.js frontend, and a Rails backend with PostgreSQL. Setup: clone the repo, add your API keys, deploy.

It is early. We are working on it every day. If you try it out, open an issue or reach out — we read everything.

— The Chatwoot team