How to Audit Technical Debt in a SaaS Codebase

Most technical debt content is either too academic ("debt is the cost of future change") or too generic ("review your architecture quarterly"). Neither helps when you're a CTO who knows the codebase has problems and needs to communicate their severity to a non-technical CEO or board.

This post is a practical audit framework: specific signals to measure, a threshold for each, and a triage model that distinguishes between debt you fix this sprint, debt you schedule, and debt you accept as a cost of doing business.

Why Audits Fail

The typical technical debt audit ends with a Google Doc full of findings that nobody acts on. The reason is usually one of two things: the findings aren't prioritized (a list of 47 issues is not an action plan), or they aren't connected to business outcomes (engineers care about the clean code argument; founders care about the revenue and risk argument).

A useful audit produces a short list of high-impact items tied to measurable outcomes — slower feature velocity, incident risk, customer-facing latency — and a clear triage recommendation for each.

What to Measure

Query Performance

Pull your P99 response times from your APM tool (Datadog, New Relic, Sentry) or, if you don't have one, from your request logs. The threshold: P99 above 500ms on any endpoint in your critical path is actionable debt, not aspirational improvement. "Critical path" means anything in the user's primary workflow: login, dashboard load, core CRUD operations.
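If you only have raw request logs, the P99 check can be approximated by hand. The sketch below assumes you have already grouped latency samples by endpoint (the input shape and function names are illustrative, not from any APM tool) and uses the nearest-rank definition of a percentile, which may differ slightly from what your APM reports.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Multiply before dividing so the rank is computed on whole numbers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

const P99_THRESHOLD_MS = 500; // threshold from the text

// Hypothetical input shape: endpoint name -> latency samples in milliseconds.
function endpointsOverP99Threshold(
  latencies: Record<string, number[]>
): string[] {
  return Object.keys(latencies)
    .filter((endpoint) => latencies[endpoint].length > 0)
    .filter(
      (endpoint) => percentile(latencies[endpoint], 99) > P99_THRESHOLD_MS
    )
    .sort();
}
```

With 100 samples, a single slow request does not move the P99; it takes two. That is the point of the metric: it ignores one-off outliers but catches a slow tail that users hit regularly.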

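For database-level analysis, enable pg_stat_statements in PostgreSQL (the module must be listed in shared_preload_libraries, which requires a server restart) and pull the top 20 queries by total execution time. A query that runs 5,000 times/day at 50ms each costs 250 seconds of database time per day; one that runs twice a day at 2 seconds costs 4 seconds. The first is the higher-priority target. Total time in production is the metric, not per-execution time.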
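The ranking itself is a single multiplication and a sort. A minimal sketch, assuming you have already loaded rows into the hypothetical QueryStat shape below. Note that pg_stat_statements accumulates counts since its last reset, so the absolute numbers depend on that window.

```typescript
// Hypothetical row shape, e.g. loaded from:
//   SELECT query, calls, mean_exec_time FROM pg_stat_statements
// (mean_exec_time is named mean_time before PostgreSQL 13).
interface QueryStat {
  query: string;
  calls: number;
  meanMs: number;
}

interface RankedQuery extends QueryStat {
  totalMs: number;
}

// Rank queries by total time spent (calls x mean), highest first.
function rankByTotalTime(stats: QueryStat[], limit = 20): RankedQuery[] {
  return stats
    .map((s) => ({ ...s, totalMs: s.calls * s.meanMs }))
    .sort((a, b) => b.totalMs - a.totalMs)
    .slice(0, limit);
}
```

Fed the two queries from the example above, the 50ms query comes out first at 250,000ms of total time against 4,000ms for the 2-second query.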

Missing indexes are the most common source of P99 debt in early-stage SaaS. Run EXPLAIN ANALYZE on your top 10 slow queries (it executes the statement, so wrap any write in a transaction you roll back). Sequential scans on tables above 100k rows where an index would apply are immediate fixes: typically an afternoon of work that drops latency by 60–80%.
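Reading ten plans by eye is manageable; reading them every audit is not. The sketch below walks a plan produced by EXPLAIN (ANALYZE, FORMAT JSON) and reports sequential scans that read more than 100k rows. It assumes you pass in the object under the "Plan" key of the first array element, and it uses rows returned plus rows filtered out as a proxy for table size, which undercounts when a LIMIT stops the scan early or the node runs in multiple loops.

```typescript
// Subset of the node fields PostgreSQL emits with FORMAT JSON.
interface PlanNode {
  "Node Type": string;
  "Relation Name"?: string;
  "Actual Rows"?: number;
  "Rows Removed by Filter"?: number;
  Plans?: PlanNode[];
}

interface SeqScanFinding {
  relation: string;
  rowsRead: number;
}

// Recursively collect sequential scans that read more than minRows rows.
function findLargeSeqScans(node: PlanNode, minRows = 100000): SeqScanFinding[] {
  const findings: SeqScanFinding[] = [];
  if (node["Node Type"] === "Seq Scan") {
    const rowsRead =
      (node["Actual Rows"] ?? 0) + (node["Rows Removed by Filter"] ?? 0);
    if (rowsRead > minRows) {
      findings.push({
        relation: node["Relation Name"] ?? "(unknown)",
        rowsRead,
      });
    }
  }
  for (const child of node.Plans ?? []) {
    findings.push(...findLargeSeqScans(child, minRows));
  }
  return findings;
}
```

A finding here is a candidate, not a verdict: whether an index would actually apply still depends on the query's filter and the column's selectivity.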

Bundle Size and Frontend Performance

In a Next.js project with @next/bundle-analyzer configured, run ANALYZE=true next build; elsewhere, use the equivalent analyzer for your bundler. The threshold: above 400kB gzipped JavaScript on the initial bundle is structural debt. Code splitting handles the rest. This threshold is specifically about what ships on first load.

Check LCP against field data in Google Search Console, not Lighthouse lab scores. LCP above 2.5 seconds on your core pages is a retention problem, not just a performance score. Common culprits: unoptimized images above 200kB, synchronous third-party scripts in <head>, and large dependencies that aren't tree-shaken (moment.js, full lodash, full icon libraries).

Type Coverage

Run tsc --strict --noEmit and count errors. In a codebase with real type discipline, this should return zero errors. Count how many files include @ts-ignore or @ts-expect-error suppressions with no explanatory comment — these are load-bearing type suppressions, not temporary shortcuts.
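Counting unexplained suppressions is easy to script. The sketch below is a rough line-based heuristic, not a parser: it assumes the explanation, if any, sits on the same line after the directive, and it only looks at // comments. A team that documents suppressions on the preceding line would need a different rule.

```typescript
// Matches a // @ts-ignore or // @ts-expect-error directive and captures
// whatever text follows it on the same line.
const SUPPRESSION = /\/\/\s*@ts-(?:ignore|expect-error)\b(.*)$/;

// Count suppression directives with no explanatory text after them.
function countUnexplainedSuppressions(source: string): number {
  let count = 0;
  for (const line of source.split(/\r?\n/)) {
    const match = SUPPRESSION.exec(line);
    if (!match) continue;
    // Drop separators such as ":" or "-" before checking for an explanation.
    const explanation = match[1].replace(/^[\s:\-]+/, "").trim();
    if (explanation === "") count++;
  }
  return count;
}
```

Run it over each file's contents and sum the results to get a single number you can track from one audit to the next.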

The threshold: above 500 strict-mode TypeScript errors, or more than 5% of expressions in domain logic typed as any, is type system debt that will manifest as runtime bugs. TypeScript's value comes from making incorrect states unrepresentable. If the types are lying, the safety net is gone.

Test Coverage

Pull test coverage on your critical paths, not overall coverage, which is easily inflated by testing utility functions. You want coverage on authentication flows, payment processing, data mutation operations, and any business logic that has broken in production more than once a year.

The threshold: below 60% branch coverage on critical paths means refactoring those paths is risky. Not because coverage is the goal, but because coverage correlates with whether you'll catch regressions before users do.
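Most coverage tools report a single overall number, so critical-path coverage usually has to be computed from the per-file data. The sketch below assumes the shape of Istanbul's json-summary report (a "total" key plus one entry per file, each with a branches object) and a list of path fragments that you choose to identify critical code. The fragments in the usage note are placeholders.

```typescript
interface Metric {
  total: number;
  covered: number;
}

interface FileSummary {
  branches: Metric;
}

// Keys are file paths, plus the aggregate "total" entry.
type CoverageSummary = Record<string, FileSummary>;

// Aggregate branch coverage (as a percentage) over files whose path
// contains any of the given fragments. Returns null if nothing matched.
function branchCoverageFor(
  summary: CoverageSummary,
  pathFragments: string[]
): number | null {
  let total = 0;
  let covered = 0;
  for (const file of Object.keys(summary)) {
    if (file === "total") continue; // skip the whole-repo aggregate
    if (!pathFragments.some((f) => file.indexOf(f) !== -1)) continue;
    total += summary[file].branches.total;
    covered += summary[file].branches.covered;
  }
  return total === 0 ? null : (covered * 100) / total;
}
```

Calling branchCoverageFor(summary, ["/auth/", "/billing/"]) gives the number to compare against the 60% threshold, weighted by branch count rather than averaged per file.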

Low coverage by itself isn't urgent debt — it's a risk multiplier. It makes every other piece of debt harder to fix because you can't validate that your fixes didn't break something adjacent.

Observability Gaps

If your team's primary incident detection method is a user email, you have observability debt. Audit: Are critical path errors alerted on? Do you have P50/P95/P99 latency on your most important endpoints? Do you have structured logging with request tracing IDs?

The threshold: if you can't answer "how many users hit a 500 error in the last 24 hours" within 60 seconds, you're finding out about incidents after users do.
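What "answerable in 60 seconds" looks like depends on your logging stack, but the underlying query is simple once logs are structured. A minimal sketch, assuming a hypothetical RequestLog record with a timestamp, status code, and user ID; it counts any 5xx status, which is a slightly broader reading of "a 500 error".

```typescript
// Hypothetical structured log record; field names are illustrative.
interface RequestLog {
  timestampMs: number; // epoch milliseconds
  status: number; // HTTP status code
  userId: string | null; // null for anonymous requests
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count distinct identified users who received a 5xx response in the window.
function usersHittingServerErrors(
  logs: RequestLog[],
  nowMs: number,
  windowMs = DAY_MS
): number {
  const users = new Set<string>();
  for (const entry of logs) {
    const inWindow =
      entry.timestampMs > nowMs - windowMs && entry.timestampMs <= nowMs;
    const isServerError = entry.status >= 500 && entry.status <= 599;
    if (inWindow && isServerError && entry.userId !== null) {
      users.add(entry.userId);
    }
  }
  return users.size;
}
```

If your logs have no user ID or no status field to filter on, that gap is the finding.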

Dependency Drift

Run npm outdated. Flag any critical-path dependency (auth libraries, ORM, framework) that is more than two major versions behind. Known vulnerabilities accumulate non-linearly on outdated dependencies, because older major versions eventually stop receiving security patches.

The threshold: any auth, data access, or networking dependency more than 18 months behind its latest major version needs a scheduled update, not just a backlog entry.
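The major-version check can be automated from the output of npm outdated --json, which maps each package name to its current, wanted, and latest versions. The sketch below covers only the "more than two majors behind" rule; the 18-month rule needs release dates, which this output does not include. The list of critical packages is something you supply, and the names in the usage note are placeholders.

```typescript
// Subset of the per-package fields in `npm outdated --json` output.
interface OutdatedEntry {
  current?: string; // absent when the package is not installed
  wanted: string;
  latest: string;
}

// Parse the leading major number; returns NaN for unparseable versions.
function majorOf(version: string): number {
  return parseInt(version.split(".")[0], 10);
}

// Return the critical packages that trail the latest release by more
// than maxMajorsBehind major versions.
function flagMajorDrift(
  outdated: Record<string, OutdatedEntry>,
  criticalPackages: string[],
  maxMajorsBehind = 2
): string[] {
  return criticalPackages.filter((name) => {
    const entry = outdated[name];
    if (!entry || !entry.current) return false;
    const gap = majorOf(entry.latest) - majorOf(entry.current);
    return gap > maxMajorsBehind; // NaN compares false, so bad input is skipped
  });
}
```

For example, flagMajorDrift(parsedOutput, ["example-auth", "example-orm"]) returns only the packages from that list that are three or more majors behind.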

Triage Framework

Once you have findings, sort them into three categories:

Fix now (this sprint). Missing indexes on tables above 100k rows. P99 above 1 second on the primary user workflow. Critical-path dependencies with known CVEs. Observable security gaps (missing authentication checks found during audit). These have immediate revenue or risk impact.

Fix in next quarter (scheduled). P99 above 500ms on secondary paths. Bundle size above 400kB. Type coverage below 80% on core domain logic. Test coverage below 60% on critical paths. These degrade your ability to move fast but aren't causing incidents today.

Accept as cost of doing business. Inconsistent naming conventions in utilities. Test coverage below 40% on low-change, low-risk modules. Minor dependency drift on non-security-relevant libraries. These are real debt but the cost of fixing them exceeds the cost of living with them.
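Encoding the three categories as a function forces the ambiguous cases into the open. The sketch below covers four of the finding types listed above; the Finding shape is hypothetical, and where the text gives no rule (a missing index on a small table, or a primary-workflow P99 between 500ms and 1 second) the code picks "schedule" as an assumption you should adjust.

```typescript
type Triage = "fix-now" | "schedule" | "accept";

// Hypothetical finding shapes; extend with your own audit signals.
type Finding =
  | { kind: "latency"; p99Ms: number; primaryWorkflow: boolean }
  | { kind: "missing-index"; tableRows: number }
  | { kind: "dependency"; criticalPath: boolean; knownCve: boolean }
  | { kind: "test-coverage"; branchPct: number; criticalPath: boolean };

function triage(finding: Finding): Triage {
  switch (finding.kind) {
    case "latency":
      if (finding.primaryWorkflow && finding.p99Ms > 1000) return "fix-now";
      // Assumption: anything else above 500ms is scheduled.
      return finding.p99Ms > 500 ? "schedule" : "accept";
    case "missing-index":
      // Assumption: small tables are scheduled rather than accepted.
      return finding.tableRows > 100000 ? "fix-now" : "schedule";
    case "dependency":
      if (!finding.criticalPath) return "accept";
      return finding.knownCve ? "fix-now" : "schedule";
    case "test-coverage":
      return finding.criticalPath && finding.branchPct < 60
        ? "schedule"
        : "accept";
  }
}
```

The value is less in the code than in the argument it provokes: every return statement is a decision someone has to own.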

The "accept" category is important and underused. Not everything is worth fixing. Debt triage that doesn't include explicit acceptance decisions produces a list that grows forever and gets ignored.

Connecting Findings to Business Outcomes

The conversation with a non-technical CEO should not be "we have 47 technical debt items." It should be: "We have two findings that are costing us feature velocity — our P99 on the dashboard is 1.2 seconds and our type system has 800 suppressed errors. Here's what fixing them costs and what the outcome is."

The architecture diagnostic maps findings to time-to-feature and incident-rate estimates, not just code quality scores.

For upstream architectural decisions that prevent debt accumulation, see web architecture decisions that scale. For deciding whether your debt load has crossed the threshold where refactoring stops being viable, see when to rebuild vs. refactor. For database-specific debt patterns, see database decisions at SaaS scale.