
The 5 Web Architecture Decisions That Will Cost You $100k If You Get Them Wrong

Most scaling pain isn't bad luck — it's the consequence of five specific architecture decisions made in year one. Here's what they are and how to make them correctly the first time.

April 24, 2026 · 11 min read · Featured
web architecture · Next.js · scalability · technical decisions · SaaS

Last year I rebuilt a $3M ARR DTC storefront from scratch. The company's previous agency had done everything technically correctly — the code worked, the tests passed, the deploys were automated. And yet they were paying $18k/month in infrastructure costs for a site that went down every Black Friday and loaded in 4.2 seconds on mobile.

The problem wasn't incompetence. It was five architecture decisions, each made reasonably in isolation, that compounded into a system that couldn't survive its own success.

These are the decisions I see get made wrong most often. Each one is recoverable — but recovering from them is expensive. Making them right the first time is not.

Decision 1: Where you put your data layer

This is the decision that causes the most pain at scale, and it's usually made in the first sprint.

The common mistake: using your database as your application's memory. Every user action triggers a direct database query. Under modest load this works fine. When traffic spikes, the database becomes the bottleneck, and adding application servers doesn't fix it: horizontal scaling adds servers, not database capacity, and every new server opens more connections against the same database.
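
To make the bottleneck concrete, here is a back-of-the-envelope sketch. The pool size, server counts, and connection limit are hypothetical numbers I picked for illustration, not figures from the storefront above. The point is only that scaling out multiplies the connections requested while the database's limit stays fixed.

```typescript
// Hypothetical numbers for illustration only.
interface Fleet {
  appServers: number;        // horizontally scaled application instances
  poolSizePerServer: number; // database connections each instance keeps open
}

// Total connections the fleet will try to hold against one database.
function connectionsRequested(fleet: Fleet): number {
  return fleet.appServers * fleet.poolSizePerServer;
}

// True when the fleet asks for more connections than the database allows.
function exceedsLimit(fleet: Fleet, dbMaxConnections: number): boolean {
  return connectionsRequested(fleet) > dbMaxConnections;
}

const dbMaxConnections = 100; // assumed limit on a single database
const normalDay: Fleet = { appServers: 4, poolSizePerServer: 10 };
const trafficSpike: Fleet = { appServers: 20, poolSizePerServer: 10 };

console.log(connectionsRequested(normalDay), exceedsLimit(normalDay, dbMaxConnections));
console.log(connectionsRequested(trafficSpike), exceedsLimit(trafficSpike, dbMaxConnections));
```

With these assumed numbers, four servers ask for 40 connections and fit under the limit, while twenty servers ask for 200 and do not.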

The right decision: design your data access layer with explicit caching from day one. This doesn't mean Redis from the start — it means making a deliberate decision about what data is read-heavy versus write-heavy, and treating them differently. Read-heavy data (product catalog, user profile, navigation content) should hit a cache layer. Write-heavy data (orders, form submissions, real-time events) goes directly to the database.
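
A minimal sketch of what "explicit caching" can look like before you reach for Redis: an in-memory read-through cache with a time-to-live. The class name, the loader function, and the TTL are my assumptions for illustration. The loader is synchronous here to keep the sketch short; a real database query would return a Promise.

```typescript
// In-memory read-through cache with a TTL. All names are hypothetical.
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class ReadThroughCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly loader: (key: string) => T;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    loader: (key: string) => T, // stands in for a database read
    ttlMs: number,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // fresh hit: no database work
    }
    const value = this.loader(key); // miss or expired: load and store
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Read-heavy data goes through the cache; the counter shows how often
// the stand-in "database" is actually hit.
let catalogQueries = 0;
const catalog = new ReadThroughCache<string>((id) => {
  catalogQueries += 1;
  return `product:${id}`;
}, 60_000);

catalog.get("42");
catalog.get("42");
console.log(catalogQueries); // second read was served from the cache
```

Write-heavy paths such as orders would bypass a class like this and talk to the database directly.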

The company I mentioned earlier was making eleven database queries on every product page load. After the rebuild with a proper caching strategy, it was one, and that one was cached at the CDN edge for most users.

Decision 2: Server-side vs. client-side rendering (and the hybrid trap)

With the Next.js App Router, you have more rendering options than ever: Server Components, Client Components, Static Generation, Incremental Static Regeneration, streaming. This is genuinely powerful. It's also where I see the most expensive mistakes.

The most common mistake isn't choosing the wrong rendering mode — it's not choosing deliberately. Teams default to client-side rendering because it's familiar, or default to full SSR because it feels "safer," without mapping their rendering decisions to their actual data patterns.

The right framework: every route has a data freshness requirement. Ask two questions. First, how often does this data change? Second, does this data change per-user or per-page? The answers map cleanly to rendering strategies:

The DTC storefront I rebuilt had fully dynamic SSR on product pages that hadn't changed in three months. We moved them to ISR with a sixty-second revalidation. Time to first byte dropped from 340ms to 40ms. That's not optimization — that's just using the right tool.
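
The two questions can be written down as a small decision function. This is one plausible mapping under my own assumptions, not a rule from Next.js or a definitive table: the category names and the function are hypothetical, and the sixty-second window simply reuses the figure from the storefront example.

```typescript
// One possible mapping from the two questions to a rendering strategy.
// Categories and names are hypothetical; adjust them to your own routes.
type ChangeRate = "only-on-deploy" | "occasionally" | "every-request";
type Strategy = "static" | "isr" | "dynamic-ssr";

interface RouteData {
  changeRate: ChangeRate; // question 1: how often does the data change?
  perUser: boolean;       // question 2: does it differ per user?
}

interface RenderingPlan {
  strategy: Strategy;
  revalidateSeconds?: number; // only meaningful for ISR
}

function pickRenderingPlan(route: RouteData): RenderingPlan {
  // Per-user output cannot be shared between visitors, so render per request.
  if (route.perUser || route.changeRate === "every-request") {
    return { strategy: "dynamic-ssr" };
  }
  // Shared data that changes between deploys: regenerate on a timer.
  if (route.changeRate === "occasionally") {
    return { strategy: "isr", revalidateSeconds: 60 };
  }
  // Shared data that only changes when you ship: build it once.
  return { strategy: "static" };
}

const productPage = pickRenderingPlan({ changeRate: "occasionally", perUser: false });
console.log(productPage.strategy, productPage.revalidateSeconds);
```

Under this mapping, a product page that changes rarely but not never lands on ISR, which matches the move described above.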

Decision 3: How you handle authentication

Authentication is the decision that most developers treat as solved infrastructure. Pick a library, wire it up, move on. This works until you need to customize something — and you always need to customize something.

The problems I inherit most often:

The right architecture: auth is infrastructure, not a feature. Pick one auth boundary and enforce it there — in Next.js, this is middleware. Middleware runs before every request, checks the session token (stored in an httpOnly cookie, not localStorage), and redirects or rewrites based on the result. Your API routes and server components don't need to duplicate this logic — they trust the middleware.
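
Here is a framework-free sketch of that single boundary. In a Next.js app this logic would sit in middleware; I've written it as a plain function so it can be read and tested on its own. The cookie name, the public paths, the login URL, and the session validator are all assumptions for illustration.

```typescript
// A single auth boundary as a pure function. Names and paths are hypothetical.
type Decision =
  | { kind: "allow" }
  | { kind: "redirect"; to: string };

const PUBLIC_PATHS = new Set<string>(["/", "/login"]);
const SESSION_COOKIE = "session"; // assumed name; the cookie is set with HttpOnly

// Pull one cookie value out of a raw Cookie request header.
function readCookie(header: string | null, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

// Decide what to do with a request before any route code runs.
function authorizeRequest(
  path: string,
  cookieHeader: string | null,
  isValidSession: (token: string) => boolean, // stand-in for real verification
): Decision {
  if (PUBLIC_PATHS.has(path)) return { kind: "allow" };
  const token = readCookie(cookieHeader, SESSION_COOKIE);
  if (token !== null && isValidSession(token)) return { kind: "allow" };
  // Remember where the user was headed so login can send them back.
  return { kind: "redirect", to: `/login?next=${encodeURIComponent(path)}` };
}

const decision = authorizeRequest("/account", null, () => false);
console.log(decision);
```

Keeping the decision in one function means there is exactly one place to change when the rules change.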

Role-based access control lives in a separate layer from authentication. Checking "is this user logged in" and "is this user allowed to do this action" are different questions and should be answered by different systems.
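
A minimal sketch of authorization as its own layer: a role-to-permission table that knows nothing about sessions or cookies. The roles and actions are hypothetical examples, not a recommended permission model.

```typescript
// Authorization only: "may this role do this?", never "who is this user?".
// Roles and actions below are hypothetical.
type Role = "viewer" | "editor" | "admin";
type Action = "order:read" | "order:refund" | "user:invite";

const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  viewer: new Set<Action>(["order:read"]),
  editor: new Set<Action>(["order:read", "order:refund"]),
  admin: new Set<Action>(["order:read", "order:refund", "user:invite"]),
};

function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role].has(action);
}

console.log(can("viewer", "order:refund"), can("editor", "order:refund"));
```

The authentication layer hands over a role; this layer answers the permission question and nothing else.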

Decision 4: Your deployment and infrastructure model

Most early-stage companies default to whatever their first developer was comfortable with. This results in one of two extremes: over-engineered Kubernetes clusters that a three-person startup doesn't need, or under-engineered single-server setups that can't survive a traffic spike.

The right starting model for most $1M-$20M ARR companies is simpler than they think: Vercel for the Next.js frontend, a managed database (Supabase, PlanetScale, or Neon depending on your data model), and Cloudflare in front of everything.

This stack costs less than a dedicated DevOps hire, handles global sub-50ms response times, scales to millions of page views without configuration changes, and gives you zero-downtime deploys from day one. It's not the right stack for every company — but it's right for far more companies than currently use it.

What this stack doesn't give you: full control over your infrastructure. If you have data residency requirements, complex networking needs, or are running stateful workloads that need persistent compute, you need something different. Most companies don't.

Decision 5: Observability

This is the decision that nobody wants to make in sprint one because the app isn't in production yet. It's also the decision that causes the most pain when something goes wrong in production.

The mistake: treating observability as an afterthought. Adding Sentry "when there's time." Building without structured logging. Deploying without any performance monitoring.

The result: when something breaks in production — and something always breaks in production — you're flying blind. You know that users are experiencing errors. You don't know which users, what they were doing, or what caused it.

The right approach: three tools, configured before you deploy to production.

Sentry for errors. Every unhandled exception, every failed API call, every React error boundary trigger. Not because you'll read every error (you won't), but because Sentry's grouping and alerting mean you'll know when an error is happening at scale, not when a user emails you.

PostHog for product analytics. Page views, feature usage, funnel drop-off. Free up to a million events/month, self-hostable if you need it. The goal isn't to track everything — it's to know which pages users actually use and where they leave.

Structured logging for your API layer. Every request logged as JSON with a request ID, user ID (if authenticated), response time, and status code. When a customer reports an issue, you can query logs by user ID and see exactly what happened.
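
A sketch of what one log line per request can look like, and how querying by user ID falls out of it. The field names are my assumption; there is no single standard schema, and in production the filtering would happen in your log tooling rather than in application code.

```typescript
// One JSON object per request. Field names are an assumption, not a standard.
interface RequestLog {
  requestId: string;
  userId: string | null; // null when the request is unauthenticated
  method: string;
  path: string;
  status: number;
  durationMs: number;
  timestamp: string; // ISO 8601
}

// Serialize a log entry as a single JSON line.
function formatRequestLog(entry: RequestLog): string {
  return JSON.stringify(entry);
}

// Find everything a given user did, as you would when they report an issue.
function logsForUser(lines: string[], userId: string): RequestLog[] {
  return lines
    .map((line) => JSON.parse(line) as RequestLog)
    .filter((entry) => entry.userId === userId);
}

const lines = [
  formatRequestLog({
    requestId: "req-1", userId: "user-7", method: "POST", path: "/api/checkout",
    status: 500, durationMs: 812, timestamp: "2026-04-24T10:00:00.000Z",
  }),
  formatRequestLog({
    requestId: "req-2", userId: null, method: "GET", path: "/api/products",
    status: 200, durationMs: 35, timestamp: "2026-04-24T10:00:01.000Z",
  }),
];

console.log(logsForUser(lines, "user-7"));
```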

This stack takes about four hours to set up properly. The ROI appears the first time something goes wrong in production and you can diagnose it in twenty minutes instead of four hours.


The pattern underneath all five decisions

Look at what these five decisions have in common: they're all about designing for a future you haven't reached yet, without over-engineering for a future you may never reach.

The caching strategy doesn't require a full Redis cluster on day one. The rendering decisions don't require a dedicated platform engineer. The auth architecture doesn't require a custom identity provider. The infrastructure model doesn't require a DevOps hire. The observability stack doesn't require a custom monitoring solution.

Each one is a deliberate decision, made early, that costs roughly the same to implement correctly as it does to implement incorrectly — but has wildly different consequences at scale.

The $100k number in the title isn't an exaggeration. It's what it costs, in engineering time, infrastructure, and downtime, to fix these decisions after they've calcified into a production system with four engineers who have opinions about how it should work.

Make them right the first time. It's not harder. It's just more intentional.

If you're making these decisions right now, I do architecture reviews.
