Web Architecture

The Database Decisions That Will Define Your SaaS at Scale

Most early-stage SaaS teams make their database decisions based on what the first developer was comfortable with. By the time they hit 100k users, those decisions cost $200k to unwind. Here's how to make them correctly the first time.

February 12, 2026 · 10 min read
database architecture · PostgreSQL · SaaS · scalability · web architecture

The database is the one part of your stack that doesn't forgive you. You can refactor your frontend in a weekend. You can swap your authentication provider in a sprint. But migrating 80 million rows of production data while your app is live, without downtime or data loss, while your engineering team is also supposed to be shipping features, is a different category of problem.

I've worked with enough SaaS teams at the $1M–$10M ARR stage to see a pattern: the database decisions made in the first three months become the most expensive decisions the company ever makes. Not because those decisions were obviously wrong at the time, but because they were never really made at all. They defaulted.

The first developer used MongoDB because their previous job used MongoDB. Nobody challenged it. Two years and $2M in revenue later, the team is running complex join-equivalent operations in application code, consistency is a prayer, and a migration to PostgreSQL is now on the roadmap as a "six-month infrastructure project."

That's the moment companies discover what database decisions actually cost.

Why Database Decisions Compound

The schema you design at 1,000 users is not neutral. Every subsequent table, every subsequent query, every subsequent data relationship gets built on top of that foundation. If the foundation has a flaw — a multi-tenancy model that doesn't isolate cleanly, a normalization decision that made sense for reporting but kills transactional throughput — the flaw compounds with every new feature.

At 1,000 users you have 12 tables, 300 queries, and a team of two who understand the data model. At 100,000 users you have 60 tables, 4,000 queries across multiple services, three engineers who've left and taken their context with them, and a customer expecting 99.9% uptime.

The migration doesn't just cost engineering time. It costs product velocity during the migration period, it introduces regression risk, and it requires a data integrity audit that nobody planned for. A team I spoke with in 2025 estimated their PostgreSQL migration — from a document store that had seemed perfectly adequate at launch — ran to $185,000 in engineering cost and a four-month product freeze. Their core product had $4M ARR at the time. They could absorb it. Barely.

Decision 1: SQL vs. NoSQL (This Is Usually the Wrong Question)

The SQL vs. NoSQL debate is mostly noise by now. The useful version of the question is: what is the primary access pattern for your core data?

If your core data is relational — users have organizations, organizations have projects, projects have tasks, tasks have comments — you want a relational database. The joins aren't overhead; they're the point. PostgreSQL handles this, it handles it extremely well, and it scales further than most SaaS companies will ever need before other bottlenecks become the constraint.
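For contrast, here is roughly what "join-equivalent operations in application code" look like once the database can't do the join for you. This is a minimal TypeScript sketch with hypothetical `Project` and `Task` shapes, not code from any particular team; the comment shows the single PostgreSQL query (with assumed table and column names) that replaces it.

```typescript
// Hypothetical shapes, for illustration only.
interface Project { id: string; orgId: string; name: string }
interface Task { id: string; projectId: string; title: string }

// What PostgreSQL does in one statement:
//   SELECT p.name, t.title
//   FROM projects p JOIN tasks t ON t.project_id = p.id
//   WHERE p.org_id = $1;
// Without joins, the application loads both collections and hash-joins them in memory.
function tasksWithProjectNames(
  orgId: string,
  projects: Project[],
  tasks: Task[],
): Array<{ projectName: string; taskTitle: string }> {
  // Build phase: index this org's projects by id.
  const byId = new Map<string, Project>();
  for (const p of projects) {
    if (p.orgId === orgId) byId.set(p.id, p);
  }
  // Probe phase: keep only tasks whose project belongs to the org.
  const rows: Array<{ projectName: string; taskTitle: string }> = [];
  for (const t of tasks) {
    const p = byId.get(t.projectId);
    if (p !== undefined) rows.push({ projectName: p.name, taskTitle: t.title });
  }
  return rows;
}

const projects: Project[] = [
  { id: "p1", orgId: "acme", name: "Website" },
  { id: "p2", orgId: "globex", name: "Mobile" },
];
const tasks: Task[] = [
  { id: "t1", projectId: "p1", title: "Fix nav" },
  { id: "t2", projectId: "p2", title: "Ship beta" },
  { id: "t3", projectId: "p1", title: "Update copy" },
];
console.log(tasksWithProjectNames("acme", projects, tasks));
```

Every hand-written join like this is logic the database would otherwise plan, index, and keep consistent for you.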

NoSQL genuinely wins in specific scenarios: document storage where structure is truly variable per record, high-write-volume time-series data, global distribution with eventual consistency trade-offs you've consciously accepted. These are real use cases. They're just not most SaaS products.

The mistake I see repeatedly is reaching for MongoDB, DynamoDB, or Firestore because they seem "simpler" to start with — no schema migrations, flexible documents, easy to iterate. That flexibility is real in month one. By month eighteen, when your "flexible documents" contain six different shapes of the same entity because the schema evolved without enforcement, the flexibility has become the problem.
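Unenforced flexibility usually surfaces as read-side code like the following: a minimal TypeScript sketch, assuming three hypothetical historical shapes of a "task" document, where every reader has to normalize every shape it might meet. A relational schema with typed columns and migrations would have converted or rejected the old shapes at write time instead.

```typescript
// Three hypothetical shapes the same "task" document took over time.
type TaskV1 = { title: string; done: boolean };                                   // at launch
type TaskV2 = { name: string; status: "open" | "closed" };                        // fields renamed
type TaskV3 = { name: string; status: "open" | "closed"; assigneeIds: string[] }; // assignees added

type StoredTask = TaskV1 | TaskV2 | TaskV3;

// The single shape the rest of the application actually wants.
interface Task { title: string; isDone: boolean; assigneeIds: string[] }

// Every code path that reads tasks now depends on this function
// knowing about every shape that was ever written.
function normalizeTask(doc: StoredTask): Task {
  if ("title" in doc) {
    return { title: doc.title, isDone: doc.done, assigneeIds: [] };
  }
  const assigneeIds = "assigneeIds" in doc ? doc.assigneeIds : [];
  return { title: doc.name, isDone: doc.status === "closed", assigneeIds };
}

const docs: StoredTask[] = [
  { title: "Write spec", done: true },
  { name: "Review PR", status: "open" },
  { name: "Deploy", status: "closed", assigneeIds: ["u1"] },
];
console.log(docs.map(normalizeTask));
```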

Default to PostgreSQL. Reach for NoSQL when a specific access pattern genuinely demands it, not as a general default.

Decision 2: Connection Pooling from Day One

This one bites teams when they least expect it. PostgreSQL handles connections as processes, not threads — each connection has real overhead (around 5–10MB of memory). A Node.js application that naively opens a new database connection per request will start hitting connection limits well before the database is under genuine load.

At 50 concurrent users during a morning traffic spike, a Next.js app deployed as serverless functions can attempt hundreds of simultaneous connections, because each function instance opens its own connections and nothing coordinates them across instances. PostgreSQL's default max_connections of 100 means you've already hit a wall with minimal real load.
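A back-of-the-envelope version of that arithmetic, as a TypeScript sketch. Two inputs are assumptions for illustration: that each concurrent request lands on its own function instance, and that each instance opens a pool of 5 (drivers and ORMs differ, and many serverless guides recommend 1). The 5–10MB per connection figure is the rough estimate quoted above.

```typescript
// Rough model of serverless connection demand (hypothetical inputs):
// one function instance per concurrent request, each opening its own pool.
function estimateConnections(
  concurrentInstances: number,
  poolSizePerInstance: number,
  maxConnections = 100, // PostgreSQL's default max_connections
): { connections: number; exceedsMaxConnections: boolean } {
  const connections = concurrentInstances * poolSizePerInstance;
  return { connections, exceedsMaxConnections: connections > maxConnections };
}

// Memory for backend processes alone, using the rough 5–10MB per connection figure.
function backendMemoryMb(connections: number, lowMb = 5, highMb = 10): [number, number] {
  return [connections * lowMb, connections * highMb];
}

console.log(estimateConnections(50, 5)); // 250 connections, over the default limit of 100
console.log(backendMemoryMb(100));       // 500 to 1000 MB just for backends at the default limit
```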

The solution is a connection pooler. PgBouncer is the standard, Supabase bundles one, and Prisma Accelerate handles it at the ORM layer. The decision that matters is making pooling part of your architecture from the start, not a fix you bolt on when production starts throwing "sorry, too many clients already" errors at 2am.
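The core idea behind any pooler is a bounded set of reusable connections plus a queue for callers that arrive while all of them are busy. The TypeScript sketch below shows only that idea: `BoundedPool` is a hypothetical name, and a `create` factory stands in for opening a real connection. It is not a substitute for PgBouncer or a driver's built-in pool (no health checks, timeouts, or connection errors are handled).

```typescript
// A minimal bounded pool: at most `max` connections ever exist, and callers
// beyond that wait in FIFO order for one to be released.
class BoundedPool<T> {
  private readonly idle: T[] = [];
  private readonly waiters: Array<(conn: T) => void> = [];
  private created = 0;
  private readonly create: () => T;
  private readonly max: number;

  constructor(create: () => T, max: number) {
    this.create = create;
    this.max = max;
  }

  get totalCreated(): number {
    return this.created;
  }

  acquire(): Promise<T> {
    // Reuse an idle connection if one exists.
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop() as T);
    // Otherwise open a new one, but only up to the limit.
    if (this.created < this.max) {
      this.created += 1;
      return Promise.resolve(this.create());
    }
    // At the limit: wait until release() hands a connection over.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(conn: T): void {
    const next = this.waiters.shift();
    if (next !== undefined) next(conn); // hand off directly to the oldest waiter
    else this.idle.push(conn);
  }

  // Acquire, run, and always release, even if `fn` throws.
  async use<R>(fn: (conn: T) => Promise<R>): Promise<R> {
    const conn = await this.acquire();
    try {
      return await fn(conn);
    } finally {
      this.release(conn);
    }
  }
}

// Ten concurrent "requests" share at most two connections.
let opened = 0;
const pool = new BoundedPool(() => ({ id: ++opened }), 2);
const work = Array.from({ length: 10 }, (_, i) =>
  pool.use(async (conn) => `request ${i} used connection ${conn.id}`),
);
Promise.all(work).then((results) => {
  console.log(results.length, "requests served by", pool.totalCreated, "connections");
});
```

The database only ever sees `max` connections no matter how many requests arrive; excess demand turns into queueing in the application instead of "too many clients" failures at the server.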

Set your connection pooling strategy before you write your first query. The configuration cost is one afternoon. The alternative is an incident at scale.

Decision 3: Read Replicas vs. Caching (These Solve Different Problems)

I've seen teams reach for Redis caching when they actually needed a read replica, and spin up read replicas when they actually needed a cache. They're not interchangeable.

Caching (Redis, Memcached) makes sense for computed values, rendered output, or data that changes infrequently relative to how often it's read. Your homepage's featured content doesn't need a database round-trip on every request. User session data that's read on every authenticated API call doesn't need to hit Postgres.
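A minimal cache-aside sketch in TypeScript for the "read often, changes rarely" case. An in-memory `Map` with a TTL stands in for Redis, and the loader is a hypothetical function that would run the real Postgres query; with Redis you would normally rely on its own key expiry instead of tracking `expiresAt` yourself.

```typescript
// In-memory stand-in for Redis: values expire after ttlMs.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside: check the cache, fall back to the database on a miss,
// then populate the cache for subsequent readers.
async function getOrLoad<V>(cache: TtlCache<V>, key: string, load: () => Promise<V>): Promise<V> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const value = await load();
  cache.set(key, value);
  return value;
}
```

The TTL bounds how stale a value can get; when the underlying data changes, deleting or overwriting the key keeps readers from seeing the old value for a full TTL.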

Read replicas make sense when you have genuinely heavy read load that's hitting the primary — complex analytical queries, reporting, search. A read replica keeps your primary available for writes while offloading read-heavy workloads.
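One common way to use a replica is to route reads explicitly in application code. The TypeScript sketch below assumes two hypothetical executor functions, one bound to the primary and one to a replica (for example, two separate driver pools), and routes by the caller's stated intent rather than by parsing SQL. Replicas are typically updated asynchronously and can lag the primary, so a read that must see a write made moments earlier is sent to the primary.

```typescript
// Anything that can run a query, e.g. a function wrapping a driver pool.
type Executor<R> = (sql: string, params?: readonly unknown[]) => R;

class ReadWriteRouter<R> {
  private readonly primary: Executor<R>;
  private readonly replica: Executor<R>;

  constructor(primary: Executor<R>, replica: Executor<R>) {
    this.primary = primary;
    this.replica = replica;
  }

  // All writes go to the primary.
  write(sql: string, params?: readonly unknown[]): R {
    return this.primary(sql, params);
  }

  // Reads go to the replica by default. Pass { fresh: true } when the read
  // must reflect a recent write, since the replica may not have it yet.
  read(sql: string, params?: readonly unknown[], opts: { fresh?: boolean } = {}): R {
    return opts.fresh ? this.primary(sql, params) : this.replica(sql, params);
  }
}

// Usage with fake executors that just record where each query went.
const log: string[] = [];
const router = new ReadWriteRouter<string>(
  (sql) => { log.push(`primary: ${sql}`); return "ok"; },
  (sql) => { log.push(`replica: ${sql}`); return "ok"; },
);
router.write("UPDATE projects SET name = $1 WHERE id = $2", ["Website", "p1"]);
router.read("SELECT count(*) FROM tasks"); // reporting query: replica
router.read("SELECT name FROM projects WHERE id = $1", ["p1"], { fresh: true }); // read-your-write: primary
console.log(log);
```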

The wrong move is using caching as a band-aid for a slow query that should be optimized or moved to a replica. The right move is understanding which of these you're solving for before you add infrastructure.

At the $1M–$5M ARR stage, a single well-configured PostgreSQL primary with Redis for session and hot-data caching handles most SaaS load profiles. Read replicas become relevant when your reporting queries start competing with your transactional queries — typically around 50,000–100,000 active users depending on your usage patterns.

Decision 4: Multi-Tenancy Schema Design

This is the decision that bites SaaS teams hardest. There are three approaches to multi-tenancy in a relational database:

Row-level isolation: All tenants share tables, separated by a tenant_id column. Simple to implement, simple to migrate. The risk: a missing WHERE tenant_id = ? clause in a query exposes cross-tenant data. This happens in real systems, and the consequence, one customer reading another customer's data, is catastrophic.
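To make that risk concrete, here is a minimal in-memory TypeScript sketch of a shared table keyed by a hypothetical `tenantId` field: one query remembers the tenant filter and one forgets it. In a shared schema, nothing stops the second version from compiling, passing review, and returning another customer's rows.

```typescript
interface InvoiceRow { id: number; tenantId: string; amountCents: number }

// One shared "invoices" table holding every tenant's rows.
const invoices: InvoiceRow[] = [
  { id: 1, tenantId: "acme", amountCents: 12_000 },
  { id: 2, tenantId: "globex", amountCents: 99_000 },
  { id: 3, tenantId: "acme", amountCents: 4_500 },
];

// Correct: equivalent to WHERE tenant_id = $1 AND amount_cents > $2
function largeInvoices(tenantId: string, minCents: number): InvoiceRow[] {
  return invoices.filter((r) => r.tenantId === tenantId && r.amountCents > minCents);
}

// Bug: the tenant filter was forgotten, so this is only WHERE amount_cents > $1.
function largeInvoicesBuggy(_tenantId: string, minCents: number): InvoiceRow[] {
  return invoices.filter((r) => r.amountCents > minCents);
}

console.log(largeInvoices("acme", 10_000).map((r) => r.id));      // [ 1 ]
console.log(largeInvoicesBuggy("acme", 10_000).map((r) => r.id)); // [ 1, 2 ]: row 2 belongs to globex
```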

Schema-level isolation: Each tenant gets their own PostgreSQL schema within a shared database. Better isolation, but schema migrations now have to run across every tenant schema: a migration that takes 30 seconds on one schema takes five hours across 600 tenants when the schemas are migrated one after another.
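A sketch of what "run every migration across every tenant schema" means in practice, in TypeScript. The `tenant_<n>` schema naming and the migration statement are hypothetical. The first function only builds the SQL, one transaction per schema using `SET LOCAL search_path`; the timing helper makes the sequential versus concurrent arithmetic explicit, assuming every schema takes the same time.

```typescript
// Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// One transaction per tenant schema. SET LOCAL scopes search_path to the
// transaction, so unqualified table names resolve inside that tenant's schema.
function perTenantMigration(tenantSchemas: string[], migrationSql: string): string[] {
  return tenantSchemas.map(
    (schema) => `BEGIN; SET LOCAL search_path TO ${quoteIdent(schema)}; ${migrationSql} COMMIT;`,
  );
}

// Wall-clock estimate when `concurrency` schemas are migrated at once.
function migrationSeconds(tenants: number, secondsPerSchema: number, concurrency = 1): number {
  return Math.ceil(tenants / concurrency) * secondsPerSchema;
}

const schemas = Array.from({ length: 600 }, (_, i) => `tenant_${i + 1}`);
const statements = perTenantMigration(schemas, "ALTER TABLE projects ADD COLUMN archived_at timestamptz;");
console.log(statements[0]);
// BEGIN; SET LOCAL search_path TO "tenant_1"; ALTER TABLE projects ADD COLUMN archived_at timestamptz; COMMIT;
console.log(migrationSeconds(600, 30) / 3600);   // 5 (hours, one schema at a time)
console.log(migrationSeconds(600, 30, 10) / 60); // 30 (minutes, ten schemas at a time)
```

Getting a 30-second migration down to half an hour across 600 schemas means running ten at once, which adds concurrent load on the shared database and a partial-failure story to every deploy.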

Database-level isolation: Each tenant gets their own database. Maximum isolation, highest operational overhead, and connection pooling becomes dramatically more complex.
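Much of that pooling complexity is multiplication. A minimal TypeScript sketch, assuming each application instance keeps a small pool open to every tenant database it serves; the instance count and pool sizes are hypothetical.

```typescript
// Worst case: every app instance holds a pool to every tenant database.
function totalTenantConnections(appInstances: number, tenants: number, poolPerTenant: number): number {
  return appInstances * tenants * poolPerTenant;
}

// For comparison: one shared database, one pool per app instance.
function sharedDatabaseConnections(appInstances: number, poolSize: number): number {
  return appInstances * poolSize;
}

console.log(totalTenantConnections(4, 600, 2)); // 4800
console.log(sharedDatabaseConnections(4, 10));  // 40
```

If those tenant databases live on one PostgreSQL server, all of their connections count against that server's single max_connections limit, which is why this model tends to need lazily opened, evicted per-tenant pools or a pooler in front of each server.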

My default recommendation for early-stage SaaS: row-level isolation with PostgreSQL Row-Level Security policies enabled. You define the isolation rule at the database level rather than relying on application code to always include the right WHERE clause. You get the simplicity of a shared schema with enforcement that doesn't depend on every developer remembering to filter correctly.
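A sketch of what that can look like, in TypeScript. The first function returns the DDL for one table; the second wraps a unit of work in a transaction that sets the per-transaction value the policy reads. The `app.current_tenant` setting name, the `tenant_id` column, and UUID tenant ids are assumptions, and `Queryable` is a hypothetical interface shaped like node-postgres' `client.query(text, values)`, so a checked-out `pg` client should fit. Two caveats from PostgreSQL's own rules: superusers and roles with BYPASSRLS skip policies entirely, and a table's owner also skips them unless the table has FORCE ROW LEVEL SECURITY, so the application should connect as a separate, non-owner role.

```typescript
// Hypothetical minimal interface, shaped like node-postgres' client.query(text, values).
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// DDL for one table; run once, as the table owner, inside a migration.
// `table` must be a trusted, hard-coded identifier (it is interpolated as-is).
function rlsPolicySql(table: string): string[] {
  // NULLIF guards against the setting being unset or left as '' on a reused
  // connection; a NULL tenant matches no rows instead of raising a cast error.
  const tenantExpr = "NULLIF(current_setting('app.current_tenant', true), '')::uuid";
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY;`,
    // With only a USING clause, the same rule also checks inserted and updated rows.
    `CREATE POLICY tenant_isolation ON ${table} USING (tenant_id = ${tenantExpr});`,
  ];
}

// Run `work` with the tenant set for this transaction only. The third argument
// to set_config (true) makes the value transaction-local, so it cannot carry
// over to the next request that reuses this pooled connection.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    await client.query("SELECT set_config('app.current_tenant', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Keeping the setting transaction-local also suits transaction-mode poolers such as PgBouncer, where statements outside a single transaction may run on different server connections.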

PostgreSQL vs. Alternatives in 2026

The landscape has matured. Here's where things actually stand:

PostgreSQL remains the correct default for most SaaS applications. The ecosystem is deep, managed offerings (AWS RDS, Supabase, Neon, Railway) remove most operational burden, and the feature set — JSONB for semi-structured data, full-text search, extensions like pgvector for embeddings — means you rarely need another database for your primary workload.

PlanetScale (MySQL-compatible) had a moment, introduced a compelling branching workflow for schema migrations, then pulled their free tier. Worth evaluating for teams with MySQL expertise, but not the default.

Supabase is worth mentioning separately because it's become a genuine stack choice, not just a database — Postgres with auth, storage, and edge functions built in. For early-stage teams that want to move fast, the integrated stack reduces decisions.

CockroachDB and Spanner are global distributed SQL databases. If you're building something that genuinely requires multi-region writes with strong consistency, these are serious options. If you're not, they add complexity for no benefit.

The Practical Checklist

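If you're building a SaaS today and want to make these decisions correctly:

- Default to PostgreSQL; reach for NoSQL only when a specific access pattern genuinely demands it.
- Set your connection pooling strategy before you write your first query.
- Decide whether you need a cache or a read replica before adding either.
- Use row-level multi-tenancy with PostgreSQL Row-Level Security policies enabled.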

The teams that get this right don't do it because they're smarter. They do it because they treated these as deliberate decisions rather than defaults. That distinction is the entire ballgame.

Apply

If this maps to a problem you're working on.

I work with $1M–$20M ARR founders whose digital investment isn't producing the return it should. Applications reviewed personally within 48 hours.

2 Diagnostic slots / month · 2–3 full engagements / quarter · 48h review