7 min · The SwyDex team
Scaling our database without sharding
Sharding Postgres is a project. We have a 2GB database and aren't close to needing it. But it's worth being explicit about the path we'd take before we got there.
What we've done already
- Indexes on every `(tenant_id, created_at)` filter. The non-trivial ones get covering indexes that include the columns we project.
- Partial indexes for hot queries. `tenant_users_google_id_key WHERE google_id IS NOT NULL` means we don't walk a B-tree of nulls.
- Auto-vacuum tuned per-table. `autovacuum_vacuum_scale_factor` is 0.05 for the transaction tables.
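The three patterns above look roughly like this in DDL. A sketch only: the table and column names (`transactions`, `amount`, `status`) are illustrative, not our actual schema.

```sql
-- Covering index: the composite key plus the projected columns via INCLUDE,
-- so the hot query can be answered with an index-only scan (Postgres 11+).
CREATE INDEX transactions_tenant_created_idx
    ON transactions (tenant_id, created_at)
    INCLUDE (amount, status);

-- Partial index: skip the NULL rows entirely instead of indexing them.
CREATE UNIQUE INDEX tenant_users_google_id_key
    ON tenant_users (google_id)
    WHERE google_id IS NOT NULL;

-- Per-table autovacuum: vacuum after 5% of rows change instead of the
-- default 20%, keeping bloat down on the busy transaction tables.
ALTER TABLE transactions
    SET (autovacuum_vacuum_scale_factor = 0.05);
```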
What's next, in order
1. Read replica for reporting queries. The dashboard's aggregates can run there with a 1-2s lag and we don't care.
2. Partitioning the transactions table by month. Quick win once it crosses 50M rows.
3. Move logs / audit trails to a cheaper store (BigQuery or ClickHouse). They're append-only and we never join against them in app code.
4. Then, and only then, sharding by `tenant_id`. Citus is the path of least resistance.
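For the partitioning step, the shape is standard Postgres declarative range partitioning. The columns here are a guess at the minimum a transactions table would carry:

```sql
-- Parent table partitioned by month on created_at; children hold the rows.
CREATE TABLE transactions (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    tenant_id  bigint NOT NULL,
    created_at timestamptz NOT NULL,
    amount     numeric NOT NULL
) PARTITION BY RANGE (created_at);

-- One child per month; an old month can be detached or dropped cheaply,
-- which is most of the win over DELETEs on a 50M-row table.
CREATE TABLE transactions_2025_01 PARTITION OF transactions
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE transactions_2025_02 PARTITION OF transactions
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
```

In practice you'd let pg_partman create the monthly children on a schedule rather than by hand, and the schema survives step (4) unchanged: Citus distributes an existing table with `create_distributed_table('transactions', 'tenant_id')`.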
If you find yourself reaching for sharding before you've done (1)-(3), there's a good chance you're solving a problem you don't have yet.