🐘

PostgreSQL on Kubernetes: Research & Recommendation

Tags
Engineering
Published
April 11, 2026
Last Updated
April 10, 2026
Author
Claude Code Research Team
Description
Comprehensive research on running PostgreSQL in production on Kubernetes (Hetzner K3s). Covers operators, multi-tenancy patterns, self-hosted alternatives, HA, storage, performance, backup, and monitoring.

Executive Summary

After researching PostgreSQL operators, multi-tenancy patterns, self-hosted alternatives, HA configurations, storage options, performance tuning, backup strategies, and monitoring -- here is the unified recommendation for our Hetzner K3s cluster.
✅
Recommendation: Use CloudNativePG operator with a hybrid multi-tenancy approach (start with one shared HA cluster, promote critical projects to dedicated clusters as needed). Use local NVMe storage, PgBouncer in transaction mode, and backups to Hetzner Object Storage.

1. Operator Comparison

CloudNativePG (CNPG) -- STRONGLY RECOMMENDED

πŸ†
The clear winner for our use case. CNCF Sandbox project (applying for Incubation). 7,700+ GitHub stars, 880 commits/year, 132M+ image downloads. Apache 2.0 license.
  • HA: Kubernetes-native failover (no Patroni/etcd dependency). Quorum-based failover (stable in v1.28). Self-healing with auto pod restart, replica promotion, rolling updates.
  • Backup: Built on Barman. S3-compatible storage, continuous WAL archiving, full PITR, scheduled base backups, compression & encryption.
  • Monitoring: Built-in Prometheus exporter with customizable SQL metrics. PodMonitor auto-creation. Official Grafana dashboard.
  • Connection Pooling: Native PgBouncer via dedicated Pooler CRD. Separate, scalable PgBouncer pods.
  • K3s/Hetzner: Proven in production on K3s + Hetzner (Brella case study: zero issues after 7 months).
  • GitOps: Fully declarative CRDs -- perfect for infrastructure-as-code repos.
  • Multi-tenancy: Namespace-based isolation. Cluster-wide or namespace-scoped operator installation.
One caveat: Failover time on Hetzner K3s can be ~5 minutes for node failures (vs ~30s on cloud providers) due to Hetzner's node detection speed. This is infrastructure-level, not a CNPG issue.
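To make the GitOps point concrete, here is a minimal Cluster manifest sketch. Field names follow CNPG's `postgresql.cnpg.io/v1` API; the name, namespace, and sizes are illustrative placeholders, not values from this research.

```yaml
# Minimal CNPG cluster sketch: one primary + two replicas, declaratively.
# Names and sizes are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg
  namespace: databases
spec:
  instances: 3               # operator handles failover and replica promotion
  storage:
    size: 50Gi
  monitoring:
    enablePodMonitor: true   # auto-create the PodMonitor for Prometheus
```

Everything else (backups, pooling, tuning) layers onto this same resource, which is why CNPG fits an infrastructure-as-code repo so well.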

Zalando Postgres Operator -- 3rd Place

  • ~4,100 GitHub stars. NOT a CNCF project. Release cadence slowing.
  • Built on Patroni + Spilo. Proven at scale inside Zalando.
  • Unique Team API for multi-tenancy (best among all operators).
  • WAL-G for backups.
  • Community momentum shifting to CNPG. Harder to recommend for new deployments in 2026.

CrunchyData PGO -- 2nd Place

  • ~3,900 GitHub stars. Oldest operator (production since 2017). Backed by Crunchy Data.
  • Built on Patroni. Best reliability test results in independent benchmarks.
  • pgBackRest for backup (gold standard for large databases -- block-level incremental, parallel backup/restore).
  • More complex initial setup than CNPG. Kustomize-first approach.
  • Strong choice if you need pgBackRest or already have Patroni expertise.

Percona Operator -- 4th Place

  • ~72 GitHub stars. Built on top of CrunchyData PGO.
  • Smallest community -- significant risk for small teams needing community support.
  • PMM integration for monitoring (can be heavy).
  • Not recommended unless already invested in the Percona ecosystem.

Operator Comparison Table

| Feature | CloudNativePG | Zalando | CrunchyData PGO | Percona |
| --- | --- | --- | --- | --- |
| GitHub Stars | ~7,700 | ~4,100 | ~3,900 | ~72 |
| CNCF Status | Sandbox (applying Incubation) | None | None | None |
| HA Foundation | K8s-native | Patroni | Patroni | Patroni (via PGO) |
| Backup Tool | Barman | WAL-G | pgBackRest | pgBackRest |
| PgBouncer | Yes (Pooler CRD) | Yes (sidecar) | Yes | Yes |
| Prometheus | Built-in exporter | Community add-on | Built-in | PMM / Prometheus |
| K3s/Hetzner Tested | Yes (production) | Yes | Yes | Yes (documented) |
| Multi-tenancy | Namespace RBAC | Team API (best) | Namespace RBAC | Namespace RBAC |
| Release Cadence | Very high | Slowing | Moderate | Moderate |
| Complexity | Low | Medium | Medium-High | Medium-High |

2. Multi-Tenancy Patterns

Pattern 1: One PG Instance Per Project/Namespace

Each project gets its own dedicated PostgreSQL cluster (primary + replicas).
➕
Pros: Full isolation, independent scaling, independent backups/PITR, independent upgrades, strongest security boundary, no noisy neighbors.
➖
Cons: ~2GB memory per project minimum (primary + replica). 10 projects = ~20-40GB memory. More clusters to monitor, more backup schedules, storage fragmentation.
When to use: Strict compliance/security requirements, very different workload profiles, when you can afford the resource overhead.

Pattern 2: Shared HA Cluster with Multiple Databases

One HA PostgreSQL cluster shared across all projects with database-level isolation.
➕
Pros: Resource efficient (4-8GB for 10-15 databases vs 20-40GB). One cluster to monitor/backup/upgrade. Simpler networking. PgBouncer routes per-database.
➖
Cons: Full blast radius (cluster down = ALL projects down). Noisy neighbor risk. PITR is all-or-nothing (cannot restore one database independently). Shared upgrade cycle. Security relies on PostgreSQL RBAC, not network isolation.
When to use: Resource-constrained environments, small team, similar workload profiles, non-critical applications.

Pattern 3: Hybrid Approach -- RECOMMENDED

🎯
Best of both worlds. Critical apps get dedicated PG instances; less critical apps share a common cluster.
Tier 1 (Dedicated): Production-critical, high-traffic, or compliance-sensitive projects.
Tier 2 (Shared): Internal tools, dev environments, low-traffic microservices.
A project should get a dedicated instance when:
  • It handles PII, payment data, or has compliance requirements
  • High write throughput or large dataset (>50GB)
  • Higher SLA than other projects
  • Needs independent scaling or upgrade schedule
A project can use the shared cluster when:
  • Internal tool or low-traffic service
  • Non-sensitive data
  • Occasional latency spikes are acceptable
  • Small dataset (<5GB)
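For Tier 2, database-level isolation on the shared cluster can itself be declarative. A sketch, assuming CNPG's Database CRD (added in recent releases, 1.25+; on older versions the same setup is done via bootstrap or manual SQL) -- all names are placeholders:

```yaml
# Per-project database on the shared Tier 2 cluster.
# One Database resource (and one owning role) per project keeps
# PostgreSQL RBAC as the isolation boundary.
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: internal-tools-db
  namespace: databases
spec:
  cluster:
    name: shared-pg        # the shared HA cluster (placeholder name)
  name: internal_tools     # database name inside PostgreSQL
  owner: internal_tools    # per-project role owning the database
```

Promoting a project to Tier 1 then means creating a dedicated Cluster and migrating the one database, without touching the other tenants.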

Resource Comparison (10 projects)

| Resource | 10 Dedicated | 1 Shared | Hybrid (2+1) |
| --- | --- | --- | --- |
| Memory | 20-40 GB | 4-8 GB | 8-16 GB |
| CPU | 5-10 cores | 2-4 cores | 3-6 cores |
| PVCs | 20-30 | 3 | 9 |
| PgBouncer Instances | 10 | 1 | 3 |
| Backup Schedules | 10 | 1 | 3 |

3. Self-Hosted Alternatives Assessment

| Solution | Production Ready | K3s/Hetzner | Complexity | Verdict |
| --- | --- | --- | --- | --- |
| Neon (self-hosted) | No | Poor (needs NVMe + S3) | Very High | NOT RECOMMENDED |
| Supabase (self-hosted) | Partial | Yes | High | NOT RECOMMENDED (overkill) |
| Bitnami Helm Charts | No (deprecated) | Yes | Low | NOT RECOMMENDED |
| StackGres | Yes | Yes | Medium | CONDITIONALLY RECOMMENDED |
| Tembo | Early GA | Yes (untested) | Medium | WATCH |
  • Neon: Serverless PG features (scale-to-zero, branching) unavailable in self-hosted mode. Operationally demanding. Not production-ready.
  • Supabase: Full platform (auth, APIs, realtime) is overkill if you just need PostgreSQL. Community-supported only, no official K8s support.
  • Bitnami: Being deprecated. No automated failover, no backup management, no PITR out of the box.
  • StackGres: Solid choice if you want batteries-included with a web console. Patroni HA + WAL-G backups + PgBouncer + Prometheus. Heavier pod footprint than CNPG.
  • Tembo: Interesting Rust-based operator with 200+ extensions and pre-built Stacks. Too young for critical production bet.
Key takeaway: None beat a well-configured CloudNativePG operator for our use case.

4. High Availability

Replication

  • Async replication (recommended default): Primary doesn't wait for replicas. RPO = replication lag (1-5s). No write latency penalty.
  • Sync replication: Primary waits for replica confirmation. RPO approaches zero. Adds 1-3ms latency within same DC. Use only for zero-RPO databases.
  • Quorum-based sync: ANY 1 (replica1, replica2) provides synchronous durability without depending on any single replica.
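In CNPG this quorum configuration is declared on the Cluster resource. A sketch, assuming the `spec.postgresql.synchronous` stanza available in newer CNPG releases (1.24+); the cluster name is a placeholder:

```yaml
# Quorum-based sync sketch for a 3-instance cluster (1 primary + 2 replicas).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: zero-rpo-pg
spec:
  instances: 3
  postgresql:
    synchronous:
      method: any    # quorum commit: ANY <number> of the standbys
      number: 1      # equivalent to synchronous_standby_names = 'ANY 1 (...)'
```

With `number: 1` a commit waits for whichever replica acknowledges first, so losing one replica does not stall writes.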

Automatic Failover

  • CNPG uses K8s-native leader election (no Patroni/etcd needed)
  • Self-healing: auto pod restart, replica promotion, rolling updates
  • Split-brain prevention through K8s leader election primitives

Failover Timing

| Scenario | RTO | RPO | Procedure |
| --- | --- | --- | --- |
| Single replica failure | 0 (no impact) | 0 | Operator auto-recreates |
| Primary failure (same DC) | 30-60s | 0-5s (async) / 0 (sync) | Auto-failover |
| Full DC failure | 5-15 min | Replication lag | Promote cross-DC replica |
| Data corruption | 15-60 min | To point before corruption | PITR restore |
| Complete cluster loss | 1-4 hours | Last WAL archived | Restore from S3 backup |

5. Storage

Hetzner Storage Options

| Option | IOPS | Latency | Replication | Best For |
| --- | --- | --- | --- | --- |
| Local NVMe (recommended) | 100K+ | Microseconds | None (use PG replication) | Primary DB, max performance |
| Longhorn | ~19K | Higher | Built-in 2-3x | Simpler ops |
| OpenEBS Mayastor | ~28K | NVMe-over-TCP | Configurable | High-perf with replication |
| Hetzner Volumes | ~15K | Milliseconds | Hetzner-managed | Avoid for primary PG |
Recommendations:
  • Use local NVMe via LocalPV for primary PostgreSQL (within 5-10% of bare-metal performance)
  • Use Longhorn if you prefer storage-level replication as a safety net (~30-40% IOPS cost)
  • Use separate WAL volume via CNPG walStorage spec for parallel I/O
  • StorageClass: reclaimPolicy: Retain, allowVolumeExpansion: true, WaitForFirstConsumer
Minimum IOPS targets: 3,000+ random read, 1,000+ random write, <1ms p99 latency for 8K reads.
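The StorageClass recommendation can be sketched as follows. The provisioner is a placeholder -- substitute whatever local-PV provisioner your cluster runs (K3s ships `rancher.io/local-path` by default; note that not every local provisioner supports expansion):

```yaml
# StorageClass sketch matching the recommendations above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme-retain
provisioner: rancher.io/local-path       # placeholder: your local-PV provisioner
reclaimPolicy: Retain                    # keep PV (and data) if the PVC is deleted
allowVolumeExpansion: true               # grow volumes without recreating clusters
volumeBindingMode: WaitForFirstConsumer  # bind on the node the pod is scheduled to
```

`WaitForFirstConsumer` matters for local volumes: it delays binding until the PostgreSQL pod is scheduled, so the PV lands on the same node as the pod.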

6. Performance Tuning

PostgreSQL Configuration

| Parameter | Formula | Example (8GB RAM, 4 CPU) |
| --- | --- | --- |
| shared_buffers | 25% of RAM | 2GB |
| effective_cache_size | 50-75% of RAM | 6GB |
| work_mem | RAM / (max_connections * 4) | 16MB |
| maintenance_work_mem | 5-10% of RAM | 512MB |
| max_connections | Low (use PgBouncer) | 100-200 |
| random_page_cost | 1.1 for NVMe/SSD | 1.1 |
| effective_io_concurrency | 200 for NVMe/SSD | 200 |
| max_wal_size | 2-4GB for write-heavy | 4GB |

Critical Notes

⚠️
K8s defaults /dev/shm to 64MB. If shared_buffers exceeds this, PostgreSQL will FAIL to start. Most operators handle this automatically -- verify yours does.
⚡
CPU pinning is the single biggest tuning lever. Use Guaranteed QoS (requests == limits) with CPU Manager static policy. Benchmarks show +22% average read/write TPS and -76% write latency with NUMA affinity.
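Both the tuning table and the Guaranteed QoS requirement translate directly into the Cluster spec. A sketch using the 8GB RAM / 4 CPU example (cluster name is a placeholder; CNPG also manages /dev/shm sizing itself):

```yaml
# Tuned cluster sketch: Guaranteed QoS (requests == limits) enables
# static CPU pinning; parameters mirror the table above.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: tuned-pg
spec:
  instances: 3
  resources:
    requests: {cpu: "4", memory: 8Gi}   # identical requests/limits ->
    limits:   {cpu: "4", memory: 8Gi}   # Guaranteed QoS class
  storage:
    size: 100Gi
  walStorage:
    size: 20Gi                          # separate WAL volume for parallel I/O
  postgresql:
    parameters:
      shared_buffers: 2GB
      effective_cache_size: 6GB
      work_mem: 16MB
      maintenance_work_mem: 512MB
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      max_wal_size: 4GB
```

Static CPU pinning additionally requires the kubelet's CPU Manager policy to be set to `static` on the node; the Guaranteed QoS class alone is not sufficient.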

PgBouncer Sizing

| Setting | Value | Rationale |
| --- | --- | --- |
| pool_mode | transaction | Stateless apps (most common) |
| default_pool_size | 20-30 | Per user/database pair |
| max_client_conn | 1000-5000 | PgBouncer connections are lightweight |
| max_db_connections | 100 | Should be < max_connections |
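With CNPG these settings live in the Pooler CRD, which runs PgBouncer as separate, independently scalable pods. A sketch with placeholder names and values from the table:

```yaml
# Pooler sketch: transaction-mode PgBouncer in front of the primary.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: shared-pg-pooler-rw
  namespace: databases
spec:
  cluster:
    name: shared-pg          # target Cluster (placeholder name)
  instances: 2               # PgBouncer pods, scaled independently of PG
  type: rw                   # route to the primary; use a second Pooler
  pgbouncer:                 # with type: ro for read replicas
    poolMode: transaction
    parameters:
      default_pool_size: "25"
      max_client_conn: "2000"
      max_db_connections: "100"
```

Apps connect to the Pooler's Service instead of the cluster's `-rw` Service; failovers are then absorbed by PgBouncer rather than by every client.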

K8s vs Bare-Metal Performance

  • Local NVMe + CPU pinning + huge pages: Within 5-10% of bare-metal
  • Local NVMe, no CPU pinning: Within 15-20%
  • Longhorn/network storage: 30-50% slower

7. Backup & Recovery

Strategy

  • Full backup: Weekly (Sunday)
  • Incremental backup: Daily
  • WAL archiving: Continuous (every completed 16MB WAL segment)
  • Retention: 30 days of backups, 7 days of WAL
  • Target: Hetzner Object Storage (S3-compatible) or MinIO
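As a sketch of this strategy in CNPG terms (endpoint, bucket, and secret names are placeholders; CNPG's Barman integration takes full base backups, with the continuous WAL stream providing the between-backup coverage):

```yaml
# Backup sketch: continuous WAL archiving + weekly scheduled base backup
# to S3-compatible object storage, 30-day retention.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg
spec:
  instances: 3
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://pg-backups/example-pg     # placeholder bucket
      endpointURL: https://objectstorage.example.com  # placeholder endpoint
      s3Credentials:
        accessKeyId:
          name: backup-creds                          # placeholder Secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: example-pg-weekly
spec:
  schedule: "0 0 3 * * 0"   # CNPG uses six-field cron (with seconds): Sun 03:00
  cluster:
    name: example-pg
```

Note the six-field cron expression: CNPG's ScheduledBackup includes a seconds field, unlike standard Kubernetes CronJobs.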

Backup Verification

🔑
Backups don't protect your business -- proven restores do. Schedule weekly automated restore tests to a temporary cluster. Monitor backup age (alert if >25 hours), size trends, and WAL archiving lag.

DR Testing Calendar

  • Weekly: Backup restore verification (automated)
  • Quarterly: Simulated primary failure + failover drill
  • Semi-annually: Full DR exercise (restore from backup to fresh cluster)

8. Monitoring & Alerting

Key Metrics to Monitor

  • Health: pg_up, postmaster uptime
  • Connections: Active count by state, utilization vs max_connections (alert > 80%)
  • Performance: Cache hit ratio (should be > 99%), TPS, deadlocks
  • Replication: Replay lag in seconds (alert > 30s warning, > 300s critical)
  • Storage: Database size growth, disk usage (alert > 85%)
  • Backups: Last backup age (alert > 25 hours), WAL archiving failures

Minimum Alert Rules

  1. PostgreSQL down (critical)
  2. Connection utilization > 80% (warning)
  3. Replication lag > 30s / > 300s (warning / critical)
  4. Cache hit ratio < 99% (warning)
  5. Backup age > 25 hours (critical)
  6. Disk usage > 85% (warning)
  7. WAL archiving failures (warning)
  8. Deadlocks detected (warning)
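A sketch of two of these rules as a PrometheusRule, assuming the Prometheus Operator CRDs are installed. The replication-lag metric name follows the CNPG exporter's `cnpg_` prefix; the disk expression is a placeholder to adapt to your volume metrics:

```yaml
# Alert-rule sketch for rules 3 and 6 above; names are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: databases
spec:
  groups:
    - name: postgresql
      rules:
        - alert: PostgreSQLReplicationLagHigh
          expr: cnpg_pg_replication_lag > 30   # seconds of replay lag
          for: 5m
          labels:
            severity: warning
        - alert: PostgreSQLDiskUsageHigh
          # placeholder expression -- adapt label selectors to your PVCs
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
          for: 10m
          labels:
            severity: warning
```

The remaining rules follow the same pattern; pair the critical-severity ones with a paging route in Alertmanager.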

Grafana Dashboards

Recommended community dashboards: IDs 9628 (PostgreSQL Database) and 14114 (PostgreSQL Overview).

9. Final Architecture

| Layer | Choice | Rationale |
| --- | --- | --- |
| Operator | CloudNativePG | CNCF, K3s-proven, lightest, most active |
| Multi-tenancy | Hybrid (start shared) | Resource efficient, clear upgrade path |
| Storage | Local NVMe via LocalPV | Best IOPS, within 5-10% of bare-metal |
| WAL Volume | Separate (CNPG walStorage) | Parallel I/O, disk-full protection |
| Replication | Async (default) | Sync only for zero-RPO databases |
| Connection Pooling | PgBouncer (CNPG Pooler CRD) | Transaction mode, 20-30 pool size |
| Backup Target | Hetzner Object Storage (S3) | Weekly full + daily incr + continuous WAL |
| Retention | 30 days | With weekly automated restore verification |
| Monitoring | Built-in CNPG Prometheus + Grafana | Connections, TPS, replication, cache, disk |

10. Sources

Operators

  • Brella Case Study (CNPG on Hetzner K3s)

Multi-Tenancy

  • CNPG Discussion #497 -- Multiple Databases
  • CNPG Discussion #2357 -- Multi-tenant Architecture

Infrastructure

  • PostgreSQL Tuning for Kubernetes best practices
  • Zalando Engineering -- PgBouncer on Kubernetes

Research conducted April 2026 by a Claude Code agent team: K8s Operator Specialist, Database Architect, Solutions Architect, and Infrastructure Engineer.