Kubernetes and Databases - Best Practices for Running Databases in Containers
Discover the key strategies and challenges of running stateful databases in Kubernetes clusters
As Kubernetes adoption grows, so does the desire to run everything—including stateful databases—inside clusters. While Kubernetes excels at managing stateless applications, running databases in containers presents unique challenges. This guide explores the best practices for running databases in Kubernetes, offering a deep technical dive for intermediate and advanced users.
Should You Run Databases in Kubernetes?
Before diving into technical details, it’s important to ask: Should you run your databases in Kubernetes?
Pros:
- Unified platform for stateless and stateful apps
- Automated scaling and orchestration
- Easier DevOps pipeline integration
- Platform-agnostic deployments
Cons:
- Complexity in storage and persistence
- Risk of data loss without careful configuration
- Stateful applications require extra effort for HA
If your team is prepared to handle these challenges, Kubernetes can be a powerful environment for running production-grade databases.
Use StatefulSets for Databases
Kubernetes provides StatefulSets to manage stateful applications like databases. Unlike Deployments, StatefulSets:
- Assign stable network identities (DNS) to Pods
- Preserve persistent volumes across Pod restarts
- Maintain ordered, rolling updates
For example, running a PostgreSQL cluster with StatefulSets ensures that replicas can identify each other via predictable hostnames like:
postgres-0.postgres.default.svc.cluster.local
Persistent Volume Management
Databases need persistent storage that outlives Pods. This is handled using:
- Persistent Volume Claims (PVCs): Requested by StatefulSets
- StorageClasses: Define how volumes are provisioned (e.g., AWS EBS, GCP PD, Ceph)
Best Practices
- Use ReadWriteOnce (RWO) volumes for most single-node databases
- Choose fast, durable storage (SSD-backed where possible)
- Enable volume snapshots for disaster recovery
- Avoid using ephemeral storage for production data
Example PVC template in a StatefulSet:
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
storageClassName: fast-ssd
Network Configuration and DNS
Databases often require stable endpoints for replication and client connections. StatefulSets offer:
- Stable Pod DNS (essential for clustering)
- Headless Services for direct Pod access
Example headless service:
apiVersion: v1
kind: Service
metadata:
name: postgres
spec:
clusterIP: None
selector:
app: postgres
ports:
- port: 5432
This enables discovery of each replica with DNS like postgres-0.postgres
, postgres-1.postgres
, etc.
Database Configuration and Initialization
Automate initialization with Kubernetes-native patterns:
- Init Containers: Set up schemas or seed data
- ConfigMaps/Secrets: Store configuration and credentials
- Probes: Use liveness and readiness probes to ensure Pod health
Example readiness probe:
readinessProbe:
exec:
command: ["pg_isready", "-U", "postgres"]
initialDelaySeconds: 10
periodSeconds: 5
Backup and Restore Strategies
Backups are critical in containerized environments. Use one of the following approaches:
- Run scheduled jobs using CronJobs and tools like
pg_dump
,mysqldump
- Use sidecar containers to perform continuous backups to S3 or GCS
- Take volume snapshots using CSI (Container Storage Interface) drivers
Example: CronJob for MySQL Backup
apiVersion: batch/v1
kind: CronJob
metadata:
name: mysql-backup
spec:
schedule: "0 2 * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: mysql
args:
- /bin/sh
- -c
- "mysqldump -u root -p$MYSQL_PASSWORD mydb > /backup/backup.sql"
env:
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: password
restartPolicy: OnFailure
High Availability (HA) Patterns
Single-node databases can be a SPOF (Single Point of Failure). For HA:
- Use database-native clustering (e.g., Galera for MySQL, Patroni for PostgreSQL)
- Combine with PodAntiAffinity to spread Pods across nodes
- Use PersistentVolumes with replication for data durability
Example: Pod Anti-Affinity
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- postgres
topologyKey: "kubernetes.io/hostname"
Security Considerations
Databases hold sensitive data. Secure them with:
- Secrets for credentials (avoid hardcoding in YAML)
- TLS encryption for database connections
- RBAC and Network Policies to restrict access
- Audit logging and monitoring
Monitoring and Observability
Instrument your database with:
- Prometheus exporters (e.g.,
mysqld_exporter
,postgres_exporter
) - Grafana dashboards for performance metrics
- Kubernetes Events and Logs for debugging
When to Use External DBaaS
In some cases, a managed Database-as-a-Service (DBaaS) may be more appropriate:
- Need for minimal operational overhead
- High availability with SLA guarantees
- Disaster recovery and compliance needs
Use Kubernetes-native tools like ExternalName or Service Mesh for seamless integration with DBaaS.
Conclusion
Running databases in Kubernetes requires thoughtful design around storage, networking, backups, and security. By leveraging StatefulSets, persistent volumes, and best practices like automated backups and readiness probes, you can reliably operate databases in a containerized world. Kubernetes is not just for stateless apps anymore—when done right, it’s a powerful platform for managing stateful services too.