Kubernetes and Databases - Best Practices for Running Databases in Containers

As Kubernetes adoption grows, so does the desire to run everything—including stateful databases—inside clusters. While Kubernetes excels at managing stateless applications, running databases in containers presents unique challenges. This guide explores the best practices for running databases in Kubernetes, offering a deep technical dive for intermediate and advanced users.

Should You Run Databases in Kubernetes?

Before diving into technical details, it’s important to ask: Should you run your databases in Kubernetes?

Pros:

Unified platform for stateless and stateful apps
Automated scaling and orchestration
Easier DevOps pipeline integration
Platform-agnostic deployments

Cons:

Complexity in storage and persistence
Risk of data loss without careful configuration
Stateful applications require extra effort for HA

If your team is prepared to handle these challenges, Kubernetes can be a powerful environment for running production-grade databases.

Use StatefulSets for Databases

Kubernetes provides StatefulSets to manage stateful applications like databases. Unlike Deployments, StatefulSets:

Assign stable network identities (DNS) to Pods
Preserve persistent volumes across Pod restarts
Maintain ordered, rolling updates

For example, running a PostgreSQL cluster with StatefulSets ensures that replicas can identify each other via predictable hostnames like:

postgres-0.postgres.default.svc.cluster.local

Persistent Volume Management

Databases need persistent storage that outlives Pods. This is handled using:

Persistent Volume Claims (PVCs): Requested by StatefulSets
StorageClasses: Define how volumes are provisioned (e.g., AWS EBS, GCP PD, Ceph)

Best Practices

Use ReadWriteOnce (RWO) volumes for most single-node databases
Choose fast, durable storage (SSD-backed where possible)
Enable volume snapshots for disaster recovery
Avoid using ephemeral storage for production data

Example PVC template in a StatefulSet:

volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 10Gi
    storageClassName: fast-ssd

Network Configuration and DNS

Databases often require stable endpoints for replication and client connections. StatefulSets offer:

Stable Pod DNS (essential for clustering)
Headless Services for direct Pod access

Example headless service:

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - port: 5432

This enables discovery of each replica with DNS like postgres-0.postgres, postgres-1.postgres, etc.

Database Configuration and Initialization

Automate initialization with Kubernetes-native patterns:

Init Containers: Set up schemas or seed data
ConfigMaps/Secrets: Store configuration and credentials
Probes: Use liveness and readiness probes to ensure Pod health

Example readiness probe:

readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  initialDelaySeconds: 10
  periodSeconds: 5

Backup and Restore Strategies

Backups are critical in containerized environments. Use one of the following approaches:

Run scheduled jobs using CronJobs and tools like pg_dump, mysqldump
Use sidecar containers to perform continuous backups to S3 or GCS
Take volume snapshots using CSI (Container Storage Interface) drivers

Example: CronJob for MySQL Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql
            args:
              - /bin/sh
              - -c
              - "mysqldump -u root -p$MYSQL_PASSWORD mydb > /backup/backup.sql"
            env:
              - name: MYSQL_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: mysql-secret
                    key: password
          restartPolicy: OnFailure

High Availability (HA) Patterns

Single-node databases can be a SPOF (Single Point of Failure). For HA:

Use database-native clustering (e.g., Galera for MySQL, Patroni for PostgreSQL)
Combine with PodAntiAffinity to spread Pods across nodes
Use PersistentVolumes with replication for data durability

Example: Pod Anti-Affinity

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - postgres
      topologyKey: "kubernetes.io/hostname"

Security Considerations

Databases hold sensitive data. Secure them with:

Secrets for credentials (avoid hardcoding in YAML)
TLS encryption for database connections
RBAC and Network Policies to restrict access
Audit logging and monitoring

Monitoring and Observability

Instrument your database with:

Prometheus exporters (e.g., mysqld_exporter, postgres_exporter)
Grafana dashboards for performance metrics
Kubernetes Events and Logs for debugging

When to Use External DBaaS

In some cases, a managed Database-as-a-Service (DBaaS) may be more appropriate:

Need for minimal operational overhead
High availability with SLA guarantees
Disaster recovery and compliance needs

Use Kubernetes-native tools like ExternalName or Service Mesh for seamless integration with DBaaS.

Conclusion

Running databases in Kubernetes requires thoughtful design around storage, networking, backups, and security. By leveraging StatefulSets, persistent volumes, and best practices like automated backups and readiness probes, you can reliably operate databases in a containerized world. Kubernetes is not just for stateless apps anymore—when done right, it’s a powerful platform for managing stateful services too.