Leveraging Kafka for Change Data Capture in Microservices
Use Apache Kafka for real-time Change Data Capture to sync databases and microservices seamlessly
In a microservices architecture, data is often scattered across isolated, polyglot databases. Keeping these services in sync without tightly coupling them can be challenging. Change Data Capture (CDC) is a technique that captures database changes (inserts, updates, deletes) and streams them in real time, enabling event-driven microservices and reactive data flows.
By integrating CDC with Apache Kafka, teams can ensure real-time, decoupled, and scalable data propagation between services and systems. In this post, we explore how to implement CDC using Kafka and best practices for microservices data integration.
What is Change Data Capture (CDC)?
CDC is the process of detecting and capturing changes made to a database and making those changes available to other systems in real time.
CDC enables:
- Event-driven microservices that react to DB changes
- Audit logs for compliance
- ETL pipelines for analytics and data lakes
- Database replication and backup
Why Kafka for CDC?
Apache Kafka is ideal for CDC due to:
- High throughput and durability
- Built-in support for partitioning and ordering
- Native integration with tools like Debezium
- Fault-tolerant and replayable event logs
Kafka turns change events into immutable streams that can be consumed asynchronously by multiple services.
CDC Architecture with Kafka and Debezium
+---------------------+     +-------------------+     +---------+     +-----------------------+
| MySQL/Postgres/etc. | --> | Debezium (Source) | --> |  Kafka  | --> | Microservices (Sinks) |
+---------------------+     +-------------------+     +---------+     +-----------------------+
- Debezium captures database changes by reading the database's transaction log (MySQL binlog, Postgres WAL, Oracle redo log)
- Kafka Connect hosts the Debezium connector and handles delivery into Kafka
- Kafka Topics store changes per table or event type
- Microservices consume topics to react or update their local state
Setting Up CDC with Kafka + Debezium
- Start Kafka and Kafka Connect
Use Docker or Helm to deploy Kafka and Kafka Connect, for example as sketched below.
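A minimal local sketch using the Debezium container images (image tags, topic names, and ports here are assumptions; adjust for your environment):

# Single-node Zookeeper and Kafka, for local experimentation only
docker run -d --name zookeeper -p 2181:2181 quay.io/debezium/zookeeper:2.5
docker run -d --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:2.5

# Kafka Connect with the Debezium connectors pre-installed
docker run -d --name connect -p 8083:8083 \
  -e GROUP_ID=1 \
  -e BOOTSTRAP_SERVERS=kafka:9092 \
  -e CONFIG_STORAGE_TOPIC=connect_configs \
  -e OFFSET_STORAGE_TOPIC=connect_offsets \
  -e STATUS_STORAGE_TOPIC=connect_statuses \
  --link kafka:kafka \
  quay.io/debezium/connect:2.5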
- Configure Debezium Source Connector
Example: capture changes from MySQL (property names below follow Debezium 1.x; in Debezium 2.x, database.server.name becomes topic.prefix and database.history.kafka.* becomes schema.history.internal.kafka.*):
{
  "name": "mysql-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-db",
    "database.port": "3306",
    "database.user": "cdcuser",
    "database.password": "cdcpass",
    "database.server.id": "12345",
    "database.server.name": "inventory",
    "database.include.list": "inventory",
    "table.include.list": "inventory.orders",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
Submit with:
curl -X POST -H "Content-Type: application/json" \
--data @cdc-config.json \
http://localhost:8083/connectors
- Consume CDC Events in Microservices
Debezium publishes change events to Kafka as envelopes like:
{
  "before": {"id": 1, "status": "pending"},
  "after": {"id": 1, "status": "shipped"},
  "source": {...},
  "op": "u",
  "ts_ms": 1683560000000
}
The op field identifies the operation (c = create, u = update, d = delete, r = snapshot read), and ts_ms is the event timestamp in milliseconds. Services parse these changes and trigger actions accordingly, as in the consumer sketch below.
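A minimal consumer sketch in Java, assuming the kafka-clients and Jackson libraries are on the classpath; the topic name follows Debezium's <server.name>.<database>.<table> convention, and the group id is illustrative:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderChangeConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "order-change-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        ObjectMapper mapper = new ObjectMapper();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("inventory.inventory.orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) continue; // tombstone emitted after a delete
                    JsonNode event = mapper.readTree(record.value());
                    // With the default JsonConverter the change event is wrapped in {schema, payload}
                    JsonNode payload = event.has("payload") ? event.get("payload") : event;
                    JsonNode op = payload.get("op");
                    if (op == null) continue;
                    switch (op.asText()) {
                        case "c" -> System.out.println("created: " + payload.get("after"));
                        case "u" -> System.out.println("updated: " + payload.get("after"));
                        case "d" -> System.out.println("deleted: " + payload.get("before"));
                        default -> { } // "r" marks rows read during the initial snapshot
                    }
                }
            }
        }
    }
}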
Common Patterns in Microservices
- Outbox Pattern: the service writes events to a dedicated outbox table in the same transaction as its state change; CDC publishes those rows to Kafka (see the sketch after this list).
- Avoid Dual Writes: writing to the database and Kafka separately can leave them inconsistent on partial failure; use the transactional outbox instead.
- Local Materialized Views: services consume change streams to maintain local, read-optimized replicas.
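A minimal sketch of the outbox write using plain JDBC against Postgres; the orders and outbox table shapes, column names, and payload are assumptions for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OutboxWriter {
    public static void shipOrder(String url, String user, String pass) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, pass)) {
            conn.setAutoCommit(false); // state change and outbox row commit atomically
            try (PreparedStatement order = conn.prepareStatement(
                     "UPDATE orders SET status = 'shipped' WHERE id = 1");
                 PreparedStatement outbox = conn.prepareStatement(
                     // the ::jsonb cast is Postgres-specific
                     "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?::jsonb)")) {
                order.executeUpdate();
                outbox.setString(1, "1");
                outbox.setString(2, "OrderShipped");
                outbox.setString(3, "{\"id\":1,\"status\":\"shipped\"}");
                outbox.executeUpdate();
                conn.commit(); // Debezium streams the committed outbox row from the WAL
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}

Debezium also ships an outbox event router SMT (io.debezium.transforms.outbox.EventRouter) that can route outbox rows to per-aggregate topics.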
Handling Schema Changes
Debezium supports Schema Registry and Avro/JSON converters.
Enable:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
This ensures producers and consumers share compatible schemas, with compatibility rules enforced centrally by the registry.
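On the consumer side, a sketch of the matching deserializer settings (assuming the Confluent Avro deserializer is on the classpath; the registry URL mirrors the converter config above):
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://schema-registry:8081
specific.avro.reader=true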
Best Practices
- Use partitioning keys aligned with service ownership
- Maintain idempotent consumers to handle retries gracefully
- Use dead letter queues (DLQs) for failed messages (see the settings after this list)
- Monitor lag and topic growth with Prometheus/Grafana
- Secure the pipeline with SASL/TLS and RBAC
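For sink connectors, Kafka Connect can route records that fail conversion or processing to a DLQ topic. A sketch of the relevant connector settings (the topic name is an assumption):
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq.inventory.orders
errors.deadletterqueue.context.headers.enable=true
errors.log.enable=true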
Monitoring CDC Pipelines
- Use Debezium’s metrics (exposed via JMX, commonly scraped through a Prometheus /metrics endpoint) for lag and error counts
- Monitor the Kafka Connect REST API (/connectors, /connectors/<name>/status); see the example below
- Alert on consumer-group lag and under-replicated partitions
- Log and audit op types (c/u/d) for change tracking
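For example, to check connector health via the Connect REST API (the connector name matches the config above):
curl -s http://localhost:8083/connectors/mysql-cdc-connector/status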
Conclusion
By integrating Kafka with CDC, microservices can achieve real-time data sync without tight coupling. Tools like Debezium make it easy to capture changes and stream them into Kafka, where they can be consumed, audited, and acted upon in a scalable, fault-tolerant way.
With the right architecture and practices, Kafka-powered CDC can supercharge your microservices ecosystem with event-driven data movement, enhanced reliability, and increased agility.