Modern applications generate logs at an unprecedented scale. Centralizing and analyzing these logs in real-time is essential for observability, debugging, and threat detection. By combining Apache Kafka with ElasticSearch, you can build a highly scalable log analytics platform that ingests, processes, and indexes logs in real time.

This guide explores the architecture, key configurations, and best practices for using Kafka and ElasticSearch to build a reliable, real-time log monitoring system.


Why Kafka + ElasticSearch?

Kafka acts as a durable, high-throughput buffer between log producers and ElasticSearch, enabling:

  • Decoupling of ingestion and indexing
  • Resilient, replayable pipelines
  • Scalable, multi-consumer architecture

ElasticSearch is a powerful search and analytics engine that allows:

  • Fast querying over large log volumes
  • Full-text search and filtering
  • Aggregations, dashboards, and visualizations via Kibana

Together, they form a core component of observability stacks like the ELK or EFK stack.


Architecture Overview

[Applications / Containers / Servers]
↓
[Filebeat / Fluent Bit / Logstash]
↓
[Kafka Topics]
↓
[Kafka Connect / Logstash / Stream Processors]
↓
[ElasticSearch Indexes]
↓
[Kibana Dashboards]

Kafka decouples ingestion from indexing, ensuring no log loss even if ElasticSearch lags or goes down temporarily.


Step-by-Step Implementation

1. Produce Logs to Kafka

Use Filebeat, Fluent Bit, or Logstash to send logs to Kafka.

Example Filebeat output:

output.kafka:
hosts: ["broker1:9092"]
topic: "logs.app1"
codec.format:
string: '%{[@timestamp]} %{[message]}'

Ensure logs are structured (preferably in JSON) and include:

  • Timestamp
  • Host/container
  • Log level
  • Message body
  • Application/service ID

2. Consume Kafka Logs and Send to ElasticSearch

You can use:

  • Kafka Connect with ElasticSearch Sink Connector
  • Logstash Kafka Input Plugin
  • Custom Spark/Flink processors

Example Kafka Connect ElasticSearch Sink (JSON config):

{
"name": "es-sink",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"topics": "logs.app1",
"connection.url": "http://elasticsearch:9200",
"type.name": "_doc",
"key.ignore": "true",
"schema.ignore": "true",
"name": "KafkaToElastic"
}
}

3. Indexing and Mapping in ElasticSearch

Configure index templates to optimize search and storage:

PUT _template/log_template
{
"index_patterns": ["logs-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"timestamp": {"type": "date"},
"message": {"type": "text"},
"log_level": {"type": "keyword"},
"host": {"type": "keyword"}
}
}
}

Performance and Scalability Tips

  • Use Kafka partitions to parallelize log ingestion
  • Set ElasticSearch index refresh interval to balance latency vs. throughput
  • Use bulk ingestion (enabled by default in Kafka Connect)
  • Apply data retention in Kafka and ILM (Index Lifecycle Management) in ElasticSearch

Security and Compliance

  • Secure Kafka with TLS/SASL and ElasticSearch with X-Pack or OpenSearch security plugins
  • Mask or exclude sensitive data before indexing
  • Use Kafka topics per environment (dev, staging, prod) to isolate log flows
  • Enable audit logging for traceability

Monitoring the Pipeline

  • Monitor Kafka with Prometheus, Burrow, or Confluent Control Center
  • Monitor ElasticSearch with X-Pack Monitoring or OpenSearch Dashboards
  • Track:
    • Kafka consumer lag
    • ElasticSearch ingest node pressure
    • Index size and document count
    • Query performance in Kibana

Best Practices Summary

✅ Structure logs in JSON with rich context
✅ Use Kafka as a buffer to decouple producers from ElasticSearch
✅ Apply Kafka and ES retention policies to control storage
✅ Monitor consumer lag and ES ingest pipeline metrics
✅ Use ILM to automate index rollover and cleanup
✅ Protect data in transit and at rest
✅ Visualize data with Kibana dashboards


Conclusion

Combining Kafka and ElasticSearch provides a powerful foundation for building a scalable, resilient log analytics platform. Kafka ensures reliable and high-throughput log transport, while ElasticSearch offers deep querying and real-time visualization.

By following the best practices outlined here, you can build a robust system that helps you monitor infrastructure, debug applications, and respond to incidents in real time.