Modern applications need to process events in real time to power use cases like fraud detection, personalization, operational analytics, and IoT data processing. This is where event-driven architectures (EDA) come into play — enabling systems to react to events as they happen.

Two open-source technologies stand out in this space:

  • Apache Pulsar: a cloud-native, multi-tenant messaging platform
  • Apache Flink: a powerful framework for stateful stream processing

In this post, we explore how Apache Pulsar and Flink can be integrated to build scalable, fault-tolerant event-driven architectures.


Apache Pulsar

  • Durable, scalable event messaging
  • Native multi-tenancy and geo-replication
  • Built-in support for pub/sub and queue semantics
  • Decouples producers and consumers

Apache Flink

  • Distributed, stateful stream processing
  • Complex event processing and windowing
  • Event-time semantics and state management
  • SQL, CEP, and DataStream APIs

Together, Pulsar and Flink enable:

  • Ingestion of events from many sources
  • Real-time stream processing and analytics
  • Stateless or stateful transformations
  • Output to data lakes, databases, or dashboards

Architecture Overview

[Producers / Event Sources]
↓
[Apache Pulsar Topics]
↓
[Apache Flink (Pulsar Source)]
↓
[Enrichment, Windowing, Joins]
↓
[Sinks: DB, Kafka, Pulsar, S3, Elasticsearch]

Use Apache Flink’s Pulsar connector to consume events from and produce events to Pulsar.

Add the Maven dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-pulsar</artifactId>
    <version>1.16.0</version>
</dependency>

Example Pulsar source in Flink (Java):

PulsarSource<String> source = PulsarSource.builder()
        .setServiceUrl("pulsar://localhost:6650")      // broker service URL
        .setAdminUrl("http://localhost:8080")          // admin (HTTP) endpoint, used for topic metadata
        .setStartCursor(StartCursor.earliest())        // start reading from the beginning of the topic
        .setTopics("events.input")
        .setSubscriptionName("flink-consumer")         // required; any subscription name works
        .setDeserializationSchema(PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
        .build();

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Pulsar Source");

stream
    .map(event -> "Processed: " + event)
    .print();

env.execute("Pulsar + Flink Stream");

Common Use Cases

  1. Real-time Analytics
    • Aggregate clickstream data per minute/hour
    • Detect anomalies or patterns
    • Use Flink SQL or CEP for analysis
  2. Event Enrichment
    • Join incoming events with reference data
    • Enhance payloads before storage
  3. Data Routing and Filtering
    • Route events to different sinks based on their attributes (see the sketch after this list)
    • Drop noisy events or enrich alerts
  4. Machine Learning Inference
    • Score events in real time using a preloaded model
    • Output results to Pulsar or a database
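
To make the routing use case concrete, here is a minimal sketch that splits the stream from the source example above using a Flink side output. The "ALERT" payload prefix and the tag name are assumptions made up for this example, not part of the Pulsar or Flink APIs.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Anonymous subclass so Flink can capture the element type of the side output.
final OutputTag<String> alertsTag = new OutputTag<String>("alerts") {};

SingleOutputStreamOperator<String> mainStream = stream
        .process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String event, Context ctx, Collector<String> out) {
                if (event.startsWith("ALERT")) {
                    ctx.output(alertsTag, event);   // route alert events to the side output
                } else {
                    out.collect(event);             // keep everything else on the main stream
                }
            }
        });

DataStream<String> alertStream = mainStream.getSideOutput(alertsTag);
// mainStream and alertStream can now be sent to different sinks (e.g., different Pulsar topics).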

Flink allows keyed state, windowed joins, and timers for complex logic:

// Event-time windows only fire once watermarks advance; see the sketch after the list below.
stream
    .keyBy(event -> extractKey(event))                      // extractKey(...) is an app-specific helper
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))   // 5-minute tumbling event-time windows
    .reduce((e1, e2) -> combine(e1, e2));                   // combine(...) merges two events per key

This makes it ideal for use cases like:

  • Sessionization
  • Fraud detection
  • Metric aggregation
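
One caveat: the source example above uses WatermarkStrategy.noWatermarks(), so an event-time window like the one shown would never fire. Below is a minimal sketch of assigning timestamps and watermarks instead; extractTimestamp(...) is a hypothetical helper (like extractKey and combine above) that reads an epoch-millisecond timestamp from the event payload, and the 30-second bound is just an example value.

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

// Tolerate events that arrive up to 30 seconds out of order.
WatermarkStrategy<String> watermarks = WatermarkStrategy
        .<String>forBoundedOutOfOrderness(Duration.ofSeconds(30))
        .withTimestampAssigner((event, recordTimestamp) -> extractTimestamp(event));

// Use this strategy in place of noWatermarks() when reading from the Pulsar source.
DataStream<String> timestampedStream = env.fromSource(source, watermarks, "Pulsar Source");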

Use Flink’s PulsarSink to write processed results back to a Pulsar topic:

PulsarSink<String> sink = PulsarSink.builder()
        .setServiceUrl("pulsar://localhost:6650")
        .setAdminUrl("http://localhost:8080")
        .setTopics("events.output")
        .setSerializationSchema(PulsarSerializationSchema.flinkSchema(new SimpleStringSchema()))
        .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)   // EXACTLY_ONCE is also available, backed by Pulsar transactions
        .build();

stream.sinkTo(sink);

Deployment Strategies

  • Flink on Kubernetes for scaling and isolation
  • Pulsar in multi-tenant mode to support different apps
  • Use Pulsar Functions for lightweight stream logic
  • Secure the Flink-Pulsar connection with TLS and auth tokens (see the sketch below)
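
For the last point, here is a rough sketch of what token authentication over TLS can look like on the source builder. The pulsar+ssl:// and https:// endpoints, ports, and the token value are placeholders, and the exact option names should be verified against your connector version.

import org.apache.flink.connector.pulsar.common.config.PulsarOptions;
import org.apache.pulsar.client.impl.auth.AuthenticationToken;

PulsarSource<String> secureSource = PulsarSource.builder()
        .setServiceUrl("pulsar+ssl://pulsar.example.com:6651")   // TLS-enabled broker endpoint (placeholder host)
        .setAdminUrl("https://pulsar.example.com:8443")          // TLS-enabled admin endpoint (placeholder host)
        .setTopics("events.input")
        .setSubscriptionName("flink-consumer")
        .setDeserializationSchema(PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
        .setConfig(PulsarOptions.PULSAR_AUTH_PLUGIN_CLASS_NAME, AuthenticationToken.class.getName())
        .setConfig(PulsarOptions.PULSAR_AUTH_PARAMS, "token:<your-jwt>")   // placeholder token
        .build();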

Monitoring and Reliability

✅ Use Flink checkpoints for fault tolerance (see the sketch after this checklist)
✅ Monitor Pulsar consumer lag with Prometheus
✅ Use dead-letter topics in Pulsar for poison events
✅ Tune parallelism and task slots for throughput
✅ Use state backends (e.g., RocksDB) for large-scale state management
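
As a starting point for the checkpointing and state-backend items above, a minimal sketch; the 30-second interval is an example value, and the RocksDB backend additionally requires the flink-statebackend-rocksdb dependency.

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 30 seconds so the job can recover from failures
// without losing or double-counting state.
env.enableCheckpointing(30_000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// Keep large keyed state on local disk (RocksDB) instead of the JVM heap.
env.setStateBackend(new EmbeddedRocksDBStateBackend());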


Conclusion

By combining the messaging reliability of Apache Pulsar with the processing power of Apache Flink, you can build modern, scalable, and fault-tolerant event-driven architectures that respond to data in real time.

Whether you’re building fraud detection systems, real-time dashboards, or personalized user experiences — Pulsar and Flink give you the tools to stream, process, and react to data like never before.