In distributed messaging systems, data durability and cross-region replication are vital for maintaining availability, consistency, and disaster recovery. Apache Pulsar is designed with built-in support for persistence and geo-replication, making it ideal for modern, cloud-native deployments where global resilience is a must.

This guide explains how to manage Pulsar’s data durability guarantees and set up multi-region replication for high-availability and disaster-resilient architectures.


Apache Pulsar separates its serving layer (Brokers) from the storage layer (BookKeeper). This allows fine-grained control over message durability and redundancy.

Key components:

  • Brokers handle client connections and topic routing
  • Bookies (Apache BookKeeper) persist messages to disk
  • ZooKeeper coordinates metadata and cluster state

Durability is ensured through write-ahead logs, replication across bookies, and acknowledgment strategies.


Configuring Message Durability

Set the replication factor to control how many copies of each message are stored:

managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2

Definitions:

  • Ensemble size: Number of bookies a message is written to
  • Write quorum: Number of bookies a write must go to
  • Ack quorum: Number of successful writes needed to acknowledge

To improve durability:

  • Increase ensemble and quorum sizes
  • Use SSD-backed or provisioned IOPS volumes
  • Configure journal and ledger directories on separate disks

Enabling Synchronous Replication

Pulsar supports synchronous replication within a region using BookKeeper’s quorum writes. You can configure message acknowledgment behavior per producer:

Producer<byte[]> producer = client.newProducer()
.topic("persistent://public/default/critical-events")
.sendTimeout(5, TimeUnit.SECONDS)
.blockIfQueueFull(true)
.enableBatching(false)
.create();

To ensure stronger consistency:

  • Disable batching for low-latency durable writes
  • Set low send timeout to handle retries

Geo-Replication Across Regions

Pulsar offers built-in geo-replication to synchronize topics across clusters in different regions.

Step 1: Define Replication Clusters

Each region runs a separate Pulsar cluster with a unique name:

bin/pulsar-admin clusters create us-east \
--url http://us-east-broker:8080 \
--broker-url pulsar://us-east-broker:6650

bin/pulsar-admin clusters create eu-west \
--url http://eu-west-broker:8080 \
--broker-url pulsar://eu-west-broker:6650
Step 2: Register Clusters in Tenant

Assign both clusters to a tenant:

bin/pulsar-admin tenants update my-tenant \
--allowed-clusters us-east,eu-west
Step 3: Enable Replication for Namespace
bin/pulsar-admin namespaces set-clusters my-tenant/global-namespace \
--clusters us-east,eu-west

Messages published in one region will be replicated to other clusters automatically.


Best Practices for Replication Strategy

  • Use global topics to route traffic automatically to the nearest region
  • Ensure symmetric configurations across clusters (auth, TLS, retention)
  • Monitor replication lag using pulsar_replication_backlog and related metrics
  • Place clusters behind region-aware load balancers to reduce client latency
  • Use Kafka-on-Pulsar for bridging legacy systems during migration

Handling Failures and Disaster Recovery

If a region becomes unavailable:

  • Consumers in other regions can seamlessly switch to their local cluster
  • Replicated messages ensure no data loss during failover
  • Producers can write to the next available cluster in the namespace list

To strengthen DR:

  • Regularly test failover scenarios
  • Use tiered storage to recover long-term data
  • Automate cluster health checks and replication alerts

Conclusion

Apache Pulsar makes it easy to ensure durability and resilience at global scale, thanks to its write-quorum architecture and built-in geo-replication. Whether you’re supporting financial transactions, IoT telemetry, or mission-critical analytics, Pulsar helps guarantee that your data is protected and always available—even across regions.

By tuning durability parameters and replicating data across distributed clusters, teams can build modern, fault-tolerant systems that meet the demands of real-time applications.