Deploying Apache Pulsar in a Multi Cloud Environment for High Scalability
Architect and deploy Apache Pulsar across multiple cloud providers for resilient, high-throughput event streaming
As businesses grow across regions and platforms, adopting a multi-cloud strategy ensures resilience, vendor flexibility, and low-latency global access. Apache Pulsar, with its multi-layer distributed architecture, is well-suited for multi-cloud deployments that require scalable and reliable event streaming.
In this post, we explore how to deploy Apache Pulsar in a multi-cloud environment, addressing challenges such as data replication, geo-partitioning, service discovery, and network latency to achieve high scalability and availability.
Why Deploy Pulsar in Multi-Cloud?
Multi-cloud deployments allow organizations to:
- Avoid vendor lock-in
- Ensure business continuity during regional cloud outages
- Achieve low-latency processing close to end users
- Satisfy regulatory compliance by storing data regionally
Apache Pulsar supports multi-cloud readiness with:
- Built-in geo-replication
- Decoupled storage and compute
- Multi-tenant isolation
- Dynamic topic-level data routing
Pulsar Architecture Overview
Apache Pulsar’s modular components enable flexible multi-cloud placement:
- Brokers: Stateless message routers (can run on any cloud)
- Bookies (BookKeeper nodes): Store messages durably
- ZooKeeper: Stores metadata and cluster coordination
- Proxy: Load balancing and secure ingress for cross-cloud clients
Cloud A Cloud B Cloud C
+--------+ +--------+ +--------+
| Broker | ⇄⇄ | Broker | ⇄⇄ | Broker |
+--------+ +--------+ +--------+
⇣ ⇣ ⇣
[Bookie] [Bookie] [Bookie]
⇣ ⇣ ⇣
[ZK + Proxy] [ZK + Proxy] [ZK + Proxy]
Deployment Topologies
1. Stretched Cluster
- Single Pulsar cluster spanning multiple clouds/regions
- Shared ZooKeeper quorum and metadata
- Requires low-latency (<100ms) network links
Pros:
- Simplified configuration
- Global topic availability
Cons:
- Requires strong inter-cloud networking
- Not ideal for high-latency WANs
2. Multiple Isolated Clusters with Geo-Replication
- Separate Pulsar clusters per cloud/region
- Use geo-replication to sync messages across clusters
[Cluster A] ←→ [Cluster B] ←→ [Cluster C]
Geo-Replicated Topics: customer-events, audit-logs
Pros:
- Loose coupling
- High fault isolation
- Easy to enforce data locality
Cons:
- Added complexity in topic syncing and duplication avoidance
Networking and Connectivity
To ensure secure and performant cross-cloud communication:
- Use TLS encryption and mTLS between Pulsar components
- Establish VPN tunnels or private link endpoints across cloud providers
- Use DNS-based service discovery for cluster failover and routing
- Prefer regional brokers for latency-sensitive consumers
Example:
- Use AWS Transit Gateway or GCP VPC Peering with firewall rules for internal Pulsar traffic
Geo-Replication in Pulsar
Enable replication for a namespace:
pulsar-admin namespaces set-replication-clusters \
--clusters cloud-a,cloud-b \
my-tenant/my-namespace
Pulsar will automatically:
- Replicate messages between clusters
- Maintain per-cluster cursors
- Ensure ordering and deduplication per subscription
Use replication to:
- Route events from cloud-based producers to consumers in another region
- Build DR-ready architectures with minimal latency
Storage Tiering and Scalability
- Pulsar supports tiered storage (e.g., AWS S3, GCS, Azure Blob)
- Offloads old data from BookKeeper to object storage
- Enables cost-efficient long-term retention
Configure via:
pulsar-admin namespaces set-offload-threshold "5G" \
my-tenant/my-namespace
Benefits:
- Scales capacity independently from performance
- Keeps Bookies light for recent, hot data
Deployment Strategies
- Helm + Kubernetes:
- Use pulsar-helm-chart for multi-cloud K8s clusters (EKS, GKE, AKS)
- Use Node Affinity and Topology Spread Constraints for fault zones
- Terraform:
- Automate Pulsar deployment across clouds
- Manage infrastructure, IAM roles, peering, etc.
- Hybrid:
- Run Pulsar brokers on cloud VMs
- Use managed Kubernetes for Bookies and ZooKeeper
Monitoring and Observability
- Enable Prometheus + Grafana dashboards per cloud region
- Use Alertmanager for health checks (broker up/down, topic lag)
- Monitor:
- Replication backlog
- Disk I/O on Bookies
- Subscription redelivery rates
- End-to-end latency
Best Practices
- Use dedicated clusters per region for clean failover
- Enable namespace-level policies for rate limits and replication
- Keep ZK quorum odd-sized and geographically aware
- Automate failover and scaling using Kubernetes and Pulsar Manager
- Compress messages with LZ4/Zstd to reduce cross-cloud egress cost
Conclusion
Deploying Apache Pulsar in a multi-cloud environment enables globally distributed applications to achieve high scalability, fault tolerance, and compliance with data residency requirements. With built-in geo-replication, storage offloading, and flexible deployment patterns, Pulsar is an excellent choice for next-gen real-time infrastructure across cloud boundaries.
By following the architectural strategies and best practices outlined above, your organization can confidently deliver resilient, high-throughput event streaming across AWS, Azure, GCP, and beyond.