Apache Hudi enables streaming data ingestion and incremental processing by supporting upserts, deletes, and merges on large datasets. But with great flexibility comes complexity — and debugging Hudi write operations is essential for ensuring data accuracy, consistency, and performance in production.

In this guide, we’ll explore how to debug and monitor Hudi write operations, covering:

  • Timeline and commit inspection
  • Metadata validation
  • Spark logs and metrics
  • Hudi CLI tools
  • Integration with monitoring systems

Understanding Hudi’s Write Model

Hudi supports several write operations:

  • UPSERT – Update if key exists, insert if not
  • INSERT – Append-only
  • BULK_INSERT – High-throughput initial loads
  • DELETE – Remove records by key

Each write produces a commit, represented in the .hoodie/ directory as a timeline event:

  • .commit – Successful commit
  • .inflight – Ongoing write
  • .requested – Scheduled but not yet started

Monitoring these artifacts is the key to understanding Hudi’s internal state.
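
To make the write model concrete, here is a minimal Scala sketch of an UPSERT with Spark. The base path matches the examples below, while the table name, column names (event_id, event_ts, event_date), and the sample row are illustrative assumptions:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-upsert-sketch").getOrCreate()
import spark.implicits._

// Illustrative input: event_id is the record key, event_ts resolves duplicates during upsert.
val inputDf = Seq(("e1", 1713253523L, "2024/04/16", "click"))
  .toDF("event_id", "event_ts", "event_date", "action")

inputDf.write.format("hudi")
  .option("hoodie.table.name", "user_events")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "event_id")
  .option("hoodie.datasource.write.precombine.field", "event_ts")
  .option("hoodie.datasource.write.partitionpath.field", "event_date")
  .mode(SaveMode.Append)
  .save("/data/hudi/user_events")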


Step 1: Inspecting the Hudi Timeline

Every Hudi table maintains a timeline of operations under the .hoodie/ folder at the base path.

List the timeline:

ls /data/hudi/user_events/.hoodie/
20240416091523.commit
20240416093000.inflight
20240416094500.requested

Use the Hudi CLI to interactively inspect this timeline:

hudi-cli
connect --path /data/hudi/user_events
commits show
show inflights
show fsview all

This helps identify failed writes or long-running inflight commits.
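
If you prefer to check the timeline from code rather than the shell or CLI, a small sketch using the Hadoop FileSystem API (assuming the same base path and an active Spark session) can count instants by state:

import org.apache.hadoop.fs.{FileSystem, Path}

// Count timeline instant files under .hoodie/ by their state suffix.
val basePath = "/data/hudi/user_events"
val fs = FileSystem.get(new java.net.URI(basePath), spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(basePath, ".hoodie"))
  .map(_.getPath.getName)
  .filter(n => n.endsWith(".commit") || n.endsWith(".inflight") || n.endsWith(".requested"))
  .groupBy(n => n.substring(n.lastIndexOf('.') + 1))
  .foreach { case (state, names) => println(s"$state: ${names.length} instant(s)") }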


Step 2: Validating Metadata and Partitions

To ensure that a write completed successfully:

  • Verify the presence of the expected .parquet or .log files
  • Check that record keys were correctly assigned
  • Use the CLI to inspect file views and partitions:
show fsview latest --partitionPath <partition_path>
commit showfiles --commit <commit_time>

Additionally, query the table directly:

val df = spark.read.format("hudi").load("/data/hudi/user_events")
df.groupBy("_hoodie_partition_path").count().show()
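
Hudi's metadata columns make these checks easy to script. A short sketch, reusing the df loaded above, that looks for missing or duplicate record keys:

import org.apache.spark.sql.functions.col

// Every Hudi table exposes _hoodie_record_key and _hoodie_partition_path metadata columns.
val nullKeys = df.filter(col("_hoodie_record_key").isNull).count()
println(s"records with a null key: $nullKeys")

// In the latest snapshot, a record key should appear at most once per partition.
df.groupBy("_hoodie_partition_path", "_hoodie_record_key").count()
  .filter(col("count") > 1)
  .show()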

Step 3: Monitoring Write Metrics

Hudi exposes runtime metrics that can be tracked using Spark or external tools like Prometheus:

Enable metrics in the write config:

.option("hoodie.metrics.on", "true")
.option("hoodie.metrics.reporter.type", "JMX")
.option("hoodie.write.status.storage.level", "MEMORY_AND_DISK_SER")

Key metrics:

  • Number of inserts/updates
  • Write amplification (total records written relative to records upserted)
  • Error rate during write
  • Time spent in different phases (indexing, compaction, commit)

Use jconsole or Prometheus exporters to visualize trends.
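
Put together, a metrics-enabled write might look like the sketch below, reusing inputDf and the illustrative field names from the earlier upsert sketch; the hoodie.metrics.jmx.* host and port values are placeholder assumptions to adjust for your environment:

// Same illustrative upsert as before, with JMX metrics reporting switched on.
inputDf.write.format("hudi")
  .option("hoodie.table.name", "user_events")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "event_id")
  .option("hoodie.datasource.write.precombine.field", "event_ts")
  .option("hoodie.datasource.write.partitionpath.field", "event_date")
  .option("hoodie.metrics.on", "true")
  .option("hoodie.metrics.reporter.type", "JMX")
  .option("hoodie.metrics.jmx.host", "localhost")
  .option("hoodie.metrics.jmx.port", "9889")
  .mode("append")
  .save("/data/hudi/user_events")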


Step 4: Analyzing Spark Logs and Exceptions

Since Hudi writes run as distributed Spark jobs, failures often surface in the driver and executor logs. Check Spark logs for:

  • Task failures (due to schema mismatches, null primary keys)
  • HoodieWriteConflictException
  • OutOfMemory errors
  • Retries or timeouts during compaction or index lookup

Set Spark log level to DEBUG:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-debug.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-debug.properties"

Enable Hudi-specific logs:

log4j.logger.org.apache.hudi=DEBUG
log4j.logger.org.apache.hudi.io=DEBUG
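
When multiple writers target the same table, conflict errors are expected occasionally and are usually safe to retry. Below is a sketch of a driver-side retry loop; it assumes the conflict surfaces on the driver as org.apache.hudi.exception.HoodieWriteConflictException and takes the actual write as a function:

import org.apache.hudi.exception.HoodieWriteConflictException

// Retry a Hudi write a few times when an optimistic-concurrency conflict is thrown.
def writeWithRetry(write: () => Unit, maxAttempts: Int = 3): Unit = {
  var attempt = 1
  var done = false
  while (!done) {
    try {
      write()
      done = true
    } catch {
      case _: HoodieWriteConflictException if attempt < maxAttempts =>
        // Another writer committed an overlapping instant; back off and try again.
        Thread.sleep(1000L * attempt)
        attempt += 1
    }
  }
}

// Usage (hypothetical helper): writeWithRetry(() => upsertUserEvents(inputDf))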

Step 5: Validating Write Success

A successful write will:

  • Generate a .commit file in .hoodie/
  • Populate data files in the proper partition
  • Register metadata updates in Hive (if sync is enabled)
  • Include a _SUCCESS marker (optional, depending on job type)

You can validate programmatically:

val commits = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "000")
  .load("/data/hudi/user_events")
  .select("_hoodie_commit_time")
  .distinct()

commits.show()
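
To double-check that the most recent write landed, you can also compare the newest _hoodie_commit_time in the table with the instant your job reported; a quick sketch:

import org.apache.spark.sql.functions.max

// The latest _hoodie_commit_time should match the instant of the commit you just made.
spark.read.format("hudi")
  .load("/data/hudi/user_events")
  .agg(max("_hoodie_commit_time").as("latest_commit"))
  .show()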

Step 6: Using Hudi Metrics Dashboard

Hudi integrates with Prometheus + Grafana for centralized monitoring. Metrics include:

  • Write throughput (records/sec)
  • Compaction latency
  • Delta commit frequency
  • Failed operations

To enable:

.option("hoodie.metrics.reporter.type", "PROMETHEUS_PUSHGATEWAY")
.option("hoodie.metrics.pushgateway.host", "<host>")
.option("hoodie.metrics.pushgateway.port", "9091")
.option("hoodie.metrics.pushgateway.job.name", "hudi_write_job")

Then create Grafana dashboards based on Prometheus queries.


Step 7: Debugging Common Write Issues

Common issues and their resolutions:

  • Missing commit file – Retry the job or clean up inflight commits via the CLI
  • HoodieWriteConflictException – Enable optimistic concurrency control and check writer lock configuration
  • Null record key – Ensure hoodie.datasource.write.recordkey.field is properly defined
  • Hive sync not reflecting data – Check the hoodie.datasource.hive_sync.* configs and Metastore connectivity
  • Compaction stuck or slow – Review compaction logs and enable async compaction

Use hudi-cli to roll back incomplete or corrupt commits:

commit rollback --commit 20240416093000
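
Several of these issues, especially null record keys, can be caught before the write even starts. A small pre-write guard, where inputDf stands for the DataFrame about to be written and event_id is the illustrative record key field used earlier:

import org.apache.spark.sql.functions.col

// Fail fast if any incoming record is missing its key; a null key will fail the Hudi write anyway.
val nullKeyCount = inputDf.filter(col("event_id").isNull).count()
require(nullKeyCount == 0, s"$nullKeyCount incoming records have a null record key")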


Conclusion

Monitoring and debugging Hudi write operations is crucial for building resilient and high-throughput data pipelines. By leveraging the timeline, inspecting metadata, enabling metrics, and using Hudi’s CLI and Spark logs, you gain full visibility into ingestion health and performance.

Whether you’re working on streaming ETL or near-real-time analytics, mastering these tools ensures consistent, observable, and production-ready data lakes with Hudi.