Loss of Granular ZFS Caching Control with Unified RocksDB Directory #306

The previous database configuration in Fulcrum utilized separate directories for different RocksDB column families (like `scripthash_history`, `utxoset`, etc.). This structure offered a significant advantage when using ZFS: **each directory could be mapped to a distinct ZFS dataset**.

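This separation allowed for **fine-grained tuning of ZFS properties per dataset**. Specifically, one could control, for each part of the database, how much of its data lands on the pool's `special device` (the special allocation class vdev, typically SSD-backed). Strictly speaking this is a placement policy rather than a cache, since blocks written to the `special device` stay there until they are rewritten, but it serves the same goal of keeping hot data on fast storage. For instance: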

1.  The dataset containing the **UTXO set (`utxoset`)**, which experiences frequent writes and reads of relatively small key-value pairs, could benefit greatly from SSD placement. `special_small_blocks` could be set to a generous threshold on this dataset so that frequently accessed UTXO data resided on the faster SSD tier alongside the ZFS metadata.
2.  Conversely, the dataset containing the **transaction history (`scripthash_history`)**, which grows very large but consists mostly of older, less frequently accessed ("cold") data, could be configured to reside *primarily* on slower, high-capacity HDDs. `special_small_blocks` could be set to `0` (disabled) or to a very low threshold for this dataset to prevent the large history files from consuming valuable SSD space.
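
As a rough illustration of the per-dataset tuning described above, the sketch below builds the corresponding `zfs set` commands without running them. The pool prefix `tank/fulcrum` and the `64K` threshold are assumptions chosen for the example, not values taken from Fulcrum or recommended settings.

```python
import shlex

# Hypothetical pool/dataset prefix; Fulcrum does not define this name.
POOL_PREFIX = "tank/fulcrum"

# Illustrative per-dataset thresholds (assumed values, not recommendations).
# "0" disables small-block routing to the special vdev for that dataset.
THRESHOLDS = {
    "utxoset": "64K",
    "scripthash_history": "0",
}


def zfs_set_commands(prefix, thresholds):
    """Return one `zfs set` argv list per dataset, sorted by dataset name."""
    return [
        ["zfs", "set", f"special_small_blocks={value}", f"{prefix}/{name}"]
        for name, value in sorted(thresholds.items())
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in zfs_set_commands(POOL_PREFIX, THRESHOLDS):
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run on a machine without ZFS; the output can be reviewed and pasted into a shell on the actual host.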

---

However, the **current Fulcrum configuration stores all RocksDB column families within a single directory**. Consequently, a single ZFS dataset holds the entire database, and any property set on it applies to every column family alike. This **eliminates the possibility of applying differential caching policies**.

The primary issue arises from the interaction between RocksDB's **compaction** process (specifically the level-style compaction used by Fulcrum) and ZFS properties like `special_small_blocks` when applied to the entire database dataset:

* **Compaction Rewrites Data:** RocksDB compaction involves reading data from older files (SSTs) and writing it into *new* files, discarding obsolete or deleted entries. Even historical data that hasn't changed logically gets physically rewritten during compaction between levels.
* **`special_small_blocks` Trigger:** When ZFS receives these new writes from RocksDB compaction, it compares the size of each file block it writes against the `special_small_blocks` threshold set on the dataset. Blocks at or below the threshold are allocated on the `special device` (SSD) instead of the regular vdevs. RocksDB uses small internal block sizes (4 KB by default), and depending on the dataset's `recordsize` and how the files are written, many of the blocks produced during compaction can qualify.
* **SSD Bloat:** Over time, through successive compactions, data originally residing on HDDs (like the large transaction history) gets rewritten in small blocks that ZFS places onto the SSD. Eventually, a significant portion, or even the entirety, of the RocksDB database, including the cold history data, migrates to the `special device`, consuming its limited capacity and negating the intended tiered storage strategy.
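
The migration described in these three points can be sketched with a toy model. It is not ZFS's allocator: it assumes every file is written with one uniform block size, ignores `recordsize` and compression, and uses hypothetical function names. It only shows how repeated rewrites move data that started on HDDs onto the `special device` once the rewritten blocks fall at or below the threshold.

```python
def allocation_class(block_size, threshold):
    """Toy routing rule: blocks at or below the threshold go to the
    special vdev; a threshold of 0 disables the routing."""
    if threshold > 0 and block_size <= threshold:
        return "special"
    return "normal"


def simulate(num_files, rewrites_per_round, rounds, block_size, threshold):
    """Track file placement as compaction rewrites files in turn.

    All files start on the normal (HDD) class. Each round rewrites the next
    `rewrites_per_round` files round-robin, and a rewritten file lands
    wherever the routing rule puts its blocks.
    Returns the share of files on the special vdev after each round.
    """
    placement = ["normal"] * num_files
    shares = []
    cursor = 0
    for _ in range(rounds):
        for _ in range(rewrites_per_round):
            placement[cursor] = allocation_class(block_size, threshold)
            cursor = (cursor + 1) % num_files
        shares.append(placement.count("special") / num_files)
    return shares


if __name__ == "__main__":
    # 4 KiB blocks against a 32 KiB threshold: everything ends up on SSD.
    print(simulate(10, 2, 5, 4096, 32768))
    # Same workload with the threshold disabled: nothing moves.
    print(simulate(10, 2, 5, 4096, 0))
```

With a single dataset there is only one `threshold` argument for the whole database, which is the limitation this issue describes: the cold history files are subject to the same routing rule as the hot UTXO data.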

In summary, the move to a single directory structure, while potentially simplifying database management for RocksDB itself, removes a crucial layer of storage optimization previously possible with ZFS datasets. Because RocksDB compaction keeps rewriting data, the result is inefficient use of the limited SSD space on the `special device`.
