The previous database configuration in Fulcrum utilized separate directories for different RocksDB column families (like scripthash_history, utxoset, etc.). This structure offered a significant advantage when using ZFS: each directory could be mapped to a distinct ZFS dataset.
This separation allowed for fine-grained tuning of ZFS properties per dataset. Specifically, one could apply different data-placement strategies for ZFS's special device (a special allocation class vdev, typically an SSD) to different parts of the database. For instance:
- The dataset containing the UTXO set (utxoset), which experiences frequent writes and reads of relatively small key-value pairs, could benefit greatly from being stored on the SSD. Properties like special_small_blocks could be set aggressively on this dataset so that frequently accessed UTXO data resided on the faster SSD tier, alongside the ZFS metadata that the special device already holds.
- Conversely, the dataset containing the transaction history (scripthash_history), which grows very large but consists mostly of older, less frequently accessed ("cold") data, could be configured to reside primarily on slower, high-capacity HDDs. On this dataset, special_small_blocks could be disabled (set to 0) or kept well below the dataset's block size, so that the large history files do not consume valuable SSD space.
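As a rough illustration of the per-directory mapping described above, the Python sketch below builds the zfs create commands for such a layout. The pool name (tank/fulcrum), the datadir path (/srv/fulcrum_db), and the property values are hypothetical placeholders, not tuning recommendations, and the parent dataset is assumed to already exist. Each dataset's mountpoint is pointed at the subdirectory Fulcrum writes to, so that directory picks up that dataset's recordsize and special_small_blocks.

```python
import shlex

# Hypothetical pool/dataset prefix and Fulcrum datadir; adjust to your system.
POOL = "tank/fulcrum"
DATADIR = "/srv/fulcrum_db"

# Per-directory ZFS properties (illustrative values only).
LAYOUT = {
    "utxoset": {"recordsize": "16K", "special_small_blocks": "16K"},
    "scripthash_history": {"recordsize": "128K", "special_small_blocks": "0"},
}


def zfs_create_commands(pool: str, datadir: str,
                        layout: dict[str, dict[str, str]]) -> list[str]:
    """Return one 'zfs create' command line per database subdirectory."""
    cmds = []
    for subdir, props in layout.items():
        # The mountpoint maps the dataset onto the directory Fulcrum uses.
        all_props = {"mountpoint": f"{datadir}/{subdir}", **props}
        argv = ["zfs", "create"]
        for key, value in all_props.items():
            argv += ["-o", f"{key}={value}"]
        argv.append(f"{pool}/{subdir}")
        cmds.append(shlex.join(argv))  # shell-safe quoting if ever needed
    return cmds


for cmd in zfs_create_commands(POOL, DATADIR, LAYOUT):
    print(cmd)
```

The first command printed is zfs create -o mountpoint=/srv/fulcrum_db/utxoset -o recordsize=16K -o special_small_blocks=16K tank/fulcrum/utxoset; the history dataset gets special_small_blocks=0.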
However, the current Fulcrum configuration stores all RocksDB column families within a single directory. Consequently, the entire database must live in one ZFS dataset, which rules out applying different placement policies to different parts of it.
The primary issue arises from the interaction between RocksDB's compaction process (specifically Level Style Compaction used by Fulcrum) and ZFS properties like special_small_blocks when applied to the entire database dataset:
- Compaction Rewrites Data: RocksDB compaction involves reading data from older files (SSTs) and writing it into new files, discarding obsolete or deleted entries. Even historical data that hasn't changed logically gets physically rewritten during compaction between levels.
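- special_small_blocks Trigger: ZFS decides where to place each newly written block by comparing its size with the dataset's special_small_blocks threshold; blocks at or below the threshold are allocated on the special device (SSD). The relevant unit is the ZFS block, whose size is bounded by the dataset's recordsize, not RocksDB's internal SST block size (4KB by default). SST files are usually much larger than recordsize, so their blocks qualify whenever recordsize is at or below the threshold, which is easy to hit when recordsize has been lowered to suit database I/O. In that case essentially every block written during compaction lands on the SSD.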
- SSD Bloat: Over time, successive compactions rewrite the database, including the large, cold transaction history that would previously have stayed on HDDs. Every rewrite is placed according to the single dataset's threshold, so a significant portion, or even the entirety, of the RocksDB database migrates to the special device, consuming its limited capacity and negating the intended tiered storage strategy.
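The placement effect can be modelled in a few lines of Python. This is a simplified sketch, not ZFS's actual allocator: it assumes a file no larger than recordsize is stored as one block and a larger file as full recordsize blocks, and it ignores compression and sector rounding. The SST counts, sizes, and property values are hypothetical. It compares the old split layout (UTXO dataset with a 16K threshold, history dataset with the threshold disabled) against a single dataset carrying the UTXO-oriented policy.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass
class Dataset:
    recordsize: int            # bytes; block size used for files larger than this
    special_small_blocks: int  # bytes; 0 disables small-block placement


def block_sizes(file_size: int, recordsize: int) -> list[int]:
    """Simplified model: a file that fits in one record is a single block;
    a larger file is stored as full recordsize blocks."""
    if file_size <= recordsize:
        return [file_size]
    n_blocks = (file_size + recordsize - 1) // recordsize  # ceiling division
    return [recordsize] * n_blocks


def bytes_on_special(sst_sizes: list[int], ds: Dataset) -> int:
    """Sum the blocks this model would place on the special vdev."""
    threshold = ds.special_small_blocks
    return sum(
        b
        for size in sst_sizes
        for b in block_sizes(size, ds.recordsize)
        if threshold and b <= threshold
    )


# Hypothetical workload: SST files (re)written by compaction.
utxo_ssts = [64 * MiB] * 4
history_ssts = [64 * MiB] * 40

# Old layout: one dataset per directory, each with its own policy.
split = (
    bytes_on_special(utxo_ssts, Dataset(recordsize=16 * KiB, special_small_blocks=16 * KiB))
    + bytes_on_special(history_ssts, Dataset(recordsize=128 * KiB, special_small_blocks=0))
)

# New layout: one dataset, so the UTXO-oriented policy also covers history.
single = bytes_on_special(
    utxo_ssts + history_ssts,
    Dataset(recordsize=16 * KiB, special_small_blocks=16 * KiB),
)

print(f"split layout:  {split // MiB} MiB on special")
print(f"single layout: {single // MiB} MiB on special")
```

Under these assumptions the split layout puts 256 MiB on the special device, while the single dataset puts all 2816 MiB there, because with recordsize at or below the threshold every block written by compaction qualifies.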
In summary, the move to a single directory structure, while potentially simplifying database management for RocksDB itself, removes a crucial layer of storage optimization previously possible with ZFS datasets. Because compaction keeps rewriting cold data under the single dataset's policy, the limited space on the SSD special device ends up being used inefficiently.