Anatomy of Rippled: Shamap
Overview
The SHAMap is a Merkle-Patricia trie that serves as the fundamental data structure for storing blockchain state in the XRP Ledger. It combines two powerful concepts:
Merkle Tree Properties: Each node contains a cryptographic hash summarizing all data beneath it, enabling O(1) comparison of entire subtrees by comparing hashes.
Patricia Trie (Radix 16) Properties: A radix trie with branching factor of 16, where each level corresponds to one hexadecimal digit (4 bits) of a key, enabling efficient key-based navigation and operations.
The XRPL maintains two distinct SHAMaps:
Account State Tree: Tracks account balances, NFTs, offers, and other account data
Transaction Tree: Stores transaction history, ensuring the complete chain of state transitions
The SHAMap root hash represents the state of the blockchain at any given point. This cryptographic commitment enables efficient state comparison and verification: if two ledgers have the same SHAMap root hash, they are guaranteed to have identical state.
Architecture
Core Design Principles
The SHAMap architecture achieves three critical properties:
Cryptographic Integrity: Through Merkle tree hashing, any change to data propagates up to the root hash
Efficient Navigation: Radix-16 structure allows direct path traversal using key nibbles
Optimized Synchronization: Hash comparison enables rapid identification of differences between nodes
Node Hierarchy
The structure consists of three conceptual layers:
Root Layer
Always an inner node
Contains the cryptographic commitment to the entire tree state
Internal Layer
Inner nodes serving as branch points
Each can have up to 16 children (one per hexadecimal digit)
Store hashes of their children but no actual data
Leaf Layer
Terminal nodes containing actual data items
All leaves in a SHAMap are homogeneous (same type)
Three leaf types exist: account state, transactions, and transactions with metadata
Node Types
Inner Nodes
Inner nodes form the branching structure of the tree:
Structure
Up to 16 child slots (indexed 0-15, corresponding to hex digits 0-F)
Hash value representing all descendants
Bitset indicating which children exist
Copy-on-write identifier (cowid) for safe mutations
"Full below" generation marker for synchronization optimization
Key Properties
Do not store data items directly
Maintain cryptographic commitments through child hashes
Support both compressed and full serialization formats
Enable efficient subtree sharing between SHAMap instances
Serialization Formats
Inner nodes support two wire formats:
Compressed Format: Serializes only non-empty branches, saving space when many slots are empty
Full Format: Serializes all 16 branches explicitly, used when most slots are occupied
The format selection is automatic based on branch occupancy.
Leaf Nodes
Leaf nodes store the actual blockchain data:
Base Properties
Hold a SHAMapItem containing the key and data payload
Compute hash based on data content and leaf type
Immutable once added to an immutable SHAMap
Specializations
Three leaf node types exist, each with distinct hashing and serialization:
Account State Leaves: Store account information (balances, settings, owned objects)
Transaction Leaves: Store transaction data without execution metadata
Transaction + Metadata Leaves: Store transactions along with their execution results
Each type implements its own hash calculation using appropriate prefixes and data encoding.
SHAMapItem
The data container within leaf nodes:
Key (tag): 256-bit unique identifier determining tree position
Data: Variable-length payload (transaction, account state, etc.)
Memory Management: Uses intrusive reference counting and custom slab allocation for efficiency
Node Identification and Navigation
SHAMapNodeID
Nodes are identified by their position in the tree:
Components
Depth: Distance from root (0 = root, increases downward)
Path: Sequence of branch choices encoded in a uint256
Key Operations
getChildNodeID(branch)
: Computes child node identifier given a branch indexselectBranch(nodeID, key)
: Determines which branch to follow for a given keycreateID(depth, key)
: Constructs node identifier at specific depth for a key
Path Encoding
The path is packed as 4-bit nibbles into a uint256:
Each nibble (0-15) represents one branch decision
Depth indicates how many nibbles are significant
This enables efficient key-based navigation: to locate a key, extract successive nibbles and follow corresponding branches
Hashing and Merkle Properties
Hash Computation
Hashing is fundamental to SHAMap's integrity guarantees:
Leaf Node Hashing
Hash computed from leaf type prefix and data content
Ensures different leaf types cannot collide even with identical data
Inner Node Hashing
Hash computed from hashes of all non-empty children
Captures complete state of subtree in a single value
Updated bottom-up after modifications
Hash Update Process
When walking the tree:
Leaf node hashes are computed from their data
Inner node hashes are updated after processing all children
Updated hashes propagate to parent nodes
Process continues until reaching root
This ensures the root hash always represents the complete tree state.
Merkle Tree Properties
The hash structure enables:
O(1) Subtree Comparison
Compare two subtrees by comparing their root hashes
Identical hashes guarantee identical content
Efficient Difference Detection
Mismatched hashes pinpoint exact divergence points
Only differing subtrees need detailed comparison
Cryptographic Proofs
Merkle proofs enable trustless verification of data inclusion
Proofs are logarithmic in tree size
State Management and Mutability
Immutable vs Mutable SHAMaps
SHAMaps exist in one of several states:
Immutable State
Nodes cannot be modified
All nodes have cowid = 0
Nodes persist for the lifetime of the SHAMap
Critical Limitation: Immutable SHAMaps cannot be trimmed—once a node is loaded, it remains in memory
Used for historical or finalized ledger states
Mutable State
Nodes can be modified through copy-on-write
All mutable nodes share the same non-zero cowid as their SHAMap
Enables safe modifications without affecting other SHAMap instances
Used for constructing new ledger states
Synching State
Transitional state during synchronization
Allows incremental tree construction
Can transition to Immutable or Modifying when complete
State Transitions
Key state management operations:
setImmutable()
: Transitions to immutable statesetSynching()
: Enters synchronization modeclearSynching()
: Exits synchronization modeisValid()
: Checks if state is not invalid
Copy-on-Write Mechanism
The copy-on-write (COW) system enables safe sharing:
Principles
Nodes are shared between SHAMaps using shared pointers
When a mutable SHAMap needs to modify a shared node, it creates a copy
The copy receives the SHAMap's cowid, making it exclusive
Node Sharing
Nodes with cowid = 0 are shareable
Nodes with matching cowid belong to the same mutable SHAMap
unshareNode
utility automates the clone-if-needed logic
Benefits
Efficient snapshots: new SHAMap instances share unmodified subtrees
Safe concurrent access: modifications don't affect other instances
Memory efficiency: identical subtrees stored once
Snapshots
Creating a snapshot:
snapShot(isMutable)
: Creates a new SHAMap instanceIf
isMutable = false
: New SHAMap is immutable and shares all nodesIf
isMutable = true
: New SHAMap can be modified, triggering COW as needed
Snapshots enable versioning and safe concurrent operations without duplicating entire trees.
Traversal and Iteration
Depth-First Traversal: visitNodes
visitNodes
provides complete tree traversal:
Algorithm
Uses explicit stack to manage traversal state
Processes nodes in depth-first order
For each inner node, iterates through all 16 branches
Descends into non-empty children
Use Cases
Tree validation
Bulk operations on all nodes
Custom tree analysis
Leaf-Only Traversal: visitLeaves
visitLeaves
traverses only data-bearing nodes:
Implementation
Built on top of
visitNodes
Filters to process only leaf nodes
Applies user function to each leaf's SHAMapItem
Use Cases
Data extraction
State enumeration
Content validation
Iterator-Based Traversal
SHAMap provides STL-compatible const iterators:
Iterator Interface
begin()
,end()
: Full tree iterationupper_bound(key)
,lower_bound(key)
: Range queriesIterates items in key order
Example Usage
for (auto it = shamap.begin(); it != shamap.end(); ++it) {
// Access SHAMapItem via *it
}
Properties
Iteration order matches key ordering
Enables range-based operations
Compatible with standard algorithms
Parallel Traversal: walkMapParallel
For performance-critical operations: Algorithm
Divides tree into subtrees
Processes subtrees concurrently across multiple threads
Aggregates results after completion
Use Cases
Missing node detection at scale
Bulk hash validation
High-throughput synchronization
Synchronization
Missing Node Detection: getMissingNodes
The core synchronization primitive: Purpose
Identifies nodes required for complete tree reconstruction
Limits result set to a maximum count for bounded network requests
Algorithm
Initialization: Start with root, create tracking structure
Traversal: Walk tree using depth-first approach
Child Examination: For each inner node:
Check each of 16 potential children
If child hash present but node missing, record as missing
If child exists and is inner node, descend further
Deferred Reads: Initiate asynchronous fetches for potentially available nodes
Full Below Optimization: Mark nodes as "full below" when all descendants present
Helper: gmn_ProcessNodes Processes one inner node:
Iterates all 16 branches
Records missing children
Tracks pending asynchronous fetches
Descends into inner children not yet "full below"
Marks node "full below" when all children present
Helper: gmn_ProcessDeferredReads Handles asynchronous fetch results:
Processes completed reads
Canonicalizes successfully fetched nodes
Records failed fetches as missing
Prepares state for continued traversal
Output
Returns a vector of (SHAMapNodeID, SHAMapHash) pairs representing missing nodes, prioritized for network retrieval.
Full Below Cache
An optimization for synchronization:
Purpose
Tracks subtrees that are completely synchronized
Prevents redundant traversal of known-complete subtrees
Mechanism
Generation-based tracking: each subtree marked with "full below" generation
When a subtree is marked full below, all descendants are known present
Synchronization can skip these subtrees
Benefits
Reduces CPU overhead during incremental synchronization
Minimizes redundant tree walking
Accelerates convergence to synchronized state
Node Addition and Canonicalization
Adding the Root: addRootNode
Initializes or verifies the root node:
Algorithm
Check if root already exists with matching hash → return "duplicate"
Validate input data → return "invalid" if malformed
Deserialize node from input
Canonicalize if backed by storage
Set as tree root
Notify filter if provided
Return "useful"
Use Cases
Initial tree construction during sync
Verification during peer communication
Root validation in ledger acceptance
Adding Known Nodes: addKnownNode
Adds interior or leaf nodes during synchronization: Algorithm
Traverse from root toward target node's position
Verify path integrity:
Expected branch must exist
Branch hash must match provided hash
If branch empty or mismatched → return "invalid"
Deserialize and validate node
Canonicalize if backed
Insert into parent
Notify filter if provided
Return "useful" or "duplicate"
Safety Checks
Hash verification ensures node belongs in position
Path validation prevents malformed tree construction
Canonicalization prevents duplicate storage
Canonicalization
Ensures nodes are unique in memory: Purpose
Prevent duplicate nodes with identical hashes
Enable safe node sharing through caching
Provide thread-safe node insertion
Mechanism
Check TreeNodeCache for existing node with same hash
If found, use cached instance
If not found, insert new node into cache
Atomic operation prevents races
Benefits
Memory efficiency: identical nodes stored once
Thread safety: cache handles concurrent insertion
Fast equality checks: compare pointers instead of content
Cryptographic Proofs
Proof Generation: getProofPath
Creates a Merkle proof for a specific key: Algorithm
Initialize empty proof path
Walk from root to target leaf, guided by key
At each step:
Serialize current node
Add serialized node to proof path
Descend to next node following key
Proof path contains all nodes from root to leaf
Output Returns optional<vector>:
Each Blob is a serialized node
Ordered from root to leaf
Contains all information needed for verification
Use Cases
Light client verification
Cross-ledger proofs
State attestation
Proof Verification: verifyProofPath
Verifies a proof for a key and expected root hash: Algorithm
Start from leaf end of proof path
Deserialize leaf node, verify it contains the key
Compute leaf hash
For each inner node moving toward root:
Deserialize node
Extract branch corresponding to key at this depth
Verify computed hash matches branch hash in inner node
Compute inner node's hash
Update expected hash for next level
Verify final computed hash matches expected root hash
Validation
Path must lead directly from root to leaf
All hashes must chain correctly
Leaf must contain the claimed key
Result Returns boolean:
true: Proof is valid, key exists with verified path to root
false: Proof is invalid or key not present
Proof Properties
Compactness
Proof size is O(log N) where N is tree size
Typically only 5-10 nodes for large trees
Trustlessness
Verifier needs only the root hash
No need to trust the proof provider
Cryptographic guarantee of validity
Applications
Light clients can verify state without full ledger
Cross-chain bridges can prove ledger state
Auditors can verify specific transactions
Serialization
Wire Serialization
Nodes must be serialized for network transmission and storage: Root Serialization: serializeRoot
Serializes the root node
Typically an inner node
Uses compressed or full format based on branch count
Node Serialization: getNodeFat Retrieves and serializes a node with optional descendants: Parameters
nodeID: Target node to serialize
depth: How many levels of descendants to include
fatLeaves: Whether to include leaf content
Algorithm
Descend to requested node
Serialize the node itself
If depth > 0, recursively serialize children
If fatLeaves, include full leaf content
Return serialized data
Use Cases
Efficient bulk data transfer
Subtree synchronization
Bandwidth optimization by including anticipated child nodes
Serialization Formats
Leaf Node Format
Type prefix (distinguishes account state, tx, tx+meta)
SHAMapItem key
SHAMapItem data payload
No children or branches
Inner Node Format - Compressed
Header indicating compressed format
Bitmap showing which branches exist
For each existing branch:
Branch index
Child hash
Omits empty branches
Inner Node Format - Full
Header indicating full format
All 16 branches in order
For each branch:
Child hash (or empty marker)
No bitmap needed
Format Selection
Compressed: Few branches occupied, saves space
Full: Many branches occupied, simpler structure
Decision made automatically during serialization
Storage and Caching
Storage Abstraction: NodeStore
The NodeStore (detailed in a separate section) provides: Interface
Abstract key-value store for nodes
Keyed by node hash
Persistent backend (database, filesystem, etc.)
Responsibilities
Durable storage of nodes
Retrieval by hash
Lifecycle management (rotation, deletion)
Integration with SHAMap
SHAMap requests nodes by hash
NodeStore fetches from persistent storage
Fetched nodes cached in TreeNodeCache
TreeNodeCache
In-memory cache of immutable nodes: Purpose
Accelerate access to frequently used nodes
Reduce database queries
Enable node sharing across SHAMaps
Structure
Key: SHAMapHash
Value: shared_ptr to SHAMapTreeNode
Thread-safe concurrent access
Properties
Only immutable nodes (cowid = 0) are cached
Cache eviction based on access patterns
Canonicalization ensures uniqueness
Cache Benefits
Reduced I/O for hot nodes (root, high-level inner nodes)
Memory efficiency through sharing
Fast lookups for synchronization
FullBelowCache
Tracks synchronization state: Purpose
Records which subtrees are fully synchronized
Prevents redundant checks during sync
Mechanism
Generation-based tracking
Invalidated when new synchronization begins
Impact
Dramatically reduces work during incremental sync
Enables efficient resumption after interruption
Family and NodeFamily
Family Interface
Abstract interface for SHAMap resource management
Provides access to:
TreeNodeCache
FullBelowCache
NodeStore backend
NodeFamily
Concrete implementation
Manages shared resources across multiple SHAMaps
Coordinates cache eviction policies
Benefits
Centralized resource management
Shared caches across related SHAMaps
Simplified lifecycle management
Thread Safety
Concurrency Mechanisms
SHAMap employs multiple strategies for thread safety: Canonicalization Lock
Ensures only one thread inserts a node with a given hash
Prevents race conditions during cache insertion
Atomic cache operations guarantee uniqueness
SHAMapInnerNode Synchronization
Uses atomic operations for child pointer access
Lock field (std::atomicstd::uint16_t) protects concurrent modifications
Enables safe concurrent reads
Immutable Node Sharing
Immutable nodes (cowid = 0) are inherently thread-safe
Multiple threads can read simultaneously
No synchronization needed for reads
Copy-on-Write Protection
Mutable operations clone nodes before modification
Cloning prevents interference between SHAMap instances
Each mutable SHAMap operates on exclusive copies
Cache Thread Safety
TreeNodeCache
Designed for concurrent access
Internal synchronization primitives
Safe for multi-threaded lookups and insertions
FullBelowCache
Thread-safe read/write operations
Generation-based invalidation is atomic
Safe Patterns
Read Operations
Safe on immutable SHAMaps from any thread
No locking required for traversal or lookup
Write Operations
Mutable SHAMaps should be accessed from single thread
Or externally synchronized if multi-threaded access needed
Synchronization Operations
getMissingNodes uses async I/O safely
Deferred reads processed without blocking other operations
Advanced Topics
State Reconstruction
The transaction tree ensures verifiable state reconstruction: Problem
Without transaction history, multiple different sequences could produce same final state
Blockchain requires unique, verifiable history
Solution
Transaction tree preserves complete state transition chain
Current state sₙ = f(sₙ₋₁, txₙ) where f is deterministic
Full history ensures unique path: s₀ → s₁ → ... → sₙ
Verification
Given state tree root and transaction tree root at ledger L
Can verify all transactions that produced state
Ensures blockchain integrity
Performance Characteristics
Lookup Complexity
O(k) where k is key length in nibbles (typically 64 for 256-bit keys)
O(log₁₆ N) in practice, very shallow trees
Insertion/Deletion
O(k) path traversal
Plus O(h) hash updates where h is path height
Copy-on-write adds constant overhead
Synchronization
O(d) where d is number of differences
Hash comparison eliminates need to check identical subtrees
Parallel traversal improves wall-clock time
Space Complexity
O(N) for N data items
Inner node overhead minimal due to branching factor
Sharing reduces memory for snapshots
Design Tradeoffs
Branching Factor
Radix 16 chosen for balance:
Wide enough to keep tree shallow
Narrow enough for manageable child overhead
Aligns with hexadecimal for natural key encoding
Immutable Trimming Limitation
Immutable SHAMaps cannot evict nodes
Tradeoff: simplicity and safety vs memory flexibility
Solution: use mutable copies for long-lived operations requiring memory management
Hash Algorithm
SHA-512Half (first 256 bits of SHA-512)
Balance between security and performance
Enables 256-bit node identifiers
Implementation Reference
The complete implementation details, including class hierarchies, method signatures, and code examples, are available in the accompanying detailed documentation. Key source files include:
Node types: SHAMapTreeNode.h, SHAMapInnerNode.h, SHAMapLeafNode.h
Core SHAMap: SHAMap.h, detail/SHAMap.cpp
Synchronization: detail/SHAMapSync.cpp
Traversal: detail/SHAMapDelta.cpp
Caching: TreeNodeCache.h, Family.h, NodeFamily.h
Storage: SHAMapStoreImp.h, SHAMapStoreImp.cpp
Summary
The SHAMap is a sophisticated data structure that combines Merkle tree cryptographic properties with Patricia trie efficiency. Its design enables:
Efficient State Management: O(1) subtree comparison through hash-based verification
Cryptographic Integrity: Merkle proofs provide trustless verification of data inclusion
Optimized Synchronization: Hash comparison and parallel traversal enable rapid convergence
Memory Efficiency: Copy-on-write and node sharing minimize duplication
Thread Safety: Canonicalization and immutable nodes support concurrent access
Scalability: Integration with caching and persistent storage handles large datasets
These properties make SHAMap ideal for blockchain state management, where data integrity, efficient synchronization, and verifiable state transitions are paramount.
Last updated