Anatomy of Rippled: Shamap

Overview

The SHAMap is a Merkle-Patricia trie that serves as the fundamental data structure for storing blockchain state in the XRP Ledger. It combines two powerful concepts:

Merkle Tree Properties: Each node contains a cryptographic hash summarizing all data beneath it, enabling O(1) comparison of entire subtrees by comparing hashes.
Patricia Trie (Radix 16) Properties: A radix trie with branching factor of 16, where each level corresponds to one hexadecimal digit (4 bits) of a key, enabling efficient key-based navigation and operations.

The XRPL maintains two distinct SHAMaps:

Account State Tree: Tracks account balances, NFTs, offers, and other account data
Transaction Tree: Stores transaction history, ensuring the complete chain of state transitions

The SHAMap root hash represents the state of the blockchain at any given point. This cryptographic commitment enables efficient state comparison and verification: if two ledgers have the same SHAMap root hash, they are guaranteed to have identical state.

Architecture

Core Design Principles

The SHAMap architecture achieves three critical properties:

Cryptographic Integrity: Through Merkle tree hashing, any change to data propagates up to the root hash
Efficient Navigation: Radix-16 structure allows direct path traversal using key nibbles
Optimized Synchronization: Hash comparison enables rapid identification of differences between nodes

Node Hierarchy

The structure consists of three conceptual layers:

Root Layer

Always an inner node
Contains the cryptographic commitment to the entire tree state

Internal Layer

Inner nodes serving as branch points
Each can have up to 16 children (one per hexadecimal digit)
Store hashes of their children but no actual data

Leaf Layer

Terminal nodes containing actual data items
All leaves in a SHAMap are homogeneous (same type)
Three leaf types exist: account state, transactions, and transactions with metadata

Node Types

Inner Nodes

Inner nodes form the branching structure of the tree:

Structure

Up to 16 child slots (indexed 0-15, corresponding to hex digits 0-F)
Hash value representing all descendants
Bitset indicating which children exist
Copy-on-write identifier (cowid) for safe mutations
"Full below" generation marker for synchronization optimization

Key Properties

Do not store data items directly
Maintain cryptographic commitments through child hashes
Support both compressed and full serialization formats
Enable efficient subtree sharing between SHAMap instances

Serialization Formats

Inner nodes support two wire formats:

Compressed Format: Serializes only non-empty branches, saving space when many slots are empty
Full Format: Serializes all 16 branches explicitly, used when most slots are occupied

The format selection is automatic based on branch occupancy.

Leaf Nodes

Leaf nodes store the actual blockchain data:

Base Properties

Hold a SHAMapItem containing the key and data payload
Compute hash based on data content and leaf type
Immutable once added to an immutable SHAMap

Specializations

Three leaf node types exist, each with distinct hashing and serialization:

Account State Leaves: Store account information (balances, settings, owned objects)
Transaction Leaves: Store transaction data without execution metadata
Transaction + Metadata Leaves: Store transactions along with their execution results

Each type implements its own hash calculation using appropriate prefixes and data encoding.

SHAMapItem

The data container within leaf nodes:

Key (tag): 256-bit unique identifier determining tree position
Data: Variable-length payload (transaction, account state, etc.)
Memory Management: Uses intrusive reference counting and custom slab allocation for efficiency

SHAMapNodeID

Nodes are identified by their position in the tree:

Components

Depth: Distance from root (0 = root, increases downward)
Path: Sequence of branch choices encoded in a uint256

Key Operations

getChildNodeID(branch): Computes child node identifier given a branch index
selectBranch(nodeID, key): Determines which branch to follow for a given key
createID(depth, key): Constructs node identifier at specific depth for a key

Path Encoding

The path is packed as 4-bit nibbles into a uint256:

Each nibble (0-15) represents one branch decision
Depth indicates how many nibbles are significant
This enables efficient key-based navigation: to locate a key, extract successive nibbles and follow corresponding branches

Hashing and Merkle Properties

Hash Computation

Hashing is fundamental to SHAMap's integrity guarantees:

Leaf Node Hashing

Hash computed from leaf type prefix and data content
Ensures different leaf types cannot collide even with identical data

Inner Node Hashing

Hash computed from hashes of all non-empty children
Captures complete state of subtree in a single value
Updated bottom-up after modifications

Hash Update Process

When walking the tree:

Leaf node hashes are computed from their data
Inner node hashes are updated after processing all children
Updated hashes propagate to parent nodes
Process continues until reaching root

This ensures the root hash always represents the complete tree state.

Merkle Tree Properties

The hash structure enables:

O(1) Subtree Comparison

Compare two subtrees by comparing their root hashes
Identical hashes guarantee identical content

Efficient Difference Detection

Mismatched hashes pinpoint exact divergence points
Only differing subtrees need detailed comparison

Cryptographic Proofs

Merkle proofs enable trustless verification of data inclusion
Proofs are logarithmic in tree size

State Management and Mutability

Immutable vs Mutable SHAMaps

SHAMaps exist in one of several states:

Immutable State

Nodes cannot be modified
All nodes have cowid = 0
Nodes persist for the lifetime of the SHAMap
Critical Limitation: Immutable SHAMaps cannot be trimmed—once a node is loaded, it remains in memory
Used for historical or finalized ledger states

Mutable State

Nodes can be modified through copy-on-write
All mutable nodes share the same non-zero cowid as their SHAMap
Enables safe modifications without affecting other SHAMap instances
Used for constructing new ledger states

Synching State

Transitional state during synchronization
Allows incremental tree construction
Can transition to Immutable or Modifying when complete

State Transitions

Key state management operations:

setImmutable(): Transitions to immutable state
setSynching(): Enters synchronization mode
clearSynching(): Exits synchronization mode
isValid(): Checks if state is not invalid

Copy-on-Write Mechanism

The copy-on-write (COW) system enables safe sharing:

Principles

Nodes are shared between SHAMaps using shared pointers
When a mutable SHAMap needs to modify a shared node, it creates a copy
The copy receives the SHAMap's cowid, making it exclusive

Node Sharing

Nodes with cowid = 0 are shareable
Nodes with matching cowid belong to the same mutable SHAMap
unshareNode utility automates the clone-if-needed logic

Benefits

Efficient snapshots: new SHAMap instances share unmodified subtrees
Safe concurrent access: modifications don't affect other instances
Memory efficiency: identical subtrees stored once

Snapshots

Creating a snapshot:

snapShot(isMutable): Creates a new SHAMap instance
If isMutable = false: New SHAMap is immutable and shares all nodes
If isMutable = true: New SHAMap can be modified, triggering COW as needed

Snapshots enable versioning and safe concurrent operations without duplicating entire trees.

Traversal and Iteration

Depth-First Traversal: visitNodes

visitNodes provides complete tree traversal:

Algorithm

Uses explicit stack to manage traversal state
Processes nodes in depth-first order
For each inner node, iterates through all 16 branches
Descends into non-empty children

Use Cases

Tree validation
Bulk operations on all nodes
Custom tree analysis

Leaf-Only Traversal: visitLeaves

visitLeaves traverses only data-bearing nodes:

Implementation

Built on top of visitNodes
Filters to process only leaf nodes
Applies user function to each leaf's SHAMapItem

Use Cases

Data extraction
State enumeration
Content validation

Iterator-Based Traversal

SHAMap provides STL-compatible const iterators:

Iterator Interface

begin(), end(): Full tree iteration
upper_bound(key), lower_bound(key): Range queries
Iterates items in key order

Example Usage

for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
}

Properties

Iteration order matches key ordering
Enables range-based operations
Compatible with standard algorithms

Parallel Traversal: walkMapParallel

For performance-critical operations: Algorithm

Divides tree into subtrees
Processes subtrees concurrently across multiple threads
Aggregates results after completion

Use Cases

Missing node detection at scale
Bulk hash validation
High-throughput synchronization

Synchronization

Missing Node Detection: getMissingNodes

The core synchronization primitive: Purpose

Identifies nodes required for complete tree reconstruction
Limits result set to a maximum count for bounded network requests

Algorithm

Initialization: Start with root, create tracking structure
Traversal: Walk tree using depth-first approach
Child Examination: For each inner node:
- Check each of 16 potential children
- If child hash present but node missing, record as missing
- If child exists and is inner node, descend further
Deferred Reads: Initiate asynchronous fetches for potentially available nodes
Full Below Optimization: Mark nodes as "full below" when all descendants present

Helper: gmn_ProcessNodes Processes one inner node:

Iterates all 16 branches
Records missing children
Tracks pending asynchronous fetches
Descends into inner children not yet "full below"
Marks node "full below" when all children present

Helper: gmn_ProcessDeferredReads Handles asynchronous fetch results:

Processes completed reads
Canonicalizes successfully fetched nodes
Records failed fetches as missing
Prepares state for continued traversal

Output

Returns a vector of (SHAMapNodeID, SHAMapHash) pairs representing missing nodes, prioritized for network retrieval.

Full Below Cache

An optimization for synchronization:

Purpose

Tracks subtrees that are completely synchronized
Prevents redundant traversal of known-complete subtrees

Mechanism

Generation-based tracking: each subtree marked with "full below" generation
When a subtree is marked full below, all descendants are known present
Synchronization can skip these subtrees

Benefits

Reduces CPU overhead during incremental synchronization
Minimizes redundant tree walking
Accelerates convergence to synchronized state

Node Addition and Canonicalization

Adding the Root: addRootNode

Initializes or verifies the root node:

Algorithm

Check if root already exists with matching hash → return "duplicate"
Validate input data → return "invalid" if malformed
Deserialize node from input
Canonicalize if backed by storage
Set as tree root
Notify filter if provided
Return "useful"

Use Cases

Initial tree construction during sync
Verification during peer communication
Root validation in ledger acceptance

Adding Known Nodes: addKnownNode

Adds interior or leaf nodes during synchronization: Algorithm

Traverse from root toward target node's position
Verify path integrity:
- Expected branch must exist
- Branch hash must match provided hash
If branch empty or mismatched → return "invalid"
Deserialize and validate node
Canonicalize if backed
Insert into parent
Notify filter if provided
Return "useful" or "duplicate"

Safety Checks

Hash verification ensures node belongs in position
Path validation prevents malformed tree construction
Canonicalization prevents duplicate storage

Canonicalization

Ensures nodes are unique in memory: Purpose

Prevent duplicate nodes with identical hashes
Enable safe node sharing through caching
Provide thread-safe node insertion

Mechanism

Check TreeNodeCache for existing node with same hash
If found, use cached instance
If not found, insert new node into cache
Atomic operation prevents races

Benefits

Memory efficiency: identical nodes stored once
Thread safety: cache handles concurrent insertion
Fast equality checks: compare pointers instead of content

Cryptographic Proofs

Proof Generation: getProofPath

Creates a Merkle proof for a specific key: Algorithm

Initialize empty proof path
Walk from root to target leaf, guided by key
At each step:
- Serialize current node
- Add serialized node to proof path
- Descend to next node following key
Proof path contains all nodes from root to leaf

Output Returns optional<vector>:

Each Blob is a serialized node
Ordered from root to leaf
Contains all information needed for verification

Use Cases

Light client verification
Cross-ledger proofs
State attestation

Proof Verification: verifyProofPath

Verifies a proof for a key and expected root hash: Algorithm

Start from leaf end of proof path
Deserialize leaf node, verify it contains the key
Compute leaf hash
For each inner node moving toward root:
- Deserialize node
- Extract branch corresponding to key at this depth
- Verify computed hash matches branch hash in inner node
- Compute inner node's hash
- Update expected hash for next level
Verify final computed hash matches expected root hash

Validation

Path must lead directly from root to leaf
All hashes must chain correctly
Leaf must contain the claimed key

Result Returns boolean:

true: Proof is valid, key exists with verified path to root
false: Proof is invalid or key not present

Proof Properties

Compactness

Proof size is O(log N) where N is tree size
Typically only 5-10 nodes for large trees

Trustlessness

Verifier needs only the root hash
No need to trust the proof provider
Cryptographic guarantee of validity

Applications

Light clients can verify state without full ledger
Cross-chain bridges can prove ledger state
Auditors can verify specific transactions

Serialization

Wire Serialization

Nodes must be serialized for network transmission and storage: Root Serialization: serializeRoot

Serializes the root node
Typically an inner node
Uses compressed or full format based on branch count

Node Serialization: getNodeFat Retrieves and serializes a node with optional descendants: Parameters

nodeID: Target node to serialize
depth: How many levels of descendants to include
fatLeaves: Whether to include leaf content

Algorithm

Descend to requested node
Serialize the node itself
If depth > 0, recursively serialize children
If fatLeaves, include full leaf content
Return serialized data

Use Cases

Efficient bulk data transfer
Subtree synchronization
Bandwidth optimization by including anticipated child nodes

Serialization Formats

Leaf Node Format

Type prefix (distinguishes account state, tx, tx+meta)
SHAMapItem key
SHAMapItem data payload
No children or branches

Inner Node Format - Compressed

Header indicating compressed format
Bitmap showing which branches exist
For each existing branch:
- Branch index
- Child hash
Omits empty branches

Inner Node Format - Full

Header indicating full format
All 16 branches in order
For each branch:
- Child hash (or empty marker)
No bitmap needed

Format Selection

Compressed: Few branches occupied, saves space
Full: Many branches occupied, simpler structure
Decision made automatically during serialization

Storage and Caching

Storage Abstraction: NodeStore

The NodeStore (detailed in a separate section) provides: Interface

Abstract key-value store for nodes
Keyed by node hash
Persistent backend (database, filesystem, etc.)

Responsibilities

Durable storage of nodes
Retrieval by hash
Lifecycle management (rotation, deletion)

Integration with SHAMap

SHAMap requests nodes by hash
NodeStore fetches from persistent storage
Fetched nodes cached in TreeNodeCache

TreeNodeCache

In-memory cache of immutable nodes: Purpose

Accelerate access to frequently used nodes
Reduce database queries
Enable node sharing across SHAMaps

Structure

Key: SHAMapHash
Value: shared_ptr to SHAMapTreeNode
Thread-safe concurrent access

Properties

Only immutable nodes (cowid = 0) are cached
Cache eviction based on access patterns
Canonicalization ensures uniqueness

Cache Benefits

Reduced I/O for hot nodes (root, high-level inner nodes)
Memory efficiency through sharing
Fast lookups for synchronization

FullBelowCache

Tracks synchronization state: Purpose

Records which subtrees are fully synchronized
Prevents redundant checks during sync

Mechanism

Generation-based tracking
Invalidated when new synchronization begins

Impact

Dramatically reduces work during incremental sync
Enables efficient resumption after interruption

Family and NodeFamily

Family Interface

Abstract interface for SHAMap resource management
Provides access to:
- TreeNodeCache
- FullBelowCache
- NodeStore backend

NodeFamily

Concrete implementation
Manages shared resources across multiple SHAMaps
Coordinates cache eviction policies

Benefits

Centralized resource management
Shared caches across related SHAMaps
Simplified lifecycle management

Thread Safety

Concurrency Mechanisms

SHAMap employs multiple strategies for thread safety: Canonicalization Lock

Ensures only one thread inserts a node with a given hash
Prevents race conditions during cache insertion
Atomic cache operations guarantee uniqueness

SHAMapInnerNode Synchronization

Uses atomic operations for child pointer access
Lock field (std::atomicstd::uint16_t) protects concurrent modifications
Enables safe concurrent reads

Immutable Node Sharing

Immutable nodes (cowid = 0) are inherently thread-safe
Multiple threads can read simultaneously
No synchronization needed for reads

Copy-on-Write Protection

Mutable operations clone nodes before modification
Cloning prevents interference between SHAMap instances
Each mutable SHAMap operates on exclusive copies

Cache Thread Safety

TreeNodeCache

Designed for concurrent access
Internal synchronization primitives
Safe for multi-threaded lookups and insertions

FullBelowCache

Thread-safe read/write operations
Generation-based invalidation is atomic

Safe Patterns

Read Operations

Safe on immutable SHAMaps from any thread
No locking required for traversal or lookup

Write Operations

Mutable SHAMaps should be accessed from single thread
Or externally synchronized if multi-threaded access needed

Synchronization Operations

getMissingNodes uses async I/O safely
Deferred reads processed without blocking other operations

Advanced Topics

State Reconstruction

The transaction tree ensures verifiable state reconstruction: Problem

Without transaction history, multiple different sequences could produce same final state
Blockchain requires unique, verifiable history

Solution

Transaction tree preserves complete state transition chain
Current state sₙ = f(sₙ₋₁, txₙ) where f is deterministic
Full history ensures unique path: s₀ → s₁ → ... → sₙ

Verification

Given state tree root and transaction tree root at ledger L
Can verify all transactions that produced state
Ensures blockchain integrity

Performance Characteristics

Lookup Complexity

O(k) where k is key length in nibbles (typically 64 for 256-bit keys)
O(log₁₆ N) in practice, very shallow trees

Insertion/Deletion

O(k) path traversal
Plus O(h) hash updates where h is path height
Copy-on-write adds constant overhead

Synchronization

O(d) where d is number of differences
Hash comparison eliminates need to check identical subtrees
Parallel traversal improves wall-clock time

Space Complexity

O(N) for N data items
Inner node overhead minimal due to branching factor
Sharing reduces memory for snapshots

Design Tradeoffs

Branching Factor

Radix 16 chosen for balance:
Wide enough to keep tree shallow
Narrow enough for manageable child overhead
Aligns with hexadecimal for natural key encoding

Immutable Trimming Limitation

Immutable SHAMaps cannot evict nodes
Tradeoff: simplicity and safety vs memory flexibility
Solution: use mutable copies for long-lived operations requiring memory management

Hash Algorithm

SHA-512Half (first 256 bits of SHA-512)
Balance between security and performance
Enables 256-bit node identifiers

Implementation Reference

The complete implementation details, including class hierarchies, method signatures, and code examples, are available in the accompanying detailed documentation. Key source files include:

Node types: SHAMapTreeNode.h, SHAMapInnerNode.h, SHAMapLeafNode.h
Core SHAMap: SHAMap.h, detail/SHAMap.cpp
Synchronization: detail/SHAMapSync.cpp
Traversal: detail/SHAMapDelta.cpp
Caching: TreeNodeCache.h, Family.h, NodeFamily.h
Storage: SHAMapStoreImp.h, SHAMapStoreImp.cpp

Summary

The SHAMap is a sophisticated data structure that combines Merkle tree cryptographic properties with Patricia trie efficiency. Its design enables:

Efficient State Management: O(1) subtree comparison through hash-based verification
Cryptographic Integrity: Merkle proofs provide trustless verification of data inclusion
Optimized Synchronization: Hash comparison and parallel traversal enable rapid convergence
Memory Efficiency: Copy-on-write and node sharing minimize duplication
Thread Safety: Canonicalization and immutable nodes support concurrent access
Scalability: Integration with caching and persistent storage handles large datasets

These properties make SHAMap ideal for blockchain state management, where data integrity, efficient synchronization, and verifiable state transitions are paramount.

PreviousDebugging Tools: Development and Debugging Techniques NextAnatomy of Rippled: Nodestore

Last updated 2 days ago

Overview

Architecture

Core Design Principles

Node Hierarchy

Node Types

Inner Nodes

Leaf Nodes

SHAMapItem

Node Identification and Navigation

SHAMapNodeID

Hashing and Merkle Properties

Hash Computation

Merkle Tree Properties

State Management and Mutability

Immutable vs Mutable SHAMaps

Copy-on-Write Mechanism

Snapshots

Traversal and Iteration

Depth-First Traversal: visitNodes

Leaf-Only Traversal: visitLeaves

Iterator-Based Traversal

Parallel Traversal: walkMapParallel

Synchronization

Missing Node Detection: getMissingNodes

Output

Full Below Cache

Purpose

Mechanism

Benefits

Node Addition and Canonicalization

Adding the Root: addRootNode

Adding Known Nodes: addKnownNode

Safety Checks

Canonicalization

Cryptographic Proofs

Proof Generation: getProofPath

Proof Verification: verifyProofPath

Proof Properties

Serialization

Wire Serialization

Serialization Formats

Storage and Caching

Storage Abstraction: NodeStore

TreeNodeCache

FullBelowCache

Family and NodeFamily

Family Interface

Thread Safety

Concurrency Mechanisms

Cache Thread Safety

Safe Patterns

Advanced Topics

State Reconstruction

Performance Characteristics

Design Tradeoffs

Implementation Reference

Summary