Anatomy of Rippled: Shamap

Overview

The SHAMap is a Merkle-Patricia trie that serves as the fundamental data structure for storing blockchain state in the XRP Ledger. It combines two powerful concepts:

  • Merkle Tree Properties: Each node contains a cryptographic hash summarizing all data beneath it, enabling O(1) comparison of entire subtrees by comparing hashes.

  • Patricia Trie (Radix 16) Properties: A radix trie with branching factor of 16, where each level corresponds to one hexadecimal digit (4 bits) of a key, enabling efficient key-based navigation and operations.

The XRPL maintains two distinct SHAMaps:

  • Account State Tree: Tracks account balances, NFTs, offers, and other account data

  • Transaction Tree: Stores transaction history, ensuring the complete chain of state transitions

The SHAMap root hash represents the state of the blockchain at any given point. This cryptographic commitment enables efficient state comparison and verification: if two ledgers have the same SHAMap root hash, they are guaranteed to have identical state.


Architecture

Core Design Principles

The SHAMap architecture achieves three critical properties:

  1. Cryptographic Integrity: Through Merkle tree hashing, any change to data propagates up to the root hash

  2. Efficient Navigation: Radix-16 structure allows direct path traversal using key nibbles

  3. Optimized Synchronization: Hash comparison enables rapid identification of differences between nodes

Node Hierarchy

The structure consists of three conceptual layers:

Root Layer

  • Always an inner node

  • Contains the cryptographic commitment to the entire tree state

Internal Layer

  • Inner nodes serving as branch points

  • Each can have up to 16 children (one per hexadecimal digit)

  • Store hashes of their children but no actual data

Leaf Layer

  • Terminal nodes containing actual data items

  • All leaves in a SHAMap are homogeneous (same type)

  • Three leaf types exist: account state, transactions, and transactions with metadata


Node Types

Inner Nodes

Inner nodes form the branching structure of the tree:

Structure

  • Up to 16 child slots (indexed 0-15, corresponding to hex digits 0-F)

  • Hash value representing all descendants

  • Bitset indicating which children exist

  • Copy-on-write identifier (cowid) for safe mutations

  • "Full below" generation marker for synchronization optimization

Key Properties

  • Do not store data items directly

  • Maintain cryptographic commitments through child hashes

  • Support both compressed and full serialization formats

  • Enable efficient subtree sharing between SHAMap instances

Serialization Formats

Inner nodes support two wire formats:

  • Compressed Format: Serializes only non-empty branches, saving space when many slots are empty

  • Full Format: Serializes all 16 branches explicitly, used when most slots are occupied

The format selection is automatic based on branch occupancy.

Leaf Nodes

Leaf nodes store the actual blockchain data:

Base Properties

  • Hold a SHAMapItem containing the key and data payload

  • Compute hash based on data content and leaf type

  • Immutable once added to an immutable SHAMap

Specializations

Three leaf node types exist, each with distinct hashing and serialization:

  1. Account State Leaves: Store account information (balances, settings, owned objects)

  2. Transaction Leaves: Store transaction data without execution metadata

  3. Transaction + Metadata Leaves: Store transactions along with their execution results

Each type implements its own hash calculation using appropriate prefixes and data encoding.

SHAMapItem

The data container within leaf nodes:

  • Key (tag): 256-bit unique identifier determining tree position

  • Data: Variable-length payload (transaction, account state, etc.)

  • Memory Management: Uses intrusive reference counting and custom slab allocation for efficiency


Node Identification and Navigation

SHAMapNodeID

Nodes are identified by their position in the tree:

Components

  • Depth: Distance from root (0 = root, increases downward)

  • Path: Sequence of branch choices encoded in a uint256

Key Operations

  • getChildNodeID(branch): Computes child node identifier given a branch index

  • selectBranch(nodeID, key): Determines which branch to follow for a given key

  • createID(depth, key): Constructs node identifier at specific depth for a key

Path Encoding

The path is packed as 4-bit nibbles into a uint256:

  • Each nibble (0-15) represents one branch decision

  • Depth indicates how many nibbles are significant

  • This enables efficient key-based navigation: to locate a key, extract successive nibbles and follow corresponding branches


Hashing and Merkle Properties

Hash Computation

Hashing is fundamental to SHAMap's integrity guarantees:

Leaf Node Hashing

  • Hash computed from leaf type prefix and data content

  • Ensures different leaf types cannot collide even with identical data

Inner Node Hashing

  • Hash computed from hashes of all non-empty children

  • Captures complete state of subtree in a single value

  • Updated bottom-up after modifications

Hash Update Process

When walking the tree:

  1. Leaf node hashes are computed from their data

  2. Inner node hashes are updated after processing all children

  3. Updated hashes propagate to parent nodes

  4. Process continues until reaching root

This ensures the root hash always represents the complete tree state.

Merkle Tree Properties

The hash structure enables:

O(1) Subtree Comparison

  • Compare two subtrees by comparing their root hashes

  • Identical hashes guarantee identical content

Efficient Difference Detection

  • Mismatched hashes pinpoint exact divergence points

  • Only differing subtrees need detailed comparison

Cryptographic Proofs

  • Merkle proofs enable trustless verification of data inclusion

  • Proofs are logarithmic in tree size


State Management and Mutability

Immutable vs Mutable SHAMaps

SHAMaps exist in one of several states:

Immutable State

  • Nodes cannot be modified

  • All nodes have cowid = 0

  • Nodes persist for the lifetime of the SHAMap

  • Critical Limitation: Immutable SHAMaps cannot be trimmed—once a node is loaded, it remains in memory

  • Used for historical or finalized ledger states

Mutable State

  • Nodes can be modified through copy-on-write

  • All mutable nodes share the same non-zero cowid as their SHAMap

  • Enables safe modifications without affecting other SHAMap instances

  • Used for constructing new ledger states

Synching State

  • Transitional state during synchronization

  • Allows incremental tree construction

  • Can transition to Immutable or Modifying when complete

State Transitions

Key state management operations:

  • setImmutable(): Transitions to immutable state

  • setSynching(): Enters synchronization mode

  • clearSynching(): Exits synchronization mode

  • isValid(): Checks if state is not invalid

Copy-on-Write Mechanism

The copy-on-write (COW) system enables safe sharing:

Principles

  • Nodes are shared between SHAMaps using shared pointers

  • When a mutable SHAMap needs to modify a shared node, it creates a copy

  • The copy receives the SHAMap's cowid, making it exclusive

Node Sharing

  • Nodes with cowid = 0 are shareable

  • Nodes with matching cowid belong to the same mutable SHAMap

  • unshareNode utility automates the clone-if-needed logic

Benefits

  • Efficient snapshots: new SHAMap instances share unmodified subtrees

  • Safe concurrent access: modifications don't affect other instances

  • Memory efficiency: identical subtrees stored once

Snapshots

Creating a snapshot:

  • snapShot(isMutable): Creates a new SHAMap instance

  • If isMutable = false: New SHAMap is immutable and shares all nodes

  • If isMutable = true: New SHAMap can be modified, triggering COW as needed

Snapshots enable versioning and safe concurrent operations without duplicating entire trees.


Traversal and Iteration

Depth-First Traversal: visitNodes

visitNodes provides complete tree traversal:

Algorithm

  • Uses explicit stack to manage traversal state

  • Processes nodes in depth-first order

  • For each inner node, iterates through all 16 branches

  • Descends into non-empty children

Use Cases

  • Tree validation

  • Bulk operations on all nodes

  • Custom tree analysis

Leaf-Only Traversal: visitLeaves

visitLeaves traverses only data-bearing nodes:

Implementation

  • Built on top of visitNodes

  • Filters to process only leaf nodes

  • Applies user function to each leaf's SHAMapItem

Use Cases

  • Data extraction

  • State enumeration

  • Content validation

Iterator-Based Traversal

SHAMap provides STL-compatible const iterators:

Iterator Interface

  • begin(), end(): Full tree iteration

  • upper_bound(key), lower_bound(key): Range queries

  • Iterates items in key order

Example Usage

for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
}

Properties

  • Iteration order matches key ordering

  • Enables range-based operations

  • Compatible with standard algorithms

Parallel Traversal: walkMapParallel

For performance-critical operations: Algorithm

  • Divides tree into subtrees

  • Processes subtrees concurrently across multiple threads

  • Aggregates results after completion

Use Cases

  • Missing node detection at scale

  • Bulk hash validation

  • High-throughput synchronization

Synchronization

Missing Node Detection: getMissingNodes

The core synchronization primitive: Purpose

  • Identifies nodes required for complete tree reconstruction

  • Limits result set to a maximum count for bounded network requests

Algorithm

  1. Initialization: Start with root, create tracking structure

  2. Traversal: Walk tree using depth-first approach

  3. Child Examination: For each inner node:

    • Check each of 16 potential children

    • If child hash present but node missing, record as missing

    • If child exists and is inner node, descend further

  4. Deferred Reads: Initiate asynchronous fetches for potentially available nodes

  5. Full Below Optimization: Mark nodes as "full below" when all descendants present

Helper: gmn_ProcessNodes Processes one inner node:

  • Iterates all 16 branches

  • Records missing children

  • Tracks pending asynchronous fetches

  • Descends into inner children not yet "full below"

  • Marks node "full below" when all children present

Helper: gmn_ProcessDeferredReads Handles asynchronous fetch results:

  • Processes completed reads

  • Canonicalizes successfully fetched nodes

  • Records failed fetches as missing

  • Prepares state for continued traversal

Output

Returns a vector of (SHAMapNodeID, SHAMapHash) pairs representing missing nodes, prioritized for network retrieval.

Full Below Cache

An optimization for synchronization:

Purpose

  • Tracks subtrees that are completely synchronized

  • Prevents redundant traversal of known-complete subtrees

Mechanism

  • Generation-based tracking: each subtree marked with "full below" generation

  • When a subtree is marked full below, all descendants are known present

  • Synchronization can skip these subtrees

Benefits

  • Reduces CPU overhead during incremental synchronization

  • Minimizes redundant tree walking

  • Accelerates convergence to synchronized state

Node Addition and Canonicalization

Adding the Root: addRootNode

Initializes or verifies the root node:

Algorithm

  1. Check if root already exists with matching hash → return "duplicate"

  2. Validate input data → return "invalid" if malformed

  3. Deserialize node from input

  4. Canonicalize if backed by storage

  5. Set as tree root

  6. Notify filter if provided

  7. Return "useful"

Use Cases

  • Initial tree construction during sync

  • Verification during peer communication

  • Root validation in ledger acceptance

Adding Known Nodes: addKnownNode

Adds interior or leaf nodes during synchronization: Algorithm

  1. Traverse from root toward target node's position

  2. Verify path integrity:

    • Expected branch must exist

    • Branch hash must match provided hash

  3. If branch empty or mismatched → return "invalid"

  4. Deserialize and validate node

  5. Canonicalize if backed

  6. Insert into parent

  7. Notify filter if provided

  8. Return "useful" or "duplicate"

Safety Checks

  • Hash verification ensures node belongs in position

  • Path validation prevents malformed tree construction

  • Canonicalization prevents duplicate storage

Canonicalization

Ensures nodes are unique in memory: Purpose

  • Prevent duplicate nodes with identical hashes

  • Enable safe node sharing through caching

  • Provide thread-safe node insertion

Mechanism

  • Check TreeNodeCache for existing node with same hash

  • If found, use cached instance

  • If not found, insert new node into cache

  • Atomic operation prevents races

Benefits

  • Memory efficiency: identical nodes stored once

  • Thread safety: cache handles concurrent insertion

  • Fast equality checks: compare pointers instead of content

Cryptographic Proofs

Proof Generation: getProofPath

Creates a Merkle proof for a specific key: Algorithm

  1. Initialize empty proof path

  2. Walk from root to target leaf, guided by key

  3. At each step:

    • Serialize current node

    • Add serialized node to proof path

    • Descend to next node following key

  4. Proof path contains all nodes from root to leaf

Output Returns optional<vector>:

  • Each Blob is a serialized node

  • Ordered from root to leaf

  • Contains all information needed for verification

Use Cases

  • Light client verification

  • Cross-ledger proofs

  • State attestation

Proof Verification: verifyProofPath

Verifies a proof for a key and expected root hash: Algorithm

  1. Start from leaf end of proof path

  2. Deserialize leaf node, verify it contains the key

  3. Compute leaf hash

  4. For each inner node moving toward root:

    • Deserialize node

    • Extract branch corresponding to key at this depth

    • Verify computed hash matches branch hash in inner node

    • Compute inner node's hash

    • Update expected hash for next level

  5. Verify final computed hash matches expected root hash

Validation

  • Path must lead directly from root to leaf

  • All hashes must chain correctly

  • Leaf must contain the claimed key

Result Returns boolean:

  • true: Proof is valid, key exists with verified path to root

  • false: Proof is invalid or key not present

Proof Properties

Compactness

  • Proof size is O(log N) where N is tree size

  • Typically only 5-10 nodes for large trees

Trustlessness

  • Verifier needs only the root hash

  • No need to trust the proof provider

  • Cryptographic guarantee of validity

Applications

  • Light clients can verify state without full ledger

  • Cross-chain bridges can prove ledger state

  • Auditors can verify specific transactions

Serialization

Wire Serialization

Nodes must be serialized for network transmission and storage: Root Serialization: serializeRoot

  • Serializes the root node

  • Typically an inner node

  • Uses compressed or full format based on branch count

Node Serialization: getNodeFat Retrieves and serializes a node with optional descendants: Parameters

  • nodeID: Target node to serialize

  • depth: How many levels of descendants to include

  • fatLeaves: Whether to include leaf content

Algorithm

  1. Descend to requested node

  2. Serialize the node itself

  3. If depth > 0, recursively serialize children

  4. If fatLeaves, include full leaf content

  5. Return serialized data

Use Cases

  • Efficient bulk data transfer

  • Subtree synchronization

  • Bandwidth optimization by including anticipated child nodes

Serialization Formats

Leaf Node Format

  • Type prefix (distinguishes account state, tx, tx+meta)

  • SHAMapItem key

  • SHAMapItem data payload

  • No children or branches

Inner Node Format - Compressed

  • Header indicating compressed format

  • Bitmap showing which branches exist

  • For each existing branch:

    • Branch index

    • Child hash

  • Omits empty branches

Inner Node Format - Full

  • Header indicating full format

  • All 16 branches in order

  • For each branch:

    • Child hash (or empty marker)

  • No bitmap needed

Format Selection

  • Compressed: Few branches occupied, saves space

  • Full: Many branches occupied, simpler structure

  • Decision made automatically during serialization

Storage and Caching

Storage Abstraction: NodeStore

The NodeStore (detailed in a separate section) provides: Interface

  • Abstract key-value store for nodes

  • Keyed by node hash

  • Persistent backend (database, filesystem, etc.)

Responsibilities

  • Durable storage of nodes

  • Retrieval by hash

  • Lifecycle management (rotation, deletion)

Integration with SHAMap

  • SHAMap requests nodes by hash

  • NodeStore fetches from persistent storage

  • Fetched nodes cached in TreeNodeCache

TreeNodeCache

In-memory cache of immutable nodes: Purpose

  • Accelerate access to frequently used nodes

  • Reduce database queries

  • Enable node sharing across SHAMaps

Structure

  • Key: SHAMapHash

  • Value: shared_ptr to SHAMapTreeNode

  • Thread-safe concurrent access

Properties

  • Only immutable nodes (cowid = 0) are cached

  • Cache eviction based on access patterns

  • Canonicalization ensures uniqueness

Cache Benefits

  • Reduced I/O for hot nodes (root, high-level inner nodes)

  • Memory efficiency through sharing

  • Fast lookups for synchronization

FullBelowCache

Tracks synchronization state: Purpose

  • Records which subtrees are fully synchronized

  • Prevents redundant checks during sync

Mechanism

  • Generation-based tracking

  • Invalidated when new synchronization begins

Impact

  • Dramatically reduces work during incremental sync

  • Enables efficient resumption after interruption

Family and NodeFamily

Family Interface

  • Abstract interface for SHAMap resource management

  • Provides access to:

    • TreeNodeCache

    • FullBelowCache

    • NodeStore backend

NodeFamily

  • Concrete implementation

  • Manages shared resources across multiple SHAMaps

  • Coordinates cache eviction policies

Benefits

  • Centralized resource management

  • Shared caches across related SHAMaps

  • Simplified lifecycle management

Thread Safety

Concurrency Mechanisms

SHAMap employs multiple strategies for thread safety: Canonicalization Lock

  • Ensures only one thread inserts a node with a given hash

  • Prevents race conditions during cache insertion

  • Atomic cache operations guarantee uniqueness

SHAMapInnerNode Synchronization

  • Uses atomic operations for child pointer access

  • Lock field (std::atomicstd::uint16_t) protects concurrent modifications

  • Enables safe concurrent reads

Immutable Node Sharing

  • Immutable nodes (cowid = 0) are inherently thread-safe

  • Multiple threads can read simultaneously

  • No synchronization needed for reads

Copy-on-Write Protection

  • Mutable operations clone nodes before modification

  • Cloning prevents interference between SHAMap instances

  • Each mutable SHAMap operates on exclusive copies

Cache Thread Safety

TreeNodeCache

  • Designed for concurrent access

  • Internal synchronization primitives

  • Safe for multi-threaded lookups and insertions

FullBelowCache

  • Thread-safe read/write operations

  • Generation-based invalidation is atomic

Safe Patterns

Read Operations

  • Safe on immutable SHAMaps from any thread

  • No locking required for traversal or lookup

Write Operations

  • Mutable SHAMaps should be accessed from single thread

  • Or externally synchronized if multi-threaded access needed

Synchronization Operations

  • getMissingNodes uses async I/O safely

  • Deferred reads processed without blocking other operations

Advanced Topics

State Reconstruction

The transaction tree ensures verifiable state reconstruction: Problem

  • Without transaction history, multiple different sequences could produce same final state

  • Blockchain requires unique, verifiable history

Solution

  • Transaction tree preserves complete state transition chain

  • Current state sₙ = f(sₙ₋₁, txₙ) where f is deterministic

  • Full history ensures unique path: s₀ → s₁ → ... → sₙ

Verification

  • Given state tree root and transaction tree root at ledger L

  • Can verify all transactions that produced state

  • Ensures blockchain integrity

Performance Characteristics

Lookup Complexity

  • O(k) where k is key length in nibbles (typically 64 for 256-bit keys)

  • O(log₁₆ N) in practice, very shallow trees

Insertion/Deletion

  • O(k) path traversal

  • Plus O(h) hash updates where h is path height

  • Copy-on-write adds constant overhead

Synchronization

  • O(d) where d is number of differences

  • Hash comparison eliminates need to check identical subtrees

  • Parallel traversal improves wall-clock time

Space Complexity

  • O(N) for N data items

  • Inner node overhead minimal due to branching factor

  • Sharing reduces memory for snapshots

Design Tradeoffs

Branching Factor

  • Radix 16 chosen for balance:

  • Wide enough to keep tree shallow

  • Narrow enough for manageable child overhead

  • Aligns with hexadecimal for natural key encoding

Immutable Trimming Limitation

  • Immutable SHAMaps cannot evict nodes

  • Tradeoff: simplicity and safety vs memory flexibility

  • Solution: use mutable copies for long-lived operations requiring memory management

Hash Algorithm

  • SHA-512Half (first 256 bits of SHA-512)

  • Balance between security and performance

  • Enables 256-bit node identifiers

Implementation Reference

The complete implementation details, including class hierarchies, method signatures, and code examples, are available in the accompanying detailed documentation. Key source files include:

  • Node types: SHAMapTreeNode.h, SHAMapInnerNode.h, SHAMapLeafNode.h

  • Core SHAMap: SHAMap.h, detail/SHAMap.cpp

  • Synchronization: detail/SHAMapSync.cpp

  • Traversal: detail/SHAMapDelta.cpp

  • Caching: TreeNodeCache.h, Family.h, NodeFamily.h

  • Storage: SHAMapStoreImp.h, SHAMapStoreImp.cpp

Summary

The SHAMap is a sophisticated data structure that combines Merkle tree cryptographic properties with Patricia trie efficiency. Its design enables:

  • Efficient State Management: O(1) subtree comparison through hash-based verification

  • Cryptographic Integrity: Merkle proofs provide trustless verification of data inclusion

  • Optimized Synchronization: Hash comparison and parallel traversal enable rapid convergence

  • Memory Efficiency: Copy-on-write and node sharing minimize duplication

  • Thread Safety: Canonicalization and immutable nodes support concurrent access

  • Scalability: Integration with caching and persistent storage handles large datasets

These properties make SHAMap ideal for blockchain state management, where data integrity, efficient synchronization, and verifiable state transitions are paramount.

Last updated