SHAMap Architecture and Node Hierarchy

← Back to SHAMap and NodeStore: Data Persistence and State Management


Introduction

Now that you understand the mathematical foundations of Merkle-Patricia tries, let's explore how XRPL actually implements them in the SHAMap data structure.

The SHAMap is responsible for:

  • Maintaining account and transaction state in a cryptographically-verified Merkle tree

  • Computing root hashes that represent entire ledger state

  • Enabling efficient navigation through key-based lookups

  • Supporting snapshots and immutability for historical ledgers

  • Providing proof generation for trustless state verification

This chapter covers the architectural overview and node types. Later chapters dive into specific algorithms for navigation, hashing, and synchronization.

Core Design Principles

The SHAMap architecture achieves three critical properties:

1. Cryptographic Integrity

Every change to data propagates through hashes up to the root:

Modify Account0 data:
  Account0 hash changes
  Parent hash changes (incorporates new Account0 hash)
  Grandparent hash changes
  ... up to root

New root hash is deterministically different from old root hash

This ensures that no data can be modified undetected. Changing even one bit in an account's balance changes the root hash.

2. Efficient Navigation

The Patricia trie structure uses account identifiers as navigation guides:

Account hash: 0x3A7F2E1B4C9D...
Tree navigation:
  Level 0: Extract digit 3 → go to child 3
  Level 1: Extract digit A → go to child A
  Level 2: Extract digit 7 → go to child 7
  ... follow 4-bit chunks down the tree

No binary search needed. The path is directly encoded in the key.

3. Optimized Synchronization

Hash-based comparison eliminates unnecessary data transfer:

Peer A claims: "Root is 0xABCD"
Peer B claims: "Root is 0xXYZW"

If 0xABCD == 0xXYZW:
  Both have identical ledger state
  No synchronization needed

If different:
  Compare children's hashes
  Identify exactly which subtrees differ
  Request only those subtrees

This allows new nodes to synchronize full ledgers from peers in minutes.

The SHAMap Instance

A SHAMap instance represents a complete tree of ledger state:

Core Data:

class SHAMap {
    // The root node (always an inner node)
    std::shared_ptr<SHAMapInnerNode> mRoot;

    // Tree state (Immutable, Mutable, or Synching)
    State mState;

    // For mutable trees: unique identifier
    std::uint32_t mCowID;  // Copy-on-write identifier

    // For navigating to nodes
    std::shared_ptr<Family> mFamily;
};

Key Properties:

  • Root Node: Always an inner node, never a leaf (even if only one account, still has inner structure)

  • Depth: Exactly 64 levels (256-bit keys ÷ 4 bits per level)

  • Navigation Determinism: Any key uniquely determines its path from root to leaf

Node Hierarchy

The SHAMap consists of three conceptual layers:

Layer 1: Root

              [Inner Node - Root]
              (hashes all state)
  • Always present

  • Always an inner node

  • Contains the root hash representing entire tree

  • Can have up to 16 children (one for each hex digit 0-F)

Layer 2: Internal Structure

              [Inner Node - Root]
             /    |    \    ...    \
       [Inner]  [Inner] [Inner] ... [Inner]
       / | \    / | \
   [Inner][Inner][Inner]
  • Inner nodes serve as branch points

  • Each can have 0-16 children

  • Store only hashes of children, not actual data

  • No data items in inner nodes

Layer 3: Leaf Nodes

           [Inner Node]
           / | \ ... \
       [Leaf][Leaf][Leaf]
        |     |     |
    Account Account Account
     State    State   State
  • Terminal nodes containing actual data items

  • Types: Account state, Transaction, Transaction+Metadata

  • All leaves in a SHAMap tree are homogeneous (same type)

Inner Nodes

Inner nodes form the branching structure:

Structure:

class SHAMapInnerNode {
    // Up to 16 child slots (indexed 0-F)
    std::array<Branch, 16> mBranches;

    // Cryptographic hash of this node
    uint256 mHash;

    // Bitset: which children exist
    std::uint16_t mChildBits;

    // For copy-on-write: which SHAMap owns this node
    std::uint32_t mCowID;

    // Synchronization optimization: generation marker
    std::uint32_t mFullBelow;
};

// Each branch slot contains:
struct Branch {
    std::shared_ptr<SHAMapTreeNode> mNode;  // nullptr if empty
    uint256 mHash;  // Hash of child (or empty)
};

Key Characteristics:

  1. Do not store data items directly - Only hashing information

  2. Maintain cryptographic commitments through child hashes

  3. Variable occupancy - Not all 16 children present

  4. Support both serialization formats - Compressed and full

Serialization Formats:

Inner nodes support two wire formats:

Compressed Format (used when most slots are empty):

Header: "Inner (compressed)"
Bitmap: Which branches exist
For each existing branch:
  Branch index
  Child hash

Saves space by omitting empty branches.

Full Format (used when most slots are occupied):

Header: "Inner (full)"
For each of 16 branches:
  Child hash (or empty marker)

Simpler structure despite larger size.

Format Selection Algorithm:

XRPL automatically chooses format based on branch count:

Branch count:
  0-8:   Use compressed (saves ~40 bytes)
  9+:    Use full (simpler, fewer bytes overall)

Leaf Nodes

Leaf nodes store the actual blockchain data:

Base Properties:

class SHAMapLeafNode {
    // The data contained in this leaf
    std::shared_ptr<SHAMapItem> mItem;

    // Cryptographic hash
    uint256 mHash;

    // For copy-on-write
    std::uint32_t mCowID;
};

SHAMapItem Structure:

class SHAMapItem {
    // 256-bit unique identifier (determines tree position)
    uint256 mTag;

    // Variable-length data (transaction, account state, etc.)
    Blob mData;

    // Memory management: intrusive reference counting
};

Leaf Node Specializations:

Three distinct leaf node types exist, each with unique hashing:

1. Account State Leaves (hotACCOUNT_NODE)

  • Store account information

  • Include balances, settings, owned objects, trust lines

  • Type prefix during hashing prevents collision with other types

  • Updated when transactions affect accounts

2. Transaction Leaves (hotTRANSACTION_NODE)

  • Store transaction data

  • Do not include execution metadata

  • Immutable once added to ledger

  • Enable verification of transaction history

3. Transaction+Metadata Leaves (hotTRANSACTION_LEDGER_ENTRY)

  • Store transactions with execution metadata

  • Include results (success/failure, ledger entries modified)

  • Complete information for replaying or auditing

  • Support full transaction reconstruction

Why Multiple Types?

// Type prefix prevents collisions

uint256 hash_account_state = SHA512Half(
    PREFIX_ACCOUNT || key || data);

uint256 hash_transaction = SHA512Half(
    PREFIX_TRANSACTION || key || data);

// Even with identical (key, data), hashes differ
hash_account_state != hash_transaction

Ensures that moving data between leaves would be immediately detected as invalid.

SHAMapNodeID: Node Identification

Every node in the tree is uniquely identified by its position:

Components:

class SHAMapNodeID {
    // Distance from root (0 = root)
    std::uint32_t mDepth;

    // Path from root to this node
    // Packed as 4-bit nibbles in a uint256
    uint256 mNodeID;
};

Path Encoding:

The path is encoded as 4-bit chunks (nibbles) in a uint256:

Node at depth 3, path [3, A, 7]:

mNodeID = 0x3A7000...000
          ^^^     (significant nibbles)
            ^^^^^^ (zero padding for remaining levels)

Depth 0 (root):     mNodeID = 0x0000...000
Depth 1:            mNodeID = 0x3000...000
Depth 2:            mNodeID = 0x3A00...000
Depth 3:            mNodeID = 0x3A70...000
...
Depth 64:           mNodeID = complete (all 64 nibbles filled)

Key Operations:

getChildNodeID(branch) - Compute child position:

SHAMapNodeID parent(depth=2, nodeID=0x3A00...);
SHAMapNodeID child = parent.getChildNodeID(7);
// Result: depth=3, nodeID=0x3A70...

selectBranch(nodeID, key) - Determine which branch to follow:

uint256 key = 0x3A7F2E1B4C9D...;
int branch = key.nthNibble(depth);  // Extract nth 4-bit chunk
// For depth=2: branch = 7 (extract bits 8-11)

This deterministic navigation ensures every key has exactly one path through the tree.

State Management

SHAMaps exist in different states reflecting their role in the system:

Immutable State

mState == State::Immutable;
mCowID == 0;
  • Represents a finalized, historical ledger

  • Nodes cannot be modified

  • Multiple readers can access simultaneously

  • Critical limitation: cannot be trimmed (nodes loaded stay in memory)

  • Used for consensus history

Mutable State

mState == State::Modifying;
mCowID != 0;  // Unique identifier for this tree
  • Represents work-in-progress ledger state

  • Nodes can be modified through copy-on-write

  • Safe mutations without affecting other SHAMap instances

  • Single writer (typically), multiple readers possible

  • Used for constructing new ledger

Synching State

mState == State::Synching;
  • Transitional state during network synchronization

  • Allows incremental tree construction

  • Transitions to Immutable or Modifying when complete

  • Used when receiving missing nodes from peers

State Transitions:

        New Tree
          |
          v
    [Synching] <-- Receiving nodes from peers
          |
          +--> [Immutable] <-- Finalized ledger
          |
          +--> [Modifying] <-- New transactions incoming

Copy-on-Write Mechanism

The copy-on-write system enables safe node sharing and snapshots:

Principle:

Nodes are shared between SHAMaps using shared pointers. When a mutable SHAMap needs to modify a shared node, it creates a copy:

// Check if node is owned by this SHAMap
if (node->getCowID() != mCowID) {
    // Node is shared or immutable
    // Create a copy
    auto newNode = node->clone();
    newNode->setCowID(mCowID);  // Mark as owned by this tree
    return newNode;
} else {
    // Node already owned by this tree
    // Safe to modify in place
    return node;
}

Node Sharing Rules:

cowID == 0:           Immutable (shared by all)
cowID == tree.mCowID: Owned by this tree (safe to modify)
cowID != tree.mCowID: Owned by another tree (must copy before modifying)

Benefits:

  1. Efficient Snapshots: New SHAMap instances share unmodified subtrees

    Create snapshot before ledger close:
      New immutable SHAMap shares root and all unchanged nodes
      No deep copy required
  2. Safe Concurrent Access: Modifications don't affect other instances

    SHAMap A modifies Account1:
      Clones nodes along path to Account1
      Leaves rest of tree shared
    SHAMap B (different ledger) unaffected
  3. Memory Efficiency: Identical subtrees stored once

    100 ledgers share 99% of tree structure
    Only 1% of data duplicated for different accounts

Integration with NodeStore

SHAMap is designed for in-memory operation, but nodes can be persisted:

Persistent Nodes:

Some nodes are marked as "backed" - they exist in NodeStore:

// Node came from persistent storage
mNode->setBacked();

// When synchronizing, try to retrieve from storage
std::shared_ptr<SHAMapTreeNode> node = nodestore.fetch(hash);
if (node) {
    // Add to tree
    canonicalizeNode(node);  // Ensure uniqueness
}

Canonicalization:

Ensures nodes are unique in memory:

// Check cache for existing node with same hash
std::shared_ptr<SHAMapTreeNode> cached = cache.get(hash);
if (cached) {
    return cached;  // Use existing node
} else {
    cache.insert(hash, newNode);  // Cache new node
    return newNode;
}

This enables:

  • Safe node sharing across multiple SHAMap instances

  • Fast equality checking (compare pointers, not content)

  • Memory efficiency (identical nodes stored once)

Summary

Key Architectural Elements:

  1. SHAMap Instance: Complete tree representing one ledger version

  2. Inner Nodes: Branching structure with 0-16 children, storing hashes

  3. Leaf Nodes: Terminal nodes containing account or transaction data

  4. SHAMapNodeID: Identifies node position through depth and path

  5. State Management: Immutable (historical), Mutable (work-in-progress), Synching (incomplete)

  6. Copy-on-Write: Enables snapshots and safe sharing

  7. NodeStore Integration: Persistent layer for large datasets

Design Properties:

  • Perfect balance: All leaves at approximately same depth (log_16 of account count)

  • Deterministic navigation: Key uniquely determines path

  • Immutable persistence: Historical ledgers safe from modification

  • Efficient sharing: Snapshots require minimal additional memory

  • Scalable: Handles millions of accounts with logarithmic operations

In the next chapter, we'll explore how nodes are navigated, hashes are computed, and the tree structure is maintained.

Last updated