Trees, Hashing, and Cryptographic Commitments

← Back to SHAMap and NodeStore: Data Persistence and State Management

Introduction

Before diving into the implementation of SHAMap and NodeStore, we need to understand the mathematical foundations they're built on. This chapter covers:

Cryptographic hashing and why it creates mathematical guarantees
Tree structures and why they matter for efficiency
How Merkle trees combine both to create verifiable commitments to data
Patricia tries and why they're optimal for key-value storage

Don't skip this chapter thinking it's just theory. Every design decision in SHAMap flows directly from the properties of these data structures. Understanding the foundations will make the implementation clear.

Cryptographic Hash Functions

A cryptographic hash function H has three critical properties:

1. Determinism

H(x) always returns the same result
H("hello") = 0x2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
H("hello") = 0x2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Same input, always same output. This enables verification: others can compute the same hash and confirm correctness.

2. Collision Resistance

Finding two different inputs with the same hash output is computationally infeasible:

H(x) = H(y) where x ≠ y  → "collision"

For SHA-256: Finding a collision requires ~2^128 operations
Current computers: ~10^18 operations/second
Time required: 10^20 seconds ≈ 3 billion years

Cryptographic security: "practical impossibility"

3. Avalanche Effect

Tiny changes produce completely different outputs:

H("hello") = 0x2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
H("hallo") = 0xd3751713b7ac3d1ab39b1f7b1859d6e7baeac08b8cdb1c8fefd1f96a3f3c17f8

Change one character → hash completely changes

Implication for XRPL:

Hash output acts as a "fingerprint" of data:

Two accounts with identical state have identical hashes
Even one byte difference produces completely different hashes
Anyone can verify a claimed hash by recomputing it
Cannot forge a hash without recomputing it (collision-resistant)

XRPL uses SHA-512Half: First 256 bits of SHA-512. This gives:

256-bit output (32 bytes, fits in a uint256)
Security level of ~128 bits (same as AES-128)
Good performance for cryptographic verification

Tree Structures

Trees are recursive data structures: a root node with zero or more children, each of which is a tree.

         Root
        /  |  \
       /   |   \
      A    B    C
     / \   |   / \
    D   E  F  G   H

Key Tree Property: Logarithmic Depth

For a balanced tree with branching factor B and N items:

Depth = log_B(N)
Binary tree (B=2): 1M items → depth 20
Radix-16 tree (B=16): 1M items → depth 5
Radix-256 tree (B=256): 1M items → depth 3

Why Depth Matters:

Access time: proportional to depth
Update time: change leaf → rehash path from leaf to root (depth operations)
Synchronization: identify differences by comparing hashes (depth comparisons)

Higher branching factor = shallower tree = faster operations.

The Balance Problem:

But trees can become unbalanced:

Sequential insertion:

Balanced (good):      Unbalanced (bad - linked list):
      4                        1
     / \                        \
    2   6                        2
   /|   |\                        \
  1 3  5 7                         3
                                    \
                                     4

For blockchain state, accounts are inserted in unpredictable order. Without careful tree structure, you get unbalanced trees and logarithmic operations degrade.

Patricia Tries (Radix Tries)

Patricia tries solve the balance problem by using the key itself as navigation.

Basic Idea:

For a 256-bit key (like an account address), use the bits to guide traversal:

Radix-2 (binary): Each bit (0 or 1) determines left or right child
Radix-4: Each 2-bit pair determines one of 4 children
Radix-16 (hex): Each 4-bit nibble determines one of 16 children
Radix-256: Each 8-bit byte determines one of 256 children

Example: Radix-16 Patricia Trie

Account1 hash: 0x3A7F2E1B...
Account2 hash: 0x7B4C9D3A...

Account1: Navigate using hex digits
  Level 0: Digit 0 (3) → go to child 3
  Level 1: Digit 1 (A) → go to child A
  Level 2: Digit 2 (7) → go to child 7
  ...
  Eventually reaches leaf containing Account1

Account2: Navigate using hex digits
  Level 0: Digit 0 (7) → go to child 7
  Level 1: Digit 1 (B) → go to child B
  ...
  Eventually reaches leaf containing Account2

Balance Guarantee:

Since navigation is determined by the key:

All paths from root to any leaf have the same structure
Tree depth depends only on key length (256 bits / 4 bits = 64 levels)
Perfect balance for any set of keys

Space Trade-off:

Inner nodes have up to 16 pointers (for radix-16), but most sparse:

Many branches are empty (no accounts in that range)
Compressed representation avoids wasting space
Net result: similar space as binary tree despite higher branching

XRPL's Choice: Radix-16

// From rippled source: 16 possible children per level
static const int NUM_BRANCHES = 16;

// Each level represents 4 bits (one hex digit) of the key
for (int i = 0; i < keyLengthInBits; i += 4) {
    int branch = (key >> (keyLengthInBits - i - 4)) & 0x0F;
    // Navigate to child at position 'branch'
}

Why radix-16?

Shallow trees: 256-bit keys → 64 levels (log_16 of ledger size)
Manageable fanout: 16 children per node (balanced tree complexity)
Natural alignment: hex notation matches code
Proven in Bitcoin and Ethereum: both use similar approaches

Merkle Trees

A Merkle tree combines tree structure with cryptographic hashing:

Definition:

Leaf nodes: Contain data (accounts, transactions)
Inner nodes: Contain hashes of their children
Root hash: Represents the entire tree

Key Property: Hash Propagation

When a leaf changes, its hash changes. This affects its parent:

Step 1: Account changes
  Old hash: H("Alice: 100 XRP")
  New hash: H("Alice: 90 XRP")

Step 2: Parent recomputes its hash
  Old hash: H(H(Account) || H(Other_Accounts))
  New hash: H(H(NewAccount) || H(Other_Accounts))

Step 3: Grandparent recomputes
  ... and so on up the tree

Root hash changes

Benefit: O(1) Subtree Comparison

Compare two trees:

Tree A:                Tree B:
Root: 0xABCD          Root: 0xXYZW

0xABCD == 0xXYZW?
  YES:  Entire trees are identical
  NO:   At least one leaf differs

Without Merkle trees:

Compare all N leaves directly: O(N) time

With Merkle trees:

Compare root hashes: O(1) time
If different, compare child hashes to find divergence: O(log N) comparisons

Benefit: Logarithmic Proofs

Prove that a specific item is in the tree:

Merkle Proof for "Alice: 90 XRP":

Show:  H("Alice: 90 XRP") = 0x1234...
       H(Sibling1) = 0x5678...
       H(Sibling2) = 0x9ABC...

Verifier computes:
  Combine with siblings working up to root
  Verify final hash matches claimed root

Only need to show O(log N) nodes, not entire tree

XRPL's Merkle-Patricia Trie

XRPL's SHAMap combines both approaches:

Patricia Trie Structure:

Navigation determined by account key (256-bit hash)
Radix-16 branching (hex digit at each level)
Perfect balance regardless of account insertion order

Merkle Properties:

Each node contains hash of its content
Inner node hash computed from children's hashes
Root hash represents entire ledger state
Changes propagate up to root

The Result:

                      Root Hash (e.g., 0xABCD...)
                     /               |              \
              Hash(0x0...)      Hash(0x1...)    Hash(0x2...)
              /        |                              \
        Hash(0x00)  Hash(0x01)                   Hash(0x20)
        /     \       /    |                         /    \
      Data  Data    Data  Data                    Data   Data
     (acct) (acct)

Properties:

Leaf depth: exactly log_16(number of accounts) ≈ 5 levels for 1M accounts
Hash changes: only O(log N) nodes affected by any modification
Verification: O(1) root hash comparison, O(log N) proof verification
Synchronization: O(log N) hash comparisons to find differences

Hashing Algorithms in XRPL

XRPL uses SHA-512Half for all hashing:

// From rippled source
uint256 hashFunc(Blob const& data) {
    using hasher = SHA512Half;  // SHA-512, keep first 256 bits
    return hasher()(data);
}

Why SHA-512Half instead of SHA-256?

SHA-512 has better performance on 64-bit CPUs
Taking first 256 bits gives same security as SHA-256
But faster on modern hardware
256-bit output fits uint256 perfectly

Hash Computation in SHAMap:

Inner nodes hash their children's hashes:

uint256 computeHash(SHAMapInnerNode& node) {
    Blob data;
    for (int i = 0; i < 16; ++i) {
        if (node.hasChild(i)) {
            uint256 childHash = node.getChildHash(i);
            data.append(childHash);  // Concatenate child hashes
        }
    }
    return SHA512Half(data);
}

Leaf nodes hash their content with a type prefix:

uint256 computeHash(SHAMapLeafNode& leaf, Type type) {
    Blob data;
    data.push_back(type);  // Type prefix prevents hash collision
    data.append(leaf.getKey());
    data.append(leaf.getData());
    return SHA512Half(data);
}

Why Type Prefixes?

Prevent collisions between different types of data:

Account with data: 0xAA || 0xBBBB
Leaf type 1, data 0xAA, item 0xBBBB

Transaction with data: 0xAABB || 0xBB
Leaf type 2, data 0xAABB, item 0xBB

These have same content but different meaning!
With type prefix:
  H(1 || 0xAA || 0xBBBB) ≠ H(2 || 0xAABB || 0xBB)

Immutability and Copy-on-Write

Once a Merkle tree root hash is committed to (published in a ledger), that entire tree must be immutable:

Changing any leaf would change the root hash
Root hash commitment would become invalid
Trust in the ledger is broken

Solution: Snapshots

For each ledger version, create a snapshot:

Current ledger state (in-memory, mutable):
  SHAMap with live nodes, modified as transactions arrive

Ledger closes:
  Create snapshot of current SHAMap
  This becomes immutable historical ledger
  All nodes locked (cannot be modified)

Next ledger:
  Start new mutable SHAMap (sharing unchanged nodes)
  Apply next block of transactions
  Close creates new immutable snapshot

Copy-on-Write:

When a mutable SHAMap modifies a shared node:

Historical Ledger 1:     Historical Ledger 2:
  Node A → Hash(old)       Node A → Hash(old)

Transaction modifies Node A:
  Can't modify Node A in place (Ledger 1 depends on it)
  Create copy: Node A' → Hash(new)
  Ledger 2 points to Node A'
  Ledger 1 still points to Node A

Both ledgers are verified by their root hashes, but they share most of the tree (unchanged nodes).

Summary

Key Concepts:

Cryptographic hashing: Creates fingerprints of data that are collision-resistant and deterministic
Trees: Enable hierarchical organization with logarithmic depth and operations
Patricia tries: Use key structure to guarantee balance regardless of insertion order
Merkle trees: Combine hashing with tree structure to create verifiable commitments
XRPL's design: Radix-16 Patricia trie with Merkle hashing creates:
- Perfect balance for any set of accounts
- O(log N) operations for updates and proofs
- O(1) tree comparison through root hash
- Safe snapshots through copy-on-write

Why This Matters:

Every algorithm in SHAMap—navigation, hashing, synchronization—follows directly from these properties. Understanding the math explains why the code is structured the way it is.

In the next chapter, we'll see how these principles are implemented in rippled's actual C++ code.

PreviousFoundational Concepts NextSHAMap Architecture and Node Hierarchy

Last updated 7 days ago