Traversal, Iteration, and Synchronization


Introduction

With navigation and hashing understood, we now explore how SHAMap enables efficient tree traversal and network synchronization. These operations are critical for:

  • New nodes catching up to the network

  • Peers verifying they have identical state

  • Extracting state information for queries

  • Detecting and resolving ledger divergences

Traversal Algorithms

SHAMap provides multiple traversal strategies depending on the use case:

Depth-First Traversal: visitNodes

The visitNodes method provides complete tree traversal:

void SHAMap::visitNodes(
    std::function<void(SHAMapTreeNode*)> callback)
{
    std::stack<std::shared_ptr<SHAMapTreeNode>> toVisit;
    if (mRoot)
        toVisit.push(mRoot);

    while (!toVisit.empty()) {
        auto node = toVisit.top();
        toVisit.pop();

        callback(node.get());

        // Push an inner node's children in reverse order
        // so the stack visits branch 0 first
        if (auto inner = std::dynamic_pointer_cast<SHAMapInnerNode>(node)) {
            for (int i = 15; i >= 0; --i) {
                if (auto child = inner->getChild(i)) {
                    toVisit.push(child);
                }
            }
        }
    }
}

Use Cases:

  • Tree validation (verify all hashes)

  • Bulk operations on all nodes

  • Custom tree analysis

Leaf-Only Traversal: visitLeaves

void SHAMap::visitLeaves(
    std::function<void(SHAMapItem const&)> callback)
{
    visitNodes([&callback](SHAMapTreeNode* node) {
        if (auto leaf = dynamic_cast<SHAMapLeafNode*>(node)) {
            callback(leaf->getItem());
        }
    });
}

Iterator-Based Traversal

for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
    // Iteration order matches key ordering
}

Parallel Traversal: walkMapParallel

For performance-critical operations:

void SHAMap::walkMapParallel(
    std::function<void(SHAMapTreeNode*)> callback,
    int numThreads)
{
    // Divide tree into subtrees
    // Process subtrees concurrently
    // Aggregate results
}
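The implementation details are elided above. As a rough, standalone sketch of the same idea — walking the root's 16 branches on separate threads — the following uses a toy `Node` type (hypothetical, not rippled's):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Toy stand-in for a 16-ary tree node (illustration only)
struct Node {
    std::array<std::unique_ptr<Node>, 16> children;
};

// Sequential depth-first count of one subtree
static std::size_t countSubtree(Node const* n)
{
    if (!n)
        return 0;
    std::size_t total = 1;
    for (auto const& c : n->children)
        total += countSubtree(c.get());
    return total;
}

// Walk each non-empty root branch on its own thread, then aggregate
std::size_t countParallel(Node const& root)
{
    std::atomic<std::size_t> total{1};  // count the root itself
    std::vector<std::thread> workers;
    for (int branch = 0; branch < 16; ++branch) {
        if (Node const* child = root.children[branch].get()) {
            workers.emplace_back([child, &total] {
                total += countSubtree(child);  // atomic aggregate
            });
        }
    }
    for (auto& t : workers)
        t.join();
    return total;
}
```

Partitioning at the root works well because the 16 subtrees are independent: no locking is needed during the walk, only an atomic aggregate at the end.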

Use Cases:

  • Missing node detection at scale

  • High-throughput synchronization

Missing Node Detection

The core synchronization primitive identifies nodes needed for complete tree reconstruction:

Algorithm: getMissingNodes

std::vector<std::pair<SHAMapNodeID, uint256>>
SHAMap::getMissingNodes(
    std::function<bool(uint256 const&)> nodeAvailable)
{
    std::vector<std::pair<SHAMapNodeID, uint256>> missing;

    std::stack<SHAMapNodeID> toVisit;
    toVisit.push(SHAMapNodeID(0));  // Root node

    while (!toVisit.empty() && missing.size() < MAX_RESULTS) {
        SHAMapNodeID nodeID = toVisit.top();
        toVisit.pop();

        // Only nodes we already hold locally can be expanded
        auto node = getNode(nodeID);
        if (!node) {
            continue;
        }

        // For inner nodes, examine all 16 branches
        if (auto inner = dynamic_cast<SHAMapInnerNode*>(node.get())) {
            for (int branch = 0; branch < 16; ++branch) {
                uint256 childHash = inner->getChildHash(branch);
                if (childHash.isNonZero()) {
                    // A child exists on this branch
                    SHAMapNodeID childID = nodeID.getChildNodeID(branch);
                    if (nodeAvailable(childHash)) {
                        // Child is present: descend to check its descendants
                        toVisit.push(childID);
                    } else {
                        // Child is missing: report it for network retrieval
                        missing.push_back({childID, childHash});
                    }
                }
            }
        }
    }

    return missing;
}

Output:

Returns vector of (NodeID, Hash) pairs representing missing nodes, prioritized for network retrieval.

Full Below Optimization

An optimization preventing redundant traversal:

class SHAMapInnerNode {
    // Generation counter: when this subtree was marked "complete"
    std::uint32_t mFullBelow = 0;
};

if (node->mFullBelow == currentGeneration) {
    // Entire subtree known complete
    // Skip traversal
    continue;
}

When a subtree is verified complete (all descendants present), skip traversing it again until a new sync starts.
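A minimal, self-contained sketch of this generation trick (type and method names are hypothetical, not rippled's actual FullBelowCache API):

```cpp
#include <cstdint>

// Sketch of the generation-counter gate described above
struct InnerNode {
    std::uint32_t fullBelowGen = 0;  // generation when subtree was verified complete
};

class FullBelowGate {
    std::uint32_t currentGen_ = 1;

public:
    // True only if the subtree was marked complete during the current sync
    bool isFullBelow(InnerNode const& n) const
    {
        return n.fullBelowGen == currentGen_;
    }

    // Called once every descendant of n is known to be present
    void markFullBelow(InnerNode& n)
    {
        n.fullBelowGen = currentGen_;
    }

    // Starting a new sync invalidates every prior mark in O(1):
    // stale marks simply stop matching; no per-node clearing pass is needed
    void startNewSync()
    {
        ++currentGen_;
    }
};
```

Bumping the generation is what makes invalidation free: a single integer increment retires every "complete" mark in the tree at once.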

Node Addition and Canonicalization

Adding the Root Node: addRootNode

Initializes or verifies the root node:

SHAMapAddNode SHAMap::addRootNode(
    uint256 const& hash,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Check if root already exists with matching hash
    if (mRoot && mRoot->getHash() == hash) {
        return SHAMapAddNode::duplicate();
    }

    // Validate and deserialize
    auto node = deserializeNode(nodeData, SHAMapNodeID(0));
    if (!node) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize: ensure uniqueness in cache, keeping the canonical instance
    node = canonicalizeNode(node);

    // Set as root
    mRoot = std::dynamic_pointer_cast<SHAMapInnerNode>(node);

    if (filter) {
        filter->foundNode(hash);
    }

    return SHAMapAddNode::useful();
}

Adding Known Nodes: addKnownNode

Adds interior or leaf nodes during synchronization:

SHAMapAddNode SHAMap::addKnownNode(
    SHAMapNodeID const& nodeID,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Deserialize node
    auto newNode = deserializeNode(nodeData, nodeID);
    if (!newNode) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize (prevent duplicate nodes in memory),
    // keeping the canonical instance
    newNode = canonicalizeNode(newNode);

    // Navigate from root to the parent inner node
    auto parent = std::dynamic_pointer_cast<SHAMapInnerNode>(
        getNode(nodeID.getParentNodeID()));
    if (!parent) {
        return SHAMapAddNode::invalid();
    }

    // Verify the hash matches before insertion
    int branch = nodeID.getBranch();
    if (parent->getChildHash(branch) != newNode->getHash()) {
        return SHAMapAddNode::invalid();
    }

    // Insert into tree
    parent->setChild(branch, newNode);

    if (filter) {
        filter->foundNode(newNode->getHash());
    }

    return SHAMapAddNode::useful();
}

Canonicalization

Purpose:

Ensure nodes are unique in memory (one NodeObject per hash):

std::shared_ptr<SHAMapTreeNode>
SHAMap::canonicalizeNode(std::shared_ptr<SHAMapTreeNode> node)
{
    uint256 hash = node->getHash();

    // Check cache for existing node
    auto cached = mNodeCache->get(hash);
    if (cached) {
        return cached;  // Use cached instance
    }

    // New node - insert into cache
    mNodeCache->insert(hash, node);
    return node;
}

Benefits:

  1. Memory Efficiency: Identical nodes stored once

  2. Thread Safety: Cache handles concurrent insertion atomically

  3. Fast Equality: Compare pointers instead of content

  4. Shared Trees: Multiple SHAMaps can share nodes
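The pointer-identity benefit can be seen in a toy cache (a deliberately simplified stand-in; rippled's actual cache types are more involved):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Toy canonicalization cache: at most one shared instance per hash
class CanonCache {
    std::unordered_map<std::string, std::shared_ptr<std::string const>> map_;

public:
    // Return the canonical instance for this hash, inserting if new
    std::shared_ptr<std::string const>
    canonicalize(std::string const& hash,
                 std::shared_ptr<std::string const> node)
    {
        // try_emplace keeps the existing entry if the hash is already cached,
        // so a freshly deserialized duplicate is simply discarded
        auto [it, inserted] = map_.try_emplace(hash, std::move(node));
        return it->second;
    }
};
```

Two independent deserializations of the same bytes collapse to one instance, so "same node" checks reduce to a pointer comparison.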

Synchronization Scenario

The Complete Flow:

1. Syncing node receives ledger header with:
   - Ledger sequence number
   - Account state tree root hash
   - Transaction tree root hash

2. Request missing nodes from peers:
   - Start with root hashes
   - Call getMissingNodes()
   - Get list of missing (NodeID, Hash) pairs

3. Fetch missing nodes from network:
   - Request nodes in parallel
   - Asynchronously add them with addKnownNode()

4. Repeat until complete:
   - Call getMissingNodes() again
   - As new nodes are added, fewer appear missing
   - Eventually: getMissingNodes() returns empty list

5. Verification:
   - Recompute root hashes from all nodes
   - Verify they match received values
   - Ledger is now complete and verified

6. Persist:
   - Nodes are stored in NodeStore
   - SHAMap is marked immutable
   - Ledger is now available locally
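Steps 2–4 above can be modeled with a toy fetch loop: hashes are strings and a "node" is just its list of child hashes. This is purely illustrative, not SHAMap's API:

```cpp
#include <map>
#include <string>
#include <vector>

using Hash = std::string;
using Node = std::vector<Hash>;  // child hashes (empty for a leaf)

// Repeatedly request nodes we reference but do not yet hold,
// starting from the root hash in the ledger header
std::map<Hash, Node>
sync(Hash const& rootHash, std::map<Hash, Node> const& network)
{
    std::map<Hash, Node> local;
    std::vector<Hash> toFetch{rootHash};

    while (!toFetch.empty()) {
        Hash h = toFetch.back();
        toFetch.pop_back();
        if (local.count(h))
            continue;  // already have it

        Node n = network.at(h);  // stands in for a peer request
        for (auto const& child : n)
            if (!local.count(child))
                toFetch.push_back(child);  // newly discovered missing node

        local.emplace(h, std::move(n));
    }
    return local;  // loop ends exactly when nothing is missing
}
```

The loop terminates when the frontier of missing nodes is empty — the same condition under which getMissingNodes() returns an empty list.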

Performance Metrics:

Small ledger (100k accounts):
  Nodes in tree: ~20,000
  Node fetches: ~20,000 (batching reduces the number of network requests)
  Time to sync: seconds to minutes

Large ledger (10M accounts):
  Nodes in tree: ~2,000,000
  Network requests: ~100,000 (batched and parallel fetches)
  Time to sync: hours to days

State Reconstruction Guarantee

The transaction tree, persisted in NodeStore, ensures unique history:

Problem:

Without transaction history, many different transaction sequences could produce the same state:

State A --tx1--> State B
       \--tx2--/

Multiple paths, same end state
Which actually happened?

Solution:

The transaction tree proves the exact sequence:

Ledger header contains:
  - Account state tree root (current balances)
  - Transaction tree root (complete history)

Given state tree root + transaction tree root:
  Can verify exact sequence that produced this state
  No ambiguity about what happened

NodeStore's Role:

Both tree nodes are persisted:

  • Query state tree to find current values

  • Query transaction tree to find history

  • Together: complete, verifiable record

Summary

Key Algorithms:

  1. Traversal: visitNodes, visitLeaves, iterators

  2. Missing Detection: getMissingNodes with Full Below cache

  3. Node Addition: addRootNode, addKnownNode with canonicalization

  4. Synchronization: Complete flow from root to leaves

  5. Verification: Hash chain validation

Performance Properties:

Tree navigation:        O(log N) worst case, O(1) typical (cached)
Missing detection:      O(log N) comparisons per divergent leaf
Complete sync:          O(N) node transfers for N nodes
Parallel optimization:  N-way parallelism with thread pools

Critical Insight:

Synchronization works because of hash-based tree structure:

  • Compare root hashes: O(1)

  • If they differ: compare at most 16 child hashes per level

  • Recursive descent: O(log N) comparisons to isolate each difference

  • versus comparing all N leaves directly

This logarithmic advantage is what makes blockchain synchronization practical.
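To put numbers on this, a quick back-of-the-envelope calculation (16 child-hash comparisons per level of descent in a 16-ary trie):

```cpp
#include <cmath>
#include <cstddef>

// Approximate hash comparisons needed to locate one differing leaf
// between two 16-ary tries that each hold `leaves` entries
std::size_t syncComparisons(std::size_t leaves)
{
    // Depth of the trie: ceil(log16(leaves))
    auto depth = static_cast<std::size_t>(std::ceil(
        std::log(static_cast<double>(leaves)) / std::log(16.0)));
    return 16 * depth;  // up to 16 comparisons per level
}
```

For 10 million accounts this gives 16 × 6 = 96 hash comparisons to isolate a single difference, versus 10 million leaf comparisons for a naive diff.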
