Traversal, Iteration, and Synchronization


Introduction

With navigation and hashing understood, we now explore how SHAMap enables efficient tree traversal and network synchronization. These operations are critical for:

  • New nodes catching up to the network

  • Peers verifying they have identical state

  • Extracting state information for queries

  • Detecting and resolving ledger divergences

Traversal Algorithms

SHAMap provides multiple traversal strategies depending on the use case:

Depth-First Traversal: visitNodes

The visitNodes method provides complete tree traversal:

void SHAMap::visitNodes(
    std::function<void(SHAMapTreeNode*)> callback)
{
    std::stack<std::shared_ptr<SHAMapTreeNode>> toVisit;
    if (mRoot)
        toVisit.push(mRoot);

    while (!toVisit.empty()) {
        auto node = toVisit.top();
        toVisit.pop();

        callback(node.get());

        // Push an inner node's children in reverse order
        // so the stack visits branch 0 first
        if (auto inner = std::dynamic_pointer_cast<SHAMapInnerNode>(node)) {
            for (int i = 15; i >= 0; --i) {
                if (auto child = inner->getChild(i)) {
                    toVisit.push(child);
                }
            }
        }
    }
}

Use Cases:

  • Tree validation (verify all hashes)

  • Bulk operations on all nodes

  • Custom tree analysis

Leaf-Only Traversal: visitLeaves

void SHAMap::visitLeaves(
    std::function<void(SHAMapItem const&)> callback)
{
    visitNodes([&callback](SHAMapTreeNode* node) {
        if (auto leaf = dynamic_cast<SHAMapLeafNode*>(node)) {
            callback(leaf->getItem());
        }
    });
}

Iterator-Based Traversal

for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
    // Iteration order matches key ordering
}

Parallel Traversal: walkMapParallel

For performance-critical operations:

void SHAMap::walkMapParallel(
    std::function<void(SHAMapTreeNode*)> callback,
    int numThreads)
{
    // Divide tree into subtrees
    // Process subtrees concurrently
    // Aggregate results
}
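The implementation details are elided above. As a rough, standalone sketch of the same idea — walking the root's 16 branches on separate threads — the following uses a toy `Node` type (hypothetical, not rippled's):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Toy stand-in for a 16-ary tree node (illustration only)
struct Node {
    std::array<std::unique_ptr<Node>, 16> children;
};

// Sequential depth-first count of one subtree
static std::size_t countSubtree(Node const* n)
{
    if (!n)
        return 0;
    std::size_t total = 1;
    for (auto const& c : n->children)
        total += countSubtree(c.get());
    return total;
}

// Walk each non-empty root branch on its own thread, then aggregate
std::size_t countParallel(Node const& root)
{
    std::atomic<std::size_t> total{1};  // count the root itself
    std::vector<std::thread> workers;
    for (int branch = 0; branch < 16; ++branch) {
        if (Node const* child = root.children[branch].get()) {
            workers.emplace_back([child, &total] {
                total += countSubtree(child);  // atomic aggregate
            });
        }
    }
    for (auto& t : workers)
        t.join();
    return total;
}
```

Partitioning at the root works well because the 16 subtrees are independent: no locking is needed during the walk, only an atomic aggregate at the end.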

Use Cases:

  • Missing node detection at scale

  • High-throughput synchronization

Missing Node Detection

The core synchronization primitive identifies nodes needed for complete tree reconstruction:

Algorithm: getMissingNodes

std::vector<std::pair<SHAMapNodeID, uint256>>
SHAMap::getMissingNodes(
    std::function<bool(uint256 const&)> nodeAvailable)
{
    std::vector<std::pair<SHAMapNodeID, uint256>> missing;

    std::stack<SHAMapNodeID> toVisit;
    toVisit.push(SHAMapNodeID(0));  // Root node

    while (!toVisit.empty() && missing.size() < MAX_RESULTS) {
        SHAMapNodeID nodeID = toVisit.top();
        toVisit.pop();

        // Only nodes we already hold locally can be expanded
        auto node = getNode(nodeID);
        if (!node) {
            continue;
        }

        // For inner nodes, examine all 16 branches
        if (auto inner = dynamic_cast<SHAMapInnerNode*>(node.get())) {
            for (int branch = 0; branch < 16; ++branch) {
                uint256 childHash = inner->getChildHash(branch);
                if (childHash.isNonZero()) {
                    // A child exists on this branch
                    SHAMapNodeID childID = nodeID.getChildNodeID(branch);
                    if (nodeAvailable(childHash)) {
                        // Child is present: descend to check its descendants
                        toVisit.push(childID);
                    } else {
                        // Child is missing: report it for network retrieval
                        missing.push_back({childID, childHash});
                    }
                }
            }
        }
    }

    return missing;
}

Output:

Returns vector of (NodeID, Hash) pairs representing missing nodes, prioritized for network retrieval.

Full Below Optimization

An optimization preventing redundant traversal:

class SHAMapInnerNode {
    // Generation counter: when this subtree was marked "complete"
    std::uint32_t mFullBelow = 0;
};

if (node->mFullBelow == currentGeneration) {
    // Entire subtree known complete
    // Skip traversal
    continue;
}

When a subtree is verified complete (all descendants present), skip traversing it again until a new sync starts.
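A minimal, self-contained sketch of this generation trick (type and method names are hypothetical, not rippled's actual FullBelowCache API):

```cpp
#include <cstdint>

// Sketch of the generation-counter gate described above
struct InnerNode {
    std::uint32_t fullBelowGen = 0;  // generation when subtree was verified complete
};

class FullBelowGate {
    std::uint32_t currentGen_ = 1;

public:
    // True only if the subtree was marked complete during the current sync
    bool isFullBelow(InnerNode const& n) const
    {
        return n.fullBelowGen == currentGen_;
    }

    // Called once every descendant of n is known to be present
    void markFullBelow(InnerNode& n)
    {
        n.fullBelowGen = currentGen_;
    }

    // Starting a new sync invalidates every prior mark in O(1):
    // stale marks simply stop matching; no per-node clearing pass is needed
    void startNewSync()
    {
        ++currentGen_;
    }
};
```

Bumping the generation is what makes invalidation free: a single integer increment retires every "complete" mark in the tree at once.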

Node Addition and Canonicalization

Adding the Root Node: addRootNode

Initializes or verifies the root node:

SHAMapAddNode SHAMap::addRootNode(
    uint256 const& hash,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Check if root already exists with matching hash
    if (mRoot && mRoot->getHash() == hash) {
        return SHAMapAddNode::duplicate();
    }

    // Validate and deserialize
    auto node = deserializeNode(nodeData, SHAMapNodeID(0));
    if (!node) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize: ensure uniqueness in cache, keeping the canonical instance
    node = canonicalizeNode(node);

    // Set as root
    mRoot = std::dynamic_pointer_cast<SHAMapInnerNode>(node);

    if (filter) {
        filter->foundNode(hash);
    }

    return SHAMapAddNode::useful();
}

Adding Known Nodes: addKnownNode

Adds interior or leaf nodes during synchronization:

SHAMapAddNode SHAMap::addKnownNode(
    SHAMapNodeID const& nodeID,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Deserialize node
    auto newNode = deserializeNode(nodeData, nodeID);
    if (!newNode) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize (prevent duplicate nodes in memory),
    // keeping the canonical instance
    newNode = canonicalizeNode(newNode);

    // Navigate from root to the parent inner node
    auto parent = std::dynamic_pointer_cast<SHAMapInnerNode>(
        getNode(nodeID.getParentNodeID()));
    if (!parent) {
        return SHAMapAddNode::invalid();
    }

    // Verify the hash matches before insertion
    int branch = nodeID.getBranch();
    if (parent->getChildHash(branch) != newNode->getHash()) {
        return SHAMapAddNode::invalid();
    }

    // Insert into tree
    parent->setChild(branch, newNode);

    if (filter) {
        filter->foundNode(newNode->getHash());
    }

    return SHAMapAddNode::useful();
}

Canonicalization

Purpose:

Ensure nodes are unique in memory (one NodeObject per hash):

std::shared_ptr<SHAMapTreeNode>
SHAMap::canonicalizeNode(std::shared_ptr<SHAMapTreeNode> node)
{
    uint256 hash = node->getHash();

    // Check cache for existing node
    auto cached = mNodeCache->get(hash);
    if (cached) {
        return cached;  // Use cached instance
    }

    // New node - insert into cache
    mNodeCache->insert(hash, node);
    return node;
}

Benefits:

  1. Memory Efficiency: Identical nodes stored once

  2. Thread Safety: Cache handles concurrent insertion atomically

  3. Fast Equality: Compare pointers instead of content

  4. Shared Trees: Multiple SHAMaps can share nodes
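The pointer-identity benefit can be seen in a toy cache (a deliberately simplified stand-in; rippled's actual cache types are more involved):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Toy canonicalization cache: at most one shared instance per hash
class CanonCache {
    std::unordered_map<std::string, std::shared_ptr<std::string const>> map_;

public:
    // Return the canonical instance for this hash, inserting if new
    std::shared_ptr<std::string const>
    canonicalize(std::string const& hash,
                 std::shared_ptr<std::string const> node)
    {
        // try_emplace keeps the existing entry if the hash is already cached,
        // so a freshly deserialized duplicate is simply discarded
        auto [it, inserted] = map_.try_emplace(hash, std::move(node));
        return it->second;
    }
};
```

Two independent deserializations of the same bytes collapse to one instance, so "same node" checks reduce to a pointer comparison.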

Synchronization Scenario

The Complete Flow:

1. Syncing node receives ledger header with:
   - Ledger sequence number
   - Account state tree root hash
   - Transaction tree root hash

2. Request missing nodes from peers:
   - Start with root hashes
   - Call getMissingNodes()
   - Get list of missing (NodeID, Hash) pairs

3. Fetch missing nodes from network:
   - Request nodes in parallel
   - Asynchronously add them with addKnownNode()

4. Repeat until complete:
   - Call getMissingNodes() again
   - As new nodes are added, fewer appear missing
   - Eventually: getMissingNodes() returns empty list

5. Verification:
   - Recompute root hashes from all nodes
   - Verify they match received values
   - Ledger is now complete and verified

6. Persist:
   - Nodes are stored in NodeStore
   - SHAMap is marked immutable
   - Ledger is now available locally
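Steps 2–4 above can be modeled with a toy fetch loop: hashes are strings and a "node" is just its list of child hashes. This is purely illustrative, not SHAMap's API:

```cpp
#include <map>
#include <string>
#include <vector>

using Hash = std::string;
using Node = std::vector<Hash>;  // child hashes (empty for a leaf)

// Repeatedly request nodes we reference but do not yet hold,
// starting from the root hash in the ledger header
std::map<Hash, Node>
sync(Hash const& rootHash, std::map<Hash, Node> const& network)
{
    std::map<Hash, Node> local;
    std::vector<Hash> toFetch{rootHash};

    while (!toFetch.empty()) {
        Hash h = toFetch.back();
        toFetch.pop_back();
        if (local.count(h))
            continue;  // already have it

        Node n = network.at(h);  // stands in for a peer request
        for (auto const& child : n)
            if (!local.count(child))
                toFetch.push_back(child);  // newly discovered missing node

        local.emplace(h, std::move(n));
    }
    return local;  // loop ends exactly when nothing is missing
}
```

The loop terminates when the frontier of missing nodes is empty — the same condition under which getMissingNodes() returns an empty list.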

Performance Metrics:

Small ledger (100k accounts):
  Nodes in tree: ~20,000
  Node fetches: ~20,000 (batching reduces the number of network requests)
  Time to sync: seconds to minutes

Large ledger (10M accounts):
  Nodes in tree: ~2,000,000
  Network requests: ~100,000 (batched and parallel fetches)
  Time to sync: hours to days

State Reconstruction Guarantee

The transaction tree, persisted in NodeStore, ensures unique history:

Problem:

Without transaction history, many different transaction sequences could produce the same state:

State A --tx1--> State B
       \--tx2--/

Multiple paths, same end state
Which actually happened?

Solution:

The transaction tree proves the exact sequence:

Ledger header contains:
  - Account state tree root (current balances)
  - Transaction tree root (complete history)

Given state tree root + transaction tree root:
  Can verify exact sequence that produced this state
  No ambiguity about what happened

NodeStore's Role:

Both tree nodes are persisted:

  • Query state tree to find current values

  • Query transaction tree to find history

  • Together: complete, verifiable record

Summary

Key Algorithms:

  1. Traversal: visitNodes, visitLeaves, iterators

  2. Missing Detection: getMissingNodes with Full Below cache

  3. Node Addition: addRootNode, addKnownNode with canonicalization

  4. Synchronization: Complete flow from root to leaves

  5. Verification: Hash chain validation

Performance Properties:

Tree navigation:        O(log N) worst case, O(1) typical (cached)
Missing detection:      O(log N) comparisons per divergent leaf
Complete sync:          O(N) node transfers for N nodes
Parallel optimization:  N-way parallelism with thread pools

Critical Insight:

Synchronization works because of hash-based tree structure:

  • Compare root hashes: O(1)

  • If they differ: compare at most 16 child hashes per level

  • Recursive descent: O(log N) comparisons to isolate each difference

  • versus comparing all N leaves directly

This logarithmic advantage is what makes blockchain synchronization practical.
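To put numbers on this, a quick back-of-the-envelope calculation (16 child-hash comparisons per level of descent in a 16-ary trie):

```cpp
#include <cmath>
#include <cstddef>

// Approximate hash comparisons needed to locate one differing leaf
// between two 16-ary tries that each hold `leaves` entries
std::size_t syncComparisons(std::size_t leaves)
{
    // Depth of the trie: ceil(log16(leaves))
    auto depth = static_cast<std::size_t>(std::ceil(
        std::log(static_cast<double>(leaves)) / std::log(16.0)));
    return 16 * depth;  // up to 16 comparisons per level
}
```

For 10 million accounts this gives 16 × 6 = 96 hash comparisons to isolate a single difference, versus 10 million leaf comparisons for a naive diff.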
