# Traversal, Iteration, and Synchronization

[← Back to SHAMap and NodeStore: Data Persistence and State Management](/core-dev-bootcamp/module03.md)

***

### Introduction

With navigation and hashing understood, we now explore how SHAMap enables efficient tree traversal and network synchronization. These operations are critical for:

* New nodes catching up to the network
* Peers verifying they have identical state
* Extracting state information for queries
* Detecting and resolving ledger divergences

### Traversal Algorithms

SHAMap provides multiple traversal strategies depending on the use case:

**Depth-First Traversal: visitNodes**

The `visitNodes` method provides complete tree traversal:

```cpp
void SHAMap::visitNodes(
    std::function<void(SHAMapTreeNode*)> callback)
{
    std::stack<std::shared_ptr<SHAMapTreeNode>> toVisit;
    toVisit.push(mRoot);

    while (!toVisit.empty()) {
        auto node = toVisit.top();
        toVisit.pop();

        callback(node);

        // Process inner node's children
        if (auto inner = std::dynamic_pointer_cast<SHAMapInnerNode>(node)) {
            for (int i = 15; i >= 0; --i) {  // Reverse order for stack
                if (auto child = inner->getChild(i)) {
                    toVisit.push(child);
                }
            }
        }
    }
}
```

**Use Cases:**

* Tree validation (verify all hashes)
* Bulk operations on all nodes
* Custom tree analysis

**Leaf-Only Traversal: visitLeaves**

```cpp
void SHAMap::visitLeaves(
    std::function<void(SHAMapItem const&)> callback)
{
    visitNodes([this, &callback](SHAMapTreeNode* node) {
        if (auto leaf = dynamic_cast<SHAMapLeafNode*>(node)) {
            callback(leaf->getItem());
        }
    });
}
```

**Iterator-Based Traversal**

```cpp
for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
    // Iteration order matches key ordering
}
```

**Parallel Traversal: walkMapParallel**

For performance-critical operations:

```cpp
void SHAMap::walkMapParallel(
    std::function<void(SHAMapTreeNode*)> callback,
    int numThreads)
{
    // Divide tree into subtrees
    // Process subtrees concurrently
    // Aggregate results
}
```

**Use Cases:**

* Missing node detection at scale
* High-throughput synchronization

### Missing Node Detection

The core synchronization primitive identifies nodes needed for complete tree reconstruction:

**Algorithm: getMissingNodes**

```cpp
std::vector<std::pair<SHAMapNodeID, uint256>>
SHAMap::getMissingNodes(
    std::function<bool(uint256 const&)> nodeAvailable)
{
    std::vector<std::pair<SHAMapNodeID, uint256>> missing;

    std::stack<SHAMapNodeID> toVisit;
    toVisit.push(SHAMapNodeID(0));  // Root node

    while (!toVisit.empty() && missing.size() < MAX_RESULTS) {
        SHAMapNodeID nodeID = toVisit.top();
        toVisit.pop();

        // Check if we have this node
        auto node = getNode(nodeID);
        if (!node) {
            missing.push_back({nodeID, getExpectedHash(nodeID)});
            continue;
        }

        // For inner nodes, check children
        if (auto inner = dynamic_cast<SHAMapInnerNode*>(node.get())) {
            for (int branch = 0; branch < 16; ++branch) {
                uint256 childHash = inner->getChildHash(branch);
                if (childHash.isValid()) {
                    // Child should exist
                    if (!nodeAvailable(childHash)) {
                        // Child is missing
                        SHAMapNodeID childID = nodeID.getChildNodeID(branch);
                        toVisit.push(childID);
                    }
                }
            }
        }
    }

    return missing;
}
```

**Output:**

Returns vector of (NodeID, Hash) pairs representing missing nodes, prioritized for network retrieval.

**Full Below Optimization**

An optimization preventing redundant traversal:

```cpp
class SHAMapInnerNode {
    // Generation counter: when this subtree was marked "complete"
    std::uint32_t mFullBelow = 0;
};

if (node->mFullBelow == currentGeneration) {
    // Entire subtree known complete
    // Skip traversal
    continue;
}
```

When a subtree is verified complete (all descendants present), skip traversing it again until a new sync starts.

### Node Addition and Canonicalization

**Adding the Root Node: addRootNode**

Initializes or verifies the root node:

```cpp
SHAMapAddNode SHAMap::addRootNode(
    uint256 const& hash,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Check if root already exists with matching hash
    if (mRoot && mRoot->getHash() == hash) {
        return SHAMapAddNode::duplicate();
    }

    // Validate and deserialize
    auto node = deserializeNode(nodeData, SHAMapNodeID(0));
    if (!node) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize: ensure uniqueness in cache
    canonicalizeNode(node);

    // Set as root
    mRoot = std::dynamic_pointer_cast<SHAMapInnerNode>(node);

    if (filter) {
        filter->foundNode(hash);
    }

    return SHAMapAddNode::useful();
}
```

**Adding Known Nodes: addKnownNode**

Adds interior or leaf nodes during synchronization:

```cpp
SHAMapAddNode SHAMap::addKnownNode(
    SHAMapNodeID const& nodeID,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Deserialize node
    auto newNode = deserializeNode(nodeData, nodeID);
    if (!newNode) {
        return SHAMapAddNode::invalid();
    }

    // Canonicalize (prevent duplicate nodes in memory)
    canonicalizeNode(newNode);

    // Navigate from root to parent
    auto parent = getNode(nodeID.getParentNodeID());
    if (!parent || !parent->isInner()) {
        return SHAMapAddNode::invalid();
    }

    // Verify hash matches before insertion
    int branch = nodeID.getBranch();
    if (parent->getChildHash(branch) != newNode->getHash()) {
        return SHAMapAddNode::invalid();
    }

    // Insert into tree
    auto parentInner = std::dynamic_pointer_cast<SHAMapInnerNode>(parent);
    parentInner->setChild(branch, newNode);

    if (filter) {
        filter->foundNode(newNode->getHash());
    }

    return SHAMapAddNode::useful();
}
```

### Canonicalization

**Purpose:**

Ensure nodes are unique in memory (one NodeObject per hash):

```cpp
std::shared_ptr<SHAMapTreeNode>
SHAMap::canonicalizeNode(std::shared_ptr<SHAMapTreeNode> node)
{
    uint256 hash = node->getHash();

    // Check cache for existing node
    auto cached = mNodeCache->get(hash);
    if (cached) {
        return cached;  // Use cached instance
    }

    // New node - insert into cache
    mNodeCache->insert(hash, node);
    return node;
}
```

**Benefits:**

1. **Memory Efficiency**: Identical nodes stored once
2. **Thread Safety**: Cache handles concurrent insertion atomically
3. **Fast Equality**: Compare pointers instead of content
4. **Shared Trees**: Multiple SHAMaps can share nodes

### Synchronization Scenario

**The Complete Flow:**

```
1. Syncing node receives ledger header with:
   - Ledger sequence number
   - Account state tree root hash
   - Transaction tree root hash

2. Request missing nodes from peers:
   - Start with root hashes
   - Call getMissingNodes()
   - Get list of missing (NodeID, Hash) pairs

3. Fetch missing nodes from network:
   - Request nodes in parallel
   - Asynchronously add them with addKnownNode()

4. Repeat until complete:
   - Call getMissingNodes() again
   - As new nodes are added, fewer appear missing
   - Eventually: getMissingNodes() returns empty list

5. Verification:
   - Recompute root hashes from all nodes
   - Verify they match received values
   - Ledger is now complete and verified

6. Persist:
   - Nodes are stored in NodeStore
   - SHAMap is marked immutable
   - Ledger is now available locally
```

**Performance Metrics:**

```
Small ledger (100k accounts):
  Nodes in tree: ~20,000
  Network requests: ~20,000 (batch fetch reduces count)
  Time to sync: seconds to minutes

Large ledger (10M accounts):
  Nodes in tree: ~2,000,000
  Network requests: ~100,000 (batching and parallel)
  Time to sync: hours to days
```

### State Reconstruction Guarantee

The transaction tree, persisted in NodeStore, ensures unique history:

**Problem:**

Without transaction history, many sequences could produce same state:

```
State A --tx1--> State B
       \--tx2--/

Multiple paths, same end state
Which actually happened?
```

**Solution:**

The transaction tree proves the exact sequence:

```
Ledger header contains:
  - Account state tree root (current balances)
  - Transaction tree root (complete history)

Given state tree root + transaction tree root:
  Can verify exact sequence that produced this state
  No ambiguity about what happened
```

**NodeStore's Role:**

Both tree nodes are persisted:

* Query state tree to find current values
* Query transaction tree to find history
* Together: complete, verifiable record

### Summary

**Key Algorithms:**

1. **Traversal**: visitNodes, visitLeaves, iterators
2. **Missing Detection**: getMissingNodes with Full Below cache
3. **Node Addition**: addRootNode, addKnownNode with canonicalization
4. **Synchronization**: Complete flow from root to leaves
5. **Verification**: Hash chain validation

**Performance Properties:**

```
Tree navigation:        O(log N) worst case, O(1) typical
Missing detection:      O(log N) comparisons
Complete sync:          O(N) for N nodes, ~O(log N) * bandwidth
Parallel optimization:  N-way parallelism with thread pools
```

**Critical Insight:**

Synchronization works because of hash-based tree structure:

* Compare root hashes: O(1) comparison
* If differ: compare children: O(16) comparisons max
* Recursive descent: O(log N) total comparisons
* vs. comparing all N leaves directly

This logarithmic advantage is what makes blockchain synchronization practical.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.xrpl-commons.org/core-dev-bootcamp/module03/shamap-synchronization.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
