With navigation and hashing understood, we now explore how SHAMap enables efficient tree traversal and network synchronization. These operations are critical for:
New nodes catching up to the network
Peers verifying they have identical state
Extracting state information for queries
Detecting and resolving ledger divergences
Traversal Algorithms
SHAMap provides multiple traversal strategies depending on the use case:
Depth-First Traversal: visitNodes
The visitNodes method provides complete tree traversal:
void SHAMap::visitNodes(std::function<void(SHAMapTreeNode*)> callback)
{
    std::stack<std::shared_ptr<SHAMapTreeNode>> toVisit;
    toVisit.push(mRoot);
    while (!toVisit.empty()) {
        auto node = toVisit.top();
        toVisit.pop();
        callback(node.get());
        // Process inner node's children
        if (auto inner = std::dynamic_pointer_cast<SHAMapInnerNode>(node)) {
            for (int i = 15; i >= 0; --i) {  // Reverse order so branch 0 pops first
                if (auto child = inner->getChild(i))
                    toVisit.push(child);
            }
        }
    }
}
Use Cases:
Tree validation (verify all hashes)
Bulk operations on all nodes
Custom tree analysis
Leaf-Only Traversal: visitLeaves
The visitLeaves method walks the same tree but invokes its callback only for leaf nodes, which is the natural way to enumerate the items (account entries or transactions) the map stores.
Iterator-Based Traversal
SHAMap also provides begin()/end() iterators that traverse leaves in key order:
for (auto it = shamap.begin(); it != shamap.end(); ++it) {
    // Access SHAMapItem via *it
    // Iteration order matches key ordering
}
Parallel Traversal: walkMapParallel
For performance-critical operations:
void SHAMap::walkMapParallel(
    std::function<void(SHAMapTreeNode*)> callback,
    int numThreads)
{
    // Divide tree into subtrees (e.g., the root's 16 branches)
    // Process subtrees concurrently
    // Aggregate results
}
Use Cases:
Missing node detection at scale
High-throughput synchronization
Missing Node Detection
The core synchronization primitive identifies the nodes needed for complete tree reconstruction.
Algorithm: getMissingNodes
std::vector<std::pair<SHAMapNodeID, uint256>>
SHAMap::getMissingNodes(
    std::function<bool(uint256 const&)> nodeAvailable)
{
    std::vector<std::pair<SHAMapNodeID, uint256>> missing;
    std::stack<SHAMapNodeID> toVisit;
    toVisit.push(SHAMapNodeID(0)); // Root node
    while (!toVisit.empty() && missing.size() < MAX_RESULTS) {
        SHAMapNodeID nodeID = toVisit.top();
        toVisit.pop();
        auto node = getNode(nodeID);
        if (!node)
            continue; // Only nodes held locally are pushed
        // For inner nodes, check each of the 16 children
        if (auto inner = dynamic_cast<SHAMapInnerNode*>(node.get())) {
            for (int branch = 0; branch < 16; ++branch) {
                uint256 childHash = inner->getChildHash(branch);
                if (!childHash.isValid())
                    continue; // Empty branch
                if (!nodeAvailable(childHash)) {
                    // Child is missing: record it for retrieval
                    missing.push_back(
                        {nodeID.getChildNodeID(branch), childHash});
                } else {
                    // Child is present: descend to check its subtree
                    toVisit.push(nodeID.getChildNodeID(branch));
                }
            }
        }
    }
    return missing;
}
Output:
Returns a vector of (NodeID, Hash) pairs representing missing nodes, prioritized for network retrieval.
Full Below Optimization
An optimization preventing redundant traversal: when a subtree is verified complete (all descendants present), it is skipped on later passes until a new sync generation begins.
class SHAMapInnerNode {
    // Generation counter: when this subtree was marked "complete"
    std::uint32_t mFullBelow = 0;
};
During traversal:
if (node->mFullBelow == currentGeneration) {
    // Entire subtree known complete
    // Skip traversal
    continue;
}
Node Addition and Canonicalization
Adding the Root Node: addRootNode
Initializes or verifies the root node:
SHAMapAddNode SHAMap::addRootNode(
    uint256 const& hash,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Check if root already exists with matching hash
    if (mRoot && mRoot->getHash() == hash)
        return SHAMapAddNode::duplicate();
    // Validate and deserialize
    auto node = deserializeNode(nodeData, SHAMapNodeID(0));
    if (!node)
        return SHAMapAddNode::invalid();
    // Canonicalize: ensure uniqueness in cache (use the canonical instance)
    node = canonicalizeNode(node);
    // Set as root
    mRoot = std::dynamic_pointer_cast<SHAMapInnerNode>(node);
    if (filter)
        filter->foundNode(hash);
    return SHAMapAddNode::useful();
}
Adding Known Nodes: addKnownNode
Adds interior or leaf nodes during synchronization:
SHAMapAddNode SHAMap::addKnownNode(
    SHAMapNodeID const& nodeID,
    Blob const& nodeData,
    SHANodeFilter* filter = nullptr)
{
    // Deserialize node
    auto newNode = deserializeNode(nodeData, nodeID);
    if (!newNode)
        return SHAMapAddNode::invalid();
    // Canonicalize (prevent duplicate nodes in memory)
    newNode = canonicalizeNode(newNode);
    // Navigate from root to parent, which must be an inner node
    auto parent = std::dynamic_pointer_cast<SHAMapInnerNode>(
        getNode(nodeID.getParentNodeID()));
    if (!parent)
        return SHAMapAddNode::invalid();
    // Verify hash matches before insertion
    int branch = nodeID.getBranch();
    if (parent->getChildHash(branch) != newNode->getHash())
        return SHAMapAddNode::invalid();
    // Insert into tree
    parent->setChild(branch, newNode);
    if (filter)
        filter->foundNode(newNode->getHash());
    return SHAMapAddNode::useful();
}
Canonicalization
Purpose:
Ensure nodes are unique in memory (one NodeObject per hash):
std::shared_ptr<SHAMapTreeNode>
SHAMap::canonicalizeNode(std::shared_ptr<SHAMapTreeNode> node)
{
uint256 hash = node->getHash();
// Check cache for existing node
auto cached = mNodeCache->get(hash);
if (cached) {
return cached; // Use cached instance
}
// New node - insert into cache
mNodeCache->insert(hash, node);
return node;
}
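The effect of canonicalization can be shown with a toy model. This is a sketch only: ToyNode, nodeCache, and the std::string "hash" stand in for rippled's real node, TaggedCache, and uint256 types. It illustrates the contract above: callers must adopt the returned pointer, which is the single shared instance for that hash.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-ins for illustration only (not rippled's types).
struct ToyNode {
    std::string hash;
    explicit ToyNode(std::string h) : hash(std::move(h)) {}
};

std::map<std::string, std::shared_ptr<ToyNode>> nodeCache;

// Mirrors the canonicalizeNode contract: return the one shared
// instance for this hash, inserting the node if it is new.
std::shared_ptr<ToyNode> canonicalize(std::shared_ptr<ToyNode> node)
{
    auto it = nodeCache.find(node->hash);
    if (it != nodeCache.end())
        return it->second;            // reuse the cached instance
    nodeCache.emplace(node->hash, node);
    return node;                      // first occurrence becomes canonical
}
```

Two independent deserializations of the same hash collapse to one in-memory object, so equal subtrees are shared rather than duplicated.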
The Synchronization Process
1. Syncing node receives ledger header with:
- Ledger sequence number
- Account state tree root hash
- Transaction tree root hash
2. Request missing nodes from peers:
- Start with root hashes
- Call getMissingNodes()
- Get list of missing (NodeID, Hash) pairs
3. Fetch missing nodes from network:
- Request nodes in parallel
- Asynchronously add them with addKnownNode()
4. Repeat until complete:
- Call getMissingNodes() again
- As new nodes are added, fewer appear missing
- Eventually: getMissingNodes() returns empty list
5. Verification:
- Recompute root hashes from all nodes
- Verify they match received values
- Ledger is now complete and verified
6. Persist:
- Nodes are stored in NodeStore
- SHAMap is marked immutable
- Ledger is now available locally
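Steps 2 through 4 above form a fetch loop. The toy model below sketches that loop under heavy simplifying assumptions: node "hashes" are strings, a node merely lists its children's hashes, and SyncNode, Store, missingNodes, and syncFrom are invented names, not rippled's API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A node just lists its children's hashes; leaves list none.
struct SyncNode {
    std::vector<std::string> childHashes;
};

// A store maps node hash -> node (the peer's store is complete).
using Store = std::map<std::string, SyncNode>;

// Step 2: walk what we hold locally and list reachable-but-absent hashes.
std::vector<std::string>
missingNodes(Store const& local, std::string const& rootHash)
{
    std::vector<std::string> missing;
    std::vector<std::string> stack{rootHash};
    while (!stack.empty()) {
        std::string h = stack.back();
        stack.pop_back();
        auto it = local.find(h);
        if (it == local.end()) {
            missing.push_back(h);      // need to fetch this node
            continue;
        }
        for (auto const& c : it->second.childHashes)
            stack.push_back(c);        // descend into held subtrees
    }
    return missing;
}

// Steps 3-4: fetch missing nodes from the peer until none remain.
// Returns the number of request rounds taken.
int syncFrom(Store const& peer, Store& local, std::string const& rootHash)
{
    int rounds = 0;
    for (;;) {
        auto missing = missingNodes(local, rootHash);
        if (missing.empty())
            return rounds;             // tree complete
        for (auto const& h : missing)
            local[h] = peer.at(h);     // addKnownNode stand-in
        ++rounds;
    }
}
```

Note how each round can only discover children one level below what is already held, which is why a real implementation batches requests and walks subtrees in parallel.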
Sync Performance
Small ledger (100k accounts):
- Nodes in tree: ~20,000
- Network requests: ~20,000 (batch fetching reduces round trips)
- Time to sync: seconds to minutes
Large ledger (10M accounts):
- Nodes in tree: ~2,000,000
- Network requests: ~100,000 (with batching and parallel fetches)
- Time to sync: hours to days
Why Both Trees Matter
Two different transactions can move the ledger from the same starting state to the same ending state:
State A --tx1--> State B
State A --tx2--> State B
Multiple paths, same end state: which actually happened?
The ledger header contains both roots:
- Account state tree root (current balances)
- Transaction tree root (complete history)
Given the state tree root plus the transaction tree root, any node can verify the exact transaction sequence that produced this state. There is no ambiguity about what happened.
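The point can be illustrated with a toy, order-sensitive commitment. This is a sketch only: txCommitment is an invented name and std::hash stands in for the real tree hashing, but it shows why two transaction orderings that reach the same end state still produce different transaction-tree roots.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy commitment over a transaction sequence: each transaction is
// chained into the running digest, so the result depends on order.
std::size_t txCommitment(std::vector<std::string> const& txs)
{
    std::size_t h = 0;
    for (auto const& tx : txs)
        h = std::hash<std::string>{}(tx + std::to_string(h));
    return h;
}
```

Because the header commits to this order-sensitive digest alongside the state root, peers can agree not just on the resulting state but on the exact history that produced it.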
Performance Characteristics
Tree navigation: O(log N) worst case, O(1) typical
Missing-node detection: O(log N) comparisons per node
Complete sync: O(N) for N nodes, limited in practice by network bandwidth
Parallel optimization: N-way parallelism with thread pools