Cache Layer and Performance Optimization
Introduction
Database queries are fundamentally slow compared to memory access:
Memory access:     1-10 microseconds
Database query:    1-10 milliseconds
Ratio:             ~1000x slower
With millions of nodes, and transactions processed at ledger-close speed (a new ledger every 3-5 seconds), any system that naively queries the database for every node access will fail to keep up.
The NodeStore's Cache Layer solves this through a multi-tier strategy that keeps hot data in memory while safely delegating cold data to disk.
The Performance Problem
Scenario: Synchronizing from the Network
A new node joins the XRPL network and must catch up to the current ledger. This requires:
Fetching missing ledgers (blocks of transactions)
For each ledger, fetching all state nodes
Verifying each node's hash
Storing nodes to disk
Naive Approach (No Cache):
for (each node in ledger) {
    backend.fetch(nodeHash);  // Database query - 10ms
}
For 10,000 nodes in a ledger:
  10,000 × 10ms = 100 seconds per ledger
For 100,000 ledgers to catch up:
  100 seconds × 100,000 = 10,000,000 seconds ≈ 116 days
Clearly infeasible.
With Caching (90% Hit Rate):
for (each node in ledger) {
    if (cache.has(nodeHash)) {
        cache.get(nodeHash);      // 10 microseconds
    } else {
        backend.fetch(nodeHash);  // 10 milliseconds
    }
}
For 10,000 nodes:
  9,000 × 10µs (cache hits) = 90ms
  1,000 × 10ms (cache misses) = 10,000ms
  Total: 10,090ms ≈ 10 seconds per ledger
For 100,000 ledgers:
  100,000 × 10 seconds = 1,000,000 seconds ≈ 11.5 days
Still slow, but realistic with parallel processing.
The difference between possible and impossible is caching.
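To make this concrete, here is a small back-of-the-envelope model (standalone C++, not rippled code) that computes average lookup latency from the hit rate, using the illustrative 10 µs / 10 ms figures from above:
#include <cstdio>

int main() {
    // Assumed figures from this page: ~10 microseconds per cache hit,
    // ~10 milliseconds (10,000 microseconds) per database query.
    constexpr double cacheHitUs = 10.0;
    constexpr double dbQueryUs = 10000.0;

    for (double hitRate : {0.60, 0.90, 0.95, 0.99}) {
        double avgUs = hitRate * cacheHitUs + (1.0 - hitRate) * dbQueryUs;
        double ledgerSeconds = avgUs * 10000 / 1e6;  // 10,000 node lookups per ledger
        std::printf("hit rate %2.0f%%: %7.1f us per lookup, ~%.1f s per 10,000-node ledger\n",
                    hitRate * 100.0, avgUs, ledgerSeconds);
    }
    return 0;
}
At a 90% hit rate this reproduces the ~10 seconds per ledger worked out above; at 95% the cost roughly halves, which is why hit rate dominates synchronization time.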
TaggedCache Architecture
The NodeStore's primary cache is the TaggedCache:
Purpose:
class TaggedCache {
    // Keep frequently accessed NodeObjects in memory
    // Minimize expensive database queries
    // Provide thread-safe concurrent access
};
Structure:
class TaggedCache {
private:
    // Key: NodeObject hash (uint256)
    // Value: shared_ptr<NodeObject>
    std::unordered_map<uint256, std::shared_ptr<NodeObject>> mCache;
    // Protect concurrent access
    std::mutex mLock;
    // Configuration
    size_t mMaxSize;        // Maximum objects in cache
    std::chrono::seconds mMaxAge;  // Maximum age before eviction
public:
    // Retrieve from cache
    std::shared_ptr<NodeObject> get(uint256 const& hash);
    // Store in cache
    void insert(uint256 const& hash,
               std::shared_ptr<NodeObject> const& obj);
    // Remove from cache
    void remove(uint256 const& hash);
    // Evict old entries
    void evictExpired();
};
Cache Tiers:
NodeStore implements a two-tier caching strategy:
        Application
            ↓
        L1: TaggedCache (In-Memory)
        Fast: 1-10 microseconds
        Size: ~100MB - 5GB (configurable)
            ↓ (on miss)
        L2: Backend Database (Persistent)
        Slow: 1-10 milliseconds
        Size: Unlimited (disk)
            ↓
        Physical Storage (Disk)
Fetch Algorithm:
std::shared_ptr<NodeObject> fetch(uint256 const& hash) {
    // Step 1: Check cache
    {
        std::lock_guard<std::mutex> lock(mCacheLock);
        auto it = mCache.find(hash);
        if (it != mCache.end()) {
            // Found in cache
            recordHit(hash);
            return it->second;
        }
    }
    // Step 2: Cache miss - query backend
    auto obj = backend->fetch(hash);
    if (!obj) {
        // Object not found anywhere
        // Cache a dummy marker to prevent repeated lookups
        cacheDummy(hash);
        return nullptr;
    }
    // Step 3: Update cache with newly fetched object
    {
        std::lock_guard<std::mutex> lock(mCacheLock);
        mCache.insert({hash, obj});
    }
    recordMiss(hash);
    return obj;
}
Cache Insertion Strategy
When Objects Enter Cache:
1. On Database Fetch
   fetch() fails or succeeds → always cache result
   (Success: cache object, Failure: cache dummy)
2. On Storage
   store() called → immediately cache object
   Object is likely to be re-accessed soon (see the sketch after this list)
3. Predictive Loading
   When traversing the SHAMap, prefetch likely-needed siblings
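For the storage path (item 2 above), a minimal sketch of the idea, assuming the simplified TaggedCache and backend store() interfaces used on this page rather than the exact rippled API:
// Sketch: when a NodeObject is written, insert it into the cache immediately,
// because an object stored while building a ledger is very likely to be read again soon.
void storeAndCache(Backend& backend,
                   TaggedCache& cache,
                   uint256 const& hash,
                   std::shared_ptr<NodeObject> const& obj)
{
    backend.store(obj);        // persist to the backend database
    cache.insert(hash, obj);   // warm the cache - no miss on the next fetch
}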
Dummy Objects:
Special marker objects prevent wasted lookups:
class NodeObject {
    static std::shared_ptr<NodeObject> createDummy(uint256 const& hash) {
        auto obj = std::make_shared<NodeObject>();
        obj->mType = hotDUMMY;      // Type 512: marker
        obj->mHash = hash;
        obj->mData.clear();         // Empty data
        return obj;
    }
    bool isDummy() const {
        return mType == hotDUMMY;
    }
};
// In fetch:
std::shared_ptr<NodeObject> backend_result = backend->fetch(hash);
if (!backend_result) {
    // Not found - cache dummy to avoid retry
    auto dummy = NodeObject::createDummy(hash);
    cache.insert(hash, dummy);
    return nullptr;
}
Benefit of Dummies:
Scenario: Syncing but peer doesn't have a node
Without dummies:
  Request node X from network
  Peer: "I don't have it"
  Local check: Try backend
  Backend: "Not there"
  (repeat this sequence multiple times)
With dummies:
  Request node X from network
  Peer: "I don't have it"
  Local check: Cache hit on dummy
  Immediately know: "Not available"
  Don't retry
This prevents a thundering herd of repeated failed lookups.
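As a sketch, the cache-hit path in fetch() can recognize a dummy and report "not found" immediately; this check is not shown in the simplified fetch() earlier, and it relies on the isDummy() helper defined above:
// Sketch: inside fetch(), after looking the hash up in the cache
auto it = mCache.find(hash);
if (it != mCache.end()) {
    if (it->second->isDummy())
        return nullptr;       // already known to be unavailable - no backend query,
                              // no repeated network request
    recordHit(hash);
    return it->second;        // normal cache hit
}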
Cache Eviction
Cache capacity is limited. When full, old objects must be evicted.
Eviction Triggers:
void insertWithEviction(uint256 const& hash,
                       std::shared_ptr<NodeObject> const& obj) {
    {
        std::lock_guard<std::mutex> lock(mCacheLock);
        // Check size limit
        if (mCache.size() >= mMaxSize) {
            evictLRU();  // Remove least recently used
        }
        mCache.insert({hash, obj});
    }
    // Check age limit (periodic)
    if (shouldEvictExpired()) {
        evictExpired();  // Remove objects older than mMaxAge
    }
}
LRU (Least Recently Used) Eviction:
void evictLRU() {
    // Find object with oldest access time
    auto oldest = findOldestAccess();
    // Remove from cache
    mCache.erase(oldest->hash);
}
// Track access times
struct CacheEntry {
    std::shared_ptr<NodeObject> object;
    std::chrono::steady_clock::time_point lastAccess;
};
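findOldestAccess() above implies a linear scan of the whole map. A common alternative, sketched here as an illustration rather than rippled's actual implementation, keeps entries in a recency-ordered list so eviction and "touch on access" are O(1):
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Generic LRU sketch. Key must be usable with std::unordered_map
// (uint256 would need a hash adaptor).
template <class Key, class Value>
class LruCache {
    using List = std::list<std::pair<Key, std::shared_ptr<Value>>>;
    List order_;                                              // front = most recently used
    std::unordered_map<Key, typename List::iterator> index_;  // key -> position in list
    std::size_t maxSize_;

public:
    explicit LruCache(std::size_t maxSize) : maxSize_(maxSize) {}

    void insert(Key const& key, std::shared_ptr<Value> value) {
        if (auto it = index_.find(key); it != index_.end())
            order_.erase(it->second);                 // drop the old position
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
        if (order_.size() > maxSize_) {               // over capacity: evict LRU entry
            index_.erase(order_.back().first);
            order_.pop_back();
        }
    }

    std::shared_ptr<Value> get(Key const& key) {
        auto it = index_.find(key);
        if (it == index_.end())
            return nullptr;
        order_.splice(order_.begin(), order_, it->second);    // mark as recently used
        return it->second->second;
    }
};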
Age-Based Eviction:
void evictExpired() {
    auto now = std::chrono::steady_clock::now();
    for (auto it = mCache.begin(); it != mCache.end();) {
        auto age = now - it->second.lastAccess;
        if (age > mMaxAge) {
            // Object too old - remove
            it = mCache.erase(it);
        } else {
            ++it;
        }
    }
}
Configuration Parameters:
// From rippled.cfg
[node_db]
cache_size = 256        # MB
cache_age = 60          # seconds
Impact of Configuration:
Small cache:
  cache_size = 32 MB
  Hit rate: ~60% (more evictions)
  Disk queries: 40% of lookups (slower)
Large cache:
  cache_size = 1024 MB
  Hit rate: ~95% (fewer evictions)
  Disk queries: 5% of lookups (faster)
  Memory usage: Higher
Operators choose based on available RAM and performance needs.
Cache Performance Metrics
NodeStore tracks cache effectiveness:
struct CacheMetrics {
    uint64_t hits;           // Cache hits
    uint64_t misses;         // Cache misses
    uint64_t inserts;        // Objects added
    uint64_t evictions;      // Objects removed
    double hitRate() const {
        auto total = hits + misses;
        return total ? hits / double(total) : 0.0;
    }
};
// Typical production hit rates:
// Well-configured: 90-95%
// Poorly tuned: 60-75%
// Synchronized node: 98% (accessing recent ledgers)
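A short usage sketch showing where such counters would be bumped and how an operator-facing report might read them; recordHit/recordMiss mirror the calls in the fetch() sketch earlier and are illustrative names, not the exact rippled functions:
#include <cstdio>

CacheMetrics metrics;   // one instance per cache, using the struct defined above

void recordHit(uint256 const&)  { ++metrics.hits;   }
void recordMiss(uint256 const&) { ++metrics.misses; }

void reportCacheHealth() {
    // Rule of thumb from this page: a sustained hit rate below ~90% usually
    // means the cache is undersized for the current working set.
    std::printf("cache hit rate: %.1f%% (%llu hits, %llu misses)\n",
                metrics.hitRate() * 100.0,
                static_cast<unsigned long long>(metrics.hits),
                static_cast<unsigned long long>(metrics.misses));
}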
Example Metrics:
From a running XRPL validator:
Period: 1 hour
Cache hits:     86,400  (hit on hot data)
Cache misses:    7,200  (query database)
Hit rate:       92.3%   (excellent)
Latency impact:
  Average: 0.923 × 1µs + 0.077 × 10ms ≈ 0.77 milliseconds
  Without cache: Average 10 milliseconds
  Speedup: ~13x
Throughput impact:
  Queries handled: 93,600 per hour
  Lookup time: ~72 seconds with the cache vs ~936 seconds if every query hit disk
  Caching is what makes this throughput sustainable
Synchronization Optimization
During network synchronization, special techniques optimize caching:
Prefetching
void synchronizeNode(SHAMapNodeID nodeID,
                    uint256 const& nodeHash) {
    // Fetch this node
    auto node = fetch(nodeHash);
    if (auto inner = dynamic_cast<SHAMapInnerNode*>(node.get())) {
        // This is an inner node
        // Likely next accesses are to its children
        // Prefetch children to warm cache
        for (int i = 0; i < 16; ++i) {
            uint256 childHash = inner->getChildHash(i);
            if (childHash.isNonZero()) {
                // Asynchronously prefetch
                asyncFetch(childHash);
            }
        }
    }
}
Batch Loading
// Fetch multiple nodes in single operation
std::vector<uint256> hashes = {hash1, hash2, hash3, ...};
auto results = backend->fetchBatch(hashes);
// Reduces backend overhead
// Populates cache efficiently
// Parallelizes I/O operations
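A sketch of how batched results could be folded back into the cache; fetchBatch is used as shown above, but its exact signature and the index-aligned result vector assumed here are illustrative rather than the precise rippled interface:
// Sketch: one bulk backend query instead of N single fetches, then cache every
// result. Missing objects get a dummy so they are not repeatedly re-queried.
std::vector<std::shared_ptr<NodeObject>>
loadBatch(Backend& backend,
          TaggedCache& cache,
          std::vector<uint256> const& hashes)
{
    auto results = backend.fetchBatch(hashes);   // assumed to align with 'hashes' by index
    for (std::size_t i = 0; i < hashes.size(); ++i) {
        if (results[i])
            cache.insert(hashes[i], results[i]);                          // warm the cache
        else
            cache.insert(hashes[i], NodeObject::createDummy(hashes[i]));  // known missing
    }
    return results;
}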
Deferred Reads
// During synchronization, identify missing nodes
std::vector<uint256> missing = getMissingNodes(shamap);
// Request from peers asynchronously
// When they arrive, cache them
// Continue traversal without blocking on network
// This allows pipelining: request more nodes while processing previous results
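A minimal sketch of the deferred-read idea using standard futures; real rippled synchronization goes through its own job queue and peer protocol, so requestFromPeers and processNode are placeholders here:
#include <future>
#include <memory>
#include <vector>

void deferredSync(std::vector<uint256> const& missing)
{
    std::vector<std::future<std::shared_ptr<NodeObject>>> pending;
    pending.reserve(missing.size());

    // Kick off all requests up front; they overlap instead of running one at a time.
    for (auto const& hash : missing)
        pending.push_back(std::async(std::launch::async, [hash] {
            return requestFromPeers(hash);   // placeholder for the peer-protocol fetch
        }));

    // Keep traversing and verifying, blocking only when a result is actually needed.
    for (auto& fut : pending) {
        if (auto node = fut.get())
            processNode(node);               // placeholder: verify hash, store, cache
    }
}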
Working Set Management
Different phases of operation have different access patterns:
During Normal Operation (Steady State)
Access pattern: Recent ledgers frequently accessed
  Root hash: checked at consensus
  Recent state nodes: queried for transactions
  Old historical data: rarely accessed
Cache configuration:
  Keep recent ledgers fully cached
  Let old ledgers evict to make room
  Result: Excellent hit rate for current operations
During Synchronization
Access pattern: Missing nodes from network
  Need to verify hash chain from root to leaf
  Often fetching siblings (related nodes)
  May access same node multiple times
Cache strategy:
  Smaller cache acceptable (still beneficial)
  Prefetch siblings when fetching parent
  Use dummy markers to avoid retry storms
  Result: Synchronization completes in hours rather than days
After Sync Completion
Access pattern: Back to steady-state recent ledger access
Cache characteristics:
  Most-accessed nodes pinned in cache
  Hit rate quickly reaches 90%+
  Warm cache from prior work
Thread Safety
Cache must be safe for concurrent access:
class TaggedCache {
    std::unordered_map<uint256, CacheEntry> mCache;
    mutable std::shared_mutex mLock;  // Allow multiple readers
    size_t mMaxSize;                  // Eviction threshold
public:
    std::shared_ptr<NodeObject> get(uint256 const& hash) {
        std::shared_lock<std::shared_mutex> lock(mLock);
        auto it = mCache.find(hash);
        return (it != mCache.end()) ? it->second.object : nullptr;
    }
    void insert(uint256 const& hash,
               std::shared_ptr<NodeObject> const& obj) {
        std::unique_lock<std::shared_mutex> lock(mLock);
        // Check if size exceeded
        if (mCache.size() >= mMaxSize) {
            evictLRU();  // Exclusive lock held, safe to modify
        }
        mCache[hash] = {obj, std::chrono::steady_clock::now()};
    }
};
Concurrency Properties:
Multiple readers:
  Many threads can fetch simultaneously
  No contention for cache hits
  Scaling: hundreds of concurrent fetches possible
Insert/evict operations:
  Exclusive lock for modification
  Short-lived (just map operations)
  Background eviction: doesn't block fetches
Advanced Caching: Full Below Optimization
During synchronization, special tracking prevents redundant work:
The Problem:
Synchronizing large subtree:
  1. Check root: hash differs (need to sync)
  2. Check children: most hashes match, one differs
  3. Recurse into different child
  4. Check its children: all match
  5. Later: receive more nodes
  6. Check same grandparent again
  7. Redundant check (we already knew it was full)
The Solution: Full Below Generation Counter
class SHAMapInnerNode {
    // Generation number marking when this node was verified complete
    std::uint32_t mFullBelow;
};
class FullBelowCache {
    // Current generation
    std::uint32_t mCurrentGeneration = 0;
    bool isKnownFull(SHAMapInnerNode* node) {
        return node->mFullBelow == mCurrentGeneration;
    }
    void markFull(SHAMapInnerNode* node) {
        node->mFullBelow = mCurrentGeneration;
    }
    void invalidate() {
        ++mCurrentGeneration;  // Invalidate all prior markings
    }
};
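A sketch of how a sync pass might consult the generation counter while walking inner nodes; isChildPresent stands in for the actual fetch-and-verify step, and the recursion into child inner nodes is omitted for brevity:
// Sketch: skip subtrees already verified complete in the current generation.
bool ensureSubtreeComplete(SHAMapInnerNode* node, FullBelowCache& fullBelow)
{
    if (fullBelow.isKnownFull(node))
        return true;                         // already verified - skip the whole subtree

    bool complete = true;
    for (int branch = 0; branch < 16; ++branch) {
        if (!isChildPresent(node, branch))   // placeholder: fetch/verify this child
            complete = false;
    }

    if (complete)
        fullBelow.markFull(node);            // remember so later passes skip this node
    return complete;
}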
Benefit:
When synchronizing:
  Fetch subtree, verify all descendants present
  Mark as "full below" with generation ID
  Later sync process checks generation
  If matches current: skip this subtree (known complete)
  If differs: need to re-verify (new sync started)
Result: Avoids re-traversing known-complete subtrees
Significant speedup in incremental sync scenarios
Summary
Key Concepts:
Multi-Tier Caching: Memory + disk strategy balances performance and capacity
LRU Eviction: Keeps frequently-accessed data, evicts cold data
Dummy Markers: Prevent repeated failed lookups
Metrics Tracking: Monitor cache effectiveness
Synchronization Optimization: Prefetch and batch loading
Full Below Cache: Avoid redundant traversal during sync
Thread Safety: Shared locks for multiple readers
Performance Impact:
Without caching:     Synchronization would take months
With 90% hit rate:   Synchronization takes days
With 95%+ hit rate:  Synchronization takes hours
Production systems:  Carefully tuned caches run at 92-96% hit rates
The Cache Layer transforms NodeStore from theoretical to practical. Database queries are unavoidable, but caching hides their latency, allowing XRPL to operate at microsecond efficiency despite millisecond database performance.