The NodeStore
Introduction
Now that you understand SHAMap's in-memory structure and efficient algorithms, we turn to the critical question: how do you persist this state?
Without persistence, every validator restart requires replaying transactions from genesis—weeks of computation. With persistence, recovery takes minutes. But persistence introduces challenges:
Scale: Millions of nodes, gigabytes of data
Performance: Database queries are 1000x slower than memory access
Flexibility: Different operators need different storage engines
Reliability: Data must survive crashes without corruption
The NodeStore solves all these problems through elegant abstraction and careful design.
NodeStore's Role in XRPL Architecture
NodeStore sits at a critical junction in XRPL's architecture:
Application Layer
    |
    v
SHAMap (In-Memory State)
    |
    v
NodeStore Interface (Abstraction)
    |
    v
Cache Layer (TaggedCache) -- Hot Data in Memory
    |
    v
Backend Abstraction (Interface) -- Multiple implementations
    |
    v
Database Implementation (RocksDB, NuDB, etc.)
    |
    v
Physical Storage (Disk)
SHAMap's Dependency:
SHAMap needs to retrieve historical nodes:
// During synchronization or historical queries:
std::shared_ptr<SHAMapTreeNode> node = nodestore.fetch(nodeHash);
But SHAMap doesn't know or care about:
How data is stored
Which database backend is used
Where the data is physically located
How caching is implemented
All that complexity is hidden behind NodeStore's interface.
Core Purpose
NodeStore provides four critical services:
1. Persistence
SHAMap state exists in memory
↓
Serialize nodes to disk
↓
Survive application crash
↓
Reconstruct state on startup
2. Consistent Interface
// Application code doesn't change regardless of backend
nodestore.store(node);   // Works with RocksDB, NuDB, SQLite...
auto node = nodestore.fetch(hash);
3. Performance Optimization
Database queries: 1-10 milliseconds
Memory access: 1-10 microseconds
1000x difference!
NodeStore uses caching to keep hot data in memory
Typical hit rate: 90-95%
Example: at a 95% hit rate, average latency ≈ 0.95 × 2 µs + 0.05 × 2 ms ≈ 102 µs, dozens of times faster than hitting the database on every fetch
Result: Average latency far closer to memory speed than disk speed
4. Lifecycle Management
Startup: Locate and open database
Runtime: Store and retrieve nodes as needed
Shutdown: Cleanly close database
Rotation: Enable online deletion and archival
NodeObject: The Fundamental Storage Unit
The atomic unit of storage in XRPL is the NodeObject:
Structure:
class NodeObject {
    // Type of object (LEDGER_HEADER, ACCOUNT_NODE, TRANSACTION_NODE)
    NodeObjectType mType;
    // 256-bit unique identifier
    uint256 mHash;
    // Serialized content (variable length)
    Blob mData;
public:
    // Factory: create NodeObject from components
    static std::shared_ptr<NodeObject> createObject(
        NodeObjectType type,
        Blob const& data,
        uint256 const& hash);
    // Access methods
    NodeObjectType getType() const { return mType; }
    uint256 const& getHash() const { return mHash; }
    Blob const& getData() const { return mData; }
};
Key Characteristics:
Immutable Once Created: Cannot modify data after creation
Hash as Key: Hash uniquely identifies the object
Type Distinguishing: Type prevents hash collisions between different data types
Serialized Format: Data is already in wire format
NodeObject Types:
hotLEDGER
Ledger headers and metadata
1
hotACCOUNT_NODE
Account state tree nodes
3
hotTRANSACTION_NODE
Transaction tree nodes
4
hotUNKNOWN
Unknown/unrecognized types
0
hotDUMMY
Cache marker for missing entries
512
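The values in the table above can be captured as an enumeration. The following is a hypothetical sketch using only the names and numeric values listed here; the actual declaration in rippled may differ in detail:

```cpp
#include <cstdint>

// Sketch of the NodeObjectType enumeration, with names and values
// taken from the table above (the real declaration may differ).
enum NodeObjectType : std::uint32_t {
    hotUNKNOWN = 0,            // Unknown/unrecognized types
    hotLEDGER = 1,             // Ledger headers and metadata
    hotACCOUNT_NODE = 3,       // Account state tree nodes
    hotTRANSACTION_NODE = 4,   // Transaction tree nodes
    hotDUMMY = 512             // Cache marker for missing entries
};
```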
Type Prefix in Hashing:
Type fields prevent collisions:
// Two different types of data might have the same byte structure;
// the type prefix ensures they produce different hashes
uint256 hash_account = SHA512Half(
    ACCOUNT_TYPE_BYTE || accountData);
uint256 hash_transaction = SHA512Half(
    TRANSACTION_TYPE_BYTE || accountData);
// hash_account != hash_transaction
NodeObject Lifecycle
Creation
During transaction processing:
  1. Transaction validated and applied to SHAMap
  2. SHAMap nodes modified
  3. Each modified node serialized
  4. NodeObject created with type, hash, serialized data
5. Stored in NodeStore
Storage
For each NodeObject:
  1. Encode to persistent format
  2. Compute key (same as hash)
  3. Write to database
4. Backend handles actual I/O
Caching
After storage:
  1. Keep in memory for fast reaccess
  2. Move to cache tier
  3. Evict when cache capacity exceeded
4. Dummy objects mark "known missing" (avoid repeated lookups)
Retrieval
When SHAMap needs a node:
  1. Check cache (microseconds)
  2. If miss, query database (milliseconds)
  3. Deserialize and validate
  4. Add to cache
5. Return to SHAMap
Archival
After ledger is validated and no longer current:
  1. May be retained for history
  2. Or moved to archive during rotation
3. Or deleted based on retention policy
Backend Abstraction
The Backend class defines the minimal interface for any storage system:
Core Operations:
class Backend {
    // Store single object
    virtual Status store(NodeObject const& object) = 0;
    // Retrieve single object by hash
    virtual Status fetch(uint256 const& hash,
                        std::shared_ptr<NodeObject>& object) = 0;
    // Persist multiple objects atomically
    virtual Status storeBatch(std::vector<NodeObject> const& batch) = 0;
    // Retrieve multiple objects efficiently
    virtual Status fetchBatch(std::vector<uint256> const& hashes,
                             std::vector<NodeObject>& objects) = 0;
    // Lifecycle
    virtual Status open(std::string const& path) = 0;
    virtual Status close() = 0;
    virtual int fdRequired() const = 0;  // File descriptors needed
};
Status Codes:
enum class Status {
    ok,                 // Operation succeeded
    notFound,          // Key doesn't exist
    dataCorrupt,       // Data integrity check failed (fatal)
    backendError       // Backend error
};
Backend Independence:
NodeStore sits above the backends, so application logic never changes:
// Same code works with any backend
struct DatabaseConfig {
    std::string type;  // "rocksdb", "nudb", "sqlite", etc.
    std::string path;
    // ... backend-specific options
};
auto backend = createBackend(config);
NodeStore store(backend);
// Application uses NodeStore
store.fetch(hash);  // Works regardless of backend
store.store(node);
Supported Backends
RocksDB (Recommended for Most Cases)
Backend* createRocksDBBackend(std::string const& path) {
    return new RocksDBBackend(path);
}
Modern key-value store developed by Facebook
LSM tree (Log-Structured Merge tree) design
Excellent performance for XRPL workloads
Built-in compression support
Active maintenance
Characteristics:
Write throughput: ~10,000-50,000 objects/second
Read throughput: ~100,000+ objects/second
Compression: Reduces disk space by 50-70%
NuDB (High-Throughput Alternative)
Backend* createNuDBBackend(std::string const& path) {
    return new NuDBBackend(path);
}
Purpose-built for XRPL by Ripple
Append-only design optimized for SSD
Higher write throughput than RocksDB
Efficient space utilization
Characteristics:
Write throughput: ~50,000-200,000 objects/second
Read throughput: ~100,000+ objects/second
Better for high-volume systems
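In practice, operators select the backend in the rippled server's configuration file rather than in code. A typical [node_db] stanza might look like the following; the path and tuning values here are illustrative, not recommendations:

```
[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
online_delete=512
advisory_delete=0
```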
Testing Backends
Backend* createMemoryBackend() {
    return new MemoryBackend();  // In-memory, non-persistent
}
Backend* createNullBackend() {
    return new NullBackend();     // No-op backend
}
Data Encoding Format
To enable backend independence, NodeStore uses a standardized encoding:
Encoded Blob Structure:
Byte Offset | Field | Description
0-7         | Reserved | Set to zero, reserved for future use
8           | Type | NodeObjectType enumeration value
9+          | Data | Serialized object payload (variable length)
Encoding Process:
void encodeNodeObject(NodeObject const& obj, Blob& blob) {
    // Start with 8 reserved bytes (zero-filled)
    blob.assign(8, 0);
    // Add type byte
    blob.push_back(static_cast<std::uint8_t>(obj.getType()));
    // Append data payload
    Blob const& data = obj.getData();
    blob.insert(blob.end(), data.begin(), data.end());
}
Decoding Process:
std::shared_ptr<NodeObject> decodeNodeObject(
    uint256 const& hash,
    Blob const& blob)
{
    if (blob.size() < 9) {
        return nullptr;  // Corrupted
    }
    NodeObjectType type = static_cast<NodeObjectType>(blob[8]);
    Blob data(blob.begin() + 9, blob.end());
    return NodeObject::createObject(type, data, hash);
}
Benefits:
Backend Agnostic: Any backend can store/retrieve encoded blobs
Self-Describing: Type embedded, forward-compatible with unknown types
Efficient: Minimal overhead (8 bytes) per object
Validated: Type byte catches most corruption
Database Key as Hash
The database key is the object's hash (not a sequential ID):
Status Backend::store(NodeObject const& obj) {
    uint256 key = obj.getHash();      // 256-bit hash as key
    Blob value = encode(obj);          // Encoded blob as value
    return database.put(key, value);   // Key-value store
}
Implications:
Direct Retrieval: Any node retrievable by hash
Deduplication: Identical content produces identical hash → same key
Immutability: Hash never changes for given data
Verification: Can verify data by recomputing hash
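The deduplication and verification properties can be demonstrated with a toy content-addressed store. In this sketch a simple 64-bit FNV-1a hash stands in for SHA-512Half, and all names (toyHash, ToyNodeStore) are illustrative, not rippled APIs:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for SHA-512Half: 64-bit FNV-1a over the bytes.
std::uint64_t toyHash(std::string const& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Minimal content-addressed store: the key IS the hash of the value.
class ToyNodeStore {
    std::map<std::uint64_t, std::string> db_;
public:
    std::uint64_t store(std::string const& data) {
        std::uint64_t key = toyHash(data);
        db_[key] = data;  // identical content -> identical key (dedup)
        return key;
    }
    bool fetch(std::uint64_t key, std::string& out) const {
        auto it = db_.find(key);
        if (it == db_.end())
            return false;
        // Verify integrity by recomputing the hash of what we read.
        if (toyHash(it->second) != key)
            return false;  // corruption detected
        out = it->second;
        return true;
    }
};
```

Storing the same bytes twice yields the same key (deduplication for free), and any silent corruption of the stored value is caught at fetch time by recomputing the hash.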
Integration Architecture
NodeStore integrates with SHAMap through the Family pattern:
// Family provides NodeStore access to SHAMap
class Family {
    virtual std::shared_ptr<NodeStore> getNodeStore() = 0;
    virtual std::shared_ptr<TreeNodeCache> getTreeNodeCache() = 0;
    virtual std::shared_ptr<FullBelowCache> getFullBelowCache() = 0;
};
class NodeFamily : public Family {
    std::shared_ptr<NodeStore> mNodeStore;
    std::shared_ptr<TreeNodeCache> mTreeCache;
    std::shared_ptr<FullBelowCache> mFullBelow;
    // ... implement Family interface
};
// SHAMap uses Family for storage access
class SHAMap {
    std::shared_ptr<Family> mFamily;
    std::shared_ptr<SHAMapTreeNode> getNode(uint256 const& hash) {
        // Try cache first
        auto cached = mFamily->getTreeNodeCache()->get(hash);
        if (cached) return cached;
        // Fetch from NodeStore
        auto obj = mFamily->getNodeStore()->fetch(hash);
        if (obj) {
            auto node = deserializeNode(obj);
            // Cache for future access
            mFamily->getTreeNodeCache()->insert(hash, node);
            return node;
        }
        return nullptr;
    }
};
Summary
Key Architectural Elements:
NodeObject: Atomic storage unit (type, hash, data)
Backend Interface: Minimal, consistent interface for storage
Abstraction: Decouples application logic from storage implementation
Encoding Format: Standardized format enables backend independence
Key as Hash: Direct retrieval without index lookups
Family Pattern: Provides access to caching and storage
Design Properties:
Backend Flexibility: Switch storage engines without code changes
Scale: Handles millions of objects efficiently
Persistence: Survives crashes and restarts
Verification: Data integrity through hashing
Simplicity: Minimal interface hides complexity
In the next chapter, we'll explore the critical Cache Layer that makes NodeStore practical for high-performance systems.