The NodeStore


Introduction

Now that you understand SHAMap's in-memory structure and efficient algorithms, we turn to the critical question: how do you persist this state?

Without persistence, every validator restart requires replaying transactions from genesis—weeks of computation. With persistence, recovery takes minutes. But persistence introduces challenges:

  • Scale: Millions of nodes, gigabytes of data

  • Performance: Database queries are roughly 1000x slower than memory access

  • Flexibility: Different operators need different storage engines

  • Reliability: Data must survive crashes without corruption

The NodeStore solves all these problems through elegant abstraction and careful design.

NodeStore's Role in XRPL Architecture

NodeStore sits at a critical junction in XRPL's architecture:

Application Layer
    |
    v
SHAMap (In-Memory State)
    |
    v
NodeStore Interface (Abstraction)
    |
    v
Cache Layer (TaggedCache) -- Hot Data in Memory
    |
    v
Backend Abstraction (Interface) -- Multiple implementations
    |
    v
Database Implementation (RocksDB, NuDB, etc.)
    |
    v
Physical Storage (Disk)

SHAMap's Dependency:

SHAMap needs to retrieve historical nodes:

// During synchronization or historical queries, SHAMap fetches the
// serialized object by hash and deserializes it back into a tree node:
auto obj = nodestore.fetch(nodeHash);
auto node = deserializeNode(obj);  // yields a std::shared_ptr<SHAMapTreeNode>

But SHAMap doesn't know or care about:

  • How data is stored

  • Which database backend is used

  • Where the data is physically located

  • How caching is implemented

All that complexity is hidden behind NodeStore's interface.

Core Purpose

NodeStore provides four critical services:

1. Persistence

SHAMap state exists in memory

Serialize nodes to disk

Survive application crash

Reconstruct state on startup

2. Consistent Interface

// Application code doesn't change regardless of backend
nodestore.store(node);   // Works with RocksDB, NuDB, SQLite...
auto node = nodestore.fetch(hash);

3. Performance Optimization

Database queries: 1-10 milliseconds
Memory access: 1-10 microseconds
1000x difference!

NodeStore uses caching to keep hot data in memory
Typical hit rate: 90-95%
Result: Average latency near memory speed
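
For intuition, a back-of-envelope calculation with illustrative numbers: at a 95% hit rate, a 5 microsecond cache hit, and a 5 millisecond database read, the expected latency is 0.95 × 5 µs + 0.05 × 5,000 µs ≈ 255 µs, roughly 20x better than going to the database on every read.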

4. Lifecycle Management

Startup: Locate and open database
Runtime: Store and retrieve nodes as needed
Shutdown: Cleanly close database
Rotation: Enable online deletion and archival

NodeObject: The Fundamental Storage Unit

The atomic unit of storage in XRPL is the NodeObject:

Structure:

class NodeObject {
    // Type of object (e.g. hotLEDGER, hotACCOUNT_NODE, hotTRANSACTION_NODE)
    NodeObjectType mType;

    // 256-bit unique identifier
    uint256 mHash;

    // Serialized content (variable length)
    Blob mData;

public:
    // Factory: create NodeObject from components
    static std::shared_ptr<NodeObject> createObject(
        NodeObjectType type,
        Blob const& data,
        uint256 const& hash);

    // Access methods
    NodeObjectType getType() const { return mType; }
    uint256 const& getHash() const { return mHash; }
    Blob const& getData() const { return mData; }
};

Key Characteristics:

  1. Immutable Once Created: Cannot modify data after creation

  2. Hash as Key: Hash uniquely identifies the object

  3. Type Disambiguation: The type prefix prevents hash collisions between different kinds of data

  4. Serialized Format: Data is already in wire format
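
A minimal usage sketch of the factory above (the payload and hash here are placeholders; real hashes are computed with SHA-512Half over the type-prefixed data, and computeHash is a hypothetical helper):

// Create an immutable NodeObject and read it back through its accessors
Blob payload{0x12, 0x34, 0x56};        // stand-in for serialized node data
uint256 hash = computeHash(payload);   // hypothetical helper

auto obj = NodeObject::createObject(hotACCOUNT_NODE, payload, hash);

// Immutable from here on: only const accessors are exposed
assert(obj->getType() == hotACCOUNT_NODE);
assert(obj->getData() == payload);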

NodeObject Types:

Type                | Purpose                           | Numeric Value
hotLEDGER           | Ledger headers and metadata       | 1
hotACCOUNT_NODE     | Account state tree nodes          | 3
hotTRANSACTION_NODE | Transaction tree nodes            | 4
hotUNKNOWN          | Unknown/unrecognized types        | 0
hotDUMMY            | Cache marker for missing entries  | 512
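
Expressed in code, the enumeration looks like this (a sketch consistent with the table above):

enum NodeObjectType : std::uint32_t {
    hotUNKNOWN = 0,           // unrecognized or legacy data
    hotLEDGER = 1,            // ledger headers and metadata
    hotACCOUNT_NODE = 3,      // account state tree nodes
    hotTRANSACTION_NODE = 4,  // transaction tree nodes
    hotDUMMY = 512            // cache-only marker for missing entries
};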

Type Prefix in Hashing:

Type fields prevent collisions:

// Two objects may carry byte-identical payloads; hashing a type-specific
// prefix together with the payload (|| denotes concatenation) keeps
// their hashes distinct.

uint256 hash_account = SHA512Half(
    ACCOUNT_TYPE_BYTE || accountData);

uint256 hash_transaction = SHA512Half(
    TRANSACTION_TYPE_BYTE || accountData);

// hash_account != hash_transaction

NodeObject Lifecycle

Creation

During transaction processing:
  1. Transaction validated and applied to SHAMap
  2. SHAMap nodes modified
  3. Each modified node serialized
  4. NodeObject created with type, hash, serialized data
  5. Stored in NodeStore

Storage

For each NodeObject (sketched in code after this list):
  1. Encode to persistent format
  2. Compute key (same as hash)
  3. Write to database
  4. Backend handles actual I/O
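
A sketch of those four steps, reusing the encodeNodeObject helper defined later in this chapter and the same illustrative key-value put call used under "Database Key as Hash" (Database here is an assumed handle, not a real rippled type):

// Sketch: the four storage steps for one NodeObject
void persist(Database& db, NodeObject const& obj) {
    Blob value;
    encodeNodeObject(obj, value);   // 1. encode to persistent format
    uint256 key = obj.getHash();    // 2. the key is simply the hash
    db.put(key, value);             // 3-4. write; the backend handles I/O
}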

Caching

After storage:
  1. Keep in memory for fast reaccess
  2. Move to cache tier
  3. Evict when cache capacity exceeded
  4. Dummy objects mark "known missing" entries to avoid repeated lookups (sketched below)
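
That dummy-object trick is plain negative caching; a sketch, assuming a TaggedCache-like container (Cache) exposing get and insert:

std::shared_ptr<NodeObject> fetchCached(
    Cache& cache, Backend& backend, uint256 const& hash)
{
    if (auto cached = cache.get(hash)) {
        // A dummy entry records that the backend lacks this key
        return cached->getType() == hotDUMMY ? nullptr : cached;
    }

    std::shared_ptr<NodeObject> obj;
    if (backend.fetch(hash, obj) == Status::ok) {
        cache.insert(hash, obj);
        return obj;
    }

    // Remember the miss so we never query the database for this hash again
    cache.insert(hash, NodeObject::createObject(hotDUMMY, Blob{}, hash));
    return nullptr;
}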

Retrieval

When SHAMap needs a node:
  1. Check cache (microseconds)
  2. If miss, query database (milliseconds)
  3. Deserialize and validate
  4. Add to cache
  5. Return to SHAMap

Archival

After ledger is validated and no longer current:
  1. May be retained for history
  2. Or moved to archive during rotation
  3. Or deleted based on retention policy

Backend Abstraction

The Backend class defines the minimal interface for any storage system:

Core Operations:

class Backend {
public:
    virtual ~Backend() = default;

    // Store single object
    virtual Status store(NodeObject const& object) = 0;

    // Retrieve single object by hash
    virtual Status fetch(uint256 const& hash,
                        std::shared_ptr<NodeObject>& object) = 0;

    // Persist multiple objects atomically
    virtual Status storeBatch(std::vector<NodeObject> const& batch) = 0;

    // Retrieve multiple objects efficiently
    virtual Status fetchBatch(std::vector<uint256> const& hashes,
                             std::vector<NodeObject>& objects) = 0;

    // Lifecycle
    virtual Status open(std::string const& path) = 0;
    virtual Status close() = 0;
    virtual int fdRequired() const = 0;  // File descriptors needed
};

Status Codes:

enum class Status {
    ok,                 // Operation succeeded
    notFound,          // Key doesn't exist
    dataCorrupt,       // Data integrity check failed (fatal)
    backendError       // Backend error
};
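
Callers branch on the returned Status; a sketch of the fetch path (process and retryOrFailover are hypothetical):

std::shared_ptr<NodeObject> obj;
switch (backend.fetch(hash, obj)) {
    case Status::ok:
        process(obj);
        break;
    case Status::notFound:
        // Normal during sync: the node may be available from a peer instead
        break;
    case Status::dataCorrupt:
        // Fatal: an integrity check failed; the store cannot be trusted
        throw std::runtime_error("NodeStore corruption detected");
    case Status::backendError:
        retryOrFailover();
        break;
}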

Backend Independence:

NodeStore sits above the backends, so application logic never changes:

// Same code works with any backend

struct DatabaseConfig {
    std::string type;  // "rocksdb", "nudb", "sqlite", etc.
    std::string path;
    // ... backend-specific options
};

auto backend = createBackend(config);
NodeStore store(backend);

// Application uses NodeStore
store.fetch(hash);  // Works regardless of backend
store.store(node);
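
In a running rippled server, the same selection happens through the configuration file rather than code; an illustrative [node_db] stanza (exact option names beyond type and path vary by backend):

[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb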

Supported Backends

RocksDB (Recommended for Most Cases)

std::unique_ptr<Backend> createRocksDBBackend(std::string const& path) {
    return std::make_unique<RocksDBBackend>(path);
}
  • Modern key-value store developed by Facebook

  • LSM tree (Log-Structured Merge tree) design

  • Excellent performance for XRPL workloads

  • Built-in compression support

  • Active maintenance

Characteristics:

  • Write throughput: ~10,000-50,000 objects/second

  • Read throughput: ~100,000+ objects/second

  • Compression: Reduces disk space by 50-70%

NuDB (High-Throughput Alternative)

std::unique_ptr<Backend> createNuDBBackend(std::string const& path) {
    return std::make_unique<NuDBBackend>(path);
}
  • Purpose-built for XRPL by Ripple

  • Append-only design optimized for SSD

  • Higher write throughput than RocksDB

  • Efficient space utilization

Characteristics:

  • Write throughput: ~50,000-200,000 objects/second

  • Read throughput: ~100,000+ objects/second

  • Better for high-volume systems

Testing Backends

std::unique_ptr<Backend> createMemoryBackend() {
    return std::make_unique<MemoryBackend>();  // In-memory, non-persistent
}

std::unique_ptr<Backend> createNullBackend() {
    return std::make_unique<NullBackend>();    // No-op backend
}

Data Encoding Format

To enable backend independence, NodeStore uses a standardized encoding:

Encoded Blob Structure:

Byte Offset | Field | Description
0-7         | Reserved | Set to zero, reserved for future use
8           | Type | NodeObjectType enumeration value
9+          | Data | Serialized object payload (variable length)

Encoding Process:

void encodeNodeObject(NodeObject const& obj, Blob& blob) {
    // Bytes 0-7: reserved, zero-filled
    blob.assign(8, 0);

    // Byte 8: type (hotDUMMY is a cache-only marker and is never encoded)
    blob.push_back(static_cast<std::uint8_t>(obj.getType()));

    // Bytes 9+: serialized payload
    Blob const& data = obj.getData();
    blob.insert(blob.end(), data.begin(), data.end());
}

Decoding Process:

std::shared_ptr<NodeObject> decodeNodeObject(
    uint256 const& hash,
    Blob const& blob)
{
    // Need the 8 reserved bytes plus the type byte, at minimum
    if (blob.size() < 9) {
        return nullptr;  // Corrupted or truncated
    }

    // Byte 8 carries the type; bytes 0-7 are reserved
    NodeObjectType type = static_cast<NodeObjectType>(blob[8]);

    // Bytes 9+ are the payload
    Blob data(blob.begin() + 9, blob.end());

    return NodeObject::createObject(type, data, hash);
}
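
A quick round-trip check of the two helpers above (a sketch; uses assert for brevity):

// Encode an object, decode the blob, and confirm the fields survive
void roundTrip(NodeObject const& original) {
    Blob blob;
    encodeNodeObject(original, blob);

    auto restored = decodeNodeObject(original.getHash(), blob);
    assert(restored != nullptr);
    assert(restored->getType() == original.getType());
    assert(restored->getData() == original.getData());
}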

Benefits:

  1. Backend Agnostic: Any backend can store/retrieve encoded blobs

  2. Self-Describing: Type embedded, forward-compatible with unknown types

  3. Efficient: Minimal overhead (8 bytes) per object

  4. Validated: An unrecognized type byte flags corruption early; full integrity checking comes from the hash itself

Database Key as Hash

The database key is the object's hash (not a sequential ID):

Status Backend::store(NodeObject const& obj) {
    uint256 key = obj.getHash();      // 256-bit hash as key
    Blob value = encode(obj);          // Encoded blob as value

    return database.put(key, value);   // Key-value store
}

Implications:

  1. Direct Retrieval: Any node retrievable by hash

  2. Deduplication: Identical content produces identical hash → same key

  3. Immutability: Hash never changes for given data

  4. Verification: Data can be verified by recomputing its hash (see the sketch below)
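
Since the key is derived from the content, a reader can detect silent corruption without extra checksums; a sketch, assuming a sha512Half helper that hashes the type-prefixed payload:

// Verify a fetched object against its own key
bool verify(NodeObject const& obj) {
    uint256 recomputed = sha512Half(obj.getType(), obj.getData());
    return recomputed == obj.getHash();
}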

Integration Architecture

NodeStore integrates with SHAMap through the Family pattern:

// Family provides NodeStore access to SHAMap
class Family {
public:
    virtual ~Family() = default;

    virtual std::shared_ptr<NodeStore> getNodeStore() = 0;
    virtual std::shared_ptr<TreeNodeCache> getTreeNodeCache() = 0;
    virtual std::shared_ptr<FullBelowCache> getFullBelowCache() = 0;
};

class NodeFamily : public Family {
    std::shared_ptr<NodeStore> mNodeStore;
    std::shared_ptr<TreeNodeCache> mTreeCache;
    std::shared_ptr<FullBelowCache> mFullBelow;

    // ... implement Family interface
};

// SHAMap uses Family for storage access
class SHAMap {
    std::shared_ptr<Family> mFamily;

    std::shared_ptr<SHAMapTreeNode> getNode(uint256 const& hash) {
        // Try cache first
        auto cached = mFamily->getTreeNodeCache()->get(hash);
        if (cached) return cached;

        // Fetch from NodeStore
        auto obj = mFamily->getNodeStore()->fetch(hash);
        if (obj) {
            auto node = deserializeNode(obj);
            // Cache for future access
            mFamily->getTreeNodeCache()->insert(hash, node);
            return node;
        }

        return nullptr;
    }
};

Summary

Key Architectural Elements:

  1. NodeObject: Atomic storage unit (type, hash, data)

  2. Backend Interface: Minimal, consistent interface for storage

  3. Abstraction: Decouples application logic from storage implementation

  4. Encoding Format: Standardized format enables backend independence

  5. Key as Hash: Direct retrieval without index lookups

  6. Family Pattern: Provides access to caching and storage

Design Properties:

  • Backend Flexibility: Switch storage engines without code changes

  • Scale: Handles millions of objects efficiently

  • Persistence: Survives crashes and restarts

  • Verification: Data integrity through hashing

  • Simplicity: Minimal interface hides complexity

In the next chapter, we'll explore the critical Cache Layer that makes NodeStore practical for high-performance systems.
