# The NodeStore


### Introduction

Now that you understand SHAMap's in-memory structure and efficient algorithms, we turn to the critical question: **how do you persist this state?**

Without persistence, every validator restart requires replaying transactions from genesis—weeks of computation. With persistence, recovery takes minutes. But persistence introduces challenges:

* **Scale**: Millions of nodes, gigabytes of data
* **Performance**: Database queries (milliseconds) are roughly 1000x slower than memory access (microseconds)
* **Flexibility**: Different operators need different storage engines
* **Reliability**: Data must survive crashes without corruption

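The NodeStore addresses these challenges with a layered design: a storage interface that is independent of the backend, an in-memory cache for hot data, and a standardized on-disk encoding.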

{% embed url="<https://www.youtube.com/watch?v=eTDSwkokBbo>" %}

### NodeStore's Role in XRPL Architecture

NodeStore sits at a critical junction in XRPL's architecture:

```
Application Layer
    |
    v
SHAMap (In-Memory State)
    |
    v
NodeStore Interface (Abstraction)
    |
    v
Cache Layer (TaggedCache) -- Hot Data in Memory
    |
    v
Backend Abstraction (Interface) -- Multiple implementations
    |
    v
Database Implementation (RocksDB, NuDB, etc.)
    |
    v
Physical Storage (Disk)
```

**SHAMap's Dependency:**

SHAMap needs to retrieve historical nodes:

```cpp
// During synchronization or historical queries:
std::shared_ptr<SHAMapTreeNode> node = nodestore.fetch(nodeHash);
```

But SHAMap doesn't know or care about:

* How data is stored
* Which database backend is used
* Where the data is physically located
* How caching is implemented

All that complexity is hidden behind NodeStore's interface.

### Core Purpose

NodeStore provides four critical services:

**1. Persistence**

```
SHAMap state exists in memory
↓
Serialize nodes to disk
↓
Survive application crash
↓
Reconstruct state on startup
```

**2. Consistent Interface**

```cpp
// Application code doesn't change regardless of backend
nodestore.store(node);   // Works with RocksDB, NuDB, or a test backend
auto node = nodestore.fetch(hash);
```

**3. Performance Optimization**

```
Database queries: 1-10 milliseconds
Memory access: 1-10 microseconds
1000x difference!

NodeStore uses caching to keep hot data in memory
Typical hit rate: 90-95%
Result: Most lookups complete at memory speed; average latency falls well below database latency
```
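The effect of the hit rate can be worked through as a weighted average. The sketch below is illustrative only: `expectedLatencyMicros` is a hypothetical helper, and the sample figures are picked from the ranges quoted above rather than measured.

```cpp
#include <cassert>

// Expected lookup latency in microseconds for a cache placed in front of a
// database. hitRate is the fraction of lookups served from memory (0.0-1.0).
double expectedLatencyMicros(double hitRate,
                             double memoryMicros,
                             double databaseMicros) {
    assert(hitRate >= 0.0 && hitRate <= 1.0);
    // Hits pay the memory cost, misses pay the database cost.
    return hitRate * memoryMicros + (1.0 - hitRate) * databaseMicros;
}
```

With a 95% hit rate, 10 µs memory access, and 1 ms (1000 µs) database access, this gives 0.95 × 10 + 0.05 × 1000 = 59.5 µs. At a 90% hit rate it gives 109 µs. The miss term dominates, which is why each additional point of hit rate matters.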

**4. Lifecycle Management**

```
Startup: Locate and open database
Runtime: Store and retrieve nodes as needed
Shutdown: Cleanly close database
Rotation: Enable online deletion and archival
```

### NodeObject: The Fundamental Storage Unit

The atomic unit of storage in XRPL is the NodeObject:

**Structure:**

```cpp
class NodeObject {
    // Type of object (hotLEDGER, hotACCOUNT_NODE, hotTRANSACTION_NODE)
    NodeObjectType mType;

    // 256-bit unique identifier
    uint256 mHash;

    // Serialized content (variable length)
    Blob mData;

public:
    // Factory: create NodeObject from components
    static std::shared_ptr<NodeObject> createObject(
        NodeObjectType type,
        Blob const& data,
        uint256 const& hash);

    // Access methods
    NodeObjectType getType() const { return mType; }
    uint256 const& getHash() const { return mHash; }
    Blob const& getData() const { return mData; }
};
```

**Key Characteristics:**

1. **Immutable Once Created**: Cannot modify data after creation
2. **Hash as Key**: Hash uniquely identifies the object
3. **Type Distinguishing**: Type prevents hash collisions between different data types
4. **Serialized Format**: Data is already in wire format

**NodeObject Types:**

| Type                 | Purpose                          | Numeric Value |
| -------------------- | -------------------------------- | ------------- |
| hotLEDGER            | Ledger headers and metadata      | 1             |
| hotACCOUNT\_NODE     | Account state tree nodes         | 3             |
| hotTRANSACTION\_NODE | Transaction tree nodes           | 4             |
| hotUNKNOWN           | Unknown/unrecognized types       | 0             |
| hotDUMMY             | Cache marker for missing entries | 512           |

**Type Prefix in Hashing:**

Type fields prevent collisions:

```cpp
// Pseudocode: "||" denotes byte concatenation, not logical OR.
// The same payload hashed under two different type prefixes
// yields two different hashes.

uint256 hash_account = SHA512Half(
    ACCOUNT_TYPE_BYTE || payload);

uint256 hash_transaction = SHA512Half(
    TRANSACTION_TYPE_BYTE || payload);

// hash_account != hash_transaction
```
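The prefixing idea above can be tried out without a cryptographic library. The sketch below uses 64-bit FNV-1a purely as a dependency-free stand-in for SHA-512Half. The function names and the tag values 3 and 4 are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Stand-in hash (64-bit FNV-1a). NOT SHA-512Half; it only keeps the
// example self-contained.
std::uint64_t fnv1a(Blob const& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Hash the payload with a one-byte domain tag placed in front of it.
std::uint64_t taggedHash(std::uint8_t tag, Blob const& payload) {
    Blob buffer;
    buffer.reserve(payload.size() + 1);
    buffer.push_back(tag);
    buffer.insert(buffer.end(), payload.begin(), payload.end());
    return fnv1a(buffer);
}
```

Calling `taggedHash(3, payload)` and `taggedHash(4, payload)` on the same payload produces two different values, so identical bytes stored under different types do not share a key.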

### NodeObject Lifecycle

**Creation**

```
During transaction processing:
  1. Transaction validated and applied to SHAMap
  2. SHAMap nodes modified
  3. Each modified node serialized
  4. NodeObject created with type, hash, serialized data
  5. Stored in NodeStore
```

**Storage**

```
For each NodeObject:
  1. Encode to persistent format
  2. Compute key (same as hash)
  3. Write to database
  4. Backend handles actual I/O
```

**Caching**

```
After storage:
  1. Keep in memory for fast reaccess
  2. Move to cache tier
  3. Evict when cache capacity exceeded
  4. Dummy objects mark "known missing" (avoid repeated lookups)
```

**Retrieval**

```
When SHAMap needs a node:
  1. Check cache (microseconds)
  2. If miss, query database (milliseconds)
  3. Deserialize and validate
  4. Add to cache
  5. Return to SHAMap
```
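The retrieval steps, together with the "known missing" markers from the caching stage, can be sketched as a read-through cache. This is a simplified illustration: `ReadThroughStore`, its `std::map` "database", and the `evict` hook are hypothetical, and a `nullptr` cache entry plays the role of a dummy object.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Key = std::string;                       // stand-in for a 256-bit hash
using Blob = std::vector<std::uint8_t>;
using BlobPtr = std::shared_ptr<Blob const>;

class ReadThroughStore {
    std::map<Key, BlobPtr> database_;          // stand-in for the backend
    std::unordered_map<Key, BlobPtr> cache_;   // nullptr entry = known missing
    int databaseReads_ = 0;

public:
    void put(Key const& key, Blob value) {
        assert(!key.empty());
        BlobPtr stored = std::make_shared<Blob const>(std::move(value));
        database_[key] = stored;
        cache_[key] = stored;                  // keep hot after storing
    }

    BlobPtr fetch(Key const& key) {
        // 1. Check the cache: a hit, or a remembered miss (nullptr).
        auto cached = cache_.find(key);
        if (cached != cache_.end())
            return cached->second;

        // 2. Cache miss: query the database.
        ++databaseReads_;
        BlobPtr result;
        auto found = database_.find(key);
        if (found != database_.end())
            result = found->second;

        // 3. Remember the outcome, including "not found".
        cache_[key] = result;
        return result;
    }

    void evict(Key const& key) { cache_.erase(key); }  // simulate eviction
    int databaseReads() const { return databaseReads_; }
};
```

Fetching a missing key twice touches the database only once, because the second lookup is answered by the remembered miss.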

**Archival**

```
After ledger is validated and no longer current:
  1. May be retained for history
  2. Or moved to archive during rotation
  3. Or deleted based on retention policy
```

### Backend Abstraction

The Backend class defines the minimal interface for any storage system:

**Core Operations:**

```cpp
class Backend {
public:
    virtual ~Backend() = default;

    // Store single object
    virtual Status store(NodeObject const& object) = 0;

    // Retrieve single object by hash
    virtual Status fetch(uint256 const& hash,
                        std::shared_ptr<NodeObject>& object) = 0;

    // Persist multiple objects atomically
    virtual Status storeBatch(std::vector<NodeObject> const& batch) = 0;

    // Retrieve multiple objects efficiently
    virtual Status fetchBatch(std::vector<uint256> const& hashes,
                             std::vector<NodeObject>& objects) = 0;

    // Lifecycle
    virtual Status open(std::string const& path) = 0;
    virtual Status close() = 0;
    virtual int fdRequired() const = 0;  // File descriptors needed
};
```

**Status Codes:**

```cpp
enum class Status {
    ok,                 // Operation succeeded
    notFound,          // Key doesn't exist
    dataCorrupt,       // Data integrity check failed (fatal)
    backendError       // Backend error
};
```
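A caller typically maps each status to a different reaction. The sketch below shows one possible policy. The `Action` enum and `actionFor` function are hypothetical, and treating `backendError` as retryable is an assumption made for illustration, not documented behavior.

```cpp
#include <cassert>
#include <stdexcept>

enum class Status { ok, notFound, dataCorrupt, backendError };

// Hypothetical caller-side policy for each outcome.
enum class Action { useObject, treatAsMissing, retryLater };

Action actionFor(Status status) {
    switch (status) {
        case Status::ok:           return Action::useObject;
        case Status::notFound:     return Action::treatAsMissing;
        case Status::backendError: return Action::retryLater;  // assumed policy
        case Status::dataCorrupt:
            // Marked fatal in the enum comment above, so do not continue.
            throw std::runtime_error("NodeStore data corrupt");
    }
    throw std::logic_error("unhandled Status value");
}
```

Keeping `notFound` separate from the error cases matters because a missing node is routine during synchronization, whereas corruption is not.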

**Backend Independence:**

Because NodeStore sits above the backends, application logic stays the same whichever backend is configured:

```cpp
// Same code works with any backend

struct DatabaseConfig {
    std::string type;  // "rocksdb", "nudb", etc.
    std::string path;
    // ... backend-specific options
};

auto backend = createBackend(config);
NodeStore store(backend);

// Application uses NodeStore
store.fetch(hash);  // Works regardless of backend
store.store(node);
```

### Supported Backends

**RocksDB (Recommended for Most Cases)**

```cpp
Backend* createRocksDBBackend(std::string const& path) {
    return new RocksDBBackend(path);
}
```

* Modern key-value store developed by Facebook
* LSM tree (Log-Structured Merge tree) design
* Excellent performance for XRPL workloads
* Built-in compression support
* Active maintenance

**Characteristics:**

* Write throughput: \~10,000-50,000 objects/second
* Read throughput: \~100,000+ objects/second
* Compression: Reduces disk space by 50-70%

**NuDB (High-Throughput Alternative)**

```cpp
Backend* createNuDBBackend(std::string const& path) {
    return new NuDBBackend(path);
}
```

* Purpose-built for XRPL by Ripple
* Append-only design optimized for SSD
* Higher write throughput than RocksDB
* Efficient space utilization

**Characteristics:**

* Write throughput: \~50,000-200,000 objects/second
* Read throughput: \~100,000+ objects/second
* Better for high-volume systems

**Testing Backends**

```cpp
Backend* createMemoryBackend() {
    return new MemoryBackend();  // In-memory, non-persistent
}

Backend* createNullBackend() {
    return new NullBackend();     // No-op backend
}
```
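To make the two testing backends concrete, here is a minimal sketch of what an in-memory and a no-op backend could look like behind a reduced interface. `SimpleBackend`, `InMemoryBackend`, `DiscardingBackend`, and `makeBackend` are hypothetical names with simplified signatures. They are not the rippled classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Key = std::string;                 // stand-in for uint256
using Blob = std::vector<std::uint8_t>;
enum class Status { ok, notFound };

class SimpleBackend {
public:
    virtual ~SimpleBackend() = default;
    virtual Status store(Key const& key, Blob const& value) = 0;
    virtual Status fetch(Key const& key, Blob& out) = 0;
};

// Keeps everything in a map; contents vanish when the object is destroyed.
class InMemoryBackend : public SimpleBackend {
    std::map<Key, Blob> table_;
public:
    Status store(Key const& key, Blob const& value) override {
        table_[key] = value;
        return Status::ok;
    }
    Status fetch(Key const& key, Blob& out) override {
        auto it = table_.find(key);
        if (it == table_.end()) return Status::notFound;
        out = it->second;
        return Status::ok;
    }
};

// Accepts writes and discards them; every fetch reports notFound.
class DiscardingBackend : public SimpleBackend {
public:
    Status store(Key const&, Blob const&) override { return Status::ok; }
    Status fetch(Key const&, Blob&) override { return Status::notFound; }
};

// Hypothetical factory keyed on a configuration string.
std::unique_ptr<SimpleBackend> makeBackend(std::string const& type) {
    if (type == "memory")
        return std::unique_ptr<SimpleBackend>(new InMemoryBackend());
    if (type == "none")
        return std::unique_ptr<SimpleBackend>(new DiscardingBackend());
    return nullptr;  // unknown backend type
}
```

Code written against `SimpleBackend` runs unchanged with either implementation, which is what makes such backends convenient in unit tests.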

### Data Encoding Format

To enable backend independence, NodeStore uses a standardized encoding:

**Encoded Blob Structure:**

| Byte Offset | Field    | Description                                 |
| ----------- | -------- | ------------------------------------------- |
| 0-7         | Reserved | Set to zero, reserved for future use        |
| 8           | Type     | NodeObjectType enumeration value            |
| 9+          | Data     | Serialized object payload (variable length) |

**Encoding Process:**

```cpp
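void encodeNodeObject(NodeObject const& obj, Blob& blob) {
    // Start from 8 zeroed reserved bytes (discarding any prior contents)
    blob.assign(8, 0);

    // Add type byte
    blob.push_back(static_cast<std::uint8_t>(obj.getType()));

    // Add data payload
    Blob const& data = obj.getData();
    blob.insert(blob.end(), data.begin(), data.end());
}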
```

**Decoding Process:**

```cpp
std::shared_ptr<NodeObject> decodeNodeObject(
    uint256 const& hash,
    Blob const& blob)
{
    if (blob.size() < 9) {
        return nullptr;  // Corrupted
    }

    NodeObjectType type = static_cast<NodeObjectType>(blob[8]);

    Blob data(blob.begin() + 9, blob.end());

    return NodeObject::createObject(type, data, hash);
}
```
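The encode and decode steps can be checked with a round trip. The sketch below is a self-contained variant that works on raw bytes instead of `NodeObject`. The `Decoded` struct and the free functions `encode` and `decode` are hypothetical simplifications of the layout described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Blob = std::vector<std::uint8_t>;

struct Decoded {
    bool valid;
    std::uint8_t type;
    Blob payload;
};

constexpr std::size_t kHeaderSize = 9;  // 8 reserved bytes + 1 type byte

Blob encode(std::uint8_t type, Blob const& payload) {
    Blob out(8, 0);                                         // bytes 0-7: reserved
    out.push_back(type);                                    // byte 8: type
    out.insert(out.end(), payload.begin(), payload.end());  // bytes 9+: data
    return out;
}

Decoded decode(Blob const& blob) {
    if (blob.size() < kHeaderSize)
        return Decoded{false, 0, Blob{}};  // too short to hold the header
    return Decoded{true, blob[8], Blob(blob.begin() + 9, blob.end())};
}
```

Encoding a 3-byte payload yields a 12-byte blob, and decoding it returns the same type byte and payload. Any blob shorter than 9 bytes is rejected.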

**Benefits:**

1. **Backend Agnostic**: Any backend can store/retrieve encoded blobs
2. **Self-Describing**: Type embedded, forward-compatible with unknown types
3. **Efficient**: Minimal overhead (9 bytes: 8 reserved plus 1 type byte) per object
4. **Validated**: The size check and type byte catch some corruption on decode; recomputing the hash covers the payload

### Database Key as Hash

The database key is the object's hash (not a sequential ID):

```cpp
Status Backend::store(NodeObject const& obj) {
    uint256 key = obj.getHash();      // 256-bit hash as key
    Blob value = encode(obj);          // Encoded blob as value

    return database.put(key, value);   // Key-value store
}
```

**Implications:**

1. **Direct Retrieval**: Any node retrievable by hash
2. **Deduplication**: Identical content produces identical hash → same key
3. **Immutability**: Hash never changes for given data
4. **Verification**: Can verify data by recomputing hash

### Integration Architecture

NodeStore integrates with SHAMap through the Family pattern:

```cpp
// Family provides NodeStore access to SHAMap
class Family {
public:
    virtual ~Family() = default;

    virtual std::shared_ptr<NodeStore> getNodeStore() = 0;
    virtual std::shared_ptr<TreeNodeCache> getTreeNodeCache() = 0;
    virtual std::shared_ptr<FullBelowCache> getFullBelowCache() = 0;
};

class NodeFamily : public Family {
    std::shared_ptr<NodeStore> mNodeStore;
    std::shared_ptr<TreeNodeCache> mTreeCache;
    std::shared_ptr<FullBelowCache> mFullBelow;

    // ... implement Family interface
};

// SHAMap uses Family for storage access
class SHAMap {
    std::shared_ptr<Family> mFamily;

    std::shared_ptr<SHAMapTreeNode> getNode(uint256 const& hash) {
        // Try cache first
        auto cached = mFamily->getTreeNodeCache()->get(hash);
        if (cached) return cached;

        // Fetch from NodeStore
        auto obj = mFamily->getNodeStore()->fetch(hash);
        if (obj) {
            auto node = deserializeNode(obj);
            // Cache for future access
            mFamily->getTreeNodeCache()->insert(hash, node);
            return node;
        }

        return nullptr;
    }
};
```

### Summary

**Key Architectural Elements:**

1. **NodeObject**: Atomic storage unit (type, hash, data)
2. **Backend Interface**: Minimal, consistent interface for storage
3. **Abstraction**: Decouples application logic from storage implementation
4. **Encoding Format**: Standardized format enables backend independence
5. **Key as Hash**: Direct retrieval without index lookups
6. **Family Pattern**: Provides access to caching and storage

**Design Properties:**

* **Backend Flexibility**: Switch storage engines without code changes
* **Scale**: Handles millions of objects efficiently
* **Persistence**: Survives crashes and restarts
* **Verification**: Data integrity through hashing
* **Simplicity**: Minimal interface hides complexity

In the next chapter, we'll explore the critical Cache Layer that makes NodeStore practical for high-performance systems.

