Appendix: Debugging and Development Tools


Introduction

This appendix provides techniques and tools for investigating SHAMap and NodeStore behavior.

Logging and Diagnostics

Enable Verbose Logging

Edit rippled.cfg:

[rpc_startup]
{ "command": "log_level", "severity": "debug" }

[debug_logfile]
/var/log/rippled/rippled.log

Then restart rippled and check logs:

tail -f /var/log/rippled/rippled.log | grep -i nodestore

Key Log Messages

TRACE Ledger:        Ledger opened/closed
DEBUG SHAMap:        Tree operations
DEBUG NodeStore:     Database operations
INFO  Consensus:     Validation and agreement
WARN  Performance:   Slow operations detected
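
To see which severities dominate a log, the lines can be tallied per level. The sketch below assumes the severity appears as a standalone word on each line, as in the listing above; the real log layout may differ, and both the helper name count_by_severity and the sample lines are made up for illustration.

```shell
# count_by_severity: read log lines on stdin, print "<SEVERITY> <count>" per level.
# Assumes the severity is a standalone word (TRACE, DEBUG, ...) on each line.
count_by_severity() {
    awk '
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)$/) {
                count[$i]++
                break            # count each line once
            }
    }
    END { for (s in count) print s, count[s] }' | sort
}

# Made-up sample lines in the format shown above
count_by_severity <<'EOF'
DEBUG NodeStore: fetch complete
WARN Performance: slow operation
DEBUG SHAMap: node added
EOF
```

Against a live log, the same function can be fed with tail -n 10000 /var/log/rippled/rippled.log.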

Metrics Inspection

JSON-RPC Inspection

# Get storage metrics
rippled-cli server_info | jq '.result.node_db'

# Example output (fields vary by rippled version):
{
  "type": "RocksDB",
  "path": "/var/lib/rippled/db/rocksdb",
  "cache_size": 256,
  "cache_hit_rate": 0.923,
  "writes": 1000000,
  "bytes_written": 1000000000,
  "reads": 50000000,
  "cache_hits": 46150000,
  "read_latency_us": 15
}
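
In the sample output, cache_hit_rate is consistent with cache_hits divided by reads (46150000 / 50000000 = 0.923), and bytes_written divided by writes gives the average write size. A minimal sketch that recomputes both from the raw counters; the helper names hit_rate and avg_write_bytes are hypothetical, and the numbers are the ones from the sample above.

```shell
# hit_rate HITS READS: print HITS/READS to three decimals.
hit_rate() {
    awk -v hits="$1" -v reads="$2" 'BEGIN {
        if (reads + 0 <= 0) { print "n/a"; exit }
        printf "%.3f\n", hits / reads
    }'
}

# avg_write_bytes BYTES WRITES: print the average bytes per write.
avg_write_bytes() {
    awk -v bytes="$1" -v writes="$2" 'BEGIN {
        if (writes + 0 <= 0) { print "n/a"; exit }
        printf "%d\n", bytes / writes
    }'
}

hit_rate 46150000 50000000          # prints 0.923
avg_write_bytes 1000000000 1000000  # prints 1000
```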

File System Inspection

# Check database size
du -sh /var/lib/rippled/db/*

# Monitor growth
watch -n 1 'du -sh /var/lib/rippled/db/*'

# Check free space
df -h /var/lib/rippled/

# Monitor I/O
iostat -x 1 /dev/sda

Debugging Specific Issues

Issue: Cache Hit Rate Too Low

Symptoms:

  • Database queries slow

  • Ledger close times increasing

  • Hit rate < 80%

Investigation:

# Check cache metrics
rippled-cli server_info | jq '.result.node_db.cache_hit_rate'

# Check cache size configuration
grep cache_size rippled.cfg

# Monitor cache evictions
tail -f /var/log/rippled/rippled.log | \
  grep -i "evict\|cache"

Solutions:

  1. Increase cache_size if memory is available

  2. Reduce cache_age so cold data is evicted sooner

  3. Check whether the system is memory-constrained (for example, with free -h)
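
Counters that accumulate since startup can hide a recent drop in hit rate. One way to check the 80% symptom is to take two samples of cache_hits and reads and compute the rate over the interval between them. This is a sketch under that assumption; the helper names and the second sample's numbers are hypothetical.

```shell
# interval_hit_rate HITS1 READS1 HITS2 READS2: hit rate between two samples.
interval_hit_rate() {
    awk -v h1="$1" -v r1="$2" -v h2="$3" -v r2="$4" 'BEGIN {
        dr = r2 - r1
        if (dr <= 0) { print "n/a"; exit }
        printf "%.3f\n", (h2 - h1) / dr
    }'
}

# below_threshold RATE LIMIT: exit status 0 when RATE < LIMIT.
below_threshold() {
    awk -v rate="$1" -v limit="$2" 'BEGIN { exit !(rate + 0 < limit + 0) }'
}

# First sample is from the example output; the second is invented.
rate=$(interval_hit_rate 46150000 50000000 46900000 51000000)
echo "interval hit rate: $rate"
if below_threshold "$rate" 0.80; then
    echo "WARNING: hit rate below 80%"
fi
```

Here the cumulative rate is still above 0.92, but the last million reads hit the cache only 75% of the time, which is the figure that matters when diagnosing a recent slowdown.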

Issue: Write Performance Degradation

Symptoms:

  • Ledger closes slow (>10 seconds)

  • Database write errors in logs

  • Validator falling behind network

Investigation:

# Check write latency
rippled-cli server_info | \
  jq '.result.node_db.write_latency_us'

# Monitor disk I/O
iotop -o -b -n 1

# Check disk space
df -h

# Monitor async queue
tail -f /var/log/rippled/rippled.log | \
  grep -i "async.*queue"

Solutions:

  1. Ensure SSD (not HDD) for database

  2. Check disk I/O isn't saturated

  3. Increase async_threads if I/O bound

  4. Switch to faster backend (NuDB vs RocksDB)

  5. Enable compression if disk is bottleneck
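
For the first item, Linux exposes a per-device flag in sysfs that reports whether a block device is rotational (1 for a spinning disk, 0 for SSD or NVMe). A small sketch that reads it; the helper name is_rotational is hypothetical, and sda is assumed only because it is the device used in the iostat example earlier.

```shell
# is_rotational FILE: exit status 0 when FILE (a sysfs "rotational" flag) contains 1.
is_rotational() {
    [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

# Assumed device name; substitute the device that holds /var/lib/rippled/db
flag=/sys/block/sda/queue/rotational
if [ -r "$flag" ]; then
    if is_rotational "$flag"; then
        echo "sda: rotational (HDD)"
    else
        echo "sda: non-rotational (SSD/NVMe)"
    fi
fi
```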

Issue: Synchronization Slow

Symptoms:

  • New nodes take hours to sync

  • Falling behind network

  • High database query count

Investigation:

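# Monitor sync progress (server state, stored ledger range, latest validated ledger)
rippled-cli server_info | \
  jq '.result.info | {server_state, complete_ledgers, seq: .validated_ledger.seq}'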

# Track fetch operations
tail -f /var/log/rippled/rippled.log | \
  grep -i "fetch\|sync"

# Monitor thread pool
ps -p $(pidof rippled) -L

# Check queue depths
rippled-cli server_info | jq '.result.node_db.async_queue_depth'

Solutions:

  1. Increase cache size for better hit rate during sync

  2. Increase async_threads (more parallel fetches)

  3. Use faster SSD

  4. Check network bandwidth (might be bottleneck)

  5. Switch to NuDB for higher throughput
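
To judge whether a change actually helps, it is useful to turn two ledger index samples into a sync rate and a rough time to catch up. The sketch below does only the arithmetic; the helper names and sample numbers are hypothetical, and the estimate is a lower bound because the network keeps closing new ledgers while the node syncs.

```shell
# sync_rate SEQ1 SEQ2 SECONDS: ledgers per second between two samples.
sync_rate() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN {
        if (t + 0 <= 0) { print "n/a"; exit }
        printf "%.2f\n", (b - a) / t
    }'
}

# sync_eta_seconds CURRENT TARGET RATE: seconds to reach TARGET at RATE ledgers/s.
sync_eta_seconds() {
    awk -v cur="$1" -v target="$2" -v rate="$3" 'BEGIN {
        if (rate + 0 <= 0) { print "n/a"; exit }
        printf "%d\n", (target - cur) / rate
    }'
}

sync_rate 1000 1600 60          # prints 10.00
sync_eta_seconds 1600 4600 10   # prints 300
```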

Code Debugging

Building with Debug Symbols

cd rippled
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j4

GDB Debugging

# Run under GDB
gdb --args rippled --conf /path/to/rippled.cfg

# Inside GDB: set the breakpoint before starting the program
(gdb) break SHAMap::addKnownNode
(gdb) run

# When the breakpoint is hit:
(gdb) print node->getHash()
(gdb) print nodeID
(gdb) step
(gdb) continue
(gdb) quit

Common Breakpoints

# Node addition
break SHAMap::addKnownNode
break Database::store

# Cache operations
break TaggedCache::get
break TaggedCache::insert

# Synchronization
break SHAMap::getMissingNodes
break Database::fetchNodeObject

# Inspect values once a breakpoint is hit
(gdb) print node->getHash().hex()
(gdb) print nodeID.mDepth
(gdb) print nodeID.mNodeID.hex()
(gdb) print metrics.cacheHits
(gdb) print metrics.cacheMisses

Performance Profiling

CPU Profiling with Perf

# Record 60 seconds of call stacks from the running rippled process
perf record -F 99 -g -p $(pidof rippled) -- sleep 60

# Analyze results
perf report

# Render a flame graph from the same recording
# (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph scripts)
perf script | stackcollapse-perf.pl | flamegraph.pl > profile.svg

Memory Profiling with Valgrind

# Run under memcheck (very slow)
valgrind --leak-check=full rippled --conf rippled.cfg

# Run specific test
valgrind --leak-check=full rippled --unittest test.nodestore

Custom Instrumentation

Add to rippled source:

// In Database::fetchNodeObject (requires <chrono>)
auto const startTime = std::chrono::steady_clock::now();

auto obj = mBackend->fetch(hash);

auto const elapsed = std::chrono::steady_clock::now() - startTime;
auto const ms = std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);

// Log any fetch that takes longer than 100 ms
if (ms.count() > 100) {
    JLOG(mLog.warn()) << "Slow fetch: " << hash.hex()
                      << " took " << ms.count() << "ms";
}

Test-Driven Debugging

Running Unit Tests

# Build with tests enabled
cd rippled/build
cmake -DENABLE_TESTING=ON ..
make -j4

# Run all tests
ctest

# Run specific test
ctest -R "shamap" -V

# Run single test file
./bin/rippled --unittest test.SHAMap

Writing Debug Tests

// In rippled/src/test/shamap_test.cpp
SECTION("Debug specific case") {
    // Create SHAMap
    auto shamap = std::make_shared<SHAMap>(...);

    // Add nodes
    shamap->addRootNode(...);

    // Test operation
    auto node = shamap->getNode(hash);

    // Assert behavior
    REQUIRE(node != nullptr);
    REQUIRE(node->getHash() == expectedHash);
}

Useful Commands

Check Configuration

grep -E "^[a-z]|^\[" rippled.cfg | head -30

Monitor in Real Time

# CPU/Memory
top -p $(pidof rippled)

# Disk I/O
iotop -p $(pidof rippled)

# Network connections (-p shows the owning process; may require root)
netstat -anp | grep rippled

# File descriptors
lsof -p $(pidof rippled) | wc -l

Database Inspection

For RocksDB:

# Use RocksDB tools
rocksdb_ldb --db=/var/lib/rippled/db/rocksdb scan

# List files
ls -lah /var/lib/rippled/db/rocksdb/

Log Analysis

# Count errors
grep -c ERROR /var/log/rippled/rippled.log

# Find slow operations
grep "took.*ms" /var/log/rippled/rippled.log

# Timeline of events
tail -f /var/log/rippled/rippled.log | \
  awk '{print $1" "$2" "$3" "$4" ..."}'
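
The "took ... ms" lines can also be summarized rather than just listed. The sketch below assumes messages end in the form "took <N>ms", as produced by the custom instrumentation shown earlier; the helper name slow_op_summary and the sample log lines are made up for illustration.

```shell
# slow_op_summary: read log lines on stdin and summarize "took <N>ms" durations.
slow_op_summary() {
    awk '
    {
        for (i = 2; i <= NF; i++) {
            if ($(i - 1) == "took" && $i ~ /^[0-9]+ms$/) {
                ms = $i + 0          # numeric prefix of e.g. "150ms"
                n++
                sum += ms
                if (ms > max) max = ms
            }
        }
    }
    END {
        if (n == 0) { print "no slow operations found"; exit }
        printf "count=%d avg_ms=%.1f max_ms=%d\n", n, sum / n, max
    }'
}

# Made-up sample lines
slow_op_summary <<'EOF'
2024-01-01 00:00:01 WARN NodeStore: Slow fetch: ABC123 took 150ms
2024-01-01 00:00:02 DEBUG NodeStore: fetch ok
2024-01-01 00:00:05 WARN NodeStore: Slow fetch: DEF456 took 250ms
EOF
```

For the sample input this prints count=2 avg_ms=200.0 max_ms=250.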

Performance Regression Testing

Benchmark Before/After

# Get baseline
rippled --unittest test.SHAMap > baseline.txt 2>&1

# Modify code...

# Test after change
rippled --unittest test.SHAMap > modified.txt 2>&1

# Compare
diff baseline.txt modified.txt
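
Diffing test output shows behavioral differences but not timing. A rough way to compare speed is to time several runs before and after the change and report the percentage difference. This is a sketch with one-second resolution, not a rigorous benchmark; the helper names time_runs and pct_change are hypothetical.

```shell
# time_runs N CMD [ARGS...]: run CMD N times, print elapsed wall-clock seconds.
time_runs() {
    runs=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || true   # keep timing even if a run fails
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# pct_change BASE NEW: signed percentage change from BASE to NEW.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        if (base + 0 == 0) { print "n/a"; exit }
        printf "%+.1f%%\n", (new - base) * 100 / base
    }'
}

# Usage (not run here): before=$(time_runs 3 rippled --unittest test.SHAMap)
# Example with fixed numbers: 40 s before the change, 46 s after
pct_change 40 46    # prints +15.0%
```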

Load Testing

# Submit transactions and measure
./load_test.sh --transactions 1000 --duration 60

# Monitor metrics
watch -n 1 'rippled-cli server_info | jq ".result.node_db"'

Common Issues and Solutions

Issue                 Investigation               Solution
High cache miss rate  Cache metrics               Increase cache_size
Slow sync             Fetch latency               Increase async_threads
Disk full             df -h                       Enable online_delete
Memory leak           Valgrind                    Fix code (likely nodes not freed)
Hang on startup       strace                      Check database corruption
Consensus failing     Logs for validation errors  Check NodeStore consistency


See Appendix A for codebase navigation to find files mentioned here.
