Hash Functions in XRPL

Introduction

Hash functions are the workhorses of cryptographic systems. While signatures prove authorization and keys establish identity, hash functions ensure integrity and enable efficient data structures. In XRPL, hash functions are everywhere—transaction IDs, ledger object keys, Merkle trees, address generation, and more.

This chapter explores how XRPL uses hash functions, why specific algorithms were chosen, and how they provide the integrity guarantees the system depends on.

What is a Cryptographic Hash Function?

A cryptographic hash function takes arbitrary input and produces a fixed-size output:

Input (any size)  →  Hash Function  →  Output (fixed size)

"Hello"          →  sha512Half  →  256-bit digest
"Hello World!"   →  sha512Half  →  256-bit digest (unrelated to the first)
[1 MB file]      →  sha512Half  →  256-bit digest

Required Properties

1. Deterministic

sha512Half("Hello") == sha512Half("Hello")  // Always true
// Same input always produces same output

2. Fast to Compute

// Hundreds of megabytes per second per core (hardware-dependent)
auto hash = sha512Half(largeData);  // Microseconds to milliseconds

3. Avalanche Effect

sha512Half("Hello")  → digest A
sha512Half("Hello!") → digest B  // No visible relationship to A
// Any input change, even a single bit, flips about half of the output bits on average

4. Preimage Resistance (One-Way)

// Given only a hash, finding an input that produces it is infeasible
uint256 hash = /* some published digest */;
// No known method finds x with sha512Half(x) == hash faster than brute force

5. Collision Resistance

// Cannot find two inputs with same hash
// sha512Half(x) == sha512Half(y) where x != y
// Computationally infeasible

SHA-512-Half: The Primary Workhorse

Why SHA-512-Half?

// Not SHA-256, but SHA-512 truncated to 256 bits
template <class... Args>
uint256 sha512Half(Args const&... args)
{
    sha512_half_hasher h;
    hash_append(h, args...);
    return static_cast<typename sha512_half_hasher::result_type>(h);
}

Why truncate SHA-512 instead of using SHA-256?

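Performance on 64-bit processors (approximate software figures; hardware-dependent):

SHA-512: 64-bit words, 128-byte blocks → ~650 MB/s
SHA-256: 32-bit words, 64-byte blocks  → ~450 MB/s

SHA-512-Half = SHA-512 speed + SHA-256 output size

In software on 64-bit CPUs, SHA-512 is usually faster than SHA-256 despite producing more output: each compression call processes 128 bytes in 80 rounds of 64-bit operations, versus 64 bytes in 64 rounds for SHA-256. (CPUs with dedicated SHA-256 instructions, such as Intel's SHA Extensions, can reverse this ordering.) Truncating to 256 bits keeps SHA-512's speed while giving a SHA-256-sized output. Note that this is plain SHA-512 cut to its first 32 bytes, not the standardized SHA-512/256, which uses different initial values and therefore produces different digests.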

Implementation

// From src/libxrpl/protocol/digest.cpp
class sha512_half_hasher
{
private:
    SHA512_CTX ctx_;

public:
    using result_type = uint256;

    sha512_half_hasher()
    {
        SHA512_Init(&ctx_);
    }

    void operator()(void const* data, std::size_t size) noexcept
    {
        SHA512_Update(&ctx_, data, size);
    }

    operator result_type() noexcept
    {
        // Compute full SHA-512 (64 bytes)
        std::uint8_t digest[64];
        SHA512_Final(digest, &ctx_);

        // Return first 32 bytes (256 bits)
        result_type result;
        std::memcpy(result.data(), digest, 32);
        return result;
    }
};

Usage Throughout XRPL

Transaction IDs:

uint256 STTx::getTransactionID() const
{
    Serializer s;
    s.add32(HashPrefix::transactionID);
    add(s);  // Every field, including the signature(s)
    return sha512Half(s.slice());
}

Ledger Object Keys:

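// Simplified: the key hashes a 16-bit ledger-namespace tag ('a' for
// account roots) with the account ID, i.e. sha512Half(uint16_t('a'), id)
Keylet keylet::account(AccountID const& id) noexcept
{
    return Keylet{ltACCOUNT_ROOT, indexHash(LedgerNameSpace::ACCOUNT, id)};
}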

Merkle Tree Nodes:

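// Simplified; hash_ and hashValid_ must be mutable for this const method
uint256 SHAMapInnerNode::getHash() const
{
    if (hashValid_)
        return hash_;

    Serializer s;
    s.add32(HashPrefix::innerNode);      // 'MIN\0': domain-separates inner nodes
    for (auto const& child : children_)  // All 16 branches, in order
        s.add256(child.getHash());       // An empty branch contributes zero

    hash_ = sha512Half(s.slice());
    hashValid_ = true;
    return hash_;
}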
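The bytes an inner node hands to sha512Half can be sketched directly. The helper below is hypothetical (not rippled's API) and assumes the layout described here: a 4-byte big-endian 'MIN\0' prefix (0x4D494E00) followed by all 16 child hashes in branch order, with an empty branch contributing 32 zero bytes. It only assembles the preimage; the hashing step is left out so the sketch stays self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Hash256 = std::array<std::uint8_t, 32>;

// 'M','I','N',0 packed big-endian: the inner-node prefix
constexpr std::uint32_t kInnerNodePrefix = 0x4D494E00;

// Hypothetical helper: build the 516-byte preimage of a 16-way inner node.
// Assumption: empty branches are passed in as all-zero hashes.
std::vector<std::uint8_t> innerNodePreimage(std::array<Hash256, 16> const& children)
{
    std::vector<std::uint8_t> out;
    out.reserve(4 + children.size() * 32);

    // Prefix first, most-significant byte first
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(kInnerNodePrefix >> shift));

    // Then every child hash, in branch order 0..15
    for (Hash256 const& child : children)
        out.insert(out.end(), child.begin(), child.end());

    return out;  // This buffer is what sha512Half would consume
}
```

Because each child's hash is part of its parent's preimage, changing any leaf changes every hash on the path up to the root, so two trees with equal root hashes hold the same contents.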

Secure Variant: sha512Half_s

// Secure variant: uses a hasher whose destructor erases its internal state
uint256 sha512Half_s(Slice const& data)
{
    sha512_half_hasher_s h;  // Secure hasher type, not sha512_half_hasher
    h(data.data(), data.size());
    auto result = static_cast<uint256>(h);

    // When h goes out of scope, its destructor securely erases the
    // SHA-512 context so fragments of the input do not linger in memory.
    // The returned digest itself is not erased.
    return result;
}

When to use the secure variant:

  • Hashing secret keys or seeds

  • Deriving keys from passwords

  • Any operation involving sensitive data

Why it matters:

// Regular variant
auto hash1 = sha512Half(secretData);
// The discarded SHA512_CTX may still hold fragments of secretData
// in memory until something overwrites it

// Secure variant
auto hash2 = sha512Half_s(secretData);
// The SHA512_CTX is wiped when the hasher is destroyed
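The essential trick behind a "secure" variant is wiping memory in a way the optimizer cannot remove: a plain memset on a buffer that is about to be released may be eliminated as a dead store, while writes through a volatile pointer are not. The sketch below is hypothetical (secureWipe and WipedBuffer are not rippled names); production code should use a vetted routine such as OpenSSL's OPENSSL_cleanse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero a buffer through a volatile pointer so the
// compiler cannot discard the writes as dead stores.
void secureWipe(void* ptr, std::size_t len) noexcept
{
    volatile unsigned char* p = static_cast<volatile unsigned char*>(ptr);
    while (len--)
        *p++ = 0;
}

// Hypothetical RAII holder standing in for a hasher's internal state:
// whatever secret bytes it held are wiped when it is destroyed.
class WipedBuffer
{
public:
    std::uint8_t bytes[128]{};

    WipedBuffer() = default;
    WipedBuffer(WipedBuffer const&) = delete;             // Avoid stray copies of secrets
    WipedBuffer& operator=(WipedBuffer const&) = delete;

    ~WipedBuffer() { secureWipe(bytes, sizeof(bytes)); }
};
```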

RIPESHA: Address Generation

The Double Hash

class ripesha_hasher
{
private:
    openssl_sha256_hasher sha_;

public:
    using result_type = ripemd160_hasher::result_type;  // 20 bytes

    void operator()(void const* data, std::size_t size) noexcept
    {
        // First: SHA-256
        sha_(data, size);
    }

    operator result_type() noexcept
    {
        // Get SHA-256 result (32 bytes)
        auto const sha256_digest =
            static_cast<openssl_sha256_hasher::result_type>(sha_);

        // Second: RIPEMD-160 of the SHA-256
        ripemd160_hasher ripe;
        ripe(sha256_digest.data(), sha256_digest.size());
        return static_cast<result_type>(ripe);  // 20 bytes
    }
};

Why Two Hash Functions?

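1. Defense in Depth

Preimages: recovering a public key from an account ID means
  inverting RIPEMD-160 and then inverting SHA-256, so a preimage
  break in only one of them is not enough
Collisions: no such benefit. Any SHA-256 collision is also a
  RIPESHA collision, and the 160-bit output caps generic collision
  resistance at about 2^80 work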

2. Compactness

Public Key:    33 bytes
  ↓ SHA-256
SHA-256 hash:  32 bytes
  ↓ RIPEMD-160
Account ID:    20 bytes (40% smaller than public key)

3. Quantum Resistance (Partial)

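Quantum computers may break elliptic curves:
  PublicKey → SecretKey (vulnerable)

No known quantum algorithm inverts hashes efficiently
(Grover's algorithm gives only a quadratic speedup):
  AccountID ↛ PublicKey (still secure)

This protection lasts only while the public key is unpublished: an account's public key appears on the ledger once it signs a transaction. For keys that have never signed, it provides time to upgrade the system if quantum computers emerge.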
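The split between preimages and collisions in the first point above can be checked with deliberately weak toy functions standing in for the two stages. weakInner, outer, and nested are hypothetical and not remotely cryptographic; they only show that a collision in the inner stage passes straight through the outer one, whatever the outer function is.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy inner "hash": sum of bytes mod 256. Order-insensitive, so
// collisions are trivial ("ab" and "ba" map to the same value).
std::uint8_t weakInner(std::string const& data)
{
    std::uint8_t sum = 0;
    for (unsigned char c : data)
        sum = static_cast<std::uint8_t>(sum + c);
    return sum;
}

// Toy outer "hash": any deterministic function of the inner result
std::uint8_t outer(std::uint8_t x)
{
    return static_cast<std::uint8_t>(x * 167u + 13u);
}

// Nested construction in the style of RIPESHA: outer(inner(data))
std::uint8_t nested(std::string const& data)
{
    return outer(weakInner(data));
}
```

Equal inner outputs force equal nested outputs, so the composite is never more collision-resistant than its inner stage. Preimages differ: to hit a chosen nested output, an attacker must find a value the outer stage maps to it and then an input the inner stage maps to that value.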

Usage

// Calculate account ID from public key
AccountID calcAccountID(PublicKey const& pk)
{
    ripesha_hasher h;
    h(pk.data(), pk.size());
    return AccountID{static_cast<ripesha_hasher::result_type>(h)};
}

// Calculate node ID from public key
NodeID calcNodeID(PublicKey const& pk)
{
    ripesha_hasher h;
    h(pk.data(), pk.size());
    return NodeID{static_cast<ripesha_hasher::result_type>(h)};
}

SHA-256: Checksum and Encoding

Double SHA-256 for Base58Check

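// Simplified from src/libxrpl/protocol/tokens.cpp
std::string encodeBase58Token(
    TokenType type,
    void const* token,
    std::size_t size)
{
    std::vector<uint8_t> buffer;
    buffer.push_back(static_cast<uint8_t>(type));
    // void const* does not support pointer arithmetic; view it as bytes
    auto const* bytes = static_cast<uint8_t const*>(token);
    buffer.insert(buffer.end(), bytes, bytes + size);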

    // Compute checksum: first 4 bytes of SHA-256(SHA-256(data))
    auto const hash1 = sha256(makeSlice(buffer));
    auto const hash2 = sha256(makeSlice(hash1));

    // Append checksum
    buffer.insert(buffer.end(), hash2.begin(), hash2.begin() + 4);

    // Base58 encode
    return base58Encode(buffer);
}

Why double SHA-256?

Historical reasons: the format follows Bitcoin's Base58Check checksum scheme:

  • Hashing twice blocks length-extension attacks on the result (harmless here, though a checksum does not strictly need it)

  • Standard pattern for checksums

  • Well-tested over many years

Checksum properties:

4 bytes = 32 bits = 2^32 possible values

Probability that a random corruption still matches the checksum: 1 in 4,294,967,296

So virtually every typo or transmission error is caught; only about one random error in 4.3 billion slips through.
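To make the checksum step concrete, the sketch below implements SHA-256 from the FIPS 180-4 description and uses it to append and check a 4-byte double-SHA-256 checksum. It is for illustration only: sha256, checksum, withChecksum, and verifyChecksum are hypothetical names rather than rippled's API, the Base58 encoding step is left out, and real code should use a vetted library.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Digest = std::array<std::uint8_t, 32>;

// Minimal SHA-256 (FIPS 180-4), for illustration; use a vetted library in practice
Digest sha256(Bytes const& msg)
{
    static constexpr std::uint32_t K[64] = {
        0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
        0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
        0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
        0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
        0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
        0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
        0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
        0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
    std::uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    auto rotr = [](std::uint32_t x, int n) -> std::uint32_t { return (x >> n) | (x << (32 - n)); };

    // Pad: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length
    Bytes m = msg;
    std::uint64_t const bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
    m.push_back(0x80);
    while (m.size() % 64 != 56)
        m.push_back(0);
    for (int i = 7; i >= 0; --i)
        m.push_back(static_cast<std::uint8_t>(bitLen >> (8 * i)));

    for (std::size_t off = 0; off < m.size(); off += 64)
    {
        std::uint32_t w[64];  // Message schedule
        for (int i = 0; i < 16; ++i)
            w[i] = std::uint32_t(m[off + 4 * i]) << 24 | std::uint32_t(m[off + 4 * i + 1]) << 16 |
                   std::uint32_t(m[off + 4 * i + 2]) << 8 | std::uint32_t(m[off + 4 * i + 3]);
        for (int i = 16; i < 64; ++i)
        {
            std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
            std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16] + s0 + w[i - 7] + s1;
        }
        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
        for (int i = 0; i < 64; ++i)  // Compression rounds
        {
            std::uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
            std::uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
            hh = g; g = f; f = e; e = d + t1;
            d = c; c = b; b = a; a = t1 + t2;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d;
        h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
    }

    Digest out{};
    for (int i = 0; i < 32; ++i)
        out[i] = static_cast<std::uint8_t>(h[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

// First 4 bytes of SHA-256(SHA-256(payload)), as in Base58Check
std::array<std::uint8_t, 4> checksum(Bytes const& payload)
{
    Digest const first = sha256(payload);
    Digest const second = sha256(Bytes(first.begin(), first.end()));
    return {{second[0], second[1], second[2], second[3]}};
}

// Append the checksum; the result is what would then be Base58-encoded
Bytes withChecksum(Bytes payload)
{
    auto const sum = checksum(payload);
    payload.insert(payload.end(), sum.begin(), sum.end());
    return payload;
}

// Recompute the checksum over everything but the last 4 bytes and compare
bool verifyChecksum(Bytes const& data)
{
    if (data.size() < 4)
        return false;
    Bytes const payload(data.begin(), data.end() - 4);
    auto const sum = checksum(payload);
    return std::equal(sum.begin(), sum.end(), data.end() - 4);
}
```

Flipping any bit in the payload or checksum makes verifyChecksum fail, except with the roughly 1-in-2^32 chance computed above.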

Hash Prefixes: Domain Separation

// From include/xrpl/protocol/HashPrefix.h (excerpt)
enum class HashPrefix : std::uint32_t
{
    transactionID       = 0x54584E00,  // 'TXN\0'
    txNode              = 0x534E4400,  // 'SND\0'
    leafNode            = 0x4D4C4E00,  // 'MLN\0'
    innerNode           = 0x4D494E00,  // 'MIN\0'
    ledgerMaster        = 0x4C575200,  // 'LWR\0'
    txSign              = 0x53545800,  // 'STX\0'
    txMultiSign         = 0x534D5400,  // 'SMT\0'
    manifest            = 0x4D414E00,  // 'MAN\0'
};
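Each hex value in the listing is just three ASCII characters followed by a zero byte. The sketch below packs them that way and shows the byte order in which a 4-byte prefix would appear at the start of the hashed data. makeHashPrefix and prefixBytes are hypothetical names, and the big-endian (most significant byte first) order is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack three ASCII characters into the top three bytes
// of a 32-bit value, leaving the low byte zero (the "XYZ\0" pattern)
constexpr std::uint32_t makeHashPrefix(char a, char b, char c)
{
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8);
}

// Compile-time checks against values from the listing
static_assert(makeHashPrefix('T', 'X', 'N') == 0x54584E00, "transactionID");
static_assert(makeHashPrefix('S', 'T', 'X') == 0x53545800, "txSign");
static_assert(makeHashPrefix('M', 'I', 'N') == 0x4D494E00, "inner node");

// Serialize a prefix most-significant byte first (assumed big-endian),
// as it would appear at the start of the bytes passed to sha512Half
std::array<std::uint8_t, 4> prefixBytes(std::uint32_t prefix)
{
    return {{static_cast<std::uint8_t>(prefix >> 24), static_cast<std::uint8_t>(prefix >> 16),
             static_cast<std::uint8_t>(prefix >> 8), static_cast<std::uint8_t>(prefix)}};
}
```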

Why use prefixes?

Prevent cross-protocol attacks where a hash from one context is used in another:

// Without prefixes (BAD):
hash_tx  = sha512Half(tx_data)
hash_msg = sha512Half(msg_data)

// If tx_data == msg_data, then hash_tx == hash_msg, so a signature or
// reference made for one could be presented as the other

// With prefixes (GOOD):
hash_tx  = sha512Half(PREFIX_TX,  tx_data)
hash_msg = sha512Half(PREFIX_MSG, msg_data)

// Even if tx_data == msg_data, hash_tx != hash_msg
// (the prefixes are fixed-width and distinct, so the hashed bytes differ)
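The guarantee comes from the hashed bytes: with fixed-width, distinct prefixes, the two preimages differ for every payload, so equal digests would require a genuine SHA-512 collision. The sketch below builds those preimages; domainSeparated and the two constants are hypothetical names, with values taken from the HashPrefix listing. Fixed width matters: with variable-length tags, "AB" + "C" and "A" + "BC" would yield identical bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint32_t kTxnPrefix = 0x54584E00;  // 'TXN\0'
constexpr std::uint32_t kStxPrefix = 0x53545800;  // 'STX\0'

// Hypothetical helper: the exact byte string that would be handed to
// sha512Half, a fixed-width 4-byte big-endian prefix followed by the payload
Bytes domainSeparated(std::uint32_t prefix, Bytes const& payload)
{
    Bytes out = {static_cast<std::uint8_t>(prefix >> 24), static_cast<std::uint8_t>(prefix >> 16),
                 static_cast<std::uint8_t>(prefix >> 8), static_cast<std::uint8_t>(prefix)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```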

Example Usage

// Transaction ID: covers every field, including signatures
uint256 getTransactionID(STTx const& tx)
{
    Serializer s;
    s.add32(HashPrefix::transactionID);  // Add prefix first
    tx.add(s);                           // All fields
    return sha512Half(s.slice());
}

// Signing hash: different prefix, signature fields omitted
uint256 getSigningHash(STTx const& tx)
{
    Serializer s;
    s.add32(HashPrefix::txSign);  // Different prefix
    tx.addWithoutSigningFields(s);
    return sha512Half(s.slice());
}

Incremental Hashing

Hash functions can process data incrementally:

// Instead of hashing all at once:
auto oneShot = sha512Half(bigData);  // Requires all data in memory

// Can hash incrementally:
sha512_half_hasher h;
h(chunk1.data(), chunk1.size());
h(chunk2.data(), chunk2.size());
h(chunk3.data(), chunk3.size());
auto streamed = static_cast<uint256>(h);  // Same digest as hashing the concatenated bytes at once

Benefits:

  • Stream large files without loading into memory

  • Hash complex data structures field by field

  • Constant memory use regardless of input size
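The incremental interface relies on one property: feeding bytes in several calls must yield the same digest as one call over their concatenation. The sketch below checks that property with a stand-in hasher shaped like the chapter's hashers (operator()(void const*, std::size_t) plus an explicit conversion to the result). It uses 64-bit FNV-1a, a non-cryptographic hash chosen only to keep the example self-contained; fnv1a64_hasher and hashAll are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in hasher with the same shape as the chapter's hashers.
// FNV-1a is NOT cryptographic; it only illustrates the streaming interface.
class fnv1a64_hasher
{
    std::uint64_t state_ = 0xcbf29ce484222325ULL;  // FNV offset basis

public:
    using result_type = std::uint64_t;

    // Absorb bytes; may be called any number of times
    void operator()(void const* data, std::size_t size) noexcept
    {
        auto const* p = static_cast<unsigned char const*>(data);
        for (std::size_t i = 0; i < size; ++i)
        {
            state_ ^= p[i];
            state_ *= 0x100000001b3ULL;  // FNV prime
        }
    }

    explicit operator result_type() const noexcept { return state_; }
};

// One-shot helper: hash a whole string in a single call
std::uint64_t hashAll(std::string const& s)
{
    fnv1a64_hasher h;
    h(s.data(), s.size());
    return static_cast<std::uint64_t>(h);
}
```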

Example: Hashing a transaction

Serializer s;
s.add32(HashPrefix::transactionID);
tx.add(s);  // Appends each field, one at a time, in canonical field order
            // (each value preceded by its type/field code)

return sha512Half(s.slice());

Hash Collisions: Why We Don't Worry

Birthday Paradox

The birthday attack on a 256-bit hash needs roughly:

Number of hashes to find a collision ≈ 2^(256/2) = 2^128

2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456

If you could compute 1 trillion hashes per second:
Time ≈ 2^128 / 10^12 seconds
     ≈ 3.4 × 10^26 seconds
     ≈ 1.1 × 10^19 years

(Universe age ≈ 1.4 × 10^10 years)

Conclusion: Collision attacks on SHA-512-Half are not feasible with current or foreseeable technology.
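The arithmetic above can be reproduced with a small helper. It uses the usual rough birthday estimate of 2^(n/2) evaluations for an n-bit output and a 365.25-day year; birthdayYears is a hypothetical name, and the 10^12 hashes-per-second rate is the same illustrative figure used above.

```cpp
#include <cassert>
#include <cmath>

// Rough birthday-bound estimate: about 2^(bits/2) hash evaluations before a
// collision becomes likely. Returns the time that takes, in years.
double birthdayYears(int outputBits, double hashesPerSecond)
{
    double const work = std::pow(2.0, outputBits / 2.0);  // ~2^(n/2) hashes
    double const secondsPerYear = 365.25 * 24 * 60 * 60;  // 31,557,600 s
    return work / hashesPerSecond / secondsPerYear;
}
```

birthdayYears(256, 1e12) comes out near 1.08 × 10^19 years. For comparison, the same formula with a 160-bit output, the size of an AccountID, gives roughly 38,000 years at that rate: still large, but a much thinner margin than 256 bits provides.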

Collision Resistance in Practice

// XRPL relies on collision resistance for:

// 1. Transaction IDs must be unique
uint256 txID = sha512Half(HashPrefix::transactionID, txBytes);

// 2. Ledger object keys must not collide
uint256 accountKey = sha512Half(std::uint16_t('a'), accountID);  // Account namespace

// 3. Merkle tree integrity (16-way inner nodes)
uint256 nodeHash = sha512Half(HashPrefix::innerNode, child0, /* ... */ child15);

A collision in any of these would be catastrophic, but the probability is negligible.

Performance Considerations

Hashing Speed

// Benchmark results (approximate, hardware-dependent):

SHA-512-Half: ~650 MB/s
SHA-256:      ~450 MB/s
RIPEMD-160:   ~200 MB/s

For a 1 KB transaction (1,024 bytes at ~650 MB/s):
SHA-512-Half: ~1.5 microseconds

Caching Hashes

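class SHAMapNode
{
private:
    mutable uint256 hash_;            // mutable: written by const getHash()
    mutable bool hashValid_ = false;  // Start with no cached value

    uint256 computeHash() const;      // Hashes node contents (expensive)

public: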
    uint256 getHash() const
    {
        if (hashValid_)
            return hash_;  // Return cached value

        // Compute hash (expensive)
        hash_ = computeHash();
        hashValid_ = true;
        return hash_;
    }

    void invalidateHash()
    {
        hashValid_ = false;  // Force recomputation next time
    }
};

Why cache?

  • Merkle tree nodes are hashed repeatedly

  • Caching avoids redundant computation

  • Invalidate when node contents change
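The cache-and-invalidate pattern can be sketched in isolation. CachedNode is hypothetical; FNV-1a stands in for sha512Half (it is not cryptographic), and a counter records how often the expensive path runs. The cache fields are mutable so a const getter can fill them. As written, the sketch is not thread-safe; concurrent readers would need synchronization.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

class CachedNode
{
    std::string data_;
    mutable std::uint64_t hash_ = 0;
    mutable bool hashValid_ = false;
    mutable int computeCount_ = 0;  // Instrumentation for the example

    // Stand-in for the expensive hash (FNV-1a, NOT cryptographic)
    std::uint64_t computeHash() const
    {
        ++computeCount_;
        std::uint64_t h = 0xcbf29ce484222325ULL;
        for (unsigned char c : data_)
        {
            h ^= c;
            h *= 0x100000001b3ULL;
        }
        return h;
    }

public:
    explicit CachedNode(std::string data) : data_(std::move(data)) {}

    // Compute on first use, then serve the cached value
    std::uint64_t getHash() const
    {
        if (!hashValid_)
        {
            hash_ = computeHash();
            hashValid_ = true;
        }
        return hash_;
    }

    // Any mutation must invalidate the cached hash
    void setData(std::string data)
    {
        data_ = std::move(data);
        hashValid_ = false;
    }

    int computeCount() const { return computeCount_; }
};
```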

Hash Function Summary

Function       Output Size   Speed (approx.)    Primary Use
SHA-512-Half   256 bits      ~650 MB/s          Transaction IDs, object keys, Merkle trees
SHA-256        256 bits      ~450 MB/s          Base58Check checksums
RIPEMD-160     160 bits      ~200 MB/s          Part of RIPESHA (address generation)
RIPESHA        160 bits      ≈ SHA-256 speed    Account IDs, node IDs

(RIPEMD-160 only processes the 32-byte SHA-256 digest, so RIPESHA throughput is essentially that of SHA-256.)

Best Practices

✅ DO:

  1. Use sha512Half for new protocols

    uint256 hash = sha512Half(data);  // Fast and standard
  2. Use hash prefixes for domain separation

    uint256 hash = sha512Half(HashPrefix::custom, data);  // "custom": placeholder for a new, unique prefix
  3. Cache computed hashes when appropriate

    if (cached)
        return cachedHash;
    cachedHash = sha512Half(data);
    return cachedHash;
  4. Use secure variant for sensitive data

    uint256 hash = sha512Half_s(secretData);

❌ DON'T:

  1. Don't use non-cryptographic hashes for security

    std::hash<std::string>{}(data);  // ❌ NOT SECURE
  2. Don't implement your own hash function

    uint32_t myHash(data) { /* ... */ }  // ❌ Don't do this
  3. Don't assume hashes are unique without checking

    // Even though collisions are infeasible, handle errors gracefully
    if (hashExists(newHash))
        handleCollision();  // Paranoid but correct

Summary

Hash functions in XRPL provide:

  1. Integrity: Detect any data modification

  2. Identification: Unique IDs for transactions and objects

  3. Efficiency: Fast computation on modern CPUs

  4. Security: Collision and preimage resistance

Key algorithms:

  • SHA-512-Half: Primary hash (fast on 64-bit systems)

  • RIPESHA: Address generation (compact, defense in depth)

  • SHA-256: Checksums (standard, well-tested)

Usage patterns:

  • Always use hash prefixes for domain separation

  • Cache hashes when recomputed frequently

  • Use secure variants for sensitive data

  • Trust collision resistance but code defensively

In the next chapter, we'll explore Base58Check encoding and how XRPL makes binary data human-readable.
