July 3, 2025
By Walrus Foundation

Decoding Red Stuff: Walrus's Engine for Resilient and Efficient Storage

Understanding the 2D erasure coding protocol at the heart of Walrus, and how it solves the trade-offs faced by existing decentralized storage systems.

  • Red Stuff, a two-dimensional (2D) erasure coding protocol that defines how data is converted for storage, is at the heart of Walrus, enabling efficient, secure, and highly available decentralized storage. 
  • Red Stuff enables Walrus to solve for the traditional trade-offs of decentralized storage, providing security, replication efficiency, and fast data recovery. 
  • By using a matrix-based encoding process to create primary and secondary slivers, Red Stuff enables lightweight "self-healing," which unlocks rapid data recovery using minimal network bandwidth.
  • For builders, Red Stuff’s innovation translates to a storage network that’s more cost-effective, performant, resilient, and scalable.

Walrus’s Red Stuff encoding protocol, which defines how data is converted for storage, is at the heart of how Walrus provides high availability and integrity for blobs at scale.

Centralized cloud storage, while performant, introduces single points of failure, censorship risks, and misaligned incentives. Decentralized storage networks aim to improve on this by providing a credibly-neutral system for persistent data, but they face a set of fundamental trade-offs in order to do so. 

Red Stuff allows Walrus to directly address the traditional trade-offs of decentralized storage with high security and efficient, uninterrupted availability, even in the case of storage node churn and outages. Overcoming these tradeoffs allows Walrus to match the performance of centralized cloud storage solutions without sacrificing the benefits of decentralization.

Understanding the trade-offs of decentralized storage

Unlike centralized cloud storage, where data is managed by a single provider’s infrastructure, decentralized storage systems distribute data across many independent nodes. This improves resilience and censorship-resistance. 

But, because storage nodes can leave the network or fail without warning (creating a ‘high churn’ environment), decentralization introduces new challenges in ensuring data remains available and durable. Availability can be achieved by replicating data across the network’s nodes; when individual nodes fail or go offline, the data they store can be rebuilt from other nodes’ redundant copies. 

The unique challenges of churn and replication mean that decentralized storage networks must weigh fundamental trade-offs between several priorities: 

  • Cost efficiency, which can be improved by minimizing replication overhead
  • Data durability and availability, which requires efficient data recovery, and the ability to continue operating correctly even when some storage nodes fail (fault tolerance)
  • Performance, including data retrieval latency and recovery speed 
  • Security and trust guarantees, which include maintaining a sufficiently distributed and verifiable set of storage nodes 

The method that a decentralized storage network uses to achieve data persistence in a high-churn environment directly impacts the trade-off between these priorities. Two commonly-used methods include full replication and one-dimensional (1D) erasure coding. 

  • Full replication involves storing multiple complete copies of the data across different storage nodes. While this approach simplifies data retrieval and recovery — a new node simply needs to download one full copy from a peer — it suffers from extremely high storage overhead. To achieve a high degree of security there must be many copies of the data stored across the network. This makes full replication prohibitively expensive for large-scale data storage.
  • 1D erasure coding, like Reed-Solomon (RS) encoding, offers a more space-efficient alternative. A file is split into data fragments, then ‘parity fragments’, or redundant pieces of data, are created mathematically from the original data fragments. The file can be reconstructed from a subset of the fragments, significantly reducing the amount of data that must be stored across the network to achieve the same level of security as full replication. However, 1D erasure coding has a critical weakness: to repair a single lost or corrupted fragment, a node must download enough fragments from its peers to reconstruct the entire file, a data transfer roughly equal to the original file size. In a dynamic network with frequent node churn, this high-bandwidth repair process becomes a bottleneck, limiting scalability and increasing operational costs.

The trade-offs between these common methods mean that most decentralized storage networks must choose between the high cost of full replication and the inefficient recovery of 1D erasure coding. 
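To make the repair-bandwidth problem concrete, here is a toy 1D erasure code in Python using a single XOR parity fragment. This is a deliberate simplification, not RS encoding; real codes tolerate many simultaneous losses. The key point it illustrates: rebuilding even one lost fragment requires downloading every surviving fragment, roughly the size of the whole original file.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode_1d(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor_bytes(parity, f)
    return frags + [parity]

def repair_1d(frags: list) -> bytes:
    """Rebuild the one missing fragment (marked None) by XOR-ing all the
    others. Note the cost: repairing a single fragment means downloading
    every other fragment, i.e. roughly the whole original file."""
    survivors = [f for f in frags if f is not None]
    out = survivors[0]
    for f in survivors[1:]:
        out = xor_bytes(out, f)
    return out
```

With `k = 4`, losing one of the five stored fragments forces a repairing node to fetch the remaining four, a transfer the size of the full file.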

Blobs and decentralized storage

Walrus is tailored to support binary large object (blob) storage, allowing builders to store, read, manage, and program large data and media files. Blob storage is valued for its flexibility, scalability, and ability to store traditionally hard-to-accommodate unstructured data. 

Blobs are particularly useful for handling large data files — “large object” is in their very name. When it comes to storing and retrieving blobs, the storage overhead required for full replication isn’t practical, and the inefficient recovery of traditional erasure coding can represent a significant recovery bottleneck. Neither of the most common decentralized storage replication methods sufficiently meets the needs of blob storage at scale.

Red Stuff: efficiency, without sacrificing recovery

One of Walrus’ core technical innovations is Red Stuff, a two-dimensional (2D) erasure coding protocol. Red Stuff is key to how Walrus can provide decentralized blob storage that’s highly redundant and secure, without sacrificing data recovery — even in the case of storage node churn and outages.

Red Stuff provides the storage efficiency of erasure coding, offering the same durability as full replication with a fraction of the replication overhead. It also solves the high-bandwidth recovery problem of 1D erasure coding methods like RS encoding, offering an efficient self-healing method. 

For a full technical overview of how Red Stuff’s novel 2D erasure coding method works, read the Walrus Whitepaper. At a high level, while 1D erasure encoding fragments the original data one way (one dimension), 2D erasure encoding fragments the original data in two ways (two dimensions), which allows for more granular and efficient data recovery. 

The steps Red Stuff follows to encode a data blob help illustrate that process: 

  1. Matrix formation: The original data blob is first organized into a matrix of rows and columns. This matrix forms the basis of the 2D structure.
  2. Primary encoding: Each of the columns in the initial matrix is treated as an independent data block. These columns are individually erasure-coded, which extends them into a larger, extended matrix. The symbols in each row of this extended matrix constitute a primary sliver.
  3. Secondary encoding: In parallel, the rows of the initial matrix are also treated as independent data blocks. Each of these rows is erasure-coded, which extends them into a larger, extended matrix. The symbols in each column of this extended matrix constitute a secondary sliver.
  4. Sliver pair distribution: Following the encoding, the protocol assigns a unique pair of slivers — one primary sliver and one secondary sliver — to each of the storage nodes in Walrus’ active committee.
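The encoding steps above can be sketched in Python. This is a toy model, assuming byte-sized symbols and a single XOR parity per dimension; the real protocol uses proper erasure codes with larger expansion factors and assigns sliver pairs across the full committee.

```python
def to_matrix(data: bytes, rows: int, cols: int) -> list[list[int]]:
    """Step 1: organize the blob into a rows x cols matrix of symbols."""
    padded = data.ljust(rows * cols, b"\0")
    return [list(padded[r * cols:(r + 1) * cols]) for r in range(rows)]

def xor_vectors(vectors: list[list[int]]) -> list[int]:
    out = [0] * len(vectors[0])
    for v in vectors:
        out = [a ^ b for a, b in zip(out, v)]
    return out

def encode_2d(data: bytes, rows: int, cols: int):
    m = to_matrix(data, rows, cols)
    # Step 2, primary encoding: code each column, extending the matrix
    # downward (here: one XOR parity row). Each row of the extended
    # matrix is a primary sliver.
    primary = m + [xor_vectors(m)]
    # Step 3, secondary encoding: code each row, extending the matrix
    # rightward (one XOR parity column). Each column of the extended
    # matrix is a secondary sliver.
    columns = [[m[r][c] for r in range(rows)] for c in range(cols)]
    secondary = columns + [xor_vectors(columns)]
    return primary, secondary
```

Step 4 would then hand each storage node one primary and one secondary sliver from these two lists.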

This two-dimensional structure is the key to Red Stuff's efficiency. A node's primary sliver contains encoded data from all columns of the original matrix, while its secondary sliver contains encoded data from all rows. This redundancy across two dimensions enables Red Stuff’s self-healing system, which makes data recovery highly efficient compared to 1D erasure encoding. 

Self-healing 

One of Red Stuff’s operational advantages is its capacity for efficient "self-healing," which underpins the protocol's resilience to node churn while ensuring that data remains available. 

Red Stuff’s 2D encoding structure minimizes the bandwidth required to reconstruct data, solving for the massive data transfer that is 1D erasure coding’s primary weakness. While recovery in 1D erasure coding requires downloading an amount of data proportional to the entire file, the amount of data a node needs to download in a 2D erasure coding system is proportional to the size of a single sliver, making recovery lightweight and scalable. 

There is a separate process for recovering each of the slivers in a storage node’s pair: 

  • Secondary sliver recovery: A node that has crashed and come back online, or a new node joining the active committee, can recover its assigned secondary sliver first. It does so by querying just 1/3 of the other nodes. Because 2/3 of nodes hold the data slivers for each blob, the recovering node is guaranteed to find enough peers to respond and successfully reconstruct its secondary sliver.
  • Primary sliver recovery: Recovering a primary sliver requires a higher threshold of 2/3 peer responses, but the process is similar.
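Continuing the toy single-parity model (an illustration only; real Red Stuff recovery combines symbols from both dimensions under threshold codes and quorum rules), a lost primary sliver can be rebuilt one symbol per column. Each peer contributes a single symbol per column, so the total download is proportional to one sliver, not the whole blob.

```python
def recover_primary_sliver(extended_rows: list) -> list[int]:
    """extended_rows: the column-extended matrix with exactly one row
    marked None (the lost primary sliver). With a single XOR parity row,
    the missing row is the XOR of all surviving rows, computed one
    symbol per column."""
    survivors = [r for r in extended_rows if r is not None]
    recovered = [0] * len(survivors[0])
    for row in survivors:
        recovered = [a ^ b for a, b in zip(recovered, row)]
    return recovered
```

Contrast this with 1D repair, where the recovering node must pull down enough fragments to rebuild the entire file before re-deriving its own piece.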

This lightweight recovery makes the Walrus network highly resilient to node churn and makes onboarding new nodes operationally viable without congesting the network.

Additional Red Stuff benefits: data integrity and fault tolerance

The benefits that Red Stuff brings to Walrus as its encoding protocol expand beyond solving the decentralization trade-off between security, replication efficiency, and efficient data recovery.

Data integrity 

To ensure data integrity and defend against malicious nodes who might provide corrupted or inconsistently-encoded data, Red Stuff authenticates data structures through a vector commitment process:

  • Sliver commitments: For each of the primary slivers and secondary slivers generated during encoding, a cryptographic vector commitment is computed. The commitment can be later used to prove that a specific value exists at a specific position in that sliver. This allows any party to trustlessly verify that data exists in a sliver without needing the entire sliver.   
  • Blob commitment: After generating commitments for all individual slivers, a final blob commitment is generated over the list of all the sliver commitments. This single, top-level commitment acts as a unique and verifiable fingerprint for the entire encoded state of the blob.   
  • Blob ID generation: The globally unique identifier for the blob is created by cryptographically hashing the blob commitment along with other relevant metadata.   
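The commitment chain above can be sketched as follows, using SHA-256 and a Merkle-style fold as a stand-in for the actual vector commitment scheme. The hash choices and metadata format here are assumptions for illustration, not Walrus’s exact construction.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sliver_commitment(symbols: list[bytes]) -> bytes:
    """Merkle-style fold over a sliver's symbols: a simplified stand-in
    for a vector commitment supporting per-position proofs."""
    leaves = [h(s) for s in symbols]
    while len(leaves) > 1:
        if len(leaves) % 2:  # duplicate the last leaf on odd levels
            leaves.append(leaves[-1])
        leaves = [h(leaves[i] + leaves[i + 1])
                  for i in range(0, len(leaves), 2)]
    return leaves[0]

def blob_commitment(sliver_commitments: list[bytes]) -> bytes:
    """Top-level commitment over the list of all sliver commitments."""
    return h(b"".join(sliver_commitments))

def blob_id(commitment: bytes, metadata: bytes) -> bytes:
    """Globally unique identifier: hash of the blob commitment plus metadata."""
    return h(commitment + metadata)
```

Any change to a single symbol in any sliver changes that sliver’s commitment, the blob commitment, and the blob ID, which is what lets readers reject corrupted or inconsistently-encoded data.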

This commitment process provides end-to-end verifiability throughout the data lifecycle on Walrus. When a storage node receives its sliver pair, it can verify that the slivers are consistent with the publicly known commitments. Similarly, when a reader retrieves slivers from the network, it can verify their authenticity against the commitments before reconstructing the data. 

This prevents attacks where a malicious node might serve invalid data, as the invalid data would fail verification against the original writer's commitment.

Fault tolerance

Another unique feature of the Red Stuff protocol is the different quorum thresholds of honest and active storage nodes that it requires to operate. These quorums support Walrus' security, liveness, and fault tolerance. 

In order to write a data blob to Walrus, a quorum of 2/3 of the storage nodes in Walrus’ active set must acknowledge receiving the blob’s sliver pairs, verifying their sliver commitments. But, when reading and retrieving a blob, the client only needs to collect and verify slivers from a 1/3 quorum of nodes. 

As opposed to the 2/3 quorum of nodes Red Stuff requires to write, Red Stuff’s lower 1/3 read quorum makes reads on Walrus extremely resilient. The use of differing quorums is mirrored in Red Stuff’s self-healing process, which requires a 1/3 quorum to recover a secondary sliver and a 2/3 quorum for primary sliver reconstruction. 

By defining different quorums, Red Stuff allows Walrus to have highly efficient recovery operations among honest nodes, as it requires fewer participants. It sets a higher bar for critical operations, like data reconstruction and storage proofs, guaranteeing that any successful operation has been validated by a sufficient number of honest participants.
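These thresholds follow standard Byzantine fault tolerance arithmetic. A minimal sketch, assuming a committee of n nodes tolerating up to f faulty ones with n ≥ 3f + 1 (the exact committee parameters are an assumption here, not Walrus’s configuration):

```python
def quorum_sizes(n: int) -> dict[str, int]:
    """Quorum sizes for an n-node committee, assuming up to f Byzantine
    nodes with n >= 3f + 1 (the standard BFT setting)."""
    f = (n - 1) // 3
    return {
        "faults_tolerated": f,
        "write_quorum": 2 * f + 1,  # ~2/3: nodes acknowledging sliver pairs
        "read_quorum": f + 1,       # ~1/3: verified slivers needed to read
    }
```

For example, a 10-node committee tolerates 3 faults, needs 7 acknowledgments to complete a write, and only 4 verified slivers to serve a read.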

You can learn where generating commitments, creating a blob ID, and reading and writing a blob each sit within the lifecycle of storing a blob on Walrus in our post here.

Get started with Walrus

The Red Stuff protocol is what allows Walrus to solve for the traditional trade-offs of decentralized storage. For builders, Red Stuff’s innovation translates to a storage network that is more:

  • Cost-effective, achieving high security with a low storage overhead that greatly improves on full replication models.
  • Performant, with high availability and integrity for blobs at scale, allowing Walrus to serve as a true alternative to traditional cloud storage.
  • Resilient and scalable, with its lightweight self-healing mechanism enabling the network to seamlessly manage node churn without high bandwidth costs, ensuring data is continuously available.

Alongside Walrus’ other core technical innovations, including its data programmability, these advantages make Walrus a next-generation data storage platform for building decentralized applications. By representing Walrus-stored blobs as objects on Sui, developers can use Move smart contracts to interact with and manage their data in novel ways, like automating blob lifecycle management, developing dynamic interactions between onchain and offchain data, and allowing for onchain data verification.

Learn about Walrus and check out the Walrus documentation to start building with Walrus today! Need more inspiration? Explore the Awesome Walrus repo for a curated list of developer tools and infrastructure projects in the Walrus ecosystem.