September 2, 2022

Decentralized Database — A Trustless System to Enable New Functionality

Luke Lamey

Centralized databases operate under a “Don’t be Evil” standard. Decentralized databases operate under a “Can’t be Evil” standard.

At Kwil, we say that we are building a decentralized database for data-intensive dApps and protocols. But what does a decentralized database offer that a traditional database does not? Here I discuss some of the differentiating features of a decentralized database, followed by a case study.

We at Kwil define a decentralized database as a permissionless, trustless, fault-tolerant network for storing data. Further, we ensure data parity across a series of nodes using modern consensus algorithms, allowing for the enforcement of both the authenticity and provenance of data writes. Whereas traditional databases used for Web 2.0 applications/services are owned and maintained by centralized entities, decentralized databases are owned, maintained, and funded by users and communities.

A core feature of a decentralized database is that it moves trust from a centralized entity back to individuals. Rather than trusting a single entity to validate data writes across a network, users are guaranteed valid data writes through consensus mechanisms across individual nodes. For data reads, applications can adopt common blockchain API and gateway service architectures, or leverage mechanisms like Space and Time’s Proof of SQL to ensure that valid data is not tampered with as it moves from the database back to the client. Companies using traditional databases have little option but to rely on internal policies to mitigate bad actors, application bugs, or simple mistakes made by DBAs.

New Functionality #1: Data Authenticity & Provenance

The first new functionality provided by a decentralized database is a cryptographic guarantee of data authenticity and provenance. Authenticity means that data can be proven to come from its rightful owner; provenance means that the data’s origin and history can be traced. Because nodes must come to a consensus on data writes, authorship, and order, anyone who uses the decentralized database can trust that the data is authentic and that its authorship and origin have not been corrupted.
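To make the idea of agreeing on writes, authorship, and order more concrete, here is a minimal sketch of a hash-chained write log. It is an illustration only, not a description of how Kwil works: the function names (`entry_hash`, `append_entry`, `verify_log`) and the record layout are assumptions. A real system would also need digital signatures to prove who authored each write, and this sketch leaves them out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(seq, author, payload, prev_hash):
    """Hash an entry's position, author, payload, and predecessor hash."""
    body = json.dumps(
        {"seq": seq, "author": author, "payload": payload, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, author, payload):
    """Append a write whose hash commits to its author and its place in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    seq = len(log)
    entry = {
        "seq": seq,
        "author": author,
        "payload": payload,
        "prev": prev_hash,
        "hash": entry_hash(seq, author, payload, prev_hash),
    }
    log.append(entry)
    return entry


def verify_log(log):
    """Return True only if every entry still matches its recorded hash and order."""
    prev_hash = GENESIS
    for seq, entry in enumerate(log):
        expected = entry_hash(seq, entry["author"], entry["payload"], prev_hash)
        if (
            entry["seq"] != seq
            or entry["prev"] != prev_hash
            or entry["hash"] != expected
        ):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an author, a payload, or the order of entries makes `verify_log` return `False` for anyone who rechecks the log.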

Improving authenticity and provenance opens up opportunities for traceability solutions that have historically required guarantees unavailable to traditional databases. Users will now be better able to understand the flow of information and even how data moves through entire ecosystems. A great use case for improved authenticity is supply chain management. For example, a food supplier could create a table on Kwil and give its supply chain participants permissionless write access to the table. When a food recall occurs, trustless data authenticity, together with an immutable (and verifiable) history of how that data evolved, enables manufacturers and their direct and indirect suppliers to trace the source of contamination easily and accurately, leading to faster remediation, corrective process improvements, and lower recall costs.
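As a rough illustration of the recall scenario, the sketch below walks upstream from a recalled lot through rows that participants might have written to a shared table. The table layout, lot identifiers, and company names are all hypothetical, and the rows live in a plain Python list rather than in any real database.

```python
from collections import deque

# Hypothetical rows written by different supply chain participants.
SHIPMENTS = [
    {"lot": "salad-77", "author": "PackCo", "inputs": ["lettuce-12", "dressing-4"]},
    {"lot": "lettuce-12", "author": "GreenFarm", "inputs": []},
    {"lot": "dressing-4", "author": "SauceWorks", "inputs": ["oil-9"]},
    {"lot": "oil-9", "author": "OilMill", "inputs": []},
]


def trace_upstream(rows, recalled_lot):
    """Return (lot, author) pairs for the recalled lot and everything upstream of it."""
    by_lot = {row["lot"]: row for row in rows}
    seen = set()
    queue = deque([recalled_lot])
    trail = []
    while queue:
        lot = queue.popleft()
        if lot in seen or lot not in by_lot:
            continue  # skip repeats and lots nobody recorded
        seen.add(lot)
        row = by_lot[lot]
        trail.append((lot, row["author"]))
        queue.extend(row["inputs"])  # follow each input lot one level further upstream
    return trail
```

Calling `trace_upstream(SHIPMENTS, "salad-77")` lists the recalled lot followed by every recorded input lot and the participant who wrote each row. The trace is only as trustworthy as the authorship of those rows, which is the guarantee the database is meant to supply.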

New Functionality #2: Tamperproof Data Writes

The second new functionality that a decentralized database provides is tamperproof data writes. Just as data authenticity allows verifiable proof of data ownership, tamperproof data writes allow verifiable proof that data has not been corrupted or illegitimately changed once it is stored in the database. Because of the consensus mechanism, data cannot be modified unless nodes agree that the modification is being made by a valid user. In contrast with centralized databases, this means that no single actor can illegitimately delete, edit, or otherwise disrupt the data.
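One simple way to picture why a single actor cannot quietly change data is to compare state digests across replicas. The sketch below assumes a plain strict-majority rule and in-memory replicas; real consensus protocols are considerably more involved, and the names `state_digest` and `find_divergent_nodes` are made up for this example.

```python
import hashlib
import json
from collections import Counter


def state_digest(rows):
    """Digest a replica's rows in a canonical order so equal states hash equally."""
    canonical = json.dumps(sorted(rows, key=lambda r: r["id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_divergent_nodes(replicas):
    """Return the names of nodes whose state differs from the strict-majority state.

    Raises ValueError when no single state is held by more than half of the nodes.
    """
    digests = {name: state_digest(rows) for name, rows in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority state")
    return sorted(name for name, d in digests.items() if d != majority_digest)
```

If one operator deletes or edits a row on its own node, that node's digest no longer matches the majority, so the change is visible to everyone else instead of silently becoming the truth.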

New Functionality #3: Censorship Resistance

In a decentralized database, nodes are distinguished by owner and geography. In theory, each node is owned by a different user or entity, and the nodes are spread across the globe. Because active database states are replicated across these geographically distributed nodes, no single actor can forcibly remove or censor data. If a company wishes to censor data, it cannot simply shut down its node, as active states of the database are running on other nodes that the company does not control. Similarly, if a government wishes to censor data, it cannot forcibly shut down the system, as many of the nodes operate outside its jurisdiction.

New Functionality #4: Disaster Recovery

A decentralized database also allows for quick disaster recovery. If a node crashes while serving a user, the user can simply connect to another node that is also serving the same data set. Database states can also be archived on permanent storage protocols like Arweave, so even in the unlikely event that all nodes serving a database crash, old states remain available. This means that even in the worst-case scenario, users can still query data revisions at any time, allowing them to see the history of changes, maintain consistent access to the data, and even recover data if needed. This is extremely useful in the case of a ransomware attack, where data owners and applications can restore old database states with minimal disruption of services.
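The recovery path described above can be sketched as a read that tries each node in turn and falls back to an archived snapshot. Everything here is a stand-in: nodes are modeled as plain Python callables, the archive is a dictionary rather than a real permanent storage protocol, and `NodeUnavailable`, `make_node`, and `read_with_failover` are hypothetical names.

```python
class NodeUnavailable(Exception):
    """Raised by a node when it cannot serve the request."""


def crashed_node(key):
    """Model a node that is offline."""
    raise NodeUnavailable("node is offline")


def make_node(data):
    """Model a healthy node that serves reads from its copy of the data."""
    def node(key):
        return data[key]
    return node


def read_with_failover(nodes, archive, key):
    """Try each node in turn, then fall back to the archived snapshot."""
    for node in nodes:
        try:
            return node(key)
        except NodeUnavailable:
            continue  # this node is down; try the next replica
    if key in archive:
        return archive[key]  # every node failed; serve the archived state
    raise KeyError(key)
```

With `nodes = [crashed_node, make_node({"r1": "Great tacos"})]`, a read for `"r1"` succeeds on the second node. If every node is down, the same call returns whatever the archive holds for that key.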

A Case Study

To illustrate the functionalities a decentralized database enables, let’s use the example of Yelp, an online platform where customers can rate and review local businesses. Numerous small businesses have accused Yelp of manipulating customer reviews. The FTC has received over 2,000 complaints about Yelp’s practices, including allegations that Yelp buries positive reviews and displays negative ones as a selling tool, forcing more businesses to pay for advertising. A simple Google search for “How to remove negative Yelp reviews” yields almost 8 million results, many of which are companies built around helping other businesses suppress negative reviews and promote positive ones, regardless of whether the Yelp filter determines that the business reviews are legitimate.

In a permissionless, open, and transparent internet, centralized entities should not be able to manipulate the flow of valid information. In the case of Yelp, users have no way of knowing whether the reviews they read are legitimate or have been arbitrarily boosted or suppressed because a business purchased Yelp ads or contracted a third party. In the Web 2.0 model, users simply have to trust that Yelp’s results are valid. No means exist to verify whether data is being tampered with.

For Yelp, a decentralized database means that a single company cannot manipulate or remove data. In a decentralized database, data is owned by a creator, and only the creator can modify his or her data. If a Yelp user were to write a review to a decentralized database, then only the Yelp user can modify/update/delete his or her review. Although Yelp could (and probably should) filter data from the decentralized database into the client (e.g. to omit spam), any user can validate that the data has not been manipulated or tampered with by checking the decentralized database. A decentralized database is an auditable system for anyone to be able to check and ensure that data is not being manipulated, deleted, or otherwise tampered with.
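The two properties in this example, creator-only modification and the ability of anyone to audit what an application displays, can be sketched with a toy in-memory store. This is an assumption-laden illustration: `ReviewStore` and `audit_client_view` are hypothetical names, and callers are identified by plain strings, whereas a real system would prove identity cryptographically.

```python
class ReviewStore:
    """Toy in-memory stand-in for a table where each row belongs to its creator."""

    def __init__(self):
        self._rows = {}

    def create(self, review_id, owner, text):
        if review_id in self._rows:
            raise ValueError("review already exists")
        self._rows[review_id] = {"owner": owner, "text": text}

    def update(self, review_id, caller, text):
        row = self._rows[review_id]
        if caller != row["owner"]:
            raise PermissionError("only the creator may modify this review")
        row["text"] = text

    def get(self, review_id):
        return dict(self._rows[review_id])  # return a copy so callers cannot edit rows

    def ids(self):
        return set(self._rows)


def audit_client_view(store, displayed):
    """Compare what an app displays against the store.

    Returns (altered, hidden): IDs whose displayed text differs from the stored
    text or that are missing from the store, and stored IDs the app omits.
    """
    altered = sorted(
        rid
        for rid, text in displayed.items()
        if rid not in store.ids() or store.get(rid)["text"] != text
    )
    hidden = sorted(store.ids() - set(displayed))
    return altered, hidden
```

An omitted review is not necessarily manipulation, since the application may legitimately filter spam. The point is that the omission is visible to anyone who runs the comparison, while an altered review is flagged outright.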

Yelp has an incentive to prove to its users that it is not manipulating data. Amid accusations of misconduct, Yelp has adamantly denied that businesses can purchase advertising to change, promote, or suppress reviews. If Yelp wishes to build confidence with businesses and prove that it is not manipulating data, then it has an incentive to leverage a decentralized database and allow users to validate the authenticity of its data. For a company like Yelp, a decentralized database is an excellent way to restore confidence between the application and its users.

Simply put, a decentralized database shifts trust from centralized entities back to the consumers.

Concluding Thoughts

For data-intensive dApps and protocols, the implications of a decentralized, community-owned database are significant. If the next generation of the internet is going to be open, transparent, and trustless, then a decentralized database is an essential infrastructure layer for returning data ownership to individuals. At Kwil, we could not be more excited to be providing this crucial infrastructure component for the future of Web 3.0.

Are you a developer interested in contributing to Kwil? Or do you have a project that could benefit from using Kwil? Join our community here: https://discord.gg/dH6rVQvD

Are you interested in making the leap and building with KwilDB full-time? We are hiring! Open roles can be found here: https://kwil.com/careers.