Data accuracy is an overlooked nuisance in Web3. It plagues RPC calls and all blockchain interactions. Incorrect data from blockchains can lead to incorrect token balances, erroneous smart contract execution, compromised reputation systems, failed transactions, and misleading price information on decentralized exchanges. The problem could be exacerbated by the rise of AI…
According to Hackernoon, “For the simple reason that they were giving their users contradictory information, a popular NFT team lost tens of thousands of subscribers.” The story is repeated elsewhere, as Alchemy’s data accuracy article similarly exclaims that frontend bugs caused by unreliable data led to a whopping $2,000,000 loss!
By now, we hope you know a little bit about RPC. In my previous article on Web3 RPC Developer Woes, we talked about how RPC sometimes returns inaccurate, invalid, and wildly stale data. Now, I want to talk about how and why we ensure the data we receive actually comes from the blockchain, and the ways in which unreliable data can manifest as costly mistakes.
There are problems that everyone has but nobody thinks about and problems that few people have but everyone thinks about. One of the hidden nuisances when using RPC is data verifiability, which refers to the ability to ensure that the data being exchanged is authentic and accurate. It is a problem that few people think about, but everyone has experienced — in one way or another — and which can have profound consequences.
Because these issues do not generate error messages, they are difficult to catch. When they do happen, and you are unclear about the source of the problem, you can spend hours debugging, trying to pin down why your calls produce undesired outputs. In the meantime, your application can go haywire, producing unexpected behavior.
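One practical defense is simple redundancy: issue the same query to more than one independent RPC endpoint and compare the answers before trusting either. Below is a minimal TypeScript sketch of the idea, assuming Node 18+ (for the built-in fetch) and hypothetical provider URLs; it is an illustration, not Lava's implementation.

```typescript
// A minimal sketch: send the same query to two independent RPC endpoints
// and compare the answers before trusting either. The URLs are placeholders.
const ENDPOINTS = [
  "https://rpc-provider-a.example.com", // hypothetical provider A
  "https://rpc-provider-b.example.com", // hypothetical provider B
];

// Generic JSON-RPC call using the built-in fetch (Node 18+ or the browser).
async function rpcCall(url: string, method: string, params: unknown[]): Promise<unknown> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  const body = (await res.json()) as { result?: unknown; error?: unknown };
  if (body.error) throw new Error(`RPC error from ${url}: ${JSON.stringify(body.error)}`);
  return body.result;
}

// Fetch a balance from both providers and flag any disagreement,
// which usually points to stale or inaccurate data on one of them.
async function crossCheckedBalance(address: string): Promise<string> {
  const [a, b] = await Promise.all(
    ENDPOINTS.map((url) => rpcCall(url, "eth_getBalance", [address, "latest"]))
  );
  if (a !== b) {
    throw new Error(`Providers disagree (${a} vs ${b}): treat the data as suspect`);
  }
  return a as string;
}
```

Results at the "latest" tag can legitimately differ for a moment while one provider lags a block behind, so pinning the comparison to a specific block number makes it stricter. More importantly, agreement between two providers raises confidence but proves nothing cryptographically, which brings us to the root of the problem.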
Ultimately, this happens because RPC in the blockchain world is accomplished through peer-to-peer connections, which makes it difficult to prove the veracity of the data received. In most cases, the data itself is not stored on chain, but its metadata is. Therefore, in the majority of cases, only the metadata of a remote call can be verified.
Metadata, in this context, refers to data that provides information about the structure and properties of the actual data being exchanged. Most commonly, this includes a basic “who, what, and when” that excludes the content of the request.
Our fren Balaji points out that this is accomplished primarily by recording three factors on-chain: 1) a digital signature from the authorizing parties, 2) a hash of the high-visibility data exchanged, and 3) a timestamp of the transaction. While metadata can provide some level of confidence in the data's integrity, it alone is often not sufficient to ensure that the exchanged data itself has not been tampered with. The issue of data verifiability remains.
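To make this concrete, here is a rough sketch of what checking those three pieces of metadata against a received payload might look like. It assumes ethers v6 and a hypothetical on-chain record containing the signer's address, a data hash, and a timestamp; it is an illustration of the idea, not a reference implementation.

```typescript
import { keccak256, toUtf8Bytes, verifyMessage } from "ethers"; // assumes ethers v6

// Hypothetical shape of the metadata recorded on-chain for an exchange of data.
interface OnChainRecord {
  signer: string;    // address of the authorizing party
  dataHash: string;  // keccak256 hash of the exchanged payload
  timestamp: number; // unix time the record claims
  signature: string; // signature over the hash by the authorizing party
}

// Check the payload we actually received against the on-chain metadata.
function verifyAgainstMetadata(payload: string, record: OnChainRecord): boolean {
  // 1) Hash check: does the payload hash to the committed value?
  const computedHash = keccak256(toUtf8Bytes(payload));
  if (computedHash !== record.dataHash) return false;

  // 2) Signature check: was that hash signed by the claimed party?
  const recovered = verifyMessage(computedHash, record.signature);
  if (recovered.toLowerCase() !== record.signer.toLowerCase()) return false;

  // 3) Timestamp check: is the record recent enough for our use case?
  const ageSeconds = Date.now() / 1000 - record.timestamp;
  return ageSeconds < 60 * 60; // arbitrary freshness window of one hour
}
```

Even when all three checks pass, they only prove that the payload matches what was committed to and who committed it; they say nothing about whether that data was accurate in the first place.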
So what happens if data is inaccurate, you ask? Inaccurate data can cause a whole host of issues, from the incorrect balances, failed transactions, and misleading prices mentioned earlier to broken reputation systems and faulty contract execution.
Beyond avoiding the pitfalls listed above, there are second-order, ideological reasons why verifying data can be important:
One of the core principles of Web3 is the concept of trustlessness, which means that users and participants in the network do not need to rely on a central authority or trust any single entity in order to operate. Data verifiability plays a key role in ensuring trustlessness, as it allows users to independently validate the authenticity and accuracy of the data being exchanged within the network. This empowers users to have confidence in the system without having to rely on a centralized governing entity.
In a decentralized ecosystem like Web3, ensuring the security of data is of utmost importance, as it forms the basis of transactions, smart contracts, and other interactions between users and applications. Beyond this, users will not use an application that appears risky, has known exploits, or leaks critical data. Verifying data helps protect against malicious actors who may attempt to manipulate or tamper with the data to gain undue advantage, compromise the network, or conduct fraudulent activities.
Web3 envisions a highly interconnected and interoperable landscape of decentralized applications, platforms, and protocols. Verifying data is crucial in enabling seamless communication and collaboration between different components in the Web3 ecosystem. Ensuring data verifiability allows developers to build reliable applications and services that can exchange data across various networks and platforms without compromising the integrity and accuracy of the information. This promotes greater innovation, efficiency, and user experience in the Web3 space.
One last point: as we move toward a world increasingly dominated by artificial intelligence (AI), data verifiability will become even more critical. Some have even said that AI without verifiable data will be disastrous. Undoubtedly, the performance and reliability of AI systems depend on the quality and authenticity of the data they process. Ensuring data verifiability in AI-driven systems will be crucial to prevent biases, misinformation, and manipulation. Furthermore, a lack of data accuracy can have a profound impact on independent actors in monetized systems! That is to say, on Web3 users in cryptonetworks!
Moreover, one must consider how profoundly different a world with powerful AI will be from a data perspective. Basic tasks such as data entry and dummy data generation will be performed at unprecedented speeds. The sheer amount of data in interoperable systems will unavoidably increase. This creates new attack vectors and new considerations for verifiability, if, for example, AI becomes clever enough to generate patterned data that appears “correct” to existing systems of verification. On the other hand, one must also consider that AI can be a valuable tool for defense and can enable more sophisticated data verification technologies.
All this to say, there is a unique maelstrom of potential unleashed the moment AI meets trustless, permissionless, decentralized networks and does its thing! Data verifiability, a mere nuisance now, will only grow in importance over time, and our methods for achieving it in Web3 will need to become cheaper, more robust, and more performant!
That’s all for now… 💪🏿 In an upcoming article, we will delve deeper into the challenges and potential solutions for data verifiability in the age of AI. We will explore how cutting-edge technologies such as homomorphic encryption, zero-knowledge proofs, and other techniques can contribute to improving data verification. Sleep tight and don’t trust, verify 😉.
KagemniKarimu is currently a Developer Relations Engineer for Lava Network and was formerly in Developer Relations at Skynet Labs. He’s a self-proclaimed Rubyist, new Rust learner, and friendly Web3 enthusiast who entertains all conversations about tech. Follow him on Twitter or say hi on Lava’s Discord, where he can be found lurking.
Lava is a decentralized network of top-tier API providers, where developers make one subscription to access any blockchain. Providers are rewarded for their quality of service, so your users can fetch data and send transactions with maximum speed, data integrity, and uptime. Pairings are randomized, meaning your users can make queries or transact in privacy.
We help developers build web3-native apps on any chain, while giving users the best possible experience.