tl;dr — By standard tech industry measures of uptime, the availability of common web2 infra currently exceeds that of common web3 infrastructure like RPCs. Even a small difference in downtime is consequential in highly available production-grade systems. Increasing the availability of RPC will improve adoption of web3.
As mentioned in Developer Web3 RPC Woes and The Web3 RPC Problem, unreliable RPC is a major pain point for web3 developers. Unreliable RPC endpoints lead to mission-critical unavailability incidents, poor user experience and usability, and existential-threat-level business risk. There are ample examples in web3 of RPC endpoints timing out or disappearing at the height of need.
Examples abound:
Web3 is no stranger to these sorts of issues, and RPC is a major contributor to downtime in web3 at the moment. In the next section, we investigate uptime and the difference in scale made by even minute changes in downtime.
We define uptime as the amount of time a server or service stays operational and accessible to users, measured in units of time. Downtime is therefore any measurable time during which a service is offline and inaccessible to users, for any reason. Availability is calculated by subtracting the total amount of downtime from the ideal uptime and dividing the result by the ideal uptime; it is usually calculated over longer periods of time, where it is more observable. Availability, here, is the ratio of actual uptime to ideal uptime, represented as a percentage.
A simple pseudo-code example demonstrates this in terms of one month (or 43,830 minutes). Just 60 minutes of monthly downtime, a mere one hour, knocks us all the way down below 99.9% availability:
ideal_uptime = 43,830
downtime = 60
availability = (ideal_uptime - downtime) / ideal_uptime * 100
availability ≈ 99.863%
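The same calculation can be sketched as a small Python function. The names availability_percent and MINUTES_PER_MONTH are illustrative choices, not part of any library, and the month length assumes an average month of 365.25 / 12 days.

```python
# Average month in minutes: 365.25 days / 12 months * 1,440 minutes per day.
MINUTES_PER_MONTH = 43_830


def availability_percent(downtime_minutes: float,
                         period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Return the share of the period spent operational, as a percentage."""
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    return (period_minutes - downtime_minutes) / period_minutes * 100


if __name__ == "__main__":
    # One hour of downtime in an average month.
    print(f"{availability_percent(60):.3f}%")  # prints 99.863%
```

Passing a different period_minutes (for example, a quarter or a year) lets the same function measure availability over longer windows.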
Here, we treat 99.9% (AKA “three nines”) as a minimal threshold: something that we should look for as the bare minimum in any system we call ‘highly available.’ However, the real prize is at 99.999%.
The difference between 99.9% and 99.999% availability might seem minimal to our non-computational eyes, but it can have a significant impact over a stretch of time on a highly available system. For instance, 99.9% availability corresponds to approximately 43.8 minutes of downtime per month, while 99.999% availability translates to only about 26 seconds of downtime per month. That is quite different, after all! The compounding effects of an occasionally unavailable system are more apparent when they are projected over an entire year.
Let’s take a look at a simple chart which illustrates this more clearly below:
As we can see, the effects compound quickly. Making a system more available by less than a tenth of a percentage point results in a significant reduction in downtime. To put this into perspective, “five nines” (99.999% availability) means an app has no more than roughly 79 seconds of downtime per quarter. With only a few seconds of downtime every few months, our “five nines” figure is widely considered the hallmark of reliability.
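For readers who want to reproduce these figures, here is a minimal sketch that converts an availability target into a downtime budget. The function name downtime_budget_seconds and the PERIODS table are assumptions made for illustration, the 99.99% row is included only as an intermediate point, and period lengths assume an average year of 365.25 days.

```python
# Average year in minutes (365.25 days); an assumption for illustration.
MINUTES_PER_YEAR = 365.25 * 24 * 60

# Period lengths in minutes, derived from the average year.
PERIODS = {
    "month": MINUTES_PER_YEAR / 12,
    "quarter": MINUTES_PER_YEAR / 4,
    "year": MINUTES_PER_YEAR,
}


def downtime_budget_seconds(availability_percent: float,
                            period_minutes: float) -> float:
    """Maximum downtime, in seconds, permitted by an availability target."""
    return period_minutes * (1 - availability_percent / 100) * 60


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        row = ", ".join(
            f"{name}: {downtime_budget_seconds(target, minutes):,.0f} s"
            for name, minutes in PERIODS.items()
        )
        print(f"{target}% -> {row}")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the gap between three nines and five nines grows so visible over a full year.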
Unfortunately, most of web3, and especially web3 RPC, has not made it to this enshrined figure. Major RPC service providers only offer 99.9% availability in their Service-Level Agreements (SLAs). We know that downtime can be caused by hardware failure, infrastructure disruptions, and software glitches caused by configuration errors or bugs. Ironically and very importantly, a great deal of RPC downtime is simply caused by excessive traffic. With centralized service providers offering a 99.9% availability SLA, we’re still off to the races to find a solution that can meet the height of RPC demand. With all this talk of centralization, redundancy, and distributed systems, one would dream of a system experiencing zero downtime. It is a dream to be dreamt… Now, what about the various endpoints that are available publicly all over the place? Don’t they indicate ubiquitous blockchain uptime and availability?
Public RPC endpoints are a beautiful notion. They give an alternative to self-hosting and operating overnight DevOps. They provide ‘good enough’ RPC services to otherwise unserviced developers. They even prove the altruism of the open-source web3 community. Unfortunately, they alone are generally not enough to serve the needs of expanding applications and services. While centralized RPC service providers offer a usable 99.9% SLA as standard, public RPC endpoints offer no promise of availability or uptime. Centralization offers some benefit of surety but comes at a cost. And with regard to unpaid public RPC, unfortunately, as the time-honored Robber-Baron-Age adage goes, “you get what you pay for!”
Public RPC can go down, be overwhelmed, or just up and disappear. In many or most cases, there is no SLA between users and operators. Once an application or service reaches scale beyond hobbyist or testing levels, RPC endpoints often fail due to insufficient resources. Paid, private RPC providers are generally prepared for this load and financially incentivized to upscale their resources, but unpaid, public RPC endpoints must do things to protect themselves — such as rate-limit users. Public RPC is great in many circumstances. However, when a project is production-grade, it requires a high level of uptime and reliability to ensure that it continuously delivers value to its users. Reaching ‘high availability’ is something that Public RPC alone cannot do.
In discussing highly reliable RPC, it’s important to note that there are things node operators and RPC providers can do to maximize uptime and increase the availability of their service. Some of the most crucial are catalogued here:
Fortunately, redundant, distributed and decentralized networks such as Lava can ease the pain and awkwardness of implementing all of these by incentivizing independent RPC providers to offer service. It is our vision that the five-nine future is not far away…
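To see why redundancy matters for the five-nines goal, consider a back-of-the-envelope sketch. It rests on two strong assumptions that real networks only approximate: providers fail independently of one another, and a client fails over instantly to any provider that is still up. Under those assumptions, the service is down only when every provider is down at once. The function names below are hypothetical and do not describe how Lava or any other network computes availability.

```python
def combined_availability(single_percent: float, providers: int) -> float:
    """Availability of N redundant providers, assuming independent failures
    and instant failover (both are simplifying assumptions)."""
    if providers < 1:
        raise ValueError("at least one provider is required")
    failure_probability = 1 - single_percent / 100
    # The service is unavailable only if all providers fail simultaneously.
    return (1 - failure_probability ** providers) * 100


def providers_needed(single_percent: float, target_percent: float,
                     max_providers: int = 20) -> int:
    """Smallest number of redundant providers that meets the target."""
    for count in range(1, max_providers + 1):
        if combined_availability(single_percent, count) >= target_percent:
            return count
    raise ValueError("target not reachable within max_providers")


if __name__ == "__main__":
    print(f"{combined_availability(99.9, 2):.4f}%")  # two three-nines providers
    print(providers_needed(99.9, 99.999))
```

In this idealized model, two independent 99.9% providers already clear the five-nines bar. Correlated outages, such as a shared cloud region or a traffic spike that hits every endpoint at once, reduce the benefit in practice.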
KagemniKarimu is the current Developer Relations Engineer for Lava Network and former Developer Relations at Skynet Labs. He’s a self-proclaimed Rubyist, new Rust learner, and friendly Web3 enthusiast who entertains all conversations about tech. Follow him on Twitter or say hi to him on Lava’s Discord where he can be found lurking.
Lava is a decentralized network of top-tier API providers, where developers make one subscription to access any blockchain. Providers are rewarded for their quality of service, so your users can fetch data and send transactions with maximum speed, data integrity and uptime. Pairings are randomized, meaning your users can make queries or transact in privacy.
We help developers build web3-native apps on any chain, while giving users the best possible experience.