Why 99.9% Uptime Is Misleading
99.9% uptime is 8.76 hours of permitted downtime per year — a number that appears in almost no marketing material, because the percentage sounds better than the hours.
Overview
99.9% uptime means 0.1% downtime. In a year, 0.1% is 8.76 hours. In a month, it's 43.8 minutes. These hours can be distributed across hundreds of brief incidents or consumed in a single extended outage — the SLA treats them identically. A provider that delivers 99.9% by having two four-hour outages per year and a provider that delivers 99.9% by having 525 one-minute outages per year both meet the same contractual commitment. The user's experience of those two patterns is not equivalent.
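The arithmetic above can be sketched directly; the figures 8.76 hours and 43.8 minutes fall out of 0.1% of an 8,760-hour year and an average month of one twelfth of that:

```python
# Convert an uptime SLA percentage into the downtime budget it permits.
# Uses a 365-day year (8,760 hours) and an average month of year / 12.

def downtime_budget(sla_percent: float) -> tuple[float, float]:
    """Return (hours of permitted downtime per year,
    minutes of permitted downtime per average month)."""
    down_fraction = 1 - sla_percent / 100
    year_hours = down_fraction * 365 * 24
    month_minutes = year_hours * 60 / 12
    return round(year_hours, 2), round(month_minutes, 1)

for sla in (99.0, 99.9, 99.99):
    hours, minutes = downtime_budget(sla)
    print(f"{sla}% uptime -> {hours} h/year, {minutes} min/month")
```

Running this for 99.9% prints 8.76 h/year and 43.8 min/month; each additional nine shrinks the budget by a factor of ten.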
How to think about it
The uptime percentage is an average over a measurement period. It collapses the timing, duration, and distribution of downtime events into a single number that loses all of that information. The percentage doesn't tell you when the downtime occurred — whether it hit during peak traffic or at 3am on a Sunday. It doesn't tell you how concentrated the downtime was — one long event or many short ones. It doesn't tell you whether the downtime was scheduled or unexpected, partially degraded or completely unavailable.
Two services with identical uptime percentages can have dramatically different operational profiles. The percentage is a compliance metric. It describes whether the provider met its contractual minimum. It is not a reliable predictor of the reliability experience for any specific workload.
How it works
Providers calculate uptime by monitoring their infrastructure at defined intervals — often every minute or every five minutes — and checking whether the server responds to a network probe. Downtime is counted only for periods where the check fails continuously. A 30-second outage that falls between two 5-minute checks may not count against the SLA at all. The monitoring interval determines the resolution of the uptime measurement, and most SLA calculations are coarser than users assume.
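A toy simulation makes the resolution problem concrete. The probe interval and outage timings below are invented for illustration, not any provider's actual monitoring method:

```python
# Simulate a checker that probes every `interval` seconds and counts
# downtime only when a probe lands inside a real outage window.

def measured_downtime(outages, period, interval=300):
    """Downtime as a coarse monitor sees it: failed probes * interval.
    `outages` is a list of (start, end) second offsets of real outages."""
    failed_probes = sum(
        1
        for t in range(0, period, interval)
        if any(start <= t < end for start, end in outages)
    )
    return failed_probes * interval

# A real 30-second outage between two 5-minute probes: never detected.
print(measured_downtime([(120, 150)], period=3600))   # 0 seconds measured

# The same 30 seconds straddling a probe: counted as a full 5 minutes.
print(measured_downtime([(290, 320)], period=3600))   # 300 seconds measured
```

Both runs contain exactly 30 seconds of real downtime; the coarse measurement reports zero in one case and ten times the true figure in the other.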
Scheduled maintenance is excluded from most uptime calculations. The hours during which a provider takes infrastructure offline for planned upgrades don't count as downtime under the SLA. For providers that schedule frequent maintenance windows, this exclusion is significant — the actual availability experienced by users may be meaningfully lower than the advertised SLA percentage.
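With hypothetical figures (24 minutes of unplanned downtime plus 4 hours of excluded maintenance in a month), the gap between the SLA's view and the user's view is easy to put numbers on:

```python
# Compare availability as an SLA typically computes it (maintenance
# excluded from the measurement period) with availability as a user
# experiences it (all downtime counts). Figures are assumptions.

HOURS_PER_MONTH = 8760 / 12          # 730-hour average month

unplanned_hours = 0.4                # downtime that counts against the SLA
maintenance_hours = 4.0              # scheduled windows, excluded by the SLA

sla_view = 1 - unplanned_hours / (HOURS_PER_MONTH - maintenance_hours)
user_view = 1 - (unplanned_hours + maintenance_hours) / HOURS_PER_MONTH

print(f"SLA availability:         {sla_view:.4%}")   # ~99.94%, SLA met
print(f"Experienced availability: {user_view:.4%}")  # ~99.40%
```

Under these assumed numbers the provider comfortably meets a 99.9% SLA while the user experiences roughly half a percentage point less availability than advertised.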
Partial availability is rarely addressed by SLA frameworks. A server that is reachable but serving requests 10x slower than normal, or that is reachable on port 22 but not port 443, often passes the uptime check. The SLA says the server is up. The application is unavailable. These failure modes — common in practice — exist in a space the uptime SLA doesn't measure.
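The mismatch between check depth and application health can be contrasted with a small sketch. The check names, the latency budget, and the simulated server state are all hypothetical; no real monitoring API is being modelled:

```python
# Contrast a reachability check with an application-level check against
# a simulated server state.

from dataclasses import dataclass

@dataclass
class ServerState:
    responds_to_ping: bool
    open_ports: set
    http_latency_ms: float   # latency of a real application request

def sla_style_check(s: ServerState) -> bool:
    """The kind of check many uptime SLAs rest on: is the host reachable?"""
    return s.responds_to_ping

def app_level_check(s: ServerState, budget_ms: float = 1000) -> bool:
    """What the application needs: HTTPS open and responding within budget."""
    return 443 in s.open_ports and s.http_latency_ms <= budget_ms

# Reachable, ports open, but the application answers in 5 seconds.
degraded = ServerState(responds_to_ping=True, open_ports={22, 443},
                       http_latency_ms=5000)

print(sla_style_check(degraded))   # True:  "up" by the SLA's measure
print(app_level_check(degraded))   # False: unavailable in practice
```

The degraded state passes the shallow check and fails the deep one, which is exactly the gap the section describes.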
Where it breaks
The most uncomfortable version of this: a provider meets its 99.9% uptime SLA while delivering a service that is functionally unavailable during every business-hours traffic peak because the host is oversubscribed. The server passes the ping check. Response times are five seconds. Users leave. The SLA credit, if claimed, covers a fraction of the monthly compute fee. The damage is not covered.
99.9% is not a bad guarantee. It is a misunderstood one. The number describes one narrow property of the service — network reachability — and is routinely treated as a comprehensive reliability commitment. It isn't.
In context
Historical incident records are more informative than SLA percentages. Most providers publish status pages with incident history. Reading that history — frequency of incidents, duration, how incidents are communicated, how long resolution takes — gives a more accurate picture of what reliability looks like in practice than the advertised SLA number. A provider with a 99.9% SLA and a clean incident history is more reliable than a provider with a 99.99% SLA and frequent partial outages that don't technically violate the guarantee.
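Reading a status page systematically comes down to a few summary figures. The incident list below is invented for illustration; the reduction is the point:

```python
# Reduce an incident history to the figures worth comparing across
# providers: frequency, typical duration, and total downtime.

from statistics import median

# Duration in minutes of each incident over a 12-month window (made up).
incident_minutes = [12, 45, 3, 180, 7, 22]

incidents_per_month = len(incident_minutes) / 12
median_duration = median(incident_minutes)
total_downtime_h = sum(incident_minutes) / 60

print(f"incidents/month: {incidents_per_month:.2f}")   # 0.50
print(f"median duration: {median_duration:.0f} min")   # 17 min
print(f"total downtime:  {total_downtime_h:.1f} h")    # 4.5 h
```

Computed across two providers' real status pages, these three numbers say more about what to expect than either provider's advertised SLA percentage.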
Infrastructure redundancy matters more than SLA tier for applications with serious availability requirements. A single-server VPS at any SLA tier has a single point of failure. When that point fails, the application is down regardless of whether the failure counts against the SLA. Load-balanced architectures across multiple availability zones eliminate the single point of failure. The provider's SLA stops being the ceiling of achievable availability — the architecture is.
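The effect of redundancy on the availability ceiling follows from simple probability, under the strong assumption that instance failures are independent (real availability zones only approximate this):

```python
# With independent failures, a redundant system is down only when every
# instance is down simultaneously: availability = 1 - (1 - a)^n.

def combined_availability(per_instance: float, n: int) -> float:
    return 1 - (1 - per_instance) ** n

single = 0.999                               # one 99.9% instance
pair = combined_availability(single, 2)      # two load-balanced instances

print(f"single: {single:.4%}")   # 99.9000%
print(f"pair:   {pair:.4%}")     # 99.9999%
```

Two 99.9% instances behind a load balancer yield roughly 99.9999% in theory; correlated failures (shared network, shared region, shared deploy pipeline) violate the independence assumption, so treat the formula as an upper bound, not a promise.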
From understanding to decision
Instead of comparing SLA percentages: how often does this provider have incidents in their status page history? How long do incidents typically last? Are scheduled maintenance windows frequent? Does the provider's definition of 'downtime' align with what would actually affect the application? These questions are answerable from public information and are more predictive of the actual reliability experience than the SLA number.
© 2026 Softplorer