Backups and Recovery Strategies on VPS
A backup strategy is not a backup — it is the combination of a backup and a tested recovery process, and the second half is where most strategies fail.
Overview
A server is lost. The team goes to restore. The backup files exist — a directory full of .tar.gz files with dates in the names, created by the automated backup script someone set up at launch. The oldest one is nine months old and works. The newest one is from last Tuesday and is corrupt. Every backup between those two dates is missing or corrupted because the backup disk ran out of space three months ago and the backup job started failing silently. Nine months of data, gone. The backup system was running. It had not been working.
How to think about it
A backup is not the script or tool that creates archive files. It is the complete system: the creation process, the verification that created archives are valid, the transfer to a location that survives the failure scenario it's protecting against, the retention policy that keeps enough history without accumulating indefinitely, and the tested recovery process that proves the backup can be used. A backup system with any of these components missing is not a backup system — it is a backup job with gaps that will surface at the worst possible time.
How it works
Backup scope is the first decision: what needs to be backed up. Application databases are usually the most critical — they contain data that cannot be reconstructed. Application files may be reconstructible from version control if the deployment process is reproducible. Configuration files need to be backed up if they're managed on the server rather than in version control. Server snapshots capture everything but are large and expensive to store frequently. Understanding what is irreplaceable determines what needs to be in the backup.
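As a concrete sketch of the creation step, here is what a dump of the irreplaceable data might look like, assuming a PostgreSQL database. The database name appdb and the directory /var/backups/db are illustrative, and pg_dump would be swapped for the equivalent tool on another database.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

BACKUP_DIR = Path("/var/backups/db")  # hypothetical backup directory
DB_NAME = "appdb"                     # hypothetical database name

def dump_database() -> Path:
    """Create a timestamped, compressed dump of the critical database."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = BACKUP_DIR / f"{DB_NAME}-{stamp}.dump"
    # pg_dump's custom format (-Fc) is compressed and supports
    # selective restore of individual tables later.
    subprocess.run(
        ["pg_dump", "-Fc", "--file", str(target), DB_NAME],
        check=True,  # raise immediately instead of failing silently
    )
    return target
```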
Backup destination must survive the failure scenario the backup is protecting against. A backup on the same disk as the data it protects doesn't survive disk failure. A backup on the same server doesn't survive server compromise. A backup at the same provider doesn't survive provider-level failures — rare, but not theoretical. The 3-2-1 rule provides a baseline: three copies of the data, on two different media types, with one copy offsite. For VPS, this typically means: the live data, a backup at the provider (snapshots), and a backup at a different location (object storage at a different provider, local storage, or another cloud).
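Continuing the sketch, the offsite leg of the 3-2-1 rule can be as simple as copying each dump to S3-compatible object storage at a different provider. The endpoint URL and bucket name below are placeholders; any S3-compatible store works the same way through boto3, with credentials supplied via the environment.

```python
import boto3
from pathlib import Path

def upload_offsite(dump_path: Path) -> None:
    """Copy one dump to object storage at a different provider."""
    s3 = boto3.client(
        "s3",
        endpoint_url="https://objects.other-provider.example",  # placeholder
    )
    # upload_file(filename, bucket, key); the key keeps the timestamped name.
    s3.upload_file(str(dump_path), "example-backups", f"db/{dump_path.name}")
```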
Backup frequency determines the recovery point objective — the maximum amount of data that can be lost in the worst case. Daily backups mean up to 24 hours of data loss. Hourly database backups mean up to one hour. The right frequency depends on how quickly data changes and how much loss is acceptable. A blog that publishes once a week tolerates daily backups. A transactional database processing hundreds of records per hour does not.
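The frequency question reduces to simple arithmetic. A minimal worked example, using the transactional workload above as an illustrative change rate:

```python
def records_at_risk(records_per_hour: float, interval_hours: float) -> float:
    # Worst case: the failure lands just before the next backup runs,
    # losing everything written since the previous one completed.
    return records_per_hour * interval_hours

print(records_at_risk(200, 24))  # daily backups:  4800 records at risk
print(records_at_risk(200, 1))   # hourly backups:  200 records at risk
```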
Recovery testing is the step most backup strategies skip and the one that makes all the others meaningful. A backup that has never been successfully restored to a working state is a file that looks like a backup. Testing the restore process — on a non-production server, on a schedule, with verification that the restored state is correct — is the only way to confirm that the backup system works before it's needed for an actual recovery.
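A restore test can be automated end to end. The sketch below assumes the custom-format pg_dump archive from earlier, a throwaway database on a non-production host, and a known table to sanity-check (users here is hypothetical). The point is that it verifies the restored state, not just the restore command's exit code.

```python
import subprocess

SCRATCH_DB = "restore_test"  # hypothetical throwaway database

def test_restore(dump_path: str) -> None:
    """Restore a dump into a scratch database and verify the result."""
    subprocess.run(["dropdb", "--if-exists", SCRATCH_DB], check=True)
    subprocess.run(["createdb", SCRATCH_DB], check=True)
    subprocess.run(["pg_restore", "--dbname", SCRATCH_DB, dump_path], check=True)
    # Verify the restored state is usable, not merely that pg_restore
    # exited cleanly: count rows in a table that should never be empty.
    result = subprocess.run(
        ["psql", "-d", SCRATCH_DB, "-tAc", "SELECT count(*) FROM users;"],
        check=True, capture_output=True, text=True,
    )
    assert int(result.stdout.strip()) > 0, "restored database looks empty"
```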
Where it breaks
Backup jobs fail silently more often than users expect. Disk quotas fill and the backup job's output has nowhere to go. Permissions change and the backup user can no longer read the database. The backup destination's API credentials expire. None of these send an error to anyone unless monitoring is configured to check that backup jobs completed successfully. Checking that the last backup completed is a different monitoring task from checking that the server is up — and it is skipped far more often.
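That monitoring task is small enough to sketch. The check below, run from cron or a monitoring agent, alerts when the newest backup is stale or empty; the directory and the 26-hour threshold (a daily schedule plus slack) are assumptions.

```python
import sys
import time
from pathlib import Path

BACKUP_DIR = Path("/var/backups/db")  # hypothetical backup directory
MAX_AGE_SECONDS = 26 * 3600           # daily schedule plus slack

def check_last_backup() -> None:
    dumps = sorted(BACKUP_DIR.glob("*.dump"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        sys.exit("ALERT: no backups found at all")
    newest = dumps[-1]
    age = time.time() - newest.stat().st_mtime
    if age > MAX_AGE_SECONDS:
        sys.exit(f"ALERT: newest backup is {age / 3600:.1f} hours old")
    if newest.stat().st_size == 0:
        sys.exit(f"ALERT: newest backup {newest.name} is empty")

if __name__ == "__main__":
    check_last_backup()  # non-zero exit signals the alerting system
```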
In context
Provider-managed backups, snapshots offered as a paid add-on by the VPS provider, are convenient and reliable for the failure scenarios they address: server corruption, accidental deletion, failed deployments. They restore the entire server state quickly through the provider's interface. What they don't address is data corruption that was already present when the snapshot was taken, granular recovery (restoring a single table or record without rolling back the whole server), or the scenario where the provider itself has a problem. Provider backups should be treated as one layer, not the complete strategy.
Self-managed backups give full control over what is backed up, how frequently, and where it goes. Encrypted daily database dumps shipped to object storage at a different provider and retained for 30 days, for example, form a backup strategy that provider snapshots don't replicate. What self-managed backups require is the operational discipline to maintain the backup system: the script that runs, the monitoring that verifies it ran, the periodic restore test that confirms it works. This discipline is not automatic.
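Part of that discipline is retention. A minimal pruning sketch, assuming the same local dump directory and the 30-day window mentioned above (object-storage retention would use the provider's lifecycle rules instead):

```python
import time
from pathlib import Path

BACKUP_DIR = Path("/var/backups/db")  # hypothetical backup directory
RETENTION_DAYS = 30

def prune_old_backups() -> None:
    cutoff = time.time() - RETENTION_DAYS * 86400
    for dump in BACKUP_DIR.glob("*.dump"):
        if dump.stat().st_mtime < cutoff:
            dump.unlink()  # drop archives past the retention window
```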
The combination of both — provider snapshots for fast server-level recovery, plus self-managed database exports to external storage for granular data recovery — provides coverage for different failure scenarios at modest additional cost. For production systems where data has real value, this layered approach is the appropriate investment. For development servers and low-stakes applications, provider snapshots alone may be sufficient.
From understanding to decision
What data is irreplaceable? How much of it can be lost in the worst case? What failure scenarios does the backup need to survive? And — the question that defines whether the backup is real — when was the last successful restore test? Answering these four questions produces a backup strategy. Having a backup script that runs nightly without answering them is a starting point that may or may not be adequate.