HitKeep Disaster Recovery Runbook for Operators

Disaster recovery for HitKeep is straightforward as long as your backups and restores cover the whole storage layout.

The most common mistake is restoring only the shared control-plane database while forgetting tenant-local analytics files.

What You Need To Recover

At minimum:

the shared control-plane database in {data-path}/hitkeep.db
all tenant analytics databases in {data-path}/tenants/**
the archive directory if you rely on retention archives for older raw data

If you use built-in backups, these are already exported into snapshot directories. If you use external tooling, your DR plan should capture the same set of paths.
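
The boundary listed above can be checked mechanically on a live or freshly restored data path. The following is a minimal sketch, not a HitKeep feature: check_dr_boundary is a hypothetical helper name, it assumes at least one file exists under tenants/, and the archive directory is passed in explicitly because its location depends on your configuration.

```shell
#!/bin/sh
# Sketch: verify that a data path contains everything the runbook lists.
# check_dr_boundary is a hypothetical helper, not a HitKeep command.
# Usage: check_dr_boundary DATA_PATH [ARCHIVE_DIR]

check_dr_boundary() {
    data_path=$1
    archive_dir=${2:-}
    missing=0

    # Shared control-plane database must exist and be non-empty.
    if [ ! -s "$data_path/hitkeep.db" ]; then
        echo "missing or empty: $data_path/hitkeep.db" >&2
        missing=1
    fi

    # Tenant analytics databases: expect at least one file under tenants/.
    if [ ! -d "$data_path/tenants" ] ||
       [ -z "$(find "$data_path/tenants" -type f | head -n 1)" ]; then
        echo "no tenant files under: $data_path/tenants" >&2
        missing=1
    fi

    # Archive directory, checked only when the caller names one.
    if [ -n "$archive_dir" ] && [ ! -d "$archive_dir" ]; then
        echo "missing archive directory: $archive_dir" >&2
        missing=1
    fi

    return "$missing"
}
```

A non-zero exit status means at least one part of the boundary is absent; the messages on stderr name which one.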

Recovery Scenarios

Plan for the failure you are most likely to face:

Scenario	Recovery source	Main risk
Bad deploy	Latest local backup	Restoring a snapshot from before a schema migration
Disk loss	Off-host backup or S3 backup	Missing tenant-local databases
Accidental team deletion	Backup from before purge	Retaining deleted tenant data longer than policy allows
Host migration	Latest verified snapshot	Forgetting archive and asset directories
Region outage	Object storage or external snapshot	Restore time and DNS cutover

Recovery Drill Checklist

Run this periodically on a disposable environment:

Provision an empty host or container.
Restore HitKeep from a recent snapshot.
Start the same HitKeep version, or a newer compatible one.
Log in as an admin.
Validate one default-tenant site.
Validate one non-default team site.
Validate goals, funnels, and ecommerce.
Confirm team membership and team switching still work.
Confirm retention archives are still present if you keep them separately.

If you cannot perform this drill successfully, you do not yet have a reliable recovery process.
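
The first steps of the drill can be scripted so that the restore always lands in a scratch directory. This is a rough sketch under assumptions: drill_restore is a hypothetical wrapper, it reuses only the flags shown in the restore command in this runbook, and it assumes those flags accept a target different from the live data path.

```shell
#!/bin/sh
# Sketch: restore a snapshot into a disposable directory for a drill.
# drill_restore is a hypothetical wrapper, not a HitKeep command.
# Usage: drill_restore HITKEEP_BIN BACKUP_ROOT SNAPSHOT

drill_restore() {
    hitkeep_bin=$1
    backup_root=$2
    snapshot=$3

    # Fresh scratch directory so the drill never touches live data.
    target=$(mktemp -d) || return 1

    # Same flags as the documented restore command, pointed at the scratch
    # directory. Tool output goes to stderr to keep stdout clean.
    "$hitkeep_bin" recover restore-backup \
        -from "$backup_root" \
        -snapshot "$snapshot" \
        -db "$target/hitkeep.db" \
        -data-path "$target" \
        -yes >&2 || return 1

    # Print the restore target so later checks can point at it.
    printf '%s\n' "$target"
}
```

The printed path is where a drill instance of HitKeep would be started for the login and validation steps, which remain manual in this sketch.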

Recommended Restore Command

./hitkeep recover restore-backup \
  -from /var/lib/hitkeep/backups \
  -snapshot 2026-03-08T120000Z \
  -db /var/lib/hitkeep/data/hitkeep.db \
  -data-path /var/lib/hitkeep/data \
  -yes

Restore is offline-only. Stop HitKeep before running it.
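
Because a restore must never run against a live instance, it can help to wrap the command in a guard. The sketch below is an illustration only: guarded_restore is a hypothetical helper, and it assumes HitKeep appears in the process table under the name you pass in, which will not hold if it runs inside a container or under a different binary name.

```shell
#!/bin/sh
# Sketch: refuse to run a restore while HitKeep is still running.
# Assumption: the server is visible in the process table under PROCESS_NAME.
# Usage: guarded_restore PROCESS_NAME COMMAND [ARGS...]

guarded_restore() {
    proc_name=$1
    shift

    # pgrep -x exits 0 only when a process with exactly this name exists.
    if pgrep -x "$proc_name" >/dev/null 2>&1; then
        echo "$proc_name is still running; stop it before restoring." >&2
        return 1
    fi

    # No matching process found: run the restore command as given.
    "$@"
}
```

For example, guarded_restore hitkeep ./hitkeep recover restore-backup followed by the flags shown above would run the restore only when no process named hitkeep is found. If a service manager restarts HitKeep automatically, stop the service itself rather than only the process.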

Team and Archive Lifecycle Considerations

Teams introduce two important operational facts:

archived teams can later be physically purged
tenant analytics may live outside the shared database

That means:

backups taken before a purge may still contain the purged tenant
backups taken after a purge should not
archive retention and backup retention are separate concerns

If you have GDPR or hard-deletion requirements, your DR runbooks should explicitly define how long old snapshots are retained and when they are expired.
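
A snapshot expiry policy is easier to enforce when the expired set can be listed on demand. The following sketch assumes snapshots are directories directly under the backup root, named with UTC timestamps in the same shape as the -snapshot value used in this runbook (for example 2026-03-08T120000Z). expired_snapshots is a hypothetical helper, and it only lists candidates; it deletes nothing.

```shell
#!/bin/sh
# Sketch: list snapshot directories older than a cutoff, without deleting them.
# Assumption: snapshots are directories directly under BACKUP_ROOT, named with
# UTC timestamps such as 2026-03-08T120000Z. expired_snapshots is hypothetical.
# Usage: expired_snapshots BACKUP_ROOT CUTOFF

expired_snapshots() {
    backup_root=$1
    cutoff=$2

    for dir in "$backup_root"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")

        # Skip anything that does not look like a snapshot timestamp.
        case $name in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z) ;;
            *) continue ;;
        esac

        # Fixed-width timestamps sort chronologically as plain strings, so the
        # older of two names is whichever sorts first in the C locale.
        oldest=$(printf '%s\n%s\n' "$name" "$cutoff" | LC_ALL=C sort | head -n 1)
        if [ "$oldest" = "$name" ] && [ "$name" != "$cutoff" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Reviewing this list before removing anything keeps deletion a deliberate step, which matters when a snapshot is also your only copy from before a purge.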

What Success Looks Like

A good HitKeep DR posture means:

you know exactly where live data lives
you know exactly where backups are written
you have tested the recover restore-backup command
you can restore both shared and tenant-local data
you do not depend on replaying a stale write-ahead log (WAL) to get a restored instance to boot

How Often To Test

For small self-hosted installs, run a restore drill after changing backup storage or before a major version upgrade. For teams using HitKeep as client or business reporting infrastructure, run a scheduled drill at least quarterly and record the snapshot timestamp, HitKeep version, restore target, and validation result.
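
Recording each drill takes only a few lines of shell. This sketch appends one tab-separated line per drill with the four fields named above plus the time the drill ran. record_drill and the log layout are hypothetical choices, not a HitKeep convention.

```shell
#!/bin/sh
# Sketch: append one line per restore drill to a tab-separated log.
# record_drill and the log layout are hypothetical, not part of HitKeep.
# Usage: record_drill LOG_FILE SNAPSHOT VERSION RESTORE_TARGET RESULT

record_drill() {
    log_file=$1
    # When the drill was run, in UTC.
    run_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)

    # Columns: run time, snapshot timestamp, HitKeep version,
    # restore target, validation result.
    printf '%s\t%s\t%s\t%s\t%s\n' "$run_at" "$2" "$3" "$4" "$5" >> "$log_file"
}
```

An append-only text file is enough for a quarterly cadence, and it leaves a simple trail showing which snapshot and version were last proven restorable.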