Data Retention and Archiving
You decide how long your analytics data lives — not a cloud vendor’s pricing tier. HitKeep’s retention system follows one rule: data you choose to prune is archived to Parquet first, in an open format you own, before it’s removed from the live database.
Quick Start
```shell
# Keep raw hits and events for 365 days; archive older data to /var/lib/hitkeep/archive
export HITKEEP_DATA_RETENTION_DAYS=365
export HITKEEP_ARCHIVE_PATH=/var/lib/hitkeep/archive

./hitkeep
```

Or as startup flags:

```shell
./hitkeep -retention-days=365 -archive-path=/var/lib/hitkeep/archive
```

See the Configuration Reference for all options.
How It Works
The retention worker runs once daily. For each site with a configured retention policy, it will:
- Count hits and events older than the retention threshold.
- Export those rows to a compressed Parquet file in the archive directory — before touching the live database.
- Prune the archived records from hitkeep.db to reclaim disk space.
- Leave rollups intact. Aggregated hourly, daily, and monthly rollups are never pruned — they power the trend charts in the dashboard indefinitely.
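The export-then-prune sequence can be sketched in DuckDB SQL. This is an illustration, not HitKeep's actual code: the hits table, its columns, the cutoff, and the file path are all assumptions.

```sql
-- Export rows older than the cutoff to a compressed Parquet file first
COPY (
    SELECT *
    FROM hits
    WHERE site_id = 'example-site'
      AND timestamp < now() - INTERVAL 365 DAYS
) TO '/var/lib/hitkeep/archive/site_example-site_1740000000.parquet'
  (FORMAT PARQUET, COMPRESSION ZSTD);

-- Only after the export succeeds, prune the same rows from the live table
DELETE FROM hits
WHERE site_id = 'example-site'
  AND timestamp < now() - INTERVAL 365 DAYS;
```

Running the export and the delete with the same predicate, and in that order, is what guarantees a row never disappears from the hot tier without first landing in the archive.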
```mermaid
flowchart TD
    subgraph RetentionWorker["Retention Worker (daily)"]
        R1[Load site retention policies] --> R2{Hits/events older\nthan cutoff?}
        R2 -->|No| R3[Skip site]
        R2 -->|Yes| R4["COPY ... TO (FORMAT PARQUET)"]
        R4 --> R5[DELETE archived rows]
        R5 --> R6[Rollups untouched]
    end
    subgraph Storage
        HOT[(DuckDB — hitkeep.db\nHot tier)]
        COLD[("Archive directory\nCold tier — Parquet")]
    end
    R1 -.->|query| HOT
    R4 -->|export| COLD
    R5 -->|prune| HOT
```
The result is two data tiers:
| Tier | Location | What’s there | Query speed |
|---|---|---|---|
| Hot | hitkeep.db | Recent raw hits & events within the retention window | Instant |
| Cold | Archive directory | Older raw hits & events exported to Parquet | Fast (file scan) |
Dashboard trend views always show complete historical data because rollups remain in the hot database regardless of how the raw-data retention is configured.
Per-Site Overrides
Different sites have different requirements. Override the default retention window per site via the API:

```shell
curl -X PUT https://your-hitkeep.example/api/sites/{site_id}/retention \
  -H "Content-Type: application/json" \
  -b "hk_token=YOUR_SESSION_COOKIE" \
  -d '{"days": 90}'
```

A high-traffic site may need only 90 days of raw data. A site subject to statutory record-keeping requirements may need seven years. You set the policy; HitKeep enforces it.
Archive Destination Overview
The retention worker and backup worker can write to local disk or any S3-compatible object store. Both share the same S3 credential configuration.
```mermaid
flowchart LR
    subgraph HitKeep["HitKeep (Leader Node)"]
        RW[Retention Worker]
        BW[Backup Worker]
    end
    subgraph Local["Local Filesystem"]
        LA["/var/lib/hitkeep/archive/"]
        LB["/var/lib/hitkeep/backups/"]
    end
    subgraph S3["S3-Compatible Storage"]
        SA["s3://bucket/archive/"]
        SB["s3://bucket/backups/"]
    end
    RW -->|"HITKEEP_ARCHIVE_PATH\n(local)"| LA
    RW -->|"HITKEEP_ARCHIVE_PATH\n(s3://)"| SA
    BW -->|"HITKEEP_BACKUP_PATH\n(local)"| LB
    BW -->|"HITKEEP_BACKUP_PATH\n(s3://)"| SB
```
Archiving to S3
Instead of writing Parquet files to a local directory, HitKeep can archive directly to any S3-compatible object store. Set HITKEEP_ARCHIVE_PATH to an s3:// URL and configure credentials.
AWS S3 with Static Keys
```shell
export HITKEEP_ARCHIVE_PATH=s3://my-analytics-bucket/hitkeep/archive
export HITKEEP_S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export HITKEEP_S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export HITKEEP_S3_REGION=eu-west-1

./hitkeep
```

AWS S3 with IAM Role (Credential Chain)
On EC2, ECS, or Lambda with an attached IAM role, no explicit keys are needed. HitKeep falls back to the AWS SDK default credential chain automatically.
```shell
export HITKEEP_ARCHIVE_PATH=s3://my-analytics-bucket/hitkeep/archive
export HITKEEP_S3_REGION=eu-west-1

./hitkeep
# Logs: "S3 archive enabled" mode="credential chain" region="eu-west-1"
```

MinIO (Custom Endpoint)
```shell
export HITKEEP_ARCHIVE_PATH=s3://hitkeep-archive/data
export HITKEEP_S3_ACCESS_KEY_ID=minioadmin
export HITKEEP_S3_SECRET_ACCESS_KEY=minioadmin
export HITKEEP_S3_ENDPOINT=localhost:9000
export HITKEEP_S3_URL_STYLE=path
export HITKEEP_S3_USE_SSL=false
export HITKEEP_S3_REGION=us-east-1

./hitkeep
```

Cloudflare R2
```shell
export HITKEEP_ARCHIVE_PATH=s3://my-r2-bucket/hitkeep/archive
export HITKEEP_S3_ACCESS_KEY_ID=your-r2-access-key
export HITKEEP_S3_SECRET_ACCESS_KEY=your-r2-secret-key
export HITKEEP_S3_ENDPOINT=your-account-id.r2.cloudflarestorage.com
export HITKEEP_S3_REGION=auto

./hitkeep
```

See the Configuration Reference for the full list of S3 settings.
Querying Cold Data
Archived Parquet files are standard open-format files, queryable with any compatible tool — no HitKeep license required.
```mermaid
flowchart LR
    subgraph Query["DuckDB CLI or any Parquet tool"]
        Q["SELECT ... FROM"]
    end
    subgraph Hot["Hot Tier"]
        DB[("hitkeep.db\nhits / events")]
    end
    subgraph Cold["Cold Tier"]
        PQ[("archive/\nsite_*.parquet")]
    end
    Q -->|"ATTACH 'hitkeep.db'"| DB
    Q -->|"read_parquet('...')"| PQ
    DB -->|"UNION ALL"| RESULT["Combined\nresult set"]
    PQ -->|"UNION ALL"| RESULT
```
```shell
# DuckDB CLI — count page views per month from the archive
duckdb -c "
  SELECT date_trunc('month', timestamp) AS month, count(*) AS hits
  FROM read_parquet('/var/lib/hitkeep/archive/site_*.parquet')
  GROUP BY 1 ORDER BY 1;"

# Merge hot and cold data in a single query
duckdb -c "
  ATTACH 'hitkeep.db' AS hot;
  SELECT timestamp::date AS day, count(*) AS hits
  FROM (
    SELECT timestamp FROM hot.hits WHERE site_id = 'your-site-id'
    UNION ALL
    SELECT timestamp
    FROM read_parquet('/var/lib/hitkeep/archive/site_your-site-id_*.parquet')
  )
  GROUP BY 1 ORDER BY 1;"
```

The archive naming convention is site_{site_id}_{unix_timestamp}.parquet. Each archival run writes one file per site that had data past the cutoff.
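Because the naming convention is fixed, archive files can be inventoried with plain shell. A minimal sketch, assuming GNU date and the documented site_{site_id}_{unix_timestamp}.parquet scheme; the function names are illustrative:

```shell
# decode_ts FILE: print the unix timestamp embedded in an archive filename
decode_ts() {
  name=$(basename "$1" .parquet)   # site_{site_id}_{unix_timestamp}
  echo "${name##*_}"               # text after the last underscore
}

# list_archive DIR SITE_ID: show each archive file with its export time
list_archive() {
  dir=$1; site=$2
  for f in "$dir"/site_"$site"_*.parquet; do
    [ -e "$f" ] || continue        # glob matched nothing
    ts=$(decode_ts "$f")
    echo "$f exported_at=$(date -u -d "@$ts" +%Y-%m-%dT%H:%M:%SZ)"
  done
}
```

This is handy for spot-checking which time ranges a site's cold tier actually covers before writing a read_parquet query against it.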
Database Backups
HitKeep includes a built-in backup worker that periodically exports all live databases to Parquet snapshots using DuckDB's EXPORT DATABASE. This covers both the shared hitkeep.db and any per-tenant databases.
For a dedicated operational guide, see Backups and Restore and S3 Backups.
Backup Worker Lifecycle
```mermaid
flowchart TD
    START([HitKeep starts]) --> CHECK{HITKEEP_BACKUP_PATH\nset?}
    CHECK -->|No| DISABLED[Backups disabled — no-op]
    CHECK -->|Yes| INIT["Wait 30 seconds\n(let DB settle)"]
    INIT --> RUN
    subgraph RUN["Run Backup Cycle"]
        direction TB
        S3{S3 path?}
        S3 -->|Yes| HTTPFS["Load httpfs +\nconfigure S3 secret"]
        S3 -->|No| CHECKPOINT
        HTTPFS --> CHECKPOINT["CHECKPOINT shared DB"]
        CHECKPOINT --> EXPORT_SHARED["EXPORT DATABASE\nshared → {path}/shared/{timestamp}/"]
        EXPORT_SHARED --> TENANTS["List non-default tenant IDs"]
        TENANTS --> LOOP["For each tenant:\nCHECKPOINT + EXPORT DATABASE\n→ {path}/tenants/{id}/{timestamp}/"]
        LOOP --> PRUNE{"Local path?"}
        PRUNE -->|Yes| PRUNE_LOCAL["Delete oldest snapshots\nbeyond retention count"]
        PRUNE -->|No| PRUNE_S3["Log: use S3 lifecycle policies"]
    end
    RUN --> WAIT["Sleep backup-interval minutes"]
    WAIT --> RUN
```
Enabling Backups
```shell
# Local backups — every 60 minutes, keep 24 snapshots
export HITKEEP_BACKUP_PATH=/var/lib/hitkeep/backups

./hitkeep
# Logs: "Local backup enabled" path="/var/lib/hitkeep/backups" interval_min=60 retention=24

# S3 backups — every 30 minutes, keep 48 snapshots
export HITKEEP_BACKUP_PATH=s3://my-bucket/hitkeep/backups
export HITKEEP_BACKUP_INTERVAL=30
export HITKEEP_BACKUP_RETENTION=48

./hitkeep
```

The worker runs on the leader node only. The first backup is taken 30 seconds after startup, then at the configured interval.
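Snapshot directories are named with UTC timestamps, so the most recent backup can be found without parsing dates: lexicographic order matches chronological order for these names. A minimal sketch (the function name is illustrative):

```shell
# latest_snapshot DIR: print the newest snapshot directory name.
# Works because names like 2026-03-02T130000Z sort chronologically.
latest_snapshot() {
  ls -1 "$1" | sort | tail -n 1
}
```

The restore command relies on the same property when no -snapshot flag is given; a monitoring script can use it to alert when the newest snapshot is older than expected.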
Backup Layout
Each backup is a timestamped directory containing the output of DuckDB's EXPORT DATABASE (a schema.sql file plus Parquet data files for each table).
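Under the hood this is DuckDB's standard export/import round trip. A sketch in DuckDB SQL, with illustrative paths:

```sql
-- Write schema.sql plus Parquet data files into the snapshot directory
EXPORT DATABASE '/var/lib/hitkeep/backups/shared/2026-03-02T120000Z' (FORMAT PARQUET);

-- Recreate a database from a snapshot (what restore-backup replays)
IMPORT DATABASE '/var/lib/hitkeep/backups/shared/2026-03-02T120000Z';
```

Because the snapshot is just SQL plus Parquet, it can also be inspected or partially loaded with any Parquet-aware tool, independent of HitKeep.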
```mermaid
graph TD
    subgraph BackupPath["{backup-path}/"]
        subgraph Shared["shared/"]
            S1["2026-03-02T120000Z/\nschema.sql + *.parquet"]
            S2["2026-03-02T130000Z/\nschema.sql + *.parquet"]
        end
        subgraph Tenants["tenants/"]
            subgraph T1["{tenant-id-1}/"]
                T1S1["2026-03-02T120000Z/\nschema.sql + *.parquet"]
                T1S2["2026-03-02T130000Z/\nschema.sql + *.parquet"]
            end
            subgraph T2["{tenant-id-2}/"]
                T2S1["2026-03-02T120000Z/\nschema.sql + *.parquet"]
                T2S2["2026-03-02T130000Z/\nschema.sql + *.parquet"]
            end
        end
    end
    subgraph Sources["Live Databases"]
        MAIN[("hitkeep.db\n(shared — identity,\nconfig, default tenant)")]
        TD1[("tenants/{id-1}/hitkeep.db")]
        TD2[("tenants/{id-2}/hitkeep.db")]
    end
    MAIN -.->|EXPORT DATABASE| Shared
    TD1 -.->|EXPORT DATABASE| T1
    TD2 -.->|EXPORT DATABASE| T2
```
Or as a directory tree:
```
{backup-path}/
├── shared/
│   ├── 2026-03-02T120000Z/   ← schema.sql + *.parquet
│   └── 2026-03-02T130000Z/
└── tenants/
    └── {tenant-id}/
        ├── 2026-03-02T120000Z/
        └── 2026-03-02T130000Z/
```

Snapshot Pruning
For local backups, snapshots beyond the retention count are automatically deleted (oldest first). For S3 backups, configure S3 lifecycle policies on your bucket to manage snapshot retention.
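Local pruning amounts to "keep the newest N directory names, delete the rest". A sketch in shell, not HitKeep's actual implementation (the function name is illustrative):

```shell
# prune_snapshots DIR KEEP: delete all but the KEEP newest snapshot dirs.
# Timestamped names (2026-03-02T120000Z) sort chronologically, so a
# plain reverse lexicographic sort puts the newest snapshots first.
prune_snapshots() {
  dir=$1; keep=$2
  ls -1 "$dir" | sort -r | tail -n +"$((keep + 1))" | while read -r old; do
    rm -rf "$dir/$old"
  done
}
```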
Restoring from a Backup
Use the hitkeep recover restore-backup command to import a snapshot into fresh databases. HitKeep must be stopped before running a restore (DuckDB allows only one writer at a time).
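Before restoring, it is worth verifying that no HitKeep process still has the database open, since DuckDB permits only one writer. A hedged sketch using pgrep; the process name and function name are assumptions that may differ in your deployment:

```shell
# ensure_stopped NAME: refuse to proceed if a process NAME is running.
ensure_stopped() {
  if pgrep -x "$1" >/dev/null 2>&1; then
    echo "error: $1 is still running; stop it before restoring" >&2
    return 1
  fi
  echo "$1 is not running; safe to restore"
}
```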
```mermaid
flowchart TD
    CMD["hitkeep recover restore-backup\n-from {path} [-snapshot {ts}]"] --> SRC{Source type?}
    SRC -->|Local| FIND["Find latest snapshot\n(lexicographic sort)"]
    SRC -->|S3| REQ["-snapshot required"]
    FIND --> SUMMARY
    REQ --> SUMMARY["Print summary:\nsource, target, tenants"]
    SUMMARY --> CONFIRM{"-yes flag or\nuser confirms?"}
    CONFIRM -->|No| ABORT([Aborted])
    CONFIRM -->|Yes| SHARED
    subgraph SHARED["Restore Shared DB"]
        SH1{"Existing\nhitkeep.db?"}
        SH1 -->|Yes| SH2["Rename → .pre-restore.{ts}"]
        SH1 -->|No| SH3
        SH2 --> SH3["Open fresh empty DuckDB"]
        SH3 --> SH4["IMPORT DATABASE\nfrom shared/{snapshot}/"]
    end
    SHARED --> DISC["Discover tenant backups\nunder tenants/*/"]
    subgraph TENANT["For Each Tenant"]
        T1{"Existing tenant\nhitkeep.db?"}
        T1 -->|Yes| T2["Rename → .pre-restore.{ts}"]
        T1 -->|No| T3
        T2 --> T3["Open fresh empty DuckDB"]
        T3 --> T4["IMPORT DATABASE\nfrom tenants/{id}/{snapshot}/"]
    end
    DISC --> TENANT
    TENANT --> DONE(["Restore complete\n→ start hitkeep normally"])
```
```shell
# Restore the latest local snapshot
./hitkeep recover restore-backup \
  -from /var/lib/hitkeep/backups \
  -yes

# Restore a specific snapshot
./hitkeep recover restore-backup \
  -from /var/lib/hitkeep/backups \
  -snapshot 2026-03-02T120000Z \
  -db /var/lib/hitkeep/data/hitkeep.db \
  -data-path /var/lib/hitkeep/data \
  -yes

# Restore from S3 (snapshot timestamp required)
./hitkeep recover restore-backup \
  -from s3://my-bucket/hitkeep/backups \
  -snapshot 2026-03-02T120000Z \
  -yes
```

The restore process:
- Finds the requested snapshot (or the latest one for local sources).
- Renames existing database files as a safety net (.pre-restore.{timestamp}).
- Imports the snapshot into temporary DuckDB files, checkpoints them, and only then promotes them into place.
- Discovers and restores any tenant databases from the backup.
On the next normal hitkeep startup, the migration system will apply any schema changes if HitKeep has been upgraded since the backup was taken.
See the Configuration Reference for all backup settings.
Backup Strategy (Manual)
For users who prefer external tooling, the complete HitKeep data footprint is:
- Live data tree: the full data-path directory, including the shared control plane and any tenant-local databases
- Archive: the configured archive directory (Parquet files)
A reliable backup is a periodic file copy of both:
```shell
# Example: nightly sync to S3-compatible storage with rclone
rclone sync /var/lib/hitkeep/data/ remote:my-bucket/hitkeep/data/
rclone sync /var/lib/hitkeep/archive/ remote:my-bucket/hitkeep/archive/

# Or with rsync to a remote host
rsync -az /var/lib/hitkeep/ backup-host:/backups/hitkeep/
```

Because HitKeep stores its live state on the local filesystem, you can also use filesystem-level snapshots (LVM, ZFS, APFS) for point-in-time consistency — but snapshot the full data-path, not just the root database file.
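To run such a sync on a schedule, a cron entry is sufficient. A sketch, assuming a /etc/cron.d file, a hitkeep system user, and the same illustrative rclone remote as above:

```
# /etc/cron.d/hitkeep-offsite: nightly sync at 02:30, run as user hitkeep
30 2 * * * hitkeep rclone sync /var/lib/hitkeep/data/ remote:my-bucket/hitkeep/data/
35 2 * * * hitkeep rclone sync /var/lib/hitkeep/archive/ remote:my-bucket/hitkeep/archive/
```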
Complete Data Lifecycle
The following diagram shows how data moves through HitKeep — from ingestion to hot storage, through retention archiving to cold storage, and how backups and restores fit into the picture.
```mermaid
flowchart TB
    subgraph Ingestion
        BROWSER["Browser / Server"] -->|"HTTP POST /ingest"| HTTP["Go HTTP Server"]
        HTTP -->|publish| NSQ{{"Embedded NSQ"}}
        NSQ -->|"consume batches"| WORKER["Ingest Worker"]
    end
    WORKER -->|INSERT| HOT
    subgraph HotStorage["Hot Storage (DuckDB)"]
        HOT[("hitkeep.db\nShared DB")]
        TENANT_DB[("tenants/{id}/hitkeep.db\nPer-tenant DBs")]
    end
    subgraph Workers["Background Workers (Leader only)"]
        direction LR
        RETENTION["Retention Worker\n(daily)"]
        BACKUP["Backup Worker\n(configurable interval)"]
        ROLLUP["Rollup Worker"]
    end
    HOT --- Workers
    subgraph ColdStorage["Cold Storage"]
        ARCHIVE[("Archive\nParquet per-site\nretention exports")]
        SNAPSHOTS[("Backups\nParquet snapshots\nfull DB exports")]
    end
    RETENTION -->|"COPY ... TO\n(FORMAT PARQUET)"| ARCHIVE
    RETENTION -->|"DELETE\npruned rows"| HOT
    BACKUP -->|"EXPORT DATABASE\nshared + all tenants"| SNAPSHOTS
    subgraph Destinations["Storage Destinations"]
        LOCAL["Local Filesystem"]
        S3["S3 / MinIO / R2"]
    end
    ARCHIVE --> LOCAL
    ARCHIVE --> S3
    SNAPSHOTS --> LOCAL
    SNAPSHOTS --> S3
    subgraph Recovery["Disaster Recovery"]
        RESTORE["hitkeep recover\nrestore-backup"]
    end
    SNAPSHOTS -.->|"IMPORT DATABASE"| RESTORE
    RESTORE -.->|"recreate"| HOT
    RESTORE -.->|"recreate"| TENANT_DB
```
The key insight: retention archives and database backups serve different purposes. Retention exports are per-site, incremental Parquet files for long-term analytical querying. Database backups are full point-in-time snapshots for disaster recovery. Both can target local disk or S3.
Related
HitKeep Cloud manages retention policies, automated Parquet archiving, and encrypted off-site backups for you — in your sovereign region (EU Frankfurt or US Virginia). Start with HitKeep Cloud →