Data Retention and Archiving
You decide how long your analytics data lives — not a cloud vendor’s pricing tier. HitKeep’s retention system follows one rule: data you choose to prune is archived to Parquet first, in an open format you own, before it’s removed from the live database.
Quick Start
```shell
# Keep raw hits and events for 365 days; archive older data to /var/lib/hitkeep/archive
export HITKEEP_DATA_RETENTION_DAYS=365
export HITKEEP_ARCHIVE_PATH=/var/lib/hitkeep/archive
./hitkeep
```

Or as startup flags:

```shell
./hitkeep -retention-days=365 -archive-path=/var/lib/hitkeep/archive
```

See the Configuration Reference for all options.
How It Works
The retention worker runs once daily. For each site with a configured retention policy, it will:
- Count hits and events older than the retention threshold.
- Export those rows to a compressed Parquet file in the archive directory — before touching the live database.
- Prune the archived records from `hitkeep.db` to reclaim disk space.
- Leave rollups intact. Aggregated hourly, daily, and monthly rollups are never pruned — they power the trend charts in the dashboard indefinitely.
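Conceptually, one archival pass per site resembles the DuckDB SQL below. This is a simplified sketch, not HitKeep's actual implementation: the table name, column names, site ID, and output filename are illustrative.

```sql
-- 1. Export rows past the cutoff to compressed Parquet (archive first).
COPY (
  SELECT * FROM hits
  WHERE site_id = 'your-site-id'
    AND timestamp < now() - INTERVAL 365 DAY
) TO '/var/lib/hitkeep/archive/site_your-site-id_1700000000.parquet'
  (FORMAT PARQUET, COMPRESSION ZSTD);

-- 2. Only then prune the archived rows from the live database.
DELETE FROM hits
WHERE site_id = 'your-site-id'
  AND timestamp < now() - INTERVAL 365 DAY;
```

The ordering is the point: the export completes before any deletion runs, so a failed archival pass never loses data.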
The result is two data tiers:
| Tier | Location | What’s there | Query speed |
|---|---|---|---|
| Hot | hitkeep.db | Recent raw hits & events within the retention window | Instant |
| Cold | Archive directory | Older raw hits & events exported to Parquet | Fast (file scan) |
Dashboard trend views always show complete historical data because rollups remain in the hot database regardless of how the raw-data retention is configured.
Per-Site Overrides
Different sites have different requirements. Override the default retention window per site via the API:
```shell
curl -X PUT https://your-hitkeep.example/api/sites/{site_id}/retention \
  -H "Content-Type: application/json" \
  -b "hk_token=YOUR_SESSION_COOKIE" \
  -d '{"days": 90}'
```

A high-traffic site may need only 90 days of raw data. A site subject to statutory record-keeping requirements may need seven years. You set the policy; HitKeep enforces it.
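The same override can be scripted. Here is a minimal Python sketch using only the standard library; the endpoint, payload, and cookie name come from the curl example above, while the helper function names are ours:

```python
import json
import urllib.request

def build_retention_request(base_url: str, site_id: str, days: int):
    """Build the PUT request for the per-site retention endpoint."""
    url = f"{base_url}/api/sites/{site_id}/retention"
    body = json.dumps({"days": days}).encode()
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    req.add_header("Cookie", "hk_token=YOUR_SESSION_COOKIE")
    return req

def set_retention(base_url: str, site_id: str, days: int) -> int:
    """Send the request and return the HTTP status code."""
    req = build_retention_request(base_url, site_id, days)
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example: keep only 90 days of raw data for a high-traffic site.
# set_retention("https://your-hitkeep.example", "your-site-id", 90)
```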
Querying Cold Data
Archived Parquet files are standard open-format files queryable with any compatible tool — no HitKeep license required.
```shell
# DuckDB CLI — count page views per month from the archive
duckdb -c "
  SELECT date_trunc('month', timestamp) AS month, count(*) AS hits
  FROM read_parquet('/var/lib/hitkeep/archive/site_*.parquet')
  GROUP BY 1 ORDER BY 1;"

# Merge hot and cold data in a single query
duckdb -c "
  ATTACH 'hitkeep.db' AS hot;
  SELECT timestamp::date AS day, count(*) AS hits
  FROM (
    SELECT timestamp FROM hot.hits WHERE site_id = 'your-site-id'
    UNION ALL
    SELECT timestamp FROM read_parquet('/var/lib/hitkeep/archive/site_your-site-id_*.parquet')
  )
  GROUP BY 1 ORDER BY 1;"
```

The archive naming convention is `site_{site_id}_{unix_timestamp}.parquet`. Each archival run writes one file per site that had data past the cutoff.
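Because the naming convention encodes both the site ID and the Unix timestamp of the run, a site's archive files can be located programmatically. A stdlib-only Python sketch (these helpers are ours, not part of HitKeep):

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Matches site_{site_id}_{unix_timestamp}.parquet; site IDs may contain underscores.
ARCHIVE_NAME = re.compile(r"^site_(?P<site_id>.+)_(?P<ts>\d+)\.parquet$")

def parse_archive_name(name: str):
    """Extract (site_id, run time as UTC datetime) from an archive filename."""
    m = ARCHIVE_NAME.match(name)
    if m is None:
        return None
    ts = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)
    return m.group("site_id"), ts

def archives_for_site(archive_dir: str, site_id: str):
    """List one site's Parquet archives, oldest run first."""
    found = []
    for p in Path(archive_dir).glob(f"site_{site_id}_*.parquet"):
        parsed = parse_archive_name(p.name)
        if parsed is not None and parsed[0] == site_id:
            found.append((parsed[1], p))
    return [p for _, p in sorted(found)]
```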
Backup Strategy
The complete HitKeep data footprint is two paths:
- Live database: `hitkeep.db` (the DuckDB file)
- Archive: the configured archive directory (Parquet files)
A reliable backup is a periodic file copy of both:
```shell
# Example: nightly sync to S3-compatible storage with rclone
rclone copy /var/lib/hitkeep/hitkeep.db remote:my-bucket/hitkeep/live/
rclone sync /var/lib/hitkeep/archive/ remote:my-bucket/hitkeep/archive/

# Or with rsync to a remote host
rsync -az /var/lib/hitkeep/ backup-host:/backups/hitkeep/
```

Because `hitkeep.db` is a single file, you can also use filesystem-level snapshots (LVM, ZFS, APFS) for point-in-time consistency.
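Whichever tool you use, it is worth verifying that every archive file actually reached the backup. A stdlib-only Python sketch that compares the two directory trees by name and size (our helper, not a HitKeep command):

```python
from pathlib import Path

def missing_from_backup(source_dir: str, backup_dir: str):
    """Return archive files absent from the backup, or present with a size mismatch."""
    source, backup = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source.glob("*.parquet")):
        dst = backup / src.name
        if not dst.exists():
            problems.append((src.name, "missing"))
        elif dst.stat().st_size != src.stat().st_size:
            problems.append((src.name, "size mismatch"))
    return problems

# Example: missing_from_backup("/var/lib/hitkeep/archive", "/backups/hitkeep/archive")
```

An empty result means every Parquet file in the source directory has a same-sized copy in the backup; for stronger guarantees you could compare checksums instead of sizes.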
Related
HitKeep Cloud manages retention policies, automated Parquet archiving, and encrypted off-site backups automatically — in your sovereign region (EU Frankfurt or US Virginia). Join the waitlist →