
Data Retention and Archiving

You decide how long your analytics data lives — not a cloud vendor’s pricing tier. HitKeep’s retention system follows one rule: data you choose to prune is archived to Parquet first, in an open format you own, before it’s removed from the live database.

Terminal window
# Keep raw hits and events for 365 days; archive older data to /var/lib/hitkeep/archive
export HITKEEP_DATA_RETENTION_DAYS=365
export HITKEEP_ARCHIVE_PATH=/var/lib/hitkeep/archive
./hitkeep

Or as startup flags:

Terminal window
./hitkeep -retention-days=365 -archive-path=/var/lib/hitkeep/archive

See the Configuration Reference for all options.

The retention worker runs once daily. For each site with a configured retention policy it will:

  1. Count hits and events older than the retention threshold.
  2. Export those rows to a compressed Parquet file in the archive directory — before touching the live database.
  3. Prune the archived records from hitkeep.db to reclaim disk space.
  4. Leave rollups intact. Aggregated hourly, daily, and monthly rollups are never pruned — they power the trend charts in the dashboard indefinitely.
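The first three steps amount to one export and one delete per site. A simplified DuckDB SQL sketch of a single run (the `hits` table, its columns, and the 365-day cutoff are illustrative, not HitKeep's actual schema, and this is not meant to be run against a live instance):

```
duckdb hitkeep.db -c "
-- Steps 1-2: export rows past the cutoff to a compressed Parquet file
COPY (SELECT * FROM hits WHERE timestamp < now() - INTERVAL 365 DAYS)
  TO '/var/lib/hitkeep/archive/site_example_1717200000.parquet' (FORMAT PARQUET);
-- Step 3: prune the same rows from the live database
DELETE FROM hits WHERE timestamp < now() - INTERVAL 365 DAYS;
"
```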

The result is two data tiers:

Tier | Location          | What's there                                         | Query speed
-----|-------------------|------------------------------------------------------|-----------------
Hot  | hitkeep.db        | Recent raw hits & events within the retention window | Instant
Cold | Archive directory | Older raw hits & events exported to Parquet          | Fast (file scan)

Dashboard trend views always show complete historical data because rollups remain in the hot database regardless of how the raw-data retention is configured.

Different sites have different requirements. Override the default retention window per site via the API:

Terminal window
curl -X PUT https://your-hitkeep.example/api/sites/{site_id}/retention \
  -H "Content-Type: application/json" \
  -b "hk_token=YOUR_SESSION_COOKIE" \
  -d '{"days": 90}'

A high-traffic site may need only 90 days of raw data. A site subject to statutory record-keeping requirements may need seven years. You set the policy; HitKeep enforces it.

Archived Parquet files are standard open-format files queryable with any compatible tool — no HitKeep license required.

Terminal window
# DuckDB CLI — count page views per month from the archive
duckdb -c "
SELECT date_trunc('month', timestamp) AS month, count(*) AS hits
FROM read_parquet('/var/lib/hitkeep/archive/site_*.parquet')
GROUP BY 1 ORDER BY 1;
"
Terminal window
# Merge hot and cold data in a single query
duckdb -c "
ATTACH 'hitkeep.db' AS hot;
SELECT timestamp::date AS day, count(*) AS hits
FROM (
SELECT timestamp FROM hot.hits WHERE site_id = 'your-site-id'
UNION ALL
SELECT timestamp FROM read_parquet('/var/lib/hitkeep/archive/site_your-site-id_*.parquet')
)
GROUP BY 1 ORDER BY 1;
"

The archive naming convention is site_{site_id}_{unix_timestamp}.parquet. Each archival run writes one file per site that had data past the cutoff.
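Because the file name embeds a Unix timestamp, plain shell can recover the site id and export time from it. A sketch (the file name here is a made-up example; it assumes the site id itself contains no underscore, and `date -d @…` is GNU date):

```shell
# Parse an archive file name of the form site_{site_id}_{unix_timestamp}.parquet
f="site_abc123_1717200000.parquet"    # hypothetical example file name
ts="${f##*_}"; ts="${ts%.parquet}"    # text after the last "_", minus the extension
site="${f#site_}"; site="${site%_*}"  # strip the "site_" prefix and "_<timestamp>" suffix
echo "$site exported at $(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%SZ')"
# → abc123 exported at 2024-06-01 00:00:00Z
```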

The complete HitKeep data footprint is two paths:

  • Live database: hitkeep.db (the DuckDB file)
  • Archive: the configured archive directory (Parquet files)

A reliable backup is a periodic file copy of both:

Terminal window
# Example: nightly sync to S3-compatible storage with rclone
rclone copy /var/lib/hitkeep/hitkeep.db remote:my-bucket/hitkeep/live/
rclone sync /var/lib/hitkeep/archive/ remote:my-bucket/hitkeep/archive/
Terminal window
# Or with rsync to a remote host
rsync -az /var/lib/hitkeep/ backup-host:/backups/hitkeep/

Because hitkeep.db is a single file, you can also use filesystem-level snapshots (LVM, ZFS, APFS) for point-in-time consistency.
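With ZFS, for example, a snapshot-based backup can look like the following sketch (the `tank/hitkeep` and `backup/hitkeep` dataset names are assumptions for illustration):

```
# Take a consistent point-in-time snapshot of the dataset holding /var/lib/hitkeep
zfs snapshot tank/hitkeep@nightly-$(date +%F)
# Optionally replicate the snapshot to another host
zfs send tank/hitkeep@nightly-$(date +%F) | ssh backup-host zfs receive -F backup/hitkeep
```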

HitKeep Cloud manages retention policies, automated Parquet archiving, and encrypted off-site backups automatically — in your sovereign region (EU Frankfurt or US Virginia). Join the waitlist →