Architecture
HitKeep is designed to give you full control over every layer of the analytics stack — without requiring you to operate multiple services. The entire pipeline from HTTP ingest to dashboard query is embedded in a single Go binary. Nothing runs outside it; nothing phones home.
High-Level Data Flow
Data flows through four stages, all inside the single process:
- Ingestion: The HTTP server receives a pageview or event from the tracking script.
- Buffering: The hit is published to an in-memory NSQ queue, decoupling the HTTP response from the write path.
- Processing: Internal consumers read from the queue concurrently, group messages by resolved tenant store, and build short-lived micro-batches.
- Storage: Each store-local batch is written to embedded DuckDB through the appender API, then the queued messages are acknowledged.
graph LR
A[Browser / Server] -->|HTTP POST /ingest| B(Go HTTP Server)
B -->|Publish| C{Embedded NSQ}
C -->|Consume concurrently| D[Ingest Workers]
D -->|Resolve tenant| F{Tenant Store Manager}
F -->|Group by store| G[Micro-batch Builder]
G -->|Appender flush| E[(DuckDB — tenant data plane)]
The tracking script (hk.js) is also served from your instance — no CDN, no third-party domain.
System Boundaries
The easiest way to understand HitKeep today is to separate it into four concerns:
- Public ingest and API surface
- Shared control plane
- Tenant-local analytics data plane
- Background lifecycle workers
graph TD
Browser["Browser / Server Tracker"] -->|POST /ingest\nPOST /ingest/event| HTTP["Go HTTP Server"]
Dashboard["Angular Dashboard"] -->|JSON REST API| HTTP
APIClients["Personal + Team API Clients"] -->|Bearer token| HTTP
HTTP --> Auth["Auth + Rate Limit + Permission Layer"]
Auth --> Control["Shared control plane\nhitkeep.db"]
Auth --> Resolver["Tenant Store Manager"]
Resolver --> DefaultDB["Default tenant data plane\n(shared hitkeep.db)"]
Resolver --> TenantDB["Non-default tenant data plane\n{data-path}/tenants/{team_id}/hitkeep.db"]
HTTP --> Queue["Embedded NSQ"]
Queue --> Workers["Ingest + rollup + retention + report workers"]
Workers --> Resolver
1. Storage: DuckDB
HitKeep uses DuckDB, an in-process OLAP database engine, as its primary store.
Why DuckDB instead of PostgreSQL or ClickHouse?
- Runs inside the Go process — no separate server, no socket, no authentication
- Columnar storage optimizes analytical queries (aggregations, time-bucketing) over row-based access patterns
- A single file (hitkeep.db) per database — but multi-team installs contain multiple DuckDB files under one data-path, so backups should capture the full directory tree
- ~120 MB per million hits compressed — efficient for VPS-class storage
- Queryable directly with the DuckDB CLI or any Parquet-compatible tool — your data is portable by nature
# Read your analytics database directly — no HitKeep running required
duckdb /var/lib/hitkeep/data/hitkeep.db \
  "SELECT date_trunc('day', timestamp), count(*) FROM hits GROUP BY 1 ORDER BY 1 DESC LIMIT 7;"
Control Plane and Data Plane
HitKeep separates storage into two planes:
- Control plane (hitkeep.db): users, sessions, authentication, site metadata, share links, team membership, and user preferences.
- Data plane (per-team DuckDB files): hits, events, goals, funnels, and pre-aggregated rollups.
For single-team instances, both planes coexist in one hitkeep.db file. When additional teams are created, each team gets its own data plane database at {data_path}/tenants/{team_id}/hitkeep.db. The only reference crossing the boundary is the site_id — no queries JOIN across planes. During the bridge release, shared goals and funnels remain only as compatibility copies for rollback safety; tenant-local copies are authoritative for non-default teams.
/var/lib/hitkeep/data/
├── hitkeep.db          # Control plane + default team data plane
└── tenants/
    └── {team_id}/
        └── hitkeep.db  # Team-specific data plane
Every data pathway — ingestion, API reads, background workers, exports — resolves the correct data plane database before touching analytics data. See Teams and Data Isolation for details.
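The resolution rule above can be sketched as a small helper. The path layout comes from this page; the function name and the default-team flag are illustrative, not HitKeep's actual API.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// storePath resolves which DuckDB file holds a team's analytics rows.
// The default team shares the control-plane file; every other team
// gets its own data-plane database under {data-path}/tenants/.
func storePath(dataPath, teamID string, isDefault bool) string {
	if isDefault {
		// Single-team installs: control plane and data plane coexist.
		return filepath.Join(dataPath, "hitkeep.db")
	}
	return filepath.Join(dataPath, "tenants", teamID, "hitkeep.db")
}

func main() {
	fmt.Println(storePath("/var/lib/hitkeep/data", "", true))
	fmt.Println(storePath("/var/lib/hitkeep/data", "team-42", false))
}
```

Because every read and write path goes through one resolver like this, tenant isolation is enforced at the file level rather than by WHERE clauses.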
Storage Boundary Diagram
graph TD
subgraph Shared["Shared control plane — hitkeep.db"]
Users["users\nsessions\npassword_resets\nsecurity_factors"]
Teams["tenants\nmemberships\ninvites\naudit log\narchives"]
Sites["sites\nsite_tenants\nsite_memberships"]
Shares["share_links\npreferences\napi_clients metadata"]
end
subgraph Default["Default tenant data plane"]
DefaultAnalytics["hits\nevents\ngoals\nfunnels\nrollups"]
end
subgraph TenantA["Tenant A data plane"]
TenantAAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
end
subgraph TenantB["Tenant B data plane"]
TenantBAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
end
Sites -->|site_id → tenant_id| DefaultAnalytics
Sites -->|site_id → tenant_id| TenantAAnalytics
Sites -->|site_id → tenant_id| TenantBAnalytics
The important rule is simple: identity and ownership metadata live in the control plane; analytics rows live in the resolved data plane.
2. Ingestion Buffering: Embedded NSQ
Writing to a columnar database synchronously per HTTP request creates lock contention at scale. HitKeep solves this by embedding NSQ, a distributed messaging platform, in-process, and by flushing queued analytics rows to DuckDB in short appender batches instead of issuing one SQL INSERT per message.
- Decoupling: The ingest HTTP handler validates and enqueues the hit in memory, completing the request in microseconds.
- Burst absorption: Traffic spikes (a site goes viral, a product launches) queue up without database pressure. The writer consumes at a steady, optimal pace.
- Store-local batching: The consumer resolves the destination data plane first, then groups queued hits and events per DuckDB store before flushing them.
- Columnar-friendly writes: DuckDB’s appender API bypasses per-row SQL parsing and matches the database’s preferred bulk-ingest path.
- Zero configuration: NSQ runs inside the process on configurable loopback ports. No Kafka cluster, no broker to manage.
NSQ’s TCP and HTTP interfaces bind to 127.0.0.1 by default and are not exposed externally.
In practice, the write path now looks like this:
- HTTP handler validates and publishes the hit or event to NSQ.
- One of the in-process consumers picks up the message.
- The consumer resolves the correct tenant analytics store for the site_id.
- Messages targeting the same store are coalesced into a small batch for a short time window.
- The batch is flushed through DuckDB’s appender API.
- Only after the flush succeeds are the corresponding NSQ messages acknowledged.
Read and Write Paths
The read path and write path both go through the same tenant resolution layer:
sequenceDiagram
participant Client as Browser / API Client
participant HTTP as Go HTTP Server
participant Auth as Auth + RBAC
participant Control as Shared Control Plane
participant Resolver as Tenant Store Manager
participant Data as Tenant Data Plane
Client->>HTTP: Request
HTTP->>Auth: Validate cookie / bearer token / site permission
Auth->>Control: Resolve user, team, site ownership
Control-->>Auth: site_id + tenant_id + permissions
Auth->>Resolver: Open analytics store for tenant
Resolver-->>Auth: Default or tenant-local DuckDB handle
Auth->>Data: Read or write analytics rows
Data-->>HTTP: JSON / success
HTTP-->>Client: Response
That applies to:
- dashboard reads
- API exports
- share link reads
- API clients
- ecommerce analytics
- background workers
- site transfer copy/delete operations
3. Clustering: Leader / Follower
For high availability, HitKeep supports a Leader/Follower topology using HashiCorp Memberlist (gossip protocol) for node discovery.
| Role | Behavior |
|---|---|
| Leader | Holds the hitkeep.db file lock, runs the NSQ consumer, writes to disk |
| Follower | Accepts HTTP traffic; proxies /ingest requests to the Leader |
There is exactly one Leader at any time. If the Leader goes down, the cluster elects a new one — provided the PVC or data volume can be re-attached (Kubernetes StatefulSets handle this automatically).
Request Flow
Load Balancer → Follower → [proxy /ingest] → Leader → NSQ → DuckDB
Load Balancer → Follower → [serve API reads] → DuckDB read replica (same file, read-only)
For single-server deployments (Docker Compose, systemd), the node acts as Leader implicitly.
4. Security Layer
The HTTP server enforces multiple security controls before any data is processed:
- Rate limiting: Per-IP token bucket limiters on /ingest, /api/*, and /api/login. Configurable rate and burst.
- Sec-Fetch validation: Checks Sec-Fetch-* headers on state-changing requests.
- JWT authentication: HTTP-only cookies signed with a configurable secret. Short expiry by default.
- WebAuthn: FIDO2/Passkey challenge-response for passwordless login.
- TOTP: RFC 6238 time-based one-time passwords for second-factor auth.
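A per-IP token bucket of the kind described above can be sketched in a few lines. The type names and rate/burst values are illustrative, not HitKeep's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket tracks remaining tokens and the last refill time for one IP.
type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter keeps one token bucket per client IP, refilled at rate
// tokens per second up to burst.
type Limiter struct {
	mu      sync.Mutex
	rate    float64
	burst   float64
	buckets map[string]*bucket
}

func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst, buckets: map[string]*bucket{}}
}

// Allow refills the bucket for ip by elapsed time, then spends one
// token if available.
func (l *Limiter) Allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	lim := NewLimiter(1, 2) // 1 request/second, burst of 2
	fmt.Println(lim.Allow("1.2.3.4"), lim.Allow("1.2.3.4"), lim.Allow("1.2.3.4"))
}
```

The burst parameter absorbs short spikes (a page with several rapid events) while the steady rate bounds sustained abuse per IP.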
Authentication Modes
HitKeep currently supports three main access modes into the same API surface:
graph LR
Session["Session cookie"] --> Authz["Auth + permissions"]
Personal["Personal API client"] --> Authz
Team["Team API client"] --> Authz
Share["Share token"] --> ShareRead["Read-only share handlers"]
Authz --> Control["Shared control plane"]
Authz --> Resolver["Tenant Store Manager"]
ShareRead --> Resolver
- Session cookies are for interactive users in the dashboard.
- Personal API clients are user-owned bearer tokens.
- Team API clients are tenant-owned bearer tokens that survive individual user departure.
- Share tokens are read-only public-style analytics access scoped to one shared site/dashboard view.
5. Frontend Architecture
The dashboard is a Single Page Application served from the same binary:
- Framework: Angular v21 with Signals for reactive state
- UI library: PrimeNG with Tailwind CSS
- API contract: All dashboard functionality uses the same JSON REST API as external clients
- Tracking snippet: hk.js is minified with esbuild (already part of the Angular toolchain) to stay under 2 KB. It is served from your instance — no third-party CDN, no cross-origin requests from your visitors’ browsers.
The dashboard itself is now multi-context:
- active site
- active team
- user/session state
- permission state
- analytics filters and date range
Those contexts stay on the client, but authoritative ownership and access checks always happen server-side in the shared control plane before analytics data is read.
6. Data Lifecycle and Recovery
HitKeep now has three distinct storage lifecycle concepts, and they should not be confused:
- Live databases for current reads and writes
- Retention archives for old analytics rows exported to Parquet
- Backup snapshots / purge flows for disaster recovery and irreversible cleanup
graph TD
Live["Live control plane + tenant DBs"] --> Retention["Retention worker\narchive old analytics to Parquet"]
Live --> Backup["Backup worker\nEXPORT DATABASE snapshots"]
ArchivedTeam["Archived team"] --> Purge["Admin purge\nDELETE /api/admin/teams/{id}"]
Purge --> Remove["Remove control-plane metadata\nand tenants/{id}/hitkeep.db"]
Important distinctions:
- archive-path is for Parquet retention archives and related artifacts.
- data-path is where live tenant DuckDB files live.
- archiving a team is a reversible control-plane state until you run the purge path.
- purging an archived team is irreversible and removes its live tenant database directory.
7. Ecommerce in the Architecture
Ecommerce is not a separate subsystem. It is an opinionated query layer over the same tenant-local events tables.
graph LR
Tracker["hk.js / server ingest"] --> Events["events table"]
Events --> Normalizer["GA4-inspired normalization\npurchase / begin_checkout / add_to_cart / view_item\n+ legacy aliases"]
Normalizer --> Queries["Revenue / products / sources queries"]
Queries --> Page["Ecommerce dashboard page"]
That means:
- ecommerce data inherits team isolation automatically
- ecommerce filters reuse the same site/session attribution model
- ecommerce backups, restores, transfers, and retention follow the same tenant data-plane rules as other analytics data
What the Binary Contains
hitkeep (single executable, ~80 MB)
├── Go HTTP server
├── Embedded NSQ broker + consumer
├── DuckDB engine + SQL migrations
├── Angular dashboard (compiled, embedded)
├── hk.js tracking snippet (embedded)
└── Background workers (retention, rollups, reports)
Every layer is auditable. The full source is on GitHub under the MIT license.
Related
HitKeep Cloud runs this exact binary stack in managed EU (Frankfurt) or US (Virginia) infrastructure with the same auditability guarantees. Start with HitKeep Cloud →