Architecture
HitKeep is designed to give you control over every layer of the analytics stack without requiring you to operate multiple services. The main analytics path, dashboard API, embedded queue, and DuckDB storage live in one Go binary. Optional integrations such as SMTP, S3 backup/archive storage, favicon lookup, MCP docs tools, and AI provider enrichment can make outbound calls when you enable or configure them. For citable binary size, memory use, storage boundaries, privacy behavior, exports, and non-goals, see Facts and Limits.
High-Level Data Flow
Browser and server-side pageview or event data flows through four stages on the leader node:
- Ingestion: The HTTP server receives a pageview or event from the tracking script.
- Buffering: The hit is published to an in-memory NSQ queue, decoupling the HTTP response from the write path.
- Processing: Internal consumers read from the queue concurrently, group messages by resolved tenant store, and build short-lived micro-batches.
- Storage: Each store-local batch is written to embedded DuckDB through the appender API, then the queued messages are acknowledged.
graph LR
A["Browser tracker\nor server-side ingest"] -->|"POST /ingest\nPOST /ingest/event\nPOST /ingest/web-vitals\nPOST /api/ingest/server/*"| B("Leader Go HTTP Server")
B -->|Publish hits/events/vitals| C{"Embedded NSQ"}
C -->|Consume concurrently| D["Ingest Consumer"]
D -->|Resolve tenant| F{Tenant Store Manager}
F -->|Queued hits/events/vitals| G[Micro-batch Builder]
G -->|Appender flush| E[(DuckDB — tenant data plane)]
AI["AI crawler log forwarder"] -->|"POST /api/sites/{id}/ingest/ai-fetch"| B
B -->|Permission check + direct insert| F
F -->|Direct AI fetch write| E
The tracking script (hk.js) is also served from your instance. Opt-in Web Vitals use the same-origin hk-vitals.js split bundle. No CDN or third-party analytics domain is required.
System Boundaries
The easiest way to understand HitKeep today is to separate it into four concerns:
- Public ingest and API surface
- Shared control plane
- Tenant-local analytics data plane
- Background lifecycle workers
graph TD
Browser["Browser / Server Tracker"] -->|POST /ingest\nPOST /ingest/event\nPOST /ingest/web-vitals\nPOST /api/ingest/server/*| HTTP["Go HTTP Server"]
AIFetch["AI fetch forwarder"] -->|POST /api/sites/{id}/ingest/ai-fetch| HTTP
Dashboard["Angular Dashboard"] -->|JSON REST API| HTTP
APIClients["Personal + Team API Clients"] -->|Bearer token| HTTP
MCPClient["MCP Client"] -->|Streamable HTTP /mcp\nBearer API client token| HTTP
AIGateway["Configured AI Provider\nor Gateway"] -.->|Optional provider call\nfor validated copy| HTTP
HTTP --> Auth["Auth / Validation + Rate Limit + Permission Layer"]
Auth --> Control["Shared control plane\nhitkeep.db"]
Auth --> Resolver["Tenant Store Manager"]
Resolver --> DefaultDB["Default tenant data plane\n(shared hitkeep.db)"]
Resolver --> TenantDB["Non-default tenant data plane\n{data-path}/tenants/{team_id}/hitkeep.db"]
HTTP -->|pageviews + events + web vitals| Queue["Embedded NSQ"]
Queue --> IngestConsumer["Ingest consumer\nstore-local batches"]
IngestConsumer --> Resolver
Workers["Leader-only lifecycle workers\nrollups + retention + reports + backups"] --> Resolver
1. Storage: DuckDB
HitKeep uses DuckDB, an in-process OLAP database engine, as its primary store.
Why DuckDB instead of PostgreSQL or ClickHouse?
- Runs inside the Go process — no separate server, no socket, no authentication
- Columnar storage optimizes analytical queries (aggregations, time-bucketing) over row-based access patterns
- A single file (hitkeep.db) per database. Multi-team installs contain multiple DuckDB files under one data-path, so backups should capture the full directory tree
- ~120 MB per million hits compressed, which is efficient for VPS-class storage
- Queryable directly with the DuckDB CLI or any Parquet-compatible tool — your data is portable by nature
# Read your analytics database directly (no HitKeep running required)
duckdb /var/lib/hitkeep/data/hitkeep.db \
  "SELECT date_trunc('day', timestamp), count(*) FROM hits GROUP BY 1 ORDER BY 1 DESC LIMIT 7;"
Control Plane and Data Plane
HitKeep separates storage into two planes:
- Control plane (hitkeep.db): users, sessions, authentication, site metadata, share links, team membership, and user preferences.
- Data plane (per-team DuckDB files): hits, events, goals, funnels, and pre-aggregated rollups.
For single-team instances, both planes coexist in one hitkeep.db file. When additional teams are created, each team gets its own data plane database at {data-path}/tenants/{team_id}/hitkeep.db. The only reference crossing the boundary is the site_id; no queries JOIN across planes. During the bridge release, shared goals and funnels remain only as compatibility copies for rollback safety; tenant-local copies are authoritative for non-default teams.
/var/lib/hitkeep/data/
├── hitkeep.db              # Control plane + default team data plane
└── tenants/
    └── {team_id}/
        └── hitkeep.db      # Team-specific data plane
Every data pathway (ingestion, API reads, background workers, exports) resolves the correct data plane database before touching analytics data. See Teams and Data Isolation for details.
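The directory layout above can be expressed as a small path helper. This is a minimal sketch, not HitKeep's actual code: the function name `dataPlanePath` is hypothetical, and treating an empty team ID as the default team is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dataPlanePath returns the DuckDB file that holds a team's analytics rows.
// Assumption of this sketch: an empty teamID means the default team, which
// shares hitkeep.db with the control plane.
func dataPlanePath(dataPath, teamID string) string {
	if teamID == "" {
		return filepath.Join(dataPath, "hitkeep.db")
	}
	return filepath.Join(dataPath, "tenants", teamID, "hitkeep.db")
}

func main() {
	fmt.Println(dataPlanePath("/var/lib/hitkeep/data", ""))
	fmt.Println(dataPlanePath("/var/lib/hitkeep/data", "team-a"))
}
```

A real implementation would also need to validate the team ID before joining it into a path.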
Storage Boundary Diagram
graph TD
subgraph Shared["Shared control plane — hitkeep.db"]
Users["users\nsessions\npassword_resets\nsecurity_factors"]
Teams["tenants\nmemberships\ninvites\naudit log\narchives"]
Sites["sites\nsite_tenants\nsite_memberships"]
Shares["share_links\npreferences\napi_clients metadata"]
end
subgraph Default["Default tenant data plane"]
DefaultAnalytics["hits\nevents\ngoals\nfunnels\nrollups"]
end
subgraph TenantA["Tenant A data plane"]
TenantAAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
end
subgraph TenantB["Tenant B data plane"]
TenantBAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
end
Sites -->|site_id → tenant_id| DefaultAnalytics
Sites -->|site_id → tenant_id| TenantAAnalytics
Sites -->|site_id → tenant_id| TenantBAnalytics
The important rule is simple: identity and ownership metadata live in the control plane; analytics rows live in the resolved data plane.
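One way to picture that rule is a resolver that looks up ownership in the control plane and then hands back a per-tenant store handle. The sketch below is illustrative only: `storeManager`, `store`, and the `"default"` tenant ID are hypothetical names, and a plain struct stands in for an open DuckDB handle.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store stands in for an open DuckDB handle; only its path matters here.
type store struct{ path string }

// storeManager is a hypothetical stand-in for the Tenant Store Manager.
type storeManager struct {
	mu         sync.Mutex
	siteTenant map[string]string // control-plane lookup: site_id -> tenant_id
	open       map[string]*store // data planes already opened, by tenant_id
	opens      int               // number of stores opened so far
}

var errUnknownSite = errors.New("unknown site_id")

func newStoreManager(siteTenant map[string]string) *storeManager {
	return &storeManager{siteTenant: siteTenant, open: map[string]*store{}}
}

// resolve maps a site to its tenant, then returns that tenant's store,
// opening it on first use and reusing the handle afterwards.
func (m *storeManager) resolve(siteID string) (*store, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	tenantID, ok := m.siteTenant[siteID]
	if !ok {
		return nil, errUnknownSite
	}
	if s, ok := m.open[tenantID]; ok {
		return s, nil
	}
	path := "tenants/" + tenantID + "/hitkeep.db"
	if tenantID == "default" { // assumed ID for the default tenant
		path = "hitkeep.db"
	}
	s := &store{path: path}
	m.open[tenantID] = s
	m.opens++
	return s, nil
}

func main() {
	m := newStoreManager(map[string]string{"site-1": "default", "site-2": "team-a"})
	for _, id := range []string{"site-1", "site-2", "site-9"} {
		s, err := m.resolve(id)
		fmt.Println(id, s, err)
	}
}
```

Only the `site_id` crosses from the ownership lookup into the store lookup, which mirrors the boundary described above.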
2. Ingestion Buffering: Embedded NSQ
Writing to a columnar database synchronously per HTTP request creates lock contention when traffic spikes. HitKeep avoids that by embedding NSQ, a distributed messaging platform, in-process, and by flushing queued analytics rows to DuckDB in short appender batches instead of issuing one SQL INSERT per message.
- Decoupling: The ingest HTTP handler validates and enqueues the hit in memory, completing the request in microseconds.
- Burst absorption: Traffic spikes (a site goes viral, a product launches) queue up without database pressure. The writer consumes at a steady, optimal pace.
- Store-local batching: The consumer resolves the destination data plane first, then groups queued hits and events per DuckDB store before flushing them.
- Columnar-friendly writes: DuckDB’s appender API bypasses per-row SQL parsing and matches the database’s preferred bulk-ingest path.
- Zero configuration: NSQ runs inside the process on configurable loopback ports. No Kafka cluster, no broker to manage.
NSQ’s TCP and HTTP interfaces bind to 127.0.0.1 by default and are not exposed externally.
In practice, the write path now looks like this:
- HTTP handler validates and publishes the hit or event to NSQ.
- One of the in-process consumers picks up the message.
- The consumer resolves the correct tenant analytics store for the site_id.
- Messages targeting the same store are coalesced into a small batch for a short time window.
- The batch is flushed through DuckDB’s appender API.
- Only after the flush succeeds are the corresponding NSQ messages acknowledged.
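The grouping and acknowledge-after-flush steps can be sketched without NSQ or DuckDB. In this illustration a `message` struct stands in for a queued NSQ message and a callback stands in for the appender flush; all names are hypothetical, and the time-window coalescing is left out, so the function receives a batch that has already been collected.

```go
package main

import (
	"errors"
	"fmt"
)

// message stands in for a queued NSQ message; acked records whether it
// was acknowledged after a successful flush.
type message struct {
	siteID string
	row    string
	acked  bool
}

// flushBatch groups messages by destination store, flushes each group in
// one call, and acknowledges a group only when its flush succeeded.
func flushBatch(msgs []*message, resolve func(siteID string) string,
	flush func(store string, rows []string) error) {
	byStore := map[string][]*message{}
	for _, m := range msgs {
		store := resolve(m.siteID)
		byStore[store] = append(byStore[store], m)
	}
	for store, group := range byStore {
		rows := make([]string, 0, len(group))
		for _, m := range group {
			rows = append(rows, m.row)
		}
		if err := flush(store, rows); err != nil {
			continue // leave unacknowledged so the queue can redeliver
		}
		for _, m := range group {
			m.acked = true
		}
	}
}

// demoBatch runs one batch in which the flush for "tenant-b" fails.
func demoBatch() (acked []bool, flushed map[string]int) {
	msgs := []*message{
		{siteID: "site-1", row: "pageview /"},
		{siteID: "site-2", row: "pageview /pricing"},
		{siteID: "site-1", row: "event signup"},
	}
	resolve := func(siteID string) string {
		if siteID == "site-2" {
			return "tenant-b"
		}
		return "default"
	}
	flushed = map[string]int{}
	flush := func(store string, rows []string) error {
		if store == "tenant-b" {
			return errors.New("simulated flush failure")
		}
		flushed[store] = len(rows)
		return nil
	}
	flushBatch(msgs, resolve, flush)
	for _, m := range msgs {
		acked = append(acked, m.acked)
	}
	return acked, flushed
}

func main() {
	acked, flushed := demoBatch()
	fmt.Println(acked, flushed)
}
```

In the demo, the two `site-1` messages are flushed together and acknowledged, while the `site-2` message stays unacknowledged because its store's flush failed.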
Read and Write Paths
The read path and write path both go through the same tenant resolution layer:
sequenceDiagram
participant Client as Browser / API Client
participant HTTP as Go HTTP Server
participant Auth as Auth + RBAC
participant Control as Shared Control Plane
participant Queue as Embedded NSQ
participant Consumer as Ingest Consumer
participant Resolver as Tenant Store Manager
participant Data as Tenant Data Plane
Client->>HTTP: Request
HTTP->>Auth: Validate cookie / bearer token / site permission
Auth->>Control: Resolve user, team, site ownership
Control-->>Auth: site_id + tenant_id + permissions
alt Dashboard/API read or direct analytics write
Auth->>Resolver: Open analytics store for tenant
Resolver-->>Auth: Default or tenant-local DuckDB handle
Auth->>Data: Read rows or write direct-ingest records
Data-->>HTTP: JSON / success
else Browser/server pageview or event ingest
Auth->>Queue: Publish hit/event message
Queue-->>HTTP: Accepted
Queue->>Consumer: Deliver message
Consumer->>Resolver: Resolve analytics store for site_id
Resolver-->>Consumer: Default or tenant-local DuckDB handle
Consumer->>Data: Appender batch flush
end
HTTP-->>Client: Response
That applies to:
- dashboard reads
- API exports
- share link reads
- API clients
- ecommerce analytics
- opportunity detectors and saved recommendations
- background workers
- site transfer copy/delete operations
3. Clustering: Leader / Follower
For high availability, HitKeep supports a Leader/Follower topology using HashiCorp Memberlist (gossip protocol) for node discovery.
| Role | Behavior |
|---|---|
| Leader | Opens the control-plane and tenant DuckDB files, runs embedded NSQ, consumers, workers, MCP, and stateful API handlers |
| Follower | Serves the HTTP process without stateful stores; proxies browser /ingest and /ingest/event requests to the leader |
There is exactly one Leader at any time. If the Leader goes down, the cluster elects a new one — provided the PVC or data volume can be re-attached (Kubernetes StatefulSets handle this automatically).
Request Flow
Load Balancer → Follower → [proxy /ingest or /ingest/event] → Leader → NSQ → DuckDB
Load Balancer → Leader → [dashboard/API/MCP/stateful handlers] → Tenant Store Manager → DuckDB
For single-server deployments (Docker Compose, systemd), the node acts as Leader implicitly.
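The follower side of this flow can be approximated with the standard library's reverse proxy. This is a sketch under stated assumptions rather than HitKeep's implementation: `followerHandler` is a hypothetical name, the 503 reply for non-ingest paths is an assumption, and a test server answering 202 stands in for the leader.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// followerHandler forwards the two browser ingest routes to the leader.
// Assumption: every other path is answered with 503 on a follower.
func followerHandler(leader *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(leader)
	mux := http.NewServeMux()
	mux.Handle("/ingest", proxy)
	mux.Handle("/ingest/event", proxy)
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "stateful routes are served by the leader", http.StatusServiceUnavailable)
	})
	return mux
}

// probe sends one request through h and returns the status code.
func probe(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

// demoProxy starts a stand-in leader that accepts every request with 202.
func demoProxy() (ingest, event, api int) {
	leader := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	}))
	defer leader.Close()
	target, err := url.Parse(leader.URL)
	if err != nil {
		panic(err)
	}
	f := followerHandler(target)
	return probe(f, http.MethodPost, "/ingest"),
		probe(f, http.MethodPost, "/ingest/event"),
		probe(f, http.MethodGet, "/api/sites")
}

func main() {
	fmt.Println(demoProxy())
}
```

Because the follower holds no stores, it never touches NSQ or DuckDB itself; it only relays the request and the leader's response.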
Optional MCP Route
The optional MCP route is mounted on the main HitKeep HTTP server. It is disabled by default and is registered only on the leader after leader services are ready.
MCP requests use the Streamable HTTP transport and authenticate with existing API client bearer tokens. They do not accept dashboard cookies. Analytics reads pass through the same API client permission checks and tenant store resolution used by REST handlers.
The v1 MCP surface is read-only and aggregate-only. It can list visible sites, return overview, event, ecommerce, AI visibility, saved Opportunities, and imported Search Console analytics, and expose local help resources. Docs tools may fetch official HitKeep documentation as markdown from the configured docs origin, but analytics data is not sent to the docs site. Docs markdown is cached in a bounded in-memory LRU cache.
graph LR
Assistant["MCP Client"] -->|Bearer token| MCP["Leader HTTP server\n/mcp route"]
MCP --> Authz["API client auth\nsite.view checks\nAPI rate limiter"]
Authz --> Resolver["Tenant Store Manager"]
Resolver --> Data["Resolved DuckDB analytics store"]
MCP -->|Accept: text/markdown\nonly on docs tool calls| Docs["Official docs origin"]
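The bounded in-memory LRU cache mentioned for docs markdown could look roughly like the following. This is a generic LRU sketch, not HitKeep's cache: the type names are hypothetical, the bound is counted in entries (the real cache may bound something else, such as bytes), and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached document, keyed for example by its docs path.
type entry struct{ key, value string }

// lruCache keeps at most capacity entries and evicts the least recently
// used one when a new key would exceed that bound.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached value and marks the key as recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores or updates a value, evicting the oldest entry if needed.
func (c *lruCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Put("/guides/install", "# Install")
	c.Put("/guides/teams", "# Teams")
	c.Get("/guides/install")              // refresh: teams is now the oldest
	c.Put("/guides/proxies", "# Proxies") // evicts /guides/teams
	_, ok := c.Get("/guides/teams")
	fmt.Println(ok)
}
```

The paths and markdown bodies in `main` are placeholders chosen for the example.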
Optional AI Model Route
Optional AI provider calls are disabled by default. When enabled, HitKeep calls the configured provider, model, or OpenAI-compatible gateway route only for features that explicitly use AI.
The first feature is Opportunity Recommendations. Deterministic detectors read tenant-local analytics first and decide the opportunity type, evidence, impact, confidence, score, and status. HitKeep accepts only validated translation keys, params, cited evidence IDs, and safe structured output from AI enrichment. Raw prompts, raw provider payloads, and provider secrets are not persisted.
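The "accept only validated output" step can be illustrated with an allow-list check. Everything in this sketch is hypothetical: the `enrichment` struct, the translation key, the evidence IDs, and the rule that at least one evidence ID must be cited are assumptions chosen to show the shape of such a check, not HitKeep's actual schema.

```go
package main

import (
	"errors"
	"fmt"
)

// enrichment is a hypothetical shape for structured AI output.
type enrichment struct {
	TranslationKey string
	EvidenceIDs    []string
}

// validateEnrichment accepts output only if its translation key is
// allow-listed and every cited evidence ID was produced by a detector.
func validateEnrichment(e enrichment, allowedKeys, knownEvidence map[string]bool) error {
	if !allowedKeys[e.TranslationKey] {
		return fmt.Errorf("translation key %q is not allow-listed", e.TranslationKey)
	}
	if len(e.EvidenceIDs) == 0 {
		return errors.New("no evidence cited") // assumed rule for this sketch
	}
	for _, id := range e.EvidenceIDs {
		if !knownEvidence[id] {
			return fmt.Errorf("evidence %q was not produced by a detector", id)
		}
	}
	return nil
}

func main() {
	keys := map[string]bool{"opportunity.example.title": true}
	evidence := map[string]bool{"ev-1": true, "ev-2": true}

	ok := enrichment{TranslationKey: "opportunity.example.title", EvidenceIDs: []string{"ev-1"}}
	bad := enrichment{TranslationKey: "opportunity.example.title", EvidenceIDs: []string{"ev-9"}}

	fmt.Println(validateEnrichment(ok, keys, evidence))
	fmt.Println(validateEnrichment(bad, keys, evidence))
}
```

The point of the design is that the model can only select from material the deterministic detectors already produced; anything outside those sets is rejected rather than stored.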
4. Security Layer
The HTTP server enforces multiple security controls before any data is processed:
- Rate limiting: Per-IP token bucket limiters on /ingest, /api/*, and /api/login. Configurable rate and burst.
- Sec-Fetch validation: Checks Sec-Fetch-* headers on state-changing requests.
- JWT authentication: HTTP-only cookies signed with a configurable secret. Short expiry by default.
- WebAuthn: FIDO2/Passkey challenge-response for passwordless login.
- TOTP: RFC 6238 time-based one-time passwords for second-factor auth.
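To make the rate-limiting item concrete, here is a minimal per-IP token bucket written with the standard library only. It is a sketch of the general technique, not HitKeep's limiter: the type names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and idle buckets are never cleaned up.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket holds the remaining tokens for one client IP.
type bucket struct {
	tokens float64
	last   time.Time
}

// ipLimiter refills each bucket at rate tokens per second up to burst.
type ipLimiter struct {
	mu      sync.Mutex
	rate    float64
	burst   float64
	buckets map[string]*bucket
}

func newIPLimiter(rate float64, burst int) *ipLimiter {
	return &ipLimiter{rate: rate, burst: float64(burst), buckets: map[string]*bucket{}}
}

// allow spends one token for ip at time now, or reports false if the
// bucket is empty.
func (l *ipLimiter) allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demoLimiter uses 1 token per second with a burst of 2.
func demoLimiter() []bool {
	l := newIPLimiter(1, 2)
	t0 := time.Unix(0, 0)
	return []bool{
		l.allow("203.0.113.7", t0),
		l.allow("203.0.113.7", t0),
		l.allow("203.0.113.7", t0),                  // bucket is empty
		l.allow("198.51.100.9", t0),                 // separate bucket per IP
		l.allow("203.0.113.7", t0.Add(time.Second)), // one token refilled
		l.allow("203.0.113.7", t0.Add(time.Second)),
	}
}

func main() {
	fmt.Println(demoLimiter())
}
```

The burst sets how many requests a client can send back to back; the rate sets how quickly that allowance recovers.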
Authentication Modes
HitKeep currently supports three authenticated access modes into the same API surface, plus read-only share tokens that go through separate share handlers:
graph LR
Session["Session cookie"] --> Authz["Auth + permissions"]
Personal["Personal API client"] --> Authz
Team["Team API client"] --> Authz
Share["Share token"] --> ShareRead["Read-only share handlers"]
Authz --> Control["Shared control plane"]
Authz --> Resolver["Tenant Store Manager"]
ShareRead --> Resolver
- Session cookies are for interactive users in the dashboard.
- Personal API clients are user-owned bearer tokens.
- Team API clients are tenant-owned bearer tokens that survive individual user departure.
- Share tokens are read-only public-style analytics access scoped to one shared site/dashboard view.
The optional MCP server reuses API client bearer tokens only. This makes MCP access revocable through the same API client lifecycle as other server-to-server integrations.
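The bearer-only rule for MCP can be sketched as a small credential check. The names here are hypothetical, including the cookie name `hk_session`, which is an assumption made for the example and not taken from HitKeep.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type credential int

const (
	noCredential credential = iota
	sessionCookie
	bearerToken
)

// sessionCookieName is an assumed name, used only in this sketch.
const sessionCookieName = "hk_session"

// hasBearer reports whether the request carries a non-empty bearer token.
func hasBearer(r *http.Request) bool {
	h := r.Header.Get("Authorization")
	return strings.HasPrefix(h, "Bearer ") && len(h) > len("Bearer ")
}

// classify reports which credential a request presents, preferring a
// bearer token when both are present.
func classify(r *http.Request) credential {
	if hasBearer(r) {
		return bearerToken
	}
	if _, err := r.Cookie(sessionCookieName); err == nil {
		return sessionCookie
	}
	return noCredential
}

// mcpAccepts mirrors the rule that MCP requests need a bearer token and
// are not authenticated by dashboard cookies.
func mcpAccepts(r *http.Request) bool { return hasBearer(r) }

// newReq builds a test request with optional credentials.
func newReq(authorization, cookieValue string) *http.Request {
	r := httptest.NewRequest(http.MethodPost, "/mcp", nil)
	if authorization != "" {
		r.Header.Set("Authorization", authorization)
	}
	if cookieValue != "" {
		r.AddCookie(&http.Cookie{Name: sessionCookieName, Value: cookieValue})
	}
	return r
}

func main() {
	fmt.Println(mcpAccepts(newReq("Bearer example-token", "")))
	fmt.Println(mcpAccepts(newReq("", "example-session")))
}
```

Token verification, permission checks, and rate limiting would follow this step; the sketch only covers which credential type is considered at all.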
5. Frontend Architecture
The dashboard is a Single Page Application served from the same binary:
- Framework: Angular v21 with Signals for reactive state
- UI library: PrimeNG with Tailwind CSS
- API contract: All dashboard functionality uses the same JSON REST API as external clients
- Tracking snippet: hk.js is minified with esbuild and served from your instance. It uses sendBeacon() with a keepalive fetch fallback, in-memory retry and dedupe state, and only stores the existing opaque session tuple in sessionStorage. Web Vitals stay in a separate opt-in hk-vitals.js bundle. See Tracker Architecture.
The dashboard itself is now multi-context:
- active site
- active team
- user/session state
- permission state
- analytics filters and date range
Those contexts stay on the client, but authoritative ownership and access checks always happen server-side in the shared control plane before analytics data is read.
6. Data Lifecycle and Recovery
HitKeep now has three distinct storage lifecycle concepts and they should not be confused:
- Live databases for current reads and writes
- Retention archives for old analytics rows exported to Parquet
- Backup snapshots / purge flows for disaster recovery and irreversible cleanup
graph TD
Live["Live control plane + tenant DBs"] --> Retention["Retention worker\narchive old analytics to Parquet"]
Live --> Backup["Backup worker\nEXPORT DATABASE snapshots"]
ArchivedTeam["Archived team"] --> Purge["Admin purge\nDELETE /api/admin/teams/{id}"]
Purge --> Remove["Remove control-plane metadata\nand tenants/{id}/hitkeep.db"]
Important distinctions:
- archive-path is for Parquet retention archives and related artifacts.
- data-path is where live tenant DuckDB files live.
- archiving a team is a reversible control-plane state until you run the purge path.
- purging an archived team is irreversible and removes its live tenant database directory.
7. Ecommerce in the Architecture
Ecommerce is not a separate subsystem. It is an opinionated query layer over the same tenant-local events tables.
graph LR
Tracker["hk.js / server ingest"] --> Events["events table"]
Events --> Normalizer["GA4-inspired normalization\npurchase / begin_checkout / add_to_cart / view_item\n+ legacy aliases"]
Normalizer --> Queries["Revenue / products / sources queries"]
Queries --> Page["Ecommerce dashboard page"]
That means:
- ecommerce data inherits team isolation automatically
- ecommerce filters reuse the same site/session attribution model
- ecommerce backups, restores, transfers, and retention follow the same tenant data-plane rules as other analytics data
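The normalization step in the diagram can be pictured as a lookup from raw event names to the four canonical names. The canonical names come from the diagram above; the legacy aliases in this sketch are invented placeholders, since the actual alias list is not given here, and the lowercasing and trimming are likewise assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalEvents are the GA4-inspired names shown in the diagram.
var canonicalEvents = map[string]bool{
	"purchase":       true,
	"begin_checkout": true,
	"add_to_cart":    true,
	"view_item":      true,
}

// legacyAliases maps older names to canonical ones. These two entries
// are hypothetical examples, not HitKeep's real alias list.
var legacyAliases = map[string]string{
	"order_completed":  "purchase",
	"checkout_started": "begin_checkout",
}

// normalizeEvent returns the canonical ecommerce event name, or false
// when the event is not an ecommerce event at all.
func normalizeEvent(name string) (string, bool) {
	n := strings.ToLower(strings.TrimSpace(name))
	if canonicalEvents[n] {
		return n, true
	}
	if canonical, ok := legacyAliases[n]; ok {
		return canonical, true
	}
	return "", false
}

func main() {
	for _, name := range []string{"purchase", "Order_Completed", "newsletter_signup"} {
		canonical, ok := normalizeEvent(name)
		fmt.Printf("%q -> %q (%v)\n", name, canonical, ok)
	}
}
```

Because the mapping runs over rows in the same tenant-local events table, no separate ecommerce storage or isolation logic is needed.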
What the Binary Contains
hitkeep (single executable)
├── Go HTTP server
├── Embedded NSQ broker + consumer
├── DuckDB engine + SQL migrations
├── Angular dashboard (compiled, embedded)
├── hk.js tracking snippet (embedded)
├── Optional MCP route (leader only)
├── Optional AI model route (disabled by default)
└── Background workers (retention, rollups, reports, imports, backups)
Every layer is auditable. The full source is on GitHub under the MIT license.
Related
- Configuration Reference
- Teams and Data Isolation
- Installation Guide
- Trusted Proxies
- Opportunity Recommendations
HitKeep Cloud runs this exact binary stack in managed EU (Frankfurt) or US (Virginia) infrastructure, with the same source-visible product foundation as self-hosted deployments.