
Architecture

HitKeep is designed to give you control over every layer of the analytics stack without requiring you to operate multiple services. The main analytics path, dashboard API, embedded queue, and DuckDB storage live in one Go binary. Optional integrations such as SMTP, S3 backup/archive storage, favicon lookup, MCP docs tools, and AI provider enrichment can make outbound calls when you enable or configure them. For citable binary size, memory use, storage boundaries, privacy behavior, exports, and non-goals, see Facts and Limits.

Browser and server-side pageview or event data flows through four stages on the leader node:

  1. Ingestion: The HTTP server receives a pageview or event from the tracking script.
  2. Buffering: The hit is published to an in-memory NSQ queue, decoupling the HTTP response from the write path.
  3. Processing: Internal consumers read from the queue concurrently, group messages by resolved tenant store, and build short-lived micro-batches.
  4. Storage: Each store-local batch is written to embedded DuckDB through the appender API, then the queued messages are acknowledged.
graph LR
    A["Browser tracker\nor server-side ingest"] -->|"POST /ingest\nPOST /ingest/event\nPOST /ingest/web-vitals\nPOST /api/ingest/server/*"| B("Leader Go HTTP Server")
    B -->|Publish hits/events/vitals| C{"Embedded NSQ"}
    C -->|Consume concurrently| D["Ingest Consumer"]
    D -->|Resolve tenant| F{Tenant Store Manager}
    F -->|Queued hits/events/vitals| G[Micro-batch Builder]
    G -->|Appender flush| E[(DuckDB — tenant data plane)]

    AI["AI crawler log forwarder"] -->|"POST /api/sites/{id}/ingest/ai-fetch"| B
    B -->|Permission check + direct insert| F
    F -->|Direct AI fetch write| E
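The four stages can be sketched in plain Go. This is a minimal illustration under stated assumptions, not HitKeep's code. A buffered channel stands in for embedded NSQ, and a map keyed by site_id stands in for the tenant DuckDB stores. The names `Pipeline`, `Ingest`, and `ErrQueueFull` are hypothetical.

```go
package main

import (
	"errors"
	"sync"
)

// Hit is a simplified, hypothetical pageview payload.
type Hit struct {
	SiteID string
	Path   string
}

// ErrQueueFull is returned when the in-memory buffer has no room left.
var ErrQueueFull = errors.New("ingest queue full")

// Pipeline decouples ingestion from storage: Ingest only enqueues, and a
// background consumer drains the queue and writes to the store.
type Pipeline struct {
	queue chan Hit // stands in for embedded NSQ
	done  chan struct{}
	mu    sync.Mutex
	store map[string][]Hit // stands in for the tenant DuckDB stores
}

func NewPipeline(capacity int) *Pipeline {
	p := &Pipeline{
		queue: make(chan Hit, capacity),
		done:  make(chan struct{}),
		store: make(map[string][]Hit),
	}
	go p.consume()
	return p
}

// Ingest covers stages 1-2: validate, then enqueue without touching storage.
// This sketch rejects hits when the buffer is full; it does not model how
// NSQ itself handles overflow.
func (p *Pipeline) Ingest(h Hit) error {
	if h.SiteID == "" {
		return errors.New("missing site_id")
	}
	select {
	case p.queue <- h:
		return nil
	default:
		return ErrQueueFull
	}
}

// consume covers stages 3-4: read queued hits and write them to storage.
func (p *Pipeline) consume() {
	defer close(p.done)
	for h := range p.queue {
		p.mu.Lock()
		p.store[h.SiteID] = append(p.store[h.SiteID], h)
		p.mu.Unlock()
	}
}

// Close stops intake and waits until every queued hit has been stored.
func (p *Pipeline) Close() {
	close(p.queue)
	<-p.done
}

// Count reports how many hits were stored for a site.
func (p *Pipeline) Count(siteID string) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.store[siteID])
}
```

Because `Ingest` returns as soon as the channel accepts the hit, request latency does not depend on write latency. This consumer writes one hit at a time, whereas HitKeep groups queued messages into store-local micro-batches.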

The tracking script (hk.js) is also served from your instance. Opt-in Web Vitals use the same-origin hk-vitals.js split bundle. No CDN or third-party analytics domain is required.

The easiest way to understand HitKeep today is to separate it into four concerns:

  1. Public ingest and API surface
  2. Shared control plane
  3. Tenant-local analytics data plane
  4. Background lifecycle workers
graph TD
    Browser["Browser / Server Tracker"] -->|"POST /ingest\nPOST /ingest/event\nPOST /ingest/web-vitals\nPOST /api/ingest/server/*"| HTTP["Go HTTP Server"]
    AIFetch["AI fetch forwarder"] -->|"POST /api/sites/{id}/ingest/ai-fetch"| HTTP
    Dashboard["Angular Dashboard"] -->|JSON REST API| HTTP
    APIClients["Personal + Team API Clients"] -->|Bearer token| HTTP
    MCPClient["MCP Client"] -->|Streamable HTTP /mcp\nBearer API client token| HTTP
    AIGateway["Configured AI Provider\nor Gateway"] -.->|Optional provider call\nfor validated copy| HTTP

    HTTP --> Auth["Auth / Validation + Rate Limit + Permission Layer"]
    Auth --> Control["Shared control plane\nhitkeep.db"]
    Auth --> Resolver["Tenant Store Manager"]
    Resolver --> DefaultDB["Default tenant data plane\n(shared hitkeep.db)"]
    Resolver --> TenantDB["Non-default tenant data plane\n{data-path}/tenants/{team_id}/hitkeep.db"]

    HTTP -->|pageviews + events + web vitals| Queue["Embedded NSQ"]
    Queue --> IngestConsumer["Ingest consumer\nstore-local batches"]
    IngestConsumer --> Resolver
    Workers["Leader-only lifecycle workers\nrollups + retention + reports + backups"] --> Resolver

HitKeep uses DuckDB, an in-process OLAP database engine, as its primary store.

Why DuckDB instead of PostgreSQL or ClickHouse?

  • Runs inside the Go process — no separate server, no socket, no authentication
  • Columnar storage is optimized for analytical queries (aggregations, time-bucketing) rather than row-by-row access patterns
  • A single file (hitkeep.db) per database, although multi-team installs contain multiple DuckDB files under one data-path, so backups should capture the full directory tree
  • ~120 MB per million hits compressed — efficient for VPS-class storage
  • Queryable directly with the DuckDB CLI, and exportable to Parquet for any Parquet-compatible tool, so your data stays portable
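# Read your analytics database directly; HitKeep does not need to be running.
# DuckDB lets only one process hold a writable handle on a file, so stop the
# leader first (or query a copy) and open the file read-only.
duckdb -readonly /var/lib/hitkeep/data/hitkeep.db \
  "SELECT date_trunc('day', timestamp), count(*) FROM hits GROUP BY 1 ORDER BY 1 DESC LIMIT 7;"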

HitKeep separates storage into two planes:

  • Control plane (hitkeep.db): users, sessions, authentication, site metadata, share links, team membership, and user preferences.
  • Data plane (per-team DuckDB files): hits, events, goals, funnels, and pre-aggregated rollups.

For single-team instances, both planes coexist in one hitkeep.db file. When additional teams are created, each team gets its own data plane database at {data-path}/tenants/{team_id}/hitkeep.db. The only reference crossing the boundary is the site_id; no queries JOIN across planes. During the bridge release, shared goals and funnels remain only as compatibility copies for rollback safety; tenant-local copies are authoritative for non-default teams.

/var/lib/hitkeep/data/
├── hitkeep.db              # Control plane + default team data plane
└── tenants/
    └── {team_id}/
        └── hitkeep.db      # Team-specific data plane
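The layout above reduces to a small path rule. This is a sketch that assumes the caller already knows which team is the default one. The `defaultTeamID` parameter and the `DataPlanePath` name are hypothetical, and the team ID check is an illustrative safeguard rather than documented HitKeep behavior.

```go
package main

import (
	"errors"
	"path/filepath"
)

// DataPlanePath returns the DuckDB file holding a team's analytics rows.
// The default team shares hitkeep.db with the control plane; every other
// team gets {data-path}/tenants/{team_id}/hitkeep.db.
func DataPlanePath(dataPath, teamID, defaultTeamID string) (string, error) {
	if teamID == "" {
		return "", errors.New("empty team id")
	}
	if teamID == defaultTeamID {
		return filepath.Join(dataPath, "hitkeep.db"), nil
	}
	// Reject IDs containing separators or dot segments, which could
	// otherwise resolve outside the tenants directory.
	if teamID != filepath.Base(teamID) || teamID == "." || teamID == ".." {
		return "", errors.New("invalid team id")
	}
	return filepath.Join(dataPath, "tenants", teamID, "hitkeep.db"), nil
}
```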

Every data pathway — ingestion, API reads, background workers, exports — resolves the correct data plane database before touching analytics data. See Teams and Data Isolation for details.

graph TD
    subgraph Shared["Shared control plane — hitkeep.db"]
        Users["users\nsessions\npassword_resets\nsecurity_factors"]
        Teams["tenants\nmemberships\ninvites\naudit log\narchives"]
        Sites["sites\nsite_tenants\nsite_memberships"]
        Shares["share_links\npreferences\napi_clients metadata"]
    end

    subgraph Default["Default tenant data plane"]
        DefaultAnalytics["hits\nevents\ngoals\nfunnels\nrollups"]
    end

    subgraph TenantA["Tenant A data plane"]
        TenantAAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
    end

    subgraph TenantB["Tenant B data plane"]
        TenantBAnalytics["tenants/{team_id}/hitkeep.db\nhits\nevents\ngoals\nfunnels\nrollups"]
    end

    Sites -->|site_id → tenant_id| DefaultAnalytics
    Sites -->|site_id → tenant_id| TenantAAnalytics
    Sites -->|site_id → tenant_id| TenantBAnalytics

The important rule is simple: identity and ownership metadata live in the control plane; analytics rows live in the resolved data plane.
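This rule can be sketched as a tenant store manager. It looks up site_id to tenant_id in control-plane metadata, then returns a per-tenant handle that is opened once and cached. `Store`, `StoreManager`, and the `open` callback are hypothetical placeholders for real DuckDB handles. Holding the mutex while opening keeps the sketch simple, but it serializes first-time opens.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a placeholder for an open DuckDB handle.
type Store struct{ TenantID string }

// StoreManager resolves site_id to tenant_id from control-plane metadata and
// hands out one cached analytics store per tenant.
type StoreManager struct {
	mu         sync.Mutex
	siteTenant map[string]string // e.g. loaded from site_tenants
	stores     map[string]*Store
	open       func(tenantID string) (*Store, error)
}

func NewStoreManager(siteTenant map[string]string, open func(string) (*Store, error)) *StoreManager {
	return &StoreManager{siteTenant: siteTenant, stores: make(map[string]*Store), open: open}
}

// ForSite returns the analytics store that owns siteID, opening it on first use.
func (m *StoreManager) ForSite(siteID string) (*Store, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	tenantID, ok := m.siteTenant[siteID]
	if !ok {
		return nil, fmt.Errorf("unknown site %q", siteID)
	}
	if s, ok := m.stores[tenantID]; ok {
		return s, nil
	}
	s, err := m.open(tenantID)
	if err != nil {
		return nil, fmt.Errorf("open store for tenant %q: %w", tenantID, err)
	}
	m.stores[tenantID] = s
	return s, nil
}
```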

Writing to a columnar database synchronously per HTTP request creates lock contention when traffic spikes. HitKeep avoids that by embedding NSQ, a distributed messaging platform, in-process, and by flushing queued analytics rows to DuckDB in short appender batches instead of issuing one SQL INSERT per message.

  • Decoupling: The ingest HTTP handler validates and enqueues the hit in memory, so the request completes without waiting on a DuckDB write.
  • Burst absorption: Traffic spikes (a site goes viral, a product launches) accumulate in the queue instead of hitting the database. The consumers drain it at the pace DuckDB can absorb.
  • Store-local batching: The consumer resolves the destination data plane first, then groups queued hits and events per DuckDB store before flushing them.
  • Columnar-friendly writes: DuckDB’s appender API bypasses per-row SQL parsing and matches the database’s preferred bulk-ingest path.
  • Zero configuration: NSQ runs inside the process on configurable loopback ports. No Kafka cluster, no broker to manage.

NSQ’s TCP and HTTP interfaces bind to 127.0.0.1 by default and are not exposed externally.

In practice, the write path now looks like this:

  1. HTTP handler validates and publishes the hit or event to NSQ.
  2. One of the in-process consumers picks up the message.
  3. The consumer resolves the correct tenant analytics store for the site_id.
  4. Messages targeting the same store are coalesced into a small batch for a short time window.
  5. The batch is flushed through DuckDB’s appender API.
  6. Only after the flush succeeds are the corresponding NSQ messages acknowledged.

The read path and write path both go through the same tenant resolution layer:

sequenceDiagram
    participant Client as Browser / API Client
    participant HTTP as Go HTTP Server
    participant Auth as Auth + RBAC
    participant Control as Shared Control Plane
    participant Queue as Embedded NSQ
    participant Consumer as Ingest Consumer
    participant Resolver as Tenant Store Manager
    participant Data as Tenant Data Plane

    Client->>HTTP: Request
    HTTP->>Auth: Validate cookie / bearer token / site permission
    Auth->>Control: Resolve user, team, site ownership
    Control-->>Auth: site_id + tenant_id + permissions
    alt Dashboard/API read or direct analytics write
        Auth->>Resolver: Open analytics store for tenant
        Resolver-->>Auth: Default or tenant-local DuckDB handle
        Auth->>Data: Read rows or write direct-ingest records
        Data-->>HTTP: JSON / success
    else Browser/server pageview or event ingest
        Auth->>Queue: Publish hit/event message
        Queue-->>HTTP: Accepted
        Queue->>Consumer: Deliver message
        Consumer->>Resolver: Resolve analytics store for site_id
        Resolver-->>Consumer: Default or tenant-local DuckDB handle
        Consumer->>Data: Appender batch flush
    end
    HTTP-->>Client: Response

That applies to:

  • dashboard reads
  • API exports
  • share link reads
  • API clients
  • ecommerce analytics
  • opportunity detectors and saved recommendations
  • background workers
  • site transfer copy/delete operations

For high availability, HitKeep supports a Leader/Follower topology using HashiCorp Memberlist (gossip protocol) for node discovery.

Role      Behavior
Leader    Opens the control-plane and tenant DuckDB files, runs embedded NSQ, consumers, workers, MCP, and stateful API handlers
Follower  Serves the HTTP process without stateful stores; proxies browser /ingest and /ingest/event requests to the leader

There is exactly one Leader at any time. If the Leader goes down, the cluster elects a new one — provided the PVC or data volume can be re-attached (Kubernetes StatefulSets handle this automatically).

Load Balancer → Follower → [proxy /ingest or /ingest/event] → Leader → NSQ → DuckDB
Load Balancer → Leader → [dashboard/API/MCP/stateful handlers] → Tenant Store Manager → DuckDB

For single-server deployments (Docker Compose, systemd), the node acts as Leader implicitly.

The optional MCP route is mounted on the main HitKeep HTTP server. It is disabled by default and is registered only on the leader after leader services are ready.

MCP requests use the Streamable HTTP transport and authenticate with existing API client bearer tokens. They do not accept dashboard cookies. Analytics reads pass through the same API client permission checks and tenant store resolution used by REST handlers.

The v1 MCP surface is read-only and aggregate-only. It can list visible sites, return overview, event, ecommerce, AI visibility, saved Opportunities, and imported Search Console analytics, and expose local help resources. Docs tools may fetch official HitKeep documentation as markdown from the configured docs origin, but analytics data is not sent to the docs site. Docs markdown is cached in a bounded in-memory LRU cache.

graph LR
    Assistant["MCP Client"] -->|Bearer token| MCP["Leader HTTP server\n/mcp route"]
    MCP --> Authz["API client auth\nsite.view checks\nAPI rate limiter"]
    Authz --> Resolver["Tenant Store Manager"]
    Resolver --> Data["Resolved DuckDB analytics store"]
    MCP -->|Accept: text/markdown\nonly on docs tool calls| Docs["Official docs origin"]
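A bounded LRU cache for docs markdown can be built from `container/list` and a map. In this sketch, `Get` marks an entry as recently used and `Put` evicts the least recently used entry once capacity is reached. `DocsCache` is hypothetical. It bounds entry count rather than bytes and is not safe for concurrent use without a mutex.

```go
package main

import "container/list"

type entry struct{ key, value string }

// DocsCache is a bounded least-recently-used cache keyed by docs path.
type DocsCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewDocsCache(capacity int) *DocsCache {
	if capacity < 1 {
		capacity = 1
	}
	return &DocsCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns a cached page and marks it as recently used.
func (c *DocsCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a page, evicting the least recently used one when full.
func (c *DocsCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

// Len reports how many pages are cached.
func (c *DocsCache) Len() int { return c.order.Len() }
```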

Optional AI provider calls are disabled by default. When enabled, HitKeep calls the configured provider, model, or OpenAI-compatible gateway route only for features that explicitly use AI.

The first feature is Opportunity Recommendations. Deterministic detectors read tenant-local analytics first and decide the opportunity type, evidence, impact, confidence, score, and status. HitKeep accepts only validated translation keys, params, cited evidence IDs, and safe structured output from AI enrichment. Raw prompts, raw provider payloads, and provider secrets are not persisted.
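The "accept only validated output" rule can be sketched as a strict decoder. Unknown fields are rejected, the translation key must be allow-listed, and every cited evidence ID must come from the deterministic detector. Invalid output is discarded, not repaired. The `Enrichment` shape and its JSON field names are hypothetical, and only the parsed struct is returned, so the caller has no raw payload to persist.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Enrichment is a hypothetical shape for structured AI output.
type Enrichment struct {
	TitleKey    string            `json:"title_key"`
	Params      map[string]string `json:"params"`
	EvidenceIDs []string          `json:"evidence_ids"`
}

// ValidateEnrichment decodes raw provider output and accepts it only if the
// translation key is allow-listed and every cited evidence ID is known.
func ValidateEnrichment(raw []byte, allowedKeys, evidence map[string]bool) (*Enrichment, error) {
	var e Enrichment
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&e); err != nil {
		return nil, fmt.Errorf("decode: %w", err)
	}
	if !allowedKeys[e.TitleKey] {
		return nil, fmt.Errorf("unknown translation key %q", e.TitleKey)
	}
	if len(e.EvidenceIDs) == 0 {
		return nil, fmt.Errorf("no evidence cited")
	}
	for _, id := range e.EvidenceIDs {
		if !evidence[id] {
			return nil, fmt.Errorf("evidence %q was not produced by a detector", id)
		}
	}
	return &e, nil
}
```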

The HTTP server enforces multiple security controls before any data is processed:

  • Rate limiting: Per-IP token bucket limiters on /ingest, /api/*, and /api/login. Configurable rate and burst.
  • Sec-Fetch validation: Checks Sec-Fetch-* headers on state-changing requests.
  • JWT authentication: HTTP-only cookies signed with a configurable secret. Short expiry by default.
  • WebAuthn: FIDO2/Passkey challenge-response for passwordless login.
  • TOTP: RFC 6238 time-based one-time passwords for second-factor auth.

HitKeep currently supports four main access modes into the same API surface:

graph LR
    Session["Session cookie"] --> Authz["Auth + permissions"]
    Personal["Personal API client"] --> Authz
    Team["Team API client"] --> Authz
    Share["Share token"] --> ShareRead["Read-only share handlers"]

    Authz --> Control["Shared control plane"]
    Authz --> Resolver["Tenant Store Manager"]
    ShareRead --> Resolver
  • Session cookies are for interactive users in the dashboard.
  • Personal API clients are user-owned bearer tokens.
  • Team API clients are tenant-owned bearer tokens that survive individual user departure.
  • Share tokens are read-only public-style analytics access scoped to one shared site/dashboard view.

The optional MCP server reuses API client bearer tokens only. This makes MCP access revocable through the same API client lifecycle as other server-to-server integrations.

The dashboard is a Single Page Application served from the same binary:

  • Framework: Angular v21 with Signals for reactive state
  • UI library: PrimeNG with Tailwind CSS
  • API contract: All dashboard functionality uses the same JSON REST API as external clients
  • Tracking snippet: hk.js is minified with esbuild and served from your instance. It uses sendBeacon() with a keepalive fetch fallback, in-memory retry and dedupe state, and only stores the existing opaque session tuple in sessionStorage. Web Vitals stay in a separate opt-in hk-vitals.js bundle. See Tracker Architecture.

The dashboard itself is now multi-context:

  • active site
  • active team
  • user/session state
  • permission state
  • analytics filters and date range

Those contexts stay on the client, but authoritative ownership and access checks always happen server-side in the shared control plane before analytics data is read.

HitKeep now has three distinct storage lifecycle concepts that should not be confused:

  1. Live databases for current reads and writes
  2. Retention archives for old analytics rows exported to Parquet
  3. Backup snapshots / purge flows for disaster recovery and irreversible cleanup
graph TD
    Live["Live control plane + tenant DBs"] --> Retention["Retention worker\narchive old analytics to Parquet"]
    Live --> Backup["Backup worker\nEXPORT DATABASE snapshots"]
    ArchivedTeam["Archived team"] --> Purge["Admin purge\nDELETE /api/admin/teams/{id}"]
    Purge --> Remove["Remove control-plane metadata\nand tenants/{id}/hitkeep.db"]

Important distinctions:

  • -archive-path is for Parquet retention archives and related artifacts.
  • -data-path is where live tenant DuckDB files live.
  • archiving a team is a reversible control-plane state until you run the purge path.
  • purging an archived team is irreversible and removes its live tenant database directory.

Ecommerce is not a separate subsystem. It is an opinionated query layer over the same tenant-local events tables.

graph LR
    Tracker["hk.js / server ingest"] --> Events["events table"]
    Events --> Normalizer["GA4-inspired normalization\npurchase / begin_checkout / add_to_cart / view_item\n+ legacy aliases"]
    Normalizer --> Queries["Revenue / products / sources queries"]
    Queries --> Page["Ecommerce dashboard page"]

That means:

  • ecommerce data inherits team isolation automatically
  • ecommerce filters reuse the same site/session attribution model
  • ecommerce backups, restores, transfers, and retention follow the same tenant data-plane rules as other analytics data
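The normalization step can be sketched as folding event names into the GA4-style names shown in the diagram, then summing revenue over purchases. The canonical names come from the diagram. The `legacyAliases` entries are hypothetical and only illustrate alias folding; they are not HitKeep's actual alias list. Revenue is kept in integer minor units, and currency handling is not modeled.

```go
package main

import "strings"

// canonicalEvents are the GA4-style ecommerce event names from the diagram.
var canonicalEvents = map[string]bool{
	"purchase": true, "begin_checkout": true, "add_to_cart": true, "view_item": true,
}

// legacyAliases is hypothetical and only illustrates alias folding.
var legacyAliases = map[string]string{
	"order_completed":  "purchase",
	"checkout_started": "begin_checkout",
	"product_added":    "add_to_cart",
	"product_viewed":   "view_item",
}

// NormalizeEventName folds case and aliases into a canonical ecommerce name.
func NormalizeEventName(name string) (string, bool) {
	n := strings.ToLower(strings.TrimSpace(name))
	if canonicalEvents[n] {
		return n, true
	}
	if c, ok := legacyAliases[n]; ok {
		return c, true
	}
	return "", false
}

// Event is a hypothetical row from the tenant-local events table.
type Event struct {
	Name         string
	RevenueCents int64 // integer minor units avoid float rounding in sums
}

// PurchaseRevenueCents sums revenue over events that normalize to "purchase".
func PurchaseRevenueCents(events []Event) int64 {
	var total int64
	for _, e := range events {
		if n, ok := NormalizeEventName(e.Name); ok && n == "purchase" {
			total += e.RevenueCents
		}
	}
	return total
}
```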
hitkeep (single executable)
├── Go HTTP server
├── Embedded NSQ broker + consumer
├── DuckDB engine + SQL migrations
├── Angular dashboard (compiled, embedded)
├── hk.js tracking snippet (embedded)
├── Optional MCP route (leader only)
├── Optional AI model route (disabled by default)
└── Background workers (retention, rollups, reports, imports, backups)

Every layer is auditable. The full source is on GitHub under the MIT license.

HitKeep Cloud runs this exact binary stack in managed EU (Frankfurt) or US (Virginia) infrastructure, with the same source-visible product foundation as self-hosted deployments.