AI Fetch Ingest

HitKeep cannot see AI crawler fetches from the browser tracker because most AI crawlers do not run JavaScript. To populate AI Visibility, forward matching edge, proxy, CDN, or origin log records to:

POST /api/sites/{site_id}/ingest/ai-fetch
Authorization: Bearer <hitkeep-api-client-token>
Content-Type: application/json

Use this guide for any platform: nginx, Caddy, Apache, Cloudflare Workers, Fastly Compute, Vercel edge logs, Netlify functions, app-server middleware, CDN log drains, or a small batch job that reads access logs.

AI fetch ingest stores server-side crawler fetch metadata for one HitKeep site. The dashboard uses those rows to show:

  • which AI crawlers fetched your pages
  • which paths and resource types they requested
  • 4xx and 5xx patterns
  • response-time and byte-size context
  • correlation with later AI-referred human visits

The endpoint records the time when HitKeep accepts the row. It does not accept a caller-provided historical timestamp. For delayed CDN logs, forward new batches as close to log creation time as practical and keep the source logs as your exact audit trail.

You need:

  • a HitKeep site ID for the site being tracked
  • an API client token with a site grant for that site
  • a site role grant that includes site.manage_data, such as site admin or owner
  • access to logs or middleware that includes request path, HTTP status, and user agent

Site grants are required. An instance/admin API-client role alone does not allow AI fetch ingest.

Guide: API Clients

Send one JSON object per AI crawler request:

{
  "path": "/guides/analytics/ai-visibility/?utm_source=docs",
  "hostname": "www.example.com",
  "status_code": 200,
  "content_type": "text/html; charset=utf-8",
  "response_ms": 143,
  "bytes_served": 48231,
  "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
}

| Field | Required | Notes |
| --- | --- | --- |
| path | Yes | URL path with optional query string. If your source log has a full URL, strip it to path and query. |
| status_code | Yes | HTTP status from the original crawler request. Must be between 100 and 599. |
| user_agent | Yes | Original crawler user agent. HitKeep accepts known AI crawler tokens and rejects unknown user agents. |
| hostname | No | Host that served the request. Useful when one forwarder sees several hostnames. |
| content_type | No | Response content type. HitKeep derives resource_type from this value. |
| response_ms | No | Positive response time in milliseconds. |
| bytes_served | No | Positive response byte count. |

HitKeep derives assistant_name, assistant_family, and resource_type server-side. Do not include those fields in your payload or depend on them as part of your integration contract.
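The field rules above can be checked before a row leaves the forwarder. The following is a minimal sketch, not HitKeep code: buildPayload() is a hypothetical helper, the input property names (statusCode, userAgent, and so on) are this example's own, and the requirements that path starts with "/" and that numeric fields are integers are assumptions rather than documented constraints.

```javascript
// Hypothetical helper (not part of HitKeep): applies the field table's rules locally.
function positiveInt(value) {
  // Returns a rounded positive integer, or undefined when the value is unusable.
  const rounded = Math.round(Number(value));
  return Number.isFinite(rounded) && rounded > 0 ? rounded : undefined;
}

function buildPayload(row) {
  // Assumption: a valid path starts with "/" (the docs only say "URL path").
  if (typeof row.path !== "string" || !row.path.startsWith("/")) {
    throw new Error("path is required and must start with /");
  }
  const status = Number(row.statusCode);
  if (!Number.isInteger(status) || status < 100 || status > 599) {
    throw new Error("status_code must be an integer between 100 and 599");
  }
  if (typeof row.userAgent !== "string" || row.userAgent.trim() === "") {
    throw new Error("user_agent is required");
  }

  const payload = { path: row.path, status_code: status, user_agent: row.userAgent };

  // Optional strings: include only when the log source provided them.
  if (row.hostname) payload.hostname = row.hostname;
  if (row.contentType) payload.content_type = row.contentType;

  // Optional numbers: the table asks for positive values, so drop zero or missing ones.
  const responseMs = positiveInt(row.responseMs);
  if (responseMs !== undefined) payload.response_ms = responseMs;
  const bytesServed = positiveInt(row.bytesServed);
  if (bytesServed !== undefined) payload.bytes_served = bytesServed;

  return payload;
}
```

Dropping an unusable optional field instead of rejecting the whole row keeps fetch volume and error patterns visible even when a log source reports latency or byte counts unreliably.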

Use a real site ID and an API client token with a site grant:

curl -i -X POST "https://analytics.example.com/api/sites/YOUR_SITE_ID/ingest/ai-fetch" \
  -H "Authorization: Bearer YOUR_API_CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "path": "/docs/",
    "hostname": "www.example.com",
    "status_code": 200,
    "content_type": "text/html",
    "response_ms": 120,
    "bytes_served": 18422,
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
  }'

A successful ingest returns 202 Accepted.

If you receive 400 with the message user_agent must match a known AI bot, HitKeep does not currently recognize the row's user agent as an AI crawler. Filter the row out, or update HitKeep if a new AI crawler needs first-class classification.

HitKeep classifies common AI crawler user-agent tokens, including:

| Family | Example tokens |
| --- | --- |
| OpenAI | GPTBot, ChatGPT-User |
| Anthropic | ClaudeBot, Claude-Web |
| Perplexity | PerplexityBot |
| Google | Google-Extended, GoogleOther, Google-Safety |
| Apple | Applebot-Extended |
| Meta | Meta-ExternalAgent, Meta-ExternalFetcher |
| Amazon | Amazonbot |
| Common Crawl | CCBot |
| Other supported crawlers | Bytespider, Cohere, YouBot, AI2Bot, Diffbot, Timpibot, ImagesiftBot, DeepSeekBot, PetalBot |

The exact classifier lives in the HitKeep runtime. Treat this table as the current public contract for integrations, not as a replacement for checking the ingest response.

Every forwarder follows the same shape:

  1. Read one request from an access log, edge event, or middleware hook.
  2. Keep only known AI crawler user agents.
  3. Normalize the request target to path plus query string.
  4. Map status, content type, latency, and bytes into the HitKeep payload.
  5. POST the payload to HitKeep with a site-granted API client token.
  6. Retry transient 5xx or network errors with a bounded retry policy.
  7. Do not retry permanent 4xx validation errors without changing the payload.
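Steps 6 and 7 can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not a HitKeep-provided client: isRetryable(), backoffDelayMs(), and sendWithRetry() are hypothetical names, and the attempt limit and delays are example values rather than recommended defaults.

```javascript
// Hypothetical helpers: names, limits, and delays are illustrative only.
function isRetryable(status) {
  // Retry server-side failures only; a 4xx needs a changed payload or token first.
  return status >= 500 && status <= 599;
}

function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  // attempt is 0-based: 500, 1000, 2000, ... capped at capMs.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// send() performs one POST and resolves to a fetch Response.
async function sendWithRetry(send, maxAttempts = 4) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await send();
      if (response.ok) return response;
      if (!isRetryable(response.status)) {
        // Permanent failure: mark it so the catch block below rethrows immediately.
        const error = new Error(`permanent ingest error: ${response.status}`);
        error.permanent = true;
        throw error;
      }
      lastError = new Error(`transient ingest error: ${response.status}`);
    } catch (error) {
      if (error.permanent) throw error;
      lastError = error; // network failure: treated as transient
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  throw lastError;
}
```

A caller would pass a function that issues a single request and returns the Response without throwing on non-2xx statuses, for example sendWithRetry(() => fetch(url, options)). Bounding the attempts keeps a HitKeep outage from stalling the log reader indefinitely.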

Forwarders should not send raw visitor IP addresses. The AI fetch endpoint does not accept or store them.

This example shows the platform-neutral part. Replace readLogRows() with your own source: an nginx log parser, CDN log drain, edge function event, or app-server middleware.

// Lowercase user-agent substrings for the AI crawlers this forwarder keeps.
const aiBotTokens = [
  "chatgpt-user",
  "gptbot",
  "claudebot",
  "claude-web",
  "perplexitybot",
  "google-extended",
  "googleother",
  "google-safety",
  "applebot-extended",
  "bytespider",
  "ccbot",
  "meta-externalagent",
  "meta-externalfetcher",
  "amazonbot",
  "cohere-ai",
  "youbot",
  "ai2bot",
  "diffbot",
  "timpibot",
  "imagesiftbot",
  "deepseekbot",
  "petalbot",
];

// Connection settings come from the environment; trailing slashes are trimmed from the base URL.
const hitkeepBaseUrl = process.env.HITKEEP_BASE_URL.replace(/\/+$/, "");
const hitkeepSiteId = process.env.HITKEEP_SITE_ID;
const hitkeepToken = process.env.HITKEEP_API_TOKEN;

// Case-insensitive substring match against the token list.
function isAIBot(userAgent) {
  const normalized = (userAgent || "").toLowerCase();
  return aiBotTokens.some((token) => normalized.includes(token));
}

// Reduce a full URL or a request target to path plus query string.
function toPathWithQuery(rawUrl) {
  if (!rawUrl) return "/";
  try {
    const parsed = new URL(rawUrl, "https://placeholder.invalid");
    return `${parsed.pathname}${parsed.search}`;
  } catch {
    return rawUrl.startsWith("/") ? rawUrl : "/";
  }
}

// Send one record and throw on any non-2xx response.
async function postToHitKeep(record) {
  const response = await fetch(`${hitkeepBaseUrl}/api/sites/${hitkeepSiteId}/ingest/ai-fetch`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${hitkeepToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(record),
  });
  if (!response.ok) {
    throw new Error(`HitKeep AI fetch ingest failed: ${response.status} ${await response.text()}`);
  }
}

// Forward only AI crawler rows, one POST per row.
async function forwardRows(rows) {
  for (const row of rows) {
    if (!isAIBot(row.userAgent)) continue;
    await postToHitKeep({
      path: toPathWithQuery(row.url),
      hostname: row.hostname,
      status_code: row.statusCode,
      content_type: row.contentType,
      response_ms: row.responseMs,
      bytes_served: row.bytesServed,
      user_agent: row.userAgent,
    });
  }
}

Keep the token in an environment variable or secrets manager. Do not embed it in browser JavaScript.
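One way to make missing secrets fail loudly at startup is to read all three variables in one place. This is a minimal sketch: loadConfig() is a hypothetical helper, and the variable names are the ones used in the forwarder example above.

```javascript
// Hypothetical helper: validates the environment once, before any log rows are read.
function loadConfig(env = process.env) {
  const required = ["HITKEEP_BASE_URL", "HITKEEP_SITE_ID", "HITKEEP_API_TOKEN"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report variable names only; never print token values.
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    // Trim trailing slashes so the URL template does not produce "//api".
    baseUrl: env.HITKEEP_BASE_URL.replace(/\/+$/, ""),
    siteId: env.HITKEEP_SITE_ID,
    token: env.HITKEEP_API_TOKEN,
  };
}
```

Calling loadConfig() at process start turns an unset HITKEEP_BASE_URL into a clear error message instead of a TypeError from calling replace on undefined.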

| Source | Good mapping |
| --- | --- |
| nginx access log | $request_uri, $host, $status, $sent_http_content_type, $request_time, $bytes_sent, $http_user_agent |
| Caddy access log | request.uri, request.host, status, response content type header, duration, size, request.headers.User-Agent |
| Apache access log | request path, %>s, %b, %{User-agent}i, plus content type and duration if included in your log format |
| CDN logs | URI stem/query, host header, status, content type, edge/origin duration, response bytes, user agent |
| App middleware | request URL, host, response status, response content type, measured duration, response bytes if available, user agent |

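You do not need a perfect first version. path, status_code, and user_agent are enough to start seeing fetch volume and error patterns. Add content type, latency, and byte counts when your log source provides them reliably, and convert units where needed: nginx $request_time, for example, is reported in seconds, while response_ms expects milliseconds.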
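As an illustration of the nginx mapping, the sketch below turns one JSON-formatted access log line into the row shape that forwardRows() expects. It assumes a custom log_format with escape=json; the format name and JSON key names are this example's choices, not a HitKeep requirement, and rowFromNginxJson() is a hypothetical helper.

```javascript
// Assumed nginx config (key names are illustrative):
//   log_format hitkeep_json escape=json
//     '{"uri":"$request_uri","host":"$host","status":"$status",'
//     '"content_type":"$sent_http_content_type","request_time":"$request_time",'
//     '"bytes_sent":"$bytes_sent","user_agent":"$http_user_agent"}';
//   access_log /var/log/nginx/hitkeep.json hitkeep_json;

// Hypothetical helper: parse one log line into a forwarder row.
function rowFromNginxJson(line) {
  const entry = JSON.parse(line);
  return {
    url: entry.uri,
    hostname: entry.host,
    statusCode: Number(entry.status),
    // An empty string means nginx had no value for the variable.
    contentType: entry.content_type || undefined,
    // $request_time is in seconds with millisecond resolution; convert to ms.
    responseMs: Math.round(Number(entry.request_time) * 1000),
    bytesServed: Number(entry.bytes_sent),
    userAgent: entry.user_agent,
  };
}
```

Logging every value as a quoted string keeps the line valid JSON even when a variable is empty, at the cost of converting numbers in the parser.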

  1. Open AI Visibility for the site.
  2. Select a date range that includes the period when the forwarder ran.
  3. Check total fetches, top assistants, top paths, and error paths.
  4. Use the assistant and resource-type filters to confirm classification.
  5. Open the correlation section after normal AI-referred visits arrive through hk.js.

For direct API checks:

curl "https://analytics.example.com/api/sites/YOUR_SITE_ID/ai-fetch/overview" \
  -H "Authorization: Bearer YOUR_API_CLIENT_TOKEN"

| Symptom | Likely cause |
| --- | --- |
| 401 Unauthorized | Missing or invalid bearer token. |
| 403 Forbidden | Token has no site grant, or the grant does not include site.manage_data. |
| 404 Site not found | The site ID is wrong or belongs to a site the token cannot access. |
| 400 user_agent must match a known AI bot | The forwarder sent a non-AI crawler or an unsupported AI crawler token. |
| Rows arrive, but correlation is empty | AI crawler fetches exist, but matching AI-referred human visits have not arrived through the browser tracker for the same paths and window. |
| Timestamps look delayed | The endpoint records HitKeep ingest time. Forward CDN or batch logs promptly. |