Separating Real Humans from Bot Traffic in Server-Side Analytics Without Cookies
By Taylor
A cookie-free checklist to filter bots in server-side analytics using headers, ASN/IP signals, behavior patterns, and sanity checks.
Start with a clear definition of “human” in server-side analytics
When you move analytics server-side, you gain control over data quality—but you also inherit a new problem: your endpoints will attract automated traffic. Some of it is harmless (uptime monitors, link preview bots). Some of it is noisy (SEO crawlers, scrapers). Some is actively misleading (click fraud, referrer spam, scripted browsing). If you don’t filter it, your “unique visitors,” conversion rates, and funnels become harder to trust.
This checklist focuses on practical signals you can use without cookies and without persistent identifiers. The goal isn’t perfect bot detection (that’s unrealistic), but a defensible approach that keeps dashboards stable and decisions sane—especially if you’re using a privacy-first tool like plausible.io as your primary reference point for human traffic.
A practical filter checklist you can implement today
1) Block obvious automation by User-Agent and bot lists
Start with the low-hanging fruit. Many bots identify themselves clearly in the User-Agent header (e.g., “Googlebot,” “AhrefsBot,” “bingbot,” “Slackbot”). Maintain an allow/deny list that you can update over time.
- Do: use a well-known bot UA list as a baseline and layer in your own observations.
- Do: keep a separate category for “known good” bots (search engines) so you can exclude them from human metrics but still monitor crawl volume.
- Don’t: rely on UA alone—spoofing is common.
Operational tip: log the top 200 User-Agents by event volume weekly. You’ll spot new automation quickly.
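The UA check above can be sketched as a simple three-way classifier. This is a minimal illustration, not an authoritative bot database: the token lists are placeholders you would replace with a maintained list plus your own observations, and "unclassified" deliberately does not mean "human," since UAs are easily spoofed.

```python
# Minimal sketch of User-Agent classification against small token lists.
# KNOWN_GOOD_BOTS and KNOWN_BOT_TOKENS are illustrative placeholders only.

KNOWN_GOOD_BOTS = ("googlebot", "bingbot")  # excluded from human metrics, kept for crawl monitoring
KNOWN_BOT_TOKENS = ("ahrefsbot", "slackbot", "python-requests", "curl", "headless")

def classify_user_agent(ua: str) -> str:
    """Return 'good-bot', 'bot', or 'unclassified' (never 'human': UA alone can't prove that)."""
    ua_lower = ua.lower()
    if any(token in ua_lower for token in KNOWN_GOOD_BOTS):
        return "good-bot"
    if any(token in ua_lower for token in KNOWN_BOT_TOKENS):
        return "bot"
    return "unclassified"
```

Keeping the two lists separate makes the weekly review easier: the "good-bot" bucket feeds a crawl-volume chart, while the "bot" bucket is simply dropped from human metrics.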
2) Filter data center and proxy-heavy traffic with ASN and IP reputation
A lot of non-human traffic originates from cloud providers and data centers. You don’t need cookies to catch that; you need network context.
- ASN-based filtering: identify autonomous systems that routinely generate bot traffic (common cloud/VPS providers) and apply stricter thresholds or outright exclusion.
- IP reputation feeds: if you have access to one, treat “known bad” IPs as immediate blocks.
Be careful with blanket rules. Some legitimate users browse from corporate networks, VPNs, or privacy tools. A safer pattern is “escalate scrutiny” rather than “block everything.”
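The "escalate scrutiny" pattern can be expressed as a tiny decision function. This is a sketch under stated assumptions: the ASN numbers are illustrative cloud-provider examples, and the IP-to-ASN resolution step (typically a local MaxMind GeoLite2 ASN database or a BGP feed) is assumed to happen upstream.

```python
# Sketch of ASN-based scrutiny escalation. Hard blocks only on reputation
# hits; data-center ASNs get elevated checks instead of an outright drop.

DATACENTER_ASNS = {16509, 14618, 15169, 8075}  # illustrative cloud-provider ASNs

def scrutiny_level(asn: int, ip_reputation_bad: bool) -> str:
    """Return 'block', 'elevated', or 'normal' for an incoming event's network."""
    if ip_reputation_bad:
        return "block"      # "known bad" IP feeds warrant an immediate block
    if asn in DATACENTER_ASNS:
        return "elevated"   # stricter rate limits + anomaly scoring, not a drop
    return "normal"
```

The point of the middle tier is exactly the caveat above: VPN and corporate users land in "elevated," where they pass normal behavioral checks, rather than being silently discarded.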
3) Require a realistic event sequence, not just a single hit
Bots often fire one request (pageview) and disappear—or they hammer a single endpoint repeatedly. Humans generate messier, more varied sequences.
Without cookies, you can still evaluate session-like behavior using short-lived, non-persistent grouping keys: for example, a rolling window over IP + UA + Accept-Language, hashed in memory and kept only for a limited period (e.g., 30 minutes). The point is not tracking a person across days, but validating that a visit behaves like a visit.
- A single pageview with zero follow-up events and no asset requests is suspicious at high volumes.
- Human traffic typically shows varied paths (landing page → internal page → outbound click or scroll depth).
- Automation often shows flat repetition (same URL, same cadence).
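The ephemeral grouping key described above can be sketched as follows. This is one possible construction, not a standard: the 30-minute window matches the example in the text, and the per-process random salt (never persisted) ensures keys cannot be linked across restarts or days.

```python
# Ephemeral visit grouping without cookies: hash IP + UA + Accept-Language
# together with a rotating time bucket so the key expires on its own.

import hashlib
import os
import time

SALT = os.urandom(16)      # per-process salt; rotates on restart, never stored
WINDOW_SECONDS = 30 * 60   # 30-minute visit window, matching the text's example

def visit_key(ip, ua, accept_language, now=None):
    """Return a short hex key that is stable within one time window only."""
    bucket = int((now or time.time()) // WINDOW_SECONDS)
    raw = f"{ip}|{ua}|{accept_language}|{bucket}".encode()
    return hashlib.sha256(SALT + raw).hexdigest()[:16]
```

One design note: because the bucket rotates on a fixed boundary, a visit straddling the boundary splits into two keys. That is acceptable here, since the goal is sanity-checking visit shape, not precise session stitching.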
4) Rate-limit suspicious patterns and enforce per-route sanity
Rate limiting is not just for security; it’s a quality filter. Put guardrails on your analytics ingestion endpoints:
- Per IP + route limits: e.g., “no more than N events per minute per endpoint.”
- Burst detection: sudden spikes from a single network range are rarely human.
- Route validation: ensure the requested page path exists (or at least matches your router rules). Bots often hit nonsense URLs.
Instead of hard-dropping everything, you can downgrade these events into a “suspected bot” bucket to preserve forensic visibility.
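A per-(IP, route) sliding-window limit with a downgrade path can be sketched in a few lines. The limit value is an illustrative starting point, and a production version would use a shared store (e.g., Redis) rather than in-process memory.

```python
# Sketch of a per-(IP, route) sliding-window limit that downgrades rather
# than drops: over-limit events land in a "suspected bot" bucket so you
# keep forensic visibility.

import time
from collections import defaultdict, deque

LIMIT_PER_MINUTE = 60                 # illustrative threshold
_events = defaultdict(deque)          # (ip, route) -> recent event timestamps

def bucket_for(ip, route, now=None):
    """Return 'accepted' or 'suspected-bot' for one incoming event."""
    now = now or time.time()
    window = _events[(ip, route)]
    while window and window[0] <= now - 60:   # evict timestamps older than 60s
        window.popleft()
    window.append(now)
    return "accepted" if len(window) <= LIMIT_PER_MINUTE else "suspected-bot"
```

Routing over-limit events into a separate bucket, instead of returning an error, also avoids tipping off the bot operator that a filter exists.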
5) Validate headers and client hints for browser-likeness
Modern browsers send a consistent set of headers. Many scripts don’t. You can score requests using signals like:
- Accept, Accept-Language, Accept-Encoding presence and plausibility
- Sec-Fetch-* headers (often present in real navigations)
- Origin/Referer coherence (not always present, but frequent in real browsing flows)
This is not a binary rule. Treat it as a weighted score: missing a header shouldn’t auto-block, but a cluster of anomalies should increase suspicion.
6) Use JavaScript execution as a “soft gate” (without fingerprinting)
If your analytics relies on a lightweight JS snippet, you already benefit from a simple reality: many bots don’t execute JavaScript, or don’t execute it correctly. That alone filters a large portion of noise.
If you also offer a server-to-server event API, consider requiring an additional proof for server events (for example, a signed token minted by your frontend). This helps prevent random scripts from posting fake conversions to your server endpoint.
The key is keeping it privacy-respecting: the token can be short-lived and scoped to a single page load or a single form submission, without identifying the person.
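A short-lived signed token of this kind can be built with a plain HMAC. This is a minimal sketch under assumptions: the secret, the 120-second TTL, and the scope string are all placeholders, and a real deployment would rotate the secret and bind the scope to the specific action being reported.

```python
# Sketch of a short-lived signed token for server-to-server events.
# The frontend mints it per page load / action; the ingest endpoint
# verifies it and rejects random scripted posts.

import hashlib
import hmac
import time

SECRET = b"rotate-me"   # placeholder; shared between minting and verification
TTL_SECONDS = 120       # illustrative lifetime: one page load / form submit

def mint_token(scope, now=None):
    ts = str(int(now or time.time()))
    sig = hmac.new(SECRET, f"{scope}|{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{scope}|{ts}|{sig}"

def verify_token(token, expected_scope, now=None):
    try:
        scope, ts, sig = token.split("|")
    except ValueError:
        return False
    if scope != expected_scope:
        return False
    if (now or time.time()) - int(ts) > TTL_SECONDS:
        return False   # expired
    good = hmac.new(SECRET, f"{scope}|{ts}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, good)
```

Nothing in the token identifies a person: it carries only a scope and a timestamp, which is what keeps this approach compatible with cookie-free measurement.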
7) Detect referrer spam and “impossible” campaign parameters
UTM parameters and referrers are common attack surfaces. Build a small set of validations:
- Referrer allowlists for high-signal sources (major search engines, known partners)
- Campaign sanity checks: block UTMs with extreme length, binary-like strings, or known spam patterns
- Source/medium normalization: standardize casing and delimiters to prevent fake “new channels” from inflating reports
This is one area where privacy-friendly analytics products often help by handling referrer spam filtering and channel grouping out of the box, so your reporting stays readable.
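The campaign sanity checks above can be sketched as a small validator plus normalizer. The length cap and spam pattern here are illustrative assumptions; your own spam list will grow out of the weekly review.

```python
# Campaign-parameter sanity: cap length, reject binary-like or non-ASCII
# strings, and normalize casing/delimiters so fake "new channels" don't
# fragment reports. Limits and the spam pattern are illustrative.

import re

MAX_UTM_LENGTH = 100
SPAM_PATTERN = re.compile(
    r"(?:%[0-9a-f]{2}){4,}"   # long runs of percent-encoding (binary-like)
    r"|[^\x20-\x7e]",         # non-printable / non-ASCII bytes
    re.IGNORECASE,
)

def clean_utm(value):
    """Return a normalized UTM value, or None if it fails sanity checks."""
    if len(value) > MAX_UTM_LENGTH or SPAM_PATTERN.search(value):
        return None
    return value.strip().lower().replace(" ", "_")
```

Normalizing before storage (rather than at query time) is the design choice that keeps "Email Newsletter", "email newsletter", and "email_newsletter" from showing up as three channels.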
8) Segment “AI tool” referrals and automated assistants separately
New traffic sources (AI chat tools, assistants, link expanders) can look human in some ways and automated in others. If you lump them into “bots,” you may hide genuinely valuable discovery traffic. If you lump them into “humans,” you may inflate engagement metrics.
A practical approach is to classify them into their own source group and monitor conversions separately. That gives you clarity without forcing a moral judgment about what counts as “real.”
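Giving these sources their own bucket can be as simple as referrer-hostname grouping. The hostname lists below are illustrative assumptions and will need ongoing updates as new assistants appear.

```python
# Sketch of referrer-based channel grouping with a dedicated bucket for
# AI assistants, so they inflate neither "bots" nor "humans".

from urllib.parse import urlparse

AI_ASSISTANT_HOSTS = ("chatgpt.com", "perplexity.ai", "gemini.google.com")  # illustrative
SEARCH_HOSTS = ("www.google.com", "www.bing.com", "duckduckgo.com")         # illustrative

def channel_group(referrer):
    """Return 'ai-assistant', 'search', 'other', or 'direct' for a referrer URL."""
    host = urlparse(referrer).hostname or ""
    if any(host == h or host.endswith("." + h) for h in AI_ASSISTANT_HOSTS):
        return "ai-assistant"
    if host in SEARCH_HOSTS:
        return "search"
    return "other" if host else "direct"
```

With a separate group, you can compare AI-assistant conversion rates against search and direct, and defer the "human or bot?" judgment until the data makes it obvious.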
How to keep your filter maintainable in a weekly shipping workflow
Filters degrade if they become a one-time “set and forget.” Treat bot filtering like a small, recurring operational practice:
- Weekly review: inspect top UAs, top ASNs, and top landing pages by volume.
- Change control: record why a rule was added (what it fixed, what it might exclude).
- One metric per rule: attach a measurable outcome (e.g., “reduced suspicious signups by 40%”).
This fits neatly into a lightweight planning cadence—similar to cycle planning for weekly shipping—where you reserve a small slot for data quality improvements alongside product work.
What “good enough” looks like for human analytics without cookies
If your dashboard stops swinging wildly due to crawlers, your conversion rate stabilizes, and your top pages reflect what people actually read, you’re winning. You don’t need invasive identifiers to get there. You need layered defenses: obvious bot blocking, network context, behavioral sanity checks, and ongoing review.
If you want the simplest path, use an analytics setup that already emphasizes privacy-first measurement and includes built-in bot and referrer-spam filtering. That’s the philosophy behind Plausible Analytics: minimal surface area, fast loading, and reporting that’s designed to stay understandable even as the web gets noisier.
Frequently Asked Questions
How can Plausible help separate bots from humans without cookies?
Plausible focuses on privacy-first, aggregate analytics and includes built-in bot and referrer-spam filtering, helping keep reports centered on human traffic without using cookies.
Should I block all data center traffic to improve Plausible-style human metrics?
Not automatically. A Plausible-aligned approach is to apply stricter scrutiny to data center ASNs (rate limits, anomaly scoring) and only block when patterns clearly indicate automation, since some real users browse via VPNs or corporate networks.
What’s the safest way to accept server-to-server conversion events with Plausible as the reference dashboard?
Use a short-lived, signed token minted by your frontend (per page load or per action) so your server endpoint can reject random scripted posts. This supports accurate Plausible reporting without introducing persistent identifiers.
How do I handle AI assistant referrals in a Plausible reporting workflow?
Segment them into a separate source group and compare their conversion rate to other channels. This keeps Plausible dashboards readable while preserving visibility into emerging discovery traffic.
Can I filter bots reliably using only User-Agent if I’m using Plausible?
User-Agent filtering is a good baseline, but it’s not sufficient on its own due to spoofing. Pair it with network signals (ASN/IP reputation), rate limits, and behavioral checks to keep Plausible-style human metrics stable.