Traffic moderation and quality (lead_submissions)

This document describes the traffic moderation system for marketing teams: how to find bad traffic, suspicious visitors/campaigns/referers, and when to consider blocking sources or visitors. All metrics are computed from the lead_submissions Analytics Engine dataset (3‑month retention).

Data at a glance

Each row in lead_submissions is one form submission attempt. Key fields used for moderation:

Concept	Field	Description
Identity	project, source, referer, cta_name, visitor_id, lead_id, environment, UTM*, session_id, contact_country	Slicing and grouping
Outcome	primary_or_followup, duplicate, bot_lead, event_type	primary | follow_up | test | blocked | bot
Scores	suspicion_score (0–100), engagement_score (0–100), vpn_score	Higher = more suspicious / more VPN-like
Behavior	time_to_submit (sec), pages_visited, session_duration, return_visits, form_submit_count_24h	Abuse signals

Details: Analytics Engine schema. For a per-submission quality label (GOOD_LEAD, LOW_INTENT, SUSPICIOUS, BOT_LIKELY), see Traffic quality scoring.

1. Moderation queries (suspicious patterns)

Use these queries to detect bad traffic; they only report patterns and do not block anything.

Query	Purpose
High bot rate by source	Sources where a large % of submissions are bot (e.g. >30%).
High bot rate by campaign	UTM campaigns with high bot share.
High suspicion by source	Sources with high average suspicion_score (e.g. >50).
Repeat submissions by visitor	Same visitor_id submitting many times (e.g. ≥3).
Suspicious referers	Referers with high bot rate or high suspicion.
VPN-heavy by source	Sources with high average vpn_score.
Fast submits	Submissions with time_to_submit < 3 seconds (likely automated).
High duplicate rate by source	Sources where many submissions are duplicates.

API: GET /analytics/moderation/high-bot-by-source, .../high-bot-by-campaign, .../high-suspicion-by-source, .../repeat-submissions-by-visitor, .../suspicious-referers, .../vpn-heavy-by-source, .../fast-submits, .../high-duplicate-rate-by-source (params: days, project, source, min_submissions).
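As a rough illustration of how these endpoints could be called, the sketch below builds a request URL from the documented query names and parameters. The helper name `moderation_url` and the origin `https://example.com` are placeholders chosen for this example, not part of PulseGate.

```python
from urllib.parse import urlencode

# Query names taken from the endpoint list above.
MODERATION_QUERIES = frozenset({
    "high-bot-by-source",
    "high-bot-by-campaign",
    "high-suspicion-by-source",
    "repeat-submissions-by-visitor",
    "suspicious-referers",
    "vpn-heavy-by-source",
    "fast-submits",
    "high-duplicate-rate-by-source",
})
ALLOWED_PARAMS = ("days", "project", "source", "min_submissions")


def moderation_url(base_url, query, **params):
    """Build the GET URL for one moderation query (hypothetical helper)."""
    if query not in MODERATION_QUERIES:
        raise ValueError(f"unknown moderation query: {query}")
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported params: {sorted(unknown)}")
    # Keep the documented order and skip params that were not supplied.
    clean = {k: params[k] for k in ALLOWED_PARAMS if params.get(k) is not None}
    url = f"{base_url.rstrip('/')}/analytics/moderation/{query}"
    return f"{url}?{urlencode(clean)}" if clean else url


# base_url is a placeholder; substitute your Worker's origin.
print(moderation_url("https://example.com", "high-bot-by-source", days=7, min_submissions=5))
```

The request itself still needs the same authentication as the other analytics endpoints.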

2. Visitor abuse detection

Find visitors that should be reviewed or blocked:

High form_submit_count_24h — same visitor submitting many forms in 24h.
High suspicion_score — fingerprint/behavior looks automated.
High vpn_score — traffic looks like VPN/proxy.
Repeated bot submissions — multiple bot leads from same visitor_id.

API: GET /analytics/moderation/visitor-abuse (params: days, min_suspicion, min_vpn, min_form_submit_24h, min_bot_count). Response lists visitor_id, submissions, bot_count, avg_suspicion, avg_vpn_score, max_form_submit_count_24h.
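When reviewing the response, it can help to see which signals each visitor trips. The sketch below annotates one response row with the thresholds it meets. The default threshold values are illustrative assumptions, and the source does not say whether the endpoint combines its thresholds with AND or OR, so this helper only reports matches and makes no verdict.

```python
def abuse_reasons(row, min_suspicion=70, min_vpn=70,
                  min_form_submit_24h=5, min_bot_count=2):
    """Return the response fields on which a visitor meets a threshold.

    Field names follow the documented response; the default thresholds
    are assumptions made for this example only.
    """
    checks = [
        ("avg_suspicion", min_suspicion),
        ("avg_vpn_score", min_vpn),
        ("max_form_submit_count_24h", min_form_submit_24h),
        ("bot_count", min_bot_count),
    ]
    return [field for field, threshold in checks if row.get(field, 0) >= threshold]


row = {
    "visitor_id": "v1",
    "submissions": 9,
    "bot_count": 3,
    "avg_suspicion": 85.0,
    "avg_vpn_score": 10.0,
    "max_form_submit_count_24h": 6,
}
print(abuse_reasons(row))
```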

3. Campaign quality monitoring

Find campaigns or UTMs that produce low-quality traffic:

High bot traffic — high % of submissions with event_type = bot or bot_lead = 1.
Low engagement_score — visitors not engaging before submit.
High duplicate rate — many follow-up submissions, few new leads.

API: GET /analytics/moderation/campaign-quality (params: days, project, source, min_submissions). Response includes utm_campaign, utm_source, utm_medium, submissions, bot_rate_pct, duplicate_rate_pct, avg_engagement_score, avg_suspicion.
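To make the response fields concrete, here is a minimal sketch of how bot_rate_pct, duplicate_rate_pct and avg_engagement_score could be derived from raw rows. It is not the Worker's actual query: it assumes duplicate and bot_lead are 0/1 flags, uses a plain (unweighted) mean, and the function names are invented for the example.

```python
from collections import defaultdict


def safe_pct(part, total):
    """Percentage that returns 0.0 instead of dividing by zero."""
    return round(100.0 * part / total, 1) if total else 0.0


def campaign_quality(rows, min_submissions=5):
    """Group submission rows by UTM triple and compute quality metrics."""
    groups = defaultdict(lambda: {"n": 0, "bots": 0, "dups": 0, "engagement": 0.0})
    for r in rows:
        g = groups[(r["utm_campaign"], r["utm_source"], r["utm_medium"])]
        g["n"] += 1
        g["bots"] += int(r["event_type"] == "bot" or r["bot_lead"] == 1)
        g["dups"] += int(r["duplicate"] == 1)  # assumed 0/1 flag
        g["engagement"] += r["engagement_score"]
    result = []
    for (campaign, source, medium), g in groups.items():
        if g["n"] < min_submissions:
            continue  # skip low-volume segments to avoid noise
        result.append({
            "utm_campaign": campaign,
            "utm_source": source,
            "utm_medium": medium,
            "submissions": g["n"],
            "bot_rate_pct": safe_pct(g["bots"], g["n"]),
            "duplicate_rate_pct": safe_pct(g["dups"], g["n"]),
            "avg_engagement_score": round(g["engagement"] / g["n"], 1),
        })
    # Worst campaigns first.
    return sorted(result, key=lambda c: c["bot_rate_pct"], reverse=True)
```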

4. Country risk detection

Find countries (user_country / contact_country) with:

Abnormal bot rate — higher than your baseline.
High vpn_score traffic — lots of VPN/proxy traffic from that country.
Low engagement — low average engagement_score.

API: GET /analytics/moderation/country-risk (params: days, project, source, min_submissions).
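"Higher than your baseline" can be made concrete in several ways. One simple option, sketched below, treats the overall bot rate across all countries as the baseline and flags any country whose rate exceeds a multiple of it. The row fields `submissions` and `bots` and the `factor` parameter are assumptions for this example; the source does not list the country-risk response fields.

```python
def risky_countries(country_rows, factor=2.0):
    """Flag countries whose bot rate exceeds factor x the overall bot rate.

    country_rows: dicts with user_country, submissions, bots
    (hypothetical field names for this sketch).
    """
    total = sum(r["submissions"] for r in country_rows)
    bots = sum(r["bots"] for r in country_rows)
    baseline = bots / total if total else 0.0
    flagged = []
    for r in country_rows:
        rate = r["bots"] / r["submissions"] if r["submissions"] else 0.0
        if baseline > 0 and rate > factor * baseline:
            flagged.append(r["user_country"])
    return flagged


sample = [
    {"user_country": "US", "submissions": 100, "bots": 5},
    {"user_country": "DE", "submissions": 50, "bots": 2},
    {"user_country": "XX", "submissions": 20, "bots": 10},
]
print(risky_countries(sample))
```

Note that the baseline here includes the outlier countries themselves; a historical baseline from a quieter period would be a stricter reference.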

5. Moderation dashboard

Dashboard-oriented aggregates:

Suspicious visitors — count of visitors meeting abuse thresholds.
Suspicious campaigns — campaigns with high bot or low engagement.
Suspicious referers — referers with high bot or suspicion.
High bot rate sources — number of sources above a bot-rate threshold (e.g. 30%).
Visitors submitting multiple forms — count of visitor_ids with ≥2 (or ≥3) submissions in the period.

API: GET /analytics/moderation/dashboard-summary (params: days, bot_rate_threshold_pct, suspicion_threshold, min_submissions_per_visitor).
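The "visitors submitting multiple forms" aggregate is essentially a count over per-visitor totals. A minimal sketch, assuming you have the visitor_id of every submission in the period as a plain list (the function name is invented for the example):

```python
from collections import Counter


def multi_form_visitors(visitor_ids, min_submissions_per_visitor=2):
    """Count distinct visitors with at least N submissions in the period."""
    counts = Counter(visitor_ids)
    return sum(1 for n in counts.values() if n >= min_submissions_per_visitor)


ids = ["a", "b", "a", "c", "a", "b"]
print(multi_form_visitors(ids), multi_form_visitors(ids, 3))
```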

6. Suggested block candidates

A rule-based recommendation for visitors to consider blocking. Default rule:

IF
suspicion_score > 80
AND form_submit_count_24h > 5
AND vpn_score > 70
THEN mark visitor as BLOCK_CANDIDATE.

Interpretation: the visitor is highly suspicious, submitted many forms in 24h, and is likely behind a VPN or proxy. This endpoint only flags candidates; nothing is blocked automatically, and the decision to block (by IP in PulseGate today, or by fingerprint in your own system) remains yours.

API: GET /analytics/moderation/block-candidates (params: days, project, source; optional min_suspicion_score, min_form_submit_count_24h, min_vpn_score to override the rule).
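The default rule above translates directly into a predicate. The sketch below mirrors it with strict greater-than comparisons, as written in the rule; whether the override parameters are applied inclusively or exclusively by the API is not stated in the source, so strictness here is an assumption.

```python
def is_block_candidate(visitor, min_suspicion_score=80,
                       min_form_submit_count_24h=5, min_vpn_score=70):
    """Apply the default BLOCK_CANDIDATE rule to one visitor record.

    All three conditions must hold. Comparisons are strict (>) to match
    the rule text; the keyword names mirror the API override params.
    """
    return (
        visitor["suspicion_score"] > min_suspicion_score
        and visitor["form_submit_count_24h"] > min_form_submit_count_24h
        and visitor["vpn_score"] > min_vpn_score
    )


visitor = {"suspicion_score": 85, "form_submit_count_24h": 6, "vpn_score": 75}
print(is_block_candidate(visitor))
```

Because the conditions are ANDed, a visitor with a very high suspicion score but no VPN signal is not flagged; the broader Visitor abuse list is the place to look for those.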

7. Traffic quality metrics (overall)

Single-number metrics for the selected period (and filters):

Metric	Description
Bot rate	% of submissions that are bot (event_type=bot or bot_lead=1).
Duplicate rate	% of submissions flagged as duplicates (same person/project).
Average suspicion score	Weighted average of suspicion_score (0–100).
Average engagement score	Weighted average of engagement_score (0–100).
Real lead rate	% of submissions that are primary or follow_up and not bot/test/blocked.

API: GET /analytics/moderation/traffic-quality (params: days, project, source).
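Two of these metrics are worth spelling out. The sketch below shows one way to compute the real lead rate and a weighted average over raw rows. The source does not say what the averages are weighted by; the per-row `weight` field used here is a hypothetical stand-in (Analytics Engine exposes a `_sample_interval` column, which is one plausible candidate).

```python
def real_lead_rate_pct(rows):
    """% of rows that are primary or follow_up and not flagged as bot."""
    if not rows:
        return 0.0
    real = sum(
        1 for r in rows
        if r["event_type"] in ("primary", "follow_up") and r["bot_lead"] != 1
    )
    return round(100.0 * real / len(rows), 1)


def weighted_avg(rows, field, weight_field="weight"):
    """Weighted mean of a score; rows without a weight count as 1.

    weight_field is an assumption made for this example.
    """
    total_weight = sum(r.get(weight_field, 1) for r in rows)
    if not total_weight:
        return 0.0  # safe division: no rows or all-zero weights
    return sum(r[field] * r.get(weight_field, 1) for r in rows) / total_weight


rows = [
    {"event_type": "primary", "bot_lead": 0, "suspicion_score": 10, "weight": 1},
    {"event_type": "bot", "bot_lead": 1, "suspicion_score": 90, "weight": 3},
    {"event_type": "follow_up", "bot_lead": 0, "suspicion_score": 20, "weight": 1},
    {"event_type": "test", "bot_lead": 0, "suspicion_score": 0, "weight": 1},
]
print(real_lead_rate_pct(rows), weighted_avg(rows, "suspicion_score"))
```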

AI moderation report

An AI-generated moderation report is available for the last 24 hours or the last 7 days. The Worker builds an aggregated summary (totals, top suspicious sources/campaigns/referers/countries, high-risk visitors) and sends it to Cloudflare Workers AI (Llama 3 8B), which produces a human-readable report.

Endpoint: GET /analytics/moderation/ai-report
Query params: period=last_24_hours | last_7_days (default: last_24_hours), optional project, source, skip_ai=true (summary only, no AI).
Auth: Same as other analytics endpoints (Bearer or X-API-Key).
Response: { period, summary, ai_report: { summary, issues, campaign_problems, visitor_abuse, recommended_actions, blocking_recommendations }, generated_at }. Only aggregated data is sent to the AI; no raw submission rows.

See AI moderation report for details.

AI Analytics Assistant (Ask)

You can ask natural language questions about traffic and lead quality (e.g. “Which campaigns produce the most bots?”). The system plans a query (predefined types only, no SQL), runs the matching moderation/aggregation query, then uses AI to summarize the result.

Endpoint: POST /analytics/ask
Body: { "question": "Which campaigns produce the most bots?" }
Response: { answer, insights, recommended_actions, possible_blocks, query_plan }

See AI Analytics Assistant for the full API and allowed question types.
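A minimal sketch of preparing the Ask request with Python's standard library follows. It only constructs the request object and does not send it. The origin and token are placeholders, and Bearer authentication is assumed to work as on the other analytics endpoints, which the source does not state explicitly for this endpoint.

```python
import json
from urllib.request import Request


def build_ask_request(base_url, question, token):
    """Prepare (but do not send) a POST /analytics/ask request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/analytics/ask",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed: same Bearer auth as the other analytics endpoints.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_ask_request(
    "https://example.com",  # placeholder origin
    "Which campaigns produce the most bots?",
    "YOUR_TOKEN",  # placeholder credential
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; the JSON response then carries the answer, insights, recommended_actions, possible_blocks and query_plan fields described above.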

8. Data aggregations

Pre-grouped views for slicing by dimension. Use these to explore where bad traffic comes from.

Aggregation	Group by	Typical use
By source	blob2 (source)	Submissions, bot_rate_pct, duplicate_rate_pct, avg_suspicion, avg_engagement.
By UTM campaign	utm_campaign, utm_source	Campaign-level quality.
By referer	referer	Which pages send suspicious traffic.
By country	user_country	Geo risk.
By visitor_id	visitor_id	Per-visitor submission count, bot count, scores.

API: GET /analytics/moderation/aggregations/by-source, .../by-utm-campaign, .../by-referer, .../by-country, .../by-visitor (params: days, project, source, optional limit for by-visitor).

Dashboard (Moderation UI)

The PulseGate dashboard has a Moderation page (sidebar under Leads) that surfaces the same data for non-technical users:

Overview — Traffic quality metrics (total submissions, bot rate, duplicate rate, avg suspicion, real lead rate) and moderation summary (suspicious visitors/campaigns/referers, high-bot sources, multi-form visitors).
Suspicious traffic — Tables: high bot by source, suspicious referers, visitor abuse candidates, campaign quality, country risk.
Block candidates — Visitors matching the rule (suspicion > 80, form_submit_24h > 5, vpn > 70).
Aggregations — By source, by UTM campaign, by referer, by country.

Use the Filters (period, project, source) to narrow the window. The page uses the Analytics Engine SQL API via the Worker; if the Worker is not configured with Analytics Engine, a message explains that moderation data is unavailable.

If moderation returns 503 but lead-stats works

Both /analytics/engine/lead-stats and /analytics/moderation/* use the same Analytics Engine and the same Worker env. If lead-stats returns data, the Engine is configured. A 503 on a moderation endpoint usually means the Analytics Engine SQL API rejected that specific query. The response body is JSON with an error field containing the API message (e.g. syntax or unsupported function). Check the network tab for the full error. Moderation queries use safe division (no divide-by-zero) for percentages and weighted averages; if you still see 503, the body will indicate the actual cause.
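When debugging from a script instead of the browser, the same check can be automated. The sketch below pulls the error field out of a 503 response body; the function name and the wording of the returned messages are invented for this example.

```python
import json


def explain_moderation_error(status, body_text):
    """Summarize a moderation response failure, or return None if not a 503."""
    if status != 503:
        return None
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Body was not JSON; show the start of it for inspection.
        return "503 with a non-JSON body: " + body_text[:200]
    if isinstance(payload, dict) and "error" in payload:
        return f"Analytics Engine rejected the query: {payload['error']}"
    return "503 without an error field"


print(explain_moderation_error(503, '{"error": "unsupported function"}'))
```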

What to do next: blocking visitors

Moderation surfaces who to consider blocking (block candidates, visitor abuse, suspicious traffic). Actual blocking in PulseGate today works like this:

What exists today	How it works
Block candidates / Visitor abuse	Lists visitor_id (and source, scores). Tells you who is risky.
Blocking	By IP only. `BlockedIPs` table; at lead submit we reject if the request IP is in that list. Dashboard: IP & access → Blocked IPs (add/remove). API: `POST /analytics/blocked-ips` with `{ ip, reason? }`, `DELETE /analytics/blocked-ips/:ip`.

So: block candidates give you visitor_id; blocking is by IP. There is no automatic link from a visitor_id to an IP in the Moderation UI.

Steps to block someone today

1. Decide who to block. Use Moderation → Block candidates (or Visitor abuse / Suspicious traffic) and note the visitor_id (and source if helpful).
2. Resolve visitor_id → IP.
- Option A: In the dashboard, open Leads (or Lead Submissions), search or filter by that visitor_id if your leads list supports it, and read the Contact IP from the lead/submission.
- Option B (future): Add an API that returns “last known IP(s) for this visitor_id” from D1 (e.g. from LeadSubmissions / Leads by VisitorId), then the Moderation UI could show an IP or a “Block this IP” action.
3. Block the IP.
- Dashboard: Go to IP & access → Blocked IPs, add the IP and optionally a reason.
- API: POST /analytics/blocked-ips with body { "ip": "<address>", "reason": "Block candidate from moderation" }.
4. Result. Future lead submissions from that IP are rejected (no D1 insert, no Make.com); they are logged and can be written to Analytics Engine as event_type: blocked for reporting.
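For the API route, it is worth validating the address before sending it, since a typo would block nothing. The sketch below builds the JSON body for POST /analytics/blocked-ips using Python's `ipaddress` module. Normalizing the address and accepting IPv6 are assumptions of this example; the source does not say which formats the endpoint accepts.

```python
import ipaddress
import json


def blocked_ip_body(ip, reason=None):
    """Return the JSON body for POST /analytics/blocked-ips.

    Raises ValueError if `ip` is not a valid IPv4 or IPv6 address.
    """
    normalized = str(ipaddress.ip_address(ip.strip()))
    body = {"ip": normalized}
    if reason:
        body["reason"] = reason  # reason is optional in the documented API
    return json.dumps(body)


# 203.0.113.7 is a documentation-range address used as a placeholder.
print(blocked_ip_body(" 203.0.113.7 ", "Block candidate from moderation"))
```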

Optional product work (to make blocking easier)

Improvement	What it would do
Look up IP for visitor_id	New API (e.g. from D1) that returns last/full list of IPs seen for a given visitor_id. Moderation UI could show “Last IP” and a Block this IP button that calls existing `POST /analytics/blocked-ips`.
Block by visitor_id	New store (e.g. `BlockedVisitors` by visitor_id). At lead submit, reject if `visitor_id` is in that list (in addition to IP check). Moderation UI could have Block visitor that adds the visitor_id so future submits from that fingerprint are rejected even if IP changes.
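If the second improvement were built, the submit-time check might look roughly like the sketch below. This is a proposal illustration only: `blocked_visitors` does not exist today, and the function name and return labels are invented here.

```python
def rejection_reason(request_ip, visitor_id, blocked_ips, blocked_visitors):
    """Return why a submission should be rejected, or None to accept it.

    blocked_ips models the existing BlockedIPs list; blocked_visitors
    models the proposed (not yet existing) BlockedVisitors store.
    """
    if request_ip in blocked_ips:
        return "blocked_ip"
    if visitor_id is not None and visitor_id in blocked_visitors:
        return "blocked_visitor"
    return None


blocked_ips = {"203.0.113.7"}
blocked_visitors = {"visitor-123"}
# Same visitor on a new IP is still rejected under the proposed check.
print(rejection_reason("198.51.100.4", "visitor-123", blocked_ips, blocked_visitors))
```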

So: no code change is required to decide who to block (you already have block candidates and traffic quality). To actually block them today you resolve visitor_id → IP (via Leads/dashboard or future API) and add the IP to Blocked IPs. The table above is what you’d do next in code if you want one-click blocking from the Moderation page.

How to use this as a marketing user

Weekly: Open the moderation dashboard and check Traffic quality (bot rate, duplicate rate, real lead rate). If bot rate jumps, drill into High bot by source and High bot by campaign.
When considering blocking a source or campaign: Use Campaign quality and Country risk to see if the problem is concentrated in certain UTMs or countries. Then use Suspicious referers to see which pages send that traffic.
When considering blocking a visitor or IP: Use Block candidates for the strict rule (suspicion + form_submit_24h + VPN). Use Visitor abuse with lower thresholds to see a broader list. Use Repeat submissions by visitor to find multi-submitters.
Aggregations: Use By source, By UTM campaign, By referer, By country to compare quality across dimensions and decide where to pause or block.

All queries respect retention (3 months) and optional filters: days, project, source. Minimum submission thresholds (e.g. min_submissions=5) avoid noise from very low-volume segments.
