Traffic moderation and quality (lead_submissions)

This document describes the traffic moderation system for marketing teams: how to find bad traffic, suspicious visitors/campaigns/referers, and when to consider blocking sources or visitors. All metrics are computed from the lead_submissions Analytics Engine dataset (3‑month retention).

Data at a glance

Each row in lead_submissions is one form submission attempt. Key fields used for moderation:

| Concept | Field | Description |
| --- | --- | --- |
| Identity | project, source, referer, cta_name, visitor_id, lead_id, environment, UTM*, session_id, contact_country | Slicing and grouping |
| Outcome | primary_or_followup, duplicate, bot_lead, event_type | primary / follow_up / test / blocked / bot |
| Scores | suspicion_score (0–100), engagement_score (0–100), vpn_score | Higher = more suspicious / more VPN-like |
| Behavior | time_to_submit (sec), pages_visited, session_duration, return_visits, form_submit_count_24h | Abuse signals |

Details: Analytics Engine schema. For a per-submission quality label (GOOD_LEAD, LOW_INTENT, SUSPICIOUS, BOT_LIKELY), see Traffic quality scoring.


1. Moderation queries (suspicious patterns)

Use these queries to detect bad traffic. They only report patterns; they do not block anything.

| Query | Purpose |
| --- | --- |
| High bot rate by source | Sources where a large % of submissions are bot (e.g. >30%). |
| High bot rate by campaign | UTM campaigns with high bot share. |
| High suspicion by source | Sources with high average suspicion_score (e.g. >50). |
| Repeat submissions by visitor | Same visitor_id submitting many times (e.g. ≥3). |
| Suspicious referers | Referers with high bot rate or high suspicion. |
| VPN-heavy by source | Sources with high average vpn_score. |
| Fast submits | Submissions with time_to_submit < 3 seconds (likely automated). |
| High duplicate rate by source | Sources where many submissions are duplicates. |

API: GET /analytics/moderation/high-bot-by-source, .../high-bot-by-campaign, .../high-suspicion-by-source, .../repeat-submissions-by-visitor, .../suspicious-referers, .../vpn-heavy-by-source, .../fast-submits, .../high-duplicate-rate-by-source (params: days, project, source, min_submissions).
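As a rough illustration of what the high-bot-by-source query computes, the sketch below groups in-memory rows by source and keeps the sources whose bot share exceeds a threshold. It does not reproduce the actual Analytics Engine SQL; the function name, the row shape, and the rule that a row counts as bot when event_type is bot or bot_lead is 1 are assumptions drawn from the field descriptions above.

```python
from collections import defaultdict

def high_bot_by_source(rows, min_submissions=5, bot_rate_threshold_pct=30.0):
    """Return sources whose bot share exceeds the threshold (hypothetical helper)."""
    totals = defaultdict(int)
    bots = defaultdict(int)
    for row in rows:
        source = row["source"]
        totals[source] += 1
        # Assumption: a submission counts as bot if either flag says so.
        if row.get("event_type") == "bot" or row.get("bot_lead") == 1:
            bots[source] += 1

    result = []
    for source, total in totals.items():
        if total < min_submissions:
            continue  # skip low-volume sources to avoid noisy percentages
        bot_rate_pct = 100.0 * bots[source] / total
        if bot_rate_pct > bot_rate_threshold_pct:
            result.append({"source": source, "submissions": total,
                           "bot_rate_pct": round(bot_rate_pct, 1)})
    # Worst offenders first.
    return sorted(result, key=lambda r: r["bot_rate_pct"], reverse=True)
```

The min_submissions guard mirrors the API parameter of the same name: a source with one submission that happens to be a bot would otherwise show a 100% bot rate.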


2. Visitor abuse detection

Find visitors that should be reviewed or blocked:

  • High form_submit_count_24h — same visitor submitting many forms in 24h.
  • High suspicion_score — fingerprint/behavior looks automated.
  • High vpn_score — traffic looks like VPN/proxy.
  • Repeated bot submissions — multiple bot leads from same visitor_id.

API: GET /analytics/moderation/visitor-abuse (params: days, min_suspicion, min_vpn, min_form_submit_24h, min_bot_count). Response lists visitor_id, submissions, bot_count, avg_suspicion, avg_vpn_score, max_form_submit_count_24h.
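A minimal sketch of building a visitor-abuse request URL with only the thresholds you want to set. The base URL and the default of 7 days are placeholders, not values taken from PulseGate; the parameter names are the ones listed above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with your Worker's hostname.
BASE_URL = "https://worker.example.com"

def visitor_abuse_url(days=7, min_suspicion=None, min_vpn=None,
                      min_form_submit_24h=None, min_bot_count=None):
    """Build the visitor-abuse request URL, omitting unset parameters."""
    params = {
        "days": days,
        "min_suspicion": min_suspicion,
        "min_vpn": min_vpn,
        "min_form_submit_24h": min_form_submit_24h,
        "min_bot_count": min_bot_count,
    }
    # Drop parameters left as None so the server-side defaults apply.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/analytics/moderation/visitor-abuse?{query}"
```

For example, visitor_abuse_url(days=7, min_suspicion=60) asks for visitors seen in the last 7 days with the suspicion threshold set to 60.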


3. Campaign quality monitoring

Find campaigns or UTMs that produce low-quality traffic:

  • High bot traffic — high % of submissions with event_type = bot or bot_lead = 1.
  • Low engagement_score — visitors not engaging before submit.
  • High duplicate rate — many follow-up submissions, few new leads.

API: GET /analytics/moderation/campaign-quality (params: days, project, source, min_submissions). Response includes utm_campaign, utm_source, utm_medium, submissions, bot_rate_pct, duplicate_rate_pct, avg_engagement_score, avg_suspicion.


4. Country risk detection

Find countries (user_country / contact_country) with:

  • Abnormal bot rate — higher than your baseline.
  • High vpn_score traffic — lots of VPN/proxy traffic from that country.
  • Low engagement — low average engagement_score.

API: GET /analytics/moderation/country-risk (params: days, project, source, min_submissions).


5. Moderation dashboard

Dashboard-oriented aggregates:

  • Suspicious visitors — count of visitors meeting abuse thresholds.
  • Suspicious campaigns — campaigns with high bot or low engagement.
  • Suspicious referers — referers with high bot or suspicion.
  • High bot rate sources — number of sources above a bot-rate threshold (e.g. 30%).
  • Visitors submitting multiple forms — count of visitor_ids with ≥2 (or ≥3) submissions in the period.

API: GET /analytics/moderation/dashboard-summary (params: days, bot_rate_threshold_pct, suspicion_threshold, min_submissions_per_visitor).
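To make the "visitors submitting multiple forms" aggregate concrete, here is a small sketch that counts visitor_ids with at least a given number of submissions. It works on in-memory rows and is only an approximation of the server-side query; the function name and row shape are hypothetical.

```python
from collections import Counter

def count_multi_form_visitors(rows, min_submissions_per_visitor=2):
    """Count distinct visitor_ids with at least N submissions in the rows given."""
    per_visitor = Counter(row["visitor_id"] for row in rows)
    return sum(1 for n in per_visitor.values()
               if n >= min_submissions_per_visitor)
```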


6. Suggested block candidates

A rule-based recommendation for visitors to consider blocking. Default rule:

  • IF
    suspicion_score > 80
    AND form_submit_count_24h > 5
    AND vpn_score > 70
  • THEN mark visitor as BLOCK_CANDIDATE.

Interpretation: the visitor is highly suspicious, submitted many forms in 24h, and has high VPN likelihood. You can still decide to block by IP or fingerprint in your system; this only flags candidates.

API: GET /analytics/moderation/block-candidates (params: days, project, source; optional min_suspicion_score, min_form_submit_count_24h, min_vpn_score to override the rule).
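The default rule can be sketched as a single predicate. The function name and the visitor dictionary are illustrative; the strict greater-than comparisons follow the rule as written above, and whether the override parameters are applied as strict or inclusive bounds on the server is an assumption.

```python
def is_block_candidate(visitor, min_suspicion_score=80,
                       min_form_submit_count_24h=5, min_vpn_score=70):
    """Return True only when all three signals exceed their thresholds."""
    return (
        visitor["suspicion_score"] > min_suspicion_score
        and visitor["form_submit_count_24h"] > min_form_submit_count_24h
        and visitor["vpn_score"] > min_vpn_score
    )
```

Because the conditions are joined with AND, a visitor with a suspicion score of 95 and 10 submissions in 24h is still not flagged if the VPN score is 70 or lower.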


7. Traffic quality metrics (overall)

Single-number metrics for the selected period (and filters):

| Metric | Description |
| --- | --- |
| Bot rate | % of submissions that are bot (event_type=bot or bot_lead=1). |
| Duplicate rate | % of submissions that are duplicate (same person/project). |
| Average suspicion score | Weighted average of suspicion_score (0–100). |
| Average engagement score | Weighted average of engagement_score (0–100). |
| Real lead rate | % of submissions that are primary or follow_up and not bot/test/blocked. |

API: GET /analytics/moderation/traffic-quality (params: days, project, source).
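The following sketch shows one way these five metrics could be derived from raw rows. It assumes that "weighted" means each row is weighted by its Analytics Engine sample interval, carried here in a hypothetical weight key that defaults to 1, and that a duplicate is marked with duplicate = 1. Division is guarded so an empty period yields zeros instead of an error.

```python
def _is_bot(row):
    # Assumption: either flag marks the submission as bot.
    return row.get("event_type") == "bot" or row.get("bot_lead") == 1

def _safe_div(numerator, denominator):
    return numerator / denominator if denominator else 0.0

def traffic_quality(rows):
    """Compute overall quality metrics from in-memory rows (illustrative only)."""
    weights = [row.get("weight", 1) for row in rows]
    pairs = list(zip(rows, weights))
    total = sum(weights)
    bots = sum(w for r, w in pairs if _is_bot(r))
    dups = sum(w for r, w in pairs if r.get("duplicate") == 1)
    # primary/follow_up already excludes test and blocked event types.
    real = sum(w for r, w in pairs
               if r.get("event_type") in ("primary", "follow_up")
               and not _is_bot(r))
    return {
        "bot_rate_pct": 100.0 * _safe_div(bots, total),
        "duplicate_rate_pct": 100.0 * _safe_div(dups, total),
        "real_lead_rate_pct": 100.0 * _safe_div(real, total),
        "avg_suspicion": _safe_div(
            sum(w * r["suspicion_score"] for r, w in pairs), total),
        "avg_engagement": _safe_div(
            sum(w * r["engagement_score"] for r, w in pairs), total),
    }
```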


AI moderation report

An AI-generated moderation report is available for the last 24 hours or last 7 days. The Worker builds an aggregated summary (totals, top suspicious sources/campaigns/referers/countries, high-risk visitors) and sends it to Cloudflare Workers AI (Llama 3 8B) to produce a human-readable report.

  • Endpoint: GET /analytics/moderation/ai-report
  • Query params: period=last_24_hours | last_7_days (default: last_24_hours), optional project, source, skip_ai=true (summary only, no AI).
  • Auth: Same as other analytics endpoints (Bearer or X-API-Key).
  • Response: { period, summary, ai_report: { summary, issues, campaign_problems, visitor_abuse, recommended_actions, blocking_recommendations }, generated_at }. Only aggregated data is sent to the AI; no raw submission rows.

See AI moderation report for details.


AI Analytics Assistant (Ask)

You can ask natural language questions about traffic and lead quality (e.g. “Which campaigns produce the most bots?”). The system plans a query (predefined types only, no SQL), runs the matching moderation/aggregation query, then uses AI to summarize the result.

  • Endpoint: POST /analytics/ask
  • Body: { "question": "Which campaigns produce the most bots?" }
  • Response: { answer, insights, recommended_actions, possible_blocks, query_plan }

See AI Analytics Assistant for the full API and allowed question types.
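A minimal sketch of preparing the Ask request with Python's standard library. The base URL is a placeholder, and sending the key in an X-API-Key header assumes this endpoint uses the same auth as the other analytics endpoints; the sketch only builds the request and does not send it.

```python
import json
from urllib.request import Request

# Hypothetical base URL; replace with your Worker's hostname.
BASE_URL = "https://worker.example.com"

def build_ask_request(question, api_key):
    """Build (but do not send) a POST /analytics/ask request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{BASE_URL}/analytics/ask",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )
```

Passing the result to urllib.request.urlopen would send it; the JSON response should then carry the answer, insights, recommended_actions, possible_blocks, and query_plan fields listed above.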


8. Data aggregations

Pre-grouped views for slicing by dimension. Use these to explore where bad traffic comes from.

| Aggregation | Group by | Typical use |
| --- | --- | --- |
| By source | blob2 (source) | Submissions, bot_rate_pct, duplicate_rate_pct, avg_suspicion, avg_engagement. |
| By UTM campaign | utm_campaign, utm_source | Campaign-level quality. |
| By referer | referer | Which pages send suspicious traffic. |
| By country | user_country | Geo risk. |
| By visitor_id | visitor_id | Per-visitor submission count, bot count, scores. |

API: GET /analytics/moderation/aggregations/by-source, .../by-utm-campaign, .../by-referer, .../by-country, .../by-visitor (params: days, project, source, optional limit for by-visitor).


Dashboard (Moderation UI)

The PulseGate dashboard has a Moderation page (sidebar under Leads) that surfaces the same data for non-technical users:

  • Overview — Traffic quality metrics (total submissions, bot rate, duplicate rate, avg suspicion, real lead rate) and moderation summary (suspicious visitors/campaigns/referers, high-bot sources, multi-form visitors).
  • Suspicious traffic — Tables: high bot by source, suspicious referers, visitor abuse candidates, campaign quality, country risk.
  • Block candidates — Visitors matching the rule (suspicion > 80, form_submit_24h > 5, vpn > 70).
  • Aggregations — By source, by UTM campaign, by referer, by country.

Use the Filters (period, project, source) to narrow the window. The page uses the Analytics Engine SQL API via the Worker; if the Worker is not configured with Analytics Engine, a message explains that moderation data is unavailable.

If moderation returns 503 but lead-stats works

Both /analytics/engine/lead-stats and /analytics/moderation/* use the same Analytics Engine and the same Worker env. If lead-stats returns data, the Engine is configured. A 503 on a moderation endpoint usually means the Analytics Engine SQL API rejected that specific query. The response body is JSON with an error field containing the API message (e.g. syntax or unsupported function). Check the network tab for the full error. Moderation queries use safe division (no divide-by-zero) for percentages and weighted averages; if you still see 503, the body will indicate the actual cause.
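When scripting against the API rather than reading the network tab, the same diagnosis can be sketched as a small helper that pulls the error field out of a 503 body. The function name and the returned messages are illustrative; only the JSON shape with an error field comes from the description above.

```python
import json

def describe_moderation_failure(status, body_text):
    """Return a readable explanation for a 503 response, or None otherwise."""
    if status != 503:
        return None
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return "503 with a non-JSON body; inspect the raw response"
    if isinstance(payload, dict) and payload.get("error"):
        # The error field carries the Analytics Engine SQL API message.
        return f"Analytics Engine SQL API rejected the query: {payload['error']}"
    return "503 without an error field; inspect the raw response"
```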


What to do next: blocking visitors

Moderation surfaces who to consider blocking (block candidates, visitor abuse, suspicious traffic). Actual blocking in PulseGate today works like this:

| What exists today | How it works |
| --- | --- |
| Block candidates / Visitor abuse | Lists visitor_id (and source, scores). Tells you who is risky. |
| Blocking | By IP only. BlockedIPs table; at lead submit we reject if the request IP is in that list. Dashboard: IP & access → Blocked IPs (add/remove). API: POST /analytics/blocked-ips with { ip, reason? }, DELETE /analytics/blocked-ips/:ip. |

So: block candidates give you visitor_id; blocking is by IP. There is no automatic link from a visitor_id to an IP in the Moderation UI.

Steps to block someone today

  1. Decide who to block
    Use Moderation → Block candidates (or Visitor abuse / Suspicious traffic) and note the visitor_id (and source if helpful).

  2. Resolve visitor_id → IP

    • Option A: In the dashboard, open Leads (or Lead Submissions), search or filter by that visitor_id if your leads list supports it, and read the Contact IP from the lead/submission.
    • Option B (future): Add an API that returns “last known IP(s) for this visitor_id” from D1 (e.g. from LeadSubmissions / Leads by VisitorId), then the Moderation UI could show an IP or a “Block this IP” action.
  3. Block the IP

    • Dashboard: Go to IP & access → Blocked IPs, add the IP and optionally a reason.
    • API: POST /analytics/blocked-ips with body { "ip": "<address>", "reason": "Block candidate from moderation" }.
  4. Result
    Future lead submissions from that IP are rejected (no D1 insert, no Make.com); they are logged and can be written to Analytics Engine as event_type: blocked for reporting.
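For step 3 via the API, a short sketch of preparing the request body. It validates the address locally with Python's ipaddress module before building the JSON, so a typo is caught before anything is sent; the helper name is hypothetical, and whether the endpoint accepts IPv6 addresses is not stated here.

```python
import ipaddress
import json

def blocked_ip_payload(ip, reason=None):
    """Build the JSON body for POST /analytics/blocked-ips."""
    # ip_address raises ValueError for anything that is not a valid IP.
    address = ipaddress.ip_address(ip.strip())
    payload = {"ip": str(address)}
    if reason:
        payload["reason"] = reason  # reason is optional in the API
    return json.dumps(payload)
```

Calling blocked_ip_payload("203.0.113.7", "Block candidate from moderation") produces a body in the shape shown in step 3.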

Optional product work (to make blocking easier)

| Improvement | What it would do |
| --- | --- |
| Look up IP for visitor_id | New API (e.g. from D1) that returns last/full list of IPs seen for a given visitor_id. Moderation UI could show “Last IP” and a Block this IP button that calls existing POST /analytics/blocked-ips. |
| Block by visitor_id | New store (e.g. BlockedVisitors by visitor_id). At lead submit, reject if visitor_id is in that list (in addition to IP check). Moderation UI could have Block visitor that adds the visitor_id so future submits from that fingerprint are rejected even if IP changes. |

So: no code change is required to decide who to block (you already have block candidates and traffic quality). To actually block them today you resolve visitor_id → IP (via Leads/dashboard or future API) and add the IP to Blocked IPs. The table above is what you’d do next in code if you want one-click blocking from the Moderation page.


How to use this as a marketing user

  1. Weekly: Open the moderation dashboard and check Traffic quality (bot rate, duplicate rate, real lead rate). If bot rate jumps, drill into High bot by source and High bot by campaign.
  2. When considering blocking a source or campaign: Use Campaign quality and Country risk to see if the problem is concentrated in certain UTMs or countries. Then use Suspicious referers to see which pages send that traffic.
  3. When considering blocking a visitor or IP: Use Block candidates for the strict rule (suspicion + form_submit_24h + VPN). Use Visitor abuse with lower thresholds to see a broader list. Use Repeat submissions by visitor to find multi-submitters.
  4. Aggregations: Use By source, By UTM campaign, By referer, By country to compare quality across dimensions and decide where to pause or block.

All queries respect retention (3 months) and optional filters: days, project, source. Minimum submission thresholds (e.g. min_submissions=5) avoid noise from very low-volume segments.