
Traffic quality scoring (lead_submissions)

This document defines a traffic quality classification for lead submissions using the lead_submissions Analytics Engine schema. Each submission is classified into one of four labels: GOOD_LEAD, LOW_INTENT, SUSPICIOUS, BOT_LIKELY.

Schema fields used

| Concept | Column | Range / values | Type | Description |
| --- | --- | --- | --- | --- |
| suspicion_score | double2 | 0–100 | double | Higher = more automated / risky |
| engagement_score | double3 | 0–100 | double | Higher = more engaged before submit |
| time_to_submit | double4 | seconds | double | Time from first touch to submit |
| pages_visited | double5 | count | double | Pages seen in session |
| vpn_score | double8 | 0–100 | double | VPN/proxy likelihood |
| duplicate | blob17 | "1"/"0" | string | "1" = duplicate, "0" = new |
| bot_lead | blob19 | "1"/"0" | string | "1" = bot lead, "0" = not |

Reference: Analytics Engine schema.


Classification labels

| Label | Meaning |
| --- | --- |
| BOT_LIKELY | Strong bot/automation signals; treat as non-human or test. |
| SUSPICIOUS | High risk (suspicion, VPN, very fast submit); review before treating as a real lead. |
| LOW_INTENT | Duplicate or low engagement; lower priority or nurture. |
| GOOD_LEAD | Real, engaged, new lead; full value for pipeline. |

Classification rules (evaluation order)

Rules are evaluated in order. The first matching rule wins. Thresholds are tunable; below are recommended defaults.

1. BOT_LIKELY

  • blob19 = '1' (explicit bot flag), OR
  • double2 (suspicion_score) ≥ 90, OR
  • double4 (time_to_submit) < 1 (submit in under 1 second) AND double5 (pages_visited) ≤ 1

Rationale: Explicit bot flag or extreme automation (very high suspicion or instant submit with no browsing).

2. SUSPICIOUS

  • double2 (suspicion_score) ≥ 70, OR
  • double8 (vpn_score) ≥ 70, OR
  • double4 (time_to_submit) < 3 (under 3 seconds) AND double5 (pages_visited) ≤ 2, OR
  • double2 (suspicion_score) ≥ 50 AND double8 (vpn_score) ≥ 50

Rationale: High suspicion or VPN, or very fast/low-browse submit, or combined medium suspicion + VPN.

3. LOW_INTENT

  • blob17 = '1' (duplicate), OR
  • double3 (engagement_score) < 30, OR
  • double5 (pages_visited) ≤ 1 AND double3 (engagement_score) < 50

Rationale: Duplicate = not a new lead; low engagement or single-page + low engagement = weak intent.

4. GOOD_LEAD (default)

  • If none of the above match: GOOD_LEAD.

Rationale: New, sufficiently engaged, not bot-like or suspicious.
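
The four rules above can be sketched as a small reference classifier, which is handy for checking sample rows against the default thresholds before writing the query. This is a minimal sketch, not the production implementation: the function name classify_lead and the dict-shaped row are assumptions for illustration, and missing values are treated as 0.

```python
from typing import Mapping


def classify_lead(row: Mapping[str, object]) -> str:
    """Return the first matching label, in the documented evaluation order."""
    # Column names follow the lead_submissions schema; missing values become 0
    # (an assumption of this sketch).
    suspicion = float(row.get("double2") or 0)   # suspicion_score
    engagement = float(row.get("double3") or 0)  # engagement_score
    seconds = float(row.get("double4") or 0)     # time_to_submit
    pages = float(row.get("double5") or 0)       # pages_visited
    vpn = float(row.get("double8") or 0)         # vpn_score
    is_duplicate = row.get("blob17") == "1"
    is_bot = row.get("blob19") == "1"

    # 1. BOT_LIKELY: explicit flag, extreme suspicion, or instant submit with no browsing.
    if is_bot or suspicion >= 90 or (seconds < 1 and pages <= 1):
        return "BOT_LIKELY"
    # 2. SUSPICIOUS: high suspicion or VPN, fast/low-browse submit, or medium suspicion + VPN.
    if (suspicion >= 70 or vpn >= 70
            or (seconds < 3 and pages <= 2)
            or (suspicion >= 50 and vpn >= 50)):
        return "SUSPICIOUS"
    # 3. LOW_INTENT: duplicate, low engagement, or single page with modest engagement.
    if is_duplicate or engagement < 30 or (pages <= 1 and engagement < 50):
        return "LOW_INTENT"
    # 4. Default.
    return "GOOD_LEAD"
```

Because the first match wins, a duplicate row with a suspicion_score of 75 is labeled SUSPICIOUS, not LOW_INTENT.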


Rule summary (pseudo-SQL)

CASE
  WHEN blob19 = '1' OR double2 >= 90 OR (double4 < 1 AND double5 <= 1)
    THEN 'BOT_LIKELY'
  WHEN double2 >= 70 OR double8 >= 70
       OR (double4 < 3 AND double5 <= 2)
       OR (double2 >= 50 AND double8 >= 50)
    THEN 'SUSPICIOUS'
  WHEN blob17 = '1' OR double3 < 30 OR (double5 <= 1 AND double3 < 50)
    THEN 'LOW_INTENT'
  ELSE 'GOOD_LEAD'
END

(In Analytics Engine SQL you would use nested if(cond, then, else) instead of CASE; see implementation notes below.)


Threshold reference (defaults)

| Rule / condition | Field(s) | Default threshold |
| --- | --- | --- |
| Bot flag | blob19 | = '1' |
| Very high suspicion | double2 | ≥ 90 |
| Instant submit + no browse | double4, double5 | < 1 s, ≤ 1 page |
| High suspicion | double2 | ≥ 70 |
| High VPN | double8 | ≥ 70 |
| Fast submit + low browse | double4, double5 | < 3 s, ≤ 2 pages |
| Suspicion + VPN combined | double2, double8 | both ≥ 50 |
| Duplicate | blob17 | = '1' |
| Low engagement | double3 | < 30 |
| Single page + low engagement | double5, double3 | ≤ 1 page, < 50 |
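
Since the thresholds are tunable, it can help to keep them in one place and check that a tuned set still leaves every condition reachable. The sketch below is one possible way to do that. The Thresholds class, its field names, and the problems() check are hypothetical and not part of the schema; only the default values come from the table above.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the threshold reference table.
    bot_suspicion: float = 90            # double2 >= 90
    instant_submit_s: float = 1          # double4 < 1
    instant_max_pages: int = 1           # double5 <= 1
    high_suspicion: float = 70           # double2 >= 70
    high_vpn: float = 70                 # double8 >= 70
    fast_submit_s: float = 3             # double4 < 3
    fast_max_pages: int = 2              # double5 <= 2
    combined_suspicion: float = 50       # double2 >= 50 (with VPN)
    combined_vpn: float = 50             # double8 >= 50 (with suspicion)
    low_engagement: float = 30           # double3 < 30
    single_page_engagement: float = 50   # double3 < 50 (with <= 1 page)

    def problems(self) -> List[str]:
        """List settings that make a condition unreachable or redundant."""
        issues = []
        if self.high_suspicion >= self.bot_suspicion:
            issues.append("high_suspicion is shadowed by bot_suspicion")
        if (self.fast_submit_s <= self.instant_submit_s
                and self.fast_max_pages <= self.instant_max_pages):
            issues.append("fast-submit rule is shadowed by instant-submit rule")
        if (self.combined_suspicion >= self.high_suspicion
                or self.combined_vpn >= self.high_vpn):
            issues.append("combined rule is redundant with a high-score rule")
        if self.single_page_engagement <= self.low_engagement:
            issues.append("single-page rule is redundant with low_engagement")
        return issues


# Example: a stricter profile for a high-value form.
strict = replace(Thresholds(), high_suspicion=60)
```

With the defaults, problems() returns an empty list. Raising high_suspicion to 95, for example, would be reported, because every row at that level is already labeled BOT_LIKELY.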

Implementation notes

  1. Analytics Engine: Use if(condition, 'BOT_LIKELY', if(condition, 'SUSPICIOUS', ...)) since CASE WHEN is not supported.
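  2. Null/missing: Treat missing numeric fields as 0, or skip them from the condition, so every row still gets a label. Note that a 0 in both double4 and double5 satisfies the instant-submit condition, and a 0 in double3 satisfies the low-engagement condition, so defaulting to 0 pushes incomplete rows toward BOT_LIKELY or LOW_INTENT; skip the condition where that is not intended.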
  3. Tuning: Adjust thresholds per product (e.g. lower suspicion for high-volume forms, stricter for high-value leads).
  4. Aggregation: For dashboards, sum _sample_interval (rather than counting raw rows) and group by this label to get the mix over time or by source/campaign.
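
The nested if() form from note 1 is tedious to write by hand, so one option is to generate it from the ordered rule list. The helper below is a sketch under assumptions: nested_if, RULES, and QUALITY_LABEL_SQL are hypothetical names, and the condition strings are copied from the pseudo-SQL summary. Check the generated expression against the Analytics Engine SQL reference before relying on it.

```python
from typing import List, Tuple


def nested_if(rules: List[Tuple[str, str]], default: str) -> str:
    """Fold ordered (condition, label) pairs into nested if(cond, then, else)."""
    expr = f"'{default}'"
    # Build from the innermost (last) rule outward so the first rule is evaluated first.
    for condition, label in reversed(rules):
        expr = f"if({condition}, '{label}', {expr})"
    return expr


# Conditions in evaluation order, using the default thresholds.
RULES = [
    ("blob19 = '1' OR double2 >= 90 OR (double4 < 1 AND double5 <= 1)",
     "BOT_LIKELY"),
    ("double2 >= 70 OR double8 >= 70 OR (double4 < 3 AND double5 <= 2) "
     "OR (double2 >= 50 AND double8 >= 50)",
     "SUSPICIOUS"),
    ("blob17 = '1' OR double3 < 30 OR (double5 <= 1 AND double3 < 50)",
     "LOW_INTENT"),
]

QUALITY_LABEL_SQL = nested_if(RULES, "GOOD_LEAD")
print(QUALITY_LABEL_SQL)
```

The result has the shape if(c1, 'BOT_LIKELY', if(c2, 'SUSPICIOUS', if(c3, 'LOW_INTENT', 'GOOD_LEAD'))), which preserves the first-match-wins order of the CASE version.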

Example: share by quality (last 30 days)

Conceptual query shape (percentages via safe division as in moderation queries):

  • Select the classification expression as quality_label.
  • Filter timestamp >= NOW() - INTERVAL '30' DAY.
  • Group by quality_label.
  • Sum _sample_interval for counts; use if(SUM(_sample_interval) > 0, ..., 0.0) for any ratios.

This enables reporting such as: % GOOD_LEAD, % LOW_INTENT, % SUSPICIOUS, % BOT_LIKELY by day, source, or campaign.
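
As an illustration of the ratio step, the sketch below computes each label's share from rows of (quality_label, _sample_interval), using the same guard against division by zero. It assumes the shares are computed client-side from query results; the function name quality_share and the sample rows are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def quality_share(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return each label's share of the total, weighted by _sample_interval."""
    totals: Dict[str, float] = defaultdict(float)
    for label, sample_interval in rows:
        # Each stored row stands for _sample_interval original events.
        totals[label] += sample_interval
    grand_total = sum(totals.values())
    # Safe division: report 0.0 instead of dividing by zero.
    return {
        label: (weight / grand_total if grand_total > 0 else 0.0)
        for label, weight in totals.items()
    }


# Illustrative rows only, not real data.
sample_rows = [
    ("GOOD_LEAD", 10), ("GOOD_LEAD", 10),
    ("LOW_INTENT", 10), ("SUSPICIOUS", 5), ("BOT_LIKELY", 5),
]
shares = quality_share(sample_rows)
```

Grouping the input rows by day, source, or campaign before calling the function gives the per-dimension breakdown described above.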