Traffic quality scoring (lead_submissions)

This document defines a traffic quality classification for lead submissions using the lead_submissions Analytics Engine schema. Each submission is classified into one of four labels: GOOD_LEAD, LOW_INTENT, SUSPICIOUS, BOT_LIKELY.

Schema fields used

Concept	Column	Type	Range / values	Description
suspicion_score	double2	0–100	double	Higher = more automated / risky
engagement_score	double3	0–100	double	Higher = more engaged before submit
time_to_submit	double4	seconds	double	Time from first touch to submit
pages_visited	double5	count	double	Pages seen in session
vpn_score	double8	0–100	double	VPN/proxy likelihood
duplicate	blob17	"1"/"0"	string	"1" = duplicate, "0" = new
bot_lead	blob19	"1"/"0"	string	"1" = bot lead, "0" = not

Reference: Analytics Engine schema.

Classification labels

Label	Meaning
BOT_LIKELY	Strong bot/automation signals; treat as non-human or test.
SUSPICIOUS	High risk (suspicion, VPN, very fast submit); review before treating as real lead.
LOW_INTENT	Duplicate or low engagement; lower priority or nurture.
GOOD_LEAD	Real, engaged, new lead; full value for pipeline.

Classification rules (evaluation order)

Rules are evaluated in order. The first matching rule wins. Thresholds are tunable; below are recommended defaults.

1. BOT_LIKELY

blob19 = '1' (explicit bot flag), OR
double2 (suspicion_score) ≥ 90, OR
double4 (time_to_submit) < 1 (submit in under 1 second) AND double5 (pages_visited) ≤ 1

Rationale: Explicit bot flag or extreme automation (very high suspicion or instant submit with no browsing).

2. SUSPICIOUS

double2 (suspicion_score) ≥ 70, OR
double8 (vpn_score) ≥ 70, OR
double4 (time_to_submit) < 3 (under 3 seconds) AND double5 (pages_visited) ≤ 2, OR
double2 (suspicion_score) ≥ 50 AND double8 (vpn_score) ≥ 50

Rationale: High suspicion or VPN, or very fast/low-browse submit, or combined medium suspicion + VPN.

3. LOW_INTENT

blob17 = '1' (duplicate), OR
double3 (engagement_score) < 30, OR
double5 (pages_visited) ≤ 1 AND double3 (engagement_score) < 50

Rationale: Duplicate = not a new lead; low engagement or single-page + low engagement = weak intent.

4. GOOD_LEAD (default)

If none of the above match: GOOD_LEAD.

Rationale: New, sufficiently engaged, not bot-like or suspicious.

Rule summary (pseudo-SQL)

text

CASE
  WHEN blob19 = '1' OR double2 >= 90 OR (double4 < 1 AND double5 <= 1)
    THEN 'BOT_LIKELY'
  WHEN double2 >= 70 OR double8 >= 70
       OR (double4 < 3 AND double5 <= 2)
       OR (double2 >= 50 AND double8 >= 50)
    THEN 'SUSPICIOUS'
  WHEN blob17 = '1' OR double3 < 30 OR (double5 <= 1 AND double3 < 50)
    THEN 'LOW_INTENT'
  ELSE 'GOOD_LEAD'
END

(In Analytics Engine SQL you would use nested if(cond, then, else) instead of CASE; see implementation notes below.)

Threshold reference (defaults)

Rule / condition	Field(s)	Default threshold
Bot flag	blob19	= '1'
Very high suspicion	double2	≥ 90
Instant submit + no browse	double4, double5	< 1 s, ≤ 1 page
High suspicion	double2	≥ 70
High VPN	double8	≥ 70
Fast submit + low browse	double4, double5	< 3 s, ≤ 2 pages
Suspicion + VPN combined	double2, double8	both ≥ 50
Duplicate	blob17	= '1'
Low engagement	double3	< 30
Single page + low engagement	double5, double3	≤ 1 page, < 50

Implementation notes

Analytics Engine: Use if(condition, 'BOT_LIKELY', if(condition, 'SUSPICIOUS', ...)) since CASE WHEN is not supported.
Null/missing: Treat missing numeric fields as 0 (or skip from condition) so rows still get a label.
Tuning: Adjust thresholds per product (e.g. lower suspicion for high-volume forms, stricter for high-value leads).
Aggregation: For dashboards, count or weight by _sample_interval and group by this label to get mix over time or by source/campaign.

Conceptual query shape (percentages via safe division as in moderation queries):

Select the classification expression as quality_label.
Filter timestamp >= NOW() - INTERVAL '30' DAY.
Group by quality_label.
Sum _sample_interval for counts; use if(SUM(_sample_interval) > 0, ..., 0.0) for any ratios.

This enables reporting such as: % GOOD_LEAD, % LOW_INTENT, % SUSPICIOUS, % BOT_LIKELY by day, source, or campaign.

Traffic quality scoring (lead_submissions) ​

Schema fields used ​

Classification labels ​

Classification rules (evaluation order) ​

1. BOT_LIKELY ​

2. SUSPICIOUS ​

3. LOW_INTENT ​

4. GOOD_LEAD (default) ​

Rule summary (pseudo-SQL) ​

Threshold reference (defaults) ​

Implementation notes ​

Example: share by quality (last 30 days) ​