Traffic quality scoring (lead_submissions)
This document defines a traffic quality classification for lead submissions using the lead_submissions Analytics Engine schema. Each submission is classified into one of four labels: GOOD_LEAD, LOW_INTENT, SUSPICIOUS, BOT_LIKELY.
Schema fields used
| Concept | Column | Type | Range / values | Description |
|---|---|---|---|---|
| suspicion_score | double2 | 0–100 | double | Higher = more automated / risky |
| engagement_score | double3 | 0–100 | double | Higher = more engaged before submit |
| time_to_submit | double4 | seconds | double | Time from first touch to submit |
| pages_visited | double5 | count | double | Pages seen in session |
| vpn_score | double8 | 0–100 | double | VPN/proxy likelihood |
| duplicate | blob17 | "1"/"0" | string | "1" = duplicate, "0" = new |
| bot_lead | blob19 | "1"/"0" | string | "1" = bot lead, "0" = not |
Reference: Analytics Engine schema.
Classification labels
| Label | Meaning |
|---|---|
| BOT_LIKELY | Strong bot/automation signals; treat as non-human or test. |
| SUSPICIOUS | High risk (suspicion, VPN, very fast submit); review before treating as real lead. |
| LOW_INTENT | Duplicate or low engagement; lower priority or nurture. |
| GOOD_LEAD | Real, engaged, new lead; full value for pipeline. |
Classification rules (evaluation order)
Rules are evaluated in order. The first matching rule wins. Thresholds are tunable; below are recommended defaults.
1. BOT_LIKELY
- blob19 = '1' (explicit bot flag), OR
- double2 (suspicion_score) ≥ 90, OR
- double4 (time_to_submit) < 1 (submit in under 1 second) AND double5 (pages_visited) ≤ 1
Rationale: Explicit bot flag or extreme automation (very high suspicion or instant submit with no browsing).
2. SUSPICIOUS
- double2 (suspicion_score) ≥ 70, OR
- double8 (vpn_score) ≥ 70, OR
- double4 (time_to_submit) < 3 (under 3 seconds) AND double5 (pages_visited) ≤ 2, OR
- double2 (suspicion_score) ≥ 50 AND double8 (vpn_score) ≥ 50
Rationale: High suspicion or VPN, or very fast/low-browse submit, or combined medium suspicion + VPN.
3. LOW_INTENT
- blob17 = '1' (duplicate), OR
- double3 (engagement_score) < 30, OR
- double5 (pages_visited) ≤ 1 AND double3 (engagement_score) < 50
Rationale: Duplicate = not a new lead; low engagement or single-page + low engagement = weak intent.
4. GOOD_LEAD (default)
- If none of the above match: GOOD_LEAD.
Rationale: New, sufficiently engaged, not bot-like or suspicious.
Rule summary (pseudo-SQL)
CASE
WHEN blob19 = '1' OR double2 >= 90 OR (double4 < 1 AND double5 <= 1)
THEN 'BOT_LIKELY'
WHEN double2 >= 70 OR double8 >= 70
OR (double4 < 3 AND double5 <= 2)
OR (double2 >= 50 AND double8 >= 50)
THEN 'SUSPICIOUS'
WHEN blob17 = '1' OR double3 < 30 OR (double5 <= 1 AND double3 < 50)
THEN 'LOW_INTENT'
ELSE 'GOOD_LEAD'
END(In Analytics Engine SQL you would use nested if(cond, then, else) instead of CASE; see implementation notes below.)
Threshold reference (defaults)
| Rule / condition | Field(s) | Default threshold |
|---|---|---|
| Bot flag | blob19 | = '1' |
| Very high suspicion | double2 | ≥ 90 |
| Instant submit + no browse | double4, double5 | < 1 s, ≤ 1 page |
| High suspicion | double2 | ≥ 70 |
| High VPN | double8 | ≥ 70 |
| Fast submit + low browse | double4, double5 | < 3 s, ≤ 2 pages |
| Suspicion + VPN combined | double2, double8 | both ≥ 50 |
| Duplicate | blob17 | = '1' |
| Low engagement | double3 | < 30 |
| Single page + low engagement | double5, double3 | ≤ 1 page, < 50 |
Implementation notes
- Analytics Engine: Use
if(condition, 'BOT_LIKELY', if(condition, 'SUSPICIOUS', ...))sinceCASE WHENis not supported. - Null/missing: Treat missing numeric fields as 0 (or skip from condition) so rows still get a label.
- Tuning: Adjust thresholds per product (e.g. lower suspicion for high-volume forms, stricter for high-value leads).
- Aggregation: For dashboards, count or weight by
_sample_intervaland group by this label to get mix over time or by source/campaign.
Example: share by quality (last 30 days)
Conceptual query shape (percentages via safe division as in moderation queries):
- Select the classification expression as
quality_label. - Filter
timestamp >= NOW() - INTERVAL '30' DAY. - Group by
quality_label. - Sum
_sample_intervalfor counts; useif(SUM(_sample_interval) > 0, ..., 0.0)for any ratios.
This enables reporting such as: % GOOD_LEAD, % LOW_INTENT, % SUSPICIOUS, % BOT_LIKELY by day, source, or campaign.