Planning: Lead stats for marketing campaign focus

Implementation tracking: This work is scheduled as an important project. Phased breakdown and progress: lead-stats-project-phases.md. Roadmap: ROADMAP.md — Phase 11.

Goal: Support these analyses without building anything yet — only plan.

  1. Time-based volume: How many leads did the system receive in each time window (e.g. 30- or 60-minute buckets, or per day, optionally with filters)?
  2. Breakdown by type: Of those, how many were actual leads, follow-up leads, bot leads, test leads, VPN, suspicious, hot?
  3. Breakdown by dimension: Stats project-wise, country-wise, city-wise, source-wise, and by other dimensions (referer, CTA, UTM, engagement, etc.) — so marketing can compare campaigns, geos, and channels.

This helps focus marketing campaigns (when to run ads, which segments perform, quality vs volume).


1. What we already store (current state)

1.1 Time of receipt

| Source | Column / field | How set / granularity | Notes |
|---|---|---|---|
| Leads | EnquiryDateTime (TEXT) | Set at insert via getISTTime() | IST (Asia/Kolkata). Format: toLocaleString('en-IN', { timeZone: 'Asia/Kolkata' }) (e.g. "7/3/2026, 3:45:00 pm") — not ISO. |
| Leads | CreatedAt (TEXT) | SQL default datetime('now') | UTC, ISO-like. |
| LeadSubmissions | SubmittedAt (TEXT) | SQL default datetime('now') | UTC. One row per submit (primary + follow-up). |
| TestingLeads | ReceivedAt (TEXT) | Set at insert | ISO-style strftime('%Y-%m-%dT%H:%M:%fZ', 'now') (UTC). |
| BlockedIPLeads | ReceivedAt (TEXT) | Set at insert | Same as above (UTC). |
| Analytics Engine | timestamp (DateTime) | Per submission | Set by Workers; used for day-level and hour-level queries (e.g. toStartOfHour(timestamp)). |

So we do have timestamps for:

  • All submissions (LeadSubmissions, Analytics Engine).
  • Primary lead creation (Leads).
  • Test leads (TestingLeads).
  • Blocked/bot leads (BlockedIPLeads).

Gaps / caveats:

  • 30/60-minute buckets: Possible in D1 using SubmittedAt or EnquiryDateTime if we normalize to a parseable format and apply a timezone. Analytics Engine already supports toStartOfHour(timestamp); 30-minute buckets need only a small expression (floor to 30 minutes; see the sketch after this list).
  • Timezone: Leads uses IST for EnquiryDateTime; LeadSubmissions and the other tables use UTC. For “day-wise” reporting in a specific timezone (e.g. IST), we need either to store the timezone or to derive the day from a consistent reference (e.g. always the IST day for reporting).
  • Day-wise filter: Today we have day-level buckets in Analytics Engine (toDate(timestamp)). For D1, we can use date(SubmittedAt) or date(CreatedAt) (UTC), or parse EnquiryDateTime for the IST day.
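
As a concrete sketch of the 30-minute bucketing with IST applied, a hypothetical D1 helper could look like this. It assumes SubmittedAt holds the UTC datetime('now') TEXT described above and uses a fixed +330-minute offset for IST (safe because IST has no DST):

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical helper (not existing code): submissions per 30-minute bucket,
// with bucket labels shifted from UTC to IST.
export async function submissionsPer30MinIst(
  db: D1Database,
  fromUtc: string, // e.g. '2026-03-07 00:00:00'
  toUtc: string,
): Promise<{ bucketIst: string; submissions: number }[]> {
  const sql = `
    SELECT datetime((CAST(strftime('%s', SubmittedAt) AS INTEGER) / 1800) * 1800,
                    'unixepoch', '+330 minutes') AS bucketIst, -- floor to 30 min, then IST
           COUNT(*) AS submissions
    FROM LeadSubmissions
    WHERE SubmittedAt >= ?1 AND SubmittedAt < ?2
    GROUP BY bucketIst
    ORDER BY bucketIst`;
  const { results } = await db
    .prepare(sql)
    .bind(fromUtc, toUtc)
    .all<{ bucketIst: string; submissions: number }>();
  return results;
}
```

A 60-minute version only changes 1800 to 3600; for day-wise IST, date(SubmittedAt, '+330 minutes') gives the IST day.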

No new storage is strictly required for “in which 30/60 min or day did we receive how many leads” — we already have the raw timestamps. What we may want is a consistent reporting timezone, and possibly an explicitly stored timezone for clarity.


2. Breakdown by “type” — what we have vs what we need

Definitions used below:

  • Actual lead: Primary, non-test, non-bot, stored in Leads.
  • Follow-up lead: A subsequent form submit for the same lead (same project + identity); stored as another row in LeadSubmissions with IsPrimaryLead = 0.
  • Bot lead: Honeypot-filled submit; when BOT_LEAD_DETECTION is on, not written to Leads/LeadSubmissions; stored in BlockedIPLeads (and IP in BlockedIPs).
  • Test lead: From an IP whitelisted for “test”; stored in TestingLeads, not in Leads/LeadSubmissions.
  • VPN: Visitor/lookup flagged as VPN; on lead we have VpnSuspected (0/1) on Leads.
  • Suspicious: High fraud/suspicion score; we have SuspicionScore (0–100) and SuspicionReasons on Leads.
  • Hot: High engagement; we have EngagementLabel on Leads (e.g. Cold | Warm | Hot | Very Hot).

2.1 Where each type is stored

| Type | Stored in | Time column | Count / filter |
|---|---|---|---|
| Actual (primary) | Leads | EnquiryDateTime, CreatedAt | One row per primary lead. |
| Follow-up | LeadSubmissions (IsPrimaryLead = 0) | SubmittedAt | Count by time bucket. |
| Bot | BlockedIPLeads (honeypot) | ReceivedAt | Count by time bucket. |
| Test | TestingLeads | ReceivedAt | Count by time bucket. |
| VPN | Leads.VpnSuspected (0/1) | EnquiryDateTime / CreatedAt | Filter / group by VpnSuspected. |
| Suspicious | Leads.SuspicionScore, SuspicionReasons | Same | Filter by score band (e.g. ≥ 40). |
| Hot | Leads.EngagementLabel | Same | Filter EngagementLabel IN ('Hot','Very Hot'). |

So:

  • Actual, follow-up, VPN, suspicious, hot — we already store everything needed in Leads and LeadSubmissions. No new columns required for these classifications.
  • Bot — we have counts and time via BlockedIPLeads.ReceivedAt.
  • Test — we have counts and time via TestingLeads.ReceivedAt.

2.2 Cross-table view (one time window)

To get “in this 30/60 min or day, how many leads and how many by type?”, we’d conceptually do the following (a combined SQL sketch follows the list):

  1. Volume (all submissions in window):
    Count from LeadSubmissions by SubmittedAt (and optionally from Leads by EnquiryDateTime/CreatedAt for primary-only).
    Or use Analytics Engine: timestamp + blob5 (primary vs follow-up) for submission-level buckets.

  2. Primary (actual) leads in window:
    Leads where EnquiryDateTime or CreatedAt in window (with timezone handled).

  3. Follow-up in window:
    LeadSubmissions where IsPrimaryLead = 0 and SubmittedAt in window.

  4. Bot in window:
    BlockedIPLeads where ReceivedAt in window.

  5. Test in window:
    TestingLeads where ReceivedAt in window.

  6. VPN / suspicious / hot in window:
    Restrict the “primary leads in window” set to VpnSuspected = 1, or SuspicionScore >= threshold, or EngagementLabel IN ('Hot','Very Hot').
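
A combined sketch of the steps above, assuming a UTC window and the table/column names from §1 and §2.1 (the UNION ALL shape is illustrative, not existing code):

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical one-shot type breakdown for a UTC window [fromUtc, toUtc).
export async function typeBreakdown(db: D1Database, fromUtc: string, toUtc: string) {
  const sql = `
    SELECT 'primary' AS type, COUNT(*) AS n
      FROM Leads WHERE CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'follow_up', COUNT(*)
      FROM LeadSubmissions WHERE IsPrimaryLead = 0 AND SubmittedAt >= ?1 AND SubmittedAt < ?2
    UNION ALL SELECT 'bot', COUNT(*)
      FROM BlockedIPLeads WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2
    UNION ALL SELECT 'test', COUNT(*)
      FROM TestingLeads WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2
    UNION ALL SELECT 'vpn', COUNT(*)
      FROM Leads WHERE VpnSuspected = 1 AND CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'suspicious', COUNT(*)
      FROM Leads WHERE SuspicionScore >= 40 AND CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'hot', COUNT(*)
      FROM Leads WHERE EngagementLabel IN ('Hot', 'Very Hot')
       AND CreatedAt >= ?1 AND CreatedAt < ?2`;
  const { results } = await db.prepare(sql).bind(fromUtc, toUtc)
    .all<{ type: string; n: number }>();
  return results; // e.g. [{ type: 'primary', n: 42 }, ...]
}
```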

So we do maintain the data needed for these stats. What we don’t have yet is:

  • A single API or report that returns “per 30/60 min (or per day), total received + breakdown (actual, follow-up, bot, test, VPN, suspicious, hot)”.
  • Optional: a single, consistent reporting timezone (e.g. IST) for all “day” and “hour” buckets so marketing can compare apples to apples.

3. Optional: explicit “lead type” or reporting view

We don’t need new columns to compute the breakdown: it can be derived from existing tables and columns. If we later want to simplify reporting or improve its performance, we could:

  • Add a reporting view (or materialized aggregation) that precomputes counts by time bucket and type (actual, follow-up, bot, test, VPN, suspicious, hot), or
  • Add a denormalized “lead_type” or “submission_type” on LeadSubmissions / Analytics Engine (e.g. primary | follow_up | test | blocked_bot) so one query can group by type without joining multiple tables.

That’s an optimization, not a requirement for this use case.


4. Analytics Engine as the primary store for lead stats

Yes — we can (and should) use Cloudflare Workers Analytics Engine as the primary store for these stats, so that reporting and dashboards add no load to D1. D1 stays for transactional and audit data; Analytics Engine handles time-bucketed counts and breakdowns.

4.1 What we already stream to Analytics Engine

We already write every accepted lead submission (primary and follow-up) to the lead_submissions dataset from src/lead.ts via writeLeadSubmissionToAnalyticsEngine(). Each data point includes:

| Need | Analytics Engine | Already there? |
|---|---|---|
| Time bucket (hour / day) | timestamp | Yes — use toStartOfHour(timestamp) or toDate(timestamp). |
| 30-min bucket | Expression on timestamp | Yes — e.g. floor to 30 min in SQL. |
| Primary vs follow-up | blob5 ("1" / "0") | Yes. |
| Engagement (Hot / Very Hot) | blob15 | Yes. |
| VPN | double8 | Yes. |
| Suspicious (score) | double2 | Yes — filter e.g. double2 >= 40. |
| Project | blob1 | Yes. |
| Source | blob2 | Yes. |
| Country | blob16 (user country from CF geo) | Yes. |
| City | — | No — not sent to Analytics Engine today (only in D1 / LeadPayload). |
| Referer, CTA, UTM | blob3, blob4, blob10–14 | Yes. |

So time-bucketed volume, breakdown by type, and project-, source-, country-wise (and referer/CTA/UTM) stats can be served from Analytics Engine. City-wise stats would require either adding city to the Analytics Engine write (see §4.6) or querying D1 for city breakdown.
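
As an illustration, an hourly volume query split by primary vs follow-up might look like the sketch below. The endpoint is Cloudflare's Analytics Engine SQL API; accountId and apiToken are placeholders, blob5 is the flag from the table above, and counts use SUM(_sample_interval) because Analytics Engine samples data points:

```ts
// Sketch: hourly submission volume for the last day, split primary vs follow-up.
export async function hourlyVolume(accountId: string, apiToken: string) {
  const query = `
    SELECT toStartOfHour(timestamp) AS hour,
           blob5 AS isPrimary,              -- '1' = primary, '0' = follow-up
           SUM(_sample_interval) AS submissions
    FROM lead_submissions
    WHERE timestamp >= NOW() - INTERVAL '1' DAY
    GROUP BY hour, isPrimary
    ORDER BY hour
    FORMAT JSON`;
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    { method: 'POST', headers: { Authorization: `Bearer ${apiToken}` }, body: query },
  );
  const { data } = (await res.json()) as {
    data: { hour: string; isPrimary: string; submissions: number }[];
  };
  return data;
}
```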

4.2 Gap: test leads and bot/blocked leads

Today we do not write to Analytics Engine when:

  • A test lead is received (whitelisted IP) — we only write to TestingLeads in D1.
  • A blocked IP or bot (honeypot) lead is rejected — we only write to BlockedIPLeads (and block the IP).

So “total received” and “by type: test, bot, blocked” would still require D1 (or a second dataset) unless we add events for these.

4.3 Option A: add test and blocked/bot events to the same dataset

  • Add one extra dimension to the existing lead_submissions schema: an event type (e.g. a new blob such as blob20) with values primary | follow_up | test | blocked | bot.
  • Normal path: Keep current write; set this blob to primary or follow_up (from existing isPrimaryLead).
  • Test-lead path: After writing to TestingLeads, also call the Analytics Engine writer once with minimal fields and event_type = 'test' (project/source/referer from request; rest optional or empty).
  • Blocked / bot path: When we reject (blocked IP or honeypot), after writing to BlockedIPLeads, call the Analytics Engine writer once with event_type = 'blocked' or 'bot' and minimal fields.

Then a single dataset has every “received” event. All reporting queries (time buckets, breakdown by type) run only against Analytics Engine; D1 is not hit for stats. Schema change: one additional blob in the same dataset (Cloudflare creates the schema on first write; new blobs can be added simply by sending more columns in writeDataPoint).
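
A minimal sketch of the extra write on the test / blocked / bot paths. The binding name LEAD_SUBMISSIONS, the helper name, and reserving blob20 for event_type are assumptions to settle at implementation time (§4.6 discusses blob positions):

```ts
import type { AnalyticsEngineDataset } from '@cloudflare/workers-types';

type ReceivedEventType = 'test' | 'blocked' | 'bot';

// Hypothetical helper: one data point per test/rejected "received" event.
export function writeReceivedEvent(
  dataset: AnalyticsEngineDataset, // e.g. env.LEAD_SUBMISSIONS (assumed binding name)
  eventType: ReceivedEventType,
  project: string,
  source = '',
  referer = '',
): void {
  const blobs = new Array<string>(20).fill(''); // mirror the existing blob1..blob19 layout
  blobs[0] = project;    // blob1: project (same position as accepted submissions)
  blobs[1] = source;     // blob2: source
  blobs[2] = referer;    // blob3: referer / page URL
  blobs[19] = eventType; // blob20: the new event_type dimension
  dataset.writeDataPoint({ indexes: [project], blobs, doubles: [] });
}
```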

4.4 Option B: Analytics Engine for accepted only; D1 for test + blocked counts only

  • Use Analytics Engine for all time-bucketed stats for accepted submissions (primary, follow-up, VPN, suspicious, hot).
  • For “total received” and “bot / test” counts, run lightweight D1 queries only: e.g. COUNT(*) from TestingLeads and BlockedIPLeads by time range (with indexes on ReceivedAt). No heavy joins or full-table scans.

This moves all heavy aggregation off D1, which greatly reduces its load, but still uses D1 for the test and blocked/bot counts.
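
Those counts stay cheap if indexed. A sketch, assuming indexes on TestingLeads(ReceivedAt) and BlockedIPLeads(ReceivedAt) exist or are added:

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical helper: two indexed COUNT(*) lookups in one D1 batch.
export async function testAndBlockedCounts(db: D1Database, fromUtc: string, toUtc: string) {
  const count = (table: string) =>
    db.prepare(`SELECT COUNT(*) AS n FROM ${table} WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2`)
      .bind(fromUtc, toUtc);
  const [test, blocked] = await db.batch<{ n: number }>([
    count('TestingLeads'),
    count('BlockedIPLeads'),
  ]);
  return { test: test.results[0].n, blocked: blocked.results[0].n };
}
```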

4.5 Recommendation

  • Prefer Option A: Add event_type (or lead_type) and write test + blocked/bot events to the same Analytics Engine dataset. Then all lead-stats reporting (including “in which 30/60 min or day, how many total and how many by type”) is served from Analytics Engine with zero D1 load for this use case.
  • Retention: 3 months in Analytics Engine is acceptable for campaign tuning; D1 keeps full history for audit and compliance.
  • Implementation: When building the report/API, read only from Analytics Engine (and optionally from D1 for test/blocked if Option B is used). No implementation in this planning phase — this section only records the design choice.

4.6 Project-, country-, city-, source-wise (and other) stats

Marketing needs to slice the same time-bucketed counts and type breakdowns by dimension: project, country, city, source, and optionally referer, CTA, UTM campaign/source/medium, engagement label. All of this can be served from Analytics Engine without D1 if we have the dimensions in the dataset.

| Dimension | Analytics Engine today | D1 / payload | For stats from AE only |
|---|---|---|---|
| Project | blob1 | Leads.ProjectName, LeadSubmissions.ProjectName | ✓ Use blob1; filter/group by project. |
| Source | blob2 | Leads.ProjectSource, LeadSubmissions.Source | ✓ Use blob2; filter/group by source. |
| Country | blob16 (user country from CF) | Leads.ContactCountryName, payload user_country | ✓ Use blob16; filter/group by country. |
| City | Not sent | Leads.ContactCity, payload user_city | Add a new blob (e.g. user_city) to the Analytics Engine write so we can group/filter by city without querying D1. |
| Region | Not sent | Leads.ContactRegion, payload user_region | Optional: add if region-wise stats are required; same pattern as city. |
| Referer | blob3 | Leads.Referer, LeadSubmissions.Referer | ✓ Use blob3 (submission page URL; max 500 chars). |
| CTA | blob4 | Form CTA name | ✓ Use blob4. |
| UTM | blob10–14 (source, medium, campaign, term, content) | PayloadJson, form | ✓ Use for campaign/channel breakdown. |
| Engagement label | blob15 | Leads.EngagementLabel | ✓ Use blob15 (Cold / Warm / Hot / Very Hot). |
| Country code (phone) | blob6 | Contact country code | ✓ Use blob6 for phone-region breakdown if needed. |

Recommendation: When implementing the Analytics Engine–backed report, support project, country, source (and referer, CTA, UTM, engagement) from day one — they are already in the dataset. Add city (and optionally region) to the Analytics Engine writer (LeadSubmissionDataPoint + new blob(s)) so city-wise (and region-wise) stats can be computed from Analytics Engine with no D1 load. The payload already has user_city and user_region; we only need to pass them into writeLeadSubmissionToAnalyticsEngine and extend the blob array (e.g. blob20 = city, blob21 = region, or reserve blob20 for event_type and use blob21 = city, blob22 = region).
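
On the query side, a dimension breakdown needs nothing beyond the blobs in the table above (same SQL API call pattern as the sketch in §4.1); for example, 7-day volume by country and engagement label:

```ts
// Sketch query against the lead_submissions dataset via the SQL API.
const countryEngagementQuery = `
  SELECT blob16 AS country,
         blob15 AS engagement,            -- Cold / Warm / Hot / Very Hot
         SUM(_sample_interval) AS submissions
  FROM lead_submissions
  WHERE timestamp >= NOW() - INTERVAL '7' DAY
  GROUP BY country, engagement
  ORDER BY country, submissions DESC
  FORMAT JSON`;
```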


5. Summary: do we maintain such data, and what else might we need?

| Question | Answer |
|---|---|
| Do we store “when” we received leads (30/60 min or day)? | Yes. Leads.EnquiryDateTime (IST), Leads.CreatedAt, LeadSubmissions.SubmittedAt, TestingLeads.ReceivedAt, BlockedIPLeads.ReceivedAt, and Analytics Engine timestamp. |
| Can we get “how many leads in a given time range”? | Yes. From Leads, LeadSubmissions, or Analytics Engine, with filters (e.g. date range, project, source). |
| Do we store “actual vs follow-up vs bot vs test vs VPN vs suspicious vs hot”? | Yes. Actual/follow-up: Leads + LeadSubmissions.IsPrimaryLead. Bot: BlockedIPLeads. Test: TestingLeads. VPN: Leads.VpnSuspected. Suspicious: Leads.SuspicionScore. Hot: Leads.EngagementLabel. |
| What’s missing for the use case? | (1) One place (API or dashboard) that returns time-bucketed volume + breakdown by type + by dimension (project, country, city, source, etc.). (2) Consistent timezone for “day” and “30/60 min” (e.g. IST) across all sources. (3) City (and optionally region) in Analytics Engine for city-wise stats without D1. (4) Optionally, a view or denormalized type for easier reporting. |
| Can we get stats project-, country-, city-, source-wise? | Yes for project, source, country, referer, CTA, UTM — in Analytics Engine (blob1, blob2, blob16, blob3, blob4, blob10–14). City (and optionally region) must be added to the AE write for city-wise stats without D1 (§4.6). |
| Do we need to store anything new? | For D1: no. For Analytics Engine–only reporting: add event_type and test/blocked/bot events; add city (and optionally region) to the AE writer so all dimension breakdowns come from AE (§4, §4.6). |

6. Next steps (planning only)

  1. Define “day” and “time window” for reporting
    Decide: 30 vs 60 min buckets; day in IST vs UTC; and whether all stats (Leads, LeadSubmissions, TestingLeads, BlockedIPLeads) should use the same clock (e.g. all IST or all UTC with display in IST).

  2. Spec one report/API
    Document the desired output, e.g.: “for date D and an optional project/source filter, return for each 60-min bucket (or each day): total submissions, primary count, follow-up count, bot count, test count, and among primary: VPN count, suspicious count (e.g. SuspicionScore ≥ 40), hot count (Hot + Very Hot).” No implementation yet; a possible response shape is sketched after this list.

  3. Use Analytics Engine as the source for this report
    Run all aggregation and filtering for this report against the lead_submissions dataset (SQL API or Worker proxy). If we add event_type and write test + blocked/bot events (§4), no D1 queries are needed for this use case.

  4. Optional: timezone and format
    If we want IST day and IST hour everywhere: either store a dedicated EnquiryDateIST / SubmittedAtIST or document that “day” is computed in app from timestamp using a fixed timezone (e.g. Asia/Kolkata). Analytics Engine SQL can do timezone conversion in queries.

  5. Later (implementation phase)
    Implement the report/API reading from Analytics Engine only; add event_type and test/blocked/bot writes (§4); add city (and optionally region) to the Analytics Engine writer (§4.6) so project-, country-, city-, and source-wise stats are all available from AE.
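
A possible response shape for the report specced in step 2; the field names are placeholders, not an agreed contract:

```ts
// Hypothetical report/API shape; one row per time bucket.
export interface LeadStatsBucket {
  bucketStart: string;      // ISO datetime in the reporting timezone (step 1)
  totalSubmissions: number; // all received events in the bucket
  primary: number;
  followUp: number;
  bot: number;
  test: number;
  // Among primary leads only:
  vpn: number;              // VpnSuspected = 1
  suspicious: number;       // SuspicionScore >= threshold (e.g. 40)
  hot: number;              // EngagementLabel in ('Hot', 'Very Hot')
}

export interface LeadStatsReport {
  date: string;                  // 'YYYY-MM-DD'
  bucketMinutes: 30 | 60 | 1440; // 1440 = day-wise
  filters?: { project?: string; source?: string };
  buckets: LeadStatsBucket[];
}
```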


7. Using AI on this data to optimize team, campaign, and focus area

Once the lead-stats pipeline (time buckets, type breakdown, project/country/city/source dimensions) is in place, AI can use this data to help optimize team capacity, campaign mix, and focus areas. Below are use cases and how they map to the data — planning only; no implementation in this doc.

7.1 What data AI can consume

  • Aggregated stats from the report/API: per time window (e.g. 30/60 min or day), counts by type (actual, follow-up, bot, test, VPN, suspicious, hot) and by dimension (project, country, city, source, referer, CTA, UTM, engagement).
  • Trends over time: day-over-day, week-over-week, hour-of-day patterns.
  • Ratios and quality signals: e.g. actual vs bot rate, hot % by source, suspicion score distribution by country.

AI does not need raw PII or full payloads for optimization; aggregated or anonymized stats are enough and safer.

7.2 Use cases for team optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Staffing / capacity | Lead volume by hour and day, by project/source | Identify peak windows and busy projects; suggest when to have more people on follow-up or when capacity is underused. |
| Focus allocation | Volume and quality (hot %, suspicion %) by project, source, country | Recommend which projects or geos deserve more team attention vs which are low-yield or high-fraud. |
| SLA / response time | Time-bucketed volume + (if added later) first-touch or response timestamps | Flag hours or segments where volume spikes and suggest response-time targets or alerts. |

7.3 Use cases for campaign optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Where to spend | Volume, actual-lead %, hot % by source, UTM (campaign, medium), referer | Compare channels and campaigns; recommend scaling winners (high volume + good quality) and pausing or fixing underperformers (low actual %, high bot/suspicious). |
| When to run ads | Lead volume and quality by hour-of-day and day-of-week | Suggest best time windows (e.g. 30/60 min slots or days) to run paid campaigns so spend aligns with when real leads come in. |
| Geo focus | Volume and quality by country, city (and region if added) | Recommend which countries/cities to target or exclude (e.g. high bot or low conversion); suggest budget shift by geo. |
| Creative / CTA | Volume and quality by CTA, referer (landing page) | Identify which CTAs or pages drive more actual/hot leads; suggest A/B tests or retirement of weak CTAs. |

7.4 Use cases for focus-area optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Priority segments | All dimensions + type breakdown | Rank segments (e.g. project × country × source) by volume and quality; output a short “focus list” (e.g. top 5 projects, top 3 countries) with reasons. |
| Anomaly / alerting | Time-series volume and quality by segment | Detect sudden drops (e.g. no leads in a usually busy hour) or spikes in bot/suspicious % and suggest “investigate this segment” or “check campaign X”. |
| Simple forecasting | Historical volume by time and segment | Rough “expected volume” for next day or week to set team expectations or campaign budgets (no heavy ML needed at first). |

7.5 How to expose data to AI (options)

  • Option A — Pre-built report + copy/paste or export: The lead-stats API/dashboard exports tables or CSV (e.g. by day, by project, by source). Users paste or upload into an AI chat (e.g. Claude, ChatGPT, internal copilot) and ask: “Where should we focus?” or “Which campaigns should we scale?” No integration; manual but quick.
  • Option B — API + AI agent: An AI agent (or MCP tool) calls the lead-stats API with filters (date range, project, source, etc.), receives JSON, and answers natural-language questions or generates a short written summary (e.g. “This week, focus on Project X and Country Y; reduce spend on Source Z due to high bot rate.”). Integrates with your existing AI stack.
  • Option C — Scheduled digest: A job runs daily/weekly, fetches aggregated stats from the API, and sends a structured summary (or a prompt plus data) to an LLM; the LLM returns a “weekly optimization brief” (bullets: team focus, campaign changes, geo recommendations). Delivered via email, Slack, or dashboard.

Choose based on how much automation you want and whether you already have an AI assistant or MCP in the loop.

7.6 What to build first (data side)

For AI to be useful, the data foundation is the same as in §4–§4.6:

  1. Lead-stats API that returns time-bucketed counts + breakdown by type and by dimension (project, country, city, source, referer, CTA, UTM, engagement).
  2. Stable, documented response shape (e.g. JSON with clear field names) so prompts or agents can reliably refer to “by project”, “by country”, “actual_lead_count”, “bot_count”, etc.
  3. Optional: a small “summary” endpoint that returns a fixed set of top-level KPIs (e.g. last 7 days: total leads, actual %, hot %, top 5 projects, top 3 countries) so AI has one call for a quick overview (see the sketch after this list).
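
A possible shape for the optional summary endpoint in item 3 (again placeholders, not a contract):

```ts
// Hypothetical one-call KPI overview for AI consumers.
export interface LeadStatsSummary {
  periodDays: number;       // e.g. 7
  totalLeads: number;
  actualPct: number;        // actual leads / total received
  hotPct: number;           // Hot + Very Hot among primary leads
  topProjects: { project: string; leads: number }[];  // e.g. top 5
  topCountries: { country: string; leads: number }[]; // e.g. top 3
}
```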

No PII or raw events need to be exposed to AI; aggregates are enough for team/campaign/focus optimization and keep compliance simple.


8. References

  • Lead storage: docs/features/lead-capture.md, migrations/001_init.sql, 005_lead_scoring_columns.sql, 006_lead_fraud_detection.sql, 009_lead_submissions_history.sql, 012_lead_status.sql, 026_blocked_ip_leads.sql, 029_testing_leads.sql
  • Payload and analytics: docs/analytics/payloadjson.md, docs/analytics/analytics-engine.md
  • Lead scoring and fraud: docs/features/lead-scoring.md, docs/features/fraud-detection.md
  • Types: src/lead.ts (LeadPayload), src/types.ts (EngagementLabel)