Planning: Lead stats for marketing campaign focus

Implementation tracking: This work is scheduled as an important project. Phased breakdown and progress: lead-stats-project-phases.md. Roadmap: ROADMAP.md — Phase 11.

Goal: Support these analyses without building anything yet — only plan.

  1. Time-based volume: How many leads did the system receive in each time window (e.g. 30- or 60-minute buckets, or per day, optionally with filters)?
  2. Breakdown by type: Of those, how many were actual leads, follow-up leads, bot leads, test leads, VPN, suspicious, hot?
  3. Breakdown by dimension: Stats project-wise, country-wise, city-wise, source-wise, and by other dimensions (referer, CTA, UTM, engagement, etc.) — so marketing can compare campaigns, geos, and channels.

This helps focus marketing campaigns (when to run ads, which segments perform, quality vs volume).


1. What we already store (current state)

1.1 Time of receipt

| Source | Column / field | How set / granularity | Notes |
|---|---|---|---|
| Leads | EnquiryDateTime (TEXT) | Set at insert via getISTTime() | IST (Asia/Kolkata). Format: toLocaleString('en-IN', { timeZone: 'Asia/Kolkata' }) (e.g. "7/3/2026, 3:45:00 pm") — not ISO. |
| Leads | CreatedAt (TEXT) | SQL default datetime('now') | UTC, ISO-like. |
| LeadSubmissions | SubmittedAt (TEXT) | SQL default datetime('now') | UTC. One row per submit (primary + follow-up). |
| TestingLeads | ReceivedAt (TEXT) | Set at insert | ISO-style strftime('%Y-%m-%dT%H:%M:%fZ', 'now') (UTC). |
| BlockedIPLeads | ReceivedAt (TEXT) | Set at insert | Same as above (UTC). |
| Analytics Engine | timestamp (DateTime) | Per submission | Set by Workers; used for day-level and hour-level queries (e.g. toStartOfHour(timestamp)). |

So we do have timestamps for:

  • All submissions (LeadSubmissions, Analytics Engine).
  • Primary lead creation (Leads).
  • Test leads (TestingLeads).
  • Blocked/bot leads (BlockedIPLeads).

Gaps / caveats:

  • 30/60-minute buckets: Possible in D1 using SubmittedAt or EnquiryDateTime if we normalize to a parseable format and apply a timezone. Analytics Engine already supports toStartOfHour(timestamp); 30-minute buckets need only a small expression (floor to 30 minutes; see the sketch after this list).
  • Timezone: Leads uses IST for EnquiryDateTime; LeadSubmissions and the other tables use UTC. For “day-wise” reporting in a specific timezone (e.g. IST), we need either to store the timezone or to derive the day from a consistent reference (e.g. always the IST day for reporting).
  • Day-wise filter: Today we have day-level buckets in Analytics Engine (toDate(timestamp)). For D1, we can use date(SubmittedAt) or date(CreatedAt) (UTC), or parse EnquiryDateTime for the IST day.
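
As a concrete sketch of the 30-minute bucketing with IST applied, a hypothetical D1 helper could look like this. It assumes SubmittedAt holds the UTC datetime('now') TEXT described above and uses a fixed +330-minute offset for IST (safe because IST has no DST):

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical helper (not existing code): submissions per 30-minute bucket,
// with bucket labels shifted from UTC to IST.
export async function submissionsPer30MinIst(
  db: D1Database,
  fromUtc: string, // e.g. '2026-03-07 00:00:00'
  toUtc: string,
): Promise<{ bucketIst: string; submissions: number }[]> {
  const sql = `
    SELECT datetime((CAST(strftime('%s', SubmittedAt) AS INTEGER) / 1800) * 1800,
                    'unixepoch', '+330 minutes') AS bucketIst, -- floor to 30 min, then IST
           COUNT(*) AS submissions
    FROM LeadSubmissions
    WHERE SubmittedAt >= ?1 AND SubmittedAt < ?2
    GROUP BY bucketIst
    ORDER BY bucketIst`;
  const { results } = await db
    .prepare(sql)
    .bind(fromUtc, toUtc)
    .all<{ bucketIst: string; submissions: number }>();
  return results;
}
```

A 60-minute version only changes 1800 to 3600; for day-wise IST, date(SubmittedAt, '+330 minutes') gives the IST day.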

No new storage is strictly required for “in which 30/60 min or day did we receive how many leads” — we already have the raw timestamps. What we may want is a consistent reporting timezone, and possibly an explicitly stored timezone for clarity.


2. Breakdown by “type” — what we have vs what we need

Definitions used below:

  • Actual lead: Primary, non-test, non-bot, stored in Leads.
  • Follow-up lead: A subsequent form submit for the same lead (same project + identity); stored as another row in LeadSubmissions with IsPrimaryLead = 0.
  • Bot lead: Honeypot-filled submit; when BOT_LEAD_DETECTION is on, not written to Leads/LeadSubmissions; stored in BlockedIPLeads (and IP in BlockedIPs).
  • Test lead: From an IP whitelisted for “test”; stored in TestingLeads, not in Leads/LeadSubmissions.
  • VPN: Visitor/lookup flagged as VPN; on lead we have VpnSuspected (0/1) on Leads.
  • Suspicious: High fraud/suspicion score; we have SuspicionScore (0–100) and SuspicionReasons on Leads.
  • Hot: High engagement; we have EngagementLabel on Leads (e.g. Cold | Warm | Hot | Very Hot).

2.1 Where each type is stored

| Type | Stored in | Time column | Count / filter |
|---|---|---|---|
| Actual (primary) | Leads | EnquiryDateTime, CreatedAt | One row per primary lead. |
| Follow-up | LeadSubmissions (IsPrimaryLead = 0) | SubmittedAt | Count by time bucket. |
| Bot | BlockedIPLeads (honeypot) | ReceivedAt | Count by time bucket. |
| Test | TestingLeads | ReceivedAt | Count by time bucket. |
| VPN | Leads.VpnSuspected (0/1) | EnquiryDateTime / CreatedAt | Filter / group by VpnSuspected. |
| Suspicious | Leads.SuspicionScore, SuspicionReasons | Same | Filter by score band (e.g. ≥ 40). |
| Hot | Leads.EngagementLabel | Same | Filter EngagementLabel IN ('Hot','Very Hot'). |

So:

  • Actual, follow-up, VPN, suspicious, hot — we already store everything needed in Leads and LeadSubmissions. No new columns required for these classifications.
  • Bot — we have counts and time via BlockedIPLeads.ReceivedAt.
  • Test — we have counts and time via TestingLeads.ReceivedAt.

2.2 Cross-table view (one time window)

To get “in this 30/60 min or day, how many leads and how many by type?”, we’d conceptually do the following (a combined SQL sketch follows the list):

  1. Volume (all submissions in window):
    Count from LeadSubmissions by SubmittedAt (and optionally from Leads by EnquiryDateTime/CreatedAt for primary-only).
    Or use Analytics Engine: timestamp + blob5 (primary vs follow-up) for submission-level buckets.

  2. Primary (actual) leads in window:
    Leads where EnquiryDateTime or CreatedAt in window (with timezone handled).

  3. Follow-up in window:
    LeadSubmissions where IsPrimaryLead = 0 and SubmittedAt in window.

  4. Bot in window:
    BlockedIPLeads where ReceivedAt in window.

  5. Test in window:
    TestingLeads where ReceivedAt in window.

  6. VPN / suspicious / hot in window:
    Restrict the “primary leads in window” set to VpnSuspected = 1, or SuspicionScore >= threshold, or EngagementLabel IN ('Hot','Very Hot').
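
A combined sketch of the steps above, assuming a UTC window and the table/column names from §1 and §2.1 (the UNION ALL shape is illustrative, not existing code):

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical one-shot type breakdown for a UTC window [fromUtc, toUtc).
export async function typeBreakdown(db: D1Database, fromUtc: string, toUtc: string) {
  const sql = `
    SELECT 'primary' AS type, COUNT(*) AS n
      FROM Leads WHERE CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'follow_up', COUNT(*)
      FROM LeadSubmissions WHERE IsPrimaryLead = 0 AND SubmittedAt >= ?1 AND SubmittedAt < ?2
    UNION ALL SELECT 'bot', COUNT(*)
      FROM BlockedIPLeads WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2
    UNION ALL SELECT 'test', COUNT(*)
      FROM TestingLeads WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2
    UNION ALL SELECT 'vpn', COUNT(*)
      FROM Leads WHERE VpnSuspected = 1 AND CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'suspicious', COUNT(*)
      FROM Leads WHERE SuspicionScore >= 40 AND CreatedAt >= ?1 AND CreatedAt < ?2
    UNION ALL SELECT 'hot', COUNT(*)
      FROM Leads WHERE EngagementLabel IN ('Hot', 'Very Hot')
       AND CreatedAt >= ?1 AND CreatedAt < ?2`;
  const { results } = await db.prepare(sql).bind(fromUtc, toUtc)
    .all<{ type: string; n: number }>();
  return results; // e.g. [{ type: 'primary', n: 42 }, ...]
}
```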

So we do maintain the data needed for these stats. What we don’t have yet is:

  • A single API or report that returns “per 30/60 min (or per day), total received + breakdown (actual, follow-up, bot, test, VPN, suspicious, hot)”.
  • Optional: a single, consistent reporting timezone (e.g. IST) for all “day” and “hour” buckets so marketing can compare apples to apples.

3. Optional: explicit “lead type” or reporting view

We don’t need new columns to compute the breakdown: it can be derived from existing tables and columns. If we later want to simplify reporting or improve its performance, we could:

  • Add a reporting view (or materialized aggregation) that precomputes counts by time bucket and type (actual, follow-up, bot, test, VPN, suspicious, hot), or
  • Add a denormalized “lead_type” or “submission_type” on LeadSubmissions / Analytics Engine (e.g. primary | follow_up | test | blocked_bot) so one query can group by type without joining multiple tables.

That’s an optimization, not a requirement for this use case.


4. Analytics Engine as the primary store for lead stats

Yes — we can (and should) use Cloudflare Workers Analytics Engine as the primary store for these stats, so that reporting and dashboards add no load to D1. D1 stays for transactional and audit data; Analytics Engine handles time-bucketed counts and breakdowns.

4.1 What we already stream to Analytics Engine

We already write every accepted lead submission (primary and follow-up) to the lead_submissions dataset from src/lead.ts via writeLeadSubmissionToAnalyticsEngine(). Each data point includes:

| Need | Analytics Engine | Already there? |
|---|---|---|
| Time bucket (hour / day) | timestamp | Yes — use toStartOfHour(timestamp) or toDate(timestamp). |
| 30-min bucket | Expression on timestamp | Yes — e.g. floor to 30 min in SQL. |
| Primary vs follow-up | blob5 ("1" / "0") | Yes. |
| Engagement (Hot / Very Hot) | blob15 | Yes. |
| VPN | double8 | Yes. |
| Suspicious (score) | double2 | Yes — filter e.g. double2 >= 40. |
| Project | blob1 | Yes. |
| Source | blob2 | Yes. |
| Country | blob16 (user country from CF geo) | Yes. |
| City | — | No — not sent to Analytics Engine today (only in D1 / LeadPayload). |
| Referer, CTA, UTM | blob3, blob4, blob10–14 | Yes. |

So time-bucketed volume, breakdown by type, and project-, source-, country-wise (and referer/CTA/UTM) stats can be served from Analytics Engine. City-wise stats would require either adding city to the Analytics Engine write (see §4.6) or querying D1 for city breakdown.
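
As an illustration, an hourly volume query split by primary vs follow-up might look like the sketch below. The endpoint is Cloudflare's Analytics Engine SQL API; accountId and apiToken are placeholders, blob5 is the flag from the table above, and counts use SUM(_sample_interval) because Analytics Engine samples data points:

```ts
// Sketch: hourly submission volume for the last day, split primary vs follow-up.
export async function hourlyVolume(accountId: string, apiToken: string) {
  const query = `
    SELECT toStartOfHour(timestamp) AS hour,
           blob5 AS isPrimary,              -- '1' = primary, '0' = follow-up
           SUM(_sample_interval) AS submissions
    FROM lead_submissions
    WHERE timestamp >= NOW() - INTERVAL '1' DAY
    GROUP BY hour, isPrimary
    ORDER BY hour
    FORMAT JSON`;
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    { method: 'POST', headers: { Authorization: `Bearer ${apiToken}` }, body: query },
  );
  const { data } = (await res.json()) as {
    data: { hour: string; isPrimary: string; submissions: number }[];
  };
  return data;
}
```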

4.2 Gap: test leads and bot/blocked leads

Today we do not write to Analytics Engine when:

  • A test lead is received (whitelisted IP) — we only write to TestingLeads in D1.
  • A blocked IP or bot (honeypot) lead is rejected — we only write to BlockedIPLeads (and block the IP).

So “total received” and “by type: test, bot, blocked” would still require D1 (or a second dataset) unless we add events for these.

4.3 Option A: add test and blocked/bot events to the same dataset

  • Add one extra dimension to the existing lead_submissions schema: an event type (e.g. a new blob such as blob20) with values primary | follow_up | test | blocked | bot.
  • Normal path: Keep current write; set this blob to primary or follow_up (from existing isPrimaryLead).
  • Test-lead path: After writing to TestingLeads, also call the Analytics Engine writer once with minimal fields and event_type = 'test' (project/source/referer from request; rest optional or empty).
  • Blocked / bot path: When we reject (blocked IP or honeypot), after writing to BlockedIPLeads, call the Analytics Engine writer once with event_type = 'blocked' or 'bot' and minimal fields.

Then a single dataset has every “received” event. All reporting queries (time buckets, breakdown by type) run only against Analytics Engine; D1 is not hit for stats. Schema change: one additional blob in the same dataset (Cloudflare creates the schema on first write; new blobs can be added simply by sending more columns in writeDataPoint).
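
A minimal sketch of the extra write on the test / blocked / bot paths. The binding name LEAD_SUBMISSIONS, the helper name, and reserving blob20 for event_type are assumptions to settle at implementation time (§4.6 discusses blob positions):

```ts
import type { AnalyticsEngineDataset } from '@cloudflare/workers-types';

type ReceivedEventType = 'test' | 'blocked' | 'bot';

// Hypothetical helper: one data point per test/rejected "received" event.
export function writeReceivedEvent(
  dataset: AnalyticsEngineDataset, // e.g. env.LEAD_SUBMISSIONS (assumed binding name)
  eventType: ReceivedEventType,
  project: string,
  source = '',
  referer = '',
): void {
  const blobs = new Array<string>(20).fill(''); // mirror the existing blob1..blob19 layout
  blobs[0] = project;    // blob1: project (same position as accepted submissions)
  blobs[1] = source;     // blob2: source
  blobs[2] = referer;    // blob3: referer / page URL
  blobs[19] = eventType; // blob20: the new event_type dimension
  dataset.writeDataPoint({ indexes: [project], blobs, doubles: [] });
}
```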

4.4 Option B: Analytics Engine for accepted only; D1 for test + blocked counts only

  • Use Analytics Engine for all time-bucketed stats for accepted submissions (primary, follow-up, VPN, suspicious, hot).
  • For “total received” and “bot / test” counts, run lightweight D1 queries only: e.g. COUNT(*) from TestingLeads and BlockedIPLeads by time range (with indexes on ReceivedAt). No heavy joins or full-table scans.

This moves all heavy aggregation off D1, which greatly reduces its load, but still uses D1 for the test and blocked/bot counts.
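
Those counts stay cheap if indexed. A sketch, assuming indexes on TestingLeads(ReceivedAt) and BlockedIPLeads(ReceivedAt) exist or are added:

```ts
import type { D1Database } from '@cloudflare/workers-types';

// Hypothetical helper: two indexed COUNT(*) lookups in one D1 batch.
export async function testAndBlockedCounts(db: D1Database, fromUtc: string, toUtc: string) {
  const count = (table: string) =>
    db.prepare(`SELECT COUNT(*) AS n FROM ${table} WHERE ReceivedAt >= ?1 AND ReceivedAt < ?2`)
      .bind(fromUtc, toUtc);
  const [test, blocked] = await db.batch<{ n: number }>([
    count('TestingLeads'),
    count('BlockedIPLeads'),
  ]);
  return { test: test.results[0].n, blocked: blocked.results[0].n };
}
```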

4.5 Recommendation

  • Prefer Option A: Add event_type (or lead_type) and write test + blocked/bot events to the same Analytics Engine dataset. Then all lead-stats reporting (including “in which 30/60 min or day, how many total and how many by type”) is served from Analytics Engine with zero D1 load for this use case.
  • Retention: 3 months in Analytics Engine is acceptable for campaign tuning; D1 keeps full history for audit and compliance.
  • Implementation: When building the report/API, read only from Analytics Engine (and optionally from D1 for test/blocked if Option B is used). No implementation in this planning phase — this section only records the design choice.

4.6 Project-, country-, city-, source-wise (and other) stats

Marketing needs to slice the same time-bucketed counts and type breakdowns by dimension: project, country, city, source, and optionally referer, CTA, UTM campaign/source/medium, engagement label. All of this can be served from Analytics Engine without D1 if we have the dimensions in the dataset.

| Dimension | Analytics Engine today | D1 / payload | For stats from AE only |
|---|---|---|---|
| Project | blob1 | Leads.ProjectName, LeadSubmissions.ProjectName | ✓ Use blob1; filter/group by project. |
| Source | blob2 | Leads.ProjectSource, LeadSubmissions.Source | ✓ Use blob2; filter/group by source. |
| Country | blob16 (user country from CF) | Leads.ContactCountryName, payload user_country | ✓ Use blob16; filter/group by country. |
| City | Not sent | Leads.ContactCity, payload user_city | Add a new blob (e.g. user_city) to the Analytics Engine write so we can group/filter by city without querying D1. |
| Region | Not sent | Leads.ContactRegion, payload user_region | Optional: add if region-wise stats are required; same pattern as city. |
| Referer | blob3 | Leads.Referer, LeadSubmissions.Referer | ✓ Use blob3 (submission page URL; max 500 chars). |
| CTA | blob4 | Form CTA name | ✓ Use blob4. |
| UTM | blob10–14 (source, medium, campaign, term, content) | PayloadJson, form | ✓ Use for campaign/channel breakdown. |
| Engagement label | blob15 | Leads.EngagementLabel | ✓ Use blob15 (Cold / Warm / Hot / Very Hot). |
| Country code (phone) | blob6 | Contact country code | ✓ Use blob6 for phone-region breakdown if needed. |

Recommendation: When implementing the Analytics Engine–backed report, support project, country, source (and referer, CTA, UTM, engagement) from day one — they are already in the dataset. Add city (and optionally region) to the Analytics Engine writer (LeadSubmissionDataPoint + new blob(s)) so city-wise (and region-wise) stats can be computed from Analytics Engine with no D1 load. The payload already has user_city and user_region; we only need to pass them into writeLeadSubmissionToAnalyticsEngine and extend the blob array (e.g. blob20 = city, blob21 = region, or reserve blob20 for event_type and use blob21 = city, blob22 = region).
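
On the query side, a dimension breakdown needs nothing beyond the blobs in the table above (same SQL API call pattern as the sketch in §4.1); for example, 7-day volume by country and engagement label:

```ts
// Sketch query against the lead_submissions dataset via the SQL API.
const countryEngagementQuery = `
  SELECT blob16 AS country,
         blob15 AS engagement,            -- Cold / Warm / Hot / Very Hot
         SUM(_sample_interval) AS submissions
  FROM lead_submissions
  WHERE timestamp >= NOW() - INTERVAL '7' DAY
  GROUP BY country, engagement
  ORDER BY country, submissions DESC
  FORMAT JSON`;
```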


5. Summary: do we maintain such data, and what else might we need?

| Question | Answer |
|---|---|
| Do we store “when” we received leads (30/60 min or day)? | Yes. Leads.EnquiryDateTime (IST), Leads.CreatedAt, LeadSubmissions.SubmittedAt, TestingLeads.ReceivedAt, BlockedIPLeads.ReceivedAt, and Analytics Engine timestamp. |
| Can we get “how many leads in a given time range”? | Yes. From Leads, LeadSubmissions, or Analytics Engine, with filters (e.g. date range, project, source). |
| Do we store “actual vs follow-up vs bot vs test vs VPN vs suspicious vs hot”? | Yes. Actual/follow-up: Leads + LeadSubmissions.IsPrimaryLead. Bot: BlockedIPLeads. Test: TestingLeads. VPN: Leads.VpnSuspected. Suspicious: Leads.SuspicionScore. Hot: Leads.EngagementLabel. |
| What’s missing for the use case? | (1) One place (API or dashboard) that returns time-bucketed volume + breakdown by type + by dimension (project, country, city, source, etc.). (2) Consistent timezone for “day” and “30/60 min” (e.g. IST) across all sources. (3) City (and optionally region) in Analytics Engine for city-wise stats without D1. (4) Optionally, a view or denormalized type for easier reporting. |
| Can we get stats project-, country-, city-, source-wise? | Yes for project, source, country, referer, CTA, UTM — in Analytics Engine (blob1, blob2, blob16, blob3, blob4, blob10–14). City (and optionally region) must be added to the AE write for city-wise stats without D1 (§4.6). |
| Do we need to store anything new? | For D1: no. For Analytics Engine–only reporting: add event_type and test/blocked/bot events; add city (and optionally region) to the AE writer so all dimension breakdowns come from AE (§4, §4.6). |

6. Next steps (planning only)

  1. Define “day” and “time window” for reporting
    Decide: 30 vs 60 min buckets; day in IST vs UTC; and whether all stats (Leads, LeadSubmissions, TestingLeads, BlockedIPLeads) should use the same clock (e.g. all IST or all UTC with display in IST).

  2. Spec one report/API
    Document the desired output, e.g.: “for date D and an optional project/source filter, return for each 60-min bucket (or each day): total submissions, primary count, follow-up count, bot count, test count, and among primary: VPN count, suspicious count (e.g. SuspicionScore ≥ 40), hot count (Hot + Very Hot).” No implementation yet; a possible response shape is sketched after this list.

  3. Use Analytics Engine as the source for this report
    Run all aggregation and filtering for this report against the lead_submissions dataset (SQL API or Worker proxy). If we add event_type and write test + blocked/bot events (§4), no D1 queries are needed for this use case.

  4. Optional: timezone and format
    If we want IST day and IST hour everywhere: either store a dedicated EnquiryDateIST / SubmittedAtIST or document that “day” is computed in app from timestamp using a fixed timezone (e.g. Asia/Kolkata). Analytics Engine SQL can do timezone conversion in queries.

  5. Later (implementation phase)
    Implement the report/API reading from Analytics Engine only; add event_type and test/blocked/bot writes (§4); add city (and optionally region) to the Analytics Engine writer (§4.6) so project-, country-, city-, and source-wise stats are all available from AE.
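
A possible response shape for the report specced in step 2; the field names are placeholders, not an agreed contract:

```ts
// Hypothetical report/API shape; one row per time bucket.
export interface LeadStatsBucket {
  bucketStart: string;      // ISO datetime in the reporting timezone (step 1)
  totalSubmissions: number; // all received events in the bucket
  primary: number;
  followUp: number;
  bot: number;
  test: number;
  // Among primary leads only:
  vpn: number;              // VpnSuspected = 1
  suspicious: number;       // SuspicionScore >= threshold (e.g. 40)
  hot: number;              // EngagementLabel in ('Hot', 'Very Hot')
}

export interface LeadStatsReport {
  date: string;                  // 'YYYY-MM-DD'
  bucketMinutes: 30 | 60 | 1440; // 1440 = day-wise
  filters?: { project?: string; source?: string };
  buckets: LeadStatsBucket[];
}
```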


7. Using AI on this data to optimize team, campaign, and focus area

Once the lead-stats pipeline (time buckets, type breakdown, project/country/city/source dimensions) is in place, AI can use this data to help optimize team capacity, campaign mix, and focus areas. Below are use cases and how they map to the data — planning only; no implementation in this doc.

7.1 What data AI can consume

  • Aggregated stats from the report/API: per time window (e.g. 30/60 min or day), counts by type (actual, follow-up, bot, test, VPN, suspicious, hot) and by dimension (project, country, city, source, referer, CTA, UTM, engagement).
  • Trends over time: day-over-day, week-over-week, hour-of-day patterns.
  • Ratios and quality signals: e.g. actual vs bot rate, hot % by source, suspicion score distribution by country.

AI does not need raw PII or full payloads for optimization; aggregated or anonymized stats are enough and safer.

7.2 Use cases for team optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Staffing / capacity | Lead volume by hour and day, by project/source | Identify peak windows and busy projects; suggest when to have more people on follow-up or when capacity is underused. |
| Focus allocation | Volume and quality (hot %, suspicion %) by project, source, country | Recommend which projects or geos deserve more team attention vs which are low-yield or high-fraud. |
| SLA / response time | Time-bucketed volume + (if added later) first-touch or response timestamps | Flag hours or segments where volume spikes and suggest response-time targets or alerts. |

7.3 Use cases for campaign optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Where to spend | Volume, actual-lead %, hot % by source, UTM (campaign, medium), referer | Compare channels and campaigns; recommend scaling winners (high volume + good quality) and pausing or fixing underperformers (low actual %, high bot/suspicious). |
| When to run ads | Lead volume and quality by hour-of-day and day-of-week | Suggest best time windows (e.g. 30/60 min slots or days) to run paid campaigns so spend aligns with when real leads come in. |
| Geo focus | Volume and quality by country, city (and region if added) | Recommend which countries/cities to target or exclude (e.g. high bot or low conversion); suggest budget shift by geo. |
| Creative / CTA | Volume and quality by CTA, referer (landing page) | Identify which CTAs or pages drive more actual/hot leads; suggest A/B tests or retirement of weak CTAs. |

7.4 Use cases for focus-area optimization

| Use case | Data used | How AI helps |
|---|---|---|
| Priority segments | All dimensions + type breakdown | Rank segments (e.g. project × country × source) by volume and quality; output a short “focus list” (e.g. top 5 projects, top 3 countries) with reasons. |
| Anomaly / alerting | Time-series volume and quality by segment | Detect sudden drops (e.g. no leads in a usually busy hour) or spikes in bot/suspicious % and suggest “investigate this segment” or “check campaign X”. |
| Simple forecasting | Historical volume by time and segment | Rough “expected volume” for next day or week to set team expectations or campaign budgets (no heavy ML needed at first). |

7.5 How to expose data to AI (options)

  • Option A — Pre-built report + copy/paste or export: The lead-stats API/dashboard exports tables or CSV (e.g. by day, by project, by source). Users paste or upload into an AI chat (e.g. Claude, ChatGPT, internal copilot) and ask: “Where should we focus?” or “Which campaigns should we scale?” No integration; manual but quick.
  • Option B — API + AI agent: An AI agent (or MCP tool) calls the lead-stats API with filters (date range, project, source, etc.), receives JSON, and answers natural-language questions or generates a short written summary (e.g. “This week, focus on Project X and Country Y; reduce spend on Source Z due to high bot rate.”). Integrates with your existing AI stack.
  • Option C — Scheduled digest: A job runs daily/weekly, fetches aggregated stats from the API, and sends a structured summary (or a prompt plus data) to an LLM; the LLM returns a “weekly optimization brief” (bullets: team focus, campaign changes, geo recommendations). Delivered via email, Slack, or dashboard.

Choose based on how much automation you want and whether you already have an AI assistant or MCP in the loop.

7.6 What to build first (data side)

For AI to be useful, the data foundation is the same as in §4–§4.6:

  1. Lead-stats API that returns time-bucketed counts + breakdown by type and by dimension (project, country, city, source, referer, CTA, UTM, engagement).
  2. Stable, documented response shape (e.g. JSON with clear field names) so prompts or agents can reliably refer to “by project”, “by country”, “actual_lead_count”, “bot_count”, etc.
  3. Optional: a small “summary” endpoint that returns a fixed set of top-level KPIs (e.g. last 7 days: total leads, actual %, hot %, top 5 projects, top 3 countries) so AI has one call for a quick overview (see the sketch after this list).
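
A possible shape for the optional summary endpoint in item 3 (again placeholders, not a contract):

```ts
// Hypothetical one-call KPI overview for AI consumers.
export interface LeadStatsSummary {
  periodDays: number;       // e.g. 7
  totalLeads: number;
  actualPct: number;        // actual leads / total received
  hotPct: number;           // Hot + Very Hot among primary leads
  topProjects: { project: string; leads: number }[];  // e.g. top 5
  topCountries: { country: string; leads: number }[]; // e.g. top 3
}
```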

No PII or raw events need to be exposed to AI; aggregates are enough for team/campaign/focus optimization and keep compliance simple.


8. References

  • Lead storage: docs/features/lead-capture.md, migrations/001_init.sql, 005_lead_scoring_columns.sql, 006_lead_fraud_detection.sql, 009_lead_submissions_history.sql, 012_lead_status.sql, 026_blocked_ip_leads.sql, 029_testing_leads.sql
  • Payload and analytics: docs/analytics/payloadjson.md, docs/analytics/analytics-engine.md
  • Lead scoring and fraud: docs/features/lead-scoring.md, docs/features/fraud-detection.md
  • Types: src/lead.ts (LeadPayload), src/types.ts (EngagementLabel)