AI-Powered Moderation Reporting System — Phased Roadmap
This roadmap plans the build of an AI-powered moderation reporting system using Cloudflare Workers AI and the existing lead_submissions Analytics Engine dataset. The system will produce daily, human-readable traffic quality reports for marketing stakeholders.
Data source: lead_submissions (3-month retention).
AI model: @cf/meta/llama-3-8b-instruct.
Constraints: Aggregated data only (no raw rows to AI), response time < 2 seconds.
Current State (What Exists)
- Analytics Engine: LEAD_ANALYTICS binding → lead_submissions; SQL via lib/analyticsEngineSql.ts (requires CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_ANALYTICS_ENGINE_API_TOKEN).
- Moderation queries: src/tracking/services/leadModerationQueries.ts covers traffic quality, high bot by source/campaign, suspicion by source, suspicious referers, country risk, visitor abuse, block candidates, aggregations, dashboard summary.
- API: Moderation endpoints under /analytics/moderation/* (e.g. traffic-quality, high-bot-by-source, visitor-abuse, block-candidates).
- No Workers AI binding in wrangler.toml or Env yet.
Target Architecture (High Level)
┌─────────────────────────────────────────────────────────────────────────┐
│ Triggers: Cron (daily) | API GET /analytics/moderation/ai-report │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Moderation Report Flow │
│ 1. Build 24h summary (totals + signals) from Analytics Engine │
│ 2. Send summary JSON to Worker AI (Llama 3 8B) │
│ 3. Parse structured AI output (summary, issues, actions) │
│ 4. Return JSON and/or send to Slack / email │
└─────────────────────────────────────────────────────────────────────────┘
Code layout (proposed):
- src/analytics/moderationQueries.ts: 24h aggregation + structured summary builder (composes existing leadModerationQueries where possible).
- src/ai/moderationPrompt.ts: system and user prompt templates.
- src/ai/aiClient.ts: Worker AI client (run model, parse JSON).
- src/worker/moderationReportWorker.ts: orchestration (fetch summary → AI → format response); used by route and cron.
Phase 1: Analytics — 24h Aggregation & Structured Summary
Goal: One function that returns the exact JSON shape needed for the AI (period, totals, suspicious_sources, suspicious_campaigns, suspicious_referers, suspicious_countries, high_risk_visitors). No raw rows.
1.1 Totals for last 24 hours
- Reuse: getTrafficQualityMetrics(env, { days: 1 }) for most totals.
- Gap: Add blocked_submissions if not already in schema (e.g. count where event_type / blob20 = 'blocked'). If the dataset already records blocked events in event_type, add a small query or extend the existing traffic-quality SQL to include the blocked count.
- Output shape: totals: { total_submissions, real_leads, bot_submissions, blocked_submissions, duplicate_rate, average_suspicion_score, average_engagement_score }.
1.2 Moderation signals (top N, 24h)
- Reuse with days: 1:
  - Top suspicious sources → getHighBotRateBySource and/or getHighSuspicionBySource (e.g. top 10 of each, merge/dedupe by source).
  - Top suspicious campaigns → getHighBotRateByCampaign (top 10).
  - Top suspicious referers → getSuspiciousReferers (top 10).
  - Countries with high bot rates → getCountryRisk (top 10).
  - Visitors with repeated submissions → getRepeatSubmissionsByVisitor (e.g. min 2, top 20).
  - Visitors with high suspicion → part of getVisitorAbuseCandidates or getAggregationByVisitor filtered/sorted.
  - Visitors with high VPN → getVisitorAbuseCandidates or filter by vpn_score.
- New module: src/analytics/moderationQueries.ts (or moderationSummary.ts) that:
  - Accepts Env and optional period: 'last_24_hours' | 'last_7_days'.
  - Calls the above with days: 1 (or 7) and optional min_submissions to avoid noise.
  - Maps results into a single ModerationSummary type (see below).
- Important: Only aggregated, small result sets (e.g. top 10–20 per category). No raw submission rows.
1.3 Structured summary type
Define in TypeScript and document in specs:
interface ModerationSummary {
  period: 'last_24_hours' | 'last_7_days';
  totals: {
    total_submissions: number;
    real_leads: number;
    bot_submissions: number;
    blocked_submissions: number;
    duplicate_rate: number;
    average_suspicion_score: number;
    average_engagement_score: number;
  };
  suspicious_sources: Array<{ source: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_campaigns: Array<{ utm_campaign: string; utm_source: string; submissions: number; bot_rate_pct?: number }>;
  suspicious_referers: Array<{ referer: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_countries: Array<{ country: string; submissions: number; bot_rate_pct?: number }>;
  high_risk_visitors: Array<{
    visitor_id: string;
    submissions: number;
    avg_suspicion?: number;
    avg_vpn_score?: number;
    form_submit_count_24h?: number;
  }>;
}
1.4 Performance
- Run independent queries in parallel (Promise.all) where possible.
- Limit result sizes (e.g. LIMIT 10 or 20 per query).
- If needed, add a single “dashboard-style” Analytics Engine query that returns one row per dimension (e.g. with GROUP BY) to reduce round-trips; otherwise keep composing existing functions.
Deliverables:
- src/analytics/moderationQueries.ts (or similarly named) with getModerationSummary(env, period).
- Types for ModerationSummary.
- Unit tests with mocked queryAnalyticsEngine / Env.
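The parallel composition and merge/dedupe step described in Phase 1 could look roughly like the sketch below. It is illustrative only: SummaryQueries, getTotals, PartialSummary, and mergeBySource are hypothetical names, the query signatures are assumed rather than taken from leadModerationQueries.ts, and only the sources category is shown.

```typescript
type Period = 'last_24_hours' | 'last_7_days';

interface SourceRow {
  source: string;
  submissions: number;
  bot_rate_pct?: number;
  avg_suspicion?: number;
}

// Hypothetical stand-ins for the existing query functions (signatures assumed).
interface SummaryQueries {
  getTotals(days: number): Promise<Record<string, number>>;
  getHighBotRateBySource(days: number, limit: number): Promise<SourceRow[]>;
  getHighSuspicionBySource(days: number, limit: number): Promise<SourceRow[]>;
}

// Reduced version of ModerationSummary, for illustration only.
interface PartialSummary {
  period: Period;
  totals: Record<string, number>;
  suspicious_sources: SourceRow[];
}

// Merge two top-N lists into one row per source, combining their metrics.
// When both lists carry the same field, the value from the second list wins.
function mergeBySource(a: SourceRow[], b: SourceRow[]): SourceRow[] {
  const merged = new Map<string, SourceRow>();
  for (const row of a.concat(b)) {
    merged.set(row.source, { ...merged.get(row.source), ...row });
  }
  return Array.from(merged.values());
}

async function getModerationSummary(q: SummaryQueries, period: Period): Promise<PartialSummary> {
  const days = period === 'last_7_days' ? 7 : 1;
  // The queries are independent, so they run concurrently.
  const [totals, byBotRate, bySuspicion] = await Promise.all([
    q.getTotals(days),
    q.getHighBotRateBySource(days, 10),
    q.getHighSuspicionBySource(days, 10),
  ]);
  return { period, totals, suspicious_sources: mergeBySource(byBotRate, bySuspicion) };
}
```

Passing the queries in as an object keeps the sketch testable with plain stubs; the real module would presumably take Env and call the existing functions directly.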
Phase 2: Worker AI — Binding, Prompt, Client
Goal: Worker can call Llama 3 8B with the moderation summary and receive a structured JSON report.
2.1 Wrangler and Env
- Add Workers AI binding in wrangler.toml (e.g. [ai] or binding name AI). Ref: Cloudflare Workers AI.
- Extend Env in src/types.ts with optional AI: Ai (or the correct type from @cloudflare/workers-types).
2.2 Prompt design — src/ai/moderationPrompt.ts
- System prompt: Instruct the model to act as a traffic quality analyst. Output only valid JSON. No raw submission data in the prompt; only the aggregated summary is provided.
- User prompt: Include the serialized ModerationSummary (e.g. JSON.stringify(summary)) and ask for a report in the exact schema:
  - Executive summary (short paragraph).
  - Key issues (list).
  - Campaign quality issues (list).
  - Suspicious visitor patterns (list).
  - Recommended actions (list).
  - Suggested blocking actions (list).
- Output schema (enforced in prompt):
{
  "summary": "",
  "issues": [],
  "campaign_problems": [],
  "visitor_abuse": [],
  "recommended_actions": [],
  "blocking_recommendations": []
}
- Best practices: Clear instructions that input is aggregated only; response must be valid JSON; tone suitable for marketing stakeholders.
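A minimal sketch of the prompt builders follows, assuming a chat-style messages array with system and user roles. The prompt wording is illustrative, and REPORT_SCHEMA and buildMessages are hypothetical names; only the field names come from the schema above.

```typescript
// Field names follow the output schema listed above.
const REPORT_SCHEMA = {
  summary: '',
  issues: [],
  campaign_problems: [],
  visitor_abuse: [],
  recommended_actions: [],
  blocking_recommendations: [],
};

function buildSystemPrompt(): string {
  return [
    'You are a traffic quality analyst writing for marketing stakeholders.',
    'The input is an aggregated summary only; it contains no raw submission rows.',
    'Respond with a single valid JSON object and nothing else.',
    `Use exactly this shape: ${JSON.stringify(REPORT_SCHEMA)}`,
  ].join('\n');
}

// The summary is serialized as-is; callers must pass aggregated data only.
function buildUserPrompt(summary: unknown): string {
  return `Moderation summary (JSON):\n${JSON.stringify(summary)}\n\nWrite the report.`;
}

// Hypothetical helper that assembles the chat messages for the model call.
function buildMessages(summary: unknown): Array<{ role: string; content: string }> {
  return [
    { role: 'system', content: buildSystemPrompt() },
    { role: 'user', content: buildUserPrompt(summary) },
  ];
}
```

Embedding the schema via JSON.stringify keeps the prompt and the validation code pointing at one definition, so a renamed field cannot drift between them.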
2.3 AI client — src/ai/aiClient.ts
- generateModerationReport(env, summary: ModerationSummary): Promise<StructuredAiReport>.
- Call env.AI.run('@cf/meta/llama-3-8b-instruct', { messages: [...] }) (or equivalent API).
- Parse the model output as JSON. If the model returns markdown code blocks, strip them before parsing.
- Validate shape (e.g. with a type guard or Zod) and map to StructuredAiReport.
- Handle errors (model unavailable, invalid JSON) with clear errors and optional fallback (e.g. return summary without AI text).
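The parse-and-validate step might be sketched as below, using a hand-written type guard rather than Zod. The helper names (stripCodeFences, readList, parseAiReport) are hypothetical, and the sketch assumes the model's text output has already been extracted from the AI response as a string.

```typescript
interface StructuredAiReport {
  summary: string;
  issues: string[];
  campaign_problems: string[];
  visitor_abuse: string[];
  recommended_actions: string[];
  blocking_recommendations: string[];
}

// Triple-backtick fence marker, built at runtime to keep this listing readable.
const FENCE = '`'.repeat(3);

// Remove a leading fence (with optional language tag) and a trailing fence, if present.
function stripCodeFences(raw: string): string {
  return raw
    .trim()
    .replace(new RegExp('^' + FENCE + '[a-zA-Z]*\\s*'), '')
    .replace(new RegExp('\\s*' + FENCE + '$'), '')
    .trim();
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Read one list field, failing with a clear message when the shape is wrong.
function readList(obj: Record<string, unknown>, field: string): string[] {
  const value = obj[field];
  if (!isStringArray(value)) throw new Error(`AI output field "${field}" must be a string array`);
  return value;
}

function parseAiReport(raw: string): StructuredAiReport {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripCodeFences(raw));
  } catch {
    throw new Error('AI output is not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null) throw new Error('AI output is not a JSON object');
  const obj = parsed as Record<string, unknown>;
  const summary = obj.summary;
  if (typeof summary !== 'string') throw new Error('AI output field "summary" must be a string');
  return {
    summary,
    issues: readList(obj, 'issues'),
    campaign_problems: readList(obj, 'campaign_problems'),
    visitor_abuse: readList(obj, 'visitor_abuse'),
    recommended_actions: readList(obj, 'recommended_actions'),
    blocking_recommendations: readList(obj, 'blocking_recommendations'),
  };
}
```

Building the result field by field drops any extra keys the model adds, so downstream code only ever sees the documented shape.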
2.4 Types
- StructuredAiReport: { summary, issues, campaign_problems, visitor_abuse, recommended_actions, blocking_recommendations } (summary is a string; the other fields are string arrays, per the schema above).
Deliverables:
- src/ai/moderationPrompt.ts (system + user prompt builders).
- src/ai/aiClient.ts (run model, parse and validate JSON).
- StructuredAiReport type and optional schema validation.
- Tests with mocked AI binding.
Phase 3: Report Worker & API
Goal: A single orchestration path: fetch 24h summary → generate AI report → return (and optionally send to Slack/email).
3.1 Orchestration — src/worker/moderationReportWorker.ts
- getModerationReport(env, options?: { period?, sendToSlack?, skipAi? }): Promise<ModerationReportResponse>.
- Steps:
  - Call getModerationSummary(env, period) (Phase 1).
  - If skipAi or AI binding missing, return summary + empty AI fields or a placeholder message.
  - Otherwise call generateModerationReport(env, summary) (Phase 2).
  - Build response: { period, summary: ModerationSummary, ai_report: StructuredAiReport, generated_at }.
  - If sendToSlack and webhook configured, send a condensed version (e.g. summary + top issues + top blocking recommendations) to Slack.
- Keep the function free of HTTP concerns; the route/cron will call it and set sendToSlack based on context.
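The steps above can be sketched as follows. This is an assumption-laden outline, not the planned signature: dependencies are injected through a hypothetical ReportDeps object instead of Env, the summary and AI report types are reduced stand-ins, and ai_report is null when the AI step is skipped or fails.

```typescript
type ReportPeriod = 'last_24_hours' | 'last_7_days';

// Reduced stand-ins for ModerationSummary and StructuredAiReport.
interface SummaryLike { period: ReportPeriod }
interface AiReportLike { summary: string; issues: string[] }

interface ModerationReportResponse {
  period: ReportPeriod;
  summary: SummaryLike;
  ai_report: AiReportLike | null;
  generated_at: string;
}

// Hypothetical dependency bundle; optional members model a missing binding or webhook.
interface ReportDeps {
  getSummary(period: ReportPeriod): Promise<SummaryLike>;
  generateAiReport?: (summary: SummaryLike) => Promise<AiReportLike>;
  sendToSlack?: (report: ModerationReportResponse) => Promise<void>;
}

interface ReportOptions { period?: ReportPeriod; sendToSlack?: boolean; skipAi?: boolean }

async function getModerationReport(
  deps: ReportDeps,
  options: ReportOptions = {},
): Promise<ModerationReportResponse> {
  const period = options.period ?? 'last_24_hours';
  const summary = await deps.getSummary(period);

  // Fall back to a summary-only report when AI is skipped, missing, or failing.
  let aiReport: AiReportLike | null = null;
  if (!options.skipAi && deps.generateAiReport) {
    try {
      aiReport = await deps.generateAiReport(summary);
    } catch (err) {
      console.error('AI report failed; returning summary only', err);
    }
  }

  const report: ModerationReportResponse = {
    period,
    summary,
    ai_report: aiReport,
    generated_at: new Date().toISOString(),
  };

  // Slack delivery is best-effort: a failure is logged and never fails the report.
  if (options.sendToSlack && deps.sendToSlack) {
    try {
      await deps.sendToSlack(report);
    } catch (err) {
      console.error('Slack delivery failed', err);
    }
  }
  return report;
}
```

A summary query failure is deliberately left to propagate, since the route needs it to return an error status.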
3.2 API endpoint
- GET (or POST) /analytics/moderation/ai-report (or /analytics/moderation/report).
- Query params: e.g. period=last_24_hours, slack=false.
- Auth: reuse existing analytics auth (e.g. ANALYTICS_API_KEY or dashboard auth).
- Response: JSON with summary, ai_report, generated_at. Status 200; on failure (e.g. Analytics Engine down) return 503 or 500 with error message.
- Performance: Ensure the whole flow (queries + AI) stays under 2 seconds where possible; document that AI may add ~1–2 s.
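As a rough illustration of the endpoint contract, the sketch below parses the query parameters and maps success and failure to status codes. parseReportQuery, handleAiReport, and HttpResult are hypothetical names; auth is omitted, unknown period values are assumed to fall back to last_24_hours, and the plain { status, body } result would still need wrapping in a Response by the actual route.

```typescript
type QueryPeriod = 'last_24_hours' | 'last_7_days';

interface ReportQuery { period: QueryPeriod; sendToSlack: boolean }
interface HttpResult { status: number; body: unknown }

// Read period and slack from the query string; anything unrecognized uses the defaults.
function parseReportQuery(rawUrl: string): ReportQuery {
  const params = new URL(rawUrl).searchParams;
  const period: QueryPeriod = params.get('period') === 'last_7_days' ? 'last_7_days' : 'last_24_hours';
  return { period, sendToSlack: params.get('slack') === 'true' };
}

// getReport stands in for the orchestration function; any thrown error becomes a 503.
async function handleAiReport(
  rawUrl: string,
  getReport: (query: ReportQuery) => Promise<unknown>,
): Promise<HttpResult> {
  try {
    return { status: 200, body: await getReport(parseReportQuery(rawUrl)) };
  } catch (err) {
    const message = err instanceof Error ? err.message : 'unknown error';
    return { status: 503, body: { error: message } };
  }
}
```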
3.3 Routing
- In src/index.ts (or existing analytics router), add a route that calls getModerationReport and returns the JSON. No raw submission data in the response; only aggregated summary + AI report.
Deliverables:
- src/worker/moderationReportWorker.ts with getModerationReport.
- Route and auth for GET /analytics/moderation/ai-report.
- Integration test (or manual test) that hits the endpoint and receives valid JSON.
Phase 4: Automation — Slack, Email, Cron
Goal: Optional delivery of the report to Slack and email; daily run via cron.
4.1 Slack
- Config: Optional env/secret e.g. MODERATION_SLACK_WEBHOOK_URL. If absent, skip Slack.
- Payload: Short, readable message: period, 2–3 headline numbers (e.g. total submissions, bot rate, real lead rate), link to dashboard or report URL, and top 3–5 “blocking recommendations” or “issues” from the AI report.
- Implementation: In moderationReportWorker or a small src/worker/slackNotifier.ts, POST to the webhook. No retry required for MVP; log failures.
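One possible shape for the Slack notifier is sketched below. It assumes a Slack incoming webhook, which accepts a JSON body with a text field. SlackReportInput and its flattened fields are hypothetical, chosen to keep the example small; the real function would read them from the report response.

```typescript
// Hypothetical flattened input; the real code would derive this from the report.
interface SlackReportInput {
  period: string;
  total_submissions: number;
  bot_submissions: number;
  real_leads: number;
  blocking_recommendations: string[];
  report_url?: string;
}

function pct(part: number, whole: number): string {
  return whole > 0 ? `${((part / whole) * 100).toFixed(1)}%` : 'n/a';
}

// Build a short plain-text message: headline numbers, top recommendations, optional link.
function buildSlackMessage(r: SlackReportInput, maxItems = 5): { text: string } {
  const lines = [
    `Traffic quality report (${r.period})`,
    `Submissions: ${r.total_submissions} | Bot rate: ${pct(r.bot_submissions, r.total_submissions)} | ` +
      `Real lead rate: ${pct(r.real_leads, r.total_submissions)}`,
  ];
  const topItems = r.blocking_recommendations.slice(0, maxItems);
  if (topItems.length > 0) {
    lines.push('Top blocking recommendations:');
    for (const item of topItems) lines.push(`• ${item}`);
  }
  if (r.report_url) lines.push(`Full report: ${r.report_url}`);
  return { text: lines.join('\n') };
}

// Returns false when Slack is not configured or the POST fails; no retry (MVP).
async function sendReportToSlack(webhookUrl: string | undefined, r: SlackReportInput): Promise<boolean> {
  if (!webhookUrl) return false; // feature is off when the secret is not set
  try {
    const res = await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(buildSlackMessage(r)),
    });
    if (!res.ok) console.error('Slack webhook returned status', res.status);
    return res.ok;
  } catch (err) {
    console.error('Slack webhook request failed', err);
    return false;
  }
}
```

Keeping the message builder separate from the network call means the payload can be unit-tested without mocking fetch.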
4.2 Email (daily digest)
- Option A: Use a third-party (e.g. Resend, SendGrid) with an API key in secrets; worker sends one email per day with the report body (summary + AI sections).
- Option B: Use Cloudflare Email Workers / Email Routing if available and sufficient.
- Config: e.g. MODERATION_EMAIL_ENABLED, MODERATION_EMAIL_TO, and API key for the provider.
- Content: HTML or plain text: executive summary, key issues, recommended actions, blocking recommendations. Optionally attach or link to the JSON report.
4.3 Cron
- Add a cron trigger (e.g. daily at 09:00 UTC) in wrangler.toml and a scheduled handler that invokes the same logic as the report worker: build summary → AI report → send to Slack (if configured) and/or email (if configured). The run can be fire-and-forget; log success/failure.
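A scheduled handler along these lines might look like the sketch below. The event and context interfaces are minimal local stand-ins for the Workers runtime types (normally taken from @cloudflare/workers-types), and createScheduledHandler and RunDailyReport are hypothetical names. The returned object is what the Worker's default export would expose.

```typescript
// Assumed cron trigger in wrangler.toml for a daily 09:00 UTC run:
//   [triggers]
//   crons = ["0 9 * * *"]

// Minimal local stand-ins for the Workers runtime types.
interface ScheduledEventLike { cron: string; scheduledTime: number }
interface ExecutionContextLike { waitUntil(promise: Promise<unknown>): void }

// Hypothetical: builds the summary, generates the AI report, and delivers it.
type RunDailyReport = () => Promise<void>;

function createScheduledHandler(runDailyReport: RunDailyReport) {
  return {
    async scheduled(event: ScheduledEventLike, _env: unknown, ctx: ExecutionContextLike): Promise<void> {
      // waitUntil keeps the Worker alive until the report run settles.
      ctx.waitUntil(
        runDailyReport()
          .then(() => console.log(`Moderation report completed (cron ${event.cron})`))
          .catch((err) => console.error('Moderation report failed', err)),
      );
    },
  };
}
```

The trailing catch means a failed run is logged rather than surfacing as an unhandled rejection, which matches the fire-and-forget intent.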
Deliverables:
- Slack notification with report snippet when webhook URL is set.
- Optional email sending (document provider and env vars).
- Cron entry and handler; docs for enabling Slack/email.
Phase 5: Documentation, Example Output, and Polish
Goal: Docs and an example report so stakeholders and future devs understand the system.
5.1 Documentation
- docs/analytics/traffic-moderation.md: Add a section “AI moderation report” describing: how to trigger (API vs cron), what the summary contains, what the AI adds, and how to interpret the JSON.
- docs/analytics/ai-moderation-report.md (new):
  - Data flow (Analytics Engine → summary → Worker AI → structured output).
  - API: GET /analytics/moderation/ai-report, params, auth, response shape.
  - Automation: Slack webhook, email, cron schedule.
  - Env vars and secrets (AI binding, Slack, email, Analytics Engine).
- ARCHITECTURE.md: Add src/analytics, src/ai, src/worker and the report flow.
5.2 Example output
- Add an example moderation report (JSON) in docs, e.g. docs/analytics/ai-moderation-report-example.json, with sample ModerationSummary and StructuredAiReport so stakeholders see the exact format and tone.
5.3 Performance and safety checklist
- Confirm only aggregated data is sent to the AI (no raw lead_submissions rows).
- Confirm response time is measured (target < 2 s for summary + AI); document if AI pushes it over.
- Add a short “Operational runbook” note: what to do if the report fails (check Analytics Engine token, AI binding, Slack/email config).
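For the response-time item, a small timing wrapper is one way to make the 2 s target measurable. The sketch below is an assumption about how measurement could be done; timed, exceedsTarget, and TARGET_MS are hypothetical names, and wall-clock time from Date.now() is used for simplicity.

```typescript
const TARGET_MS = 2000; // the < 2 s target from the constraints

// Run an async step and report how long it took in milliseconds.
async function timed<T>(work: () => Promise<T>): Promise<{ result: T; elapsedMs: number }> {
  const start = Date.now();
  const result = await work();
  return { result, elapsedMs: Date.now() - start };
}

function exceedsTarget(elapsedMs: number, targetMs: number = TARGET_MS): boolean {
  return elapsedMs > targetMs;
}
```

Wrapping the summary step and the AI step separately would show which of the two pushes a slow report over the target.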
Deliverables:
- Updated traffic-moderation doc and new ai-moderation-report doc.
- Example JSON report.
- ARCHITECTURE.md update and runbook note.
Phase Summary Table
| Phase | Focus | Key deliverables |
|---|---|---|
| 1 | Analytics — 24h aggregation & structured summary | getModerationSummary(env, period), ModerationSummary type, no raw rows |
| 2 | Worker AI — binding, prompt, client | AI binding, moderationPrompt.ts, aiClient.ts, StructuredAiReport |
| 3 | Report worker & API | getModerationReport(), GET /analytics/moderation/ai-report |
| 4 | Automation | Slack webhook, optional email, daily cron |
| 5 | Docs & example | traffic-moderation + ai-moderation-report docs, example JSON, runbook |
Dependency Order
- Phase 2 depends on Phase 1 (summary type and data).
- Phase 3 depends on Phase 1 and 2.
- Phase 4 depends on Phase 3.
- Phase 5 can be done in parallel with 3–4 and finalized after 4.
Suggested File Layout (Final)
src/
├── analytics/
│   └── moderationQueries.ts           # getModerationSummary(env, period) → ModerationSummary
├── ai/
│   ├── moderationPrompt.ts            # buildSystemPrompt(), buildUserPrompt(summary)
│   └── aiClient.ts                    # generateModerationReport(env, summary) → StructuredAiReport
├── worker/
│   ├── moderationReportWorker.ts      # getModerationReport(env, options)
│   └── slackNotifier.ts               # (optional) sendReportToSlack(env, report)
├── tracking/
│   └── services/
│       └── leadModerationQueries.ts   # existing; used by analytics/moderationQueries
└── ...
Existing leadModerationQueries.ts stays the single place for Analytics Engine SQL; src/analytics/moderationQueries.ts composes it and builds the summary only.
Risk and Mitigations
| Risk | Mitigation |
|---|---|
| AI model returns non-JSON | Strip markdown code fences; retry with “output only JSON”; fallback to summary-only response. |
| Analytics Engine slow | Use parallel queries; limit result sizes; consider caching summary for 5–10 min if needed. |
| Report timeout > 2 s | Document; consider making AI optional (e.g. ?ai=false) for fast summary-only. |
| Slack/email secrets | Document required vars; no defaults; feature off when not set. |
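The first mitigation (retry, then fall back to a summary-only response) could be expressed with a small generic helper. This is a sketch under assumptions: withRetryAndFallback is a hypothetical name, and two attempts is an arbitrary default rather than a figure from this roadmap.

```typescript
// Try an async step up to maxAttempts times, then return a fallback value.
// The attempt number is passed in so a retry can, for example, use a stricter prompt.
async function withRetryAndFallback<T>(
  attempt: (attemptNumber: number) => Promise<T>,
  fallback: () => T,
  maxAttempts = 2,
): Promise<T> {
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return await attempt(n);
    } catch (err) {
      console.error(`Attempt ${n} of ${maxAttempts} failed`, err);
    }
  }
  return fallback();
}
```

In this setup the attempt callback would call the model and parse its output, adding an "output only JSON" reminder when the attempt number is greater than 1, while the fallback returns the report without AI fields. Each retry adds model latency, so it works against the 2 s target.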
Next Steps
- Kick off Phase 1: Implement getModerationSummary and ModerationSummary type; add blocked count to totals if needed.
- Then Phase 2: Add AI binding, prompts, and client; validate with a manual test.
- Then Phase 3: Wire route and orchestration; verify end-to-end under 2 s where feasible.
- Then Phase 4 and 5: Add Slack/email/cron and complete documentation and example output.
This roadmap is ready for team review and phase-by-phase implementation.