
AI-Powered Moderation Reporting System — Phased Roadmap

This roadmap plans the build of an AI-powered moderation reporting system using Cloudflare Workers AI and the existing lead_submissions Analytics Engine dataset. The system will produce daily, human-readable traffic quality reports for marketing stakeholders.

Data source: lead_submissions (3-month retention).
AI model: @cf/meta/llama-3-8b-instruct.
Constraints: Aggregated data only (no raw rows to AI), response time < 2 seconds.


Current State (What Exists)

  • Analytics Engine: LEAD_ANALYTICS binding → lead_submissions; SQL via lib/analyticsEngineSql.ts (requires CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_ANALYTICS_ENGINE_API_TOKEN).
  • Moderation queries: src/tracking/services/leadModerationQueries.ts — traffic quality, high bot by source/campaign, suspicion by source, suspicious referers, country risk, visitor abuse, block candidates, aggregations, dashboard summary.
  • API: Moderation endpoints under /analytics/moderation/* (e.g. traffic-quality, high-bot-by-source, visitor-abuse, block-candidates).
  • No Workers AI binding exists in wrangler.toml or Env yet.

Target Architecture (High Level)

```
┌─────────────────────────────────────────────────────────────────────────┐
│  Triggers: Cron (daily) | API GET /analytics/moderation/ai-report       │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Moderation Report Flow                                                 │
│  1. Build 24h summary (totals + signals) from Analytics Engine          │
│  2. Send summary JSON to Workers AI (Llama 3 8B)                        │
│  3. Parse structured AI output (summary, issues, actions)               │
│  4. Return JSON and/or send to Slack / email                            │
└─────────────────────────────────────────────────────────────────────────┘
```

Code layout (proposed):

  • src/analytics/moderationQueries.ts — 24h aggregation + structured summary builder (composes existing leadModerationQueries where possible).
  • src/ai/moderationPrompt.ts — System and user prompt templates.
  • src/ai/aiClient.ts — Workers AI client (run model, parse JSON).
  • src/worker/moderationReportWorker.ts — Orchestration: fetch summary → AI → format response; used by route and cron.

Phase 1: Analytics — 24h Aggregation & Structured Summary

Goal: One function that returns the exact JSON shape needed for the AI (period, totals, suspicious_sources, suspicious_campaigns, suspicious_referers, suspicious_countries, high_risk_visitors). No raw rows.

1.1 Totals for last 24 hours

  • Reuse: getTrafficQualityMetrics(env, { days: 1 }) for most totals.
  • Gap: Add blocked_submissions if the traffic-quality metrics do not already return it (e.g. count rows where event_type / blob20 = 'blocked'). If blocked is already recorded in event_type, a small extra query or an extension of the existing traffic-quality SQL is enough to include the blocked count.
  • Output shape:
    totals: { total_submissions, real_leads, bot_submissions, blocked_submissions, duplicate_rate, average_suspicion_score, average_engagement_score }.
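A minimal sketch of the blocked-count query, assuming (as the Gap note suggests) that event_type is stored in blob20 and that counts are weighted by Analytics Engine's `_sample_interval` column. The helper name `buildBlockedCountSql` is hypothetical; the resulting string would go through the existing SQL helper in lib/analyticsEngineSql.ts.

```ts
// Hypothetical helper: builds the Analytics Engine SQL for the blocked count.
// Assumes event_type lives in blob20 and the dataset is named lead_submissions.
function buildBlockedCountSql(days: number): string {
  // Only whole days inside the ~3-month retention window are accepted,
  // which also keeps arbitrary text out of the INTERVAL clause.
  if (!Number.isInteger(days) || days < 1 || days > 90) {
    throw new RangeError(`days must be an integer between 1 and 90, got ${days}`);
  }
  return [
    // SUM(_sample_interval) counts events correctly even when rows are sampled.
    "SELECT SUM(_sample_interval) AS blocked_submissions",
    "FROM lead_submissions",
    `WHERE timestamp > NOW() - INTERVAL '${days}' DAY`,
    "  AND blob20 = 'blocked'",
  ].join("\n");
}
```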

1.2 Moderation signals (top N, 24h)

  • Reuse with days: 1:
    • Top suspicious sources → getHighBotRateBySource and/or getHighSuspicionBySource (e.g. top 10 of each, merge/dedupe by source).
    • Top suspicious campaigns → getHighBotRateByCampaign (top 10).
    • Top suspicious referers → getSuspiciousReferers (top 10).
    • Countries with high bot rates → getCountryRisk (top 10).
    • Visitors with repeated submissions → getRepeatSubmissionsByVisitor (e.g. min 2, top 20).
    • Visitors with high suspicion → getVisitorAbuseCandidates, or getAggregationByVisitor filtered and sorted by suspicion score.
    • Visitors with high VPN scores → getVisitorAbuseCandidates, or filter by vpn_score.
  • New module: src/analytics/moderationQueries.ts (or moderationSummary.ts) that:
    • Accepts Env and optional period: 'last_24_hours' | 'last_7_days'.
    • Calls the above with days: 1 (or 7) and optional min_submissions to avoid noise.
    • Maps results into a single ModerationSummary type (see below).
  • Important: Only aggregated, small result sets (e.g. top 10–20 per category). No raw submission rows.
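The merge/dedupe step for suspicious sources could look like the sketch below. The row shapes `BotRateRow` and `SuspicionRow` are assumptions about what getHighBotRateBySource and getHighSuspicionBySource return. Adjust the field names to match the real query results. `mergeSuspiciousSources` is a hypothetical name.

```ts
// Assumed result shapes of the existing queries (not confirmed against the source).
interface BotRateRow { source: string; submissions: number; bot_rate_pct: number }
interface SuspicionRow { source: string; submissions: number; avg_suspicion: number }

// Matches the suspicious_sources entry in ModerationSummary.
interface SuspiciousSource {
  source: string;
  submissions: number;
  bot_rate_pct?: number;
  avg_suspicion?: number;
}

// Merges both top-N lists into one entry per source, then keeps the `limit`
// sources with the most submissions (ranking by volume is one reasonable choice).
function mergeSuspiciousSources(
  byBotRate: BotRateRow[],
  bySuspicion: SuspicionRow[],
  limit = 10,
): SuspiciousSource[] {
  const merged = new Map<string, SuspiciousSource>();
  for (const row of byBotRate) {
    merged.set(row.source, { source: row.source, submissions: row.submissions, bot_rate_pct: row.bot_rate_pct });
  }
  for (const row of bySuspicion) {
    const existing = merged.get(row.source);
    if (existing) {
      existing.avg_suspicion = row.avg_suspicion;
      // Both queries cover the same window, so counts should agree; max() absorbs small differences.
      existing.submissions = Math.max(existing.submissions, row.submissions);
    } else {
      merged.set(row.source, { source: row.source, submissions: row.submissions, avg_suspicion: row.avg_suspicion });
    }
  }
  return Array.from(merged.values())
    .sort((a, b) => b.submissions - a.submissions)
    .slice(0, limit);
}
```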

1.3 Structured summary type

Define in TypeScript and document in specs:

```ts
interface ModerationSummary {
  period: 'last_24_hours' | 'last_7_days';
  totals: {
    total_submissions: number;
    real_leads: number;
    bot_submissions: number;
    blocked_submissions: number;
    duplicate_rate: number;
    average_suspicion_score: number;
    average_engagement_score: number;
  };
  suspicious_sources: Array<{ source: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_campaigns: Array<{ utm_campaign: string; utm_source: string; submissions: number; bot_rate_pct?: number }>;
  suspicious_referers: Array<{ referer: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_countries: Array<{ country: string; submissions: number; bot_rate_pct?: number }>;
  high_risk_visitors: Array<{
    visitor_id: string;
    submissions: number;
    avg_suspicion?: number;
    avg_vpn_score?: number;
    form_submit_count_24h?: number;
  }>;
}
```

1.4 Performance

  • Run independent queries in parallel (Promise.all) where possible.
  • Limit result sizes (e.g. LIMIT 10 or 20 per query).
  • If needed, add a single “dashboard-style” Analytics Engine query that returns one row per dimension (e.g. with GROUP BY) to reduce round-trips; otherwise keep composing existing functions.
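The following sketches the parallel composition behind getModerationSummary. The query functions are injected through a hypothetical `SummaryQueries` interface. Its methods are simplified stand-ins for the existing leadModerationQueries functions, and injecting them also makes the mocked unit tests from the deliverables straightforward. The limits of 10 and 20 follow the top-N guidance above.

```ts
type Period = "last_24_hours" | "last_7_days";

// Hypothetical, simplified view of the existing leadModerationQueries functions.
interface SummaryQueries {
  totals(days: number): Promise<Record<string, number>>;
  sources(days: number, limit: number): Promise<unknown[]>;
  campaigns(days: number, limit: number): Promise<unknown[]>;
  referers(days: number, limit: number): Promise<unknown[]>;
  countries(days: number, limit: number): Promise<unknown[]>;
  visitors(days: number, limit: number): Promise<unknown[]>;
}

const PERIOD_DAYS: Record<Period, number> = { last_24_hours: 1, last_7_days: 7 };

// Starts every independent query at once, so total latency is roughly that of
// the slowest query rather than the sum of all of them.
async function collectSummaryParts(q: SummaryQueries, period: Period) {
  const days = PERIOD_DAYS[period];
  const [totals, sources, campaigns, referers, countries, visitors] = await Promise.all([
    q.totals(days),
    q.sources(days, 10),
    q.campaigns(days, 10),
    q.referers(days, 10),
    q.countries(days, 10),
    q.visitors(days, 20),
  ]);
  return { period, totals, sources, campaigns, referers, countries, visitors };
}
```

Note that Promise.all rejects as soon as any query fails. That suits a report that should not be built from partial data.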

Deliverables:

  • src/analytics/moderationQueries.ts (or similarly named) with getModerationSummary(env, period).
  • Types for ModerationSummary.
  • Unit tests with mocked queryAnalyticsEngine / Env.

Phase 2: Workers AI — Binding, Prompt, Client

Goal: Worker can call Llama 3 8B with the moderation summary and receive a structured JSON report.

2.1 Wrangler and Env

  • Add a Workers AI binding in wrangler.toml (an [ai] table with binding = "AI").
    Ref: Cloudflare Workers AI.
  • Extend Env in src/types.ts with optional AI: Ai (or the correct type from @cloudflare/workers-types).
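One way to type the optional binding so the rest of the code can check for it at runtime is shown below. A real project would use the `Ai` type from @cloudflare/workers-types. The minimal `AiBinding` interface is a hypothetical stand-in so the sketch compiles on its own, and `hasAiBinding` is an illustrative helper name.

```ts
// Minimal stand-in for the Workers AI binding (the real type is `Ai`
// from @cloudflare/workers-types).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Subset of the Worker's Env relevant to the report. AI stays optional
// until the binding is deployed in every environment.
interface Env {
  AI?: AiBinding;
  CLOUDFLARE_ACCOUNT_ID: string;
  CLOUDFLARE_ANALYTICS_ENGINE_API_TOKEN: string;
  MODERATION_SLACK_WEBHOOK_URL?: string;
}

// Type guard: after this check, TypeScript knows env.AI is defined.
function hasAiBinding(env: Env): env is Env & { AI: AiBinding } {
  return typeof env.AI?.run === "function";
}
```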

2.2 Prompt design — src/ai/moderationPrompt.ts

  • System prompt: Instruct the model to act as a traffic quality analyst and to output only valid JSON. State that the input is the aggregated summary only; no raw submission data appears in the prompt.
  • User prompt: Include the serialized ModerationSummary (e.g. JSON.stringify(summary)) and ask for a report in the exact schema:
    • Executive summary (short paragraph).
    • Key issues (list).
    • Campaign quality issues (list).
    • Suspicious visitor patterns (list).
    • Recommended actions (list).
    • Suggested blocking actions (list).
  • Output schema (enforced in prompt):
```json
{
  "summary": "",
  "issues": [],
  "campaign_problems": [],
  "visitor_abuse": [],
  "recommended_actions": [],
  "blocking_recommendations": []
}
```
  • Best practices: Clear instructions that input is aggregated only; response must be valid JSON; tone suitable for marketing stakeholders.
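Below is a minimal sketch of the two prompt builders named in the final file layout (buildSystemPrompt, buildUserPrompt). The wording is illustrative, not a tuned prompt. The summary parameter is typed loosely so the block stands alone, whereas the real module would take ModerationSummary.

```ts
// Keys the model must return, in the order of the output schema.
const REPORT_KEYS = [
  "summary",
  "issues",
  "campaign_problems",
  "visitor_abuse",
  "recommended_actions",
  "blocking_recommendations",
] as const;

function buildSystemPrompt(): string {
  return [
    "You are a traffic quality analyst writing for marketing stakeholders.",
    "The input is an aggregated summary of form submissions; it contains no raw submissions.",
    "Respond with a single valid JSON object and nothing else: no markdown, no code fences.",
    `Use exactly these keys: ${REPORT_KEYS.join(", ")}.`,
    '"summary" is a short paragraph; every other key is an array of short strings.',
  ].join("\n");
}

// `summary` is the ModerationSummary built in Phase 1.
function buildUserPrompt(summary: object): string {
  return [
    "Aggregated moderation summary (JSON):",
    JSON.stringify(summary),
    "Write the report: executive summary, key issues, campaign quality issues,",
    "suspicious visitor patterns, recommended actions, and suggested blocking actions.",
  ].join("\n");
}
```

The builders would be passed to the model as `[{ role: "system", content: buildSystemPrompt() }, { role: "user", content: buildUserPrompt(summary) }]`.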

2.3 AI client — src/ai/aiClient.ts

  • generateModerationReport(env, summary: ModerationSummary): Promise<StructuredAiReport>.
  • Call env.AI.run('@cf/meta/llama-3-8b-instruct', { messages: [...] }) (or equivalent API).
  • Parse the model output (the text in the result's response field) as JSON. If the model wraps it in markdown code fences, strip them before parsing.
  • Validate shape (e.g. with a type guard or Zod) and map to StructuredAiReport.
  • Handle errors (model unavailable, invalid JSON) with clear errors and optional fallback (e.g. return summary without AI text).
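The sketch below shows the parse-and-validate core of the client. It assumes the text-generation result arrives as `{ response: string }`, the shape Workers AI text models return when not streaming. The binding is typed with a minimal hypothetical interface, and a hand-written type guard stands in for Zod. generateModerationReport(env, summary) would build the two messages with the moderationPrompt.ts builders and delegate to a helper like the hypothetical `runReportModel`.

```ts
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface StructuredAiReport {
  summary: string;
  issues: string[];
  campaign_problems: string[];
  visitor_abuse: string[];
  recommended_actions: string[];
  blocking_recommendations: string[];
}

const MODEL = "@cf/meta/llama-3-8b-instruct";
const LIST_KEYS = ["issues", "campaign_problems", "visitor_abuse", "recommended_actions", "blocking_recommendations"] as const;

// Removes a fence (json-tagged or bare) that wraps the whole output, if present.
function stripCodeFences(text: string): string {
  const match = text.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i);
  return match ? match[1] : text.trim();
}

// Checks that every key exists with the right type (string vs. string array).
function isStructuredAiReport(value: unknown): value is StructuredAiReport {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.summary === "string" &&
    LIST_KEYS.every((k) => Array.isArray(v[k]) && (v[k] as unknown[]).every((s) => typeof s === "string"))
  );
}

async function runReportModel(
  env: { AI: AiBinding },
  messages: Array<{ role: "system" | "user"; content: string }>,
): Promise<StructuredAiReport> {
  const result = (await env.AI.run(MODEL, { messages })) as { response?: string } | null;
  if (typeof result?.response !== "string") {
    throw new Error("Workers AI returned no text response");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripCodeFences(result.response));
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (!isStructuredAiReport(parsed)) {
    throw new Error("Model output does not match the report schema");
  }
  return parsed;
}
```

The errors are deliberately distinct so the caller can log why the AI step failed before falling back to a summary-only response.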

2.4 Types

  • StructuredAiReport: { summary, issues, campaign_problems, visitor_abuse, recommended_actions, blocking_recommendations } (all strings or string arrays as per schema above).

Deliverables:

  • src/ai/moderationPrompt.ts (system + user prompt builders).
  • src/ai/aiClient.ts (run model, parse and validate JSON).
  • StructuredAiReport type and optional schema validation.
  • Tests with mocked AI binding.

Phase 3: Report Worker & API

Goal: A single orchestration path: fetch 24h summary → generate AI report → return (and optionally send to Slack/email).

3.1 Orchestration — src/worker/moderationReportWorker.ts

  • getModerationReport(env, options?: { period?, sendToSlack?, skipAi? }): Promise<ModerationReportResponse>.
  • Steps:
    1. Call getModerationSummary(env, period) (Phase 1).
    2. If skipAi or AI binding missing, return summary + empty AI fields or a placeholder message.
    3. Otherwise call generateModerationReport(env, summary) (Phase 2).
    4. Build response: { period, summary: ModerationSummary, ai_report: StructuredAiReport, generated_at }.
    5. If sendToSlack and webhook configured, send a condensed version (e.g. summary + top issues + top blocking recommendations) to Slack.
  • Keep the function free of HTTP concerns; the route and the cron handler call it and set sendToSlack based on context.
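The orchestration steps could be sketched as follows. Dependencies are injected through a hypothetical `ReportDeps` interface, which keeps the function free of HTTP and easy to test. The real module would import getModerationSummary, generateModerationReport and a Slack helper directly. The `ai_error` field is an illustrative addition for the placeholder case.

```ts
type Period = "last_24_hours" | "last_7_days";

interface ModerationReportResponse<S, R> {
  period: Period;
  summary: S;
  ai_report: R | null; // null when AI is skipped, missing, or failed
  ai_error?: string;
  generated_at: string;
}

interface ReportDeps<S, R> {
  getSummary(period: Period): Promise<S>;
  generateReport(summary: S): Promise<R>;
  hasAi: boolean;
  sendToSlack?(response: ModerationReportResponse<S, R>): Promise<void>;
}

async function getModerationReport<S, R>(
  deps: ReportDeps<S, R>,
  options: { period?: Period; sendToSlack?: boolean; skipAi?: boolean } = {},
): Promise<ModerationReportResponse<S, R>> {
  const period = options.period ?? "last_24_hours";
  // Step 1: aggregated summary. Failures propagate so the route can return 503.
  const summary = await deps.getSummary(period);

  // Steps 2 and 3: AI is optional; an AI failure degrades to a summary-only response.
  let ai_report: R | null = null;
  let ai_error: string | undefined;
  if (!options.skipAi && deps.hasAi) {
    try {
      ai_report = await deps.generateReport(summary);
    } catch (err) {
      ai_error = err instanceof Error ? err.message : String(err);
    }
  }

  // Step 4: build the response.
  const response: ModerationReportResponse<S, R> = {
    period,
    summary,
    ai_report,
    generated_at: new Date().toISOString(),
  };
  if (ai_error) response.ai_error = ai_error;

  // Step 5: optional Slack delivery; a Slack failure does not fail the report.
  if (options.sendToSlack && deps.sendToSlack) {
    await deps.sendToSlack(response).catch((err) => console.error("Slack delivery failed", err));
  }
  return response;
}
```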

3.2 API endpoint

  • GET (or POST) /analytics/moderation/ai-report (or /analytics/moderation/report).
  • Query params: e.g. period=last_24_hours, slack=false.
  • Auth: reuse existing analytics auth (e.g. ANALYTICS_API_KEY or dashboard auth).
  • Response: JSON with summary, ai_report, generated_at. Status 200 on success; on failure (e.g. Analytics Engine down), 503 or 500 with an error message.
  • Performance: Ensure the whole flow (queries + AI) stays under 2 seconds where possible; document that AI may add ~1–2 s.
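The function below parses and validates the query parameters. The `ai` flag mirrors the optional `?ai=false` idea from the risk table. The name `parseReportQuery` is hypothetical, and unknown periods are rejected rather than silently defaulted.

```ts
type Period = "last_24_hours" | "last_7_days";

interface ReportQuery {
  period: Period;
  sendToSlack: boolean;
  skipAi: boolean;
}

const PERIODS: readonly Period[] = ["last_24_hours", "last_7_days"];

// Returns parsed options, or an error message suitable for a 400 response.
function parseReportQuery(url: URL): ReportQuery | { error: string } {
  const rawPeriod = url.searchParams.get("period") ?? "last_24_hours";
  if (!(PERIODS as readonly string[]).includes(rawPeriod)) {
    return { error: `period must be one of: ${PERIODS.join(", ")}` };
  }
  return {
    period: rawPeriod as Period,
    // Slack delivery is opt-in from the API (slack=true); the cron path enables it directly.
    sendToSlack: url.searchParams.get("slack") === "true",
    // ai=false gives a fast, summary-only response.
    skipAi: url.searchParams.get("ai") === "false",
  };
}
```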

3.3 Routing

  • In src/index.ts (or existing analytics router), add a route that calls getModerationReport and returns the JSON. No raw submission data in the response; only aggregated summary + AI report.
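Here is a minimal route handler. It assumes API-key auth via an `x-api-key` header compared against ANALYTICS_API_KEY; the real check should reuse whatever the existing analytics routes do. The `buildReport` callback stands in for query parsing plus getModerationReport, and the status codes follow the 200 / 503 split in the API endpoint section.

```ts
interface RouteEnv {
  ANALYTICS_API_KEY?: string;
}

// Small helper for JSON responses with a status code.
function json(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Called by the router once the path /analytics/moderation/ai-report matches.
async function handleAiReportRequest(
  request: Request,
  env: RouteEnv,
  buildReport: (url: URL) => Promise<unknown>,
): Promise<Response> {
  // Assumed auth scheme (x-api-key header); swap in the existing analytics auth check.
  const key = request.headers.get("x-api-key");
  if (!env.ANALYTICS_API_KEY || key !== env.ANALYTICS_API_KEY) {
    return json({ error: "unauthorized" }, 401);
  }
  try {
    return json(await buildReport(new URL(request.url)), 200);
  } catch (err) {
    // Upstream failure (e.g. Analytics Engine down): report as temporarily unavailable.
    const message = err instanceof Error ? err.message : "report generation failed";
    return json({ error: message }, 503);
  }
}
```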

Deliverables:

  • src/worker/moderationReportWorker.ts with getModerationReport.
  • Route and auth for GET /analytics/moderation/ai-report.
  • Integration test (or manual test) that hits the endpoint and receives valid JSON.

Phase 4: Automation — Slack, Email, Cron

Goal: Optional delivery of the report to Slack and email; daily run via cron.

4.1 Slack

  • Config: Optional env/secret e.g. MODERATION_SLACK_WEBHOOK_URL. If absent, skip Slack.
  • Payload: Short, readable message: period, 2–3 headline numbers (e.g. total submissions, bot rate, real lead rate), link to dashboard or report URL, and top 3–5 “blocking recommendations” or “issues” from the AI report.
  • Implementation: In moderationReportWorker or a small src/worker/slackNotifier.ts, POST to the webhook. No retry required for MVP; log failures.
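A possible shape for src/worker/slackNotifier.ts follows. Slack incoming webhooks accept a JSON body with a `text` field, and this sketch uses only that field. The report type is reduced to the fields the message needs. Unlike the `sendReportToSlack(env, report)` named in the file layout, this version takes the webhook URL directly.

```ts
interface SlackReportInput {
  period: string;
  totals: { total_submissions: number; real_leads: number; bot_submissions: number };
  ai_report: { summary: string; issues: string[]; blocking_recommendations: string[] } | null;
  report_url?: string;
}

// Percentage with one decimal place; avoids dividing by zero.
function pct(part: number, whole: number): string {
  return whole > 0 ? `${((part / whole) * 100).toFixed(1)}%` : "n/a";
}

// Headline numbers, AI summary, then the top items (blocking recommendations,
// or issues when there are none).
function buildSlackText(r: SlackReportInput, maxItems = 5): string {
  const t = r.totals;
  const lines = [
    `*Traffic quality report (${r.period})*`,
    `Submissions: ${t.total_submissions} | Bot rate: ${pct(t.bot_submissions, t.total_submissions)} | Real lead rate: ${pct(t.real_leads, t.total_submissions)}`,
  ];
  if (r.ai_report) {
    lines.push("", r.ai_report.summary);
    const items =
      r.ai_report.blocking_recommendations.length > 0 ? r.ai_report.blocking_recommendations : r.ai_report.issues;
    lines.push(...items.slice(0, maxItems).map((item) => `• ${item}`));
  }
  if (r.report_url) lines.push("", `Full report: ${r.report_url}`);
  return lines.join("\n");
}

async function sendReportToSlack(webhookUrl: string, r: SlackReportInput): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ text: buildSlackText(r) }),
  });
  // No retry for the MVP: log the failure and move on.
  if (!res.ok) console.error(`Slack webhook returned ${res.status}`);
}
```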

4.2 Email (daily digest)

  • Option A: Use a third-party provider (e.g. Resend, SendGrid) with an API key in secrets; the worker sends one email per day with the report body (summary + AI sections).
  • Option B: Use Cloudflare Email Workers / Email Routing if available and sufficient.
  • Config: e.g. MODERATION_EMAIL_ENABLED, MODERATION_EMAIL_TO, and API key for the provider.
  • Content: HTML or plain text: executive summary, key issues, recommended actions, blocking recommendations. Optionally attach or link to the JSON report.
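Whichever provider is chosen, the digest body can be built independently of it. Below is a plain-text sketch with a hypothetical `buildEmailDigest` name. Sending the result would be a single provider-specific API call, which is not shown because it depends on Option A vs. Option B.

```ts
interface DigestInput {
  period: string;
  generated_at: string; // ISO timestamp
  ai_report: {
    summary: string;
    issues: string[];
    recommended_actions: string[];
    blocking_recommendations: string[];
  } | null;
}

// Renders one titled section; empty sections say so instead of disappearing.
function section(title: string, items: string[]): string {
  const body = items.length > 0 ? items.map((i) => `- ${i}`).join("\n") : "- None reported";
  return `${title}\n${body}`;
}

function buildEmailDigest(r: DigestInput): { subject: string; text: string } {
  const subject = `Traffic quality report: ${r.period} (${r.generated_at.slice(0, 10)})`;
  if (!r.ai_report) {
    return { subject, text: "The AI report was unavailable; see the JSON report for the aggregated summary." };
  }
  const text = [
    "Executive summary",
    r.ai_report.summary,
    section("Key issues", r.ai_report.issues),
    section("Recommended actions", r.ai_report.recommended_actions),
    section("Blocking recommendations", r.ai_report.blocking_recommendations),
  ].join("\n\n");
  return { subject, text };
}
```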

4.3 Cron

  • Add a cron trigger in wrangler.toml (e.g. daily at 09:00 UTC: crons = ["0 9 * * *"] under [triggers]) and a scheduled handler that invokes the same logic as the report worker: build summary → AI report → send to Slack (if configured) and/or email (if configured). The run is fire-and-forget (there is no HTTP caller), so log success or failure.
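The scheduled handler could look like the sketch below. The minimal interfaces stand in for ScheduledController and ExecutionContext from @cloudflare/workers-types. `runDailyReport` is a hypothetical callback wrapping getModerationReport with Slack/email delivery enabled.

```ts
// Minimal stand-ins for the Workers runtime types.
interface ScheduledControllerLike {
  cron: string;
  scheduledTime: number;
}
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Builds a `scheduled` handler around a report-and-deliver function.
function createScheduledHandler<E>(runDailyReport: (env: E) => Promise<void>) {
  return (controller: ScheduledControllerLike, env: E, ctx: ExecutionContextLike): void => {
    const started = Date.now();
    // waitUntil keeps the invocation alive until delivery finishes.
    // Errors are logged rather than rethrown, so a failed run shows up as a log line.
    ctx.waitUntil(
      runDailyReport(env)
        .then(() => console.log(`Moderation report (${controller.cron}) done in ${Date.now() - started} ms`))
        .catch((err) => console.error(`Moderation report (${controller.cron}) failed`, err)),
    );
  };
}

// In the Worker entry point, next to the existing fetch handler (sketch):
//   export default { fetch: handleFetch, scheduled: createScheduledHandler(runDailyReport) };
```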

Deliverables:

  • Slack notification with report snippet when webhook URL is set.
  • Optional email sending (document provider and env vars).
  • Cron entry and handler; docs for enabling Slack/email.

Phase 5: Documentation, Example Output, and Polish

Goal: Docs and an example report so stakeholders and future devs understand the system.

5.1 Documentation

  • docs/analytics/traffic-moderation.md: Add a section “AI moderation report” describing: how to trigger (API vs cron), what the summary contains, what the AI adds, and how to interpret the JSON.
  • docs/analytics/ai-moderation-report.md (new):
    • Data flow (Analytics Engine → summary → Workers AI → structured output).
    • API: GET /analytics/moderation/ai-report, params, auth, response shape.
    • Automation: Slack webhook, email, cron schedule.
    • Env vars and secrets (AI binding, Slack, email, Analytics Engine).
  • ARCHITECTURE.md: Add src/analytics, src/ai, src/worker and the report flow.

5.2 Example output

  • Add an example moderation report (JSON) in docs, e.g. docs/analytics/ai-moderation-report-example.json, with sample ModerationSummary and StructuredAiReport so stakeholders see the exact format and tone.

5.3 Performance and safety checklist

  • Confirm only aggregated data is sent to the AI (no raw lead_submissions rows).
  • Confirm response time is measured (target < 2 s for summary + AI); document if AI pushes it over.
  • Add a short “Operational runbook” note: what to do if the report fails (check Analytics Engine token, AI binding, Slack/email config).
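Measuring against the 2-second target can be as simple as timing each stage and logging one structured line per run. Below is a minimal sketch with hypothetical helper names `timed` and `reportTimings`. In Workers, Date.now() advances only across I/O, which suits these I/O-bound stages.

```ts
interface StageTiming {
  stage: string;
  ms: number;
}

// Runs `fn`, records how long it took (even if it throws), and returns its result.
async function timed<T>(stage: string, timings: StageTiming[], fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings.push({ stage, ms: Date.now() - start });
  }
}

// Logs per-stage timings as one JSON line and flags runs over the budget.
function reportTimings(timings: StageTiming[], budgetMs = 2000): { totalMs: number; overBudget: boolean } {
  const totalMs = timings.reduce((sum, t) => sum + t.ms, 0);
  const overBudget = totalMs > budgetMs;
  console.log(JSON.stringify({ event: "moderation_report_timing", totalMs, overBudget, timings }));
  return { totalMs, overBudget };
}
```

Usage would wrap each stage, e.g. `await timed("summary", timings, () => getModerationSummary(env, period))`, followed by `timed("ai", ...)`, and then a call to `reportTimings(timings)` before returning.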

Deliverables:

  • Updated traffic-moderation doc and new ai-moderation-report doc.
  • Example JSON report.
  • ARCHITECTURE.md update and runbook note.

Phase Summary Table

| Phase | Focus | Key deliverables |
|---|---|---|
| 1 | Analytics — 24h aggregation & structured summary | getModerationSummary(env, period), ModerationSummary type, no raw rows |
| 2 | Workers AI — binding, prompt, client | AI binding, moderationPrompt.ts, aiClient.ts, StructuredAiReport |
| 3 | Report worker & API | getModerationReport(), GET /analytics/moderation/ai-report |
| 4 | Automation | Slack webhook, optional email, daily cron |
| 5 | Docs & example | traffic-moderation + ai-moderation-report docs, example JSON, runbook |

Dependency Order

  • Phase 2 depends on Phase 1 (summary type and data).
  • Phase 3 depends on Phase 1 and 2.
  • Phase 4 depends on Phase 3.
  • Phase 5 can be done in parallel with 3–4 and finalized after 4.

Suggested File Layout (Final)

src/
├── analytics/
│   └── moderationQueries.ts    # getModerationSummary(env, period) → ModerationSummary
├── ai/
│   ├── moderationPrompt.ts     # buildSystemPrompt(), buildUserPrompt(summary)
│   └── aiClient.ts             # generateModerationReport(env, summary) → StructuredAiReport
├── worker/
│   ├── moderationReportWorker.ts   # getModerationReport(env, options)
│   └── slackNotifier.ts            # (optional) sendReportToSlack(env, report)
├── tracking/
│   └── services/
│       └── leadModerationQueries.ts   # existing; used by analytics/moderationQueries
└── ...

Existing leadModerationQueries.ts stays the single place for Analytics Engine SQL; src/analytics/moderationQueries.ts composes it and builds the summary only.


Risk and Mitigations

| Risk | Mitigation |
|---|---|
| AI model returns non-JSON | Strip markdown code fences; retry with “output only JSON”; fall back to a summary-only response. |
| Analytics Engine slow | Use parallel queries; limit result sizes; consider caching the summary for 5–10 min if needed. |
| Report takes longer than 2 s | Document the overrun; consider making AI optional (e.g. ?ai=false) for a fast summary-only response. |
| Slack/email secrets | Document required vars; no defaults; feature off when not set. |
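The first mitigation row (retry once with an "output only JSON" reminder, then fall back to summary-only) could be sketched as below. `callModel` and `parse` are hypothetical injected functions, so the retry logic is independent of the binding. `parse` is expected to throw on invalid JSON or the wrong shape.

```ts
type Message = { role: "system" | "user" | "assistant"; content: string };

// Tries the model once. If the output does not parse, retries once with the bad
// answer and an explicit reminder. Returns null so the caller can fall back to summary-only.
async function generateWithRetry<R>(
  messages: Message[],
  callModel: (messages: Message[]) => Promise<string>,
  parse: (text: string) => R,
): Promise<R | null> {
  let lastOutput = "";
  try {
    lastOutput = await callModel(messages);
    return parse(lastOutput);
  } catch (err) {
    console.warn("First AI attempt failed; retrying", err);
  }
  const retryMessages: Message[] = [
    ...messages,
    // Show the model its previous answer (if any) so the reminder has context.
    ...(lastOutput ? [{ role: "assistant" as const, content: lastOutput }] : []),
    { role: "user", content: "Output only the JSON object in the requested schema, with no other text." },
  ];
  try {
    return parse(await callModel(retryMessages));
  } catch (err) {
    console.error("AI retry failed; falling back to summary-only", err);
    return null;
  }
}
```

A single retry keeps the worst case to two model calls, which matters given the 2-second target.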

Next Steps

  1. Kick off Phase 1: Implement getModerationSummary and ModerationSummary type; add blocked count to totals if needed.
  2. Then Phase 2: Add AI binding, prompts, and client; validate with a manual test.
  3. Then Phase 3: Wire route and orchestration; verify end-to-end under 2 s where feasible.
  4. Then Phase 4 and 5: Add Slack/email/cron and complete documentation and example output.

This roadmap is ready for team review and phase-by-phase implementation.