AI-Powered Moderation Reporting System — Phased Roadmap

This roadmap covers building an AI-powered moderation reporting system on Cloudflare Workers AI and the existing lead_submissions Analytics Engine dataset. The system will produce daily, human-readable traffic quality reports for marketing stakeholders.

Data source: lead_submissions (3-month retention).
AI model: @cf/meta/llama-3-8b-instruct.
Constraints: only aggregated data is sent to the AI (no raw rows); target end-to-end response time under 2 seconds.

Current State (What Exists)

Analytics Engine: LEAD_ANALYTICS binding → lead_submissions; SQL via lib/analyticsEngineSql.ts (requires CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_ANALYTICS_ENGINE_API_TOKEN).
Moderation queries: src/tracking/services/leadModerationQueries.ts — traffic quality, high bot by source/campaign, suspicion by source, suspicious referers, country risk, visitor abuse, block candidates, aggregations, dashboard summary.
API: Moderation endpoints under /analytics/moderation/* (e.g. traffic-quality, high-bot-by-source, visitor-abuse, block-candidates).
No Workers AI binding in wrangler.toml or Env yet.

Target Architecture (High Level)

┌─────────────────────────────────────────────────────────────────────────┐
│  Triggers: Cron (daily) | API GET /analytics/moderation/ai-report       │
└─────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Moderation Report Flow                                                  │
│  1. Build 24h summary (totals + signals) from Analytics Engine          │
│  2. Send summary JSON to Worker AI (Llama 3 8B)                          │
│  3. Parse structured AI output (summary, issues, actions)               │
│  4. Return JSON and/or send to Slack / email                             │
└─────────────────────────────────────────────────────────────────────────┘

Code layout (proposed):

src/analytics/moderationQueries.ts — 24h aggregation + structured summary builder (composes existing leadModerationQueries where possible).
src/ai/moderationPrompt.ts — System and user prompt templates.
src/ai/aiClient.ts — Worker AI client (run model, parse JSON).
src/worker/moderationReportWorker.ts — Orchestration: fetch summary → AI → format response; used by route and cron.

Phase 1: Analytics — 24h Aggregation & Structured Summary

Goal: One function that returns the exact JSON shape needed for the AI (period, totals, suspicious_sources, suspicious_campaigns, suspicious_referers, suspicious_countries, high_risk_visitors). No raw rows.

1.1 Totals for last 24 hours

Reuse: getTrafficQualityMetrics(env, { days: 1 }) for most totals.
Gap: Add blocked_submissions if the existing totals do not already include it, e.g. by counting rows where event_type / blob20 = 'blocked'. If the dataset already records blocked events in event_type, a small extra query or an extension of the existing traffic-quality SQL is enough to return the blocked count.
Output shape:
totals: { total_submissions, real_leads, bot_submissions, blocked_submissions, duplicate_rate, average_suspicion_score, average_engagement_score }.

1.2 Moderation signals (top N, 24h)

Reuse with days: 1:
- Top suspicious sources → getHighBotRateBySource and/or getHighSuspicionBySource (e.g. top 10 of each, merge/dedupe by source).
- Top suspicious campaigns → getHighBotRateByCampaign (top 10).
- Top suspicious referers → getSuspiciousReferers (top 10).
- Countries with high bot rates → getCountryRisk (top 10).
- Visitors with repeated submissions → getRepeatSubmissionsByVisitor (e.g. min 2, top 20).
- Visitors with high suspicion → getVisitorAbuseCandidates, or getAggregationByVisitor filtered and sorted by suspicion.
- Visitors with high VPN scores → getVisitorAbuseCandidates, or filter by vpn_score.
New module: src/analytics/moderationQueries.ts (or moderationSummary.ts) that:
- Accepts Env and optional period: 'last_24_hours' | 'last_7_days'.
- Calls the above with days: 1 (or 7) and optional min_submissions to avoid noise.
- Maps results into a single ModerationSummary type (see below).
Important: Only aggregated, small result sets (e.g. top 10–20 per category). No raw submission rows.
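
The merge/dedupe step for suspicious sources could look roughly like the sketch below. It assumes both query functions can be mapped to the row shape used by suspicious_sources; mergeSuspiciousSources is a hypothetical helper name, and the real return shapes of getHighBotRateBySource and getHighSuspicionBySource may differ.

```typescript
// Hypothetical row shape, mirroring suspicious_sources in ModerationSummary.
interface SuspiciousSource {
  source: string;
  submissions: number;
  bot_rate_pct?: number;
  avg_suspicion?: number;
}

// Merge two top-N lists keyed by source, keeping one row per source.
// Metrics present in either list are combined; submissions takes the larger count.
function mergeSuspiciousSources(
  byBotRate: SuspiciousSource[],
  bySuspicion: SuspiciousSource[],
  limit = 10,
): SuspiciousSource[] {
  const merged = new Map<string, SuspiciousSource>();
  for (const row of byBotRate.concat(bySuspicion)) {
    const existing = merged.get(row.source);
    if (!existing) {
      merged.set(row.source, { ...row });
      continue;
    }
    existing.submissions = Math.max(existing.submissions, row.submissions);
    if (existing.bot_rate_pct === undefined) existing.bot_rate_pct = row.bot_rate_pct;
    if (existing.avg_suspicion === undefined) existing.avg_suspicion = row.avg_suspicion;
  }
  // Highest-volume sources first, then cap the list so the summary stays small.
  return Array.from(merged.values())
    .sort((a, b) => b.submissions - a.submissions)
    .slice(0, limit);
}
```

Sorting by submissions is one possible ordering; sorting by bot rate or suspicion would work equally well if those signals matter more to stakeholders.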

1.3 Structured summary type

Define in TypeScript and document in specs:

interface ModerationSummary {
  period: 'last_24_hours' | 'last_7_days';
  totals: {
    total_submissions: number;
    real_leads: number;
    bot_submissions: number;
    blocked_submissions: number;
    duplicate_rate: number;
    average_suspicion_score: number;
    average_engagement_score: number;
  };
  suspicious_sources: Array<{ source: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_campaigns: Array<{ utm_campaign: string; utm_source: string; submissions: number; bot_rate_pct?: number }>;
  suspicious_referers: Array<{ referer: string; submissions: number; bot_rate_pct?: number; avg_suspicion?: number }>;
  suspicious_countries: Array<{ country: string; submissions: number; bot_rate_pct?: number }>;
  high_risk_visitors: Array<{
    visitor_id: string;
    submissions: number;
    avg_suspicion?: number;
    avg_vpn_score?: number;
    form_submit_count_24h?: number;
  }>;
}

1.4 Performance

Run independent queries in parallel (Promise.all) where possible.
Limit result sizes (e.g. LIMIT 10 or 20 per query).
If needed, add a single “dashboard-style” Analytics Engine query that returns one row per dimension (e.g. with GROUP BY) to reduce round-trips; otherwise keep composing existing functions.
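
A minimal sketch of the parallel-query approach, assuming each existing query can be wrapped as an async function that takes days and limit. The Query type and fetchSignals are hypothetical names for illustration, not part of the current codebase.

```typescript
// Hypothetical query signature: each function returns a small, aggregated result set.
type Query<T> = (opts: { days: number; limit: number }) => Promise<T[]>;

interface SummaryQueries<S, C, R> {
  sources: Query<S>;
  campaigns: Query<C>;
  referers: Query<R>;
}

// Run the independent queries concurrently, so total latency is roughly that of
// the slowest query rather than the sum of all of them.
async function fetchSignals<S, C, R>(
  queries: SummaryQueries<S, C, R>,
  days = 1,
  limit = 10,
): Promise<{ sources: S[]; campaigns: C[]; referers: R[] }> {
  const opts = { days, limit };
  const [sources, campaigns, referers] = await Promise.all([
    queries.sources(opts),
    queries.campaigns(opts),
    queries.referers(opts),
  ]);
  return { sources, campaigns, referers };
}
```

Promise.all rejects as soon as any query fails. If a partial summary is preferable to no summary, Promise.allSettled is an alternative worth considering.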

Deliverables:

src/analytics/moderationQueries.ts (or similarly named) with getModerationSummary(env, period).
Types for ModerationSummary.
Unit tests with mocked queryAnalyticsEngine / Env.

Phase 2: Worker AI — Binding, Prompt, Client

Goal: The Worker can call Llama 3 8B with the moderation summary and receive a structured JSON report.

2.1 Wrangler and Env

Add the Workers AI binding in wrangler.toml: an [ai] table with binding = "AI".
Ref: Cloudflare Workers AI.
Extend Env in src/types.ts with optional AI: Ai (or the correct type from @cloudflare/workers-types).
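
One way the Env extension might look, as a sketch. AiBinding is a simplified stand-in written for this example (the real project would use the Ai type from @cloudflare/workers-types), and hasAiBinding is a hypothetical helper.

```typescript
// Minimal structural type for the binding, assumed for this sketch only.
interface AiBinding {
  run(
    model: string,
    inputs: { messages: Array<{ role: string; content: string }> },
  ): Promise<unknown>;
}

interface Env {
  // Existing bindings such as LEAD_ANALYTICS stay unchanged.
  // Optional, so environments without the binding still type-check.
  AI?: AiBinding;
}

// Type guard so callers can fall back to a summary-only report
// when the binding is not configured.
function hasAiBinding(env: Env): env is Env & { AI: AiBinding } {
  return env.AI !== undefined;
}
```

Keeping the binding optional means the report path has to check for it explicitly, which matches the plan to return a summary without AI text when the binding is missing.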

2.2 Prompt design — `src/ai/moderationPrompt.ts`

System prompt: Instruct the model to act as a traffic quality analyst. Output only valid JSON. No raw submission data in the prompt; only the aggregated summary is provided.
User prompt: Include the serialized ModerationSummary (e.g. JSON.stringify(summary)) and ask for a report in the exact schema:
- Executive summary (short paragraph).
- Key issues (list).
- Campaign quality issues (list).
- Suspicious visitor patterns (list).
- Recommended actions (list).
- Suggested blocking actions (list).
Output schema (enforced in prompt):

{
  "summary": "",
  "issues": [],
  "campaign_problems": [],
  "visitor_abuse": [],
  "recommended_actions": [],
  "blocking_recommendations": []
}

Best practices: Clear instructions that input is aggregated only; response must be valid JSON; tone suitable for marketing stakeholders.
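
The prompt builders could be sketched as follows. The function names match the proposed file layout; the prompt wording itself is only a starting point and will need tuning against real model output.

```typescript
// Empty report skeleton shown to the model as the required output shape.
const REPORT_SCHEMA = {
  summary: "",
  issues: [],
  campaign_problems: [],
  visitor_abuse: [],
  recommended_actions: [],
  blocking_recommendations: [],
};

// System prompt: role, data constraints, and the exact JSON shape to return.
function buildSystemPrompt(): string {
  return [
    "You are a traffic quality analyst writing for marketing stakeholders.",
    "The input is an aggregated moderation summary. It contains no raw submission rows.",
    "Respond with a single valid JSON object and nothing else: no markdown, no commentary.",
    "Use exactly this shape: " + JSON.stringify(REPORT_SCHEMA),
  ].join("\n");
}

// User prompt: the serialized summary plus a short instruction.
function buildUserPrompt(summary: object): string {
  return (
    "Moderation summary (JSON):\n" +
    JSON.stringify(summary) +
    "\n\nWrite the report for this period."
  );
}
```

Embedding the skeleton via JSON.stringify keeps the prompt and the schema in one place, so a renamed field only has to change once.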

2.3 AI client — `src/ai/aiClient.ts`

generateModerationReport(env, summary: ModerationSummary): Promise<StructuredAiReport>.
Call env.AI.run('@cf/meta/llama-3-8b-instruct', { messages: [...] }) (or equivalent API).
Parse the model output as JSON. If the model returns markdown code blocks, strip them before parsing.
Validate shape (e.g. with a type guard or Zod) and map to StructuredAiReport.
Handle errors (model unavailable, invalid JSON) with clear errors and optional fallback (e.g. return summary without AI text).
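
The parse-and-validate step could be sketched like this, using a hand-written type guard rather than Zod. It assumes the model's text output is already available as a string; parseAiReport and stripCodeFences are hypothetical helper names.

```typescript
interface StructuredAiReport {
  summary: string;
  issues: string[];
  campaign_problems: string[];
  visitor_abuse: string[];
  recommended_actions: string[];
  blocking_recommendations: string[];
}

const LIST_FIELDS = [
  "issues",
  "campaign_problems",
  "visitor_abuse",
  "recommended_actions",
  "blocking_recommendations",
] as const;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Remove a leading Markdown fence (three backticks, optionally followed by a
// language tag) and a trailing fence, if the model wrapped its JSON in one.
function stripCodeFences(text: string): string {
  return text.trim().replace(/^`{3}[a-zA-Z]*\s*/, "").replace(/\s*`{3}$/, "").trim();
}

function parseAiReport(raw: string): StructuredAiReport {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripCodeFences(raw));
  } catch {
    throw new Error("AI response was not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("AI response was not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.summary !== "string") {
    throw new Error("AI response is missing a string 'summary'");
  }
  for (const field of LIST_FIELDS) {
    if (!isStringArray(obj[field])) {
      throw new Error(`AI response field '${field}' must be a string array`);
    }
  }
  // Every field has been checked above, so the cast is safe.
  return obj as unknown as StructuredAiReport;
}
```

The caller can catch these errors and return the summary without AI text, as described in the error-handling note above.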

2.4 Types

StructuredAiReport: { summary, issues, campaign_problems, visitor_abuse, recommended_actions, blocking_recommendations }, where summary is a string and the other fields are string arrays, matching the output schema above.

Deliverables:

src/ai/moderationPrompt.ts (system + user prompt builders).
src/ai/aiClient.ts (run model, parse and validate JSON).
StructuredAiReport type and optional schema validation.
Tests with mocked AI binding.

Phase 3: Report Worker & API

Goal: A single orchestration path: fetch 24h summary → generate AI report → return (and optionally send to Slack/email).

3.1 Orchestration — `src/worker/moderationReportWorker.ts`

getModerationReport(env, options?: { period?, sendToSlack?, skipAi? }): Promise<ModerationReportResponse>.
Steps:
1. Call getModerationSummary(env, period) (Phase 1).
2. If skipAi or AI binding missing, return summary + empty AI fields or a placeholder message.
3. Otherwise call generateModerationReport(env, summary) (Phase 2).
4. Build response: { period, summary: ModerationSummary, ai_report: StructuredAiReport, generated_at }.
5. If sendToSlack and webhook configured, send a condensed version (e.g. summary + top issues + top blocking recommendations) to Slack.
Keep the function free of HTTP concerns (no Request/Response handling); the route and the cron handler call it and set sendToSlack based on context.
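
The orchestration steps above can be sketched as follows. To keep the example self-contained, this version takes injected dependencies (ReportDeps, a hypothetical interface) instead of env, and uses simplified stand-ins for ModerationSummary and StructuredAiReport.

```typescript
// Simplified stand-ins for the ModerationSummary and StructuredAiReport types.
type Summary = { period: string };
type AiReport = { summary: string; issues: string[] };

interface ReportResponse {
  period: string;
  summary: Summary;
  ai_report: AiReport | null;
  generated_at: string;
}

interface ReportDeps {
  getSummary(period: string): Promise<Summary>;
  // Left undefined when the AI binding is not configured.
  generateAiReport?: (summary: Summary) => Promise<AiReport>;
  // Left undefined when no Slack webhook is configured.
  sendToSlack?: (report: ReportResponse) => Promise<void>;
}

async function getModerationReport(
  deps: ReportDeps,
  options: { period?: string; sendToSlack?: boolean; skipAi?: boolean } = {},
): Promise<ReportResponse> {
  const period = options.period ?? "last_24_hours";
  const summary = await deps.getSummary(period);

  let aiReport: AiReport | null = null;
  if (!options.skipAi && deps.generateAiReport) {
    try {
      aiReport = await deps.generateAiReport(summary);
    } catch (err) {
      // Fall back to a summary-only report rather than failing the whole request.
      console.error("AI report generation failed", err);
    }
  }

  const report: ReportResponse = {
    period,
    summary,
    ai_report: aiReport,
    generated_at: new Date().toISOString(),
  };
  if (options.sendToSlack && deps.sendToSlack) {
    await deps.sendToSlack(report);
  }
  return report;
}
```

Injecting the dependencies also makes the function easy to test with mocks, since neither Analytics Engine nor the AI binding is needed.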

3.2 API endpoint

GET (or POST) /analytics/moderation/ai-report (or /analytics/moderation/report).
Query params: e.g. period=last_24_hours, slack=false.
Auth: reuse existing analytics auth (e.g. ANALYTICS_API_KEY or dashboard auth).
Response: JSON with summary, ai_report, generated_at. Status 200; on failure (e.g. Analytics Engine down) return 503 or 500 with error message.
Performance: Ensure the whole flow (queries + AI) stays under 2 seconds where possible; document that AI may add ~1–2 s.

3.3 Routing

In src/index.ts (or existing analytics router), add a route that calls getModerationReport and returns the JSON. No raw submission data in the response; only aggregated summary + AI report.
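
A small sketch of the query-parameter handling for this route, assuming the period and slack parameters described in 3.2. parseReportQuery is a hypothetical helper; auth and the actual router wiring are left out.

```typescript
type Period = "last_24_hours" | "last_7_days";

interface ReportQuery {
  period: Period;
  sendToSlack: boolean;
}

// Parse and validate the query string. Unknown period values fall back to the
// 24h default, and Slack delivery is off unless slack=true is passed explicitly.
function parseReportQuery(requestUrl: string): ReportQuery {
  const params = new URL(requestUrl).searchParams;
  const period: Period =
    params.get("period") === "last_7_days" ? "last_7_days" : "last_24_hours";
  return { period, sendToSlack: params.get("slack") === "true" };
}
```

The route handler would pass the result to getModerationReport and serialize the response as JSON, returning 503 or 500 with an error message on failure.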

Deliverables:

src/worker/moderationReportWorker.ts with getModerationReport.
Route and auth for GET /analytics/moderation/ai-report.
Integration test (or manual test) that hits the endpoint and receives valid JSON.

Phase 4: Automation — Slack, Email, Cron

Goal: Optional delivery of the report to Slack and email; daily run via cron.

4.1 Slack

Config: Optional env/secret e.g. MODERATION_SLACK_WEBHOOK_URL. If absent, skip Slack.
Payload: Short, readable message: period, 2–3 headline numbers (e.g. total submissions, bot rate, real lead rate), link to dashboard or report URL, and top 3–5 “blocking recommendations” or “issues” from the AI report.
Implementation: In moderationReportWorker or a small src/worker/slackNotifier.ts, POST to the webhook. No retry required for MVP; log failures.
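
A possible shape for the Slack notifier, as a sketch. The input fields and buildSlackMessage are hypothetical; sendReportToSlack here takes the webhook URL directly rather than env, and posts the simple { text } payload that Slack incoming webhooks accept.

```typescript
interface SlackReportInput {
  period: string;
  totalSubmissions: number;
  botSubmissions: number;
  blockingRecommendations: string[];
  reportUrl?: string;
}

// Build a short message: headline numbers plus the top recommendations.
function buildSlackMessage(input: SlackReportInput, maxItems = 5): { text: string } {
  const botRate =
    input.totalSubmissions > 0
      ? ((input.botSubmissions / input.totalSubmissions) * 100).toFixed(1)
      : "0.0";
  const lines = [
    `Moderation report (${input.period})`,
    `Total submissions: ${input.totalSubmissions} | Bot rate: ${botRate}%`,
  ];
  for (const item of input.blockingRecommendations.slice(0, maxItems)) {
    lines.push(`- ${item}`);
  }
  if (input.reportUrl) lines.push(`Full report: ${input.reportUrl}`);
  return { text: lines.join("\n") };
}

// POST the payload to the webhook. Failures are logged, not retried (MVP).
async function sendReportToSlack(
  webhookUrl: string | undefined,
  payload: { text: string },
): Promise<boolean> {
  if (!webhookUrl) return false; // Feature is off when the secret is not set.
  try {
    const res = await fetch(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (!res.ok) console.error(`Slack webhook returned ${res.status}`);
    return res.ok;
  } catch (err) {
    console.error("Slack webhook request failed", err);
    return false;
  }
}
```

Separating message building from sending keeps the formatting logic testable without network access.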

4.2 Email (daily digest)

Option A: Use a third-party (e.g. Resend, SendGrid) with an API key in secrets; worker sends one email per day with the report body (summary + AI sections).
Option B: Use Cloudflare Email Workers / Email Routing if available and sufficient.
Config: e.g. MODERATION_EMAIL_ENABLED, MODERATION_EMAIL_TO, and API key for the provider.
Content: HTML or plain text: executive summary, key issues, recommended actions, blocking recommendations. Optionally attach or link to the JSON report.

4.3 Cron

Add a cron trigger (e.g. daily at 09:00 UTC, cron expression 0 9 * * *) in wrangler.toml and a scheduled handler in the Worker that invokes the same logic as the report worker: build summary → AI report → send to Slack and/or email, each only if configured. The run can be fire-and-forget; log success or failure.
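
The scheduled handler could be sketched as below. The interfaces are simplified stand-ins for the Workers runtime types, and createScheduledHandler and RunReport are hypothetical names; in the real entry point the returned object would be merged into the Worker's default export.

```typescript
// Minimal stand-ins for the Workers runtime types used by scheduled handlers.
interface ScheduledEventLike {
  cron: string;
  scheduledTime: number;
}
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical entry point that builds the report and delivers it.
type RunReport = (opts: { period: string; sendToSlack: boolean }) => Promise<unknown>;

function createScheduledHandler(runReport: RunReport) {
  return {
    async scheduled(
      event: ScheduledEventLike,
      _env: unknown,
      ctx: ExecutionContextLike,
    ): Promise<void> {
      // waitUntil keeps the Worker alive until the report has been built and sent.
      ctx.waitUntil(
        runReport({ period: "last_24_hours", sendToSlack: true })
          .then(() => console.log(`Moderation report sent (cron ${event.cron})`))
          .catch((err) => console.error("Scheduled moderation report failed", err)),
      );
    },
  };
}
```

Errors are caught and logged inside the waitUntil promise, so a failed report does not surface as an unhandled rejection.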

Deliverables:

Slack notification with report snippet when webhook URL is set.
Optional email sending (document provider and env vars).
Cron entry and handler; docs for enabling Slack/email.

Phase 5: Documentation, Example Output, and Polish

Goal: Docs and an example report so stakeholders and future devs understand the system.

5.1 Documentation

docs/analytics/traffic-moderation.md: Add a section “AI moderation report” describing: how to trigger (API vs cron), what the summary contains, what the AI adds, and how to interpret the JSON.
docs/analytics/ai-moderation-report.md (new):
- Data flow (Analytics Engine → summary → Workers AI → structured output).
- API: GET /analytics/moderation/ai-report, params, auth, response shape.
- Automation: Slack webhook, email, cron schedule.
- Env vars and secrets (AI binding, Slack, email, Analytics Engine).
ARCHITECTURE.md: Add src/analytics, src/ai, src/worker and the report flow.

5.2 Example output

Add an example moderation report (JSON) in docs, e.g. docs/analytics/ai-moderation-report-example.json, with sample ModerationSummary and StructuredAiReport so stakeholders see the exact format and tone.

5.3 Performance and safety checklist

Confirm only aggregated data is sent to the AI (no raw lead_submissions rows).
Confirm response time is measured (target < 2 s for summary + AI); document if AI pushes it over.
Add a short “Operational runbook” note: what to do if the report fails (check Analytics Engine token, AI binding, Slack/email config).
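
Response time could be measured with a small wrapper like the sketch below. timed is a hypothetical helper and the 2000 ms default mirrors the target above. Note that in the Workers runtime the clock advances only across I/O, so this measures time spent waiting on queries and the model rather than pure CPU time.

```typescript
interface TimedResult<T> {
  result: T;
  elapsedMs: number;
  overBudget: boolean;
}

// Wrap an async step, record how long it took, and warn when it exceeds the budget.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  budgetMs = 2000,
): Promise<TimedResult<T>> {
  const start = Date.now();
  const result = await fn();
  const elapsedMs = Date.now() - start;
  const overBudget = elapsedMs > budgetMs;
  if (overBudget) {
    console.warn(`${label} took ${elapsedMs} ms (budget ${budgetMs} ms)`);
  }
  return { result, elapsedMs, overBudget };
}
```

Wrapping the summary step and the AI step separately would show which of the two pushes the report over the target.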

Deliverables:

Updated traffic-moderation doc and new ai-moderation-report doc.
Example JSON report.
ARCHITECTURE.md update and runbook note.

Phase Summary Table

| Phase | Focus | Key deliverables |
| --- | --- | --- |
| 1 | Analytics: 24h aggregation & structured summary | `getModerationSummary(env, period)`, `ModerationSummary` type, no raw rows |
| 2 | Workers AI: binding, prompt, client | AI binding, `moderationPrompt.ts`, `aiClient.ts`, `StructuredAiReport` |
| 3 | Report worker & API | `getModerationReport()`, `GET /analytics/moderation/ai-report` |
| 4 | Automation | Slack webhook, optional email, daily cron |
| 5 | Docs & example | traffic-moderation + ai-moderation-report docs, example JSON, runbook |

Dependency Order

Phase 2 depends on Phase 1 (summary type and data).
Phase 3 depends on Phase 1 and 2.
Phase 4 depends on Phase 3.
Phase 5 can be done in parallel with 3–4 and finalized after 4.

Suggested File Layout (Final)

src/
├── analytics/
│   └── moderationQueries.ts    # getModerationSummary(env, period) → ModerationSummary
├── ai/
│   ├── moderationPrompt.ts     # buildSystemPrompt(), buildUserPrompt(summary)
│   └── aiClient.ts             # generateModerationReport(env, summary) → StructuredAiReport
├── worker/
│   ├── moderationReportWorker.ts   # getModerationReport(env, options)
│   └── slackNotifier.ts            # (optional) sendReportToSlack(env, report)
├── tracking/
│   └── services/
│       └── leadModerationQueries.ts   # existing; used by analytics/moderationQueries
└── ...

Existing leadModerationQueries.ts stays the single place for Analytics Engine SQL; src/analytics/moderationQueries.ts composes it and builds the summary only.

Risk and Mitigations

| Risk | Mitigation |
| --- | --- |
| AI model returns non-JSON | Strip markdown code fences; retry with “output only JSON”; fallback to summary-only response. |
| Analytics Engine slow | Use parallel queries; limit result sizes; consider caching summary for 5–10 min if needed. |
| Report timeout > 2 s | Document; consider making AI optional (e.g. `?ai=false`) for fast summary-only. |
| Slack/email secrets | Document required vars; no defaults; feature off when not set. |

Next Steps

Kick off Phase 1: Implement getModerationSummary and ModerationSummary type; add blocked count to totals if needed.
Then Phase 2: Add AI binding, prompts, and client; validate with a manual test.
Then Phase 3: Wire route and orchestration; verify end-to-end under 2 s where feasible.
Then Phase 4 and 5: Add Slack/email/cron and complete documentation and example output.

This roadmap is ready for team review and phase-by-phase implementation.
