Production Engagement · Anonymized

Multimodal content moderation on Amazon Bedrock Qwen 3 VL

Consumer social platform · age-restricted user cohort · production-live 2026-05-13

Amazon BedrockQwen 3 VLStep FunctionsAWS LambdaRekognition

Generative AI Applications (multimodal) · Consumer Social

Consumer Social Platform

(Customer name on file with VeUP; available to AWS Partner Validation on request.)

Production multimodal NSFW content moderation on Amazon Bedrock with the Qwen 3 VL model as the sole production classifier — selected after a structured head-to-head evaluation of five candidate engines.

The customer

A consumer social platform with an age-restricted user cohort (under-18 users and regulator-flagged accounts). The platform serves user-generated content moderation at consumer-social scale and is regulator-facing, with policy enforcement as a load-bearing product requirement.

The challenge

Pre-engagement, the customer had no production moderation pipeline for the age-restricted cohort. Internal stakeholders were debating Amazon Rekognition (managed moderation labels) versus a multimodal LLM approach on Amazon Bedrock, with no apples-to-apples data on accuracy, latency, cost, or operational fit for the actual content distribution. Buying decisions were being made on intuition. Customer-side: prior third-party-service-provider moderation benchmarks had been explored and were not preferred.

The solution

VeUP ran a Phase 1 head-to-head evaluation across five candidate classifier engines on an identical labeled dataset: Meta Llama 4 Maverick, Meta Llama Guard 4, Amazon Nova Premier, Anthropic Claude Opus, and Amazon Rekognition. Bedrock Qwen 3 VL (bedrock::qwen.qwen3-vl-235b-a22b) emerged as the sole production engine — a single-hop classifier-with-policy-prompt with the customer's Community Guidelines taxonomy and an asymmetric-loss policy (“err toward the explicit class”) encoded directly in the Qwen 3 VL system prompt. The model emits the structured per-frame pass/block verdict in a single Amazon Bedrock Converse API hop.

Around the Qwen 3 VL core, AWS provides the full production pipeline: AWS Step Functions with a Map state at MaxConcurrency=5 (a deliberate Bedrock-quota dial), containerized AWS Lambda (via Amazon ECR for ffmpeg), Amazon EventBridge Pipes, Amazon API Gateway with AWS_IAM SigV4 auth, Amazon S3, cross-account Amazon SNS/SQS with single-principal-scoped policies, AWS Secrets Manager (rotating bearer token), AWS IAM / IAM Identity Center (JumpCloud federated, zero standing keys), and AWS CloudWatch + CloudTrail — all Terraform-provisioned. Amazon Rekognition is retained behind the same engine-router contract for A/B and fallback. Production go-live was confirmed 2026-05-13.

Production multimodal moderation architecture: Amazon API Gateway (AWS_IAM SigV4) → AWS Lambda → Amazon Bedrock Converse (Qwen 3 VL classifier-with-policy-prompt) synchronous path; asynchronous AWS Step Functions Map state at MaxConcurrency=5 over extracted frames via containerized Lambda; EventBridge Pipes; cross-account SNS/SQS; K-of-N aggregator. — Production multimodal moderation architecture on Amazon Bedrock — Qwen 3 VL classifier-with-policy-prompt + engine router + K-of-N aggregator + Step Functions Map.

Architecture — AWS Well-Architected

Previous state, migration path, and target-state AWS architecture, annotated against the six AWS Well-Architected pillars.

Previous state → target state on AWS, mapped to the six AWS Well-Architected pillars. Grounded in the submitted competency evidence; customer identity anonymized throughout.

Production outcomes

KPI	Result
NSFW classification accuracy	99.56–99.58% on Bedrock Qwen 3 VL vs an Amazon Rekognition baseline of 43.48% NSFW recall. +45.6 percentage-point lift from naive baseline (54%) to production (99.58%) — from prompt engineering alone, no fine-tuning.
Cost per 1,000 evaluations	~$1.00 per 1,000 images end-to-end on the Phase 1 evaluation slice. Bedrock token economics: $0.20–$0.53 per million tokens.
Throughput	~1,000 images per 20–25 minutes on the Phase 1 evaluation pipeline. Production async pipeline runs Step Functions Map at MaxConcurrency=5 to stay inside Bedrock invocation quotas while keeping per-video walltime bounded.
Dataset	~50,000-image labeled corpus with a ~1,500-image Phase 1 evaluation slice covering all major NSFW categories.
AWS Partner Funding leverage	Both Phase 1 (POC) and Phase 2 (GTM Production) AWS-funded. Customer cash out-of-pocket effectively zero; Phase 3 contract sent.

In the customer's words

“We found the Qwen 3 VL had just the highest hit rate of any of them.”
— Customer CEO, 2026-03-02 Phase 1 results customer review (name on file)

“Bedrock wins.”
— Customer CEO, 2026-03-02

“Bedrock is really the path really clearly.”
— Customer CEO, 2026-03-11

AWS services in production

Amazon Bedrock (Qwen 3 VL bedrock::qwen.qwen3-vl-235b-a22b, production engine; Llama 4 Maverick + Llama Guard 4 + Nova Premier + Claude Opus evaluated) · Amazon Bedrock Converse API · AWS Step Functions (Map state, MaxConcurrency=5) · AWS Lambda (containerized via Amazon ECR for ffmpeg) · Amazon EventBridge Pipes · Amazon API Gateway (AWS_IAM / SigV4) · Amazon S3 · Amazon SNS/SQS (cross-account with single-principal-scoped policies) · AWS Secrets Manager · AWS IAM / IAM Identity Center · Amazon Rekognition (retained as comparison engine) · Amazon CloudWatch · AWS CloudTrail · Terraform provisioned end-to-end.