
Multimodal GenAI with Multi-Agent Orchestration on AWS
About the Customer
The customer is an ISV specializing in enterprise intelligent document processing and knowledge management, serving regulated industries including financial services, retail, healthcare, and manufacturing. The company provides AI-powered document processing solutions to enterprises that handle high volumes of sensitive business documents, including Aadhaar-bearing records, across the Indian market.
Customer Challenge
The customer relied on third-party AI APIs — including external OCR and language model services — for document processing across its platform. This dependency exposed sensitive enterprise data to external services, creating significant data sovereignty risks for customers in regulated industries. Aadhaar-bearing records and other personally identifiable information were being transmitted outside the customer’s controlled environment with no guarantee of data residency or compliance.
Beyond security, the external API model created unpredictable per-token costs that scaled linearly with document volume, making it difficult to forecast infrastructure spend. The APIs also offered limited customization — the customer could not fine-tune models for its specific document types, resulting in suboptimal classification and extraction accuracy. Without a change, the customer faced growing compliance exposure, escalating costs, and an inability to differentiate its platform with proprietary AI capabilities.
Solution
Teleglobal International designed and deployed a production-grade, fully AWS-native multimodal Generative AI platform in the Mumbai region (ap-south-1) over a 24-week engagement, eliminating all external AI API dependencies.
The solution implements a two-tier AI architecture. Tier 1 uses Amazon SageMaker to host a custom fine-tuned Qwen3-VL-8B-Instruct multimodal vision-language model that handles all document understanding — classification, extraction, summarization, and contextual processing across PDFs, images, and tables. The model was selected through a rigorous 5-criteria weighted evaluation, scoring 92/100 against four alternatives including managed API services and traditional OCR approaches. Tier 2 uses Amazon EC2 G5 instances with NVIDIA A10G GPUs for embedding generation, powering the hybrid retrieval layer.
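A weighted multi-criteria evaluation like the one described above can be captured in a few lines. The criteria names, weights, and per-criterion scores below are illustrative assumptions, not the customer's actual scorecard:

```python
# Hypothetical sketch of a weighted model-evaluation matrix. Criteria and
# weights are assumptions for illustration, not the real selection rubric.

CRITERIA = {              # weight per criterion (weights sum to 1.0)
    "accuracy": 0.30,
    "latency": 0.20,
    "cost": 0.20,
    "data_residency": 0.15,
    "customizability": 0.15,
}

def weighted_score(raw_scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) into a single weighted total."""
    return sum(CRITERIA[name] * raw_scores[name] for name in CRITERIA)

# Example candidate scoring strongly on accuracy and data residency.
candidate = {
    "accuracy": 95, "latency": 90, "cost": 88,
    "data_residency": 100, "customizability": 85,
}
print(round(weighted_score(candidate)))  # single 0-100 figure for comparison
```

Scoring each candidate the same way makes the trade-offs between managed APIs, traditional OCR, and a self-hosted model directly comparable on one scale.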
The platform’s core differentiator is its agentic AI architecture. Four specialized AI agents — Classification, Extraction, Summarization, and Knowledge Graph — are orchestrated via a LangGraph-based multi-agent framework on Amazon EKS. These agents communicate through typed Agent-to-Agent (A2A) message schemas, with conditional routing based on document type and confidence thresholds, parallel execution of independent stages, shared state management across all agents, and event-driven messaging that together form a self-orchestrating pipeline. This agentic approach means the platform reasons about each document, routing it through the optimal processing path.
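The conditional-routing pattern can be sketched without any framework. This is a minimal, framework-free illustration of the idea; the actual platform uses LangGraph on EKS, and the agent logic and the 0.8 confidence threshold below are assumptions:

```python
# Framework-free sketch of confidence-based conditional routing between
# agents sharing one state object. Thresholds and agent bodies are
# illustrative assumptions, not the production LangGraph graph.

from dataclasses import dataclass, field

@dataclass
class DocState:
    """Shared state passed between agents."""
    doc_type: str = "unknown"
    confidence: float = 0.0
    results: dict = field(default_factory=dict)

def classify(state: DocState) -> DocState:
    # A real classifier would call the fine-tuned VLM endpoint here.
    state.results["classification"] = state.doc_type
    return state

def extract(state: DocState) -> DocState:
    state.results["extraction"] = f"fields from {state.doc_type}"
    return state

def summarize(state: DocState) -> DocState:
    state.results["summary"] = f"summary of {state.doc_type}"
    return state

def route(state: DocState) -> list:
    """Conditional routing: low-confidence documents take the full path."""
    if state.confidence < 0.8:          # assumed threshold
        return [extract, summarize]     # run both downstream agents
    if state.doc_type == "invoice":
        return [extract]
    return [summarize]

def run_pipeline(state: DocState) -> DocState:
    state = classify(state)
    for agent in route(state):
        state = agent(state)
    return state

result = run_pipeline(DocState(doc_type="invoice", confidence=0.95))
print(sorted(result.results))  # high-confidence invoice skips summarization
```

In the production system the same decision points become conditional edges in a LangGraph state graph, which also gives the agents durable shared state and parallel branches.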
The platform combines Amazon Neptune for knowledge graph generation — using Gremlin traversal for multi-hop entity relationship discovery — with Amazon OpenSearch Service for hybrid RAG-based semantic retrieval using k-NN/HNSW vector indexing. This dual approach delivers document intelligence that goes beyond simple extraction, enabling users to discover relationships across documents invisible to traditional processing.
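One common way to fuse the two retrieval signals in a hybrid setup is reciprocal rank fusion (RRF), which merges ranked result lists without needing their raw scores to be comparable. This is a generic technique shown for illustration, not necessarily the platform's exact fusion method:

```python
# Sketch of reciprocal rank fusion (RRF) over two ranked result lists,
# e.g. vector k-NN hits and keyword hits. A standard hybrid-retrieval
# technique; the document IDs below are made up.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked doc-id lists; larger k dampens the top-rank bonus."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc3", "doc1", "doc7"]   # semantic k-NN/HNSW results
keyword_hits = ["doc1", "doc5", "doc3"]   # lexical keyword results
print(rrf([vector_hits, keyword_hits]))   # doc1 leads: ranked high by both
```

Documents that appear near the top of both lists accumulate the most score, so the fused ranking rewards agreement between semantic and lexical retrieval.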
Enterprise security is built into every layer: VPC isolation with private subnets, AWS WAF for application protection, Amazon GuardDuty for threat detection, AWS KMS customer-managed keys with 90-day rotation for encryption at rest, Amazon Macie for scanning and detecting PII, including Aadhaar numbers, IAM least-privilege RBAC, and custom agent validation logic for content safety and hallucination prevention. All processing occurs within the customer’s AWS VPC with zero external data exposure.
Additional AWS services deployed include Amazon CloudFront for content delivery, Amazon API Gateway for rate limiting and authentication, AWS Amplify for the web portal, Amazon CloudWatch and AWS CloudTrail for monitoring and audit trails, Amazon Cognito for user authentication, and Elastic Load Balancing for Multi-AZ traffic distribution.
Results and Benefits
The platform achieved a 60–70% reduction in manual document processing effort through AI-driven automation across classification, extraction, summarization, and knowledge graph generation. The customer now processes 12,000+ AI inference requests per month at ~1.8-second latency (p95) — replacing variable, SLA-less external API performance with predictable, sub-2-second response times.
Amazon Neptune generates 500,000+ knowledge graph entities per month, enabling enterprise-grade relationship intelligence across document corpora. The platform maintains 99.9% availability via Multi-AZ deployment with self-healing EKS, SageMaker endpoint failover, and Neptune/OpenSearch automatic failover. The customer achieved 100% elimination of external AI API dependencies and zero external data exposure — all enterprise documents, including Aadhaar-bearing records, are processed entirely within the customer’s AWS VPC. A production security audit confirmed zero critical findings. Infrastructure costs are now fully predictable at a fixed monthly rate, replacing the unpredictable per-token API pricing model.