
Teleglobal Developed a Secure GPU-Powered GenAI Platform on AWS for FutureCraft

Executive Summary 

FutureCraft is building a blockchain-based, Agentic-AI SaaS platform for tokenizing real-world assets. 

The platform needed LLM inference embedded directly into cloud-native microservices, without sending sensitive enterprise data to external providers. 

A standard external API approach failed on three counts: data residency, latency, and compliance. Teleglobal delivered a VPC-isolated GenAI platform on Amazon Bedrock (Claude 3.5 Sonnet), with all inference fully contained within FutureCraft’s own AWS boundary. 

  • 58% inference latency reduction 
  • 87% faster provisioning 
  • 76% fewer configuration drift incidents 
  • 42% retrieval latency drop 
  • 12K+ AI requests per month 
  • SOC 2 and ISO 27001 compliance active from day one 
  • All inference and data within FutureCraft’s AWS boundary, with zero public internet exposure 
  • 28 AWS services deployed in ap-south-1 in a single engagement 

About FutureCraft 

FutureCraft is building a blockchain-based, Agentic-AI powered SaaS platform for tokenizing real-world assets including commodities, property, and physical goods. 

The platform integrates four core capabilities: 

  • Blockchain infrastructure for decentralized asset tracking 
  • Semantic knowledge graphs for organizing multimodal knowledge 
  • AI agents that automate workflows and reasoning 
  • Hybrid RAG pipelines for intelligent knowledge retrieval 

Core modules: Knowledge Graph Studio, Tokenized Asset Registry, and Agent Runtime Engine. 

All components run on GPU-powered infrastructure for real-time AI reasoning. 

The Challenge 

FutureCraft needed enterprise-grade AI infrastructure, but a standard external API approach created four blockers that made production deployment impossible. 

  1. Data Residency and Compliance 

Routing sensitive enterprise content through externally hosted AI APIs conflicted with SOC 2 and ISO 27001 controls. Any production deployment required inference to remain within FutureCraft’s own AWS boundary. 

  2. Cost Unpredictability at Scale 

Token-based billing from external providers created unacceptable budget variance at projected volumes of 2M+ inference requests per month. 

  3. Latency Under Production Load 

Real-time reasoning use cases required sub-500ms response times. External API round-trip latency averaged ~780ms at p95 during testing, consistently missing the threshold. 

  4. No Observability Path 

Without internal hosting, there was no way to instrument inference requests, trace errors, or integrate AI workloads into existing CloudWatch and CloudTrail governance pipelines. 

The Solution 

Teleglobal designed a production-ready GenAI architecture on AWS covering the complete AI lifecycle. 

This included model selection, inference deployment, application hosting, data services, security, and observability, all provisioned through automated GitHub Actions pipelines from day one. 

Model Selection 

Evaluation Criteria 

Before shortlisting any model, Teleglobal defined five criteria that the chosen model had to meet for FutureCraft’s use case: 

  • Data sovereignty: must be deployable via a private VPC endpoint with no inference traffic leaving FutureCraft’s AWS environment 
  • Advanced reasoning: must handle complex enterprise document interpretation, semantic understanding, and contextual response generation 
  • Multimodal capability: must process both structured and unstructured data sources including documents, metadata, and knowledge graphs 
  • Production latency: must support sub-500ms p95 response times under real application load 
  • Compliance readiness: must support SOC 2 and ISO 27001 audit trails without additional instrumentation 

Models Evaluated 

Three model options were shortlisted and evaluated against FutureCraft’s criteria: 

  • Claude 3.5 Sonnet via Amazon Bedrock, managed foundation model, private VPC endpoint access 
  • GPT-4o via OpenAI API, externally hosted, leading proprietary model 
  • Llama 3.1 70B self-hosted on SageMaker, open-source, self-managed inference 

| Parameter | Claude 3.5 Sonnet (Bedrock) ✔ Selected | GPT-4o (OpenAI API) | Llama 3.1 70B (Self-hosted) |
| --- | --- | --- | --- |
| VPC private endpoint (no public internet) | ✔ Yes, native Bedrock VPC endpoint | ✘ No, all traffic via OpenAI servers | ✔ Yes, self-hosted on SageMaker |
| Data stays in AWS boundary | ✔ Yes, fully within FutureCraft AWS | ✘ No, data leaves to OpenAI | ✔ Yes, self-managed |
| SOC 2 / ISO 27001 compliance fit | ✔ Native AWS audit trails via CloudTrail | ✘ External, cannot audit inference | ⚠ Requires custom instrumentation |
| Advanced reasoning | ✔ Strong, complex doc understanding | ✔ Strong | ⚠ Good, less capable at 70B scale |
| Multimodal support | ✔ Yes, text, documents, structured data | ✔ Yes | ✘ Text only (70B variant) |
| Production latency | ✔ ~330ms p95 (post-deployment) | ✘ ~780ms p95, failed 500ms threshold | ⚠ Variable, depends on GPU config |
| Cost model at 2M+ requests/month | ✔ Predictable, AWS infrastructure cost | ✘ Unpredictable, per-token billing | ✔ Predictable, fixed compute cost |
| Managed infrastructure | ✔ Fully managed by AWS | ✔ Fully managed by OpenAI | ✘ Team must manage model + infra |
| Native AWS service integration | ✔ Direct: CloudWatch, CloudTrail, IAM | ✘ External, API only | ⚠ Partial, via SageMaker only |
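The selection logic in the table above amounts to a hard filter: a model is only viable if it clears every criterion outright, with no partial marks. A minimal sketch of that filter, encoding the table's ✔/⚠/✘ marks for the five hard criteria (criterion keys and score labels are illustrative, not from any real API):

```python
# Encode the comparison table's marks: "pass" (✔), "partial" (⚠), "fail" (✘).
# Criterion names are illustrative shorthand for the five evaluation criteria.

CRITERIA = [
    "vpc_private_endpoint", "data_in_aws", "compliance_fit",
    "reasoning", "latency",
]

MODELS = {
    "Claude 3.5 Sonnet (Bedrock)": {
        "vpc_private_endpoint": "pass", "data_in_aws": "pass",
        "compliance_fit": "pass", "reasoning": "pass", "latency": "pass",
    },
    "GPT-4o (OpenAI API)": {
        "vpc_private_endpoint": "fail", "data_in_aws": "fail",
        "compliance_fit": "fail", "reasoning": "pass", "latency": "fail",
    },
    "Llama 3.1 70B (self-hosted)": {
        "vpc_private_endpoint": "pass", "data_in_aws": "pass",
        "compliance_fit": "partial", "reasoning": "partial", "latency": "partial",
    },
}

def viable(scores: dict) -> bool:
    """A model is viable only if it passes every criterion outright."""
    return all(scores[c] == "pass" for c in CRITERIA)

selected = [name for name, scores in MODELS.items() if viable(scores)]
```

Run against the table's marks, only Claude 3.5 Sonnet on Bedrock survives the filter, which is exactly the decision described below.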

Why Claude 3.5 Sonnet on Bedrock Was Selected 

Claude 3.5 Sonnet via Amazon Bedrock was the only option that satisfied all five criteria simultaneously. The decision points were: 

  • GPT-4o rejected: GPT-4o was eliminated on the first criterion, as all inference would route through OpenAI's servers, directly violating SOC 2 and ISO 27001 data residency requirements. External API latency (~780ms p95) also consistently failed FutureCraft's sub-500ms threshold during pre-engagement testing. 
  • Llama 3.1 70B rejected: Llama 3.1 70B was viable on data sovereignty but introduced significant operational burden; the team would need to manage model serving, GPU infrastructure, scaling, and monitoring themselves. At the 70B parameter scale, reasoning quality was also below Claude 3.5 Sonnet for complex document interpretation tasks. 
  • Claude 3.5 Sonnet selected: Claude 3.5 Sonnet on Bedrock delivered private VPC endpoint access, native CloudTrail audit integration, managed infrastructure with no operational overhead, and the strongest reasoning performance, all within FutureCraft’s AWS boundary from day one. 

AI Workflow and Orchestration 

FutureCraft uses a multi-agent AI architecture to automate complex workflows. It operates across four layers: 

  • Application Layer: user queries and documents submitted to application services 
  • AI Orchestration Layer: requests routed through LangGraph multi-agent workflows to inference endpoints 
  • Model Inference Layer: Claude 3.5 Sonnet on Bedrock processes requests, generates embeddings and contextual responses 
  • Data and Knowledge Layer: hybrid RAG pipelines retrieve relevant knowledge graph data from OpenSearch and Neptune
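The four-layer request path above can be sketched as a simple pipeline. This is an illustrative stand-in only: every function body here is a stub, and in the real platform these steps are LangGraph workflows, the Bedrock runtime endpoint, and OpenSearch/Neptune lookups.

```python
# Illustrative sketch of the four-layer request path. All function bodies
# are hypothetical stubs, not the platform's actual implementation.

def retrieve_context(query: str) -> list[str]:
    # Data and Knowledge Layer: hybrid RAG retrieval (stubbed here).
    return [f"kg-fact-for:{query}"]

def invoke_model(query: str, context: list[str]) -> str:
    # Model Inference Layer: Claude 3.5 Sonnet on Bedrock (stubbed here).
    return f"answer({query}, ctx={len(context)})"

def orchestrate(query: str) -> str:
    # AI Orchestration Layer: route through retrieval, then inference.
    context = retrieve_context(query)
    return invoke_model(query, context)

# Application Layer: a user query enters the pipeline.
response = orchestrate("Which assets back token T-101?")
```

The point of the layering is that the application never talks to the model directly: retrieval and inference are composed behind the orchestration layer, so either can change without touching application code.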

AI Tasks Performed 

  • Multimodal document analysis: regulatory and compliance documents, asset metadata 
  • Named entity recognition from enterprise documents 
  • Relationship mapping for knowledge graph generation 
  • Hybrid Retrieval Augmented Generation (RAG) for intelligent knowledge retrieval 
  • Orchestration of multi-agent AI workflows via LangGraph

GPU Infrastructure 

AI workloads run on Amazon EC2 G5.2xlarge instances powered by NVIDIA A10G GPUs. 

| Specification | Detail |
| --- | --- |
| GPU | NVIDIA A10G Tensor Core GPU |
| GPU Memory | 24 GB VRAM |
| CUDA Cores | 6,144 |
| vCPUs | 8 |
| System Memory | 32 GB |
| Network Bandwidth | Up to 25 Gbps |

GPU workload allocation: 

| Workload | GPU Utilization |
| --- | --- |
| Regulatory document analysis and Q&A | 60% |
| Knowledge graph generation | 25% |
| Model fine-tuning and learning | 10% |
| Validation and QA | 5% |
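The allocation shares above must cover the GPU fully, and they translate directly into GPU-hours per workload. A small sketch, assuming a single always-on instance (~730 hours per month; that instance count is an assumption, not stated in the table):

```python
# Workload split from the allocation table, as fractions of GPU time.
ALLOCATION = {
    "regulatory_doc_analysis_qa": 0.60,
    "knowledge_graph_generation": 0.25,
    "fine_tuning": 0.10,
    "validation_qa": 0.05,
}

# Shares must sum to 100% of available GPU time.
assert abs(sum(ALLOCATION.values()) - 1.0) < 1e-9

HOURS_PER_MONTH = 730  # assumption: one always-on G5.2xlarge instance

gpu_hours = {w: round(s * HOURS_PER_MONTH, 1) for w, s in ALLOCATION.items()}
```

Under that assumption, regulatory document analysis consumes roughly 438 GPU-hours a month, while validation and QA consume about 36.5.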

Solution Architecture 

GenAI Inference Layer 

  • Amazon Bedrock (Claude 3.5 Sonnet): private VPC endpoint only, no public internet exposure 
  • API Gateway and Lambda: standardised REST interfaces for internal microservices 
  • ElastiCache (Redis): inference context caching, 42% latency reduction vs. direct RDS 
  • OpenSearch: vector embedding indexing, semantic search, and RAG-ready retrieval 
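The ElastiCache layer above follows a standard cache-aside pattern: check Redis for a previously computed inference context before invoking Bedrock. A minimal sketch of that pattern, where a plain dict stands in for Redis and `bedrock_invoke` is a hypothetical stub (real code would use boto3's `bedrock-runtime` client through the private VPC endpoint):

```python
# Cache-aside sketch of the ElastiCache inference-context layer.
# A dict stands in for Redis; bedrock_invoke is a hypothetical stub.

import hashlib

cache: dict[str, str] = {}      # stand-in for ElastiCache (Redis)
calls = {"bedrock": 0}          # counts actual inference round-trips

def bedrock_invoke(prompt: str) -> str:
    calls["bedrock"] += 1
    return f"completion-for:{prompt}"

def cached_inference(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:                 # hit: skip the Bedrock round-trip
        return cache[key]
    result = bedrock_invoke(prompt)  # miss: invoke, then populate the cache
    cache[key] = result
    return result

first = cached_inference("summarise asset filing A-7")
second = cached_inference("summarise asset filing A-7")  # served from cache
```

Serving repeated context lookups from memory rather than round-tripping to RDS is what drives the 42% retrieval latency reduction cited above.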

Application and Data Layer 

  • Amazon EKS: containerised workloads, EC2 worker nodes, Application Load Balancer 
  • GitHub Actions: CI/CD pipeline automated from first deployment 
  • AWS Amplify + Cognito: frontend hosting and authentication via IAM federation 
  • RDS PostgreSQL Multi-AZ: relational data with automated backups 
  • Neptune: graph queries; S3: all data stores with KMS encryption

Security and Compliance 

  • WAF, GuardDuty, Secrets Manager, KMS: all active from deployment day 
  • Least-privilege IAM and AWS Config drift detection: active from day one 
  • CloudTrail exports to OpenSearch: compliance audit queries in hours, not weeks 
  • SOC 2 and ISO 27001 posture active from day one, all workloads within FutureCraft’s AWS boundary 

AWS Services Used 

AI and Compute 

  • Amazon Bedrock: Claude 3.5 Sonnet via private VPC endpoint 
  • Amazon EC2 G5.2xlarge: NVIDIA A10G GPU for AI workloads 
  • Amazon SageMaker: real-time model inference endpoints

Orchestration and Caching 

  • Amazon EKS: container orchestration with EC2 worker nodes 
  • API Gateway and AWS Lambda: REST interfaces for internal microservices 
  • Amazon ElastiCache (Redis): inference context caching 

Data and Storage 

  • Amazon OpenSearch: vector embeddings, semantic search, RAG retrieval 
  • RDS PostgreSQL Multi-AZ: relational data with automated backups 
  • Amazon Neptune: graph-based queries 
  • Amazon S3: datasets, model artefacts, logs (KMS encrypted) 

Frontend and Access 

  • AWS Amplify: frontend hosting with continuous deployment 
  • Amazon Cognito: authentication federated into IAM roles 

Security and Governance 

  • AWS WAF, GuardDuty, Secrets Manager, KMS: active from deployment day 
  • AWS IAM: least-privilege access across all resources 
  • AWS Config: continuous drift detection 
  • CloudTrail + CloudWatch: audit logs, metrics, compliance queries 
  • GitHub Actions: CI/CD pipeline from day one 
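One common way to get per-request inference metrics into CloudWatch, consistent with the governance pipeline above, is to log them in CloudWatch Embedded Metric Format (EMF), which CloudWatch extracts into metrics automatically. A hedged sketch (the namespace, dimension, and metric names are illustrative, not taken from FutureCraft's deployment):

```python
# Sketch: emit one inference request's latency as a CloudWatch EMF log line.
# Namespace, dimensions, and metric names are illustrative assumptions.

import json
import time

def emf_record(latency_ms: float, model: str) -> str:
    """Build an EMF-formatted log line recording one inference request."""
    doc = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "GenAI/Inference",   # illustrative namespace
                "Dimensions": [["Model"]],
                "Metrics": [{"Name": "LatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "Model": model,
        "LatencyMs": latency_ms,
    }
    return json.dumps(doc)

line = emf_record(330.0, "claude-3-5-sonnet")
```

Writing such a line to a CloudWatch-shipped log stream gives latency metrics per model dimension without any separate metrics API calls, which is what closes the "no observability path" gap described in the challenge.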

Results 

| Metric | Result |
| --- | --- |
| Inference latency | 58% reduction (~780ms to ~330ms at p95) |
| Provisioning time | 87% reduction (4.5 days to under 14 hours) |
| Configuration drift incidents | 76% fewer vs. pre-engagement baseline |
| Context retrieval latency | 42% reduction via ElastiCache vs. direct RDS |
| AI inference requests monthly | 12,000+ |
| Knowledge graph entities processed | 500,000+ |
| Platform uptime | 99.9% |
| Compliance audit prep time | Weeks reduced to hours via CloudTrail + OpenSearch |
| Data sovereignty | All inference, data, and audit trails within FutureCraft AWS boundary |
| Recovery time objective (RTO) | Under 2 minutes (Multi-AZ RDS) |
| External AI API dependency | Eliminated |
| AWS services deployed | 28 services in ap-south-1 in a single engagement |

“Before this deployment, we had no credible path to production for GenAI. Data sovereignty was a blocker, external API latency was a blocker, and our compliance team would not sign off on any architecture that moved sensitive content outside our boundary. Teleglobal resolved all three in a single engagement. We went from blocked to running production inference on Bedrock, within our own VPC, with full audit trails, in under eight weeks. That timeline was not what we expected.” 

— Kamleshwar Gupta
CEO, FutureCraft Technologies Pvt. Ltd. 

What’s Next 

The platform is live and designed for continuous improvement. Planned next steps: 

  • Advanced multimodal AI: expanding document processing to support richer asset types and formats 
  • Expanded knowledge graph analytics: deeper relationship mapping across tokenized asset data 
  • Improved AI agent orchestration: more autonomous multi-agent workflows with reduced human intervention 
  • Continuous model optimisation: ongoing fine-tuning and benchmarking within the SageMaker infrastructure