
Teleglobal Developed a Secure GPU-Powered GenAI Platform on AWS for FutureCraft

Executive Summary 

FutureCraft is building a blockchain-based, Agentic-AI SaaS platform for tokenizing real-world assets. 

The platform needed LLM inference embedded directly into cloud-native microservices, without sending sensitive enterprise data to external providers. 

A standard external API approach failed on three counts: data residency, latency, and compliance. Teleglobal delivered a VPC-isolated GenAI platform on Amazon Bedrock (Claude 3.5 Sonnet), with all inference fully contained within FutureCraft’s own AWS boundary. 

  • 58% inference latency reduction 
  • 87% faster provisioning 
  • 76% fewer configuration drift incidents 
  • 42% retrieval latency drop 
  • 12K+ AI requests per month 
  • SOC 2 and ISO 27001 compliance active from day one 
  • All inference and data within FutureCraft’s AWS boundary, with zero public internet exposure 
  • 28 AWS services deployed in ap-south-1 in a single engagement 

About FutureCraft 

FutureCraft is building a blockchain-based, Agentic-AI powered SaaS platform for tokenizing real-world assets including commodities, property, and physical goods. 

The platform integrates four core capabilities: 

  • Blockchain infrastructure for decentralized asset tracking 
  • Semantic knowledge graphs for organizing multimodal knowledge 
  • AI agents that automate workflows and reasoning 
  • Hybrid RAG pipelines for intelligent knowledge retrieval 

Core modules: Knowledge Graph Studio, Tokenized Asset Registry, and Agent Runtime Engine. 

All components run on GPU-powered infrastructure for real-time AI reasoning. 

The Challenge 

FutureCraft needed enterprise-grade AI infrastructure, but a standard external API approach created four blockers that made production deployment impossible. 

  1. Data Residency and Compliance 

Routing sensitive enterprise content through externally hosted AI APIs conflicted with SOC 2 and ISO 27001 controls. Any production deployment required inference to remain within FutureCraft’s own AWS boundary. 

  2. Cost Unpredictability at Scale 

Token-based billing from external providers created unacceptable budget variance at projected volumes of 2M+ inference requests per month. 

  3. Latency Under Production Load 

Real-time reasoning use cases required sub-500ms response times. External API round-trip latency averaged ~780ms at p95 during testing, consistently missing the threshold. 

  4. No Observability Path 

Without internal hosting, there was no way to instrument inference requests, trace errors, or integrate AI workloads into existing CloudWatch and CloudTrail governance pipelines. 

The Solution 

Teleglobal designed a production-ready GenAI architecture on AWS covering the complete AI lifecycle. 

This included model selection, inference deployment, application hosting, data services, security, and observability, all provisioned through automated GitHub Actions pipelines from day one. 

Model Selection 

Evaluation Criteria 

Before shortlisting any model, Teleglobal defined five criteria that the chosen model had to meet for FutureCraft’s use case: 

  • Data sovereignty: must be deployable via a private VPC endpoint with no inference traffic leaving FutureCraft’s AWS environment 
  • Advanced reasoning: must handle complex enterprise document interpretation, semantic understanding, and contextual response generation 
  • Multimodal capability: must process both structured and unstructured data sources including documents, metadata, and knowledge graphs 
  • Production latency: must support sub-500ms p95 response times under real application load 
  • Compliance readiness: must support SOC 2 and ISO 27001 audit trails without additional instrumentation 

Models Evaluated 

Three model options were shortlisted and evaluated against FutureCraft’s criteria: 

  • Claude 3.5 Sonnet via Amazon Bedrock, managed foundation model, private VPC endpoint access 
  • GPT-4o via OpenAI API, externally hosted, leading proprietary model 
  • Llama 3.1 70B self-hosted on SageMaker, open-source, self-managed inference 

| Parameter | Claude 3.5 Sonnet (Bedrock) ✔ Selected | GPT-4o (OpenAI API) | Llama 3.1 70B (Self-hosted) |
| --- | --- | --- | --- |
| VPC private endpoint (no public internet) | ✔ Yes, native Bedrock VPC endpoint | ✘ No, all traffic via OpenAI servers | ✔ Yes, self-hosted on SageMaker |
| Data stays in AWS boundary | ✔ Yes, fully within FutureCraft AWS | ✘ No, data leaves to OpenAI | ✔ Yes, self-managed |
| SOC 2 / ISO 27001 compliance fit | ✔ Native AWS audit trails via CloudTrail | ✘ External, cannot audit inference | ⚠ Requires custom instrumentation |
| Advanced reasoning | ✔ Strong, complex doc understanding | ✔ Strong | ⚠ Good, less capable at 70B scale |
| Multimodal support | ✔ Yes, text, documents, structured data | ✔ Yes | ✘ Text only (70B variant) |
| Production latency | ✔ ~330ms p95 (post-deployment) | ✘ ~780ms p95, failed 500ms threshold | ⚠ Variable, depends on GPU config |
| Cost model at 2M+ requests/month | ✔ Predictable, AWS infrastructure cost | ✘ Unpredictable, per-token billing | ✔ Predictable, fixed compute cost |
| Managed infrastructure | ✔ Fully managed by AWS | ✔ Fully managed by OpenAI | ✘ Team must manage model + infra |
| Native AWS service integration | ✔ Direct: CloudWatch, CloudTrail, IAM | ✘ External, API only | ⚠ Partial, via SageMaker only |
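The selection logic in the table above amounts to a hard filter: a model is only viable if it clears every criterion outright, with no partial marks. A minimal sketch of that filter, encoding the table's ✔/⚠/✘ marks for the five hard criteria (criterion keys and score labels are illustrative, not from any real API):

```python
# Encode the comparison table's marks: "pass" (✔), "partial" (⚠), "fail" (✘).
# Criterion names are illustrative shorthand for the five evaluation criteria.

CRITERIA = [
    "vpc_private_endpoint", "data_in_aws", "compliance_fit",
    "reasoning", "latency",
]

MODELS = {
    "Claude 3.5 Sonnet (Bedrock)": {
        "vpc_private_endpoint": "pass", "data_in_aws": "pass",
        "compliance_fit": "pass", "reasoning": "pass", "latency": "pass",
    },
    "GPT-4o (OpenAI API)": {
        "vpc_private_endpoint": "fail", "data_in_aws": "fail",
        "compliance_fit": "fail", "reasoning": "pass", "latency": "fail",
    },
    "Llama 3.1 70B (self-hosted)": {
        "vpc_private_endpoint": "pass", "data_in_aws": "pass",
        "compliance_fit": "partial", "reasoning": "partial", "latency": "partial",
    },
}

def viable(scores: dict) -> bool:
    """A model is viable only if it passes every criterion outright."""
    return all(scores[c] == "pass" for c in CRITERIA)

selected = [name for name, scores in MODELS.items() if viable(scores)]
```

Run against the table's marks, only Claude 3.5 Sonnet on Bedrock survives the filter, which is exactly the decision described below.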

Why Claude 3.5 Sonnet on Bedrock Was Selected 

Claude 3.5 Sonnet via Amazon Bedrock was the only option that satisfied all five criteria simultaneously. The decision points were: 

  • GPT-4o rejected: GPT-4o was eliminated on the first criterion, as all inference would route through OpenAI's servers, directly violating SOC 2 and ISO 27001 data residency requirements. External API latency (~780ms p95) also consistently failed FutureCraft's sub-500ms threshold during pre-engagement testing. 
  • Llama 3.1 70B rejected: Llama 3.1 70B was viable on data sovereignty but introduced significant operational burden; the team would need to manage model serving, GPU infrastructure, scaling, and monitoring themselves. At the 70B parameter scale, reasoning quality was also below Claude 3.5 Sonnet for complex document interpretation tasks. 
  • Claude 3.5 Sonnet selected: Claude 3.5 Sonnet on Bedrock delivered private VPC endpoint access, native CloudTrail audit integration, managed infrastructure with no operational overhead, and the strongest reasoning performance, all within FutureCraft’s AWS boundary from day one. 

AI Workflow and Orchestration 

FutureCraft uses a multi-agent AI architecture to automate complex workflows. It operates across four layers: 

  • Application Layer: user queries and documents submitted to application services 
  • AI Orchestration Layer: requests routed through LangGraph multi-agent workflows to inference endpoints 
  • Model Inference Layer: Claude 3.5 Sonnet on Bedrock processes requests, generates embeddings and contextual responses 
  • Data and Knowledge Layer: hybrid RAG pipelines retrieve relevant knowledge graph data from OpenSearch and Neptune
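The four-layer request path above can be sketched as a simple pipeline. This is an illustrative stand-in only: every function body here is a stub, and in the real platform these steps are LangGraph workflows, the Bedrock runtime endpoint, and OpenSearch/Neptune lookups.

```python
# Illustrative sketch of the four-layer request path. All function bodies
# are hypothetical stubs, not the platform's actual implementation.

def retrieve_context(query: str) -> list[str]:
    # Data and Knowledge Layer: hybrid RAG retrieval (stubbed here).
    return [f"kg-fact-for:{query}"]

def invoke_model(query: str, context: list[str]) -> str:
    # Model Inference Layer: Claude 3.5 Sonnet on Bedrock (stubbed here).
    return f"answer({query}, ctx={len(context)})"

def orchestrate(query: str) -> str:
    # AI Orchestration Layer: route through retrieval, then inference.
    context = retrieve_context(query)
    return invoke_model(query, context)

# Application Layer: a user query enters the pipeline.
response = orchestrate("Which assets back token T-101?")
```

The point of the layering is that the application never talks to the model directly: retrieval and inference are composed behind the orchestration layer, so either can change without touching application code.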

AI Tasks Performed 

  • Multimodal document analysis: regulatory and compliance documents, asset metadata 
  • Named entity recognition from enterprise documents 
  • Relationship mapping for knowledge graph generation 
  • Hybrid Retrieval Augmented Generation (RAG) for intelligent knowledge retrieval 
  • Orchestration of multi-agent AI workflows via LangGraph

GPU Infrastructure 

AI workloads run on Amazon EC2 G5.2xlarge instances powered by NVIDIA A10G GPUs. 

| Specification | Detail |
| --- | --- |
| GPU | NVIDIA A10G Tensor Core GPU |
| GPU Memory | 24 GB VRAM |
| CUDA Cores | 6,144 |
| vCPUs | 8 |
| System Memory | 32 GB |
| Network Bandwidth | Up to 25 Gbps |

GPU workload allocation: 

| Workload | GPU Utilization |
| --- | --- |
| Regulatory document analysis and Q&A | 60% |
| Knowledge graph generation | 25% |
| Model fine-tuning and learning | 10% |
| Validation and QA | 5% |
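The allocation shares above must cover the GPU fully, and they translate directly into GPU-hours per workload. A small sketch, assuming a single always-on instance (~730 hours per month; that instance count is an assumption, not stated in the table):

```python
# Workload split from the allocation table, as fractions of GPU time.
ALLOCATION = {
    "regulatory_doc_analysis_qa": 0.60,
    "knowledge_graph_generation": 0.25,
    "fine_tuning": 0.10,
    "validation_qa": 0.05,
}

# Shares must sum to 100% of available GPU time.
assert abs(sum(ALLOCATION.values()) - 1.0) < 1e-9

HOURS_PER_MONTH = 730  # assumption: one always-on G5.2xlarge instance

gpu_hours = {w: round(s * HOURS_PER_MONTH, 1) for w, s in ALLOCATION.items()}
```

Under that assumption, regulatory document analysis consumes roughly 438 GPU-hours a month, while validation and QA consume about 36.5.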

Solution Architecture 

GenAI Inference Layer 

  • Amazon Bedrock (Claude 3.5 Sonnet): private VPC endpoint only, no public internet exposure 
  • API Gateway and Lambda: standardised REST interfaces for internal microservices 
  • ElastiCache (Redis): inference context caching, 42% latency reduction vs. direct RDS 
  • OpenSearch: vector embedding indexing, semantic search, and RAG-ready retrieval 
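The ElastiCache layer above follows a standard cache-aside pattern: check Redis for a previously computed inference context before invoking Bedrock. A minimal sketch of that pattern, where a plain dict stands in for Redis and `bedrock_invoke` is a hypothetical stub (real code would use boto3's `bedrock-runtime` client through the private VPC endpoint):

```python
# Cache-aside sketch of the ElastiCache inference-context layer.
# A dict stands in for Redis; bedrock_invoke is a hypothetical stub.

import hashlib

cache: dict[str, str] = {}      # stand-in for ElastiCache (Redis)
calls = {"bedrock": 0}          # counts actual inference round-trips

def bedrock_invoke(prompt: str) -> str:
    calls["bedrock"] += 1
    return f"completion-for:{prompt}"

def cached_inference(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:                 # hit: skip the Bedrock round-trip
        return cache[key]
    result = bedrock_invoke(prompt)  # miss: invoke, then populate the cache
    cache[key] = result
    return result

first = cached_inference("summarise asset filing A-7")
second = cached_inference("summarise asset filing A-7")  # served from cache
```

Serving repeated context lookups from memory rather than round-tripping to RDS is what drives the 42% retrieval latency reduction cited above.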

Application and Data Layer 

  • Amazon EKS: containerised workloads, EC2 worker nodes, Application Load Balancer 
  • GitHub Actions: CI/CD pipeline automated from first deployment 
  • AWS Amplify + Cognito: frontend hosting and authentication via IAM federation 
  • RDS PostgreSQL Multi-AZ: relational data with automated backups 
  • Neptune: graph queries; S3: all data stores with KMS encryption

Security and Compliance 

  • WAF, GuardDuty, Secrets Manager, KMS: all active from deployment day 
  • Least-privilege IAM and AWS Config drift detection: active from day one 
  • CloudTrail exports to OpenSearch: compliance audit queries in hours, not weeks 
  • SOC 2 and ISO 27001 posture active from day one, all workloads within FutureCraft’s AWS boundary 

AWS Services Used 

AI and Compute 

  • Amazon Bedrock: Claude 3.5 Sonnet via private VPC endpoint 
  • Amazon EC2 G5.2xlarge: NVIDIA A10G GPU for AI workloads 
  • Amazon SageMaker: real-time model inference endpoints

Orchestration and Caching 

  • Amazon EKS: container orchestration with EC2 worker nodes 
  • API Gateway and AWS Lambda: REST interfaces for internal microservices 
  • Amazon ElastiCache (Redis): inference context caching 

Data and Storage 

  • Amazon OpenSearch: vector embeddings, semantic search, RAG retrieval 
  • RDS PostgreSQL Multi-AZ: relational data with automated backups 
  • Amazon Neptune: graph-based queries 
  • Amazon S3: datasets, model artefacts, logs (KMS encrypted) 

Frontend and Access 

  • AWS Amplify: frontend hosting with continuous deployment 
  • Amazon Cognito: authentication federated into IAM roles 

Security and Governance 

  • AWS WAF, GuardDuty, Secrets Manager, KMS: active from deployment day 
  • AWS IAM: least-privilege access across all resources 
  • AWS Config: continuous drift detection 
  • CloudTrail + CloudWatch: audit logs, metrics, compliance queries 
  • GitHub Actions: CI/CD pipeline from day one 
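One common way to get per-request inference metrics into CloudWatch, consistent with the governance pipeline above, is to log them in CloudWatch Embedded Metric Format (EMF), which CloudWatch extracts into metrics automatically. A hedged sketch (the namespace, dimension, and metric names are illustrative, not taken from FutureCraft's deployment):

```python
# Sketch: emit one inference request's latency as a CloudWatch EMF log line.
# Namespace, dimensions, and metric names are illustrative assumptions.

import json
import time

def emf_record(latency_ms: float, model: str) -> str:
    """Build an EMF-formatted log line recording one inference request."""
    doc = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "GenAI/Inference",   # illustrative namespace
                "Dimensions": [["Model"]],
                "Metrics": [{"Name": "LatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "Model": model,
        "LatencyMs": latency_ms,
    }
    return json.dumps(doc)

line = emf_record(330.0, "claude-3-5-sonnet")
```

Writing such a line to a CloudWatch-shipped log stream gives latency metrics per model dimension without any separate metrics API calls, which is what closes the "no observability path" gap described in the challenge.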

Results 

| Metric | Result |
| --- | --- |
| Inference latency | 58% reduction (~780ms to ~330ms at p95) |
| Provisioning time | 87% reduction (4.5 days to under 14 hours) |
| Configuration drift incidents | 76% fewer vs. pre-engagement baseline |
| Context retrieval latency | 42% reduction via ElastiCache vs. direct RDS |
| AI inference requests monthly | 12,000+ |
| Knowledge graph entities processed | 500,000+ |
| Platform uptime | 99.9% |
| Compliance audit prep time | Weeks reduced to hours via CloudTrail + OpenSearch |
| Data sovereignty | All inference, data, and audit trails within FutureCraft AWS boundary |
| Recovery time objective (RTO) | Under 2 minutes (Multi-AZ RDS) |
| External AI API dependency | Eliminated |
| AWS services deployed | 28 services in ap-south-1 in a single engagement |

“Before this deployment, we had no credible path to production for GenAI. Data sovereignty was a blocker, external API latency was a blocker, and our compliance team would not sign off on any architecture that moved sensitive content outside our boundary. Teleglobal resolved all three in a single engagement. We went from blocked to running production inference on Bedrock, within our own VPC, with full audit trails, in under eight weeks. That timeline was not what we expected.” 

— Kamleshwar Gupta
CEO, FutureCraft Technologies Pvt. Ltd. 

What’s Next 

The platform is live and designed for continuous improvement. Planned next steps: 

  • Advanced multimodal AI: expanding document processing to support richer asset types and formats 
  • Expanded knowledge graph analytics: deeper relationship mapping across tokenized asset data 
  • Improved AI agent orchestration: more autonomous multi-agent workflows with reduced human intervention 
  • Continuous model optimisation: ongoing fine-tuning and benchmarking within the SageMaker infrastructure