
Executive Summary
The Client is a technology firm building a blockchain-based SaaS platform for tokenizing real-world assets. The platform requires a Generative AI infrastructure capable of processing enterprise documents, generating knowledge graphs, and performing intelligent content retrieval at scale.
Traditional external AI APIs introduced data residency risks, unpredictable costs, and infrastructure limitations. Teleglobal International designed and deployed a secure, GPU-powered GenAI platform entirely within The Client’s AWS environment, using Amazon SageMaker for model hosting and real-time inference.
| Metric | Result |
| --- | --- |
| AI requests / month | 12K+ |
| Knowledge graph entities | 500K+ |
| Avg. response latency | ~1.8s |
| Platform uptime | 99.9% |
| External API dependency | Zero |
- Entire GenAI platform runs within The Client’s own AWS environment
- GPU-accelerated inference on NVIDIA A10G via Amazon EC2 G5.2xlarge
- External AI API dependency fully eliminated
About The Client
The Client is a technology and development firm building a blockchain-based SaaS platform for tokenizing real-world assets including commodities, property, and physical goods.
The platform integrates four core capabilities:
- Blockchain infrastructure for decentralized asset tracking
- Semantic knowledge graphs for organizing multimodal knowledge
- GenAI-powered automation for complex document processing and reasoning
- Hybrid Retrieval Augmented Generation (RAG) pipelines for intelligent knowledge retrieval
Core platform modules include the Knowledge Graph Studio, Tokenized Asset Registry, and Agent Runtime Engine. All components run on GPU-powered infrastructure for real-time GenAI inference.
Business Use Case
The Client required a Generative AI platform capable of supporting:
- Multimodal document analysis across regulatory, compliance, and asset documentation
- Knowledge graph generation from structured and unstructured enterprise data
- Hybrid RAG-based knowledge retrieval for contextual insights
- Intelligent automation for asset tokenization processes
The platform had to support real-time inference and operate entirely within The Client’s own cloud environment to meet security and compliance requirements.
The Challenge
Traditional AI solutions based on external APIs introduced several problems that made them unsuitable for The Client’s requirements.
- Sensitive enterprise data would leave the organisation’s environment
- Unpredictable operational costs due to usage-based pricing at scale
- Lack of infrastructure control for enterprise-grade GenAI workloads
- Limitations in scaling GPU-intensive AI inference workloads
The Client required a secure, scalable GenAI infrastructure running entirely within its own AWS environment.
Model Selection
Step 1: Evaluation Criteria
Before selecting a model, Teleglobal defined five criteria the chosen model had to meet for The Client’s GenAI platform:
- Advanced reasoning capability: must interpret complex enterprise documents and generate accurate contextual insights
- Multimodal processing: must handle both structured and unstructured data sources including documents, tables, and metadata
- Real-time inference: must deliver low-latency responses suitable for application-level queries
- Scalability: must support GPU-accelerated distributed workloads on Amazon SageMaker infrastructure
- Security: must operate fully within The Client’s AWS account with no data leaving the environment
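The five criteria above can be read as a scoring rubric. A minimal sketch of how such a rubric might be weighted and applied (the weights and criterion names are illustrative assumptions, not Teleglobal's actual evaluation numbers):

```python
# Hypothetical weighting of the five evaluation criteria.
# Weights are illustrative assumptions, not the actual rubric.
CRITERIA_WEIGHTS = {
    "reasoning": 0.30,
    "multimodal": 0.20,
    "realtime": 0.20,
    "scalability": 0.15,
    "security": 0.15,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (each 0.0-1.0) into one weighted total."""
    return sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0) for c in CRITERIA_WEIGHTS)
```

A model scoring 1.0 on every criterion would total 1.0; lower totals flag weaker overall fit.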
Step 2: Models Evaluated
Three model options were shortlisted and evaluated against The Client’s criteria:
- GPT-5-class Multimodal LLM on Amazon SageMaker: state-of-the-art multimodal reasoning, self-hosted within AWS
- Claude 3.5 Sonnet via Amazon Bedrock: managed foundation model, strong reasoning, private VPC endpoint
- Llama 3.1 70B (open-source, self-hosted on SageMaker): open-source generative model, fully self-managed
| Parameter | GPT-5-class LLM (SageMaker) ✔ Selected | Claude 3.5 Sonnet (Bedrock) | Llama 3.1 70B (Self-hosted) |
| --- | --- | --- | --- |
| Advanced reasoning | ✔ State-of-the-art multimodal | ✔ Strong reasoning | ⚠ Good, limited at 70B scale |
| Multimodal processing | ✔ Yes, text, docs, images, structured data | ✔ Yes, text and documents | ✘ Text only (70B variant) |
| Real-time inference on SageMaker | ✔ Native SageMaker endpoint | ⚠ Via Bedrock, not SageMaker | ✔ Native SageMaker endpoint |
| GPU-accelerated scaling | ✔ Full GPU control on EC2 G5 | ⚠ Managed, limited GPU control | ✔ Full GPU control on EC2 G5 |
| Data stays within AWS | ✔ Yes, fully self-hosted on SageMaker | ✔ Yes, via Bedrock VPC endpoint | ✔ Yes, self-managed |
| Fine-tunable on custom data | ✔ Yes, via SageMaker training jobs | ⚠ Limited fine-tuning options | ✔ Yes, open-source weights |
| Operational overhead | ✔ Managed via SageMaker endpoints | ✔ Fully managed by AWS Bedrock | ✘ Team must manage model + infra |
| CloudWatch integration | ✔ Native via SageMaker | ⚠ Partial via Bedrock metrics | ✔ Native via SageMaker |
Step 3: Why GPT-5-class LLM on SageMaker Was Selected
The GPT-5-class multimodal LLM was the strongest match across all five criteria. The key decision points were:
- Claude 3.5 Sonnet: strong on reasoning and data sovereignty, but it runs as a managed Bedrock service, limiting direct GPU control, SageMaker-native endpoint management, and deep customisation for The Client’s GPU-intensive workloads.
- Llama 3.1 70B: viable on data sovereignty and fine-tuning, but text-only at the 70B scale, and it requires the team to fully self-manage model serving, scaling, and infrastructure.
- GPT-5-class LLM: delivered the strongest reasoning and document understanding, native multimodal support, full GPU control on EC2 G5 instances, and seamless SageMaker endpoint management, all within The Client’s AWS environment from day one.
GenAI Tasks Performed
The deployed platform supports the following GenAI capabilities:
- Multimodal document analysis: regulatory and compliance documents, asset metadata
- Named entity recognition from enterprise documents
- Relationship mapping for knowledge graph generation
- Hybrid Retrieval Augmented Generation (RAG) for contextual knowledge retrieval
- GenAI-powered automation within asset tokenization workflows
- Model fine-tuning and continuous learning on proprietary data
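The hybrid RAG retrieval listed above typically fuses results from a keyword search and a vector search. One common fusion method is reciprocal rank fusion (RRF); a minimal sketch, with document IDs that are purely illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists (e.g. keyword + vector search) with RRF."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF score
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative results from two retrievers over compliance documents:
keyword_hits = ["doc-regulatory-7", "doc-asset-2", "doc-kyc-9"]
vector_hits = ["doc-regulatory-7", "doc-meta-4", "doc-asset-2"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

A document ranked highly by both retrievers (here `doc-regulatory-7`) rises to the top of the fused list.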
GenAI Processing Pipeline
The GenAI platform processes enterprise queries and documents across four layers:
- Application Layer: user queries and documents submitted to application services
- AI Processing Layer: requests routed to GenAI inference endpoints hosted on SageMaker
- Model Inference Layer: the LLM processes the request and generates embeddings or contextual responses
- Data and Knowledge Layer: hybrid RAG pipelines retrieve relevant knowledge graph data
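The four layers above could be wired together as a simple request flow. The sketch below stubs the model call and uses hypothetical names throughout; in production the Model Inference Layer would call the real SageMaker endpoint:

```python
def application_layer(query: str) -> dict:
    """Application Layer: wrap the user query in a request envelope."""
    return {"query": query, "source": "app-service"}

def ai_processing_layer(request: dict) -> dict:
    """AI Processing Layer: route the request to the inference endpoint."""
    request["endpoint"] = "genai-inference-endpoint"  # hypothetical endpoint name
    return request

def model_inference_layer(request: dict) -> dict:
    """Model Inference Layer: stubbed LLM call returning a contextual response."""
    return {"answer": f"response to: {request['query']}"}

def knowledge_layer(response: dict, graph: dict) -> dict:
    """Data and Knowledge Layer: attach related knowledge-graph entities."""
    response["related_entities"] = graph.get("entities", [])
    return response

# Illustrative knowledge-graph fragment:
graph = {"entities": ["Asset:GoldBar-001", "Regulation:MiCA"]}
request = ai_processing_layer(application_layer("Which regulations apply?"))
result = knowledge_layer(model_inference_layer(request), graph)
```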
Data and Knowledge Sources
The GenAI system processes and analyses multiple enterprise data sources:
- Regulatory and compliance documents
- Asset metadata and tokenization records
- Enterprise datasets stored in Amazon S3
- Knowledge graph relationships generated from structured data
GPU Infrastructure
GenAI inference workloads run on Amazon EC2 G5.2xlarge instances powered by NVIDIA A10G GPUs.
| Specification | Detail |
| --- | --- |
| GPU | NVIDIA A10G Tensor Core GPU |
| GPU Memory | 24 GB VRAM |
| CUDA Cores | 6,144 |
| vCPUs | 8 |
| System Memory | 32 GB |
| Network Bandwidth | Up to 25 Gbps |
GPU workload allocation:
| Workload | GPU Utilization |
| --- | --- |
| Regulatory document analysis and Q&A | 60% |
| Knowledge graph generation | 25% |
| Model fine-tuning and learning | 10% |
| Validation and QA | 5% |
Platform Architecture
The platform follows a GPU-accelerated GenAI pattern with four layers: Application Layer, AI Processing Layer, Model Inference Layer, and Data and Knowledge Layer.
GenAI Inference Layer
- Amazon SageMaker: model hosting and real-time GenAI inference endpoints
- Amazon EC2 G5.2xlarge: GPU compute for high-performance LLM inference
- Hybrid RAG pipelines: contextual knowledge retrieval from enterprise data sources
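Application services reach the real-time endpoint through the SageMaker Runtime API. A hedged sketch of building and sending a request with `boto3` (endpoint name and JSON schema are assumptions, as the payload format depends on the deployed model container):

```python
import json

def build_inference_request(prompt: str, max_tokens: int = 512) -> bytes:
    """Serialize a prompt into a JSON body; the schema here is assumed."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_tokens}}
    ).encode("utf-8")

# Invoking the endpoint requires AWS credentials and a deployed model,
# so the call itself is shown for illustration only:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="genai-inference-endpoint",  # hypothetical name
#     ContentType="application/json",
#     Body=build_inference_request("Summarise this compliance document."),
# )
# result = json.loads(response["Body"].read())
```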
Data and Storage Layer
- Amazon S3: storage for datasets, model artefacts, training data, and enterprise content
Security and Governance
- Amazon VPC: private subnets for secure network isolation of all GenAI workloads
- AWS IAM: role-based access control with least-privilege policies
- Encryption at rest and in transit across all data stores and inference endpoints
- Restricted access to AI inference endpoints and enterprise data
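A least-privilege policy for the inference path might grant the application role only the ability to invoke one specific endpoint. The sketch below uses a placeholder account ID and endpoint name, not The Client's real resources:

```python
# Least-privilege IAM policy sketch: the application role may invoke only
# one named SageMaker endpoint. The ARN below is a placeholder.
INVOKE_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:*:123456789012:endpoint/genai-inference-endpoint",
        }
    ],
}
```

Scoping `Resource` to a single endpoint ARN (rather than `*`) is what keeps the policy least-privilege.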
Observability and Monitoring
- Amazon CloudWatch: inference latency, endpoint health, GPU utilization, request throughput
- Infrastructure performance monitoring and automated alerting
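Latency alerting of the kind described above can be defined as a CloudWatch alarm on the endpoint's `ModelLatency` metric (reported in microseconds in the `AWS/SageMaker` namespace). A sketch with placeholder endpoint and variant names:

```python
# CloudWatch alarm definition for sustained high inference latency.
# ModelLatency in AWS/SageMaker is reported in microseconds; the endpoint
# and variant names below are placeholders.
LATENCY_ALARM = {
    "AlarmName": "genai-endpoint-high-latency",
    "Namespace": "AWS/SageMaker",
    "MetricName": "ModelLatency",
    "Dimensions": [
        {"Name": "EndpointName", "Value": "genai-inference-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 3,
    "Threshold": 2_000_000,  # 2 seconds, expressed in microseconds
    "ComparisonOperator": "GreaterThanThreshold",
}
# boto3.client("cloudwatch").put_metric_alarm(**LATENCY_ALARM) would register it.
```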
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| AI and Compute | Amazon SageMaker: model hosting and real-time inference endpoints |
| GPU Compute | Amazon EC2 G5.2xlarge: NVIDIA A10G GPU for GenAI workloads |
| Storage | Amazon S3: datasets, model artefacts, and training data |
| Networking | Amazon VPC: secure networking and private infrastructure |
| Security | AWS IAM: role-based access control and least-privilege policies |
| Monitoring | Amazon CloudWatch: infrastructure performance and inference metrics |
Scalability and Reliability
The GenAI platform is designed to support enterprise-scale workloads:
- GPU-based inference scaling for high-throughput document processing
- Load distribution across SageMaker inference endpoints
- High-availability infrastructure for continuous GenAI operations
- Scalable microservices architecture supporting modular expansion
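Load distribution across SageMaker endpoints is usually implemented with Application Auto Scaling on the endpoint variant's instance count. A sketch of the two configuration payloads involved (endpoint name, variant name, and capacity limits are assumptions):

```python
# Sketch of SageMaker endpoint auto scaling via Application Auto Scaling.
# Names and capacities are illustrative placeholders.
ENDPOINT, VARIANT = "genai-inference-endpoint", "AllTraffic"
RESOURCE_ID = f"endpoint/{ENDPOINT}/variant/{VARIANT}"

SCALABLE_TARGET = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

SCALING_POLICY = {
    "PolicyName": "genai-invocations-per-instance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 50.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**SCALABLE_TARGET)
# client.put_scaling_policy(**SCALING_POLICY)
```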
Cost Optimisation
The Client optimised infrastructure costs through:
- GPU instance right-sizing on EC2 G5.2xlarge for actual workload requirements
- Workload scheduling to maximise GPU utilisation efficiency
- Self-hosted model eliminating all external AI API usage fees
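A back-of-envelope view of the self-hosted economics. The hourly rate below is an illustrative on-demand figure for a g5.2xlarge in us-east-1, not a quoted price; check current AWS pricing before relying on it:

```python
# Rough monthly cost of one always-on g5.2xlarge instance.
# HOURLY_RATE is an illustrative assumption, not current AWS pricing.
HOURLY_RATE = 1.21          # USD per hour (assumed)
HOURS_PER_MONTH = 730
monthly_gpu_cost = HOURLY_RATE * HOURS_PER_MONTH

# At the reported 12,000 requests per month, the effective cost per request:
cost_per_request = monthly_gpu_cost / 12_000
```

Unlike usage-based API pricing, this cost is fixed, so the per-request cost falls as throughput grows.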
Results
| Metric | Result |
| --- | --- |
| AI inference requests processed | 12,000+ per month |
| Average response latency | ~1.8 seconds |
| Knowledge graph entities processed | 500,000+ |
| Platform availability | 99.9% uptime |
| External AI API dependency | Eliminated |
| Data residency | All GenAI workloads within The Client’s AWS environment |
What’s Next
The Client plans to continue enhancing its GenAI platform:
- Advanced multimodal AI capabilities for richer document and data processing
- Expanded knowledge graph analytics for deeper asset relationship mapping
- Continuous model optimisation through fine-tuning on proprietary enterprise data
- Enhanced monitoring and analytics for deeper GenAI inference insights
Deployment and Lifecycle Management
The GenAI platform follows modern DevOps and MLOps practices:
- Automated model deployment and versioning using Amazon SageMaker
- Infrastructure monitoring and alerting via Amazon CloudWatch
- Continuous improvement through model fine-tuning on new enterprise data
- Dataset and model artefact versioning for reproducibility