
Executive Summary
Auxy AI provides a Conversational AI platform that automates customer engagement through agentic AI-powered voice capabilities. The platform delivers a 24/7 AI Voice Agent capable of handling inbound and outbound customer calls autonomously, using agentic AI to understand intent, respond in real time, and take action without human involvement.
As the platform scaled, Auxy AI needed production-grade AWS infrastructure to support real-time agentic AI inference, continuous voice workloads, multi-environment deployments, and enterprise-grade security. Teleglobal International designed and deployed a cloud-native containerised architecture on AWS to meet this need.
- 24/7 AI voice availability
- 3 isolated environments (Production, Development, Testing)
- Zero manual release steps
- 6+ security controls active
- 99.9% platform uptime target
- Agentic AI voice inference powered by Amazon SageMaker real-time endpoints
- Automated CI/CD pipeline via GitHub Actions across all environments
- Full security stack active from day one: IAM, KMS, Secrets Manager, GuardDuty, WAF, CloudTrail
About Auxy AI
Auxy AI delivers an always-on agentic AI voice platform for enterprise customer engagement. The platform deploys AI voice agents that autonomously understand, respond to, and complete voice-based customer interactions at scale, without requiring human intervention.
Core capabilities include:
- Lead qualification during inbound and outbound voice calls
- Automated agentic AI responses to frequently asked questions
- Appointment scheduling and booking via voice
- Integration with enterprise backend systems for real-time data access
The Challenge
Auxy AI’s agentic AI voice platform was growing, but it was running without production-grade infrastructure. Four gaps put reliability and enterprise growth at risk.
1. Continuous Agentic AI Voice Workloads
The platform operates 24/7. Agentic AI inference for voice requires low latency, high reliability, and scalable compute. Any downtime or latency spike directly impacts live customer calls handled by AI voice agents.
2. Environment Segregation
The team needed separate Production, Development, and Testing environments to ensure safe testing and controlled releases without impacting live agentic AI voice services.
3. Deployment Automation
Manual infrastructure provisioning was causing configuration drift between environments, inconsistent security enforcement, and slow release cycles. A fully automated CI/CD pipeline was needed.
4. Security and Governance
The platform lacked enterprise-grade security controls. There was no secure credential management, no encryption baseline, no threat detection, and no audit logging. These were blockers for enterprise contracts.
Model Selection
Step 1: Evaluation Criteria
Before selecting an AI model for the agentic voice platform, Teleglobal defined five criteria the chosen model had to meet:
- Real-time inference latency: must deliver low-latency responses suitable for live voice conversations handled by AI agents
- Conversational reasoning quality: must handle multi-turn dialogue, intent recognition, and contextual response generation for agentic interactions
- Self-hostable on AWS: must run within Auxy AI’s own AWS environment via SageMaker, with no call data leaving the boundary
- Fine-tunable on custom data: must support training on Auxy AI’s proprietary conversation recordings to improve agent intent accuracy
- Scalability: must support GPU-accelerated distributed inference on SageMaker to handle concurrent agentic voice sessions
Step 2: Models Evaluated
Three model options were shortlisted and evaluated against Auxy AI’s agentic voice platform requirements:
- Fine-tuned Conversational LLM on Amazon SageMaker: self-hosted, customised for agentic voice interaction use cases
- GPT-4o via OpenAI API: externally hosted, general-purpose conversational model
- Llama 3.1 8B Instruct (open-source, self-hosted on SageMaker): lightweight open-source model, fully self-managed
| Parameter | Fine-tuned LLM on SageMaker (Selected) | GPT-4o (OpenAI API) | Llama 3.1 8B (Self-hosted) |
| --- | --- | --- | --- |
| Data stays within AWS | Yes, fully self-hosted on SageMaker | No, call data routes to OpenAI | Yes, self-managed on SageMaker |
| Real-time inference latency | Optimised for voice workloads | External API round-trip overhead | Low latency, lightweight model |
| Fine-tunable on custom data | Yes, via SageMaker training jobs | Not within Auxy AI’s AWS boundary | Yes, open-source weights |
| Conversational reasoning quality | Strong, optimised for voice intent | Strong general reasoning | Good, limited at 8B scale |
| GPU-accelerated scaling on SageMaker | Native SageMaker endpoint | Not applicable, external API | Native SageMaker endpoint |
| Cost model at scale | Predictable, fixed SageMaker cost | Unpredictable, per-token billing | Predictable, fixed compute cost |
| CloudWatch monitoring integration | Native via SageMaker metrics | External, no CloudWatch | Native via SageMaker |
| Operational overhead | Managed via SageMaker endpoints | Fully managed by OpenAI | Team must manage model and infra |
Step 3: Why the Fine-tuned LLM on SageMaker Was Selected
The fine-tuned conversational LLM on Amazon SageMaker was the strongest fit across all five criteria:
- GPT-4o rejected: All voice call data would route through OpenAI’s servers, which is unacceptable for enterprise clients with data privacy requirements. Per-token billing also creates unpredictable costs at scale, and there is no path to fine-tuning on Auxy AI’s proprietary recordings within its own AWS environment.
- Llama 3.1 8B rejected: Viable on data privacy and runs on SageMaker, but reasoning quality at the 8B scale was insufficient for complex multi-turn agentic voice conversations. The team would also need to fully manage model serving, scaling, and updates.
- Fine-tuned LLM selected: Delivers low-latency agentic AI inference optimised for voice, runs entirely within Auxy AI’s AWS environment, supports fine-tuning on proprietary call recordings for improved agent intent recognition, and integrates natively with CloudWatch for monitoring and cost visibility.
The Solution
Teleglobal designed and deployed a cloud-native infrastructure on AWS, purpose-built for agentic AI voice workloads. The platform covers the full stack: AI model hosting, container orchestration, CI/CD automation, data services, security, and observability.
Agentic AI Voice Inference
- Amazon SageMaker hosting the fine-tuned conversational LLM as a real-time inference endpoint for AI voice agents
- Low-latency agentic AI responses enabling autonomous handling of live voice call interactions
- SageMaker training pipeline ready for continuous model improvement using Auxy AI’s conversation recordings
- Amazon ElastiCache providing sub-millisecond session caching during active agentic voice calls
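To make the inference flow concrete, the sketch below shows how an application service might call the real-time SageMaker endpoint for one conversational turn. The endpoint name, payload schema, and generation parameters are illustrative assumptions — the case study does not publish them, and the real request format depends on the deployed model container.

```python
import json

def build_payload(transcript: str, session_id: str) -> bytes:
    """Serialise one conversational turn for the endpoint.
    Field names here are illustrative; the real schema depends on
    the model serving container."""
    return json.dumps({
        "inputs": transcript,
        "parameters": {"max_new_tokens": 128, "temperature": 0.3},
        "session_id": session_id,
    }).encode("utf-8")

def invoke_voice_agent(transcript: str, session_id: str,
                       endpoint_name: str = "auxy-voice-agent-prod") -> str:
    """Call the real-time SageMaker endpoint for a single turn.
    boto3 is imported lazily so the payload helper stays testable
    offline; the endpoint name is a hypothetical placeholder."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(transcript, session_id),
    )
    return json.loads(resp["Body"].read())["generated_text"]
```

Keeping payload construction separate from the network call makes the request format unit-testable without AWS credentials, which matters when the same code path serves live voice calls.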
Container Infrastructure
- Amazon EKS orchestrating containerised agentic AI application services across three environments
- Production environment: 2 worker nodes for continuous voice workload handling
- Development and Testing environments: isolated 1-node clusters for safe pre-production work
- Amazon ECR managing container images with automated build and security scanning
- Application Load Balancer distributing traffic across EKS worker nodes
CI/CD and Deployment Automation
- GitHub Actions pipeline automating container image build, security scan, and EKS deployment
- Consistent deployments across all three environments, eliminating configuration drift
- Faster release cycles with no manual provisioning steps
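A pipeline of this shape can be sketched as a GitHub Actions workflow. This is an illustrative fragment only — the repository, image, role, and cluster names are placeholders, not Auxy AI’s actual configuration:

```yaml
# Illustrative sketch -- names and region are placeholders.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - name: Build and push image
        run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/voice-agent:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/voice-agent:${{ github.sha }}
      - name: Deploy to EKS
        run: |
          aws eks update-kubeconfig --name voice-agent-prod
          kubectl set image deployment/voice-agent app=${{ steps.ecr.outputs.registry }}/voice-agent:${{ github.sha }}
```

Pushing images tagged by commit SHA, with ECR scan-on-push enabled, is one common way to get the build-scan-deploy sequence the section describes; the same workflow can target the Dev and Testing clusters from their own branches to keep all three environments consistent.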
Data and Storage
- Amazon RDS (MySQL): managed relational database for application data
- Amazon S3: storage for conversation recordings, model artefacts, and training data
- Amazon ElastiCache: application-layer caching for session and response data
Security — Active from Day One
- Amazon VPC: all workloads in private subnets with secure network segmentation
- AWS IAM: role-based access control with least-privilege policies
- AWS Secrets Manager: secure credential and configuration management
- AWS KMS: encryption at rest and in transit across all data stores
- Amazon GuardDuty: continuous threat monitoring and anomaly detection
- AWS WAF: external traffic protection at the application boundary
- AWS CloudTrail: 100% API and infrastructure activity logging for compliance
AWS Services Used
| Category | Service |
| --- | --- |
| AI/ML Enablement | Amazon SageMaker: GenAI model hosting and real-time inference endpoints |
| Container Orchestration | Amazon Elastic Kubernetes Service (EKS) |
| Container Registry | Amazon Elastic Container Registry (ECR) |
| Database | Amazon RDS (MySQL) |
| Object Storage | Amazon S3 |
| Caching | Amazon ElastiCache |
| Networking | Amazon VPC |
| Security | AWS IAM, AWS Secrets Manager, AWS KMS |
| Threat Detection | Amazon GuardDuty |
| Web Protection | AWS WAF |
| Monitoring | Amazon CloudWatch |
| Audit Logging | AWS CloudTrail |
Observability and Monitoring
Teleglobal implemented enterprise-grade monitoring across the entire agentic AI voice platform.
Infrastructure Monitoring
Amazon CloudWatch monitors:
- Agentic AI inference endpoint health and latency
- Container performance and application metrics
- System logs across all three environments
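Endpoint latency monitoring of this kind typically means a CloudWatch alarm on the SageMaker `ModelLatency` metric (reported in microseconds). The sketch below builds the alarm specification as plain data; the endpoint name, variant, and threshold are illustrative assumptions, not Auxy AI's actual values:

```python
def latency_alarm_spec(endpoint_name: str, threshold_us: int = 500_000) -> dict:
    """Build kwargs for cloudwatch.put_metric_alarm() on a SageMaker
    endpoint's ModelLatency metric. Threshold and names are illustrative."""
    return {
        "AlarmName": f"{endpoint_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        "Statistic": "Average",
        "Period": 60,                 # evaluate one-minute windows
        "EvaluationPeriods": 3,       # three breaching windows before alarm
        "Threshold": threshold_us,    # ModelLatency is in microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }

def create_latency_alarm(endpoint_name: str) -> None:
    """Apply the alarm; requires AWS credentials at runtime."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**latency_alarm_spec(endpoint_name))
```

Separating the alarm definition from the API call lets the spec be reviewed and tested like configuration, which suits the infrastructure-as-code posture the rest of the platform follows.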
Audit and Security Monitoring
AWS CloudTrail captures all infrastructure and API activity, providing full operational transparency and the compliance audit trail enterprise clients require.
Results
| Metric | Result |
| --- | --- |
| AI Voice Availability | Continuous 24/7 agentic AI voice operations |
| Agentic AI Inference | Real-time, low-latency responses via SageMaker endpoints |
| Infrastructure Scalability | Multi-environment Kubernetes deployment across Production, Dev, and Testing |
| Deployment Automation | Full CI/CD pipeline via GitHub Actions, zero manual steps |
| Security Posture | Encryption, threat detection, and audit logging active from day one |
| Monitoring Visibility | Real-time observability across all services via CloudWatch |
| Operational Reliability | High-availability infrastructure for continuous voice workloads |
| Model Improvement Path | SageMaker training pipeline ready for fine-tuning on call recordings |
“The platform Teleglobal delivered gives us the foundation we needed to scale our agentic AI voice product with confidence. We can release faster, demonstrate security controls to enterprise clients on demand, and improve our AI models continuously using our own data. That combination is what enterprise growth requires.”
— Auxy AI
What’s Next
Auxy AI plans to continue expanding its agentic AI voice platform. Planned improvements include:
- Agentic voice capability expansion: fine-tuning the SageMaker model on proprietary conversation recordings to improve agent intent recognition and response quality
- Infrastructure optimisation: continuous right-sizing and performance tuning for agentic AI inference workloads
- Advanced monitoring: deeper analytics on agentic AI inference patterns and voice interaction quality via CloudWatch
About Teleglobal International
Teleglobal International is an AWS Partner specialising in cloud-native agentic AI infrastructure, enterprise application modernisation, and scalable AI platform delivery. Teleglobal helps organisations design and deploy production-grade AI systems that are secure, cost-efficient, and built to grow.