How Teleglobal International Built a Secure GenAI-Powered Recruitment Intelligence Platform on AWS for Forte AI

Executive Summary

Forte AI is an HR technology company developing AI-powered talent management and recruitment intelligence platforms. The platform replaces traditional static resumes and fragmented job boards with adaptive professional profiles that evolve dynamically based on skills, experience, achievements, and workforce trends. 

To scale its GenAI capabilities, Forte AI partnered with Teleglobal International to build a secure, production-grade Generative AI platform on AWS. The solution uses open-source LLMs hosted via Ollama on Amazon EC2, running entirely within Forte AI’s own cloud environment, with no candidate data sent to external providers. 

The platform integrates Retrieval-Augmented Generation (RAG), GPU-based inference, and enterprise-grade monitoring to deliver intelligent recruitment insights at scale. 

  • RAG: context-aware matching 
  • GPU+CPU: hybrid inference infrastructure 
  • Auto Scaling: on-demand capacity 
  • 100% of data within the AWS boundary 
  • Open-source LLMs hosted on Amazon EC2 via Ollama, no external AI API costs 
  • RAG architecture delivering context-aware candidate matching and hiring insights 
  • All candidate and HR data processed within Forte AI’s secure AWS environment 

About Forte AI 

Forte AI is building an AI-native recruitment intelligence platform that transforms how organisations attract, evaluate, and retain talent. 

The platform moves beyond keyword-based hiring by using GenAI to understand candidate skills, career progression, and workforce trends. Core AI-powered capabilities include: 

  • Context-aware candidate matching based on skills and experience patterns 
  • Conversational hiring assistants for recruiters and hiring managers 
  • Talent sentiment analysis across candidate interactions 
  • Predictive workforce insights and hiring trend analytics 

As the platform scaled, Forte AI required a production-grade GenAI infrastructure that could handle enterprise workloads while keeping sensitive HR data entirely within its own cloud environment. 

The Challenge 

Traditional recruitment platforms rely on keyword matching and manual evaluation, leading to poor candidate-job alignment, slow hiring cycles, and limited workforce insights. Forte AI needed GenAI capabilities, but four specific challenges blocked production deployment. 

  1. Context-Aware Candidate Matching 

Existing systems used static profile data and keyword matching. Forte AI needed an AI system capable of understanding context, skills progression, and career patterns to surface strong candidates who might not match predefined keywords. 

  2. Conversational Hiring Assistants 

Recruiters needed a GenAI assistant capable of summarising candidate profiles, recommending candidates for roles, answering workforce queries, and providing hiring insights — going well beyond rule-based systems. 

  3. Sensitive HR Data Governance 

Recruitment platforms process highly sensitive data including personal candidate profiles, employment history, compensation expectations, and internal hiring records. Sending this data to external AI providers was not acceptable. 

  4. Infrastructure Limitations 

Forte AI had no production-grade infrastructure for deploying open-source LLMs. Specific gaps included: 

  • No scalable GPU-based AI inference infrastructure 
  • No standardised deployment pipeline for LLMs 
  • No observability tooling for model performance 
  • No RAG architecture for contextual knowledge retrieval 

Model Selection 

Step 1: Evaluation Criteria 

Before shortlisting any model, Teleglobal defined five criteria the chosen model had to meet for Forte AI’s recruitment GenAI platform: 

  • Inference latency: must deliver near real-time responses suitable for live recruiter and hiring manager queries 
  • Conversational reasoning quality: must handle multi-turn dialogue, candidate profile summarisation, and contextual job matching 
  • RAG compatibility: must work effectively with Retrieval-Augmented Generation pipelines for grounding responses in real candidate and job data 
  • Self-hostable on AWS via Ollama: must run within Forte AI’s own EC2 infrastructure with no candidate data leaving the environment 
  • Cost efficiency: must eliminate per-query API charges, enabling predictable operational costs at scale 

Step 2: Models Evaluated 

Three open-source models were shortlisted and evaluated against Forte AI’s requirements. All three are compatible with Ollama and deployable on Amazon EC2. 

  • Llama 3.1 8B (Meta): general-purpose instruction-tuned model, strong reasoning, widely used for enterprise deployments 
  • Mistral 7B (Mistral AI): compact, efficient model known for strong performance-to-size ratio 
  • Gemma 2 9B (Google): instruction-tuned model with strong document understanding and knowledge retrieval capabilities 
| Parameter | Llama 3.1 8B (Meta) ✔ Selected | Mistral 7B (Mistral AI) | Gemma 2 9B (Google) |
| --- | --- | --- | --- |
| Ollama compatibility | ✔ Full native support | ✔ Full native support | ✔ Full native support |
| Conversational reasoning quality | ✔ Strong, multi-turn instruction following | ⚠ Good, optimised for speed over depth | ✔ Strong, good document understanding |
| RAG pipeline compatibility | ✔ Excellent, strong context grounding | ✔ Good RAG performance | ✔ Good, strong at knowledge retrieval |
| HR domain adaptability | ✔ Strong instruction tuning for domain tasks | ⚠ General purpose, less domain depth | ⚠ General purpose, less HR-specific tuning |
| Inference latency on EC2 | ✔ Efficient on both CPU and GPU instances | ✔ Very fast, smallest model size | ⚠ Slightly heavier, higher compute need |
| Model size and cost | ✔ 8B parameters, efficient resource use | ✔ 7B parameters, lightest option | ⚠ 9B parameters, slightly higher cost |
| Community and enterprise support | ✔ Largest active community, Meta backing | ⚠ Strong but smaller ecosystem | ⚠ Growing, Google-backed |
| Fine-tuning capability | ✔ Yes, open weights, widely documented | ✔ Yes, open weights | ✔ Yes, open weights |

Evaluation method: Each model was tested on a representative set of recruitment tasks — candidate profile summarisation, job-to-candidate matching queries, and multi-turn hiring assistant conversations — using Forte AI’s internal content as the knowledge base for RAG retrieval. Models were assessed on response relevance, factual grounding, and conversational coherence by Teleglobal’s engineering team. 

Step 3: Why Llama 3.1 8B Was Selected 

Llama 3.1 8B delivered the strongest overall performance across the criteria that matter most for Forte AI’s recruitment use case. 

  • Mistral 7B: the fastest and lightest option, but its optimisation for speed over reasoning depth made it less suitable for complex multi-turn hiring assistant conversations and nuanced candidate profile analysis. It performed well on simple queries but fell short on recruitment tasks requiring deeper contextual understanding. 
  • Gemma 2 9B: strong document understanding and knowledge retrieval, but a slightly higher compute footprint at 9B parameters, adding unnecessary infrastructure cost for Forte AI’s workload profile. Its HR-domain adaptability was also less refined than Llama 3.1’s. 
  • Llama 3.1 8B (selected): the best balance of reasoning quality, RAG compatibility, inference efficiency, and HR domain adaptability. Its strong instruction-following made it the most effective model for candidate matching, hiring assistant queries, and workforce insights, while the largest open-source community and Meta backing ensure long-term support and ongoing model improvements. 

The Solution 

Teleglobal designed and deployed a fully managed GenAI architecture on AWS tailored specifically for recruitment intelligence. Instead of relying on external proprietary AI APIs, the platform uses open-source LLMs hosted entirely within Forte AI’s AWS environment. 

GenAI Inference Layer 

Open-source LLMs run via Ollama on Amazon EC2, supporting both CPU and GPU-based inference depending on workload requirements. 

  • CPU-based EC2 instances handle lightweight inference tasks and low-concurrency queries 
  • GPU-based EC2 instances handle high-performance model inference for complex recruitment analysis 
  • Hybrid approach optimises infrastructure cost while maintaining low latency for GenAI workloads 
  • Amazon EC2 Auto Scaling Groups scale compute capacity dynamically based on recruitment activity volume 

Retrieval-Augmented Generation (RAG) Architecture 

To improve response accuracy and relevance, Teleglobal implemented RAG pipelines connecting the LLM to Forte AI’s internal recruitment data. 

Data sources integrated into the RAG pipeline: 

  • Candidate profiles and resume data 
  • Job descriptions and role requirements 
  • Talent analytics datasets 
  • Historical hiring workflow data 

Relevant data is retrieved dynamically and injected into model prompts, enabling the GenAI system to generate context-aware responses grounded in real recruitment data. This approach significantly reduces hallucinations and improves the quality of candidate matching recommendations. 
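The retrieve-then-inject flow can be sketched in a few lines. The word-overlap scoring below is a deliberately simple stand-in for the real vector-similarity retrieval a production RAG pipeline would use, and the candidate profiles are fictional examples.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for
    embedding-based similarity search) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Inject retrieved recruitment data into the model prompt so the
    answer is grounded in real candidate and job records."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Fictional example profiles standing in for Forte AI's candidate data.
profiles = [
    "Jane Doe: 6 years Python, machine learning, AWS",
    "John Roe: 4 years Java, Spring, on-prem systems",
    "Ana Lim: 8 years data engineering, Spark, AWS",
]
prompt = build_prompt("Which candidates have AWS experience?", profiles)
```

Grounding the prompt this way is what constrains the model to answer from retrieved records rather than from its training data, which is the mechanism behind the reduced hallucination rate.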

GenAI Capabilities Delivered 

  • Candidate profile analysis: AI summarisation of skills, experience, and career progression 
  • AI-powered job matching: context-aware candidate-to-role alignment using RAG 
  • Conversational hiring assistant: multi-turn Q&A for recruiters on candidates and workforce data 
  • Talent sentiment analysis: GenAI evaluation of candidate interactions and engagement signals 
  • Workforce intelligence insights: predictive analysis of hiring trends and talent availability 

Backend Integration 

The GenAI services integrate with Forte AI’s existing recruitment platform through internal APIs, connecting to candidate databases, recruitment workflows, and talent analytics systems. AI capabilities are embedded directly within the platform without requiring workflow changes. 


Security and Compliance 

Recruitment platforms process highly sensitive personal data. Security was a core design requirement from the start. 

  • Amazon VPC: all GenAI workloads run inside private subnets, isolated from public internet exposure 
  • AWS IAM: strict identity-based access control for all AI services and infrastructure 
  • Amazon S3 encryption: all model artefacts, logs, and candidate data encrypted at rest 
  • TLS encryption: all data encrypted in transit across services 
  • Role-based data access: access to models and datasets restricted by role permissions for HR data governance compliance 
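The IAM and encryption controls above can be illustrated with a least-privilege policy for an inference role: read-only access to model artefacts, and a blanket deny on unencrypted uploads. The bucket name, prefix, and statement IDs are placeholders, not Forte AI’s actual resources.

```python
import json

# Illustrative least-privilege policy; bucket and paths are placeholders.
inference_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # The inference role may only read model artefacts, nothing else.
            "Sid": "ReadModelArtefactsOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-genai-bucket/models/*",
        },
        {
            # Reject any PutObject that does not specify server-side encryption,
            # enforcing the encryption-at-rest requirement at the policy layer.
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-genai-bucket/*",
            "Condition": {
                "Null": {"s3:x-amz-server-side-encryption": "true"}
            },
        },
    ],
}

print(json.dumps(inference_role_policy, indent=2))
```

Enforcing encryption in the policy itself, rather than relying on client configuration, means a misconfigured service cannot silently write unencrypted candidate data.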

AWS Services Used 

| Category | Service / Detail |
| --- | --- |
| Compute | Amazon EC2 (CPU and GPU instances for hybrid inference) |
| Model Runtime | Ollama: open-source LLM hosting on Amazon EC2 |
| Auto Scaling | Amazon EC2 Auto Scaling Groups |
| Storage | Amazon S3: model artefacts, logs, candidate data, RAG knowledge base |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM: identity-based access control |
| Monitoring | Amazon CloudWatch: instance performance, inference latency, API metrics |
| Visualisation | Grafana dashboards: AI request volume, GPU utilisation, system latency, cost metrics |

Observability and Monitoring 

Teleglobal implemented comprehensive monitoring across the GenAI infrastructure to ensure stable production operations. 

Infrastructure Monitoring 

Amazon CloudWatch monitors: 

  • EC2 instance performance and health 
  • API request rates and throughput 
  • GenAI model inference latency 
  • Auto Scaling activity and capacity metrics 
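Inference latency is not a metric CloudWatch collects natively, so it is published as a custom metric. The sketch below builds a datum in the shape CloudWatch’s PutMetricData API accepts; the namespace and dimension names are illustrative assumptions, not values confirmed by the case study.

```python
import datetime

# Assumed custom namespace for Forte AI's GenAI metrics.
NAMESPACE = "ForteAI/GenAI"

def latency_metric(model: str, latency_ms: float) -> dict:
    """Build one metric datum in the shape CloudWatch's PutMetricData accepts."""
    return {
        "MetricName": "InferenceLatency",
        "Dimensions": [{"Name": "Model", "Value": model}],
        "Timestamp": datetime.datetime.now(datetime.timezone.utc),
        "Value": latency_ms,
        "Unit": "Milliseconds",
    }

# With boto3 installed and IAM permissions in place, publishing is one call:
# boto3.client("cloudwatch").put_metric_data(
#     Namespace=NAMESPACE,
#     MetricData=[latency_metric("llama3.1:8b", 412.0)],
# )
```

Dimensioning by model name lets per-model latency be graphed and alarmed on separately, which matters when CPU and GPU instances serve different workload tiers.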

Performance Visualisation 

Grafana dashboards provide real-time visibility into: 

  • AI request volume across recruitment workflows 
  • System latency for candidate matching and assistant queries 
  • GPU utilisation across inference workloads 
  • Infrastructure cost metrics 

Logging and Auditing 

All system events and GenAI interactions are logged, enabling Forte AI to audit model behaviour and ensure compliance with internal HR data governance policies. 
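Auditable interaction logging of this kind is commonly done as structured JSON lines, so records can be queried later by role, model, or time window. This is a generic sketch under that assumption; the field names are illustrative, not Forte AI’s actual schema.

```python
import json
import logging
import time

# One logger dedicated to GenAI interaction audit records.
audit_log = logging.getLogger("genai.audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(logging.StreamHandler())

def log_interaction(user_role: str, query: str, model: str, latency_ms: float) -> str:
    """Emit one GenAI interaction as a JSON line for audit and
    HR-governance review; returns the serialised record."""
    record = {
        "ts": time.time(),
        "user_role": user_role,
        "model": model,
        "query": query,
        "latency_ms": latency_ms,
    }
    line = json.dumps(record)
    audit_log.info(line)
    return line
```

Because every record is machine-readable, governance reviews can filter, for example, for all queries issued by a given role against candidate data without grepping free-form log text.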

Results 

| Metric | Result |
| --- | --- |
| AI response latency | Optimised for near real-time recruitment workflows |
| Candidate-job matching accuracy | Significantly improved through RAG-grounded GenAI |
| External AI API dependency | Fully eliminated |
| Infrastructure scalability | Enabled via EC2 Auto Scaling Groups |
| Data security | All candidate and HR data within Forte AI’s AWS boundary |
| Cost model | Predictable, no per-query API charges |
| Monitoring | Real-time visibility via CloudWatch and Grafana |
| GenAI capabilities delivered | Candidate matching, hiring assistant, sentiment analysis, workforce insights |

“The GenAI platform Teleglobal built gives us the recruitment intelligence capabilities we needed to compete as an AI-native platform. Candidate matching is more accurate, our hiring assistant works in real time, and all of it runs inside our own AWS environment. We have full control over our data and our costs.” 

— Forte AI 

What’s Next 

Forte AI continues to expand its GenAI capabilities on the platform built by Teleglobal. Planned enhancements: 

  • Advanced workforce insights: using GenAI models to identify hiring trends, talent shortages, and workforce patterns from internal data 
  • Continuous model optimisation: evaluating additional open-source models to improve reasoning quality and contextual understanding for recruitment tasks 
  • Cost optimisation: improving GPU utilisation efficiency and exploring alternative AI acceleration hardware for large-scale inference 
  • Responsible AI governance: implementing human-in-the-loop review processes and AI governance frameworks for ethical GenAI usage in recruitment