
Executive Summary
Forte AI is an HR technology company developing AI-powered talent management and recruitment intelligence platforms. The platform replaces traditional static resumes and fragmented job boards with adaptive professional profiles that evolve dynamically based on skills, experience, achievements, and workforce trends.
To scale its GenAI capabilities, Forte AI partnered with Teleglobal International to build a secure, production-grade Generative AI platform on AWS. The solution uses open-source LLMs hosted via Ollama on Amazon EC2, running entirely within Forte AI’s own cloud environment, with no candidate data sent to external providers.
The platform integrates Retrieval-Augmented Generation (RAG), GPU-based inference, and enterprise-grade monitoring to deliver intelligent recruitment insights at scale.
| RAG | GPU + CPU | Auto Scaling | 100% |
| --- | --- | --- | --- |
| Context-aware matching | Hybrid inference infrastructure | On-demand compute scaling | Data within the AWS boundary |
- Open-source LLMs hosted on Amazon EC2 via Ollama, no external AI API costs
- RAG architecture delivering context-aware candidate matching and hiring insights
- All candidate and HR data processed within Forte AI’s secure AWS environment
About Forte AI
Forte AI is building an AI-native recruitment intelligence platform that transforms how organisations attract, evaluate, and retain talent.
The platform moves beyond keyword-based hiring by using GenAI to understand candidate skills, career progression, and workforce trends. Core AI-powered capabilities include:
- Context-aware candidate matching based on skills and experience patterns
- Conversational hiring assistants for recruiters and hiring managers
- Talent sentiment analysis across candidate interactions
- Predictive workforce insights and hiring trend analytics
As the platform scaled, Forte AI required a production-grade GenAI infrastructure that could handle enterprise workloads while keeping sensitive HR data entirely within its own cloud environment.
The Challenge
Traditional recruitment platforms rely on keyword matching and manual evaluation, leading to poor candidate-job alignment, slow hiring cycles, and limited workforce insights. Forte AI needed GenAI capabilities, but four specific challenges blocked production deployment.
- Context-Aware Candidate Matching
Existing systems used static profile data and keyword matching. Forte AI needed an AI system capable of understanding context, skills progression, and career patterns to surface strong candidates who might not match predefined keywords.
- Conversational Hiring Assistants
Recruiters needed a GenAI assistant capable of summarising candidate profiles, recommending candidates for roles, answering workforce queries, and providing hiring insights — going well beyond rule-based systems.
- Sensitive HR Data Governance
Recruitment platforms process highly sensitive data including personal candidate profiles, employment history, compensation expectations, and internal hiring records. Sending this data to external AI providers was not acceptable.
- Infrastructure Limitations
Forte AI had no production-grade infrastructure for deploying open-source LLMs. Specific gaps included:
- No scalable GPU-based AI inference infrastructure
- No standardised deployment pipeline for LLMs
- No observability tooling for model performance
- No RAG architecture for contextual knowledge retrieval
Model Selection
Step 1: Evaluation Criteria
Before shortlisting any model, Teleglobal defined five criteria the chosen model had to meet for Forte AI’s recruitment GenAI platform:
- Inference latency: must deliver near real-time responses suitable for live recruiter and hiring manager queries
- Conversational reasoning quality: must handle multi-turn dialogue, candidate profile summarisation, and contextual job matching
- RAG compatibility: must work effectively with Retrieval-Augmented Generation pipelines for grounding responses in real candidate and job data
- Self-hostable on AWS via Ollama: must run within Forte AI’s own EC2 infrastructure with no candidate data leaving the environment
- Cost efficiency: must eliminate per-query API charges, enabling predictable operational costs at scale
Step 2: Models Evaluated
Three open-source models were shortlisted and evaluated against Forte AI’s requirements. All three are compatible with Ollama and deployable on Amazon EC2.
- Llama 3.1 8B (Meta): general-purpose instruction-tuned model, strong reasoning, widely used for enterprise deployments
- Mistral 7B (Mistral AI): compact, efficient model known for strong performance-to-size ratio
- Gemma 2 9B (Google): instruction-tuned model with strong document understanding and knowledge retrieval capabilities
| Parameter | Llama 3.1 8B (Meta) ✔ Selected | Mistral 7B (Mistral AI) | Gemma 2 9B (Google) |
| --- | --- | --- | --- |
| Ollama compatibility | ✔ Full native support | ✔ Full native support | ✔ Full native support |
| Conversational reasoning quality | ✔ Strong, multi-turn instruction following | ⚠ Good, optimised for speed over depth | ✔ Strong, good document understanding |
| RAG pipeline compatibility | ✔ Excellent, strong context grounding | ✔ Good RAG performance | ✔ Good, strong at knowledge retrieval |
| HR domain adaptability | ✔ Strong instruction tuning for domain tasks | ⚠ General purpose, less domain depth | ⚠ General purpose, less HR-specific tuning |
| Inference latency on EC2 | ✔ Efficient on both CPU and GPU instances | ✔ Very fast, smallest model size | ⚠ Slightly heavier, higher compute need |
| Model size and cost | ✔ 8B parameters, efficient resource use | ✔ 7B parameters, lightest option | ⚠ 9B parameters, slightly higher cost |
| Community and enterprise support | ✔ Largest active community, Meta backing | ⚠ Strong but smaller ecosystem | ⚠ Growing, Google-backed |
| Fine-tuning capability | ✔ Yes, open weights, widely documented | ✔ Yes, open weights | ✔ Yes, open weights |
Evaluation method: Each model was tested on a representative set of recruitment tasks — candidate profile summarisation, job-to-candidate matching queries, and multi-turn hiring assistant conversations — using Forte AI’s internal content as the knowledge base for RAG retrieval. Models were assessed on response relevance, factual grounding, and conversational coherence by Teleglobal’s engineering team.
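The case study does not publish the evaluation harness itself. As a minimal sketch, side-by-side testing of Ollama-hosted models could be driven through Ollama's documented `/api/chat` endpoint; the model tags are Ollama's standard names, while the task prompts and the grounding format are illustrative assumptions:

```python
# Sketch of a side-by-side evaluation harness for Ollama-hosted models.
# The task prompts are illustrative; scoring stands in for the human
# assessment of relevance, grounding, and coherence described above.

MODELS = ["llama3.1:8b", "mistral:7b", "gemma2:9b"]

EVAL_TASKS = [
    "Summarise this candidate profile in three bullet points: {profile}",
    "Which candidate best fits this role, and why? {role} {candidates}",
]

def build_chat_request(model: str, prompt: str, context_docs: list[str]) -> dict:
    """Build a non-streaming Ollama /api/chat request, grounding the
    prompt in retrieved internal documents (the RAG context)."""
    context = "\n---\n".join(context_docs)
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system",
             "content": "You are a recruitment assistant. "
                        "Answer only from the provided context.\n" + context},
            {"role": "user", "content": prompt},
        ],
    }

# Each request would be POSTed to http://localhost:11434/api/chat and the
# responses compared across the three models for the same task set.
```

Keeping the request builder separate from the HTTP call makes the same grounded prompts trivially repeatable across all three models.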
Step 3: Why Llama 3.1 8B Was Selected
Llama 3.1 8B delivered the strongest overall performance across the criteria that matter most for Forte AI’s recruitment use case.
- Mistral 7B: the fastest and lightest of the three, but its optimisation for speed over reasoning depth made it less suitable for complex multi-turn hiring assistant conversations and nuanced candidate profile analysis. It performed well on simple queries but fell short on recruitment tasks requiring deeper contextual understanding.
- Gemma 2 9B: strong document understanding and knowledge retrieval, but its slightly higher compute footprint at 9B parameters added unnecessary infrastructure cost for Forte AI’s workload profile. Its HR-domain adaptability was also less refined than Llama 3.1’s.
- Llama 3.1 8B selected: Llama 3.1 8B delivered the best balance of reasoning quality, RAG compatibility, inference efficiency, and HR-domain adaptability. Its strong instruction following made it the most effective model for candidate matching, hiring assistant queries, and workforce insights, and its large open-source community and Meta backing ensure long-term support and ongoing model improvements.
The Solution
Teleglobal designed and deployed a fully managed GenAI architecture on AWS tailored specifically for recruitment intelligence. Instead of relying on external proprietary AI APIs, the platform uses open-source LLMs hosted entirely within Forte AI’s AWS environment.
GenAI Inference Layer
Open-source LLMs run via Ollama on Amazon EC2, supporting both CPU and GPU-based inference depending on workload requirements.
- CPU-based EC2 instances handle lightweight inference tasks and low-concurrency queries
- GPU-based EC2 instances handle high-performance model inference for complex recruitment analysis
- Hybrid approach optimises infrastructure cost while maintaining low latency for GenAI workloads
- Amazon EC2 Auto Scaling Groups scale compute capacity dynamically based on recruitment activity volume
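The routing decision behind this hybrid approach can be sketched as a small policy function. The task categories, token threshold, and queue limit below are hypothetical placeholders, not Forte AI's actual values:

```python
# Illustrative sketch of hybrid inference routing: lightweight,
# low-concurrency requests go to the CPU instance pool, heavier
# recruitment analysis to the GPU pool. All thresholds are assumptions.

from dataclasses import dataclass

@dataclass
class InferenceRequest:
    prompt_tokens: int   # estimated prompt size
    task: str            # e.g. "chat", "summarise", "match"

HEAVY_TASKS = {"match", "workforce_insight"}  # assumed heavy task types
TOKEN_THRESHOLD = 2048                         # assumed prompt-size cutoff

def route(request: InferenceRequest, gpu_queue_depth: int,
          max_gpu_queue: int = 32) -> str:
    """Return the target pool ("gpu" or "cpu") for a request.

    Heavy tasks or long prompts prefer the GPU pool; when the GPU
    queue is saturated, requests spill over to the CPU pool so
    latency stays bounded while Auto Scaling adds capacity."""
    heavy = request.task in HEAVY_TASKS or request.prompt_tokens > TOKEN_THRESHOLD
    if heavy and gpu_queue_depth < max_gpu_queue:
        return "gpu"
    return "cpu"
```

A policy like this keeps GPU instances reserved for the work that actually needs them, which is where the cost optimisation in the hybrid design comes from.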
Retrieval-Augmented Generation (RAG) Architecture
To improve response accuracy and relevance, Teleglobal implemented RAG pipelines connecting the LLM to Forte AI’s internal recruitment data.
Data sources integrated into the RAG pipeline:
- Candidate profiles and resume data
- Job descriptions and role requirements
- Talent analytics datasets
- Historical hiring workflow data
Relevant data is retrieved dynamically and injected into model prompts, enabling the GenAI system to generate context-aware responses grounded in real recruitment data. This approach significantly reduces hallucinations and improves the quality of candidate matching recommendations.
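The retrieve-then-inject flow can be sketched in a few lines. A production pipeline would use embeddings and a vector index for retrieval; the naive keyword-overlap ranking and the sample documents below are illustrative stand-ins:

```python
# Minimal sketch of the RAG flow: retrieve the most relevant internal
# documents, then inject them into the model prompt so responses are
# grounded in real recruitment data rather than model memory.

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by naive term overlap with the query
    (a real system would use embedding similarity)."""
    q_terms = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_terms & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject retrieved context into the LLM prompt."""
    joined = "\n---\n".join(context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}")

docs = [
    "Candidate A: 6 years Python, led an ML platform team",
    "Job: Senior ML Engineer, requires Python and MLOps",
    "Candidate B: 3 years frontend, React and TypeScript",
]
prompt = build_prompt("Which candidate fits the Senior ML Engineer role?",
                      retrieve("senior ML engineer Python", docs))
```

Because the model only sees retrieved, verifiable context, its answers stay anchored to actual candidate and job data, which is the mechanism behind the reduced hallucination rate noted above.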
GenAI Capabilities Delivered
- Candidate profile analysis: AI summarisation of skills, experience, and career progression
- AI-powered job matching: context-aware candidate-to-role alignment using RAG
- Conversational hiring assistant: multi-turn Q&A for recruiters on candidates and workforce data
- Talent sentiment analysis: GenAI evaluation of candidate interactions and engagement signals
- Workforce intelligence insights: predictive analysis of hiring trends and talent availability
Backend Integration
The GenAI services integrate with Forte AI’s existing recruitment platform through internal APIs, connecting to candidate databases, recruitment workflows, and talent analytics systems. AI capabilities are embedded directly within the platform without requiring workflow changes.
Security and Compliance
Recruitment platforms process highly sensitive personal data. Security was a core design requirement from the start.
- Amazon VPC: all GenAI workloads run inside private subnets, isolated from public internet exposure
- AWS IAM: strict identity-based access control for all AI services and infrastructure
- Amazon S3 encryption: all model artefacts, logs, and candidate data encrypted at rest
- TLS encryption: all data encrypted in transit across services
- Role-based data access: access to models and datasets restricted by role permissions for HR data governance compliance
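One common way to enforce encryption at rest on S3 is a bucket policy that rejects any upload not flagged for server-side encryption. The policy below is an illustrative example of that standard AWS pattern, with a placeholder bucket name; it is not Forte AI's actual configuration:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyUnencryptedUploads",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-candidate-data/*",
    "Condition": {
      "StringNotEquals": { "s3:x-amz-server-side-encryption": "aws:kms" }
    }
  }]
}
```

Combined with IAM role-based permissions, a guardrail like this makes the encryption requirement an enforced property of the bucket rather than a convention developers must remember.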
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| Compute | Amazon EC2 (CPU and GPU instances for hybrid inference) |
| Model Runtime | Ollama: open-source LLM hosting on Amazon EC2 |
| Auto Scaling | Amazon EC2 Auto Scaling Groups |
| Storage | Amazon S3: model artefacts, logs, candidate data, RAG knowledge base |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM: identity-based access control |
| Monitoring | Amazon CloudWatch: instance performance, inference latency, API metrics |
| Visualisation | Grafana dashboards: AI request volume, GPU utilisation, system latency, cost metrics |
Observability and Monitoring
Teleglobal implemented comprehensive monitoring across the GenAI infrastructure to ensure stable production operations.
Infrastructure Monitoring
Amazon CloudWatch monitors:
- EC2 instance performance and health
- API request rates and throughput
- GenAI model inference latency
- Auto Scaling activity and capacity metrics
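Inference latency is not a metric CloudWatch collects on its own, so it would be published as a custom metric. The sketch below builds a payload in the shape CloudWatch's `PutMetricData` API expects; the namespace and dimension names are hypothetical, and a real deployment would send the payload via `boto3.client("cloudwatch").put_metric_data(**payload)`:

```python
# Sketch of a custom inference-latency metric in PutMetricData shape.
# Namespace and dimension names are illustrative assumptions.

def latency_metric(model: str, latency_ms: float) -> dict:
    return {
        "Namespace": "ForteAI/GenAI",  # hypothetical namespace
        "MetricData": [{
            "MetricName": "InferenceLatency",
            "Dimensions": [{"Name": "Model", "Value": model}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    }

payload = latency_metric("llama3.1:8b", 842.0)
# payload would be published via boto3's CloudWatch client.
```

Tagging each datapoint with a `Model` dimension is what lets per-model latency be graphed and alarmed on separately in CloudWatch and Grafana.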
Performance Visualisation
Grafana dashboards provide real-time visibility into:
- AI request volume across recruitment workflows
- System latency for candidate matching and assistant queries
- GPU utilisation across inference workloads
- Infrastructure cost metrics
Logging and Auditing
All system events and GenAI interactions are logged, enabling Forte AI to audit model behaviour and ensure compliance with internal HR data governance policies.
Results
| Metric | Result |
| --- | --- |
| AI response latency | Optimised for near real-time recruitment workflows |
| Candidate-job matching accuracy | Significantly improved through RAG-grounded GenAI |
| External AI API dependency | Fully eliminated |
| Infrastructure scalability | Enabled via EC2 Auto Scaling Groups |
| Data security | All candidate and HR data within Forte AI’s AWS boundary |
| Cost model | Predictable, no per-query API charges |
| Monitoring | Real-time visibility via CloudWatch and Grafana |
| GenAI capabilities delivered | Candidate matching, hiring assistant, sentiment analysis, workforce insights |
“The GenAI platform Teleglobal built gives us the recruitment intelligence capabilities we needed to compete as an AI-native platform. Candidate matching is more accurate, our hiring assistant works in real time, and all of it runs inside our own AWS environment. We have full control over our data and our costs.”
— Forte AI
What’s Next
Forte AI continues to expand its GenAI capabilities on the platform built by Teleglobal. Planned enhancements:
- Advanced workforce insights: using GenAI models to identify hiring trends, talent shortages, and workforce patterns from internal data
- Continuous model optimisation: evaluating additional open-source models to improve reasoning quality and contextual understanding for recruitment tasks
- Cost optimisation: improving GPU utilisation efficiency and exploring alternative AI acceleration hardware for large-scale inference
- Responsible AI governance: implementing human-in-the-loop review processes and AI governance frameworks for ethical GenAI usage in recruitment