
Executive Summary
Forte AI is an HR technology company developing AI-powered talent management and recruitment intelligence platforms. The platform replaces traditional static resumes and fragmented job boards with adaptive professional profiles that evolve dynamically based on skills, experience, achievements, and workforce trends.
To scale its GenAI capabilities, Forte AI partnered with Teleglobal International to build a secure, production-grade Generative AI platform on AWS. The solution uses open-source LLMs hosted via Ollama on Amazon EC2, running entirely within Forte AI’s own cloud environment, with no candidate data sent to external providers.
The platform integrates Retrieval-Augmented Generation (RAG), GPU-based inference, and enterprise-grade monitoring to deliver intelligent recruitment insights at scale.
| RAG | GPU + CPU | Auto Scaling | 100% |
| --- | --- | --- | --- |
| Context-aware matching | Hybrid inference infrastructure | On-demand compute scaling | Data within the AWS boundary |
- Open-source LLMs hosted on Amazon EC2 via Ollama, no external AI API costs
- RAG architecture delivering context-aware candidate matching and hiring insights
- All candidate and HR data processed within Forte AI’s secure AWS environment
About Forte AI
Forte AI is building an AI-native recruitment intelligence platform that transforms how organisations attract, evaluate, and retain talent.
The platform moves beyond keyword-based hiring by using GenAI to understand candidate skills, career progression, and workforce trends. Core AI-powered capabilities include:
- Context-aware candidate matching based on skills and experience patterns
- Conversational hiring assistants for recruiters and hiring managers
- Talent sentiment analysis across candidate interactions
- Predictive workforce insights and hiring trend analytics
As the platform scaled, Forte AI required a production-grade GenAI infrastructure that could handle enterprise workloads while keeping sensitive HR data entirely within its own cloud environment.
The Challenge
Traditional recruitment platforms rely on keyword matching and manual evaluation, leading to poor candidate-job alignment, slow hiring cycles, and limited workforce insights. Forte AI needed GenAI capabilities, but four specific challenges blocked production deployment.
- Context-Aware Candidate Matching
Existing systems used static profile data and keyword matching. Forte AI needed an AI system capable of understanding context, skills progression, and career patterns to surface strong candidates who might not match predefined keywords.
- Conversational Hiring Assistants
Recruiters needed a GenAI assistant capable of summarising candidate profiles, recommending candidates for roles, answering workforce queries, and providing hiring insights — going well beyond rule-based systems.
- Sensitive HR Data Governance
Recruitment platforms process highly sensitive data including personal candidate profiles, employment history, compensation expectations, and internal hiring records. Sending this data to external AI providers was not acceptable.
- Infrastructure Limitations
Forte AI had no production-grade infrastructure for deploying open-source LLMs. Specific gaps included:
- No scalable GPU-based AI inference infrastructure
- No standardised deployment pipeline for LLMs
- No observability tooling for model performance
- No RAG architecture for contextual knowledge retrieval
Model Selection
Step 1: Evaluation Criteria
Before shortlisting any model, Teleglobal defined five criteria the chosen model had to meet for Forte AI’s recruitment GenAI platform:
- Inference latency: must deliver near real-time responses suitable for live recruiter and hiring manager queries
- Conversational reasoning quality: must handle multi-turn dialogue, candidate profile summarisation, and contextual job matching
- RAG compatibility: must work effectively with Retrieval-Augmented Generation pipelines for grounding responses in real candidate and job data
- Self-hostable on AWS via Ollama: must run within Forte AI’s own EC2 infrastructure with no candidate data leaving the environment
- Cost efficiency: must eliminate per-query API charges, enabling predictable operational costs at scale
Step 2: Models Evaluated
Three open-source models were shortlisted and evaluated against Forte AI’s requirements. All three are compatible with Ollama and deployable on Amazon EC2.
- Llama 3.1 8B (Meta): general-purpose instruction-tuned model, strong reasoning, widely used for enterprise deployments
- Mistral 7B (Mistral AI): compact, efficient model known for strong performance-to-size ratio
- Gemma 2 9B (Google): instruction-tuned model with strong document understanding and knowledge retrieval capabilities
| Parameter | Llama 3.1 8B (Meta) ✔ Selected | Mistral 7B (Mistral AI) | Gemma 2 9B (Google) |
| --- | --- | --- | --- |
| Ollama compatibility | ✔ Full native support | ✔ Full native support | ✔ Full native support |
| Conversational reasoning quality | ✔ Strong, multi-turn instruction following | ⚠ Good, optimised for speed over depth | ✔ Strong, good document understanding |
| RAG pipeline compatibility | ✔ Excellent, strong context grounding | ✔ Good RAG performance | ✔ Good, strong at knowledge retrieval |
| HR domain adaptability | ✔ Strong instruction tuning for domain tasks | ⚠ General purpose, less domain depth | ⚠ General purpose, less HR-specific tuning |
| Inference latency on EC2 | ✔ Efficient on both CPU and GPU instances | ✔ Very fast, smallest model size | ⚠ Slightly heavier, higher compute need |
| Model size and cost | ✔ 8B parameters, efficient resource use | ✔ 7B parameters, lightest option | ⚠ 9B parameters, slightly higher cost |
| Community and enterprise support | ✔ Largest active community, Meta backing | ⚠ Strong but smaller ecosystem | ⚠ Growing, Google-backed |
| Fine-tuning capability | ✔ Yes, open weights, widely documented | ✔ Yes, open weights | ✔ Yes, open weights |
Evaluation method: Each model was tested on a representative set of recruitment tasks — candidate profile summarisation, job-to-candidate matching queries, and multi-turn hiring assistant conversations — using Forte AI’s internal content as the knowledge base for RAG retrieval. Models were assessed on response relevance, factual grounding, and conversational coherence by Teleglobal’s engineering team.
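The case study does not publish the evaluation harness itself. As a minimal sketch, side-by-side testing of Ollama-hosted models could be driven through Ollama's documented `/api/chat` endpoint; the model tags are Ollama's standard names, while the task prompts and the grounding format are illustrative assumptions:

```python
# Sketch of a side-by-side evaluation harness for Ollama-hosted models.
# The task prompts are illustrative; scoring stands in for the human
# assessment of relevance, grounding, and coherence described above.

MODELS = ["llama3.1:8b", "mistral:7b", "gemma2:9b"]

EVAL_TASKS = [
    "Summarise this candidate profile in three bullet points: {profile}",
    "Which candidate best fits this role, and why? {role} {candidates}",
]

def build_chat_request(model: str, prompt: str, context_docs: list[str]) -> dict:
    """Build a non-streaming Ollama /api/chat request, grounding the
    prompt in retrieved internal documents (the RAG context)."""
    context = "\n---\n".join(context_docs)
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system",
             "content": "You are a recruitment assistant. "
                        "Answer only from the provided context.\n" + context},
            {"role": "user", "content": prompt},
        ],
    }

# Each request would be POSTed to http://localhost:11434/api/chat and the
# responses compared across the three models for the same task set.
```

Keeping the request builder separate from the HTTP call makes the same grounded prompts trivially repeatable across all three models.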
Step 3: Why Llama 3.1 8B Was Selected
Llama 3.1 8B delivered the strongest overall performance across the criteria that matter most for Forte AI’s recruitment use case.
- Mistral 7B: the fastest and lightest of the three, but its optimisation for speed over reasoning depth made it less suitable for complex multi-turn hiring assistant conversations and nuanced candidate profile analysis. It performed well on simple queries but fell short on recruitment tasks requiring deeper contextual understanding.
- Gemma 2 9B: strong document understanding and knowledge retrieval, but its slightly higher compute footprint at 9B parameters added unnecessary infrastructure cost for Forte AI’s workload profile. Its HR-domain adaptability was also less refined than Llama 3.1’s.
- Llama 3.1 8B selected: Llama 3.1 8B delivered the best balance of reasoning quality, RAG compatibility, inference efficiency, and HR-domain adaptability. Its strong instruction following made it the most effective model for candidate matching, hiring assistant queries, and workforce insights, and its large open-source community and Meta backing ensure long-term support and ongoing model improvements.
The Solution
Teleglobal designed and deployed a fully managed GenAI architecture on AWS tailored specifically for recruitment intelligence. Instead of relying on external proprietary AI APIs, the platform uses open-source LLMs hosted entirely within Forte AI’s AWS environment.
GenAI Inference Layer
Open-source LLMs run via Ollama on Amazon EC2, supporting both CPU and GPU-based inference depending on workload requirements.
- CPU-based EC2 instances handle lightweight inference tasks and low-concurrency queries
- GPU-based EC2 instances handle high-performance model inference for complex recruitment analysis
- Hybrid approach optimises infrastructure cost while maintaining low latency for GenAI workloads
- Amazon EC2 Auto Scaling Groups scale compute capacity dynamically based on recruitment activity volume
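The routing decision behind this hybrid approach can be sketched as a small policy function. The task categories, token threshold, and queue limit below are hypothetical placeholders, not Forte AI's actual values:

```python
# Illustrative sketch of hybrid inference routing: lightweight,
# low-concurrency requests go to the CPU instance pool, heavier
# recruitment analysis to the GPU pool. All thresholds are assumptions.

from dataclasses import dataclass

@dataclass
class InferenceRequest:
    prompt_tokens: int   # estimated prompt size
    task: str            # e.g. "chat", "summarise", "match"

HEAVY_TASKS = {"match", "workforce_insight"}  # assumed heavy task types
TOKEN_THRESHOLD = 2048                         # assumed prompt-size cutoff

def route(request: InferenceRequest, gpu_queue_depth: int,
          max_gpu_queue: int = 32) -> str:
    """Return the target pool ("gpu" or "cpu") for a request.

    Heavy tasks or long prompts prefer the GPU pool; when the GPU
    queue is saturated, requests spill over to the CPU pool so
    latency stays bounded while Auto Scaling adds capacity."""
    heavy = request.task in HEAVY_TASKS or request.prompt_tokens > TOKEN_THRESHOLD
    if heavy and gpu_queue_depth < max_gpu_queue:
        return "gpu"
    return "cpu"
```

A policy like this keeps GPU instances reserved for the work that actually needs them, which is where the cost optimisation in the hybrid design comes from.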
Retrieval-Augmented Generation (RAG) Architecture
To improve response accuracy and relevance, Teleglobal implemented RAG pipelines connecting the LLM to Forte AI’s internal recruitment data.
Data sources integrated into the RAG pipeline:
- Candidate profiles and resume data
- Job descriptions and role requirements
- Talent analytics datasets
- Historical hiring workflow data
Relevant data is retrieved dynamically and injected into model prompts, enabling the GenAI system to generate context-aware responses grounded in real recruitment data. This approach significantly reduces hallucinations and improves the quality of candidate matching recommendations.
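The retrieve-then-inject flow can be sketched in a few lines. A production pipeline would use embeddings and a vector index for retrieval; the naive keyword-overlap ranking and the sample documents below are illustrative stand-ins:

```python
# Minimal sketch of the RAG flow: retrieve the most relevant internal
# documents, then inject them into the model prompt so responses are
# grounded in real recruitment data rather than model memory.

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by naive term overlap with the query
    (a real system would use embedding similarity)."""
    q_terms = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_terms & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject retrieved context into the LLM prompt."""
    joined = "\n---\n".join(context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}")

docs = [
    "Candidate A: 6 years Python, led an ML platform team",
    "Job: Senior ML Engineer, requires Python and MLOps",
    "Candidate B: 3 years frontend, React and TypeScript",
]
prompt = build_prompt("Which candidate fits the Senior ML Engineer role?",
                      retrieve("senior ML engineer Python", docs))
```

Because the model only sees retrieved, verifiable context, its answers stay anchored to actual candidate and job data, which is the mechanism behind the reduced hallucination rate noted above.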
GenAI Capabilities Delivered
- Candidate profile analysis: AI summarisation of skills, experience, and career progression
- AI-powered job matching: context-aware candidate-to-role alignment using RAG
- Conversational hiring assistant: multi-turn Q&A for recruiters on candidates and workforce data
- Talent sentiment analysis: GenAI evaluation of candidate interactions and engagement signals
- Workforce intelligence insights: predictive analysis of hiring trends and talent availability
Backend Integration
The GenAI services integrate with Forte AI’s existing recruitment platform through internal APIs, connecting to candidate databases, recruitment workflows, and talent analytics systems. AI capabilities are embedded directly within the platform without requiring workflow changes.
Security and Compliance
Recruitment platforms process highly sensitive personal data. Security was a core design requirement from the start.
- Amazon VPC: all GenAI workloads run inside private subnets, isolated from public internet exposure
- AWS IAM: strict identity-based access control for all AI services and infrastructure
- Amazon S3 encryption: all model artefacts, logs, and candidate data encrypted at rest
- TLS encryption: all data encrypted in transit across services
- Role-based data access: access to models and datasets restricted by role permissions for HR data governance compliance
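One common way to enforce encryption at rest on S3 is a bucket policy that rejects any upload not flagged for server-side encryption. The policy below is an illustrative example of that standard AWS pattern, with a placeholder bucket name; it is not Forte AI's actual configuration:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyUnencryptedUploads",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-candidate-data/*",
    "Condition": {
      "StringNotEquals": { "s3:x-amz-server-side-encryption": "aws:kms" }
    }
  }]
}
```

Combined with IAM role-based permissions, a guardrail like this makes the encryption requirement an enforced property of the bucket rather than a convention developers must remember.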
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| Compute | Amazon EC2 (CPU and GPU instances for hybrid inference) |
| Model Runtime | Ollama: open-source LLM hosting on Amazon EC2 |
| Auto Scaling | Amazon EC2 Auto Scaling Groups |
| Storage | Amazon S3: model artefacts, logs, candidate data, RAG knowledge base |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM: identity-based access control |
| Monitoring | Amazon CloudWatch: instance performance, inference latency, API metrics |
| Visualisation | Grafana dashboards: AI request volume, GPU utilisation, system latency, cost metrics |
Observability and Monitoring
Teleglobal implemented comprehensive monitoring across the GenAI infrastructure to ensure stable production operations.
Infrastructure Monitoring
Amazon CloudWatch monitors:
- EC2 instance performance and health
- API request rates and throughput
- GenAI model inference latency
- Auto Scaling activity and capacity metrics
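Inference latency is not a metric CloudWatch collects on its own, so it would be published as a custom metric. The sketch below builds a payload in the shape CloudWatch's `PutMetricData` API expects; the namespace and dimension names are hypothetical, and a real deployment would send the payload via `boto3.client("cloudwatch").put_metric_data(**payload)`:

```python
# Sketch of a custom inference-latency metric in PutMetricData shape.
# Namespace and dimension names are illustrative assumptions.

def latency_metric(model: str, latency_ms: float) -> dict:
    return {
        "Namespace": "ForteAI/GenAI",  # hypothetical namespace
        "MetricData": [{
            "MetricName": "InferenceLatency",
            "Dimensions": [{"Name": "Model", "Value": model}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    }

payload = latency_metric("llama3.1:8b", 842.0)
# payload would be published via boto3's CloudWatch client.
```

Tagging each datapoint with a `Model` dimension is what lets per-model latency be graphed and alarmed on separately in CloudWatch and Grafana.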
Performance Visualisation
Grafana dashboards provide real-time visibility into:
- AI request volume across recruitment workflows
- System latency for candidate matching and assistant queries
- GPU utilisation across inference workloads
- Infrastructure cost metrics
Logging and Auditing
All system events and GenAI interactions are logged, enabling Forte AI to audit model behaviour and ensure compliance with internal HR data governance policies.
Results
| Metric | Result |
| --- | --- |
| AI response latency | Optimised for near real-time recruitment workflows |
| Candidate-job matching accuracy | Significantly improved through RAG-grounded GenAI |
| External AI API dependency | Fully eliminated |
| Infrastructure scalability | Enabled via EC2 Auto Scaling Groups |
| Data security | All candidate and HR data within Forte AI’s AWS boundary |
| Cost model | Predictable, no per-query API charges |
| Monitoring | Real-time visibility via CloudWatch and Grafana |
| GenAI capabilities delivered | Candidate matching, hiring assistant, sentiment analysis, workforce insights |
“The GenAI platform Teleglobal built gives us the recruitment intelligence capabilities we needed to compete as an AI-native platform. Candidate matching is more accurate, our hiring assistant works in real time, and all of it runs inside our own AWS environment. We have full control over our data and our costs.”
— Forte AI
What’s Next
Forte AI continues to expand its GenAI capabilities on the platform built by Teleglobal. Planned enhancements:
- Advanced workforce insights: using GenAI models to identify hiring trends, talent shortages, and workforce patterns from internal data
- Continuous model optimisation: evaluating additional open-source models to improve reasoning quality and contextual understanding for recruitment tasks
- Cost optimisation: improving GPU utilisation efficiency and exploring alternative AI acceleration hardware for large-scale inference
- Responsible AI governance: implementing human-in-the-loop review processes and AI governance frameworks for ethical GenAI usage in recruitment