Delivery model: Online / On-Site
Modules: 10

Course Description

This advanced course covers sophisticated RAG architectures used at the enterprise level.

Starting from a modern retrieval pipeline, you will take an in-depth look at techniques such as hybrid search, ColBERT, and reranking. You will then learn how to integrate structured information into this system with GraphRAG, and how to add autonomous reasoning and verification capabilities with self-correcting Agentic RAG architectures.

In hands-on labs, you will build an advanced, production-grade system that addresses critical production requirements such as GPU acceleration, caching, and security.

Target Audience

  • ML engineers deploying RAG systems into production

  • Senior software developers optimizing existing RAG implementations

  • AI engineers designing secure and compliant information systems

  • Technical leaders managing large-scale RAG infrastructures

  • Security engineers strengthening LLM applications


Prerequisites

  • Strong Python programming skills

  • Experience with basic RAG implementations

  • General understanding of vector databases and embedding models

  • Familiarity with LLM APIs and prompt engineering

  • Knowledge of distributed systems and caching strategies

Outcomes

Participants who complete this course will be proficient in the following areas:

  • Designing and implementing hybrid retrieval systems with BM25-dense fusion and neural reranking

  • Creating adaptive routers that intelligently choose between RAG and long-context processing

  • Using GraphRAG to interpret the knowledge base as a whole and to draw inferences from local connections in the data

  • Setting up time-aware retrieval systems for time-sensitive queries and real-time updates

  • Creating comprehensive evaluation frameworks that go beyond basic metrics and include citation verification

  • Hardening RAG systems against prompt injection and applying OWASP LLM Top 10 defense strategies

  • Optimizing performance with GPU-accelerated search and smart caching strategies

Curriculum

Module 1 - Modern Hybrid Retrieval and Routing

Hybrid Retrieval Fundamentals

BM25-Dense Fusion Strategies

  • Keyword and semantic search combination

  • Reciprocal rank fusion algorithms

  • Weighted scoring approaches

  • Query-dependent weight adjustment

  • Performance benchmarking methods
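As an illustration of the reciprocal rank fusion topic listed above, here is a minimal sketch. It assumes each retriever returns a best-first list of document IDs. The function name, the sample IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not course material.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (BM25) retriever and a dense retriever.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because the fusion uses only ranks, it needs no score normalization between the keyword and dense retrievers, which is one reason it is often tried before weighted scoring.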

Late-Interaction Retriever

  • ColBERT architecture and benefits

  • PLAID for efficient retrieval

  • Token-level matching strategies

  • Balance between storage and computation

  • Implementation considerations

Neural Reranking Pipeline

Cross-Encoder Reranking

  • Comparison with bi-encoder architectures

  • Multi-stage reranking cascades

  • Computational cost optimization

  • Domain-specific fine-tuning

  • Batch processing strategies

LLM-Based Rerankers

  • Prompt engineering for reranking

  • List-based and pair-based ranking comparison

  • Cost-latency trade-off

  • Consistency and reliability

  • Integration patterns

RAG and Long-Context Routing

Adaptive Routing Strategies

  • Query complexity assessment

  • Cost-accuracy optimization

  • Dynamic threshold determination

  • Fallback mechanisms

  • Performance monitoring

Context Window Management

  • Token budget allocation

  • Context compression techniques

  • Chunking for long contexts

  • Hybrid RAG-context approaches

  • Model selection criteria

Module 2 - Self-Correcting and Adaptive RAG

Self-RAG Architecture

Retrieval Necessity Gates

  • Query classification for retrieval necessity

  • Confidence scoring mechanisms

  • Dynamic retrieval triggers

  • Cost optimization through selective retrieval

  • Performance impact analysis

Verification and Improvement

  • Relevance evaluation loops

  • Support verification mechanisms

  • Critique generation strategies

  • Iterative improvement loops

  • Quality threshold management

Corrective RAG Patterns

Answer Verification Pipeline

  • Factual consistency check

  • Contradiction detection systems

  • Source attribution verification

  • Trust calibration

  • Automatic correction strategies

Conflict Resolution

  • Multi-source conflict management

  • Temporal conflict resolution

  • Authority weighting systems

  • Consensus building strategies

  • User preference integration

Multi-Agent Orchestration

Mixture-of-Agents Design

  • Agent specialization patterns

  • Workflow orchestration frameworks

  • Communication protocols

  • Result fusion methods

  • Error management and recovery

Cost and Performance Balance

  • Agent selection strategies

  • Parallel and sequential execution comparison

  • Resource allocation optimization

  • Latency management

  • Quality-speed trade-offs

Module 3 - GraphRAG and Structured Knowledge

GraphRAG Implementation

Entity Graph Creation

  • Entity and relationship extraction

  • Graph schema design

  • Community detection algorithms

  • Hierarchical summarization

  • Scalability considerations

Graph-Enhanced Retrieval

  • Local and global retrieval strategies

  • Multi-hop reasoning patterns

  • Path ranking algorithms

  • Subgraph extraction

  • Query-driven traversal

Hybrid Graph-Vector Systems

Integration Strategies

  • Semantic and structural search fusion

  • Entity linking pipelines

  • Knowledge graph embeddings

  • Cross-modal retrieval

  • Result fusion techniques

Temporal Knowledge Graphs

  • Time-aware relationships

  • Event sequence modeling

  • Temporal consistency checking

  • Version-aware retrieval

  • Historical analysis patterns

Layout-Aware Document Processing

Understanding Structured Documents

  • Table extraction and parsing

  • Chart and figure analysis

  • Form field mapping

  • Multi-column layout management

  • Document hierarchy preservation

Multimodal RAG Integration

  • Vision-language model integration

  • OCR and text extraction pipeline

  • Image-text alignment

  • Cross-modal search strategies

  • Quality assurance for extracted content

Module 4 - Text-to-SQL RAG

RAG Fundamentals with SQL

Schema Context Management

  • Database schema embedding strategies

  • Indexing table and column descriptions

  • Relationship graph representation

  • Schema versioning and updates

  • Multi-database coordination

SQL Generation Pipeline

  • Few-shot example selection

  • Schema-aware prompt templates

  • Query validation and sanitization

  • Execution safety checks

  • Error recovery mechanisms

SQL Integration

Integration Patterns

  • SQL results as retrieval context

  • Document filtering with SQL predicates

  • Joining operations between sources

  • Transaction boundaries

  • Cache coherence

Module 5 - Query Processing and Understanding

Advanced Query Expansion

HyDE and Query Generation

  • Hypothetical document embeddings

  • Multiple query variations

  • Query decomposition strategies

  • Techniques for preserving query intent

  • Performance impact analysis

Query Rewriting Strategies

  • Context-aware rewriting

  • Synonym expansion

  • Domain-specific terminology mapping

  • Ambiguity resolution

  • User preference learning

Router Engines

ML-Based Routing

  • Classification model architectures

  • Feature engineering for routing

  • Online learning strategies

  • A/B testing framework

  • Performance monitoring

Rule Engine Integration

  • Definition of business rules

  • Priority and precedence management

  • Dynamic rule updates

  • Conflict resolution

  • Audit and compliance

Intent Classification

Query Understanding Models

  • Intent classification design

  • Multi-label classification

  • Confidence scoring

  • Fallback management

  • Continuous improvement cycles

Module 6 - Temporal and Real-Time Retrieval

Time-Sensitive Indexing

Temporal Partitioning Strategies

  • Time-based sharding

  • Rolling window indexes

  • Event-driven partitioning

  • Archive management

  • Query routing based on time range

Freshness Scoring

  • Time-dependent decay functions

  • Novelty and relevance balance

  • Dynamic weight adjustment

  • User preference modeling

  • A/B testing freshness factors
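The freshness scoring topics above can be illustrated with a small sketch of an exponential time-decay blend. The function name, the 30-day half-life, and the 0.3 freshness weight are illustrative assumptions. In practice these values would be tuned, for example through the A/B tests mentioned above.

```python
def freshness_score(relevance, age_days, half_life_days=30.0, weight=0.3):
    """Blend a relevance score in [0, 1] with an exponential freshness decay."""
    # The decay halves every `half_life_days`: 1.0 when new, 0.5 after one half-life.
    decay = 0.5 ** (age_days / half_life_days)
    # `weight` controls how much freshness counts relative to relevance.
    return (1.0 - weight) * relevance + weight * decay


# Hypothetical documents with equal relevance but different ages.
new_doc = freshness_score(relevance=0.8, age_days=0)
old_doc = freshness_score(relevance=0.8, age_days=30)
print(new_doc, old_doc)
```

A larger weight favors novelty over relevance, and a shorter half-life makes older documents lose ground faster.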

Streaming Updates

Real-Time Ingestion Pipelines

  • Change data capture integration

  • Incremental embedding generation

  • Hot-swappable indexing strategies

  • Consistency guarantees

  • Backpressure management

Cache Invalidation Patterns

  • Event-driven invalidation

  • TTL strategies

  • Selective cache warming

  • Distributed cache consistency

  • Performance monitoring

Module 7 - Performance Enhancement Methods

GPU-Accelerated Search

Vector Index Optimization

  • HNSW and IVF-PQ selection

  • GPU memory management

  • Batch processing optimization

  • Scaling with multiple GPUs

  • Cost-performance analysis

Hardware Selection

  • Balance between GPU and CPU

  • Memory requirements

  • Networking considerations

  • Storage optimization

  • Cloud and on-premise decisions

Caching Infrastructure

Multi-Level Cache Design

  • Semantic cache implementation

  • Prompt and context caching

  • Cache invalidation strategies

  • Distributed cache patterns

  • Hit rate optimization
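To make the semantic cache topic above concrete, here is a minimal in-memory sketch. It assumes query embeddings are already available as plain lists of floats. The class name, the 0.95 similarity threshold, and the toy two-dimensional vectors are illustrative assumptions. A production cache would typically use a vector index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a cached answer when a new query embedding is close enough."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, embedding):
        # Linear scan for the most similar cached query.
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        # Only count it as a hit if similarity reaches the threshold.
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # near-duplicate query embedding
miss = cache.get([0.0, 1.0])    # unrelated query embedding
print(hit, miss)
```

The threshold is the main lever for hit rate: lowering it raises the hit rate but increases the risk of serving an answer to a query that only looks similar.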

Cache Economics

  • Cost-benefit analysis

  • Balance between storage and computation

  • Cache sizing strategies

  • Eviction policies

  • Monitoring and alerting

Efficient Model Serving

Inference Optimization

  • vLLM integration patterns

  • TensorRT-LLM optimization

  • Quantization strategies

  • Batching and scheduling

  • Resource allocation

Load Balancing

  • Request distribution strategies

  • Health checking

  • Circuit breakers

  • Rate limiting

  • Auto-scaling strategies

Module 8 - Evaluation and Quality Assurance

Advanced Evaluation Metrics

Citation Fidelity Verification

  • Source attribution accuracy

  • Citation extraction verification

  • Context preservation check

  • Hallucination detection

  • Consistency scoring

Beyond RAGAS Metrics

  • Custom evaluation frameworks

  • Domain-specific metrics

  • Human evaluation integration

  • Automated quality gates

  • Regression testing

Production Monitoring

RAG-Specific Observability

  • Retrieval quality metrics

  • Embedding drift detection

  • Query pattern analysis

  • Cost tracking systems

  • Performance regression alerts

Drift Detection Systems

  • Distribution monitoring

  • Concept drift detection

  • Model performance tracking

  • Automatic retraining triggers

  • Alert thresholds

A/B Testing Framework

Experiment Infrastructure

  • Online evaluation setup

  • Statistical significance testing

  • Feature flag management

  • Gradual rollout strategies

  • Results analysis pipeline

Decision Making

  • Metric interpretation

  • Trade-off analysis

  • Rollback criteria

  • Documentation practices

  • Stakeholder communication

Module 9 - Security and Compliance

Prompt Injection Defense

Attack Vector Mitigation

  • Preventing direct injection

  • Indirect injection via documents

  • Input sanitization strategies

  • Output validation framework

  • Detection and logging systems

Defense in Depth

  • Layered security approach

  • Isolation strategies

  • Privilege separation

  • Security monitoring

  • Incident response planning

OWASP LLM Top 10

Security Implementation

  • Threat modeling for RAG

  • Preventing data poisoning

  • Model denial-of-service (DoS) protection

  • Information disclosure controls

  • Supply chain security

Vulnerability Management

  • Security scanning

  • Dependency management

  • Patch management

  • Security testing

  • Compliance reporting

Module 10 - Hands-on Lab

Creating Enterprise-Ready RAG

Core System Implementation

  • Hybrid retrieval setup with reranking

  • Self-correcting RAG configuration

  • GraphRAG pipeline construction

  • Router engine development

  • Security hardening exercises

Integration Challenges

  • API design and versioning

  • Error management patterns

  • Retry strategies

  • Circuit breaker implementation

  • Monitoring integration
