Advanced RAG
This individual course is also available for enterprises.
This advanced course covers sophisticated RAG architectures used at the enterprise level.
Starting from a modern retrieval pipeline, you will take an in-depth look at techniques such as hybrid search, ColBERT, and reranking. You will then learn how to integrate structured information into this system using GraphRAG, and how to add autonomous reasoning and verification capabilities using self-correcting Agentic RAG structures.
In hands-on labs, you will build a production-grade system that addresses critical requirements such as GPU acceleration, caching, and security.
ML engineers deploying RAG systems into production
Senior software developers optimizing existing RAG implementations
AI engineers designing secure and compliant information systems
Technical leaders managing large-scale RAG infrastructures
Security engineers strengthening LLM applications
Strong Python programming skills
Experience with basic RAG implementations
General understanding of vector databases and embedding models
Familiarity with LLM APIs and prompt engineering
Knowledge of distributed systems and caching strategies
Participants who complete this course will be proficient in the following areas:
Designing and implementing hybrid retrieval systems with BM25-dense fusion and neural reranking
Creating adaptive routers that intelligently choose between RAG and long-context processing
Using GraphRAG to answer questions that span the entire knowledge base and to draw inferences from local connections in the data
Setting up time-aware retrieval systems for time-sensitive queries and real-time updates
Creating comprehensive evaluation frameworks that go beyond basic metrics and include citation verification
Hardening RAG systems against prompt injection and applying OWASP LLM Top 10 defense strategies
Optimizing performance with GPU-accelerated search and smart caching strategies
Keyword and semantic search combination
Reciprocal rank fusion algorithms
Weighted scoring approaches
Query-dependent weight adjustment
Performance benchmarking methods
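As an illustration of the reciprocal rank fusion topic above, here is a minimal sketch and not course material. It assumes each retriever returns an ordered list of document IDs. The function name, the sample IDs, and the constant k=60 (a commonly used default) are choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k dampens the influence of top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) and a dense retriever.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that appear in both lists ("d1" and "d3") rise above those found by only one retriever. A weighted variant would multiply each list's contribution by a per-retriever weight.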
ColBERT architecture and benefits
PLAID for efficient retrieval
Token-level matching strategies
Balance between storage and computation
Application considerations
Architecture comparison with bi-encoders
Multi-stage reranking cascades
Computational cost optimization
Domain-specific fine-tuning
Batch processing strategies
Prompt engineering for reranking
Listwise and pairwise ranking comparison
Cost-latency trade-off
Consistency and reliability
Integration patterns
Query complexity assessment
Cost-accuracy optimization
Dynamic threshold determination
Fallback mechanisms
Performance monitoring
Token budget allocation
Context compression techniques
Chunking for long contexts
Hybrid RAG-context approaches
Model selection criteria
Query classification for retrieval necessity
Confidence scoring mechanisms
Dynamic retrieval triggers
Cost optimization through selective retrieval
Performance impact analysis
Relevance evaluation loops
Support verification mechanisms
Critique generation strategies
Iterative improvement loops
Quality threshold management
Factual consistency check
Contradiction detection systems
Source attribution verification
Trust calibration
Automatic correction strategies
Multi-source conflict management
Temporal conflict resolution
Authority weighting systems
Consensus building strategies
User preference integration
Agent specialization patterns
Workflow orchestration frameworks
Communication protocols
Result fusion methods
Error management and recovery
Agent selection strategies
Parallel and sequential execution comparison
Resource allocation optimization
Latency management
Quality-speed trade-off decisions
Entity and relationship extraction
Graph schema design
Community detection algorithms
Hierarchical summarization
Scalability considerations
Local and global retrieval strategies
Multi-hop reasoning patterns
Path ranking algorithms
Subgraph extraction
Query-driven traversal
Semantic and structural search fusion
Entity linking pipelines
Knowledge graph embeddings
Cross-modal retrieval
Result fusion techniques
Time-aware relationships
Event sequence modeling
Temporal consistency checking
Version-aware retrieval
Historical analysis patterns
Table extraction and parsing
Chart and figure analysis
Form field mapping
Multi-column layout management
Document hierarchy preservation
Vision-language model integration
OCR and text extraction pipeline
Image-text alignment
Cross-modal search strategies
Quality assurance for extracted content
Database schema embedding strategies
Indexing table and column descriptions
Relationship graph representation
Schema versioning and updates
Multi-database coordination
Few-shot example selection
Schema-aware prompt templates
Query validation and sanitization
Execution safety checks
Error recovery mechanisms
SQL Integration
SQL results as retrieval context
Document filtering with SQL predicates
Join operations across sources
Transaction boundaries
Cache coherence
Hypothetical document embeddings
Multiple query variations
Query decomposition strategies
Techniques for preserving query intent
Performance impact analysis
Context-aware rewriting
Synonym expansion
Domain-specific terminology mapping
Ambiguity resolution
User preference learning
Router Engines
Classification model architectures
Feature engineering for routing
Online learning strategies
A/B testing framework
Performance monitoring
Definition of business rules
Priority and precedence management
Dynamic rule updates
Conflict resolution
Audit and compliance
Intent Classification
Intent classification design
Multi-label classification
Confidence scoring
Fallback management
Continuous improvement cycles
Time-based sharding
Rolling window indexes
Event-driven partitioning
Archive management
Query routing based on time range
Time-dependent decay functions
Freshness and relevance balance
Dynamic weight adjustment
User preference modeling
A/B testing freshness factors
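One possible shape for a time-dependent decay function is sketched below. It assumes exponential decay with a configurable half-life and a linear blend with the relevance score. The function name, the 30-day half-life, and the 0.3 freshness weight are illustrative assumptions, not values taken from the course.

```python
def freshness_adjusted_score(relevance, age_days,
                             half_life_days=30.0, freshness_weight=0.3):
    """Blend a relevance score in [0, 1] with an exponential freshness decay.

    The freshness term is 1.0 for a brand-new document and halves every
    `half_life_days`. `freshness_weight` sets how much freshness counts.
    """
    decay = 0.5 ** (age_days / half_life_days)
    return (1.0 - freshness_weight) * relevance + freshness_weight * decay


# Two hypothetical documents with equal relevance but different ages.
recent = freshness_adjusted_score(relevance=0.8, age_days=2)
stale = freshness_adjusted_score(relevance=0.8, age_days=365)
```

With equal relevance the newer document scores higher. Setting freshness_weight to 0 recovers pure relevance ranking, which makes the weight a natural parameter to vary in an A/B test.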
Streaming Updates
Change data capture integration
Incremental embedding generation
Hot-swappable indexing strategies
Consistency guarantees
Backpressure management
Event-driven invalidation
TTL strategies
Selective cache warming
Distributed cache consistency
Performance monitoring
HNSW and IVF-PQ selection
GPU memory management
Batch processing optimization
Scaling with multiple GPUs
Cost-performance analysis
Balance between GPU and CPU
Memory requirements
Networking considerations
Storage optimization
Cloud and on-premise decisions
Caching Infrastructure
Semantic cache implementation
Prompt and context caching
Cache invalidation strategies
Distributed cache patterns
Hit rate optimization
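To make the semantic cache idea concrete, the following is a minimal in-memory sketch. It assumes query embeddings are computed elsewhere and passed in as plain lists of floats. The class name, the linear scan, and the 0.9 similarity threshold are assumptions for this example; a production cache would typically use a vector index and an eviction policy.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Return a cached answer when a new query embedding is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, query_vec, answer):
        self.entries.append((list(query_vec), answer))

    def get(self, query_vec):
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        # A miss (None) means the caller should run the full RAG pipeline.
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # near-duplicate query embedding
miss = cache.get([0.0, 1.0])    # unrelated query embedding
```

The threshold controls the trade-off between hit rate and the risk of serving an answer to a query that only looks similar.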
Cost-benefit analysis
Balance between storage and computation
Cache sizing strategies
Eviction policies
Monitoring and alerting
vLLM integration patterns
TensorRT-LLM optimization
Quantization strategies
Batching and scheduling
Resource allocation
Request distribution strategies
Health checking
Circuit breakers
Rate limiting
Auto-scaling strategies
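The circuit breaker pattern listed above can be sketched in a few lines. This is a simplified, single-threaded illustration: the class name, the thresholds, and the injectable clock are assumptions made for the example and do not reflect any particular library's API.

```python
import time


class CircuitBreaker:
    """Stop calling a failing backend until a cool-down period has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success fully closes the circuit
        return result
```

A caller would wrap each request to a model server or vector store in breaker.call(...) and fall back to a cached or degraded response when RuntimeError is raised.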
Source attribution accuracy
Citation extraction verification
Context preservation check
Hallucination detection
Consistency scoring
Custom evaluation frameworks
Domain-specific metrics
Human evaluation integration
Automated quality gates
Regression testing
Production Monitoring
Retrieval quality metrics
Embedding drift detection
Query pattern analysis
Cost tracking systems
Performance regression alerts
Distribution monitoring
Concept drift detection
Model performance tracking
Automatic retraining triggers
Alert thresholds
A/B Test Framework
Online evaluation setup
Statistical significance testing
Feature flag management
Gradual rollout strategies
Results analysis pipeline
Metric interpretation
Trade-off analysis
Rollback criteria
Documentation practices
Stakeholder communication
Preventing direct injection
Indirect injection via documents
Input sanitization strategies
Output validation framework
Detection and logging systems
Layered security approach
Isolation strategies
Privilege separation
Security monitoring
Incident response planning
OWASP LLM Top 10
Threat modeling for RAG
Preventing data poisoning
Model denial-of-service (DoS) protection
Information disclosure controls
Supply chain security
Security scanning
Dependency management
Patch management
Security testing
Compliance reporting
Hybrid retrieval setup with reranking
Self-correcting RAG configuration
GraphRAG pipeline construction
Router engine development
Security hardening exercises
API design and versioning
Error management patterns
Retry strategies
Circuit breaker implementation
Monitoring integration