Retrieval Augmented Generation from YouTube for Long-Form QA
LangChain, LLMs, Python, RAG — Transform Video Data into Searchable Knowledge
📋 Project Overview
Built a RAG pipeline that turns unstructured video into a searchable knowledge base. The pipeline transcribes audio with OpenAI Whisper, splits the transcript into chunks, embeds each chunk with LLaMA, and retrieves the top-ranked chunks for a query; GPT then refines them into an answer. Because answers are generated from retrieved transcript passages rather than from the model's memory alone, this reduced hallucinations and improved factual grounding.
Delivered a production-ready QA system that answers complex questions over multi-hour video content.
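The chunking and ranked-retrieval steps can be illustrated with a minimal, standard-library sketch. It is not the project's code: the function names, the chunk sizes, and the `toy_embed` bag-of-words vector are all assumptions made for illustration, and `toy_embed` is only a stand-in for the LLaMA embeddings used in the project.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split a transcript into overlapping word windows (hypothetical sizes)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in embedding: sparse word counts. NOT a LLaMA embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a real pipeline the word-count vectors would be replaced by dense embeddings and the linear scan by a vector index; the overlap between windows is there so that a sentence split across a chunk boundary still appears whole in at least one chunk.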
⚡ Key Highlights
- Video-to-Knowledge Pipeline: Transcription, chunking, embedding, and ranked retrieval
- OpenAI Whisper: High-quality transcription for video content
- LLaMA Embeddings: Dense vector representations for semantic search
- GPT Refinement: LLM-based reasoning for grounded answers
- Reduced Hallucinations: Vector search + LLM reasoning for factual grounding
- Production-Ready: Answers complex questions over multi-hour video content
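One common way to get grounded answers from the refinement step is to pass the retrieved passages to the LLM with an instruction to answer only from them. The sketch below shows that idea under stated assumptions: `build_grounded_prompt`, its prompt wording, and the `max_chars` budget are hypothetical and are not taken from the project, and no model API is called.

```python
def build_grounded_prompt(question, passages, max_chars=4000):
    """Assemble a prompt that restricts the answer to the given passages.

    Hypothetical helper: passages are added in ranked order until the
    character budget would be exceeded.
    """
    lines = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer the question using only the numbered transcript excerpts "
        "below. Cite excerpt numbers, and say you do not know if the "
        "excerpts do not contain the answer.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The returned string would then be sent to whichever chat or completion model performs the refinement; numbering the excerpts makes it possible to check each claim in the answer against the transcript.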
Skills Demonstrated
LangChain
RAG
LLMs
OpenAI Whisper
GPT
LLaMA
Vector Search
Python
NLP
More details and images coming soon.