Q&A Chatbot with LLMs via Retrieval-Augmented Generation

Tags: LLM, RAG, Vector DB, NLP

Motivation

Q&A systems built on large language models (LLMs) generate remarkably human-like responses. However, LLMs are prone to hallucination: they can produce plausible but incorrect information.

Introduction

Figure: RAG pipeline.

To address this issue, we developed a Q&A chatbot that uses retrieval-augmented generation (RAG). Users upload documents (PDF format), which the system vectorizes and stores in a vector database that grounds the model's answers. When a user asks a question, the system runs a semantic similarity search over the stored vectors to retrieve the most relevant passages. The LLM then turns the retrieved passages into a human-like answer.
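The retrieve-then-generate flow described above can be sketched in stdlib Python. This is a minimal sketch under stated assumptions, not the project's implementation: a bag-of-words count vector stands in for the real embedding model, a Python list stands in for the vector database, the chunk size and prompt wording are illustrative, and the names `embed`, `chunk`, `InMemoryVectorStore`, and `build_prompt` are hypothetical. The final LLM call is omitted.

```python
import math
import re
from collections import Counter


# Hypothetical stand-in for a neural embedding model: a term-count vector.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words (assumed chunking scheme)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


class InMemoryVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Vectorize each chunk once at upload time and keep (text, vector) pairs.
        for piece in chunk(text):
            self.entries.append((piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank stored chunks by cosine similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a grounded prompt; the wording is an illustrative assumption."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say you don't know.\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


store = InMemoryVectorStore()
store.add("The warranty covers parts and labor for two years.")
store.add("Returns are accepted within 30 days of purchase.")
top = store.search("How long is the warranty?", k=1)
print(top)  # ['The warranty covers parts and labor for two years.']
prompt = build_prompt("How long is the warranty?", top)  # would be sent to the LLM
```

Replacing `embed` with a real embedding model and the list with a vector database leaves the flow unchanged: embed the question, rank stored chunks by similarity, and pass the top-k chunks to the LLM inside the prompt so the answer stays tied to the uploaded documents.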

Research Challenges

  • Choosing an LLM for the human-like text generation component.
  • Selecting an embedding model to vectorize the documents.
  • Generalizing the retrieval system to handle different document types.
  • Further mitigating hallucination in the LLM.
  • Finding evaluation metrics for retrieval system performance.
  • Fine-tuning the LLM if performance is not satisfactory.
  • Optimizing inference time and memory usage.
  • Developing a user-friendly web application interface.
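For the retrieval-evaluation challenge, two widely used metrics are recall@k (how many of the relevant chunks appear in the top k results) and mean reciprocal rank (how high the first relevant chunk is ranked). The sketch below is illustrative only: it assumes a labeled set of questions paired with the chunk IDs judged relevant, the chunk IDs are hypothetical, and it does not claim these are the metrics the project adopted.

```python
# Hypothetical evaluation helpers: each query is paired with the set of chunk IDs
# judged relevant. Chunk IDs such as "c1" are illustrative.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear among the top-k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["c3", "c1", "c7"], {"c1"}),        # first relevant hit at rank 2
    (["c2", "c5", "c9"], {"c9", "c4"}),  # first relevant hit at rank 3
]
print(recall_at_k(*runs[0], k=2))            # 1.0
print(recall_at_k(*runs[1], k=3))            # 0.5
print(round(mean_reciprocal_rank(runs), 3))  # 0.417
```

Recall@k rewards getting all relevant evidence into the context window, while MRR rewards ranking it first; tracking both helps show whether a change to the embedding model improves coverage, ordering, or neither.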

Web Application

We developed a web application that lets users upload PDF documents and ask questions about them. Development focused on reducing inference time and memory usage and on refining the user interface.

Figure: RAG web application.