Chat-PDF is a Python application for extracting, storing, and querying PDF content using OpenAI embeddings and MongoDB Atlas as a vector database. It allows you to parse PDF files, store content embeddings, and retrieve relevant information via similarity search.
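The similarity-search idea can be illustrated with a minimal sketch. This is not the repository's code: the three-dimensional vectors, the ``CHUNKS`` list, and the ``cosine_similarity`` and ``top_k`` helpers are hypothetical stand-ins. In the application itself, embeddings would come from OpenAI and the search would be performed by MongoDB Atlas.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return [c["text"] for c in ranked[:k]]


# Hypothetical toy data: tiny hand-written vectors stand in for real embeddings.
CHUNKS = [
    {"text": "Invoices are due within 30 days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "The warranty covers two years.", "embedding": [0.1, 0.9, 0.1]},
    {"text": "Late payments incur a 2% fee.", "embedding": [0.8, 0.2, 0.1]},
]

# Hypothetical query vector, assumed to be "about payments".
query = [1.0, 0.0, 0.0]
print(top_k(query, CHUNKS, k=2))
```

A vector database does the same ranking at scale, typically with an approximate index rather than the exhaustive sort used here.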

- Clone the Repository::

    git clone https://github.com/folathecoder/chat-pdf.git


- Install Dependencies::

    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt


- Run the Scripts:

  First run the data ingestion pipeline::

      python data_ingestion_pipeline.py

  Then run the data retrieval pipeline::

      python data_retrieval_pipeline.py