Interact with PDF documents using natural language! This project leverages local large language models (LLMs) and embedding-based vector search to answer questions about PDF files efficiently and privately.
## Features

- Ask natural language questions about the content of your PDFs.
- Local inference using `llama3:8b` via Ollama, so no data leaves your machine.
- Fast and lightweight vector search with `DocArrayInMemorySearch`.
- Embeddings powered by `nomic-embed-text` for semantic understanding.
## Tech Stack

- LLM: `llama3:8b` via Ollama
- PDF Loader: `PyPDFLoader` from LangChain
- Embeddings: `nomic-embed-text`
- Vector Store: `DocArrayInMemorySearch`
- Framework: Python + LangChain
## Prerequisites

- Python 3.10+
- Ollama installed and running
- `llama3` model pulled via Ollama
- Required Python packages installed (see `requirements.txt` or the instructions in the Usage section, plus the setup sketch after this list)
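A minimal setup sketch, assuming a standard Ollama + pip workflow. The package list here is an educated guess at what the imports below require; defer to the pinned versions in `requirements.txt` where they differ.

```bash
# Pull the models used by this project
ollama pull llama3:8b
ollama pull nomic-embed-text

# Assumed Python dependencies; prefer requirements.txt if it disagrees
pip install langchain langchain-community pypdf docarray
```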
## How It Works

1. Load a PDF document using `PyPDFLoader`.
2. Generate embeddings with `nomic-embed-text`.
3. Store and search using `DocArrayInMemorySearch`.
4. Query using `llama3:8b` for context-aware responses (an end-to-end sketch of these steps follows below).
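The sketch below wires these four steps together with LangChain's LCEL syntax. It is a minimal illustration under stated assumptions, not this project's actual code: the file name `document.pdf`, the chunking parameters, and the prompt wording are all placeholders.

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load the PDF and split it into overlapping chunks for retrieval.
pages = PyPDFLoader("document.pdf").load()  # placeholder file name
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100      # assumed chunking parameters
).split_documents(pages)

# 2-3. Embed the chunks with nomic-embed-text and index them in memory.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = DocArrayInMemorySearch.from_documents(chunks, embedding=embeddings)
retriever = vectorstore.as_retriever()

# 4. Answer questions with llama3:8b, grounded in the retrieved chunks.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOllama(model="llama3:8b")
    | StrOutputParser()
)

print(chain.invoke("What is this document about?"))
```

Because `DocArrayInMemorySearch` lives entirely in RAM, the index is rebuilt on every run; the persistence item in the roadmap below addresses this.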
## Future Improvements

- Add a simple web UI using Streamlit or Gradio
- Enable support for querying multiple PDFs
- Add a persistent vector store option (e.g., FAISS or Chroma); one possible swap is sketched after this list
- Improve context retention and memory in conversations
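As a rough idea of what the persistent-store item could look like, here is a hypothetical swap of `DocArrayInMemorySearch` for Chroma (requires the `chromadb` package); the rest of the pipeline stays unchanged, and the `persist_directory` path is a placeholder.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(PyPDFLoader("document.pdf").load())

# The index is written to disk, so it survives restarts
# instead of being rebuilt on every run.
vectorstore = Chroma.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",  # placeholder path
)
retriever = vectorstore.as_retriever()
```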
## License

This project is licensed under the MIT License. See the LICENSE file for details.