RAG (Retrieval-Augmented Generation) System Overview
main.py - Setting Up Qdrant Collection
In this part, you set up the Qdrant client and create a collection where the embeddings will be stored:
Initialize Qdrant Client: You connect to the Qdrant service using your API key and Qdrant Cloud URL.
Create Collection: You define a new collection (book_data) in Qdrant with a specified vector size (1024, the output dimension of mxbai-embed-large) and distance metric (cosine similarity).
Result: You confirm that the collection was created successfully.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant_client = QdrantClient(
    url="https://your-qdrant-cloud-url",
    api_key="your-api-key"
)

qdrant_client.create_collection(
    collection_name="book_data",  # Name of the collection
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)  # Vector size and distance metric
)

print("Collection 'book_data' created successfully!")
add_data.py - Adding Data to Qdrant
In this script, you prepare embeddings for your text data and store them in Qdrant:
Generate Embeddings: You use a locally running Ollama API to generate an embedding for each text chunk, transforming text into a numerical representation (vector).
Store Embeddings: Each embedding is stored in Qdrant as a PointStruct, where every point has a unique ID and carries its corresponding text as payload.
Process Multiple Text Chunks: Each text chunk you provide (e.g., notes about your laptop issue) is converted into an embedding and stored in Qdrant; a simple chunking sketch follows the code below.
import requests
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

qdrant_client = QdrantClient(
    url="https://your-qdrant-cloud-url",
    api_key="your-api-key"
)

# Generate an embedding via the local Ollama API
def create_embedding(text):
    url = 'http://localhost:11434/api/embeddings'
    payload = {"model": "mxbai-embed-large", "prompt": text}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    embedding = response.json()['embedding']
    return np.array(embedding)

# Example text data
text_chunks = [
    "Today is Wednesday, September 25, 2024.",
    # Additional text chunks here...
]

points = []
for i, chunk in enumerate(text_chunks):
    embedding = create_embedding(chunk)
    point = PointStruct(id=i + 1, vector=embedding.tolist(), payload={"text": chunk})
    points.append(point)

# Store the points in Qdrant
operation_info = qdrant_client.upsert(
    collection_name="book_data",
    wait=True,
    points=points
)

print("Embeddings and payloads stored in Qdrant successfully!")
generate_answer.py - Retrieving and Generating Answers
Here, the system generates an answer to a user's question: it retrieves relevant data from Qdrant and then feeds it to a generative model to produce the answer.
Generate Embedding for Query: Given a user's question, you generate its embedding.
Search for Similar Vectors: You search for similar vectors in the Qdrant database using the query embedding.
Retrieve Context and Generate Answer: You gather the most relevant text chunks based on similarity and feed them into a generation model (e.g., Llama) to produce an answer.
import numpy as np
import requests
from qdrant_client import QdrantClient

client = QdrantClient(
    url='https://your-qdrant-cloud-url',
    api_key='your-api-key'
)

# Generate an embedding for the query via the local Ollama API
def generate_embedding(prompt):
    url = 'http://localhost:11434/api/embeddings'
    payload = {"model": "mxbai-embed-large", "prompt": prompt}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    embedding = response.json().get('embedding')
    return np.array(embedding)

# Search for similar vectors in Qdrant
def search_similar_vectors(query_vector, collection_name="book_data", limit=50):
    hits = client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=limit
    )
    return hits

# Generate an answer using the Llama API
def generate_answer_with_llama(prompt):
    url = 'http://localhost:11434/api/generate'
    payload = {"model": "llama3.1", "prompt": prompt, "stream": False}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json().get('response')

# Handle a question from the user
def handle_question(question):
    query_vector = generate_embedding(question)
    if query_vector is not None:
        hits = search_similar_vectors(query_vector.tolist())
        if hits:
            context = " ".join([hit.payload['text'] for hit in hits])
            prompt = f"Based on the following content:\n\n{context.strip()}\n\nAnswer the question: {question}"
            answer = generate_answer_with_llama(prompt)
            print("Generated Answer:", answer)
        else:
            print("No similar vectors found.")
    else:
        print("Failed to generate embedding for the question.")

# Example usage
user_question = "What do I use a sledgehammer for?"
handle_question(user_question)
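With limit=50, the prompt may be padded with many weakly related chunks. A possible refinement, reusing the client defined above and assuming the score_threshold parameter of qdrant-client's search() (the top-5 limit and 0.6 cutoff are illustrative starting points, not values from the original script):

def search_top_k(query_vector, collection_name="book_data", k=5, min_score=0.6):
    # Keep only the k best hits whose similarity score clears the cutoff
    return client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=k,
        score_threshold=min_score,
    )

Trimming the context this way keeps the prompt short and reduces the chance that the model is distracted by irrelevant passages.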
Key Concepts in the RAG System
Embedding Generation: Embeddings are dense vector representations of text. You generate them with a model such as mxbai-embed-large or any other available embedding model.
Search in Qdrant: Qdrant stores these embeddings and finds the most similar ones using cosine similarity, enabling fast, scalable retrieval of related information.
Contextual Answer Generation: Once relevant context has been retrieved from Qdrant, it is passed to a generative model such as Llama (or a GPT-based model), which produces the final answer.
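To make cosine similarity concrete, here is a small standalone illustration using NumPy (the vectors are toy values, not real embeddings):

import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.2, 0.9])    # points in a similar direction to a
c = np.array([-1.0, 0.5, -1.0])  # points roughly opposite to a

print(cosine_similarity(a, b))  # close to 1: very similar
print(cosine_similarity(a, c))  # close to -1: dissimilar

Qdrant computes this score between the query vector and the stored vectors (using an index for speed) and returns the closest matches.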
Conclusion
The RAG system combines information retrieval (searching for relevant documents) with generative models to produce answers. The process involves storing embeddings in a vector database (Qdrant), querying that database for relevant content based on the user's question, and generating a response using the retrieved context. This system can be powerful for applications that need context-aware, real-time responses based on large datasets.
Message for Readers: Keep learning and applying new techniques every day to build more sophisticated systems. Don't forget to take notes!
#MachineLearning #Qdrant #RAG