How I Built a RAG System That Answers 10,000 Questions Per Second

1. The Problem to Solve

LLMs give good answers, but they come with problems:

Slow when there is a lot of data
Making things up (hallucination)
Stale information
Hard to scale

The fix:

👉 Use RAG (Retrieval-Augmented Generation) and design the system for high throughput

2. High-Level Architecture

User Query

↓

API Gateway (Load Balancer)

↓

Query Encoder (Embedding)

↓

Vector Search (ANN)

↓

Context Builder

↓

LLM Inference

↓

Response Cache

↓

User
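
The flow above can be read as a single request handler. Here is a minimal sketch, assuming the stages are plain callables passed in by the caller; `RagPipeline`, `embed`, `search`, and `generate` are hypothetical names, not part of the original design. The diagram places the Response Cache after LLM Inference, so the sketch assumes it is written after generation and checked before any other work on the next request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RagPipeline:
    # All stages are hypothetical callables supplied by the caller.
    embed: Callable[[str], List[float]]            # Query Encoder (Embedding)
    search: Callable[[List[float]], List[str]]     # Vector Search (ANN)
    generate: Callable[[str, str], str]            # LLM Inference: (query, context) -> answer
    cache: Dict[str, str] = field(default_factory=dict)  # Response Cache

    def answer(self, query: str) -> str:
        key = query.strip().lower()
        if key in self.cache:                      # cache hit: skip every other stage
            return self.cache[key]
        vector = self.embed(query)
        chunks = self.search(vector)
        context = "\n".join(chunks)                # Context Builder
        result = self.generate(query, context)
        self.cache[key] = result                   # store for the next identical query
        return result
```

Keeping each stage behind a plain function boundary makes it possible to scale or replace one stage without touching the others.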

3. What Makes It Fast Enough for 10,000 QPS

🔹 1. Separate “Retrieval” from “Generation”

Retrieval → fast, deterministic
Generation → expensive, use only when needed

📌 Principle:

Don't make the LLM do work it doesn't have to
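
One way to apply this principle is to return a stored answer directly when retrieval is confident, and fall back to the LLM only otherwise. The sketch below assumes a hypothetical `Hit` record carrying a similarity score and an optional pre-written answer, and an arbitrary threshold of 0.9; the right rule and threshold depend on your data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hit:
    text: str
    score: float                         # similarity in [0, 1]; higher is closer
    canned_answer: Optional[str] = None  # pre-written answer, e.g. from an FAQ entry


def respond(query: str, hits: List[Hit],
            generate: Callable[[str, List[str]], str],
            threshold: float = 0.9) -> str:
    """Return a stored answer when retrieval is confident; otherwise call the LLM."""
    if hits:
        best = max(hits, key=lambda h: h.score)
        if best.canned_answer is not None and best.score >= threshold:
            return best.canned_answer    # retrieval only, no generation cost
    return generate(query, [h.text for h in hits])
```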

🔹 2. Use a Vector Database That Fits

What it needs:

Approximate Nearest Neighbor (ANN)
In-memory index
Parallel search

Example techniques (no need to commit to a specific product):

IVF / HNSW
Sharded index
CPU-friendly
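
To show how a sharded in-memory index and parallel search fit together, here is a rough sketch using only the standard library. It is an assumption-heavy illustration: each shard does an exact scan where a real system would query an IVF or HNSW index, and `search_shard` and `search_all` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Vector = List[float]
Shard = Dict[str, Vector]   # doc_id -> vector, held in memory


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_shard(shard: Shard, query: Vector, k: int) -> List[Tuple[float, str]]:
    # Exact scan for illustration; a real shard would query an IVF or HNSW index here.
    scored = ((dot(vec, query), doc_id) for doc_id, vec in shard.items())
    return heapq.nlargest(k, scored)


def search_all(shards: List[Shard], query: Vector, k: int = 3) -> List[Tuple[float, str]]:
    # Fan out to every shard in parallel, then merge the per-shard top-k lists.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda s: search_shard(s, query, k), shards)
        merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)
```

In pure Python the threads mostly show the fan-out and merge shape; in practice each shard would likely live in its own process or node so the searches truly run in parallel.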

🔹 3. Query Embedding Has to Be “Light”

Use a small embedding model
Preload the model into RAM
Batch the embedding calls

📌 Target:

Embedding < 2 ms per query
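
Batching can be sketched as a small micro-batcher that groups queries arriving within a short window into one model call. This assumes a hypothetical `embed_batch` function backed by a model that was loaded once at startup; `EmbeddingBatcher`, `max_batch`, and `max_wait_s` are illustrative names and values, not settings from the original system.

```python
import asyncio
from typing import Callable, List, Tuple

Vector = List[float]


class EmbeddingBatcher:
    """Collects concurrent queries and embeds them in one model call."""

    def __init__(self, embed_batch: Callable[[List[str]], List[Vector]],
                 max_batch: int = 32, max_wait_s: float = 0.001):
        self.embed_batch = embed_batch   # hypothetical: model already loaded in RAM
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.pending: List[Tuple[str, asyncio.Future]] = []
        self.timer = None

    async def embed(self, text: str) -> Vector:
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.pending.append((text, future))
        if len(self.pending) >= self.max_batch:
            self.flush()                                  # batch is full: run now
        elif self.timer is None:
            self.timer = loop.call_later(self.max_wait_s, self.flush)
        return await future

    def flush(self) -> None:
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        batch, self.pending = self.pending, []
        if not batch:
            return
        try:
            # Note: this call blocks the event loop while the model runs.
            vectors = self.embed_batch([text for text, _ in batch])
        except Exception as exc:
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        for (_, future), vector in zip(batch, vectors):
            if not future.done():
                future.set_result(vector)
```

The wait window trades a little latency for throughput, so it has to stay well under the per-query latency target.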

🔹 4. The Context Doesn't Need to Be Long

Instead of:

Sending 10–20 documents

Use:

Top-k = 3–5
Short chunks
A second ranking pass (re-rank)

Results:

The LLM responds faster
Answers are more accurate
Costs go down
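
The context-building step can be sketched as: re-rank the retrieved chunks, keep the best few, and cap the total length. The function names, the word-overlap re-ranker, and the `max_chars` budget are all assumptions for illustration; a production re-ranker would normally be a learned model.

```python
from typing import Callable, List


def build_context(query: str,
                  candidates: List[str],
                  rerank_score: Callable[[str, str], float],
                  top_k: int = 4,
                  max_chars: int = 1200) -> str:
    """Re-rank retrieved chunks, keep the best top_k, and cap the total length."""
    ranked = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
    kept: List[str] = []
    used = 0
    for chunk in ranked[:top_k]:
        if used + len(chunk) > max_chars:
            break                      # stay inside the context budget
        kept.append(chunk)
        used += len(chunk)
    return "\n---\n".join(kept)


def overlap_score(query: str, chunk: str) -> float:
    # Toy re-ranker: fraction of query words that appear in the chunk.
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(1, len(q_words))
```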

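🔹 5. Cache Is the Hero

Use three cache layers:

Query Cache

Repeated question → no need to retrieve again

Embedding Cache

Avoids recomputing the same embedding

Final Answer Cache

Popular question → answer immediately

📌 QPS can jump several-fold right away, because a cache hit skips the retrieval or generation work behind it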
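
The three layers can be sketched as lookups that each guard a more expensive step. This is a simplified illustration with plain dictionaries and hypothetical `embed`, `retrieve`, and `generate` callables; a real deployment would likely use a shared cache with TTLs and size limits, which are omitted here.

```python
from typing import Dict, List

Vector = List[float]


class CachedRag:
    def __init__(self, embed, retrieve, generate):
        # embed, retrieve, and generate are hypothetical callables.
        self.embed, self.retrieve, self.generate = embed, retrieve, generate
        self.answer_cache: Dict[str, str] = {}         # Final Answer Cache
        self.embedding_cache: Dict[str, Vector] = {}   # Embedding Cache
        self.query_cache: Dict[str, List[str]] = {}    # Query Cache (retrieval results)

    @staticmethod
    def normalize(query: str) -> str:
        # Collapse case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def answer(self, query: str) -> str:
        key = self.normalize(query)
        if key in self.answer_cache:                   # cheapest path: answer at once
            return self.answer_cache[key]
        chunks = self.chunks_for(key)
        result = self.generate(key, chunks)
        self.answer_cache[key] = result
        return result

    def chunks_for(self, key: str) -> List[str]:
        if key not in self.query_cache:                # retrieve only on a miss
            if key not in self.embedding_cache:        # embed only on a miss
                self.embedding_cache[key] = self.embed(key)
            self.query_cache[key] = self.retrieve(self.embedding_cache[key])
        return self.query_cache[key]
```

Keeping the layers separate lets them expire independently: for example, retrieval results can be invalidated when the index is updated while the embeddings stay valid.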

6. How to Scale to 10,000 QPS

Core ideas

Stateless API
Horizontal Scaling
Async everything

Examples

API → Auto scale
Vector DB → Sharding
LLM → Batch inference
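
A back-of-the-envelope estimate shows how caching and batch inference combine when sizing the LLM tier. The function below is a rough sketch under a simplifying assumption (each replica runs one batch at a time), and every input in the usage note is a made-up number, not a measurement from this system.

```python
import math


def llm_replicas_needed(total_qps: float,
                        cache_hit_rate: float,
                        batch_size: int,
                        batch_latency_s: float) -> int:
    """Estimate LLM replicas when each replica runs one batch at a time."""
    llm_qps = total_qps * (1.0 - cache_hit_rate)      # traffic that reaches the LLM
    per_replica_qps = batch_size / batch_latency_s    # requests finished per second
    return math.ceil(llm_qps / per_replica_qps)
```

For instance, `llm_replicas_needed(10_000, 0.9, 16, 0.5)` gives 32, while the same call with a cache hit rate of 0.0 gives 313, which is why the cache layers matter so much at this scale.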

7. Why This Kind of RAG Is “Stable”

✔ Answers come from real data

✔ You control the sources

✔ Easy to debug

✔ The LLM can be swapped at any time

✔ Production-ready
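
Two of these points, controlled sources and a swappable LLM, can be illustrated with a small interface. This is only a sketch: the `LLM` protocol, the `Answer` record, and the prompt wording are assumptions, not the original implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class Answer:
    text: str
    sources: List[str]     # IDs of the chunks the answer was built from


def answer_with_sources(llm: LLM, query: str, chunks: Dict[str, str]) -> Answer:
    """Any object with a complete() method can serve as the LLM."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in chunks.items())
    prompt = f"Answer using only the context.\n{context}\nQuestion: {query}"
    return Answer(text=llm.complete(prompt), sources=list(chunks))
```

Because the answer carries its source IDs, a wrong answer can be traced back to the chunks that produced it, and changing models only means passing a different object.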

8. The Key Lesson

The speed of a RAG system depends not on the LLM but on “the system around the LLM”

9. Use Cases Where It Fits

AI Search
Enterprise chatbots
Customer Support
Legal / Finance QA
Knowledge Assistant

Want to see a particular kind of article? Let me know in the comments.
