Storing and Loading Embedding Vectors with LlamaIndex

jjj_web

243 views · 2026-03-13 06:30:00

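When building a RAG system with LlamaIndex, storing and loading embedding vectors is a key step in getting to production. Without persistence, every application restart or index rebuild recomputes the embeddings for all documents, which is slow and, with a third-party embedding service, incurs extra API charges. Through its Vector Store abstraction, LlamaIndex can persist vectors together with their text chunks and metadata to a range of vector databases, so that later runs load them directly.

The sections below show how to store and load embeddings, with complete code examples.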

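I. Core Concepts

Vector Store: the abstract interface to a vector database, responsible for adding, deleting, updating, and querying vectors. LlamaIndex offers many implementations as integration packages, such as ChromaVectorStore, FaissVectorStore, and PineconeVectorStore.
StorageContext: the storage context, which specifies where the vector store, document store, and index store live. It is how index data gets persisted to disk or to a cloud database.
Index: an index object (such as VectorStoreIndex) can be built on top of an existing vector store, without passing the documents in again.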

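II. Storing Embeddings in a Vector Database

1. Install dependencies

Using Chroma as the example, first install the required libraries (the HuggingFace embedding integration used below is a separate package):

pip install llama-index chromadb llama-index-vector-stores-chroma llama-index-embeddings-huggingface

2. Initialize the vector store and build the index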

import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# Set the embedding model (must match the model used when loading later)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-zh-v1.5"
)

# 1. Load documents (assumes the data is in the ./data folder)
documents = SimpleDirectoryReader("./data").load_data()

# 2. Initialize the Chroma client (persisted to local disk)
db = chromadb.PersistentClient(path="./chroma_db")  # data is saved under ./chroma_db
chroma_collection = db.get_or_create_collection("tcm_keji")

# 3. Create the ChromaVectorStore object
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# 4. Create the StorageContext and attach the vector_store
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 5. Build the index (documents are chunked, embedded, and written to Chroma automatically)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True  # show a progress bar
)

# At this point the embeddings and text chunks are persisted under ./chroma_db

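Key points:

PersistentClient saves data to a local folder, so it is still there after a restart.
get_or_create_collection names the collection; collections can be split by project or domain.
storage_context binds the vector store to the index, so vectors are written automatically during the build.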
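
Because the data survives restarts, an application can decide at startup whether to build the index or load it. The sketch below shows one way to make that decision using only the standard library. The helper names `has_persisted_data` and `choose_startup_action` are illustrative assumptions, not part of LlamaIndex or Chroma, and a non-empty directory does not by itself prove that the collection inside is complete.

```python
from pathlib import Path


def has_persisted_data(persist_dir: str) -> bool:
    """Return True if persist_dir exists and contains at least one entry.

    Hypothetical helper: it only inspects the filesystem and does not
    verify that the collection inside is complete or readable.
    """
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def choose_startup_action(persist_dir: str) -> str:
    """Decide between loading stored embeddings and building them."""
    return "load" if has_persisted_data(persist_dir) else "build"
```

With `./chroma_db` as the argument, a result of "build" would lead to the code in this section and "load" to the code in the next one.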

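III. Loading Stored Embeddings

On the next application start there is no need to reload documents or recompute embeddings; load the index directly from the vector database.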

import chromadb
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Set the same embedding model (it must match!)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-zh-v1.5"
)

# 1. Connect to the existing Chroma database
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("tcm_keji")

# 2. Create the vector_store object (pointing at the existing collection)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# 3. Load the index from the existing vector_store.
#    from_vector_store builds its own StorageContext around the store,
#    so none needs to be passed in.
index = VectorStoreIndex.from_vector_store(vector_store)

# 4. Create a query engine and use it
#    (answer generation uses Settings.llm, which defaults to OpenAI unless configured)
query_engine = index.as_query_engine()
response = query_engine.query("麻黄汤的组成")  # "the composition of Mahuang Tang"
print(response)

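Notes:

from_vector_store builds the index directly from the existing vector store; documents no longer need to be passed in.
The embedding model must be exactly the one used at storage time. Otherwise the query vector lives in a different vector space from the stored vectors and retrieval results will be wrong.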
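
One way to guard against an accidental model change is to record the model name and vector dimension next to the stored data and compare them at load time. This is a minimal sketch under stated assumptions: the file name `embedding_manifest.json` and both helper functions are hypothetical and not something LlamaIndex or Chroma provides.

```python
import json
from pathlib import Path

MANIFEST_NAME = "embedding_manifest.json"  # hypothetical file name


def write_manifest(persist_dir: str, model_name: str, dimension: int) -> Path:
    """Record which embedding model produced the stored vectors."""
    path = Path(persist_dir) / MANIFEST_NAME
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps({"model_name": model_name, "dimension": dimension}),
        encoding="utf-8",
    )
    return path


def check_manifest(persist_dir: str, model_name: str, dimension: int) -> None:
    """Raise ValueError if the current model differs from the recorded one."""
    path = Path(persist_dir) / MANIFEST_NAME
    recorded = json.loads(path.read_text(encoding="utf-8"))
    current = {"model_name": model_name, "dimension": dimension}
    if recorded != current:
        raise ValueError(
            f"Embedding model mismatch: stored {recorded}, current {current}"
        )
```

Calling `write_manifest` after the build and `check_manifest` before the load turns a silent retrieval-quality problem into an explicit error.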

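IV. Examples with Other Vector Stores

1. FAISS (local persistence)

FAISS is Meta's open-source similarity search library; its indexes can be saved to local files.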

pip install llama-index-vector-stores-faiss faiss-cpu

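import faiss
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore

# Create the FAISS index. The dimension must match the embedding model:
# BAAI/bge-large-zh-v1.5 outputs 1024-dimensional vectors.
d = 1024
faiss_index = faiss.IndexFlatL2(d)

# Wrap the FAISS index in a LlamaIndex vector store
vector_store = FaissVectorStore(faiss_index=faiss_index)

# Build the index with the vector store attached to the storage_context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# FAISS keeps only vectors, so persist the whole storage context:
# this writes the FAISS index plus the docstore and index store to ./faiss_storage
index.storage_context.persist(persist_dir="./faiss_storage")

# When loading: from_vector_store rejects stores that hold no text,
# so restore the vector store and load the index from storage instead
vector_store = FaissVectorStore.from_persist_dir("./faiss_storage")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./faiss_storage"
)
index = load_index_from_storage(storage_context)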
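
To make the role of the dimension concrete, here is a rough pure-Python sketch of what a flat L2 index does conceptually: it compares the query against every stored vector and ranks by squared Euclidean distance. This is an illustration only, not FAISS code; `flat_l2_search` is a hypothetical function, and the explicit length check stands in for the requirement that stored and query vectors share one dimension.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    stored: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k nearest stored vectors as (position, squared L2 distance)."""
    scored = []
    for position, vector in enumerate(stored):
        if len(vector) != len(query):
            raise ValueError(
                f"Dimension mismatch: stored {len(vector)}, query {len(query)}"
            )
        distance = sum((a - b) ** 2 for a, b in zip(vector, query))
        scored.append((position, distance))
    # Exhaustive search: sort every candidate by distance, smallest first
    scored.sort(key=lambda item: item[1])
    return scored[:k]


vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
print(flat_l2_search(vectors, [1.0, 1.0], k=2))  # [(1, 1.0), (0, 2.0)]
```

A query produced by a model with a different output size fails the length check, which is why the index dimension has to follow the embedding model.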

2. Pinecone (cloud database)

Pinecone is a managed vector database, well suited to production.

pip install llama-index-vector-stores-pinecone pinecone-client

from pinecone import Pinecone, ServerlessSpec
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Initialize Pinecone (client v3 and later; the older pinecone.init(...) call was removed)
pc = Pinecone(api_key="YOUR_API_KEY")
index_name = "tcm-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,  # must match the embedding model (1024 for bge-large-zh-v1.5)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # example values, adjust as needed
    )

pinecone_index = pc.Index(index_name)
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Build the index (embeddings are uploaded to Pinecone automatically)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Later, load directly from Pinecone
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
index = VectorStoreIndex.from_vector_store(vector_store)
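
The "YOUR_API_KEY" placeholder above is best filled from the environment rather than written into source code. A minimal sketch, assuming the key is exported in a variable named `PINECONE_API_KEY`; both that name and the helper `read_api_key` are assumptions made for illustration.

```python
import os


def read_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing or empty."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return value
```

The returned string can then be passed as the `api_key` argument when the client is created.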

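V. Advanced: Persisting Text and Metadata Together

A vector database usually stores more than vectors: it also keeps the text chunks and their metadata. Chroma, Pinecone, and similar stores save these automatically. With a pure vector library such as FAISS, the mapping from vectors to text has to be saved separately. LlamaIndex's StorageContext also supports a DocStore and an IndexStore, so the documents and the index structure can be persisted as well.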

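from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore

# First build: start from empty stores (./persist does not exist yet,
# so from_persist_dir would fail here)
docstore = SimpleDocumentStore()
index_store = SimpleIndexStore()
storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store,
    vector_store=vector_store  # the vector store is specified separately
)

# Build the index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Save to disk
storage_context.persist(persist_dir="./persist")

# When reloading: pass the same external vector store again, since only the
# docstore and index store are read back from ./persist
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./persist"
)
index = load_index_from_storage(storage_context)

This approach saves the documents and the index structure to local disk, but the vectors themselves may still live in an external database (such as Chroma), so the two have to be managed separately.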
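
To picture what "vector plus text plus metadata" means in practice, the sketch below models one stored record and a simple metadata filter in plain Python. It is an illustration of the idea only: `StoredChunk`, `filter_by_metadata`, the file names, and the sample values are all hypothetical and do not reflect the internal schema of Chroma, Pinecone, or LlamaIndex.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredChunk:
    """Illustrative record: what a text-storing vector database keeps per chunk."""
    chunk_id: str
    vector: List[float]
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_by_metadata(chunks: List[StoredChunk], **required: str) -> List[StoredChunk]:
    """Keep chunks whose metadata contains every required key/value pair."""
    return [
        chunk
        for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in required.items())
    ]


# Hypothetical sample data; the vectors are shortened to two numbers for readability
chunks = [
    StoredChunk("c1", [0.1, 0.2], "麻黄汤 ...", {"source": "book_a.txt"}),
    StoredChunk("c2", [0.3, 0.4], "桂枝汤 ...", {"source": "book_b.txt"}),
]
print([c.chunk_id for c in filter_by_metadata(chunks, source="book_a.txt")])  # ['c1']
```

A store that keeps only the `vector` field, as FAISS does, cannot answer such a filter or return the text on its own, which is why the docstore has to be persisted alongside it.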

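VI. Notes for the Traditional Chinese Medicine (TCM) Domain

Model consistency: TCM applications may use a fine-tuned embedding model. Make sure storage and loading use exactly the same model (including the model file path or the HuggingFace model name).
Metadata retention: when storing, make sure each text chunk's metadata (such as source, chapter, and formula name) is also written to the vector database, so that retrieval can filter on it.
Incremental updates: when the knowledge base gains new documents, keep adding to the existing vector store. Chroma supports an add operation, and LlamaIndex's insert method also performs incremental insertion.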

# Add new documents
new_docs = SimpleDirectoryReader("./new_data").load_data()
for doc in new_docs:
    index.insert(doc)  # computes the embedding and inserts it into the vector store
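
Running an insert loop twice over the same folder may store the same content twice, so it can help to skip text that was already added. The sketch below fingerprints each document's text with SHA-256 and keeps only unseen ones; `content_hash` and `select_new_texts` are hypothetical helpers, and where the set of seen hashes is kept between runs is left open as an assumption.

```python
import hashlib
from typing import Iterable, List, Set


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_new_texts(texts: Iterable[str], seen_hashes: Set[str]) -> List[str]:
    """Return texts not seen before and record their hashes in seen_hashes."""
    fresh = []
    for text in texts:
        digest = content_hash(text)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(text)
    return fresh
```

Only the documents whose text passes this filter would then go through the insert call, which avoids paying for the same embedding twice.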

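VII. Summary

Storing embeddings: point the StorageContext's vector store at a persistent vector database (such as Chroma, FAISS, or Pinecone); vectors are then written automatically when the index is built.
Loading embeddings: with the same vector database and embedding model, load the index through from_vector_store (for stores that also keep the text, such as Chroma and Pinecone) without reprocessing the original documents.
Choosing a tool: Chroma is recommended for local development (simple, persistent); for production, consider services such as Pinecone or Milvus.
Key point: keep the embedding model consistent and the vector dimensions matched.

Built this way, a RAG system can be restarted and scaled without recomputing embeddings, which makes development and deployment considerably more efficient.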
