数据库 | 向量数据库 Zilliz 作为知识库接入 Dify | Chuck

更新_2025-04-15

数据库是 24 年 11 月底搭建好，然后 1 个月不到就收到了 Zilliz Cloud Pipelines 要下线的邮件通知……现在文档已经更新，不过新功能还未上线。解决流程已经不适用了，然而各大厂的向量数据库依然不是很好用（贵）。
Zilliz Cloud Pipelines 下线说明：https://docs.zilliz.com.cn/docs/pipelines-concepts
Zilliz Cloud Pipelines 服务正处在逐步下线中，将于 2025 年第二季度末停止服务，被 “Data In, Data Out” 的新功能取代。该功能旨在简化 Milvus 和 Zilliz Cloud 中的向量化流程。自 2025 年 1 月 10 日起，Zilliz Cloud Pipelines 将不再接受新用户注册。现有用户可在每月 100 元人民币免费试用额度内继续使用服务直至下线日期。该服务不提供 SLA 支持。建议您使用模型提供商的 Embedding API 或开源模型生成向量。

问题

应对的问题：Dify Professional 版知识库的容量小（200 MB）和长文检索召回准确度不准。
测试过的工具和问题
- Dify 给出的方案对接 AWS Bedrock 知识库，流程复杂。
  Dify 平台尚不能直接连接 AWS Bedrock Knowledge Base，需要开发团队参考 Dify 关于外部知识库连接的 API 定义，手动创建后端 API 服务，建立与 AWS Bedrock 的连接。
- 向量数据库 Zilliz、Qdrant、Weaviate，后两者在线服务还不成熟。
最终选择 Zilliz 向量数据库继续遇到的难点
- 问题文本需要自己向量化后再去 Zilliz 数据库做检索，Zilliz Pipeline 配置的是 bge-base-zh-v1.5，为了节省，必须寻找可以在线调用的 bge-base-zh-v1.5 ……
- 国内大厂基本只支持在线调用自家的文本向量模型 ……
- HuggingFace Inference API 可以直接调用的向量模型 bge-large-zh-v1.5（不包括 bge-base-zh-v1.5），并且 http response 可能＞20s ……
- HuggingFace Inference Endpoints 可以自己配置在线模型。需要 model.onnx 文件，该文件是一个通用的、跨平台的模型表示格式，适用于模型交换、部署和优化浏览了一下 bge-base-en-v1.5 有 onnx，而 bge-base-zh-v1.5 没有 ……

解决流程

下载 bge-base-zh-v1.5 到本地后，配置 model.onnx 文件

# Pythorch 框架
# 生成 bge-base-zh-v1.5 onnx 文件代码
import torch
from transformers import BertModel, BertConfig
import json

# 读取配置文件
with open('bge-base-zh-v1.5/config.json', 'r') as f:
  config = json.load(f)

# 创建 BertConfig 实例
bert_config = BertConfig(**config)

# 确保 config 中的参数与 BertModel 兼容
bert_config.output_hidden_states = False  # 如果不需要隐藏状态
bert_config.return_dict = True

# 加载模型
model = BertModel(bert_config)

# 加载模型权重（权重文件名为 'pytorch_model.bin'）
model_path = 'bge-base-zh-v1.5/pytorch_model.bin'
model.load_state_dict(torch.load(model_path), strict=False)
model.eval()

# 创建一个虚拟输入
# 假设输入为 batch_size=1, sequence_length=128
# 这里的输入是 token ids 和 attention masks
input_ids = torch.ones(1, 128, dtype=torch.long)
attention_mask = torch.ones(1, 128, dtype=torch.long)

# 导出模型为 ONNX
output_onnx = 'BAAI_bge_base_zh_v1.5_model.onnx'
torch.onnx.export(model,
                  (input_ids, attention_mask),
                  output_onnx,
                  opset_version=14,
                  do_constant_folding=True,
                  input_names=['input_ids', 'attention_mask'],
                  output_names=['last_hidden_state', 'pooler_output'],
                  dynamic_axes={
                    'input_ids': {0: 'batch_size', 1: 'sequence_length'},
                    'attention_mask': {0: 'batch_size', 1: 'sequence_length'},
                    'last_hidden_state': {0: 'batch_size', 1: 'sequence_length'},
                    'pooler_output': {0: 'batch_size'}
                  })

print(f"模型已导出为 ONNX 格式: {output_onnx}")

在 HuggingFace - Create a new model repository，同步配置好了的模型，然后部署 Deploy - Inference Endpoints
HuggingFace Inference Endpoints

调用自己部署的在线模型将问题文本向量化，Feature Extraction: https://huggingface.co/docs/api-inference/tasks/feature-extraction

import requests

API_URL = "https://<url>.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer <token>"}

def query(payload):
  response = requests.post(API_URL, headers=headers, json=payload)
  return response.json()

output = query({
  "inputs": "text",
})

print(output)

通过 Zilliz 知识库混合检索 Hybrid Search: https://docs.zilliz.com.cn/reference/restful/hybrid-search-v2

# 此操作根据矢量相似性和标量筛选搜索实体，并使用指定的策略对结果进行重新排序
# metricType 当前搜索所使用的度量类型。该值应与目标 Collection 的度量类型相同
# annsField 向量值字段
# strategy unsupported rank type COSINE
# limit 本次查询要求返回的最大 Entity 数量
# offset 本次查询结果中需要跳过的 Entity 数量
# limit 与 offset 的总和不能超过 16384

import requests

# 混合检索请求的 url
url = "https://<集群 ID>.serverless.ali-cn-hangzhou.cloud.zilliz.com.cn/v2/vectordb/entities/hybrid_search"

headers = {
  "Authorization": "Bearer <token>",
  "Content-Type": "application/json"
}

payload = {
  "collectionName": "<collectionName>",
  "search":[
      {
          "data": [
              [ 768 embeddings ]
          ],
          "annsField": "<Vector_Field>",
          "limit": 5,
          "offset": 0,
          "outputFields": [
              "*"
          ]
      }
  ],
  "rerank": {
      "strategy": "rrf",
      "params": {
          "k": 10
      }
  ],
  "outputFields": [
      "<Specified_Field>"
  ]
}

response = requests.post(url, headers=headers, json=payload)
response_json = response.json()

print(response.json())

本地生成文本向量，可以用作 Zilliz 在线向量检索

# 文本向量化用作查询词
# 768-bge-base-en-v1.5: https://huggingface.co/BAAI/bge-base-en-v1.5
# 768-bge-base-zh-v1.5: https://huggingface.co/BAAI/bge-base-zh-v1.5
# 1024-bge-large-zh-v1.5: https://huggingface.co/BAAI/bge-large-zh-v1.5
from transformers import AutoTokenizer, AutoModel
import torch

# 加载模型和分词器
model_name = "bge-base-zh-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# 输入文本
text = "文本"

# 对文本进行分词，并设置最大长度为 512
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# 使用模型生成向量
with torch.no_grad():
  outputs = model(**inputs)

# 获取最后一层的隐藏状态（即向量表示）
embeddings = outputs.last_hidden_state

# 通常我们取第一个 token 的向量作为句子的向量表示
sentence_embedding = embeddings[:, 0, :].squeeze()

# 确保向量维度
assert sentence_embedding.size(0) == 768, "向量维度不为 768"

# 将张量转换为列表并输出
sentence_embedding_list = sentence_embedding.tolist()
print(sentence_embedding_list)