vLLM Production Stack Whisper API Transcription: How to Integrate a Speech Recognition Model

章炎滔 · Published 2026-05-08 15:33:17

production-stack: vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization. Project address: https://gitcode.com/gh_mirrors/pr/production-stack

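vLLM Production Stack is a Kubernetes-based reference system for deploying vLLM clusters, with community-driven performance optimizations. Its newly added Whisper API transcription feature lets users integrate a speech recognition model and serve audio-to-text requests.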

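What Is Whisper API Transcription?

Whisper API transcription refers to the /v1/audio/transcriptions endpoint newly added to vllm-router in vLLM Production Stack. It lets users transcribe .wav audio files with OpenAI's whisper-small model. This gives developers a convenient speech recognition option for use cases such as voice assistants, meeting notes, and subtitle generation.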

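Prerequisites for Integrating the Speech Recognition Model

Before integrating the speech recognition model, make sure the following prerequisites are met:

Access to a machine with a GPU (for example, via RunPod)
A Python 3.12 environment (uv is recommended)
vllm and production-stack cloned and installed
vllm installed with audio support: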

pip install "vllm[audio]"

Deploying the Whisper Model Service

First, start a vLLM backend service that serves the whisper-small model:

vllm serve \
  --task transcription openai/whisper-small \
  --host 0.0.0.0 --port 8002

Configuring and Running the Router

Next, create and run a router that connects to the Whisper backend. Put the router arguments in a run-router.sh file:

#!/bin/bash
if [[ $# -ne 2 ]]; then
    echo "Usage: $0 <router_port> <backend_url>"
    exit 1
fi

uv run python3 -m vllm_router.app \
    --host 0.0.0.0 --port "$1" \
    --service-discovery static \
    --static-backends "$2" \
    --static-models "openai/whisper-small" \
    --static-model-types "transcription" \
    --routing-logic roundrobin \
    --log-stats \
    --log-level debug \
    --engine-stats-interval 10 \
    --request-stats-window 10 \
    --static-backend-health-checks

Then run the router:

./run-router.sh 8000 http://0.0.0.0:8002
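
For setups that start the router from Python rather than from a shell script, the same invocation can be assembled as an argument list. This is a minimal sketch: the function name `router_command` is hypothetical, and every flag is copied from run-router.sh above rather than taken from any other source.

```python
import shlex


def router_command(port, backend_url, model="openai/whisper-small"):
    """Build the argv list that run-router.sh would execute (hypothetical helper)."""
    return [
        "uv", "run", "python3", "-m", "vllm_router.app",
        "--host", "0.0.0.0", "--port", str(port),
        "--service-discovery", "static",
        "--static-backends", backend_url,
        "--static-models", model,
        "--static-model-types", "transcription",
        "--routing-logic", "roundrobin",
        "--log-stats",
        "--log-level", "debug",
        "--engine-stats-interval", "10",
        "--request-stats-window", "10",
        "--static-backend-health-checks",
    ]


def describe(port, backend_url):
    """Return the command as a single shell-quoted string, e.g. for logging."""
    return shlex.join(router_command(port, backend_url))
```

The list can be passed to `subprocess.run(router_command(8000, "http://0.0.0.0:8002"), check=True)`. Keeping it as a list avoids shell-quoting problems with the backend URL.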

Sending a Transcription Request

A .wav file can be sent to the transcription endpoint with curl:

curl -v http://localhost:8000/v1/audio/transcriptions \
  -F 'file=@/path/to/audio.wav;type=audio/wav' \
  -F 'model=openai/whisper-small' \
  -F 'response_format=json' \
  -F 'language=en'
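
The curl call above sends a multipart/form-data body. The following sketch shows roughly the same request using only the Python standard library. The helper names `build_multipart` and `transcribe` are hypothetical, the 300-second timeout is an assumed value, and the response is assumed to be the `{"text": ...}` JSON object shown later in this article.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one audio file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries a filename and the audio/wav content type,
    # mirroring curl's -F 'file=@...;type=audio/wav'.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://localhost:8000/v1/audio/transcriptions", timeout=300):
    """POST a .wav file to the router and return the transcribed text."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    fields = {
        "model": "openai/whisper-small",
        "response_format": "json",
        "language": "en",
    }
    body, content_type = build_multipart(fields, "audio.wav", audio)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

With the router running, `print(transcribe("/path/to/audio.wav"))` would print the transcript. A generous timeout is used because transcribing long audio can take a while.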

Supported Parameters

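Parameter	Description
`file`	Path to the `.wav` audio file
`model`	Whisper model to use (for example, `openai/whisper-small`)
`prompt`	(Optional) Text prompt to guide the transcription
`response_format`	Output format: one of `json`, `text`, `srt`, `verbose_json`, or `vtt`
`temperature`	(Optional) Sampling temperature, a float
`language`	ISO 639-1 code (for example, `en`, `fr`, `zh`)
`stream`	(Optional) Set to `true` to receive a streaming SSE response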
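
The parameters in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that. `build_fields` and `ALLOWED_FORMATS` are hypothetical names, and the checks only mirror the table. They are not the router's own validation logic.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_fields(model, language=None, response_format="json",
                 prompt=None, temperature=None, stream=False):
    """Return the non-file form fields, raising ValueError on obviously bad input."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    fields = {"model": model, "response_format": response_format}
    if language is not None:
        # ISO 639-1 codes are two lowercase ASCII letters, e.g. "en" or "zh".
        if not (len(language) == 2 and language.isascii()
                and language.isalpha() and language.islower()):
            raise ValueError(f"language must be an ISO 639-1 code: {language!r}")
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        # Form fields are strings, so the float is serialized explicitly.
        fields["temperature"] = str(float(temperature))
    if stream:
        fields["stream"] = "true"
    return fields
```

For example, `build_fields("openai/whisper-small", language="en", stream=True)` returns a dict whose entries can be added one by one to a multipart form next to the `file` part.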

Streaming Transcription

For long audio files, streaming can be enabled so that transcription results arrive incrementally as Server-Sent Events (SSE):

curl -v http://localhost:8000/v1/audio/transcriptions \
  -F 'file=@/path/to/long_audio.wav;type=audio/wav' \
  -F 'model=openai/whisper-small' \
  -F 'response_format=json' \
  -F 'language=en' \
  -F 'stream=true'

The response is streamed as SSE chunks:

data: {"text": "Hello"}

data: {"text": " world"}

data: {"text": ", this is a test"}
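
To turn chunks like these back into a full transcript, a client can strip the `data:` prefix, parse each payload as JSON, and concatenate the `text` fields. The sketch below assumes each payload has exactly the `{"text": ...}` shape shown above. The `[DONE]` sentinel check is an assumption borrowed from OpenAI-style streams and is not confirmed by this article. `join_sse_text` is a hypothetical name.

```python
import json


def join_sse_text(lines):
    """Concatenate the "text" fields of SSE data lines into one transcript."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        pieces.append(json.loads(payload)["text"])
    return "".join(pieces)


sample = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "",
    'data: {"text": ", this is a test"}',
]
print(join_sse_text(sample))  # prints: Hello world, this is a test
```

Whitespace inside each `text` value is preserved, so the pieces can be joined without adding separators.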

Python Streaming Transcription Example

The following example implements streaming transcription in Python:

import aiohttp
import asyncio

async def stream_transcription():
    url = "http://localhost:8000/v1/audio/transcriptions"

    with open("audio.wav", "rb") as audio_file:
        audio_bytes = audio_file.read()

    data = aiohttp.FormData()
    data.add_field("file", audio_bytes, filename="audio.wav", content_type="audio/wav")
    data.add_field("model", "openai/whisper-small")
    data.add_field("stream", "true")

    async with aiohttp.ClientSession() as session:
        async with session.post(url, data=data) as response:
            while True:
                line = await response.content.readline()
                if not line:
                    break
                line = line.decode("utf-8").strip()
                if line.startswith("data: "):
                    print(line[6:])  # print the JSON payload

asyncio.run(stream_transcription())

Example Transcription Result

After a successful transcription, the JSON output looks similar to the following:

{
  "text": "Testing testing testing the whisper small model testing testing testing the audio transcription function testing testing testing the whisper small model"
}

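Notes and Best Practices

The router uses an extended aiohttp timeout to support long-running transcription jobs.
The implementation dynamically discovers valid transcription backends and routes requests accordingly.
Use a suitable log level (such as debug) to monitor transcription and troubleshoot problems.
For production, enable the --static-backend-health-checks flag so that vllm-router periodically checks that the model is working by sending dummy requests to the endpoint.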

Related Resources

Official tutorial: tutorials/23-whisper-api-transcription.md
vLLM router source: src/vllm_router/
Test code: src/tests/test_transcription_streaming.py