How to Run OpenAI’s Whisper Speech Recognition Model (openai whisper on windows)
Detailed steps and tutorial for installing OpenAI Whisper on Windows
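1. Two ways to install Whisper
- Install Whisper with pip:
Run the following command to install the latest release:
pip install -U openai-whisper
A PyPI mirror such as the Tsinghua mirror can speed up the download:
pip install -U openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple/
- Install Whisper from GitHub:
In a terminal, run the following command to install the current development version (this requires Git):
pip install git+https://github.com/openai/whisper.git
With either method, Whisper downloads the model weights the first time a model is used.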
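To confirm that the installation worked, a small check like the one below can help. It is a minimal sketch that assumes the distribution is named openai-whisper and is imported as whisper; the helper name whisper_install_status is made up for illustration. The middle branch exists because an unrelated PyPI package called whisper provides a module with the same import name.

```python
from importlib import metadata, util


def whisper_install_status(module_name="whisper", dist_name="openai-whisper"):
    """Return a short, human-readable description of the install state."""
    if util.find_spec(module_name) is None:
        return f"{module_name} is not importable; run: pip install -U {dist_name}"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module imports, but it was not installed by the expected distribution.
        return f"{module_name} is importable, but {dist_name} is not installed"
    return f"{dist_name} {version} is installed"


print(whisper_install_status())
```

Run it with the same Python interpreter that pip installed into; otherwise the check inspects a different environment.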
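2. Install the dependencies Whisper needs
- Install ffmpeg:
Whisper relies on the ffmpeg command-line tool to decode audio, so the ffmpeg executable must be installed and available on your PATH. Note that the command pip install ffmpeg-python
installs only a Python wrapper, not ffmpeg itself. On Windows, install the executable with a package manager, for example choco install ffmpeg (Chocolatey) or scoop install ffmpeg (Scoop).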
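Since a missing ffmpeg executable is a common cause of failures, it can be worth checking for it before running Whisper. The sketch below uses only the standard library; the function names are illustrative and not part of Whisper.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    completed = subprocess.run([path, "-version"], capture_output=True, text=True, check=True)
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg not found on PATH; install it and reopen the terminal")
else:
    print("ffmpeg found at", ffmpeg_path)
```

If the check fails right after installing ffmpeg, opening a new terminal usually picks up the updated PATH.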
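3. Set up the Whisper environment
- Create a virtual environment:
To keep Whisper isolated from other projects, install and manage it inside a virtual environment. Create one with the command python -m venv venv
- Activate the virtual environment:
On the command line, run venv\Scripts\activate
to activate the virtual environment. Packages installed with pip while it is active stay inside that environment.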
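To verify from Python that the environment is really active, the interpreter's prefixes can be compared. This is a minimal sketch; the directory name venv follows the text above, and the helper names are made up for illustration.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from the base prefix)."""
    return sys.prefix != sys.base_prefix


def activate_script(env_dir="venv", platform=sys.platform):
    """Path typed to activate a venv created in `env_dir` (no file extension)."""
    if platform.startswith("win"):
        # cmd resolves this to activate.bat, PowerShell to Activate.ps1
        return Path(env_dir) / "Scripts" / "activate"
    return Path(env_dir) / "bin" / "activate"


print("inside a virtual environment:", in_virtualenv())
```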
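4. Transcribe speech to text with Whisper
- Import the Whisper library:
In Python code, the package is imported under the name whisper:
import whisper
- Load a Whisper model:
Load a pretrained model by name; the weights are downloaded on first use:
model = whisper.load_model("base")
- Transcribe speech to text:
Pass the path of an audio file to transcribe, which returns a dictionary whose "text" entry holds the transcript:
result = model.transcribe("audio.wav")
text = result["text"]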
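The steps in this section can be wrapped in a small helper. The sketch below assumes the model object comes from whisper.load_model; the function name transcribe_file and its use_gpu parameter are illustrative. The fp16 and language options are passed through to Whisper's decoding options; turning fp16 off on a CPU avoids the warning Whisper otherwise prints before falling back to FP32.

```python
def transcribe_file(model, audio_path, language=None, use_gpu=False):
    """Transcribe one audio file and return the transcript without surrounding spaces.

    `model` is whatever whisper.load_model(...) returned.
    """
    options = {"fp16": use_gpu}  # half precision is only useful on a GPU
    if language is not None:
        options["language"] = language  # e.g. "en" or "zh"; None lets Whisper detect it
    result = model.transcribe(audio_path, **options)
    return result["text"].strip()


# Typical use (downloads the model weights on first run):
#   import whisper
#   model = whisper.load_model("base")
#   print(transcribe_file(model, "audio.wav", language="en"))
```

Taking the model as a parameter means it is loaded once and reused across many files, which matters because loading is slow compared with transcribing a short clip.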
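5. Optional: install a Whisper web UI
Whisper itself does not include a web interface. The steps below rely on a third-party package, so check that project's documentation for the exact package name, launch command, and port.
- Install the Whisper web UI:
Install the package with pip:
pip install openai-whisper-webui
- Start the Whisper web UI:
On the command line, run
whisper-webui
to start the web UI service.
- Use the Whisper web UI:
Open http://localhost:5000
in a browser and use the interface to convert speech to text.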
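With the steps above, you can install OpenAI Whisper on Windows and use it to convert speech to text. Using Whisper's command-line tool and Python API, you can transcribe audio efficiently and automate speech recognition in your own applications.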
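More on openai whisper on windows
Title: OpenAI Releases Whisper, a Powerful Open-Source Speech Recognition Model
Introduction:
OpenAI recently released Whisper, a powerful open-source speech recognition model. This article explains how to install and run Whisper, then takes a closer look at its accuracy, inference time, and running cost.
How to run OpenAI's Whisper:
Install dependencies:
Whisper requires Python 3.7+ and a recent version of PyTorch (current Whisper releases expect Python 3.8 or later). Install them first if you have not already.
Whisper also requires FFmpeg, an audio-processing tool. If FFmpeg is not yet installed on your machine, install it with one of the following commands:
– Linux: sudo apt update && sudo apt install ffmpeg
– MacOS: brew install ffmpeg
– Windows: choco install ffmpeg
Install Whisper:
Now we can install Whisper. Enter the following command on the command line: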
pip install git+https://github.com/openai/whisper.git
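Run Whisper:
Command line:
First, we can use Whisper from the command line. Open a terminal and change to the directory that contains the audio file. We will transcribe a file named audio.wav, which contains the first line of Lincoln's Gettysburg Address. To transcribe it, run the following command in the terminal: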
whisper audio.wav
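The transcript is printed in the terminal and also saved to the file audio.wav.txt, along with an audio.wav.vtt file for subtitles. Newer releases name the output files after the input without its extension, for example audio.txt and audio.vtt.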
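When the command-line tool needs to be driven from a script, the arguments can be assembled in Python. This is a minimal sketch: --model, --language, and --output_dir are options of the whisper command, while the helper names build_whisper_command and run_whisper_cli are made up for illustration.

```python
import subprocess


def build_whisper_command(audio_path, model="base", language=None, output_dir=None):
    """Build the argument list for the `whisper` command-line tool."""
    command = ["whisper", audio_path, "--model", model]
    if language is not None:
        command += ["--language", language]
    if output_dir is not None:
        command += ["--output_dir", output_dir]
    return command


def run_whisper_cli(audio_path, **options):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_whisper_command(audio_path, **options), check=True)


print(build_whisper_command("audio.wav", model="tiny", language="en"))
```

Passing the arguments as a list rather than one string avoids quoting problems with file names that contain spaces, which are common on Windows.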
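Python:
Transcribing with Whisper in Python is straightforward: import the whisper module, choose a model, and transcribe the audio.
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.wav")
The transcript is available as result["text"]. The result object also contains other useful information, such as the detected language and a list of timestamped segments.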
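Beyond the plain text, the result dictionary carries a "segments" list whose entries have "start", "end", and "text" keys. The sketch below turns those into readable lines; the function names and the timestamp layout are choices made for this example, not something Whisper prescribes.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm (hours are folded into the minutes field)."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Return one line per segment in the form '[start --> end] text'."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} --> {end}] {segment['text'].strip()}")
    return lines


# Typical use, with `result` returned by model.transcribe("audio.wav"):
#   for line in format_segments(result):
#       print(line)
```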
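Analysis of OpenAI Whisper:
The chart below compares Whisper's accuracy, measured by word error rate (WER), with current state-of-the-art speech recognition models. Whisper reports state-of-the-art results, which is an exciting development for the field of speech recognition, especially since Whisper is an open-source model.
Exciting as these results are, speech recognition remains an open problem, particularly for languages other than English. The chart below shows Whisper's word error rate for each supported language. Whisper achieves state-of-the-art results on several Romance languages, German, Japanese, and a few others, but performs comparatively poorly on the rest.
The figure below plots Whisper's word error rate by language. Of 82 languages, 50 have a word error rate above 20%.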
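For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a plain dynamic-programming implementation; it only lowercases and splits on whitespace, whereas published benchmarks typically apply more careful text normalization first, so its numbers are not comparable with those in the charts. The sample phrases come from the Micro Machines transcripts later in this article, and treating Whisper's rendering as the reference is an assumption made only for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("raise the toll bridge", "raise the tulbridge"))  # 0.5
```

Here one substitution and one deletion against four reference words give 2 / 4 = 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.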
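Case comparisons:
At AssemblyAI, our API uses a state-of-the-art Conformer-CTC model trained on roughly 100,000 hours of labeled data. To test Whisper's accuracy, we evaluated it against our own model. Below, we compare the transcripts produced by Whisper and by our model on a few examples, with Google's transcript included for reference.
The first example compares transcripts of the Micro Machines clip:
AssemblyAI transcript: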
“this is a Micro Machine man presenting the most midget miniature motorcade of Micro Machine. This one has dramatic details, perfect turn, precision paint jobs, plus incredible Micro Machine pocket place that says a police station, fire station, restaurant service station, and more. Perfect pocket portable to take any place. And there are many miniature places to play with. Each one comes with its own special edition Micro Machine vehicle and fun, fantastic features that miraculously moved OOH. Raise the boltless at the airport marina, man the gun turret at the army base. Clean your car at the car wash. Raise the tulbridge. And these places fit together to form a Micro Machine world. Micro Machine pocket places that’s tremendously tiny, so perfectly precise, so dazzlingly detailed, you’ll want to pocket them all. Micro Machines are microschin pocket place that sold separately from glue. The smaller they are, the better they are.”
Google transcript:
“this is Michael presenting the most midget miniature motorcade of micro machine which one has dramatic details terrific current position paying jobs plus incredible Michael Schumacher place there’s a police station Fire Station restaurant service station and more perfect bucket portable to take any place and there are many many other places to play with of each one comes with its own special edition Mike eruzione vehicle and fun fantastic features that miraculously move raise the boat looks at the airport Marina men the gun turret at the Army Base clean your car at the car wash raised the toll bridge and these places fit together to form a micro machine world like regime Parker Place that’s so tremendously tiny so perfectly precise so dazzlingly detail Joanna pocket them all my questions are microscopic play set sold separately from glue the smaller they are the better they are”
Whisper transcript:
“This is the Micro Machine Man presenting the most midget miniature motorcade of Micro Machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible Micro Machine Pocket Play Sets. There’s a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take any place. And there are many miniature play sets to play with, and each one comes with its own special edition Micro Machine vehicle and fun, fantastic features that miraculously move. Raise the boatlift at the airport marina. Man the gun turret at the army base. Clean your car at the car wash. Raise the toll bridge. And these play sets fit together to form a Micro Machine world. Micro Machine Pocket Play Sets, so tremendously tiny, so perfectly precise, so dazzlingly detailed, you’ll want to pocket them all. Micro Machines are Micro Machine Pocket Play Sets sold separately from Galoob. The smaller they are, the better they are.”
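As we can see, Whisper performs very well and is among the most advanced speech recognition options currently available.
The second example compares transcripts of a podcast:
AssemblyAI transcript: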
“One of them is I made the claim I think most civilizations going from simple bacteria like things to space, colonizing civilizations, they spend only a very tiny fraction of their life being where we are, that I could be wrong about. The other one I could be wrong about is quite different statements that I think that actually I’m guessing that we are the only civilization in our observable universe from which light has reached us so far that’s actually gotten far enough to invent telescopes. So let’s talk about maybe both of them in turn, because they really are different. The first one, if you look at N, equals one the data for we have on this planet. So we spent four and a half billion years fussing around on this planet with life. And most of it was pretty lame stuff from an intelligence perspective. Bacteria and then the dinosaurs spent then the things greatly accelerated, and the dinosaurs spent over 100 million years stomping around here without even inventing smartphones. And then very recently, we’ve only spent 400 years going from Newton to us, right? Yeah. In terms of technology. And look what we’ve done even.”
Google transcript:
“one of them is I made the claim I think most civilizations going from simple bacteria are like things to space space colonizing civilization they spend only a very very tiny fraction of their other other life being where we are. I could be wrong about the other one I could be wrong about this quite different statements and I think that actually I’m guessing that we are the only civilization in the observable universe from which life has weeks or so far that’s actually gotten far enough to men’s telescopes but if you look at the antique was one of the date of when we have on this planet right so we spent four and a half billion years fucking around on this planet with life we got most of it was it was pretty lame stuff from an intelligence perspective he does bacteria and then the dinosaurs spent then the things right The Accelerated by then the dinosaurs spent over a hundred million a year is stomping around here without even inventing smartphone and and then very recently I only spent four hundred years going from Newton to us right now in terms of technology and look what we don’t even”
Whisper transcript:
“One of them is, I made the claim, I think most civilizations, going from, I mean, simple bacteria like things to space colonizing civilizations, they spend only a very, very tiny fraction of their life being where we are. That I could be wrong about. The other one I could be wrong about is the quite different statement that I think that actually I’m guessing that we are the only civilization in our observable universe from which light has reached us so far that’s actually gotten far enough to invent telescopes. So let’s talk about maybe both of them in turn because they really are different. The first one, if you look at the N equals one, the date of one we have on this planet, right? So we spent four and a half billion years f**king around on this planet with life, right? We got, and most of it was pretty lame stuff from an intelligence perspective, you know, the dinosaur has spent, then the things were actually accelerated, right? Then the dinosaur has spent over a hundred million years stomping around here without even inventing smartphones. And then very recently, you know, it’s only spent four hundred years going from Newton to us, right? In terms of technology, and we’ve looked at what we’ve done even.”
The last example compares transcripts of a board meeting:
AssemblyAI transcript:
“I’d like to call to order a special joint meeting of the board of directors of Eastside Charter School and Charter School of Newcastle. It is 535. I’d like to call the role. And attending for East Side Charter School we have Ms. Stewart, Mr. Sawyer, Dr. Gordon, Mr. Hare, Ms. Sims, Mr. Veal, Ms. Fortunato, Ms. Tieno and Mr. Humphrey. And attending for Charter School in Newcastle. We have Dr. Bailey, Ms. Johnson, mr. Taylor, mr. McDowell, mr. Preston and Mr. Humphrey. I miss anybody, and I do not believe anybody is on the conference line. As there is no public items on our agenda. I would like a motion from a Charter School of Newcastle board meeting to move into executive discussion to talk about personnel matters. I’ll make that motion. Thank you, Mr. Second, Mr. Preston. All Charter School and Newcastle board members in favor, please say aye. Aye. Any opposed? Motion unanimous. I would ask the same. Put the same question to East Side Charter School. Thank you, MSN. Is there a second? Thank you, Mr. Veal. All those in favor, please say aye. Any opposed? Okay, so we move from public session to executive session at 535. We’re back in public session. You just read your message. Okay, we’re now back in public session at 715. And there being no further business, I will entertain a motion from Charter School Newcastle to adjourn. Thank you. Is there a second? Thank you. All in favor please say aye. Opposed? Charter School. Adjourn EastWater Charter School for the same motion. Thank you. Thank you, Ms. Mitchell. All those in favor, please say aye. Opposed? Motion carries. Meeting adjourned. Thank you all very much.”
Google transcript:
“I’d like to call to order a special joint meeting of the board of directors of Eastside charter school is Charter School of New Castle it is 5:35 I’d like to call the roll and they’re sending for eastside Charter School dr. Gordon sister here I miss them Mr Vilnius Fortunato misiano and Mr Humphrey attending for Charter School of New Castle we have dr. Bailey is Johnson mr. Taylor Miss McDowell mr. Preston and mr. Humphries is anybody in I do not believe anybody is on the conference line is there is no public items on our agenda I would like a motion from a charter school of New Castle board meeting to move into executive discussion to talk about personal matters call Turtle Newcastle board members in favor please say I charter school all those in favor please say I so we moved from public session to Executive session at 5:35 okay it is 750 + can you just leave it here at 7:15 and there being no further business I was in between the motion soundtrack to a New Castle to adjourn thank you is there s you all in favor please say I referred her to let her know I will be set at her school for the promotion of a second long does it take a PPI motion carry beating jiren thank you all very much.”
Whisper transcript:
“I’d like to call to order a special joint meeting of the board of directors of East Side Charter School in Charter School of Newcastle. It is 535. I’d like to call the role and attending for East Side Charter School. We have Mr. Stewart, Mr. Sawyer, Dr. Gordon, Mr. Hair, Ms. Thames, Mr. Veal, Ms. Portionato, Ms. Dienno, and Mr. Humphrey. And attending for Charter School of Newcastle, we have Dr. Bailey, Ms. Johnson, Mr. Taylor, Mr. McDowell, Mr. Preston, and Mr. Humphrey. I do not believe anybody is on the conference line. As there is no public items on our agenda, I would like a motion from a Charter School of Newcastle board meeting to move into executive discussion to talk about personnel matters. I’ll make that motion. Thanks, Mr. Preston. All Charter School of Newcastle board members in favor, please say aye. Aye. Any opposed? Motion unanimous. I would ask the same question to East Side Charter School. Thank you, Mr. Thames. Is there a second? Thank you, Mr. Veal. All those in favor, please say aye. Aye. Any opposed? Okay. So we move from public session to executive session at 535. Okay, we’re back. Okay. It is now 715. And we’re back in public session. You just need to carry my phone. Okay. So we are now back in public session at 715. And they’re being a further business. I will then be paying the motion from Charter School of Newcastle to adjourn. Thank you. Is there a second? Thank you. All in favor, please say aye. Aye. Any opposed? Charter School adjourned. I will ask East Side Charter School for the same motion as usual. Thank you. Mr. Thames, Mr. Mitchell, all those in favor, please say aye. Any opposed? Motion carries. Meeting adjourned. Thank you all very much.”
As we can see, Whisper's transcript is very close to those from AssemblyAI and Google.
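Whisper inference time:
Whisper comes in five model sizes: tiny, base, small, medium, and large. Accuracy improves as model size increases, so the large model is the most accurate, and it is the one used for the benchmarks reported in the paper and in the charts above. Whisper runs on both CPU and GPU; however, inference with the larger models is very slow on a CPU, so running them only on a GPU is recommended.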
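The trade-off between size and hardware can be turned into a simple selection rule. The sketch below picks the largest model that fits in the available GPU memory and falls back to a small model on a CPU. The memory figures are the approximate values listed in the Whisper README and should be treated as rough guides; the function name and the CPU fallback choice are assumptions made for this example.

```python
# Approximate VRAM needs in GB, as listed in the Whisper README (rough guides only).
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
SIZES_SMALL_TO_LARGE = ["tiny", "base", "small", "medium", "large"]


def pick_model_size(gpu_memory_gb=None, cpu_fallback="base"):
    """Pick the largest model that fits in GPU memory, or a small one on CPU."""
    if gpu_memory_gb is None:  # no GPU available
        return cpu_fallback
    fitting = [size for size in SIZES_SMALL_TO_LARGE if APPROX_VRAM_GB[size] <= gpu_memory_gb]
    return fitting[-1] if fitting else "tiny"


# With PyTorch installed, the input could come from:
#   import torch
#   if torch.cuda.is_available():
#       memory_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(pick_model_size(), pick_model_size(4), pick_model_size(40))
```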
Inference times on a CPU (i5-11300H):
Inference times on a GPU (high-RAM GPU Colab environment):
The same results side by side:
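To take comparable measurements on your own hardware, a timing wrapper such as the one below may help. It is a generic sketch using only the standard library; the names time_call and real_time_factor are illustrative, and the real-time factor (processing time divided by audio duration) is one common way to report speed, not necessarily the measure used in the charts.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(elapsed_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


# Typical use, with `model` from whisper.load_model(...) and a clip of known length:
#   result, elapsed = time_call(model.transcribe, "audio.wav")
#   print(real_time_factor(elapsed, audio_seconds=10.0))
```

The first call usually includes one-off costs such as loading weights onto the device, so timing a second run gives a fairer figure.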
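Cost of running Whisper:
The chart below shows the cost of transcribing 1,000 hours of audio with Whisper on GCP (1x A100 40 GB) at different batch sizes. These figures cover only the raw compute cost once Whisper is deployed. They exclude the cost of building the surrounding infrastructure and of the people needed to fix model errors or to improve and update the model; keeping performance competitive requires a dedicated research or research-engineering team.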
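The raw compute cost described here follows from simple arithmetic: the GPU hours needed are the audio hours divided by how many times faster than real time the model runs, and the cost is those hours multiplied by the hourly price. The sketch below encodes that formula. The inputs in the example call are placeholders chosen for easy arithmetic, not measurements or actual GCP prices.

```python
def transcription_cost(audio_hours, speedup_over_real_time, gpu_price_per_hour):
    """Raw compute cost: GPU hours needed times the hourly GPU price."""
    if speedup_over_real_time <= 0:
        raise ValueError("speedup_over_real_time must be positive")
    gpu_hours = audio_hours / speedup_over_real_time
    return gpu_hours * gpu_price_per_hour


# Placeholder inputs for illustration only, not measurements:
# 1000 hours of audio, 20x faster than real time, 3.00 currency units per GPU hour.
print(transcription_cost(1000, 20, 3.00))  # 150.0
```

Larger batch sizes raise the effective speedup, which is why the cost per 1,000 hours depends on batch size.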
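Final thoughts:
Our analysis above shows that Whisper achieves state-of-the-art speech recognition results in several languages. Whisper will be a valuable tool for researchers and hackers, not only because it is accurate but also because it is easier to use than other open-source options. Part of Whisper's performance comes from being compute-intensive, so when you need the larger, more powerful versions, run Whisper on a GPU, whether locally or in the cloud.
Advanced Whisper usage:
The section "How to run OpenAI's Whisper" above covered installing and running Whisper. For a more complex example, we will walk through code adapted from the multilingual ASR notebook. Run the following commands to download the example code and install its requirements: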
git clone https://github.com/AssemblyAI-Examples/whisper-multilingual.git
cd whisper-multilingual
pip install -r requirements.txt
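Next, run python main.py to transcribe several Korean audio files and translate them into English. We picked a subsample of 10 audio files to speed up processing. While it runs in the background, take a look at the code in main.py.
First, the script imports all the modules it needs and then defines a class for downloading and storing the audio data. The details of this class are omitted for brevity.
Next, it sets some options for displaying results with pandas, sets the device to use for inference, and sets the language variables for the audio. The first variable is the language code used to download the data; the second is the code for Korean used by the Whisper model.
It then creates the dataset with the class defined above, selecting a subset of the audio files to speed up processing.
Next, it loads the Whisper model, choosing the "tiny" version to speed up inference, and sets the transcription and translation options.
It then loops over the dataset, transcribing each audio file in Korean and translating each audio file into English. Note that translation is performed directly on the audio data, not on the generated transcript. The transcriptions and translations are saved to lists, along with the ground-truth reference text for comparison.
Finally, it creates a pandas DataFrame to hold the results, prints it, and saves it to a CSV file.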
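The loop and the CSV export can be sketched as follows. This is not the code from main.py: it is a simplified, standard-library version that calls model.transcribe with the task option instead of reproducing the repository's decoding setup, uses the csv module in place of pandas, and leaves out the reference text because the dataset class is not shown here. The function names and column names are made up for illustration.

```python
import csv


def transcribe_and_translate(model, audio_paths, language="ko"):
    """Return one row per file with a transcript and an English translation.

    Both calls read the audio itself; the translation is not derived from the transcript.
    """
    rows = []
    for path in audio_paths:
        transcript = model.transcribe(path, language=language, task="transcribe")["text"]
        translation = model.transcribe(path, language=language, task="translate")["text"]
        rows.append({"file": path, "transcription": transcript.strip(),
                     "translation": translation.strip()})
    return rows


def save_rows(rows, csv_path="results.csv"):
    """Write the rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "transcription", "translation"])
        writer.writeheader()
        writer.writerows(rows)


# Typical use:
#   import whisper
#   model = whisper.load_model("tiny")
#   save_rows(transcribe_and_translate(model, ["clip1.wav", "clip2.wav"]))
```

Writing the file as UTF-8 keeps the Korean transcripts intact; some spreadsheet programs on Windows need the encoding selected explicitly when opening it.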
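That concludes the walkthrough; see the final results and the CSV file below.
Final results:
The reference text, transcription, and translation for each audio file are shown below:
Reference text | Transcription | Translation
------------------|------------------|------------------
Reference text 1 | Transcription 1 | Translation 1
Reference text 2 | Transcription 2 | Translation 2
…… | …… | ……
Reference text 10 | Transcription 10 | Translation 10
The results are saved to the file results.csv.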
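In the examples above, Whisper performs very well on both transcription and translation, which makes it a powerful tool for a range of speech recognition and transcription applications.
Conclusion:
OpenAI's Whisper is a powerful speech recognition model that offers a highly accurate, easy-to-use open-source solution. Whether run from the command line or from Python code, it provides simple yet capable transcription. Although speech recognition remains an open problem, Whisper achieves state-of-the-art results on speech recognition and translation across a number of languages. As more developers and researchers adopt Whisper, we can expect further performance improvements and new features.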
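Frequently asked questions (Q&A) about openai whisper on windows
Question 1: What is OpenAI Whisper?
Answer: OpenAI Whisper is a neural network trained and open-sourced by OpenAI for speech recognition. It converts speech to text, that is, it performs automatic speech recognition (ASR). Whisper was trained on a large multilingual, multitask dataset and, according to OpenAI, approaches human-level robustness and accuracy on English speech recognition. Beyond English, it supports many other languages, including Chinese.
- Whisper is an open-source speech recognition model.
- Whisper was trained on a large amount of multilingual, multitask data.
- Whisper's accuracy on English approaches human level.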
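Question 2: How do I install OpenAI Whisper on Windows?
Answer: To install OpenAI Whisper on Windows, follow these steps:
- Open a command-line terminal.
- Run the command
pip install -U openai-whisper
to install Whisper.
- Wait for the installation to finish.
Example code:
pip install -U openai-whisper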
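Question 3: How do I use OpenAI Whisper to convert speech to text?
Answer: Converting speech to text with OpenAI Whisper takes the following steps:
- First, install and set up Whisper.
- Then, load a model and call its transcribe method on the audio file. The model runs locally, so no call to the OpenAI API is involved.
- Wait for the transcription to finish and read the text from the result.
Example code:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.wav")
text = result["text"]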