- apiai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
wit and apiai provide built-in features that go beyond basic speech recognition, such as natural language processing to identify a speaker's intent.
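SpeechRecognition supports several mainstream speech APIs, which makes it highly flexible.
The Google Web Speech API can be used without registering, because a default API key is hard-coded into the SpeechRecognition library.
SpeechRecognition also spares you from writing scripts that access the microphone and process audio files from scratch: audio input, retrieval, and recognition can be up and running in just a few minutes, so it is very easy to use.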
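The core of SpeechRecognition is the Recognizer class. A Recognizer instance has seven methods for recognizing speech from an audio source, each backed by a different API and each with its own settings and features: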
- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech (requires installing the google-cloud-speech package)
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
- recognize_wit(): Wit.ai
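Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine; the other six all require an internet connection.
In addition, SpeechRecognition ships with a default API key for the Google Web Speech API, so it can be used right away. The remaining online APIs (Bing, Google Cloud Speech, Houndify, IBM, and Wit.ai) all require authentication with an API key or a username/password combination, which is why this article uses the Web Speech API.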
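The online/offline and credential trade-offs described above can be summarized in a small lookup table. This is a hypothetical sketch, not part of SpeechRecognition: `RECOGNIZERS` and `usable_methods()` are illustrative names, and the only library facts it relies on are the seven method names listed above.

```python
# Hypothetical lookup table (not part of SpeechRecognition): method name ->
# (needs an internet connection, needs an API key or username/password).
RECOGNIZERS = {
    "recognize_bing": (True, True),
    "recognize_google": (True, False),   # default Web Speech API key is bundled
    "recognize_google_cloud": (True, True),
    "recognize_houndify": (True, True),
    "recognize_ibm": (True, True),
    "recognize_sphinx": (False, False),  # offline; requires PocketSphinx
    "recognize_wit": (True, True),
}


def usable_methods(online, credentials=frozenset()):
    """Return the Recognizer method names usable under the given conditions.

    online: whether an internet connection is available.
    credentials: method names for which you have configured credentials.
    """
    usable = []
    for name, (needs_internet, needs_credentials) in RECOGNIZERS.items():
        if needs_internet and not online:
            continue  # skip cloud engines when offline
        if needs_credentials and name not in credentials:
            continue  # skip engines we cannot authenticate against
        usable.append(name)
    return usable
```

For example, `usable_methods(online=False)` yields only `recognize_sphinx`, and each returned name can be called on a recognizer with `getattr(r, name)(audio)`.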
To use all of the functionality of the library, you should have:
- Python 2.6, 2.7, or 3.3+
- PyAudio 0.2.11+ (needed for microphone input)
- Google API Client Library for Python (needed for the Google Cloud Speech API)
- a FLAC encoder, if the system is not x86-based
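A quick way to check the Python and PyAudio requirements above from Python itself is sketched below. It is a hedged example, not part of SpeechRecognition: `version_tuple()`, `python_supported()`, `installed_version()` and `pyaudio_ok()` are hypothetical helpers, the script needs Python 3.8+ for `importlib.metadata`, it assumes PyAudio is installed under the distribution name `PyAudio`, and its version parsing is deliberately simplistic (not full PEP 440).

```python
import re
import sys
from importlib import metadata  # Python 3.8+


def version_tuple(text):
    """Rough parse keeping up to three leading numeric parts: '0.2.11' -> (0, 2, 11)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def python_supported(info=sys.version_info):
    """True for Python 2.6, 2.7 or 3.3+, the versions listed above."""
    major_minor = (info[0], info[1])
    return major_minor in {(2, 6), (2, 7)} or major_minor >= (3, 3)


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pyaudio_ok():
    """True if PyAudio 0.2.11+ is installed (distribution name assumed to be 'PyAudio')."""
    version = installed_version("PyAudio")
    return version is not None and version_tuple(version) >= (0, 2, 11)


if __name__ == "__main__":
    print("Python OK:", python_supported())
    print("SpeechRecognition:", installed_version("SpeechRecognition"))
    print("PyAudio 0.2.11+:", pyaudio_ok())
```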
Audio files must meet these format requirements:
- WAV: must be in PCM/LPCM format
- FLAC: must be native FLAC; OGG-FLAC is not supported
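Before passing a file to SpeechRecognition, you can roughly check it against the format requirements above with the standard library. This is a hypothetical helper (`describe_audio_file()` is not a library function): it sniffs the `fLaC` and `OggS` magic bytes and relies on the fact that Python's `wave` module only decodes PCM data. On Python versions before 3.12, PCM files written with a `WAVE_FORMAT_EXTENSIBLE` header may be misreported as unsupported.

```python
import wave


def describe_audio_file(path):
    """Classify a file against the WAV/FLAC requirements above (rough sketch)."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header[:4] == b"fLaC":  # native FLAC stream marker
        return "native FLAC"
    if header[:4] == b"OggS":  # Ogg container, e.g. OGG-FLAC
        return "Ogg container: not supported"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        try:
            # The stdlib wave module only decodes PCM data, so opening
            # successfully implies a PCM WAV file.
            with wave.open(path, "rb") as w:
                bits = 8 * w.getsampwidth()
                return f"PCM WAV, {w.getnchannels()} ch, {w.getframerate()} Hz, {bits}-bit"
        except (wave.Error, EOFError):
            return "WAV but not PCM/LPCM: not supported"
    return "unknown format"
```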
Install SpeechRecognition from the terminal with the command pip3 install SpeechRecognition:
alicedembp:~ alice$ pip3 install SpeechRecognition
Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (3.8.1)
alicedembp:~ alice$ python -m speech_recognition
>>> import speech_recognition as sr
>>> sr.__version__
On macOS, install PortAudio with Homebrew before installing PyAudio:
brew install portaudio
pip install pyaudio
alicedembp:~ alice$ brew install portaudio
==> Downloading https://homebrew.bintray.com/bottles/portaudio-19.6.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring portaudio-19.6.0.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/portaudio/19.6.0: 33 files, 452KB
alicedembp:~ alice$ pip3 install pyaudio
Collecting pyaudio
Using cached https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
Building wheels for collected packages: pyaudio
Building wheel for pyaudio (setup.py) ... done
Stored in directory: /Users/alice/Library/Caches/pip/wheels/f4/a8/a4/292214166c2917890f85b2f72a8e5f13e1ffa527c4200dcede
Successfully built pyaudio
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.11
alicedembp:~ alice$
Otherwise, installing PyAudio fails with the error src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found:
gcc -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch i386 -arch x86_64 -g -DMACOSX=1 -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/_portaudiomodule.c -o build/temp.macosx-10.6-intel-3.7/src/_portaudiomodule.o
src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found
#include "portaudio.h"
1 error generated.
error: command 'gcc' failed with exit status 1
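import speech_recognition as sr

r = sr.Recognizer()
# AudioFile wraps an audio file on disk (WAV must be PCM/LPCM; FLAC must be native FLAC)
test = sr.AudioFile('/Users/alice/Documents/Work/Blog/AI/语音识别/speechrecognition/audiofiles/test1.wav')
with test as source:
    # record() reads the entire file into an AudioData instance
    audio = r.record(source)
print(type(audio))
# language='zh-CN' requests Chinese recognition; show_all=True returns the raw
# response from the Google Web Speech API instead of only the best transcript
print(r.recognize_google(audio, language='zh-CN', show_all=True))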
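With `show_all=True`, `recognize_google()` returns the raw response rather than a single string. In SpeechRecognition 3.8.x this is an empty list when nothing was recognized, and otherwise a dict whose `"alternative"` list holds candidate transcripts, some with a `"confidence"` score. The helper below is a hypothetical sketch (`best_transcript()` is not a library function) for picking the most confident transcript from such a result; the sample dict is made up for illustration and is not real API output.

```python
def best_transcript(result):
    """Return the most confident transcript from a show_all=True result, or None.

    Assumes the shape {'alternative': [{'transcript': ..., 'confidence': ...}, ...]};
    an empty list (nothing recognized) or a dict without alternatives gives None.
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative") or []
    if not alternatives:
        return None
    # Missing confidence counts as 0.0; on ties max() keeps the earliest alternative.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Made-up sample for illustration only, not real API output.
sample = {
    "alternative": [
        {"transcript": "今天天气很好", "confidence": 0.87},
        {"transcript": "今天天气很好啊"},
    ],
    "final": True,
}
print(best_transcript(sample))  # prints 今天天气很好
```

Applied to the script above, this would be `best_transcript(r.recognize_google(audio, language='zh-CN', show_all=True))`.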