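I have recently been learning the basics of speech recognition and looking into the Python libraries that support it, so here is a short write-up.
Several ready-made speech recognition packages are available for Python, including: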
- apiai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
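Of these, SpeechRecognition is a third-party open-source library (it is not published by Google, although it can call Google's APIs) that focuses on converting speech to text.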
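wit and apiai offer built-in features that go beyond basic speech recognition, such as natural language processing to identify the speaker's intent.
SpeechRecognition has several advantages:
- It works with several mainstream speech APIs, which makes it flexible.
- A default API key for the Google Web Speech API is hard-coded into the library, so that API can be used without signing up.
- There is no need to write scripts from scratch to access the microphone or process audio files. Getting audio in and running recognition takes only a few minutes, so the library is very easy to use.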
The core of SpeechRecognition is the Recognizer class. It provides seven recognize_*() methods, each of which recognizes speech from an audio source through a different API, with its own settings and features:
- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech (requires installing the google-cloud-speech package)
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
- recognize_wit(): Wit.ai
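Of the seven, only recognize_sphinx() works offline, using the CMU Sphinx engine; the other six require an internet connection.
SpeechRecognition also ships with a default API key for the Google Web Speech API, so that API can be used right away. The other online APIs require authentication with an API key or a username/password combination (Sphinx runs locally and needs no credentials), which is why this post uses the Google Web Speech API.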
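The split between the one offline method and the six online ones can be captured in a small lookup table. The following is a minimal sketch, not part of SpeechRecognition itself: `RECOGNIZERS`, `choose_method`, and `run_recognizer` are hypothetical helper names, and the code only assumes that the object passed in exposes the `recognize_*()` methods listed above.

```python
# Hypothetical helper table: method name -> (service, works offline?)
RECOGNIZERS = {
    "recognize_bing": ("Microsoft Bing Speech", False),
    "recognize_google": ("Google Web Speech API", False),
    "recognize_google_cloud": ("Google Cloud Speech", False),
    "recognize_houndify": ("Houndify by SoundHound", False),
    "recognize_ibm": ("IBM Speech to Text", False),
    "recognize_sphinx": ("CMU Sphinx", True),
    "recognize_wit": ("Wit.ai", False),
}


def choose_method(have_network):
    """Pick a method name: the Google Web Speech API (default key) when
    online, CMU Sphinx when offline."""
    return "recognize_google" if have_network else "recognize_sphinx"


def run_recognizer(recognizer, audio, method, **kwargs):
    """Call one of the seven recognize_*() methods by name."""
    if method not in RECOGNIZERS:
        raise ValueError(f"unknown recognizer method: {method}")
    return getattr(recognizer, method)(audio, **kwargs)
```

With the real library, `recognizer` would be an `sr.Recognizer()` instance and `audio` the object returned by `r.record(...)`, for example `run_recognizer(r, audio, choose_method(True), language='zh-CN')`.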
To use all of the functionality of the library, you should have:
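- Python 2.6, 2.7, or 3.3+ (required)
- PyAudio 0.2.11+ (required only for microphone input)
- PocketSphinx (required only for the Sphinx recognizer)
- Google API Client Library for Python (required only for the Google Cloud Speech API)
- FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)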
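A quick way to see which of these pieces are present is to probe for them without importing anything heavy. This is a minimal sketch using only the standard library; `OPTIONAL_MODULES` and `check_dependencies` are hypothetical names, the import names (`speech_recognition`, `pyaudio`, `pocketsphinx`, `googleapiclient`) are the ones these packages normally install under, and the FLAC check simply assumes a `flac` executable on PATH.

```python
import importlib.util
import shutil

# Display name -> import name (which differs from the pip package name).
OPTIONAL_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
    "PocketSphinx": "pocketsphinx",
    "Google API Client Library": "googleapiclient",
}


def check_dependencies():
    """Return {name: True/False} for each dependency listed above."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }
    # Look for a system-wide `flac` command-line tool.
    status["FLAC encoder"] = shutil.which("flac") is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" FLAC encoder is not necessarily a problem on x86-based Windows, Linux, or OS X, since the list above only requires a separate encoder on other systems.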
Supported audio file types:
- WAV: must be in PCM/LPCM format
- AIFF
- AIFF-C
- FLAC: must be native FLAC format; OGG-FLAC is not supported
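To check that a WAV file really is plain PCM before handing it to a recognizer, Python's standard `wave` module is enough. The sketch below is an illustration only: `write_test_tone` and `describe_wav` are hypothetical helper names, and the generated file is a sine tone meant for checking the format, not something a recognizer would transcribe.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)    # mono
        wf.setsampwidth(2)    # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(frames)


def describe_wav(path):
    """Return (channels, sample_width_bytes, rate, duration_seconds).

    wave.open raises wave.Error for files it cannot parse, which includes
    compressed (non-PCM) WAV data.
    """
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), duration
```

For example, `write_test_tone("tone.wav")` followed by `describe_wav("tone.wav")` returns `(1, 2, 16000, 1.0)`.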
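The previous post introduced the basic concepts and advantages of SpeechRecognition; this post covers installation and a quick demo.
1. Install SpeechRecognition (based on Python 3.7)
Install SpeechRecognition from the terminal with pip3 install SpeechRecognition: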
alicedembp:~ alice$ pip3 install SpeechRecognition
Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (3.8.1)
alicedembp:~ alice$ python -m speech_recognition
2. Verify the installation
Once installation finishes, open an interpreter session and enter the following to verify it:
>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'
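3. Install portaudio and pyaudio
Next, install two dependencies. The order matters: pyaudio depends on portaudio, so portaudio must be installed first.
brew install portaudio
pip3 install pyaudio
The output looks like this: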
alicedembp:~ alice$ brew install portaudio
==> Downloading https://homebrew.bintray.com/bottles/portaudio-19.6.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring portaudio-19.6.0.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/portaudio/19.6.0: 33 files, 452KB
alicedembp:~ alice$ pip3 install pyaudio
Collecting pyaudio
Using cached https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
Building wheels for collected packages: pyaudio
Building wheel for pyaudio (setup.py) ... done
Stored in directory: /Users/alice/Library/Caches/pip/wheels/f4/a8/a4/292214166c2917890f85b2f72a8e5f13e1ffa527c4200dcede
Successfully built pyaudio
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.11
alicedembp:~ alice$
If pyaudio is installed before portaudio, the build fails with fatal error: 'portaudio.h' file not found:
gcc -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch i386 -arch x86_64 -g -DMACOSX=1 -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/_portaudiomodule.c -o build/temp.macosx-10.6-intel-3.7/src/_portaudiomodule.o
src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found
#include "portaudio.h"
^~~~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1
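# Demo: transcribe a WAV file with the Google Web Speech API
import speech_recognition as sr

r = sr.Recognizer()
test = sr.AudioFile('/Users/alice/Documents/Work/Blog/AI/语音识别/speechrecognition/audiofiles/test1.wav')
with test as source:
    # Read the whole file into an audio data object
    audio = r.record(source)
type(audio)  # inspect the type of the recorded audio object
# show_all=True returns the full API response instead of only the best transcript
r.recognize_google(audio, language='zh-CN', show_all=True)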
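Because show_all=True returns the raw response rather than a single string, the result still has to be unpacked. The sketch below assumes the response is a dict with an "alternative" list whose entries carry a "transcript" and sometimes a "confidence", and that an empty list comes back when nothing is recognized; that shape is an assumption here, `best_transcript` is a hypothetical helper, and the sample data is made up for illustration.

```python
def best_transcript(response):
    """Return the most likely transcript from a show_all=True response,
    or None if the response holds no alternatives."""
    if not isinstance(response, dict):
        return None  # e.g. an empty list when nothing was recognized
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Entries without a confidence score are treated as 0.0;
    # max() keeps the earliest entry on ties.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Made-up sample data, not real API output.
sample = {
    "alternative": [
        {"transcript": "你好世界", "confidence": 0.92},
        {"transcript": "你好 世界"},
    ],
    "final": True,
}
print(best_transcript(sample))  # 你好世界
print(best_transcript([]))      # None
```

Calling recognize_google() without show_all returns just one transcript string, so a helper like this is only useful when the alternatives themselves are of interest.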