Speech Recognition
Speech recognition is the technology by which a device recognizes your voice and converts it into a device-readable format, so that the device can respond to spoken commands.
With this technology you can also control your PiArm by voice, making it hands-free.
Installation
Before programming, you need to install a few things.
Install Speech Recognition
To install the SpeechRecognition package, open the Pi's terminal and run:
sudo pip3 install SpeechRecognition
Install PyAudio
To install PyAudio, the command is:
sudo pip3 install pyaudio
Install flac encoder
To install the FLAC encoder, the command is:
sudo apt-get install flac
Install alsa library
To install the ALSA utilities, the command is:
sudo apt-get install alsa-utils
Install the driver
To load the Raspberry Pi sound driver module, the command is:
sudo modprobe snd_bcm2835
Note: If the ‘pip3’ command doesn’t work, try ‘pip’ instead.
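To confirm that the installation steps succeeded, a short check like the following may help. It is a minimal sketch that assumes the packages are importable as speech_recognition and pyaudio, and that alsa-utils provides the arecord command; the helper name check_requirements is only illustrative.

```python
import importlib.util
import shutil

# Python modules and command-line tools installed in the steps above.
PYTHON_MODULES = ["speech_recognition", "pyaudio"]
COMMAND_LINE_TOOLS = ["flac", "arecord"]  # arecord is assumed to come from alsa-utils


def check_requirements():
    """Return a dict mapping each requirement name to True if it was found."""
    found = {}
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed
        found[module] = importlib.util.find_spec(module) is not None
    for tool in COMMAND_LINE_TOOLS:
        # shutil.which returns None when the command is not on the PATH
        found[tool] = shutil.which(tool) is not None
    return found


for name, ok in check_requirements().items():
    print(f"{name:20s} {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING points to the installation step that should be repeated.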
Configuring microphone
- Connect the microphone to the Pi.
- Specify the microphone in the program.
There are many ways to find the device name, but sometimes the device does not appear in the system's list, so the most reliable method is to ask for the device list from a program.
To find your device, you can run a short Python program:
Program:
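A minimal sketch of such a program, assuming the SpeechRecognition and PyAudio packages installed above. The helper names and the example device names are hypothetical and only illustrate the output format; list_microphone_names() is the same call used in the main program.

```python
def format_device_list(names):
    """Return one 'index: name' line per audio device name."""
    return [f"{index}: {name}" for index, name in enumerate(names)]


def list_microphones():
    """Ask SpeechRecognition (through PyAudio) for the names of all audio devices."""
    # Imported here so that format_device_list can be tried without the library.
    import speech_recognition as sr
    return sr.Microphone.list_microphone_names()


# Hypothetical device names, used only to show what the output looks like.
example_names = ["bcm2835 ALSA: - (hw:0,0)",
                 "Yeti Stereo Microphone: USB Audio (hw:1,0)"]
for line in format_device_list(example_names):
    print(line)
```

On the Pi, with the microphone plugged in, pass list_microphones() to format_device_list() instead of the example list.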
The output shows the list of available audio devices.
To find the device ID, run another short program. Note the device name from the list first, because the ID is looked up by that name.
Program:
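A minimal sketch of the lookup, assuming the device list has already been obtained with sr.Microphone.list_microphone_names(). The position of a name in that list is the device index that sr.Microphone expects. The function name find_device_id and the example names are hypothetical.

```python
def find_device_id(names, wanted):
    """Return the index of the first device whose name contains `wanted`, else None."""
    for index, name in enumerate(names):
        if wanted.lower() in name.lower():
            return index
    return None


# Hypothetical device names; on the Pi use sr.Microphone.list_microphone_names().
example_names = ["bcm2835 ALSA: - (hw:0,0)",
                 "Yeti Stereo Microphone: USB Audio (hw:1,0)"]
print(find_device_id(example_names, "Yeti Stereo Microphone"))
```

Matching on part of the name avoids having to type the full string, including the "(hw:1,0)" suffix, which can change when devices are plugged into a different port.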
The output shows the list of devices together with the device ID of the chosen device, here ‘Yeti Stereo Microphone’.
Now you are done with configuring the microphone.
Some important terms to know before programming
- Chunk size: The number of audio frames (samples) read from the microphone at once. This value is typically a power of 2 (e.g. 1024 or 2048). A chunk acts like a buffer.
- Sampling rate: The number of audio samples recorded per second for processing (e.g. 48000).
- Energy threshold: The loudness level above which sound is treated as speech rather than background noise. The program can wait for about a second to adjust this threshold to the external noise level.
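The three terms can be related with a little arithmetic. The sketch below assumes that energy is measured as the RMS amplitude of the samples in a chunk; the library's exact computation may differ, and the function names are only illustrative.

```python
import math


def chunk_duration_ms(chunk_size, sample_rate):
    """Length of one chunk in milliseconds."""
    return 1000.0 * chunk_size / sample_rate


def rms_energy(samples):
    """Root-mean-square amplitude of a list of integer audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, energy_threshold):
    """True when the chunk is louder than the threshold."""
    return rms_energy(samples) > energy_threshold


# With the values used later in the program: 1024 frames at 48000 samples per second.
print(round(chunk_duration_ms(1024, 48000), 1), "ms per chunk")
print(is_speech([5000, -5000], 4000))   # loud chunk
print(is_speech([100, -100], 4000))     # quiet chunk
```

A larger chunk size means fewer reads per second but a longer delay before each chunk can be examined.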
Functions used in the program
sr.Microphone.list_microphone_names(): This function lists the microphones connected to the Pi.
sr.Recognizer(): A class with seven methods for recognizing speech from an audio source using various APIs. These are:
- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
- recognize_wit(): Wit.ai
Note: Only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.
adjust_for_ambient_noise(): This function calibrates the energy threshold to the ambient noise level of the source.
listen(): This function listens for a phrase and records it as audio data.
recognize_google(): This function sends the audio data to the Google Web Speech API and returns the recognized text.
lower(): This string method converts the recognized text to lower case.
re.search(): This function takes a regular expression pattern and a string, and searches for that pattern within the string.
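The way lower() and re.search() work together can be sketched as a small function. The name parse_command is hypothetical; the patterns are the ones used in the program, where "pic" also matches "pick".

```python
import re


def parse_command(text):
    """Return "pick", "place" or None for a recognized sentence."""
    text = text.lower()
    if re.search("pic", text):
        # "pic" alone is not enough; the word "item" must also be present.
        return "pick" if re.search("item", text) else None
    if re.search("place", text):
        return "place"
    return None


print(parse_command("Pick item"))
print(parse_command("please place the item"))
print(parse_command("hello"))
```

Lower-casing first means the patterns only need to be written once, however the recognizer capitalizes the text.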
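sr.UnknownValueError: This exception is raised when the audio is unintelligible; the matching except block is then executed.
sr.RequestError: This exception is raised when the request to the recognition service fails; the matching except block is then executed.
AssertionError / KeyboardInterrupt: These are raised when an assert statement fails or when the user presses the interrupt key (Ctrl+C); the instructions inside the matching block are then executed.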
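To handle several exception types in one block, Python expects a tuple of types after except. The sketch below uses a hypothetical helper, classify, to show this. Note that the expression AssertionError or KeyboardInterrupt evaluates to AssertionError alone, so writing it after except would catch only the first type.

```python
def classify(error):
    """Raise `error` and report which handler caught it."""
    try:
        raise error
    except (AssertionError, KeyboardInterrupt):
        # A tuple catches either exception type.
        return "stop"
    except Exception:
        return "other"


print(classify(AssertionError()))
print(classify(KeyboardInterrupt()))
print(classify(ValueError()))

# "A or B" between two classes simply yields the first class.
print((AssertionError or KeyboardInterrupt) is AssertionError)
```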
Program
To pick and place an item
The whole program is shown below.
Import the required modules. The piarm module is needed to control the PiArm. The time module provides sleep() for a proper delay between commands. The re (regular expression) module is used to search the recognized text. The speech_recognition module recognizes the voice and converts it into text.
import speech_recognition as sr
import time, datetime
import pyaudio
from piarm import PiArm
import re
Creating an instance of the PiArm class and connecting to PiArm via ttyS0 port.
robo = PiArm()
robo.connect('/dev/ttyS0')
Create an instance of the Recognizer class.
r = sr.Recognizer()
Set the energy threshold, the sample rate (samples per second) and the chunk size (frames per buffer).
r.energy_threshold = 4000
sample_rate = 48000
chunk_size = 1024
Defining positions of the motors to pick and place an item. pos contains the positions of the motors to pick an item whereas pos1 contains the positions of the motor to place an item.
pos = [[472, 497, 306, 478, 484, 570],
       [471, 499, 303, 739, 486, 571],
       [472, 499, 206, 739, 421, 576],
       [579, 499, 207, 739, 408, 576],
       [579, 499, 206, 739, 533, 576]]
pos1 = [[579, 499, 206, 748, 555, 944],
        [579, 499, 207, 748, 413, 944],
        [498, 499, 207, 748, 420, 944],
        [498, 499, 208, 748, 575, 944],
        [496, 499, 206, 749, 614, 575]]
Generate a list of all audio cards/microphones and store it in a variable.
mic_list = sr.Microphone.list_microphone_names()
print(mic_list)
Setting up device id of the device you are using to avoid ambiguity.
device_id = 0
for i in mic_list:
    if i == 'Yeti Stereo Microphone: USB Audio (hw:1,0)':
        device_id = mic_list.index(i)
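The microphone is used as the input source and is selected by its device ID. If the named microphone is not in the list, device_id stays at its default of 0 and the first listed device is used, so check the printed list if the wrong device responds. The recognizer then adjusts for ambient noise.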
while True:
    with sr.Microphone(device_index=device_id, sample_rate=sample_rate, chunk_size=chunk_size) as source:
        r.adjust_for_ambient_noise(source)
        print("Say Something")
Listen for the user's input and convert it to text. If the recognized text matches a command, the program runs the instructions in the corresponding block.
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            text = text.lower()
            print("you said: " + text)
Search the recognized text for the command words and perform the task.
            if re.search("pic", text):
                if re.search('item', text):
                    print("pick item")
Iterate over the positions of PiArm.
                    for command in pos:
Iterate over each servo of the PiArm and write its position value. The time argument is constant (500) and the servo ID is incremented by the loop. After all six servos have been written, pause for one second.
                        for ID in range(6):
                            robo.servoWrite(ID + 1, command[ID], 500)
                        time.sleep(1)
            elif re.search("place", text):
                print("Place item")
Iterate over the positions of PiArm.
                for command in pos1:
Iterate over each servo of the PiArm and write its position value. The time argument is constant (500) and the servo ID is incremented by the loop. After all six servos have been written, pause for one second.
                    for ID in range(6):
                        robo.servoWrite(ID + 1, command[ID], 500)
                    time.sleep(1)
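If Google could not understand what was said, or the request failed, the program prints an error message. If an assertion fails or the user presses the interrupt key, the loop ends.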
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
        except (AssertionError, KeyboardInterrupt):
            print("Problem with Audio Source")
            break
Decision-making loop:
If the voice is recognized as ‘pick item’, the program runs the commands in the ‘pos’ list; if the voice is recognized as ‘place item’, it runs the commands in the ‘pos1’ list.
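The servo loop can be tried without the arm attached. The sketch below assumes only that servoWrite takes a servo ID, a position and a time, as in the program. FakeArm and run_sequence are hypothetical names, and FakeArm merely records the calls instead of moving motors.

```python
import time


class FakeArm:
    """Stand-in for PiArm that records servoWrite calls instead of moving motors."""

    def __init__(self):
        self.calls = []

    def servoWrite(self, servo_id, position, move_time):
        self.calls.append((servo_id, position, move_time))


def run_sequence(arm, positions, move_time=500, pause=1.0):
    """Write every pose in `positions` to the arm, pausing after each pose."""
    for command in positions:
        for index, position in enumerate(command):
            # Servo IDs start at 1, list indices at 0.
            arm.servoWrite(index + 1, position, move_time)
        time.sleep(pause)


# Dry run with the first two poses of the pick sequence and no pause.
arm = FakeArm()
run_sequence(arm, [[472, 497, 306, 478, 484, 570],
                   [471, 499, 303, 739, 486, 571]], pause=0)
print(len(arm.calls), "servo writes; first:", arm.calls[0])
```

Passing the real PiArm instance in place of FakeArm would send the same calls to the hardware.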
The Python file is available at https://github.com/sbcshop/PiArm/blob/master/speech_recognition.py