[EN] Speech to Text & Text to Speech

This article presents an example Python implementation that calls a speech-to-text service, sends the recognized text to Google’s text-to-speech service to generate an MP3 audio file, and finally uses the playsound library to play the speech through the speaker. It runs on the Windows operating system.

Speech to Text

In this article, we use the speech_recognition module. It records from the microphone through the PyAudio library; if needed, download a setup file compatible with your OS from here. The recognition itself is performed by a Google API over the internet. Install PyAudio with the following command.

pip install pyaudio

After that, install the speech-to-text module with the following command.

pip install SpeechRecognition

The main calls of the library are as follows (the module is imported under the name speech_recognition).

  1. recognizer = speech_recognition.Recognizer()
    Creates the recognizer object.
  2. with speech_recognition.Microphone() as microphone_object:
    Opens the microphone as the audio source.
  3. sound_object = recognizer.listen( microphone_object )
    Records from the microphone and stores the sound in sound_object.
  4. result = recognizer.recognize_google( sound_object, None, 'th' )
    Converts the recorded audio to text; 'th' selects the Thai language.

The usage is shown in code23-1, and the result when saying “สวัสดี” (sawaddee) is shown in Figure 1.

# code23-1 : Speech to Text
import speech_recognition as stt
recog = stt.Recognizer()
with stt.Microphone() as mic:
    print("Recording...")
    audio = recog.listen(mic)  # record one phrase from the microphone
    try:
        # send the audio to Google and print the Thai transcription
        print(recog.recognize_google(audio, language='th'))
    except stt.UnknownValueError:
        print("Google could not understand the audio")
    except stt.RequestError as e:
        print("Could not get results from the Google service: {0}".format(e))
(Figure 1. Result of code23-1)
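
In a noisy room the recording step can pick up background sound. The following is a minimal sketch, not part of the original example, that calibrates the recognizer against ambient noise before listening. The helper names transcribe_once and demo and the 0.5-second calibration time are assumptions made for illustration; adjust_for_ambient_noise, listen, and recognize_google are methods of the library's Recognizer class. The helper does not catch UnknownValueError or RequestError, so they should still be handled by the caller as in code23-1.

```python
def transcribe_once(recognizer, source, language="th", calibrate_seconds=0.5):
    """Record one phrase from `source` and return the text Google recognizes."""
    # Sample the background noise first so the recognizer can set its
    # energy threshold before the speaker starts talking.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    audio = recognizer.listen(source)
    # 'th' selects Thai, as in code23-1.
    return recognizer.recognize_google(audio, language=language)


def demo():
    """Hypothetical entry point; needs a microphone and an internet connection."""
    # Imported here so that transcribe_once can be tried without the library.
    import speech_recognition as stt

    with stt.Microphone() as mic:
        print(transcribe_once(stt.Recognizer(), mic))
```

Calling demo() runs the sketch against the real microphone.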

Text to Speech

Text to Speech converts written text into spoken audio. In this article’s example, we use a Google service to read the text aloud. The result obtained from Google is an MP3 audio file.

Google’s text-to-speech service requires the installation of the gTTS (Google Text to Speech) library, which can be installed with the following command.

pip install gtts

To use the gTTS class, import it as follows:

from gtts import gTTS

The text to be spoken and its language are passed to gTTS as follows:

audio_data = gTTS(text="message", lang='th')

The generated audio is saved as an MP3 file as follows:

audio_data.save("filename.mp3")

Example program code23-2 creates an audio file containing the spoken greeting “ยินดีต้อนรับค่ะ” (“Welcome”) and saves it as ‘welcome.mp3’.

# code23-2 : Text to Speech
from gtts import gTTS
tts = gTTS(text='ยินดีต้อนรับค่ะ',lang='th')
tts.save('welcome.mp3')

Play Audio from MP3 Files

To play MP3 audio files, the playsound library must be installed with the following command.

pip install playsound

Example program code23-3 plays the welcome.mp3 audio file created by code23-2.

# code23-3 : Play MP3
import playsound as sound
filename = './welcome.mp3'
sound.playsound(filename, True)
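
Because code23-3 depends on the file produced by code23-2, it can help to check that the file exists before playing it. The following is a minimal sketch under that assumption; the helper name play_if_present is hypothetical, and the player is passed in as an argument so the check can be tried without audio hardware.

```python
from pathlib import Path


def play_if_present(filename, player):
    """Call player(filename, True) only when the file exists; return success."""
    path = Path(filename)
    if not path.is_file():
        print(f"Audio file not found: {path}")
        return False
    # True asks the player to block until playback finishes, as in code23-3.
    player(str(path), True)
    return True
```

With playsound installed, it could be called as play_if_present('./welcome.mp3', sound.playsound) after the import in code23-3.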

Example Code

The example code23-4 combines code23-1, code23-2, and code23-3: it listens to speech and converts it to text, converts that text into an audio file, and finally plays the audio file.

# code23-4
import speech_recognition as stt
from gtts import gTTS
import playsound as sound
recog = stt.Recognizer()
textRecog = ""
with stt.Microphone() as mic:
    print("Recording...")
    audio = recog.listen(mic)  # record one phrase from the microphone
    try:
        textRecog = recog.recognize_google(audio, language='th')
        print(textRecog)
    except stt.UnknownValueError:
        print("Google could not understand the audio")
    except stt.RequestError as e:
        print("Could not get results from the Google service: {0}".format(e))
# gTTS needs non-empty text, so skip synthesis when recognition failed
if textRecog:
    tts = gTTS(text=textRecog, lang='th')
    tts.save('text.mp3')
    sound.playsound('text.mp3', True)
(Figure 2. The result when saying “ใช่ไหมถูกไหม” (chai mai tuke mai))

Conclusion

In this article, we saw how to use AI services to convert speech to text and text to audio with Google services in Python. This is not difficult, but the program must be connected to the internet while it runs. In addition, the reader can take the text obtained from the speech and display it or store it in a text file.
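
As one way to store the recognized text, the following minimal sketch appends each phrase to a text file. The function name append_transcript, the default file name transcript.txt, and the timestamp format are assumptions made for illustration and are not part of the original examples.

```python
from datetime import datetime


def append_transcript(text, path="transcript.txt"):
    """Append one recognized phrase to a UTF-8 text file with a timestamp."""
    stamp = datetime.now().isoformat(timespec="seconds")
    # UTF-8 is set explicitly so Thai text is written correctly on Windows,
    # where the default file encoding is often not UTF-8.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{text}\n")
```

In code23-4, it could be called as append_transcript(textRecog) right after the recognized text is printed.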

Finally, we hope this article will be useful to everyone who is learning Python programming for use in AI, and that you have fun programming.

References

  1. SpeechRecognition
  2. PyAudio

(C) 2020, By Jarut Busarathid and Danai Jedsadathitikul
Updated 2021-09-11