Speech recognition is the process of identifying the words spoken in an audio signal and converting them into text.
Speech recognition works in several stages. First, a microphone converts the speech from physical sound waves into an electrical signal. Next, an analog-to-digital converter (ADC) turns this signal into digital data. Finally, one or more models transcribe the digital audio into text. In recognizers based on the Hidden Markov Model (HMM), for example, the speech signal is divided into short fragments of about 10 milliseconds each.
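The digitization and framing steps can be illustrated with a toy Python sketch. It is a simplified model, not how any particular recognizer or the SpeechRecognition library is implemented: a 440 Hz sine wave stands in for the microphone signal, the hypothetical function digitize plays the role of the ADC by sampling at an assumed 16 kHz and quantizing to 16-bit integers, and split_into_frames cuts the result into consecutive 10 ms frames. Real front ends typically use overlapping analysis windows (for example, 25 ms windows advanced every 10 ms) rather than the non-overlapping frames shown here.

```python
import math

def digitize(signal_fn, duration_s, sample_rate=16000, bits=16):
    """Toy ADC: sample a continuous signal and quantize it to signed integers."""
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                       # sampling instant in seconds
        value = max(-1.0, min(1.0, signal_fn(t)))  # clip to the converter's range
        samples.append(round(value * max_level))   # quantize to the nearest level
    return samples

def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split samples into consecutive, non-overlapping frames of frame_ms ms.

    A trailing partial frame, if any, is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 160 samples at 16 kHz
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# A 440 Hz tone standing in for the microphone's analog voltage.
tone = lambda t: 0.5 * math.sin(2 * math.pi * 440 * t)
pcm = digitize(tone, duration_s=1.0)
frames = split_into_frames(pcm, sample_rate=16000)
print(len(pcm), len(frames), len(frames[0]))  # 16000 100 160
```

At 16 kHz, each 10 ms frame holds 160 samples, so one second of audio yields 100 frames.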
Supported File Formats
Currently, the SpeechRecognition library in Python supports the following file formats:
- WAV: must be in PCM/LPCM format
- AIFF
- AIFF-C
- FLAC: must be native FLAC format; OGG-FLAC is not supported
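Since WAV input must be PCM and FLAC input must be native FLAC rather than OGG-FLAC, it can help to check a file before handing it to the library. The sketch below is a hypothetical pre-flight check, not part of SpeechRecognition: sniff_container inspects a file's leading magic bytes (RIFF/WAVE for WAV, fLaC for native FLAC, OggS for an Ogg container), and is_pcm_wav relies on the fact that Python's standard wave module only reads PCM WAV data and raises wave.Error for other encodings such as IEEE float.

```python
import io
import wave

def sniff_container(header: bytes) -> str:
    """Guess the container from the first 12 bytes of a file (magic numbers)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"    # native FLAC stream
    if header[:4] == b"OggS":
        return "ogg"     # Ogg container, e.g. OGG-FLAC, which is not accepted
    return "unknown"     # anything this sketch does not recognize

def is_pcm_wav(data: bytes) -> bool:
    """Return True if data is a WAV file that the wave module can open as PCM."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wf:
            wf.getnframes()
        return True
    except (wave.Error, EOFError):
        # wave.Error: not RIFF/WAVE or not PCM; EOFError: truncated header
        return False
```

In practice you would read the file's bytes with open(path, "rb") and pass the first 12 bytes to sniff_container and, for WAV files, the whole content to is_pcm_wav.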
If you are working on x86-based Linux, macOS, or Windows, you should be able to work with FLAC files without a problem. On other platforms, you will need to install a FLAC encoder and ensure you have access to the flac command-line tool.
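To see which case applies on a given machine, the following sketch checks for a flac executable on the PATH with shutil.which and restates the platform rule above as a rough heuristic. All function names are hypothetical and none of this is taken from the library itself; the heuristic assumes "x86-based" applies to all three operating systems. The flac options used (--silent, --force, -o) are standard options of the reference flac encoder.

```python
import platform
import shutil

# Machine names that platform.machine() commonly reports for x86 CPUs.
X86_MACHINES = {"x86_64", "amd64", "i386", "i686", "x86"}

def flac_tool_path():
    """Return the full path of a 'flac' executable on PATH, or None if absent."""
    return shutil.which("flac")

def platform_is_listed(system: str, machine: str) -> bool:
    """Hypothetical heuristic: True for x86-based Linux, macOS, or Windows.

    Assumes "x86-based" applies to all three operating systems.
    """
    return system in {"Linux", "Darwin", "Windows"} and machine.lower() in X86_MACHINES

def flac_setup_required() -> bool:
    """True when the platform is not listed and no flac tool is found on PATH."""
    listed = platform_is_listed(platform.system(), platform.machine())
    return not listed and flac_tool_path() is None

def flac_encode_command(wav_path: str, flac_path: str) -> list:
    """Build a command that converts a WAV file to native FLAC with the flac tool."""
    return ["flac", "--silent", "--force", "-o", flac_path, wav_path]
```

When flac_setup_required() returns False because a flac tool was found, the list from flac_encode_command can be passed to subprocess.run(..., check=True) to produce a native FLAC file from a PCM WAV file.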