Why Do The Results Of This DeepSpeech Python Program Differ From The Results I Get From The Command Line Interface?

September 29, 2022

I'm learning about Mozilla's DeepSpeech Speech-To-Text engine. I had no trouble getting the command line interface working, but the Python interface seems to be behaving differentl

Solution 1:

Include your trie and lm.binary files when you set up the model, then try again:

from deepspeech import Model
import scipy.io.wavfile

BEAM_WIDTH = 500
LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.25
N_FEATURES = 26
N_CONTEXT = 9
MODEL_FILE = 'output_graph.pbmm'
ALPHABET_FILE = 'alphabet.txt'
LANGUAGE_MODEL = 'lm.binary'
TRIE_FILE = 'trie'

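# Load the acoustic model (constructor signature as in the DeepSpeech release this answer targets).
ds = Model(MODEL_FILE, N_FEATURES, N_CONTEXT, ALPHABET_FILE, BEAM_WIDTH)

# Enable the language model and trie so the decoder actually uses them.
ds.enableDecoderWithLM(ALPHABET_FILE, LANGUAGE_MODEL, TRIE_FILE, LM_WEIGHT,
                       VALID_WORD_COUNT_WEIGHT)

def process(path):
    # fs is the sample rate; audio is the raw sample array (int16 for 16-bit files).
    fs, audio = scipy.io.wavfile.read(path)
    # Run speech-to-text and return the transcript.
    return ds.stt(audio, fs)

# Print the transcript so it can be compared with the command line output.
print(process('sample.wav'))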

This should produce the same response. Use the same audio file for both the Python and the command line inference, then compare the results. The audio files should be 16-bit, 16000 Hz, mono recordings.
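To check whether a file meets the 16-bit, 16000 Hz, mono requirement before running inference, something like the sketch below may help. It uses only Python's standard `wave` module; the function name `check_wav_format` and its parameters are made up for illustration and are not part of DeepSpeech.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between the file and the expected format.

    Hypothetical helper: sample_width is in bytes, so 2 means 16-bit audio.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"{8 * wav.getsampwidth()}-bit samples, "
                f"expected {8 * sample_width}-bit")
    return problems
```

Calling `check_wav_format('sample.wav')` returns an empty list when the file already has the expected format, and otherwise one message per mismatch.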

Solution 2:

You should convert the audio to 16000 Hz; most problems with strange output come from an incorrect audio format. Loading the language model can also improve the word error rate (WER).
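As a rough illustration of the conversion, the sketch below downmixes a 16-bit PCM WAV file to mono and resamples it to 16000 Hz using only the standard library. The function name `convert_to_16k_mono` is hypothetical, and the resampling is plain linear interpolation with no anti-aliasing filter, so a dedicated audio tool will likely give better quality on real recordings.

```python
import array
import sys
import wave


def convert_to_16k_mono(src_path, dst_path, target_rate=16000):
    """Write a 16-bit mono copy of src_path at target_rate (hypothetical helper)."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        src_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix: average the interleaved channel samples of each frame.
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples) - channels + 1, channels)]

    # Resample: linearly interpolate between neighbouring input samples.
    n_out = int(len(mono) * target_rate / src_rate)
    out = array.array("h")
    for i in range(n_out):
        pos = i * src_rate / target_rate
        left = int(pos)
        right = min(left + 1, len(mono) - 1)
        frac = pos - left
        out.append(int(round(mono[left] * (1 - frac) + mono[right] * frac)))
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
```

For example, `convert_to_16k_mono('input.wav', 'sample.wav')` would produce a file that can be passed to both the Python program and the command line client, where `input.wav` stands in for your own recording.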
