
Why Do The Results Of This DeepSpeech Python Program Differ From The Results I Get From The Command Line Interface?

I'm learning about Mozilla's DeepSpeech Speech-To-Text engine. I had no trouble getting the command line interface working, but the Python interface seems to be behaving differentl

Solution 1:

Include your trie and lm.binary files when setting up the model, then try again.

from deepspeech import Model
import scipy.io.wavfile

BEAM_WIDTH = 500
LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.25
N_FEATURES = 26
N_CONTEXT = 9
MODEL_FILE = 'output_graph.pbmm'
ALPHABET_FILE = 'alphabet.txt'
LANGUAGE_MODEL =  'lm.binary'
TRIE_FILE =  'trie'

ds = Model(MODEL_FILE, N_FEATURES, N_CONTEXT, ALPHABET_FILE, BEAM_WIDTH)

ds.enableDecoderWithLM(ALPHABET_FILE, LANGUAGE_MODEL, TRIE_FILE, LM_WEIGHT,
                       VALID_WORD_COUNT_WEIGHT)

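def process(path):
    # Read the sample rate and the raw samples from the WAV file.
    fs, audio = scipy.io.wavfile.read(path)
    # Run speech-to-text on the samples and return the transcript.
    return ds.stt(audio, fs)

# Print the transcript; a bare call would discard the return value in a script.
print(process('sample.wav'))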

This should produce the same response as the command line interface. Use the same audio file for both and compare the results. The audio files should be 16-bit, 16000 Hz, mono recordings.
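To check the format requirement before running inference, something like the following could help. It is only a sketch that uses Python's standard `wave` module; the helper name `check_wav_format` is made up for this example and is not part of DeepSpeech.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return a list of mismatches between a WAV file and the expected format.

    Hypothetical helper: the defaults encode the 16-bit, 16000 Hz, mono
    requirement mentioned above. An empty list means the file matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(
                f"file has {wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"samples are {8 * wav.getsampwidth()}-bit, "
                f"expected {8 * sample_width_bytes}-bit")
    return problems
```

Calling `check_wav_format('sample.wav')` before `ds.stt(...)` and printing any returned messages makes a format mismatch visible right away.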


Solution 2:

You should convert the audio to 16000 Hz. Most issues with weird output are caused by an incorrect audio format. Loading the language model can also improve the WER (word error rate).
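As a rough illustration of the conversion, here is a sketch that uses only the Python standard library. It assumes 16-bit PCM input, downmixes by averaging the channels, and resamples with plain linear interpolation without an anti-aliasing filter, so a dedicated audio tool will likely give better quality. The function name `convert_to_16k_mono` is made up for this example.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # sample rate named in the answers above


def convert_to_16k_mono(src_path, dst_path):
    """Crude conversion of a 16-bit PCM WAV file to 16000 Hz mono.

    Hypothetical helper, not part of DeepSpeech.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix: average the interleaved channel samples of each frame.
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)]

    # Resample: linearly interpolate between neighbouring input samples.
    n_out = len(mono) * TARGET_RATE // rate
    out = array.array("h")
    for j in range(n_out):
        pos = j * rate / TARGET_RATE        # position in the input signal
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        out.append(int(round(mono[i] * (1 - frac) + nxt * frac)))

    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(TARGET_RATE)
        dst.writeframes(out.tobytes())
```

The converted file can then be passed to the same Python code and to the command line interface, so both see identical input.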

