Why Do The Results Of This DeepSpeech Python Program Differ From The Results I Get From The Command Line Interface?
I'm learning about Mozilla's DeepSpeech Speech-To-Text engine. I had no trouble getting the command line interface working, but the Python interface seems to behave differently.
Solution 1:
Just include your trie and lm.binary files and try again:
from deepspeech import Model
import scipy.io.wavfile

# Decoder / language-model hyperparameters (values from the DeepSpeech examples)
BEAM_WIDTH = 500
LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.25

# Acoustic-model input geometry for the 0.x pre-trained model
N_FEATURES = 26
N_CONTEXT = 9

# Paths to the released model artifacts
MODEL_FILE = 'output_graph.pbmm'
ALPHABET_FILE = 'alphabet.txt'
LANGUAGE_MODEL = 'lm.binary'
TRIE_FILE = 'trie'

# Load the acoustic model, then attach the language model and trie to the decoder
ds = Model(MODEL_FILE, N_FEATURES, N_CONTEXT, ALPHABET_FILE, BEAM_WIDTH)
ds.enableDecoderWithLM(ALPHABET_FILE, LANGUAGE_MODEL, TRIE_FILE, LM_WEIGHT,
                       VALID_WORD_COUNT_WEIGHT)

def process(path):
    # Read the WAV file; DeepSpeech expects 16-bit, 16000 Hz, mono audio
    fs, audio = scipy.io.wavfile.read(path)
    processed_data = ds.stt(audio, fs)
    return processed_data

print(process('sample.wav'))
With the language model loaded, this should produce the same response as the CLI. Use the same audio file for both inferences and compare the transcripts; the audio should be a 16-bit, 16000 Hz, mono recording.
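As a quick sanity check before running inference, the file's format can be inspected with Python's standard wave module. This is just an illustrative snippet (the helper name check_wav_format and the file sample.wav are placeholders, not part of DeepSpeech):

import wave

def check_wav_format(path):
    # Report whether a WAV file matches the 16-bit / 16000 Hz / mono format DeepSpeech expects
    with wave.open(path, 'rb') as w:
        channels = w.getnchannels()      # should be 1 (mono)
        sample_width = w.getsampwidth()  # should be 2 bytes (16-bit)
        rate = w.getframerate()          # should be 16000 Hz
    ok = channels == 1 and sample_width == 2 and rate == 16000
    print('%s: %d channel(s), %d-bit, %d Hz -> %s'
          % (path, channels, 8 * sample_width, rate,
             'OK' if ok else 'needs conversion'))
    return ok

check_wav_format('sample.wav')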
Solution 2:
You should convert the audio to 16000 Hz; most issues with weird output come down to an incorrect audio format. Loading the language model can also improve the WER (word error rate).
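As a rough sketch of that conversion, the recording can be down-mixed to mono and resampled with SciPy before being passed to ds.stt. This assumes the source WAV contains integer PCM samples; the function name resample_to_16k and the file names are placeholders for illustration:

import numpy as np
import scipy.io.wavfile
import scipy.signal

def resample_to_16k(in_path, out_path, target_fs=16000):
    fs, audio = scipy.io.wavfile.read(in_path)
    audio = audio.astype(np.float32)
    # Down-mix stereo (or multi-channel) recordings to a single channel
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    # Polyphase resampling from the original rate to 16000 Hz
    g = np.gcd(fs, target_fs)
    audio = scipy.signal.resample_poly(audio, target_fs // g, fs // g)
    # DeepSpeech expects 16-bit signed PCM samples
    scipy.io.wavfile.write(out_path, target_fs, audio.astype(np.int16))

resample_to_16k('sample_44k_stereo.wav', 'sample_16k_mono.wav')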