Omnimaga
General Discussion => Technology and Development => Computer Programming => Topic started by: Sorunome on July 04, 2014, 11:08:58 am
-
Anyone with pocketsphinx experience here?
The thing is that it only recognizes gibberish for me :(
-
I've never worked with pocketsphinx, but maybe http://cmusphinx.sourceforge.net/wiki/tutorialam can be of help. Are you also recording your voice as 16-bit, 16 kHz mono (single channel) audio, as stated?
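A quick way to check whether an existing recording already has that format is to read its header with Python's standard wave module. This is only a minimal sketch: is_sphinx_ready is a made-up helper name, and the default values are simply the 16-bit, 16 kHz mono figures mentioned above.

```python
import wave


def is_sphinx_ready(path, rate=16000, channels=1, sample_width=2):
    """Return True if the WAV file is 16-bit (2 bytes per sample), 16 kHz, mono.

    The defaults are the format discussed in the thread; the helper name
    is hypothetical and not part of pocketsphinx or PyAudio.
    """
    with wave.open(path, 'rb') as wf:
        return (wf.getframerate() == rate
                and wf.getnchannels() == channels
                and wf.getsampwidth() == sample_width)


# Usage (hypothetical file name):
#   is_sphinx_ready('tmp.wav')
```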
-
How can I set how many kHz I want to record at, using pyaudio?
-
I found an example on http://people.csail.mit.edu/hubert/pyaudio/#examples which I think has the necessary options. I guess you'd have to change CHANNELS to 1 and RATE to 16000 (16 kHz expressed in Hz).
"""PyAudio example: Record a few seconds of audio and save to a WAVE file."""
import pyaudio
import wave
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
Again, I've never worked with this or audio in general so I'm making some guesses based on the documentation and interwebs.
-
I don't see any kHz setting, all I see is the RATE setting (which I already use here :) )
EDIT: setting RATE caches the recording :( But on playback it tells me it is 44100Hz rate, which is what I set the RATE variable to.
-
That seems like the kHz setting to me. Have you tried setting it to 16000? That's all I can think of. What are you trying to make btw, a secure Skype clone? :P
j/k really though, what are you making? :)
-
Setting it to 16kHz gives me this:
Expression 'r' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2018
Expression 'PaAlsaStreamComponent_FinishConfigure( &self->capture, hwParamsCapture, inParams, self->primeBuffers, realSr, inputLatency )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2655
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2767
Traceback (most recent call last):
  File "speechcontrol.py", line 33, in <module>
    inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 714, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 396, in __init__
    self._stream = pa.open(**arguments)
IOError: [Errno Unanticipated host error] -9999
And I'm just messing with voice control. I mean, how epic would it be if you entered your room and said "ok pi, turn on my computer"?
-
That would be epic indeed. The error seems to have something to do with opening the stream itself, e.g. opening a surround stream on a stereo device won't work.
I don't know what could be wrong though. Could you maybe paste your script?
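If the device really refuses to open at 16 kHz, one possible workaround is to record at whatever rate it does accept and convert the samples afterwards. Below is a rough sketch of that idea using only the standard library. resample_mono_int16 is a made-up helper, it assumes 16-bit mono data on a little-endian machine, and it uses plain linear interpolation with no low-pass filter, so a real resampler would give cleaner audio. It is written for Python 3.

```python
import array


def resample_mono_int16(raw, src_rate, dst_rate):
    """Resample 16-bit mono PCM bytes from src_rate to dst_rate.

    Hypothetical helper: linear interpolation only, no anti-aliasing
    filter. Assumes native byte order matches the little-endian samples
    that PyAudio and WAV files normally use.
    """
    samples = array.array('h')
    samples.frombytes(raw)
    if src_rate == dst_rate or len(samples) == 0:
        return raw

    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate      # input samples per output sample
    last = len(samples) - 1
    out = array.array('h')
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = samples[min(j, last)]
        b = samples[min(j + 1, last)]
        # Interpolate between the two neighbouring input samples.
        out.append(int(round(a + (b - a) * frac)))
    return out.tobytes()


# Usage, assuming `frames` holds chunks recorded at 44100 Hz:
#   pcm_16k = resample_mono_int16(b''.join(frames), 44100, 16000)
```

The converted bytes would then have to be written out with setframerate(16000) rather than the device rate, otherwise the header and the data disagree.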
-
###!/usr/bin/python2
try:
    import pocketsphinx as ps
except:
    import pocketsphinx as ps
import sphinxbase,pyaudio,wave
hmmd = '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic'
hmmd = '/usr/local/share/pocketsphinx/model/hmm/en/tidigits'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.dic'
#lmdir = '/home/sorunome/languagemodel_persona.lm'
#dictp = '/home/sorunome/dictionary_persona.dic'
p = pyaudio.PyAudio()
device = p.get_device_info_by_index(0)
CHUNK = 5750
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = int(device['defaultSampleRate'])
inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)
frames = []
for i in range(RATE/CHUNK * 5):
    frames.append(inputStream.read(CHUNK))
inputStream.stop_stream()
inputStream.close()
p.terminate()
write_frames = wave.open('tmp.wav','wb')
write_frames.setnchannels(CHANNELS)
write_frames.setsampwidth(p.get_sample_size(FORMAT))
write_frames.setframerate(RATE)
write_frames.writeframes(''.join(frames))
write_frames.close()
wavFile = file('tmp.wav','rb')
wavFile.seek(44)
#speechRec = ps.Decoder(lm='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP',dict='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
#speechRec = ps.Decoder(lm='/home/sorunome/languagemodel_persona.lm',dict='/home/sorunome/dictionary_persona.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
speechRec = ps.Decoder(hmm = hmmd,lm = lmdir,dict = dictp)
speechRec.decode_raw(wavFile)
print 'EPIC output',speechRec.get_hyp()
Lol, how did the extra hashes reach the first line of code :P
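A side note on the wavFile.seek(44) line: it assumes the audio data starts exactly 44 bytes into the file, which is not guaranteed for WAV files that carry extra chunks. A small sketch of letting the wave module locate the samples instead, and report the rate that was actually stored, could look like this. read_pcm is a made-up helper name and the code is Python 3.

```python
import wave


def read_pcm(path):
    """Return ((rate, channels, sample_width), raw_pcm_bytes) for a WAV file.

    Hypothetical helper: the wave module parses the header, so no fixed
    byte offset has to be assumed.
    """
    with wave.open(path, 'rb') as wf:
        params = (wf.getframerate(), wf.getnchannels(), wf.getsampwidth())
        return params, wf.readframes(wf.getnframes())


# Usage (file name taken from the script above):
#   (rate, channels, width), pcm = read_pcm('tmp.wav')
#   print(rate)   # shows what was really recorded, e.g. the device default
```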