Omnimaga

General Discussion => Technology and Development => Computer Programming => Topic started by: Sorunome on July 04, 2014, 11:08:58 am

Title: pocketsphinx
Post by: Sorunome on July 04, 2014, 11:08:58 am: Anyone with pocketsphinx experience here?
The thing is that it only recognizes gibberish for me :(
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 12:55:23 pm: I've never worked with pocketsphinx, but maybe http://cmusphinx.sourceforge.net/wiki/tutorialam can be of help. Are you also recording your voice as 16-bit, 16 kHz, single-channel (mono) audio as stated?
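One quick way to confirm that a recording really is 16-bit, 16 kHz mono is to read the WAV header back with Python's standard wave module. This is only a minimal sketch, assuming Python 3; describe_wav and is_sphinx_ready are hypothetical helper names, not part of pocketsphinx or PyAudio.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_in_bytes, frame_rate) for a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


def is_sphinx_ready(path, rate=16000):
    """True if the file is mono, 16-bit (2 bytes per sample) and at `rate` Hz.

    The default of 16000 Hz follows the format mentioned in the post above;
    pass a different rate if the acoustic model expects one.
    """
    return describe_wav(path) == (1, 2, rate)
```

Calling is_sphinx_ready("output.wav") on a file recorded with CHANNELS = 2 and RATE = 44100 would return False, which points at the recording format rather than the recognizer.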
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:00:29 pm: How can I set how many kHz I want to record, using pyaudio?
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:11:59 pm: I found an example on http://people.csail.mit.edu/hubert/pyaudio/#examples which I think has the necessary options. I guess you'd have to change CHANNELS to 1 and RATE to its 16 kHz equivalent.
Code: [Select]
"""PyAudio example: Record a few seconds of audio and save to a WAVE file."""

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

Again, I've never worked with this or audio in general so I'm making some guesses based on the documentation and interwebs.
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:16:07 pm: I don't see any kHz setting, all I see is the RATE setting (which I already use here :) )
EDIT: setting RATE caches the recording :( But on playback it tells me it has a 44100 Hz rate, which is what I set the RATE variable to.
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:19:51 pm: That seems like the kHz setting to me. Have you tried setting it to 16000? That's all I can think of. What are you trying to make btw, a secure Skype clone? :P
j/k really though, what are you making? :)
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:24:53 pm: Setting it to 16kHz gives me this:

Code: [Select]
Expression 'r' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2018
Expression 'PaAlsaStreamComponent_FinishConfigure( &self->capture, hwParamsCapture, inParams, self->primeBuffers, realSr, inputLatency )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2655
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2767
Traceback (most recent call last):
  File "speechcontrol.py", line 33, in <module>
    inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 714, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 396, in __init__
    self._stream = pa.open(**arguments)
IOError: [Errno Unanticipated host error] -9999
And I'm just messing with voice control, I mean how epic would it be if you enter your room and say "ok pi, turn on my computer"?
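The -9999 error above may mean that the ALSA capture device does not accept a 16000 Hz stream directly. One possible workaround, sketched here under that assumption, is to record at the rate the device reports and downsample afterwards. The helper resample_linear is hypothetical: it uses plain linear interpolation on 16-bit mono PCM with no low-pass filter, so some aliasing is to be expected and a dedicated resampling tool would likely do better. It assumes Python 3 and a little-endian host.

```python
import array


def resample_linear(pcm_bytes, src_rate, dst_rate):
    """Resample 16-bit mono PCM bytes from src_rate to dst_rate.

    Naive linear interpolation between neighbouring samples; no
    anti-aliasing filter is applied (an assumption made for brevity).
    """
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(pcm_bytes)
    if not samples or src_rate == dst_rate:
        return pcm_bytes

    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples per output sample
    last = len(samples) - 1

    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        value = samples[left] * (1 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out.tobytes()
```

With the frames list from the recording loop, something like resample_linear(b''.join(frames), 44100, 16000) would yield data that can be written out with setframerate(16000).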
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:45:09 pm: That would be epic indeed. It seems to have something to do with the stream itself e.g. opening a surround stream on a stereo device won't work.
I don't know what could be wrong though. Could you maybe paste your script?
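To see which stream settings the device accepts before calling open(), PyAudio provides is_format_supported, which returns True for a supported combination and raises ValueError otherwise. The sketch below wraps it in a hypothetical helper, first_supported_rate, and the candidate rates are only an assumed starting list. The helper takes the PyAudio instance as an argument so it can be tried without audio hardware, and the host API's answer may not always match what open() later accepts.

```python
def first_supported_rate(pa, device_index, sample_format,
                         candidates=(16000, 8000, 44100, 48000)):
    """Return the first candidate rate accepted for mono input, or None.

    `pa` is expected to be a pyaudio.PyAudio() instance and
    `sample_format` a constant such as pyaudio.paInt16.
    """
    for rate in candidates:
        try:
            if pa.is_format_supported(rate,
                                      input_device=device_index,
                                      input_channels=1,
                                      input_format=sample_format):
                return rate
        except ValueError:
            # PyAudio signals an unsupported combination with ValueError.
            continue
    return None


# Usage (requires PyAudio and an input device):
#   import pyaudio
#   p = pyaudio.PyAudio()
#   print(first_supported_rate(p, 0, pyaudio.paInt16))
#   p.terminate()
```

If this returns something other than 16000, recording at the returned rate and converting afterwards is one option.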
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 02:55:25 pm: Code: [Select]
###!/usr/bin/python2
try:
    import pocketsphinx as ps
except:
    import pocketsphinx as ps
import sphinxbase,pyaudio,wave

hmmd = '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic'

hmmd = '/usr/local/share/pocketsphinx/model/hmm/en/tidigits'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.dic'

#lmdir = '/home/sorunome/languagemodel_persona.lm'
#dictp = '/home/sorunome/dictionary_persona.dic'

p = pyaudio.PyAudio()
device = p.get_device_info_by_index(0)

CHUNK = 5750
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = int(device['defaultSampleRate'])

inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)

frames = []
for i in range(RATE/CHUNK * 5):
    frames.append(inputStream.read(CHUNK))

inputStream.stop_stream()
inputStream.close()
p.terminate()

write_frames = wave.open('tmp.wav','wb')
write_frames.setnchannels(CHANNELS)
write_frames.setsampwidth(p.get_sample_size(FORMAT))
write_frames.setframerate(RATE)
write_frames.writeframes(''.join(frames))
write_frames.close()

wavFile = file('tmp.wav','rb')
wavFile.seek(44)

#speechRec = ps.Decoder(lm='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP',dict='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
#speechRec = ps.Decoder(lm='/home/sorunome/languagemodel_persona.lm',dict='/home/sorunome/dictionary_persona.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
speechRec = ps.Decoder(hmm = hmmd,lm = lmdir,dict = dictp)
speechRec.decode_raw(wavFile)
print 'EPIC output',speechRec.get_hyp()

Lol, how did the extra hashes reach the first line of code :P