Omnimaga

General Discussion => Technology and Development => Computer Programming => Topic started by: Sorunome on July 04, 2014, 11:08:58 am

Title: pocketsphinx
Post by: Sorunome on July 04, 2014, 11:08:58 am
Anyone with pocketsphinx experience here?
The thing is that it only recognizes gibberish for me :(
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 12:55:23 pm
I've never worked with pocketsphinx, but maybe http://cmusphinx.sourceforge.net/wiki/tutorialam can be of help. Are you also recording your voice as 16-bit, 16 kHz, single-channel (mono) audio, as stated there?
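One quick way to check whether an existing recording matches the 16-bit, 16 kHz, mono format asked about above is to read its WAV header with Python's standard `wave` module. This is only a rough sketch: the helper names `describe_wav` and `is_sphinx_ready` are made up for illustration, and the 16000 Hz default is taken from the question above, not from any particular model.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_in_bytes, frame_rate) from a WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


def is_sphinx_ready(path, rate=16000):
    """True if the file is mono, 16-bit (2 bytes per sample) and at `rate` Hz.

    `rate` defaults to 16000 as an assumption; use whatever rate your
    acoustic model was actually trained on.
    """
    return describe_wav(path) == (1, 2, rate)
```

Calling `describe_wav("output.wav")` on a file written by a stereo 44100 Hz recorder would be expected to report two channels and a 44100 Hz rate, in which case `is_sphinx_ready` returns False.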
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:00:29 pm
How can I set how many kHz I want to record at, using pyaudio?
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:11:59 pm
I found an example on http://people.csail.mit.edu/hubert/pyaudio/#examples which I think has the necessary options. I guess you'd have to change CHANNELS to 1 and RATE to 16000 (the 16 kHz equivalent).
Code: [Select]
"""PyAudio example: Record a few seconds of audio and save to a WAVE file."""

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
Again, I've never worked with this or audio in general so I'm making some guesses based on the documentation and interwebs.
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:16:07 pm
I don't see any kHz setting; all I see is the RATE setting (which I already use here :) )
EDIT: setting RATE caches the recording :( But on playback it tells me the rate is 44100 Hz, which is what I set the RATE variable to.
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:19:51 pm
That seems like the kHz setting to me. Have you tried setting it to 16000? That's all I can think of. What are you trying to make btw, a secure Skype clone? :P
j/k really though, what are you making? :)
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 01:24:53 pm
Setting it to 16kHz gives me this:

Code: [Select]
Expression 'r' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2018
Expression 'PaAlsaStreamComponent_FinishConfigure( &self->capture, hwParamsCapture, inParams, self->primeBuffers, realSr, inputLatency )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2655
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2767
Traceback (most recent call last):
  File "speechcontrol.py", line 33, in <module>
    inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 714, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 396, in __init__
    self._stream = pa.open(**arguments)
IOError: [Errno Unanticipated host error] -9999

And I'm just messing with voice control. I mean, how epic would it be if you entered your room and said "ok pi, turn on my computer"?
Title: Re: pocketsphinx
Post by: ElementCoder on July 07, 2014, 01:45:09 pm
That would be epic indeed. It seems to have something to do with the stream itself; e.g., opening a surround stream on a stereo device won't work.
I don't know what could be wrong though. Could you maybe paste your script?
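If the device simply does not accept the requested rate, one thing that might help is asking PyAudio up front which rates it will take. PyAudio has an `is_format_supported` method that raises `ValueError` for an unsupported combination. The sketch below wraps it; the function name `first_supported_rate` and the candidate list are made up for illustration, and `pa` is assumed to be a `pyaudio.PyAudio()` instance. Treat the result as a hint only, since a device may still refuse when the stream is actually opened.

```python
def first_supported_rate(pa, device_index, candidates, sample_format, channels=1):
    """Return the first rate in `candidates` the input device accepts, else None.

    `pa` is assumed to be a pyaudio.PyAudio() instance and `sample_format`
    a PyAudio format constant such as pyaudio.paInt16.
    """
    for rate in candidates:
        try:
            if pa.is_format_supported(rate,
                                      input_device=device_index,
                                      input_channels=channels,
                                      input_format=sample_format):
                return rate
        except ValueError:
            # PyAudio signals an unsupported combination with ValueError.
            continue
    return None


# Possible usage (needs PyAudio installed and an input device at index 0):
#   import pyaudio
#   p = pyaudio.PyAudio()
#   print(first_supported_rate(p, 0, [16000, 44100, 48000], pyaudio.paInt16))
```

If 16000 is not accepted but 44100 is, recording at the supported rate and converting the audio afterwards is one possible workaround.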
Title: Re: pocketsphinx
Post by: Sorunome on July 07, 2014, 02:55:25 pm
Code: [Select]
###!/usr/bin/python2
try:
        import pocketsphinx as ps
except:
        import pocketsphinx as ps

import sphinxbase,pyaudio,wave

hmmd = '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic'

hmmd = '/usr/local/share/pocketsphinx/model/hmm/en/tidigits'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.dic'

#lmdir = '/home/sorunome/languagemodel_persona.lm'
#dictp = '/home/sorunome/dictionary_persona.dic'

p = pyaudio.PyAudio()

device = p.get_device_info_by_index(0)


CHUNK = 5750
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = int(device['defaultSampleRate'])

inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)

frames = []
for i in range(RATE/CHUNK * 5):
        frames.append(inputStream.read(CHUNK))

inputStream.stop_stream()
inputStream.close()
p.terminate()
write_frames = wave.open('tmp.wav','wb')
write_frames.setnchannels(CHANNELS)
write_frames.setsampwidth(p.get_sample_size(FORMAT))
write_frames.setframerate(RATE)
write_frames.writeframes(''.join(frames))
write_frames.close()


wavFile = file('tmp.wav','rb')
wavFile.seek(44)
#speechRec =  ps.Decoder(lm='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP',dict='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
#speechRec =  ps.Decoder(lm='/home/sorunome/languagemodel_persona.lm',dict='/home/sorunome/dictionary_persona.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
speechRec = ps.Decoder(hmm = hmmd,lm = lmdir,dict = dictp)
speechRec.decode_raw(wavFile)
print 'EPIC output',speechRec.get_hyp()
Lol, how did the extra hashes reach the first line of code? :P
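For what it's worth, the script above records at the device's default sample rate (44100 Hz, going by the earlier posts), while the question earlier in the thread was about 16 kHz input. A possible workaround, sketched here with the standard library only, is to read the samples through the `wave` module (instead of assuming a 44-byte header) and resample them before decoding. The helper names `load_wav_mono16` and `resample_mono16` are made up for illustration, and plain linear interpolation is a crude stand-in: it has no anti-aliasing filter, so a real resampler would likely give better results. Whether this cures the gibberish is not something the thread confirms.

```python
import array
import sys
import wave


def load_wav_mono16(path):
    """Return (pcm_bytes, frame_rate) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        return wf.readframes(wf.getnframes()), wf.getframerate()


def resample_mono16(pcm_bytes, src_rate, dst_rate=16000):
    """Resample little-endian 16-bit mono PCM by linear interpolation.

    Crude sketch: no low-pass filtering is applied before downsampling.
    """
    if src_rate == dst_rate or not pcm_bytes:
        return pcm_bytes
    src = array.array("h")
    src.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        src.byteswap()  # WAV samples are little-endian
    n_out = len(src) * dst_rate // src_rate
    out = array.array("h")
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source
        j = int(pos)
        frac = pos - j
        nxt = src[j + 1] if j + 1 < len(src) else src[j]
        out.append(int(round(src[j] * (1 - frac) + nxt * frac)))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```

The bytes returned by `resample_mono16` are raw samples without a WAV header, so they would need to be written to a file or buffer before being handed to the decoder.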