Author Topic: Voice-Activated Chat AI (Read 3672 times)

JincS · « **on:** October 14, 2006, 01:56:00 pm »

Ok, I've come up with an idea for a program that I have already started experimenting with (well, I'm writing out the framework, but it doesn't do anything yet). To start, I have two questions for you (whoever you may be):

Who here has ever talked with a chat bot?

If you have, then you've noticed that they only respond to pre-defined keywords, which they use to build automatic responses. Question #2:

Who here has ever used DragonSpeech, or a program that (poorly) types whatever you say into a microphone?

If you have, you've noticed that they don't work all that well (because everyone in the world sounds different, so speech recognition is very complex).

Anyone see where I'm going with this? Anyway, I was looking at the source code to a chat bot (and optimizing it in my head so that it worked better) the other day, and I thought to myself, "OK, so why is it that we can talk to (or yell at) our computers, but they can't talk (or yell) back?" The answer: because nobody has ever thought about combining Artificial Intelligence with basic text-to-speech capabilities!

In short, I've started working on a chat program that uses voice instead of text to communicate with the AI. Not only will you be able to talk to the computer through a microphone, but I am also adding the capability for the computer to literally "talk back" through your computer's speakers.

This is a HIGHLY EXPERIMENTAL project (very long-term) that I plan to work on developing over the next year or two. I'm hoping I can come up with some results before I start applying for colleges in the fall (it would definitely look good on an application).

Questions, comments? All would be appreciated.
===================================

Just a notice: the framework for the SDK is already partially completed. Voice and equalizer data can already be created/manipulated, but I haven't implemented any of the sound functions yet. Gotta do more research.

So far, it should be 100% portable to any GNU/Linux OS (Debian, Red Hat, etc.), so long as you have the Ogg Vorbis SDK to build it with. I'm going to make this project 100% portable (I've said that a few times, but this one's for real!), since it's aimed at the open-source community.

I haven't tested it yet though. For the life of me, I can't get my computer to boot the Debian install disks. I've tried EVERYTHING (with the software AND the hardware configurations), and it's pissing me off... But oh well, life goes on.
===================================

Ok, more information on this experiment:

This little do-hickey now runs off of a combination of OpenAL (Open Audio Library -- www.openal.org) and the Ogg Vorbis SDK (www.xiph.org), as well as a few standard C functions (header file: stdio.h -- look it up yourself). I now have the library set up so that it should be 100% compatible with the following operating systems:

Windows 95 to XP (and Vista)
Most *nix platforms
Mac OS (prolly 8 & above)

Which just about covers all of the popular ones. As for the functions: at first glance they look like they aren't capable of doing all that much, but don't let your guard down! These tiny functions (they all take 2-4 variables) can accept a massive number of options, which gives these FIVE little suckers the power of FORTY-TWO separate functions!

These are ALL of the functions in the library as of now (parameters not included for secrecy reasons):

*JSA_Init(parameters); //4 options, takes 4 variables: This function initializes the library for use.
*JSA_Voice(parameters); //10 options, takes 4 variables: This function controls the sound of the voice that the computer uses (male/female, timbre, etc.)
*JSA_Equalizer(parameters); //20 options, takes 4 variables: Controls audio output levels, bass/treble, speakers, etc.
*JSA_Capture(parameters); //4 options, takes 2 variables: Accepts and processes microphone input, records mic input, etc.
*JSA_DeInit(parameters); //4 options, takes 4 variables: This function de-initializes the library, and cleans everything up for a nice program exit.
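
To give a rough idea of how a small function can cover many behaviours, here is a minimal sketch of the general "option code plus value" pattern. It is NOT the actual JSA code: every name here (`SketchVoice`, `SketchVoiceOption`, `sketch_voice_set`) and every option is made up purely for illustration, since the real parameters aren't published.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option codes: none of these come from the real library. */
typedef enum {
    SKETCH_VOICE_GENDER,  /* value: 0 = male, 1 = female */
    SKETCH_VOICE_PITCH,   /* value: pitch in Hz, must be positive */
    SKETCH_VOICE_RATE     /* value: words per minute, must be positive */
} SketchVoiceOption;

/* Hypothetical voice settings that the options act on. */
typedef struct {
    int gender;
    int pitch_hz;
    int rate_wpm;
} SketchVoice;

/* One entry point, many behaviours: the option code decides what the
   value means. Returns 0 on success, -1 on an unknown option or a
   value that is out of range for that option. */
int sketch_voice_set(SketchVoice *v, SketchVoiceOption opt, int value)
{
    assert(v != NULL);
    switch (opt) {
    case SKETCH_VOICE_GENDER:
        if (value != 0 && value != 1) return -1;
        v->gender = value;
        return 0;
    case SKETCH_VOICE_PITCH:
        if (value <= 0) return -1;
        v->pitch_hz = value;
        return 0;
    case SKETCH_VOICE_RATE:
        if (value <= 0) return -1;
        v->rate_wpm = value;
        return 0;
    }
    return -1; /* option code not recognised */
}
```

Calling `sketch_voice_set(&voice, SKETCH_VOICE_PITCH, 220)` would change only the pitch, so adding a new behaviour means adding one enum value and one `case` rather than a whole new function.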

As before, I still don't have sound implemented, but I'm working on it. I'm hoping to make a breakthrough very soon...
===========================================

Does anybody know where I can find a good set of grammar rules, one that discusses the CREATION of words, not just the structure (not rules like "i before e...", but real ones that actually work)? I need to be able to write a spell-checker algorithm that doesn't require a dictionary. I believe it'll be the first of its kind, but I can't afford to have a massive dictionary tagging along with an already massive project.
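
For anyone wondering what a dictionary-free check might even look like, here is a very rough sketch of the idea: test a word against a handful of letter-pattern rules instead of looking it up. The three rules below (and the function name `word_looks_plausible`) are placeholders I'm assuming for illustration, not a real grammar, and they will get plenty of real words wrong ("Iraq", for one).

```c
#include <ctype.h>
#include <stddef.h>

/* Treat 'y' as a vowel so words like "rhythm" still pass. */
static int is_vowel(int c)
{
    return c == 'a' || c == 'e' || c == 'i' ||
           c == 'o' || c == 'u' || c == 'y';
}

/* Returns 1 if the word passes every rule below, 0 otherwise.
   Placeholder rules (assumptions, not a complete grammar):
     1. only letters, and at least one vowel
     2. 'q' must be followed by 'u'
     3. no letter appears three times in a row */
int word_looks_plausible(const char *word)
{
    int has_vowel = 0;
    size_t i;

    if (word == NULL || word[0] == '\0') return 0;

    for (i = 0; word[i] != '\0'; i++) {
        int c = tolower((unsigned char)word[i]);

        if (!isalpha(c)) return 0;
        if (is_vowel(c)) has_vowel = 1;

        /* Rule 2: the character after 'q' (possibly the terminator) must be 'u'. */
        if (c == 'q' && tolower((unsigned char)word[i + 1]) != 'u') return 0;

        /* Rule 3: compare with the two previous letters. */
        if (i >= 2 &&
            c == tolower((unsigned char)word[i - 1]) &&
            c == tolower((unsigned char)word[i - 2])) {
            return 0;
        }
    }
    return has_vowel;
}
```

A check like this can only say "this doesn't look like an English word"; it can't tell "teh" from "the", which is exactly why better word-formation rules would be needed.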

Radical Pi · « **Reply #1 on:** October 14, 2006, 02:08:00 pm »

This is one of those great ideas that may never be finished, but I would love to see how far you can go with it, hopefully to a public release.

Spellshaper · « **Reply #2 on:** October 15, 2006, 01:52:00 am »

O_O

that is one big complex project I'd never even touch

Good luck on it. Yeah I tried one of those Speech-to-Text programs, and they get about every 3rd word wrong...

JincS · « **Reply #3 on:** October 15, 2006, 08:54:00 am »

Maybe someday (public release). It's more of a project just to research if it is possible (which I have concluded it is, but only if you have A LOT of time and money [not me]), so I'm not expecting any releases for a LONG time. But hey, it's still a fun little thing to work on.

Radical Pi · « **Reply #4 on:** October 15, 2006, 09:35:00 am »

I would love to see any partially done version. Even if it just acknowledges I'm there, I would be impressed.
It having a full conversation with me, understanding what I'm saying and understanding what IT'S saying on more than a binary level... The concept of perception fascinates me, and so will this project.