Computer understands your text and is able to speak back

Started by
34 comments, last by Minsc&Boo 7 years, 9 months ago

Quality speech recognition engines have been around for decades. Assuming you're on Windows, a speech recognition engine was added way back in Windows 95 and has been evolving ever since; Microsoft's built-in speech recognition is quite good out of the box. Best results come from giving the engine a limited vocabulary of recognized words, since it then only has to scan those few for high-probability matches. You get worse results by allowing natural language and comprehensive dictionaries, but as described, that can still be done.

Recognition is the more tedious step, but isn't difficult. You build your speech recognition grammar that basically says "These words mean this token", and then "these tokens are valid", then you process the tokens as commands.
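
To make the "words mean tokens, tokens form commands" idea concrete, here is a minimal sketch in plain Python. It is not how any real speech engine is configured; the vocabulary, token names, and command names are all made up for illustration, and it works on text that has already been recognized.

```python
# Hypothetical vocabulary: many words collapse into a few tokens.
WORD_TO_TOKEN = {
    "go": "MOVE", "walk": "MOVE", "run": "MOVE",
    "north": "NORTH", "up": "NORTH",
    "south": "SOUTH", "down": "SOUTH",
    "attack": "ATTACK", "hit": "ATTACK", "strike": "ATTACK",
    "goblin": "GOBLIN", "rat": "RAT",
}

# Hypothetical grammar: the only token sequences that count as commands.
VALID_SEQUENCES = {
    ("MOVE", "NORTH"): "move_north",
    ("MOVE", "SOUTH"): "move_south",
    ("ATTACK", "GOBLIN"): "attack_goblin",
    ("ATTACK", "RAT"): "attack_rat",
}


def tokenize(text):
    """Map each known word to its token; unknown words are dropped."""
    tokens = []
    for word in text.lower().split():
        token = WORD_TO_TOKEN.get(word.strip(".,!?"))
        if token is not None:
            tokens.append(token)
    return tokens


def parse_command(text):
    """Return the command name for a valid token sequence, else None."""
    return VALID_SEQUENCES.get(tuple(tokenize(text)))
```

Filler words such as "please" or "the" simply fall out because they are not in the vocabulary, and anything that does not match a listed sequence is rejected rather than guessed at.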

Text-to-speech is quite easy, although the default voices provided by the system are somewhat bland and computerized. It can be literally as easy as calling a fire-and-forget function like SpeakAsync(myTextString). Most games prefer to use voice actors and pre-recorded lines.

A bigger problem is that voice command of games usually isn't fun. Also, many people cannot play such games for various reasons, like lacking a microphone, being in places where calling out game commands is inappropriate, or being in environments where external noise is a problem.

You *DO* need those symbols to understand speech. And you *DO* need that context information. You get it through intonation and pattern recognition that your brain is very good at processing. So good you often don't consciously realize that you are doing it.

The people at OpenAI have been doing a fair bit of work in this area. Last I saw, they were training a neural network on masses and masses of Reddit conversations. Whether it truly understands what it reads is another matter, but I think it was giving appropriate responses. As others have said, this is an entire research field and probably one of the most challenging.


The discipline you're looking for is natural language processing. It's one of the topics of modern robotics too, where a robot must understand what a human says.

A Python library that I know to exist is NLTK (nltk.org). Never done anything in this direction, so likely my knowledge is partial at best and totally missing much better fish at worst.

You guys are dead set on the punctuation part of my research. You know, after thinking about this, the only problem I can't seem to solve is memory. We have to remember what we're talking about to properly communicate. I'm just not sure how to keep my AI's memory intact after the game turns off. I don't think punctuation is as important as some proclaim; however, it will be considered.

With that being said, I'm designing everything right now. I feel like I can handle this.

You know, after thinking about this, the only problem I can't seem to solve is memory. We have to remember what we're talking about to properly communicate. I'm just not sure how to keep my AI's memory intact after the game turns off.


If you can have a memory during the time the program/game is running, then you can have a memory when it is not. Just write the data to a file.
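
As a rough sketch of "just write the data to a file", assuming the memory can be kept as a plain dictionary of strings and lists, Python's standard json module is enough. The function names and the layout of the memory dict here are hypothetical, not anything from the original poster's project.

```python
import json
import os


def save_memory(memory, path):
    """Write the conversation memory (a plain dict) to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(memory, f, indent=2)


def load_memory(path):
    """Read the memory back, or start with an empty one if no file exists."""
    if not os.path.exists(path):
        # Hypothetical layout: current topic plus a list of remembered facts.
        return {"topic": None, "facts": []}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Call load_memory when the game starts and save_memory when it shuts down (or after each exchange), and the AI picks up where it left off.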


To be utterly frank for a moment: if you can't think of how to solve writing a data structure to disk then you are absolutely, undeniably not experienced enough to be mucking with natural language processing and ignoring solid advice about how to do it.

NLP is a very hard and broad field. It has been around for decades and touches on some of the most advanced research in applied computer science. Throwing that away and pretending you can do better than the accumulated expertise of hundreds (if not thousands) of brilliant people, just because you "dreamed the code", is not healthy.


NLP is a very hard and broad field. It has been around for decades and touches on some of the most advanced research in applied computer science. Throwing that away and pretending you can do better than the accumulated expertise of hundreds (if not thousands) of brilliant people, just because you "dreamed the code", is not healthy.

There is no tutor more capable than failure.


Since this is game dev, you would expect the player's text to be about some game context.

Likewise, the text communication is going to be some query or command.

Even for simple verb+noun or verb-only communications there is some relevant context which the computer must limit its processing to.

Recognizing the words is only the first step for the computer in figuring out what the player is asking for with this input. Usually, to avoid demanding lawyer-like exactness from the player, some assumptions based on the current game context would be applied.
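
One way the context idea above could look in code, as a minimal sketch: the context dict, its keys, and the interpret function are invented for illustration. The point is only that the parser looks for words the current scene cares about and fills in a missing noun from context instead of demanding an exact phrase.

```python
def interpret(text, context):
    """Resolve a 'verb' or 'verb noun' input against the current game context.

    context is a hypothetical dict with three keys:
      "verbs"          - set of verbs that make sense right now
      "nouns"          - set of objects present in the current scene
      "default_target" - what a bare verb is assumed to refer to (or None)
    """
    words = [w.strip(".,!?") for w in text.lower().split()]

    # Only words that matter in this context are considered at all.
    verb = next((w for w in words if w in context["verbs"]), None)
    if verb is None:
        return None

    noun = next((w for w in words if w in context["nouns"]), None)
    if noun is None:
        # Assumption from context: a bare verb applies to the default target.
        noun = context.get("default_target")
    return (verb, noun)
```

With a scene context of verbs {"open", "take"}, nouns {"door", "key"}, and default target "door", the input "open" resolves to ("open", "door") and "Take the key!" resolves to ("take", "key").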


Whoa, I know I've been dismissive lately, focusing on the suggestion to include punctuation, but that is the only thing I'm disagreeing with this forum about. Sorry about my stubbornness; it does hinder me sometimes. To be clear about the main thing I'm against: I want to attempt to teach my AI to understand without punctuation. Thank you all for this suggestion, and for the other bits of info that will contribute to my research.

Have you looked at the links I gave above?

You don't need to do full natural language recognition, just enough of the key words your game cares about. You can build an enormous vocabulary of words that become a small set of tokens, and those tokens form a recognizer grammar. As for punctuation, you can include that in the grammar, perhaps recognizing "period", "dot", and "point" as one token, "question" and "question mark" as another, "exclamation", "exclamation point", and "bang" as a third, etc. There are many libraries that work this way; you can find them on all the major platforms, either built-in or as free/inexpensive libraries.
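
Here is a small sketch of the spoken-punctuation part, working on a list of already-recognized words. The phrase table and function name are hypothetical, and a real recognizer grammar would express this in its own grammar format rather than in Python. Longer phrases are tried first so that "question mark" is consumed as one token instead of "question" followed by a stray "mark".

```python
# Hypothetical phrase table: spoken phrases (as word tuples) -> punctuation token.
PUNCTUATION_PHRASES = {
    ("period",): ".", ("dot",): ".", ("point",): ".",
    ("question", "mark"): "?", ("question",): "?",
    ("exclamation", "point"): "!", ("exclamation",): "!", ("bang",): "!",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PUNCTUATION_PHRASES)


def apply_spoken_punctuation(words):
    """Replace spoken punctuation phrases with tokens, longest match first."""
    out = []
    i = 0
    while i < len(words):
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = tuple(w.lower() for w in words[i:i + n])
            if len(phrase) == n and phrase in PUNCTUATION_PHRASES:
                out.append(PUNCTUATION_PHRASES[phrase])
                i += n
                break
        else:
            # No punctuation phrase starts here; keep the word as-is.
            out.append(words[i])
            i += 1
    return out
```

Note the obvious limitation: this version turns every "point" or "question" into punctuation, even when the player meant the ordinary word, so a real grammar would need to restrict where those tokens are allowed.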

On Windows they are available as simple-to-use COM objects, just like Direct3D and other standard COM objects. Tell the system you want a SAPI instance and use it for both pattern recognition and text-to-speech; both are documented in the links above.

On other platforms there are similar systems, and the W3C has created standardized formats for the grammars, so they're generally portable. I've used a few with Java bindings; they're basically mix-and-match. Throw the system a grammar to recognize, and occasionally get called back with a bunch of recognized tokens that you can mess with.
