
Simple mechanisms for low-budget natural language generation

Started by
1 comment, last by ikarth 10 years, 5 months ago

I'm hacking around on an IRC bot in my spare time, mostly as an interesting exercise in JavaScript. It has some basic functionality, but it just lacks that special something... so I want to teach it to talk.

Before we get too far into this, I should say that I'm fully aware that NLG (natural language generation) is a massive field of research, and I'm not trying to pass any Turing tests here. I don't care if the generated "speech" even makes sense half the time; it's more for amusement than anything else.

My first inclination was to build a Markov model and use simple chains to construct sentences. Unfortunately, the space complexity of this is rather nasty, and the real killer is the amount of data needed to train the model adequately. I don't have a readily available corpus of plaintext to feed into the thing that suits the mood and personality I want to create.
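For concreteness, the chain idea would look something like this minimal word-level sketch; the function names and tiny corpus are invented for illustration, and it's exactly the "needs a big corpus" problem above that makes this impractical:

```javascript
// Build a first-order Markov model: each word maps to the list of
// words observed to follow it (duplicates preserved as weights).
function buildModel(corpus) {
  const model = new Map();
  for (const sentence of corpus) {
    const words = sentence.split(/\s+/);
    for (let i = 0; i + 1 < words.length; i++) {
      if (!model.has(words[i])) model.set(words[i], []);
      model.get(words[i]).push(words[i + 1]);
    }
  }
  return model;
}

// Walk the chain from a start word until a dead end or the cap.
function generate(model, start, maxWords = 20) {
  const out = [start];
  for (let i = 0; i < maxWords; i++) {
    const next = model.get(out[out.length - 1]);
    if (!next) break;
    out.push(next[Math.floor(Math.random() * next.length)]);
  }
  return out.join(" ");
}
```

With only a handful of sentences the walks just replay the training data verbatim, which is why the corpus size matters so much.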

The next obvious route would be to construct a Petri net for the language I want to speak. The major advantage is that this is a compact and fairly efficient way to do poor-man's NLG; the disadvantage is that hand-authoring and tuning a Petri net for nontrivial languages can be a huge time sink.
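The hand-authoring cost mentioned above shows up even in toy examples. As a rough stand-in for the net-based approach, here's a recursive template expander over a hand-written grammar (a common poor-man's generative technique; the rule names and phrases are made up):

```javascript
// Hand-authored grammar: every #rule# token expands to a random
// option for that rule, recursively. Tuning this for a nontrivial
// language is exactly the time sink described above.
const grammar = {
  sentence: ["#greeting#, #subject# #verb#!"],
  greeting: ["hey", "yo", "greetings"],
  subject: ["the bot", "this channel", "everyone"],
  verb: ["rocks", "compiles", "lives"],
};

function expand(grammar, text) {
  return text.replace(/#(\w+)#/g, (_, rule) => {
    const options = grammar[rule];
    const pick = options[Math.floor(Math.random() * options.length)];
    return expand(grammar, pick); // nested rules expand recursively
  });
}
```

Compact and fast at runtime, but every new sentence shape is another rule to write by hand.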

So I figured I'd poke around here and see if anyone knows of good algorithms for simple NLG that I might be able to take advantage of. I don't mind having to use a huge data set as long as the data is easily constructed and/or readily available in an easily digested format. Runtime is important since this is supposed to be a realtime conversational bot.

Non-goals: contextual recognition, memory, progressive refinement/learning, etc. It doesn't even have to do more than dumb keyword recognition for all I care.
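For what it's worth, "dumb keyword recognition" at this level is only a few lines; the trigger/response pairs here are invented examples, not anything from the actual bot:

```javascript
// Map keyword patterns to canned replies; first match wins.
const triggers = [
  { pattern: /\bhello\b|\bhi\b/i, reply: "hello yourself!" },
  { pattern: /\bweather\b/i, reply: "it is always sunny in here." },
];

function respond(message) {
  const hit = triggers.find(t => t.pattern.test(message));
  return hit ? hit.reply : null; // stay silent when nothing matches
}
```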

Cheers!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


I'd second your Markov model idea, and somehow try to work around the training problem.

If you build a simple semantic model using WordNet, for example, you could significantly reduce the training data required. You'd end up learning at a higher level, <pronoun> <verb> <noun>, or possibly something more detailed like <pronoun> <eat> <vegetable>. I'm not sure how good NLP / NLG libraries are for JavaScript, but there are some awesome ones in Python that could help with this.

Anyway, cool project ;-)

Join us in Vienna for the nucl.ai Conference 2015, on July 20-22... Don't miss it!

You might want to look up what was done for the NaNoGenMo project (look on Github). It might give you a few ideas of some of the different approaches.

