I'm hacking around on an IRC bot in my spare time, mostly as an interesting exercise in Javascript. It has some basic functionality but it just lacks that special something... so I want to teach it to talk.
Before we get too far into this, I should say that I'm fully aware that NLG is a massive field of research, and I'm not trying to pass any Turing tests here. I don't care if the generated "speech" even makes sense half the time; it's more for amusement than anything else.
My first inclination was to build a Markov model and use simple chains to construct sentences. Unfortunately, the space complexity of this is rather nasty, and the real killer is the amount of data needed to train the model adequately. I don't have a readily available corpus of plaintext to feed into the thing that suits the mood and personality I want to create.
The next obvious route would be to construct a Petri net for the language I want to speak. The major advantage is that this is a compact and fairly efficient way to do poor-man's NLG; the disadvantage is that hand-authoring and tuning a Petri net for nontrivial languages can be a huge time sink.
So I figured I'd poke around here and see if anyone knows of good algorithms for simple NLG that I might be able to take advantage of. I don't mind having to use a huge data set as long as the data is easily constructed and/or readily available in an easily digested format. Runtime is important since this is supposed to be a realtime conversational bot.
Non-goals: contextual recognition, memory, progressive refinement/learning, etc. It doesn't even have to do more than dumb keyword recognition for all I care.
Cheers!