🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Back to Artificial Intelligence

Multithreading Monte Carlo - UCB1

Artificial Intelligence Programming

Started by paulo_ May 23, 2015 12:24 PM

5 comments, last by ddyer 9 years ago

paulo_

109

Author

May 23, 2015 12:24 PM

Hello.

I have the basic AI for a card game:


While there is time{
     Find a move that maximizes UCB1
     Play move
     Finish game randomly (if won, increment counter for this move)
}
Choose move that was tried the most.

How would I multithread this?

alvaro

21,604

May 23, 2015 12:52 PM

It would have been nice if you had a more complete description of UCB1. I happen to be familiar with it, but most people here probably aren't.
Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

Some reading material: https://www.aaai.org/ocs/index.php/SOCS/SOCS11/paper/viewFile/4023/4374

paulo_

109

Author

May 23, 2015 03:31 PM

Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

I don't think I've read a UCB paper that uses a loss counter. This is my current code:


While there is time{
     Find a move that maximizes UCB1 (move.wins+(explorationRate * sqrt(2*log(totalTries)/move.tries)))
     Play move
     Finish game randomly (if won, move.wins++)
     move.tries++
     totalTries++
}
Choose move that was tried the most.

How would I use a loss counter in this code? Will it be better?

The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

For example, if thread1 runs the move with the highest UCB value, thread2 will now run the move with second highest UCB value. Is this correct?

alvaro

21,604

May 23, 2015 05:57 PM

You either keep two counters for wins and losses or you keep two counters for wins and visits (what you call "tries"; I'll stick to the names I'm used to.).

Your UCB1 formula seems to be completely messed up. It should be something like this:

 UCB1 = (move.wins + 1.0) / (move.tries + 2.0) + explorationRate * sqrt(2*log(totalTries)/(move.tries+1))

Now, the idea of the virtual loss is that you increment move.tries right after you pick the move. So while this thread is running the simulation, the other threads will see the statistics as if the result of the random game had been a loss. This doesn't guarantee that they are not going to pick the same move, but they will be less likely to do so.

paulo_

109

Author

May 23, 2015 06:48 PM

Your UCB1 formula seems to be completely messed up. It should be something like this:

Oh yeah, that's what my code looks like, except I don't have the +1,+2. What is that for? Is it because these numbers start with zero? Instead, I initially I visit all the possible moves, so I don't get div by zeros.

Thanks, I'll be trying your suggestions on multi threading.

alvaro

21,604

May 23, 2015 07:30 PM

Adding the +1 and +2 is a minor variation. I believe the original UCB1 is precisely the way you have it, but there are good reasons to have the +1 and +2 (some people just initialize the counters pretending that a loss and a win have already happened, which has the same effect).

I explained the reason for the +1 and +2 in this thread.

ddyer

262

June 11, 2015 10:21 PM

The random playout part needs no special treatment for threads, but the UCT tree-building

and feedback steps need to be carefully synchronized. The most devilish part is, if your

search optimization includes any pruning of the tree, considering what happens if you

remove part of the tree that is in-use by some other thread.

---visit my game site http://www.boardspace.net - free online strategy games

🎉 Celebrating 25 Years of GameDev.net! 🎉

Multithreading Monte Carlo - UCB1

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

🎉 Celebrating 25 Years of GameDev.net! 🎉

Multithreading Monte Carlo - UCB1

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines