CISC181 S2017 Lab4
Preliminaries
- Make a new project with n = 4 (following these instructions)
- Name your main class "Lab4" (i.e., enter "Lab4" in the Java class name field when creating the new module in the instructions above)
Instructions
Add TicTacToe.java and TicTacToeBoard.java to your project folder. Make your main() function in Lab4.java look like this:
TicTacToe TTT = new TicTacToe();
TTT.run_tournament();
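Putting it together, Lab4.java might look like the following. This is only a minimal sketch: the class name Lab4 comes from the setup step above, but keeping everything in the default package with no imports is an assumption based on all the .java files sitting in the same project folder.

// Lab4.java -- minimal driver; assumes TicTacToe.java is in the same folder/package
public class Lab4 {
    public static void main(String[] args) {
        TicTacToe TTT = new TicTacToe();   // construct the game manager
        TTT.run_tournament();              // start the tournament loop
    }
}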
This will start a "tournament", in which the user can choose an opponent for the next game, decide whether to go first or second, or quit. Then the game starts. Code is already written to handle the logic of each game, including checking for a winner and preventing invalid moves. Currently there are two pre-written options for opponents: human, in which you will alternate choosing moves manually; and random, in which the computer chooses a random open space on its turn with no intelligence whatsoever.
Your job in this lab is to add a third, semi-intelligent opponent option, which we will call Monte Carlo (explained below). Specifically, you will fill in the body of the chooseMonteCarloMove() method in the TicTacToeBoard class. You should also add your name in a comment before the class body.
Simple tic-tac-toe AI
The idea behind our Monte Carlo approach to tic-tac-toe move selection is simple. Some moves may lead to a certain win or defeat (either immediately or inevitably), while others simply make victory more or less likely. Given a choice of M moves (aka open spaces) on the board at any point during the game -- not just the beginning -- we would like to assess the likelihood of victory or defeat resulting from a first move m in each of the M spaces. Whichever candidate move has the highest likelihood of victory is the one we make. Then the opponent makes their move and we repeat the process until the game ends.
It is not always possible or easy to calculate such a win likelihood exactly, so we try to estimate the likelihood with a probabilistic method. Here we make a seemingly weak but often quite effective assumption that after our first move m (out of M available), all subsequent moves by us and by our opponent will be made randomly (but legally). Given this, we can "simulate" a random conclusion to the game after our first move, and then record whether a win, loss, or tie results. By repeatedly running this simulation for each possible first move, we can tabulate the statistics of wins, losses, and ties for all kinds of possible endgames and get a sense of which options improve or reduce our win probability.
To be specific, suppose we assign a score of +1 to each win, -10 to each loss, and 0 to each tie. Furthermore, let us say we will run T=10000 random games, including a random initial move selection. If we are midway through the game and M=5, then we will simulate roughly 2000 games for each possible first move m. Suppose move m_1 leads to about 1500 wins, 300 losses, and 200 ties, and move m_2 leads to 1100 wins, 600 losses, and 300 ties. Then we can calculate aggregate "win values" W for the two moves as:
W(m_1) = 1500 * 1 + 300 * -10 + 200 * 0 = 1500 - 3000 = -1500
W(m_2) = 1100 * 1 + 600 * -10 + 300 * 0 = 1100 - 6000 = -4900
m_1 appears to be the better move of these two, but there are still 3 other moves to evaluate. Whichever move maximizes W is the one we choose for this turn.
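In code, this scoring boils down to a single weighted sum per candidate move. A tiny illustration, using the counts from the example above (the variable names are hypothetical, not part of the provided classes):

int WIN = 1, LOSS = -10, TIE = 0;           // scores defined by the assignment
int wins = 1500, losses = 300, ties = 200;  // simulated tallies for candidate move m_1
int w = wins * WIN + losses * LOSS + ties * TIE;  // = -1500, the value W(m_1)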
For this assignment, you will implement this approach in the chooseMonteCarloMove() method. This method takes two arguments:
- TicTacToeBoard.Piece curRealPlayer: Which "piece" is about to be played. The value will be either TicTacToeBoard.Piece.X or TicTacToeBoard.Piece.O
- int numRandomGames: How many game simulations to run to decide this move. These should be randomly distributed over the M possible first moves that can be made given the current state of the board.
The return value should be the index of the space that the piece will be placed in -- an int in the range [0, 8], but of course not on a space that is already occupied.
Here are some hints:
- You will need an outer loop that runs numRandomGames times. Each time through this loop is ONE simulated game, starting from the current board state (as shown by the print() method in the TicTacToeBoard class). See the sketch after this list for one way to structure it.
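Here is one possible shape for the method, sketched under assumptions: the helper calls copyBoard(), isOpen(), place(), hasOpenSpace(), and winnerOf() are hypothetical placeholders, not the actual TicTacToeBoard API -- replace them with whatever the provided class actually offers for copying the board, testing and filling spaces, and detecting a winner. It also assumes java.util.Random is imported.

// SKETCH ONLY -- helper names below are placeholders, not the real TicTacToeBoard methods.
public int chooseMonteCarloMove(Piece curRealPlayer, int numRandomGames) {
    Random rng = new Random();
    int[] winValue = new int[9];              // accumulated W for each space 0..8

    for (int game = 0; game < numRandomGames; game++) {
        // 1. Work on a scratch copy so the real board is untouched.
        TicTacToeBoard sim = copyBoard();     // placeholder for however you copy the board

        // 2. Pick a random legal first move and remember which space it was.
        int firstMove;
        do {
            firstMove = rng.nextInt(9);
        } while (!sim.isOpen(firstMove));
        sim.place(firstMove, curRealPlayer);

        // 3. Alternate random legal moves until someone wins or the board fills up.
        Piece toMove = (curRealPlayer == Piece.X) ? Piece.O : Piece.X;
        while (winnerOf(sim) == null && sim.hasOpenSpace()) {
            int mv;
            do {
                mv = rng.nextInt(9);
            } while (!sim.isOpen(mv));
            sim.place(mv, toMove);
            toMove = (toMove == Piece.X) ? Piece.O : Piece.X;
        }

        // 4. Score the outcome: +1 for a win, -10 for a loss, 0 for a tie.
        Piece winner = winnerOf(sim);
        if (winner == curRealPlayer)  winValue[firstMove] += 1;
        else if (winner != null)      winValue[firstMove] -= 10;
    }

    // 5. Return the still-open space on the REAL board with the highest total win value.
    int best = -1;
    for (int i = 0; i < 9; i++) {
        if (isOpen(i) && (best == -1 || winValue[i] > winValue[best])) {
            best = i;
        }
    }
    return best;
}

The exact bookkeeping is up to you; the important invariants are that every simulation starts from a fresh copy of the real board, that first moves are spread randomly over the open spaces, and that the index you return is still open on the real board.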
For more background, here is an intuitive explanation that I found helpful. More formally, see Monte Carlo tree search (FYI, we are doing the "pure", non-tree version here).
Finishing up
Remember to use proper naming and formatting style throughout your code, and submit ONLY your modified TicTacToeBoard.java on Sakai by Friday, March 10.