You go first. On each turn, you must remove at least one stick, and may remove as many as you like provided they all come from the same row. The player who removes the last stick loses.
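The rules above can be sketched in a few lines of Python. This is only an illustration, not the page's own code: the function names (`legal_moves`, `apply_move`, `is_over`) and the starting layout of 1, 3, 5 and 7 sticks are assumptions made for the example.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (row index, number of sticks to remove)

# Hypothetical starting layout; the actual board on the page may differ.
START_ROWS = [1, 3, 5, 7]


def legal_moves(rows: List[int]) -> List[Move]:
    """Every move that takes at least one stick, all from a single row."""
    return [(i, k) for i, n in enumerate(rows) for k in range(1, n + 1)]


def apply_move(rows: List[int], move: Move) -> List[int]:
    """Return a new board with `move` applied, rejecting illegal moves."""
    i, k = move
    if not (0 <= i < len(rows)) or not (1 <= k <= rows[i]):
        raise ValueError("illegal move")
    new_rows = list(rows)
    new_rows[i] -= k
    return new_rows


def is_over(rows: List[int]) -> bool:
    """The game ends when no sticks remain; whoever moved last has lost."""
    return sum(rows) == 0
```

For example, `apply_move(START_ROWS, (2, 4))` takes four sticks from the third row and leaves `[1, 3, 1, 7]`.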

When you have finished your move, press 'AI move' to let the AI take its turn. The AI is trained with Q-learning, a form of reinforcement learning.

The AI learns by playing 10,000 games against itself and assigning rewards and punishments to its moves, which lets it estimate the best move in each state. To balance exploration and exploitation during training, it chooses its moves with the epsilon-greedy algorithm: most of the time it plays the move with the highest Q-value, and occasionally it plays a random one.
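A minimal sketch of this kind of self-play training is shown below. It is not the page's implementation. The reward scheme (-1 for taking the last stick, with the opponent's best value counted as the mover's loss), the learning rate `alpha`, the exploration rate `epsilon`, the starting layout and all function names are assumptions chosen to keep the example short.

```python
import random
from collections import defaultdict


def legal_moves(rows):
    """All (row, count) pairs that remove 1..n sticks from one row."""
    return [(i, k) for i, n in enumerate(rows) for k in range(1, n + 1)]


def train(start=(1, 3, 5, 7), games=10_000, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning by self-play; both sides share one Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, move) -> estimated value for the mover
    for _ in range(games):
        state = tuple(start)
        while sum(state) > 0:
            moves = legal_moves(state)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore: random move
            else:
                move = max(moves, key=lambda m: q[(state, m)])  # exploit
            i, k = move
            nxt = state[:i] + (state[i] - k,) + state[i + 1:]
            if sum(nxt) == 0:
                target = -1.0  # the mover took the last stick and loses
            else:
                # The opponent moves next, so their best value is our loss.
                target = -max(q[(nxt, m)] for m in legal_moves(nxt))
            q[(state, move)] += alpha * (target - q[(state, move)])
            state = nxt
    return q


def best_move(q, rows):
    """Greedy move (no exploration) for the given board."""
    state = tuple(rows)
    return max(legal_moves(state), key=lambda m: q[(state, m)])


if __name__ == "__main__":
    q = train(start=(1, 2), games=2_000)
    print(best_move(q, (1, 2)))
```

On the tiny board with rows of 1 and 2 sticks, the winning move is to empty the second row, which leaves the opponent to take the last stick. Sharing one table between both sides is a design choice for brevity; keeping separate reward bookkeeping per player is an equally reasonable alternative.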

Toggle computation to let the AI train against itself in real time (check the console). By default, the AI instead plays the best move from a pre-trained Q-value table, which saves your computing resources.
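Playing from a pre-trained table needs no training at all, only a lookup. The sketch below assumes the table maps (state, move) pairs to Q-values; the table contents, its layout and the name `greedy_move` are hypothetical and are not taken from the page's source.

```python
def greedy_move(q_table, rows):
    """Pick the legal move with the highest stored Q-value.

    Moves missing from the table are treated as having value 0.0,
    which is an assumption made for this sketch.
    """
    state = tuple(rows)
    moves = [(i, k) for i, n in enumerate(rows) for k in range(1, n + 1)]
    if not moves:
        raise ValueError("no sticks left")
    return max(moves, key=lambda m: q_table.get((state, m), 0.0))


# Hypothetical pre-trained values for a board with rows of 1 and 2 sticks.
Q_TABLE = {
    ((1, 2), (0, 1)): -1.0,
    ((1, 2), (1, 1)): -1.0,
    ((1, 2), (1, 2)): 1.0,
}

print(greedy_move(Q_TABLE, [1, 2]))  # prints (1, 2)
```

Because the lookup is a single pass over the legal moves, it costs far less than running thousands of self-play games in the browser.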

Source code: Elliott Chong