Reinforcement Learning Tic-Tac-Toe Agent Demo

Player "X"




Player "O"




First player
Evaluation
Number of games:
Set defaults:
When evaluating more than 100 games, it is recommended not to track gameplay history.

Board settings
   
AgentQLearner settings
AgentMinimax settings
 
Warning: approx. 27 times slower when disabled.
AgentQLearner operations

AgentQLearner Q-table dump
Pre-created Q-tables with log

AgentPreCalc settings

Technical details

AgentRandom: Plays randomly.

AgentMinimax: Uses the minimax algorithm. It can play in two basic modes depending on the setting:
in maximizer mode it cannot be defeated; in minimizer mode it will never win.
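To make the two modes concrete, here is a minimal TypeScript sketch of minimax scoring for tic-tac-toe; the board representation, helper names, and score values are illustrative assumptions, not the demo's actual code.

// Minimal minimax sketch (assumed board/helper names, not the demo's code).
type Mark = 'X' | 'O';
type Cell = Mark | null;

function winner(board: Cell[]): Mark | null {
  const lines = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],   // rows
    [0, 3, 6], [1, 4, 7], [2, 5, 8],   // columns
    [0, 4, 8], [2, 4, 6],              // diagonals
  ];
  for (const [a, b, c] of lines) {
    if (board[a] !== null && board[a] === board[b] && board[a] === board[c]) {
      return board[a];
    }
  }
  return null;
}

// Score of the position from `me`'s point of view when `toMove` plays next:
// +1 = `me` can force a win, 0 = draw, -1 = the opponent can force a win.
function minimax(board: Cell[], me: Mark, toMove: Mark): number {
  const w = winner(board);
  if (w !== null) return w === me ? 1 : -1;
  if (board.every(cell => cell !== null)) return 0;   // board full: draw

  const next: Mark = toMove === 'X' ? 'O' : 'X';
  const scores: number[] = [];
  for (let i = 0; i < 9; i++) {
    if (board[i] === null) {
      board[i] = toMove;                    // try the move
      scores.push(minimax(board, me, next));
      board[i] = null;                      // undo it
    }
  }
  // A maximizer-mode agent picks moves that maximize this score for itself;
  // a minimizer-mode agent would pick the minimizing moves instead.
  return toMove === me ? Math.max(...scores) : Math.min(...scores);
}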

AgentPreCalc: Uses a pre-calculated dataset for playing. The dataset is created using AgentMinimax.
Once the dataset is ready, AgentPreCalc plays much faster than AgentMinimax, since each move is a simple lookup instead of a recursive search.
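One plausible way to build such a dataset, sketched below with the winner and minimax helpers from the previous sketch (key format and function names are assumptions), is to run minimax once per reachable position and store the chosen move, so that lookups at play time are constant-time.

// Best move for `me` in the given position, found with plain minimax.
function bestMove(board: Cell[], me: Mark): number {
  const next: Mark = me === 'X' ? 'O' : 'X';
  let best = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] === null) {
      board[i] = me;
      const score = minimax(board, me, next);
      board[i] = null;
      if (score > bestScore) { bestScore = score; best = i; }
    }
  }
  return best;
}

// Pre-calculate the move for every reachable, non-terminal position.
function buildDataset(): Map<string, number> {
  const dataset = new Map<string, number>();
  const explore = (board: Cell[], toMove: Mark): void => {
    if (winner(board) !== null || board.every(cell => cell !== null)) return;
    const key = board.map(cell => cell ?? '-').join('') + toMove;
    if (dataset.has(key)) return;                 // position already handled
    dataset.set(key, bestMove(board, toMove));
    const next: Mark = toMove === 'X' ? 'O' : 'X';
    for (let i = 0; i < 9; i++) {
      if (board[i] === null) {
        board[i] = toMove;
        explore(board, next);
        board[i] = null;
      }
    }
  };
  explore(Array<Cell>(9).fill(null), 'X');
  return dataset;
}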

AgentQLearner: Uses a Q-table for learning and improves after every game it plays.
A single instance is created when the page is loaded; there are no separate instances for the X and O marks.
Assigning alternating marks to it does not cause any confusion in its Q-table or learning process.
If the browser tab is closed without exporting the Q-table, all training data is lost.
Formula for learning:

Q(s, a) ← Q(s, a) + α · (reward + γ · max_a' Q(s', a') − Q(s, a))

where:
Q(s, a) is the Q value for a specific state-action pair;
max_a' Q(s', a') is the highest available Q value of any of the actions in the next state (if a Q value has not been calculated for an action, the default Q value is used);
α is the learning rate (alpha);
γ is the discount rate (gamma).
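A minimal TypeScript sketch of how such a Q-table update could be implemented follows; the class name, key format, and default parameter values are assumptions, and only the update rule itself mirrors the formula above.

// Hypothetical Q-table with the update rule from the formula above.
class QTable {
  private q = new Map<string, number>();   // key = state + ':' + action

  constructor(
    private alpha = 0.5,      // learning rate (alpha)
    private gamma = 0.9,      // discount rate (gamma)
    private defaultQ = 0,     // used when a Q value has not been calculated yet
  ) {}

  get(state: string, action: number): number {
    return this.q.get(state + ':' + action) ?? this.defaultQ;
  }

  // Highest available Q value of any of the actions in the next state.
  private maxNext(nextState: string, nextActions: number[]): number {
    if (nextActions.length === 0) return 0;   // terminal state: no future value
    return Math.max(...nextActions.map(a => this.get(nextState, a)));
  }

  // Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
  update(state: string, action: number, reward: number,
         nextState: string, nextActions: number[]): void {
    const oldQ = this.get(state, action);
    const target = reward + this.gamma * this.maxNext(nextState, nextActions);
    this.q.set(state + ':' + action, oldQ + this.alpha * (target - oldQ));
  }
}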

Game Outcome Log
Columns: #, Seconds, Players, Game Count, Win Count, Draw Count, Lose Rate, Visited States, Tried Actions on Visited States (not counting invalid moves), Not Tried Actions on Visited States.
The last three columns are reported separately for the X and O marks.

Gameplay History
Columns: Started At, Moves.

AgentQLearner Dataset Dump
Board