Play with it…
Learning Rate: How aggressively the AI learns to play (a value close to 0 makes learning too slow, while a value close to 1 simply replaces the old learned value with the new one). Higher is not necessarily better.
Discount Factor: How much the AI values future rewards relative to immediate ones.
Action Randomization: Percentage of the time a random action is executed instead of the desired action.
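To make these knobs concrete, here is a minimal sketch of how they typically appear in a Q-Learning agent (the names alpha, gamma, and epsilon are my own; see q-learning.js for the actual ones):

```js
// Hypothetical names for the three controls above; not the project's real ones.
const alpha = 0.5;   // Learning Rate: 0 = never update, 1 = fully replace the old value
const gamma = 0.9;   // Discount Factor: weight of future rewards vs. immediate ones
const epsilon = 0.1; // Action Randomization: probability of taking a random action
```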
You can check the GitHub Repository to see the source code of this project.
If you want to play some snake yourself, you should try clicking the canvas and entering the Konami Code! See what happens…
How it works
Snake (the game itself)
Snake is a game in which a snake needs to explore an environment and catch the fruit without hitting any obstacle or itself. Every time the snake catches a fruit, its size increases (for practical reasons explained in the next section, the snake has a fixed size in the live example above).
To develop this part of the project, I used the following resources for guidance:
- Coding “Snake” in 4 min 30 sec
- Mastering the Module Pattern (not really about Snake itself, but a pattern I tried to follow in the project)
See game.js for the source code.
Q-Learning
According to Christopher Watkins, “Q-Learning is a simple way for agents to learn how to act optimally in controlled Markovian domains” (Watkins, 1989). In simple terms, it uses an MDP (Markov Decision Process) to control and make decisions in an environment. At its core is a Q-Table that is constantly updated.
A Q-Table is a matrix that maps each state to a value for every available action. As the agent explores the environment, the table is updated. The action with the highest value is considered the best action to take.
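For illustration, a Q-Table can be stored as a plain object keyed by state, with one value per action (a sketch only; the key strings here are made up, and the real structure lives in q-learning.js):

```js
// A minimal Q-Table sketch: each state key maps to one value per action.
// The key format below is illustrative, not the project's actual one.
const qTable = {
  "fruit:up-left;tail:down": { UP: 0.42, DOWN: -0.8, LEFT: 0.1, RIGHT: 0 },
  "fruit:right;tail:up":     { UP: 0,    DOWN: 0.2,  LEFT: 0,   RIGHT: 0.9 },
};
```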
In this guide I will explain how I applied Q-Learning to the Snake game. If you want to understand Q-Learning and Reinforcement Learning more deeply (yet in a simple way), I suggest this Medium post by Vishal Maini.
See q-learning.js for the source code.
The algorithm consists of:
| Step | Meaning |
| --- | --- |
| `s = game.state()` | get state `s` |
| `act = best-action( Q(s) )` | execute the best action `act` given state `s` |
| `rew = game.reward()` | receive reward `rew` |
| `s' = game.state()` | get next state `s'` |
| `Q(s, act) = update-qTable()` | update the Q-Table value for state `s` and action `act` |
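Put together, one training step might look like the following sketch (the names follow the pseudocode above; bestAction and updateQTable are sketched further below, and game.execute is an assumed helper, not the project's real API):

```js
// One Q-Learning step, mirroring the pseudocode above (names are assumptions).
function step(game, qTable) {
  const s = game.state();            // get state s
  const act = bestAction(qTable, s); // best action act given state s
  game.execute(act);                 // simulate the key press for act
  const rew = game.reward();         // receive reward rew
  const sNext = game.state();        // get next state s'
  updateQTable(qTable, s, act, rew, sNext); // apply the update formula below
}
```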
The new Q-Table value for the action taken is given by this formula (taken from the article by Vishal Maini):
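In standard notation, with α as the Learning Rate and γ as the Discount Factor from the controls above:

$$Q(s, act) \leftarrow Q(s, act) + \alpha \left[ rew + \gamma \max_{a'} Q(s', a') - Q(s, act) \right]$$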
The update is executed after the action is taken and the reward is known.
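A direct translation of that formula into code could look like this (a sketch under the naming assumptions of the step function above; unseen states are treated as all-zero rows):

```js
// Sketch of the update: Q(s,act) += alpha * (rew + gamma * max_a' Q(s',a') - Q(s,act))
function updateQTable(qTable, s, act, rew, sNext, alpha = 0.5, gamma = 0.9) {
  const nextValues = Object.values(qTable[sNext] || {});
  const maxNext = nextValues.length ? Math.max(...nextValues) : 0;
  if (!qTable[s]) qTable[s] = { UP: 0, DOWN: 0, LEFT: 0, RIGHT: 0 };
  const q = qTable[s][act];
  qTable[s][act] = q + alpha * (rew + gamma * maxNext - q);
}
```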
The available actions are “UP”, “DOWN”, “LEFT” and “RIGHT”, simulating user interaction with the game.
The best action is selected by taking the action with the highest value for the current state in the Q-Table. If the maximum value equals 0, a random action is selected instead.
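A sketch of that selection logic (the random fallback for untrained states follows the description above; the tie-breaking is my own assumption):

```js
const ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"];

// Pick the highest-valued action for a state; fall back to a random action
// when the best known value is still 0 (nothing learned for that state yet).
function bestAction(qTable, state) {
  const values = qTable[state] || {};
  let best = ACTIONS[0];
  for (const a of ACTIONS) {
    if ((values[a] || 0) > (values[best] || 0)) best = a;
  }
  if ((values[best] || 0) === 0) {
    return ACTIONS[Math.floor(Math.random() * ACTIONS.length)];
  }
  return best;
}
```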
The only reward is given when the snake grabs the fruit ( +1 ). I also tried giving a small reward (around +0.1) whenever it explored the environment without dying, but the result was that the snake just moved around in circles without really caring about the fruit.
The penalty happens whenever the game resets ( -1 ), that is, when the snake hits its tail or a wall.
If anything else happens, there’s no reward ( 0 ).
| Event | Reward |
| --- | --- |
| Catches the fruit | +1 |
| Hits its tail | -1 |
| Hits a wall | -1 |
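As a sketch, this reward scheme could be expressed as follows (the caughtFruit and died flags are hypothetical names, not the project's real API):

```js
// Hypothetical reward function matching the table above.
function reward(caughtFruit, died) {
  if (caughtFruit) return 1; // snake grabbed the fruit
  if (died) return -1;       // hit its tail or a wall, so the game resets
  return 0;                  // anything else: no reward
}
```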
First I tried creating a dictionary of states based on all the snake positions and trail shapes. Although it worked, the high number of states meant the code needed to train for a very long time. Since this project is intended to give a quick explanation and show a fast result to the user (and the results are not saved across sessions), this way of storing states was not the best approach.
To work around this limitation, the tail was given a fixed size, and the dictionary of states is based only on the position of the fruit relative to the snake's head and the position of the last tail section relative to the head.
This way, the dictionary of states stores keys built only from those two relative positions.
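The exact key format lives in q-learning.js; here is a plausible sketch of how such a key could be built (the format and function names are assumptions for illustration):

```js
// Build a state key from the fruit's and the last tail section's positions
// relative to the snake's head. The key format below is an assumption.
function stateKey(head, fruit, lastTail) {
  const rel = (p) => `${Math.sign(p.x - head.x)},${Math.sign(p.y - head.y)}`;
  return `fruit:${rel(fruit)};tail:${rel(lastTail)}`;
}

// Example: head at (5,5), fruit at (7,3), tail end at (4,5)
// stateKey({x:5,y:5}, {x:7,y:3}, {x:4,y:5}) -> "fruit:1,-1;tail:-1,0"
```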
If you want to test this algorithm with a full set of states, you can clone the project on GitHub and change a few lines to see how it behaves.
Another resource I used for guidance on this part:
- Q-Learning-Python-Example (an implementation of Q-Learning for the game “Catch the Ball”, which I used to understand the algorithm steps)