I AM AI
Interactive Agent Modeling for introducing Artificial Intelligence
Reinforcement Learning Prototype
Reinforcement Learning (RL), a type of machine learning, is a branch of AI with demonstrated uses in video game development and character modeling. RL has two main characteristics: (1) agents receive positive and negative rewards from the environment they interact with, and (2) agents act so as to maximize long-term, cumulative reward.
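As a concrete illustration of these two characteristics, the sketch below shows a generic agent-environment loop in Python. Everything in it (the five-cell corridor environment, the random action picker) is a hypothetical stand-in for illustration, not code from our prototype.

```python
import random

# Minimal sketch of the agent-environment loop behind the two RL
# characteristics above. The corridor environment and the random agent
# are illustrative stand-ins, not the prototype's implementation.

class CorridorEnv:
    """Agent starts at cell 0 and is rewarded for reaching cell 4."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                 # action: 0 = left, 1 = right
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        reward = 1.0 if done else -0.1      # characteristic (1): positive and negative rewards
        return self.pos, reward, done

def run_episode(env, choose, max_steps=50):
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = choose(state)              # characteristic (2): act to maximize cumulative reward
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total

print(run_episode(CorridorEnv(), lambda s: random.choice([0, 1])))
```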
Prototype Overview:
The reinforcement learning algorithm is placed into the game in a way that lets the player observe the algorithm at work while manipulating how it runs during play. Since our entire project revolves around robotic mining, the algorithm is represented by a mining robot that searches for the best path through an area to reach the minerals. The area the robot navigates is depicted as a maze built from Unreal Engine actors known as target points; these lay out the different states within the maze and carry the properties that allow the Robot actor to move between them. The algorithm itself uses a greedy choice rule that picks the option that has earned the most reward previously. Pure greediness would lead to one-sided choices (the same actions being picked constantly), so to alleviate this a degree of randomness is added to the action-picking process. Finally, the player can look inside the mind of the algorithm by accessing an in-game computer that connects to the robot's brain and shows the decision-making grid of the simplified reinforcement-learning-based algorithm.
Algorithm Explanation:
In order to implement reinforcement learning in our game environment, we chose a simplified reinforcement-learning-based algorithm. The algorithm finds an optimal way of achieving some goal; in our case, completing a maze. It is one of the most basic algorithms that can be used to implement a reinforcement learning system, which allows us to reach our target audience more effectively. The model consists of an agent, a set of states, and a set of actions that can be performed at each state. After each action is executed the agent is given a reward, and the goal of the algorithm is to maximize its total reward by learning which action is optimal at each state. The information gathered throughout the process is stored in a table.
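For illustration, such a table might be set up as in the Python sketch below. The state numbering, the four-action layout, and the use of negative one for blocked actions follow the implementation notes in the next section; the helper name `make_table` and the example maze are hypothetical.

```python
# A minimal sketch of the table described above, assuming maze positions are
# numbered 0..num_states-1 and each state offers four actions (Forward,
# Backward, Left, Right). A value of -1 marks an unavailable action (a wall);
# all other values start at 0 and grow as actions are rewarded.

NUM_ACTIONS = 4  # Forward, Backward, Left, Right

def make_table(num_states, walls):
    """walls: set of (state, action) pairs blocked by a maze wall."""
    table = [[0.0] * NUM_ACTIONS for _ in range(num_states)]
    for state, action in walls:
        table[state][action] = -1.0
    return table

# e.g. a tiny 3-state maze where state 0 has walls behind it and to its left:
table = make_table(3, walls={(0, 1), (0, 2)})
print(table[0])  # [0.0, -1.0, -1.0, 0.0]
```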
Algorithm Implementation:
- The algorithm first takes in a state, represented by an integer.
  - This integer corresponds to a row in a table similar to a Q-table.
  - The state is passed in from the environment, which means the environment itself must label each point of the map with an integer for the algorithm to work.
- The algorithm then looks at the row for that state and selects the four column values representing the actions that can be taken in that state.
  - In this prototype there are four actions per state, one for each direction the robot can move within the maze (Forward, Backward, Left, and Right).
  - An action with a value of negative one is read as unavailable; in the maze it represents a wall on that side.
- The algorithm then randomly decides whether to pick an action greedily or at random (see the action-selection sketch after this list).
  - Choosing randomly some of the time accounts for new, better paths appearing in the maze. When choosing greedily, the algorithm looks through the available actions (discounting all negative values) and picks the one with the highest value in the row; if two actions share the highest value, one of them is chosen at random.
- Next, the state and action are added to lists that track the entire path the robot takes during a run through the maze.
  - Once the robot makes it through the maze, the lists are parsed and the successful path is used to increment, or reward, the value of taking each recorded action at its state.
  - If the robot reaches a dead end, it retraces its steps by parsing backwards through the list until a point where another decision could have been made, then attempts that decision instead.
- Rewards are given at the end of a complete run, once a correct path has been found: every state-action pair in the final path list has its score incremented, and the reward is scaled by path length, so shorter paths earn higher rewards (see the reward-update sketch after this list).
  - If a dead end is found, the actions that led to it lose value, down to a floor of zero, to reduce the probability of their being picked again.
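The action-selection step described above (greedy most of the time, random occasionally, with walls discounted and ties broken at random) can be sketched in Python as follows. This is a minimal illustration of the described logic, not the Unreal Engine implementation, and the exploration rate `EPSILON` is an assumed value, since the prototype does not state one.

```python
import random

EPSILON = 0.1  # assumed exploration rate; the prototype's actual value is not stated

def choose_action(table, state, epsilon=EPSILON):
    """Pick an action for `state`: greedy most of the time, random otherwise."""
    row = table[state]
    available = [a for a, v in enumerate(row) if v != -1]  # discount walls (-1)
    if random.random() < epsilon:
        return random.choice(available)    # random pick: allows new, better paths
    best = max(row[a] for a in available)
    ties = [a for a in available if row[a] == best]
    return random.choice(ties)             # equal highest values: chosen at random
```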
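Likewise, the end-of-run bookkeeping (length-scaled rewards for a successful path and the zero-floored penalty for dead ends) might look like the following sketch. `base_reward` and `penalty` are assumed constants, as the write-up does not give the prototype's actual reward values.

```python
def reward_path(table, path_states, path_actions, base_reward=10.0):
    """End-of-run reward: every (state, action) pair on the successful path is
    incremented, scaled so that shorter paths earn more per step."""
    step_reward = base_reward / len(path_states)  # shorter path -> higher reward
    for state, action in zip(path_states, path_actions):
        table[state][action] += step_reward

def penalize_dead_end(table, state, action, penalty=1.0):
    """Devalue an action that led to a dead end, floored at zero as described
    above, to reduce the probability of its being picked again."""
    table[state][action] = max(0.0, table[state][action] - penalty)
```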
Screenshots: RL1.jpg, RL2.jpg, RL3.jpg, RL4.jpg