Let’s take an in-depth look at using AI to test the levels of a platform game.
How do you automate the testing of procedurally generated game levels? That is the question we asked ourselves while developing the Water Battle game. The Water Battle is designed to give children and adults insight into their water usage. The game contains levels that can be played to earn points. To ensure that there is enough variation in these levels, and that we can release new levels from week to week, we generate them procedurally.

The problem we encountered was that the generated levels sometimes contained errors that made them impossible to complete. Moreover, because the levels are procedurally generated, it was difficult to estimate in advance how difficult they would be. To get this information, every level had to be played by hand, which takes an incredible amount of time, time we would much rather spend making our games even more fun. To avoid having to test everything manually, the idea emerged to have the levels tested automatically.
Now, half a year later, we have developed a system that can independently play and test the levels of the Water Battle game. This system, named CHIMP (Challenge Intensity Measurement Platform), plays the levels and collects as much information as possible about each playthrough: how many pickups were collected, which actions were performed, and where in the level CHIMP has been. Based on this information, we can determine how difficult a level is. This allows us to use CHIMP to test the levels and, based on the information it returns, adjust them so that they are always passable and more fun to play.
But how did we arrive at CHIMP, and how was this system created? Let’s start at the beginning.
Too many levels to test
For the Water Battle game we have created a lot of levels and level variations. To ensure that the levels do not become repetitive, we developed a way to generate varied levels from week to week. Because of this large variation, there are far too many levels to test by hand. To give an idea of the scale: there are 116 puzzle chunks (pieces of level) and 68 runner chunks, which are used to create 15 puzzle levels and 15 runner levels per week. With an average of 3 chunks per level, there are 116^3 possible combinations of puzzle chunks per level, which comes to approximately 1.5 million different options per puzzle level. That is far too many possibilities to test by hand.
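As a quick back-of-the-envelope check, the count is easy to verify in a few lines of Python (a sketch of the arithmetic, not part of the game's code):

```python
puzzle_chunks = 116   # number of puzzle chunks available
chunks_per_level = 3  # average number of chunks per level

combinations = puzzle_chunks ** chunks_per_level
print(f"{combinations:,} possible puzzle levels")  # 1,560,896
```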
The importance of performance, complexity and extensibility
To find the solution that best suits how you want the system to work, several factors need to be considered. The most important factors for us were performance, complexity and extensibility.
It was important that the system did not demand too much performance from the game, because we wanted to be able to watch what CHIMP is doing. And how cool is it to have a screen in your office where you can see the game you made being played automatically! It was also important to be able to see why the system chose a certain path or decided to make a jump; when we know this, we can make better adjustments to the level. Finally, the system had to be usable in multiple games, so that we can test not only the Water Battle, but also the other games we make.
Interesting techniques for AI testing
These are, of course, a number of specific factors that we considered important for this system. But inventing a completely new system is quite difficult, so we did a lot of research into existing solutions that could solve our problem. We encountered a number of interesting techniques, some of which are self-learning algorithms.
Neural network
For example, a neural network, a system loosely based on how the human brain works, could learn how to play a game. The difficulty is that it is hard to tell why it makes certain choices. It may also learn actions that a human player would never consider, for example a jump that looks like it cannot possibly bridge a cliff. Because the neural network can assess more accurately than a human whether the jump is achievable, it will make it anyway.
Imitation Learning algorithm
Another option was an imitation learning algorithm. Such an algorithm tries to mimic the playing behavior of a player, and will never perform an action that the player has not performed as well. However, to train an imitation learning algorithm, a lot of data is needed on how a player plays the game. As a result, we would still have had to play the levels ourselves.
Graph-based pathfinding
Finally, we decided on a graph-based pathfinding system. This system creates a graph based on the level, through which a shortest-path algorithm can find a route. Because a lot of knowledge about how a level should be played is already available, and because it is easy to see why the algorithm chooses a particular path or action, this solution suited our problem best. This approach also makes it easier to use the system in multiple games: since it does not need to learn anything before playing a new game, it can simply re-analyze the levels every time. That makes it a better fit for new games as well.
Considering all possible actions
A graph is created by evaluating the current situation and determining which actions are possible. The image above shows what these different actions look like. From the current position, a move can be made to the left or right (the green cubes and lines), a single jump to the left or right (the red cubes and lines), or a double jump (the black cubes and lines). For the double jumps, there are several variations of the jump that can be made.
If an action is possible, we then look at the actions that are possible from that point onward. By repeating this up to a certain depth, a graph is created that a shortest-path algorithm can search.
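A minimal sketch of how such an expansion could look in Python is shown below. The action names, the `simulate` function and the depth limit are our own simplifications for illustration, not CHIMP's actual code:

```python
from collections import deque

# Hypothetical action set: walk or jump left/right, plus a double jump.
ACTIONS = ["walk_left", "walk_right",
           "jump_left", "jump_right",
           "double_jump_left", "double_jump_right"]

def build_graph(start_state, simulate, max_depth):
    """Breadth-first expansion of all reachable states up to max_depth.

    `simulate(state, action)` is assumed to return the resulting state,
    or None when the action is not possible (e.g. jumping into a wall).
    States are assumed to be hashable. Returns the graph as adjacency
    lists: state -> [(action, next_state)].
    """
    graph = {start_state: []}
    frontier = deque([(start_state, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # the depth cap keeps the graph (and frame time) bounded
        for action in ACTIONS:
            next_state = simulate(state, action)
            if next_state is None:
                continue  # action not possible from this state
            graph[state].append((action, next_state))
            if next_state not in graph:
                graph[next_state] = []
                frontier.append((next_state, depth + 1))
    return graph
```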
Layering possible action graphs
When obstacles get in the way, the graph is built differently. Simple obstacles, such as a block that can be jumped over, are handled by the normal expansion described above. But obstacles such as a water fountain, which must be turned off by pressing a button before you can get through, are placed in the graph differently. Because a player knows which button turns off which water fountain, we gave this information to CHIMP as well.
When CHIMP recognizes a button, it knows which water fountain is turned off by that button. However, the actions taken after, or through, the disabled water fountain cannot be added to the graph in the same way as normal actions, because those actions are only possible once the button has been pressed. To solve this, the actions that can be executed after the button is pressed are placed in a separate layer of the graph. This makes pressing the button a requirement before these actions can be performed and the player can get past the disabled water fountain. The picture below shows an example: in the lower layer, the path does not go beyond the button. To simulate pressing the button in the graph, the subsequent actions are carried out in another layer, in which the water fountain is turned off and can be jumped over.
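One way to model this layering (our own sketch, not necessarily how CHIMP stores it) is to make the button state part of the graph node, so the same position exists once per layer. The `Node` type and the `level` helper with its `possible_moves`, `button` and `fountain` members are hypothetical:

```python
from typing import NamedTuple

class Node(NamedTuple):
    """A graph node: the same tile exists once per layer."""
    x: int
    y: int
    button_pressed: bool  # which layer of the graph this node lives in

def neighbours(node, level):
    """Yield (action, next_node) pairs, switching layers at the button.

    `level` is a hypothetical helper exposing the moves possible from a
    tile, plus the positions of the button and the water fountain.
    """
    for action, (nx, ny) in level.possible_moves(node.x, node.y):
        # The fountain tile is only passable in the 'button pressed' layer.
        if (nx, ny) == level.fountain and not node.button_pressed:
            continue
        # Stepping onto the button moves us into the pressed layer.
        pressed = node.button_pressed or (nx, ny) == level.button
        yield action, Node(nx, ny, pressed)
```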
This allows us to build a graph that takes the order of the actions into account. Once such a graph is complete, it looks like the image below.
This looks like a big haze of various actions, but to the system it is completely logical. A path is found by running a shortest-path algorithm, such as Dijkstra’s algorithm or A*, from the starting point (where the player is now) to a location as close as possible to the end of the level. The following illustration shows what that path looks like after the algorithm has searched through the graph. By assigning actions to these connections, the computer can move the player along the path.
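Searching the graph is standard shortest-path work. A minimal Dijkstra over the adjacency lists from the earlier sketch could look like this; we assume a uniform cost of 1 per action, which CHIMP's real cost function may well refine:

```python
import heapq

def shortest_path(graph, start, is_goal):
    """Dijkstra over `graph` (state -> [(action, next_state)]).

    Returns the list of actions reaching the first state for which
    `is_goal(state)` is true, or None if no such state is reachable.
    Every action costs 1 here; weighting actions differently, or adding
    a heuristic to turn this into A*, is a straightforward extension.
    """
    queue = [(0, 0, start, [])]  # (cost, tie_breaker, state, actions)
    seen = set()
    counter = 1                  # tie breaker so states are never compared
    while queue:
        cost, _, state, actions = heapq.heappop(queue)
        if state in seen:
            continue
        seen.add(state)
        if is_goal(state):
            return actions
        for action, next_state in graph.get(state, []):
            if next_state not in seen:
                heapq.heappush(queue, (cost + 1, counter,
                                       next_state, actions + [action]))
                counter += 1
    return None
```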
By repeating these steps in succession, CHIMP can play the level independently. In the animated GIF below you can see how the first puzzle level and the first runner level of the Water Battle are played.
Analyzing and reporting playthroughs
After CHIMP has completed the levels, it produces a report for us stating what it encountered in each level. This includes things like how many times it jumped, how long it took to complete the level, and how many pickups were collected. It also records where CHIMP has been in the level and whether it got stuck somewhere. By summarizing these reports, we can see which levels cannot be completed and which are very difficult to play through.
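Such a report can be as simple as a record of counters collected during the run. A hypothetical shape, with field names of our own choosing rather than CHIMP's, might be:

```python
from dataclasses import dataclass, field

@dataclass
class PlaythroughReport:
    """Metrics a CHIMP-style playtest could collect for one level."""
    level_id: str
    completed: bool                 # did the bot reach the end of the level?
    jumps: int = 0                  # how many times it jumped
    time_taken: float = 0.0         # seconds from start to finish (or giving up)
    pickups_collected: int = 0
    pickups_total: int = 0
    stuck_positions: list = field(default_factory=list)  # where it got stuck

    @property
    def pickup_ratio(self) -> float:
        """Fraction of pickups collected; a rough proxy for difficulty."""
        if self.pickups_total == 0:
            return 1.0
        return self.pickups_collected / self.pickups_total
```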
Optimizing AI performance
One of the problems we encountered during development is CHIMP’s performance. There are all kinds of settings, such as how far the graph is expanded. To keep the FPS acceptable, we have to weigh performance against CHIMP’s ability to look ahead. Because a lot of processing power is needed to analyze a level, we lowered the depth of the graph. As a result, CHIMP can only look a limited distance ahead and sometimes does not immediately see which path is the best. CHIMP then needs a little more time to find the right path, even when it would be immediately obvious to a human player.
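A common way to keep the frame rate stable in situations like this (a sketch of the general technique, not CHIMP's actual implementation) is to give the graph expansion a fixed time budget per frame and carry the remaining work over to later frames. The budget value and `expand_node` callback below are hypothetical:

```python
import time

FRAME_BUDGET = 0.008  # hypothetical graph-work budget per frame, in seconds

def expand_within_budget(frontier, expand_node):
    """Expand as many graph nodes as fit in this frame's time budget.

    `expand_node(node)` is assumed to return the node's newly discovered
    neighbours. Whatever remains in `frontier` is processed on later
    frames, trading instant look-ahead for a stable frame rate.
    """
    start = time.perf_counter()
    while frontier and time.perf_counter() - start < FRAME_BUDGET:
        frontier.extend(expand_node(frontier.pop()))
    return frontier  # leftover work carries over to the next frame
```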
Final words
We can conclude that graph-based pathfinding is the best way to automate level testing given our considerations: balancing performance, the ability to look ahead, and extensibility to other games.
Hopefully, CHIMP will help us in the future to test the levels of all our games. Then we certainly have enough time to make our games even more fun and challenging!
Did you like this article? Then you might be interested in our other in-depth articles on procedurally generated water paint stains and ‘Everything Procedural’. Stay up to date on new articles by subscribing to our newsletter.