Tutorial showing how to implement Markovian Linked Lists (Chains) based on examples from the game development industry for programmers and developers of all genres.
In programming terms, being able to generate samples of data on the fly such that it looks like natural occurring phenomena is very useful. From financial modeling, to language generation, through to landscape creation for video games, taking real data, sampling it and using that to generate convincing synthetic versions is key in many areas. One technique for doing this is known as the Markovian Linked List.
One particular application is in generating natural-sounding names based on samples of real country names. This is the example we are going to be using to illustrate the article; without the actual data, but as a theoretical application.
A linked list is one in which each node is pointing to another. This could be a sibling, parent, or just the next node in the list, depending on whether we are building a list or tree. Each link has equal weight. In other words, if we assume that we have a node that connects to 10 others, each possible link would have an equal chance of being chosen at random.
On the other hand, a weighted linked list contains links which have a specific weight depending on how 'strong' that link is. To illustrate this, let us consider a list of country names. Each one consists of letters, one following the other.
We could analyze this list of countries, and tabulate the frequencies of each pair of letters in an adjacency table. This is just a two-dimensional array of numbers, where each number indicates the amount of times that the letters are paired in the sample data. The array is 28x28 cells, allowing for each letter of the alphabet, and leading and trailing spaces.
When we build the table, we run through each country name, and analyze the letter pairs. So, if the first letter is an 'A', we can increment the cell referenced by ' A', or [0][1]. Assuming the country is Australia, we can then increment the cell referenced by 'Au', or [1][21]. We then proceed through the country, ending with 'a ', or [1][0].
We can also make a linked list version, in which the link count is incremented. The abstract data type for the node might look like:
When we build the list, we add it to the link_target array, and increment the appropriate link_count. We can then follow the link, and do the same analysis again with the next data pair.
When we generate sample data, we need to establish the chances of choosing one of the links at random. To do this, we need first to sum all the valid link_count entries. This gives us a total, which we can then use to generate a random number.
Once we have this random number in hand, we can move through the link_count array, adding the values to a running total. When the running total passes the random number chosen, we select the appropriate link_target. This might be coded thus:
Using this algorithm, almost any data pair, or even tuple set, can be analyzed and natural looking synthetic data sets generated. It can be used for language, terrain, or behavior generation and with some experimentation can produce startling results.