
Kolmogorov-Chaitin Complexity
Nikos A. Salingaros, The University of Texas at San Antonio. Trying to measure the complexity of a system is not straightforward. The simplest measure of a system's complexity reflects not so much its intrinsic complexity, or the complexity of the process that generated it, but the complexity of the system's description. As such, this approach is not without its limitations. We can measure the complexity of a one-dimensional system by using a very simple notion: the length of its description in some programming language. This measure is known as the Kolmogorov-Chaitin Complexity. Analyzing a string of letters or numbers might reveal a pattern that repeats, and in that case, its description becomes shorter. I'm going to use binary entries of 0 and 1 in 70-character strings, yet the discussion is valid for any letters or digits. For example, it is obvious that this information string is not complex at all:1111111111111111111111111111111111111111111111111111111111111111111111Its generative algorithm is: repeat the digit 1 70 times, or "1 x 70". On the other hand, the more complex string1010101010101010101010101010101010101010101010101010101010101010101010still has only minimal complexity, since its algorithm is a short one: "10 x 35". But it has slightly more complexity than the first trivial string. The next orderly sequence is very different, yet it also possesses low complexity: 1111111111111111111111111111111111100000000000000000000000000000000000It has the generating algorithm "1 x 35, 0 x 35". Something non-trivial occurs only at the 36th position. For genuinely complex strings of digits, we face two distinct types of complexity. First, there is organized or coherent complexity, such as the following binary sequence: 1000110101100011010110001101011000110101100011010110001101011000110101Analyzing this information string for patterns, we discover hidden symmetries, and those allow us to reduce its generating code to alternating 10001 with 10101 7 times, or "10001 alternate 10101 x 7". We do have a complex string, yet internal patterns reduce the coding length of its generating algorithm. The second type of complexity represents random, or disorganized complexity, as for example the following random sequence:0010001111010101100001011100101111110101111111101001110001101000100101It has maximal complexity, because its shortest description is exactly as long as the string itself. There is no possible compression, such as hidden patterns in its descriptive rule. We can be sure of that, since we generated this string by using a random-number generator. Probability arguments produce one complication, however: it is quite possible to obtain internal patterns in a randomly-generated string. But you cannot count on it, and might have to try an astronomical number of times before you come up with a regular pattern. The key mechanism revealed by these examples is the presence or not of patterns, which then permit compression of the informational string. Total compression is equivalent to informational collapse, and this occurs in cases of extremely low complexity. In examples of strings with non-trivial complexity, the ability to compress them reduces their complexity measure, but never to near zero. I will now compute the length of the description of each algorithm for the informational strings presented as examples. This is a simple character count of the description in English, and I obtain the following values for all the strings:- 1 x 70 → 4 characters
- 10 x 35 → 5 characters
- 1 x 35, 0 x 35 → 9 characters
- 10001 alternate 10101 x 7 → 21 characters
- [Random] → 70 characters
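
To make the counting concrete, here is a minimal Python sketch that rebuilds each example string from a compact description and checks its length. The descriptions below are short Python expressions of my own choosing, not Nikos's English rules, so the character counts come out slightly different from the table above, but the comparison is the same.

    # Rebuild each 70-character example string from a compact description and
    # compare the length of the description with the length of the string itself.
    examples = [
        ('"1" * 70',                "1" * 70),
        ('"10" * 35',               "10" * 35),
        ('"1" * 35 + "0" * 35',     "1" * 35 + "0" * 35),
        ('("10001" + "10101") * 7', ("10001" + "10101") * 7),
    ]

    for description, string in examples:
        assert len(string) == 70             # every example is 70 characters long
        print(f"{description:25} {len(description):2} characters -> 70-character string")

    # For the random example there is no known shorter description: the cheapest
    # "program" is essentially the string itself, about 70 characters plus quoting.

Of course this only exhibits short descriptions; it does not prove they are the shortest possible, a point taken up below.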
And there you have it. Thanks, Nikos, for this introduction. Now I'd like to elaborate on some of the points raised in this essay with a few thoughts of my own.

First, the numerical value of the Kolmogorov-Chaitin (KC) complexity metric depends on the programming language used to describe the generating algorithm. In the original three volumes of The Art of Computer Programming, computer scientist Don Knuth illustrates algorithms using an assembly language he created that is representative of computer capabilities around 1970. In Volume 4, written decades later, he introduced a revised assembly language representative of computers from the present millennium. If one translates algorithms from the old language into the new, they often become shorter, because the new language is in some sense more powerful. It's not that older computers couldn't perform the same calculations that newer ones do -- they just took a lot longer and needed a lot more instructions.

For example, older computers were often limited to working with 8- or 16-bit integer quantities. To compute the area of a circle using the floating point value of pi (3.1415926...) to 16 decimal digits of accuracy took a large number of instructions, just as if you had to do the multiplication longhand with paper and pencil, digit by digit. In contrast, modern computers typically work with 64 or even 128 bits at a time, and have built-in instructions for working with floating point numbers, which means we can code such calculations with many fewer instructions, typically just one per operation.
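
The language dependence shows up even within a single modern language: how short a description can be depends on which notation you allow yourself. A small, purely illustrative Python sketch (the variable names are mine):

    # The same 70-character string produced by two programs in the same language.
    # Which one counts as "the" description length depends on the notation allowed,
    # just as Knuth's newer assembly language shortens programs written for the old.

    short_program = "1" * 70             # about ten characters of source text

    long_program = ""                    # an explicit loop: several lines of source
    for _ in range(70):
        long_program += "1"

    assert short_program == long_program # identical output, very different lengths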
print "sss"where the actual string is putinside the double quotes in place of "sss". This program is only 8characters longer than the actual string, so the KC complexity of astring of N letters is at most N+8, but is often less, as Nikosillustrated. A human looking at the string can often see patterns thatenable a shorter description, again as illustrated by Nikos, but wewonder, can a computer find such patterns, and, can we know we havereached the smallest possible description? For any one specificstring, we may be able to convince ourselves that we have the shortestpossible algorithm for generating it. But the task in general isimpossible: one cannot write a computer program (in any language) thatis capable of computing the KC complexity of an arbitrary inputstring. The Wikipedia article sited above has a proof.Essentially, it is because computing KC complexity is equivalent tothe halting problem (computing whether an arbitrary program willalways halt), which is also impossible. In fact, for similar reasons,mathematicians cannot prove every true fact; this is called Godel'sincompleteness theorem.Third, KC complexity is related to the topic of data compression. Whenyou watch a movie or (hi-def) TV show on your computer, you are ableto enjoy it in real time only because the data that makes up the moviehas been compressed. This is possible because real movies do not showrandom noise; instead, they show coherent images of people andbuildings that change only slightly from one frame to the next.The data for the movie is a sequence of many thousands of whole numbersspecifying the amount of red, green and blue light in every pixel ofthe screen in every frame. The TV displays many frames per second togive the illusion of motion. Much like in the examples earlier, thesimilarities from one frame to the next allow compression. One method,called "run length compression", replaces repeating symbols with acount, much like when Nikos writes something like '1 x 7' to mean1111111. Other more sophisticated methods exist that take into accountthe space-time structure of the data in movies -- the fixed size twodimensional grid repeated with minor changes over time -- to achieveeven greater amounts of compression.This connects back to the point that you need to know the algorithm thatgenerated the string (or data file) before you can compute the KCcomplexity. If you view a movie data file as individual bytes of data,they will look very random indeed, and you may find it difficult tocompress it very much. However, if you know in advance that the datais not in fact arbitrary, but represents a movie with a given screenresolution, you will expect to find certain patternsand correlations in the data, which your compression algorithm canexploit.Fourth and finally, KC complexityis also related to the question of how wedefine randomness. True randomness, like radioactive decay,requires infinitely many bits. Computers, which are finite, can onlycompute what mathematicians call pseudo-random numbers:sequences that look random on naive inspection, even though they areactually generated by deterministic algorithms -- often quite shortalgorithms, in fact, with quite low KC complexity values. Inparticular, pseudo-random number algorithms always wind up repeatingthemselves after a finite number of random draws. For instance, agenerator whose internal state consists of integers from 1 to N willnecessarily repeat after at most N draws, and possibly much soonerdepending on the algorithm. 
Fourth and finally, KC complexity is also related to the question of how we define randomness. True randomness, like that of radioactive decay, requires infinitely many bits. Computers, which are finite, can only compute what mathematicians call pseudo-random numbers: sequences that look random on naive inspection, even though they are actually generated by deterministic algorithms -- often quite short algorithms, in fact, with quite low KC complexity values. In particular, pseudo-random number algorithms always wind up repeating themselves after a finite number of random draws. For instance, a generator whose internal state is an integer from 1 to N will necessarily repeat after at most N draws, and possibly much sooner, depending on the algorithm. Knuth's Volume 2 (mentioned above) goes into a lot of detail on generators, and subsequent researchers have come up with even better ones, but they are all still finite.

One can pretend that a computer has access to an unbounded amount of memory space and CPU time, and discuss algorithms like cellular automata whose state can grow arbitrarily large over time. While such algorithms might seem to escape the tyranny of having a finite number of bits, in practice they cannot really solve the problem: as the size of their internal state grows, so too will either the time to access it, the memory to store it, or both, to the point where any real computer will eventually run out of resources.

Moreover, randomness (or pseudo-randomness) is a property of the generating algorithm, not of the generated sequence. Suppose I have a source for generating random bits (either real random bits from a radioactive source, or pseudo-random ones from an algorithm with a very large internal state and very few correlations). Suppose too that 0 and 1 are equally likely. Then, if I draw a sequence of 70 bits, the strings

1111111111111111111111111111111111111111111111111111111111111111111111

and

0010001111010101100001011100101111110101111111101001110001101000100101

are in fact equally likely to emerge from the generator, just like all the other strings in Nikos' examples above. Note that I am speaking here of the specific sequence of bits. It is true that if you count the number of zeros and ones, you are more likely to get something close to 35 of each rather than 70 vs. 0. However, that does not make the string of 1 x 70 any less likely than the more "random" looking example: each has a chance of 2^(-70) of emerging from the generator, just like all the other 2^70 possible bit strings of length 70. The 1 x 70 string, however, is the only one with no zeros, whereas there are many strings with 35 zeros and 35 ones.

Thus a string that emerges from a random process need not actually look random to a human: any specific string may have internal symmetries or regularities, with 1 x 70 being an extreme example. That is why randomness is a property of the generating algorithm, not of the resulting string. And even there, we have to be careful: if you know the generating algorithm (and its current internal state), suddenly you can see through the randomness -- you can calculate successive bits before they are drawn. So the most one can hope for from a pseudo-random number generator is that someone who does not know the algorithm will be unable to detect any statistically significant difference between the output of the generator and a true (e.g. radioactive) random source, when they are allowed to examine a "long" sequence of bits. Even "long" is subtle: not so long as to approach the period of the generator (after which it would start to repeat itself), but otherwise long enough to have a lot of statistical power (70 bits, for instance, is not very many for testing).
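
To see the "repeats after at most N draws" point in miniature, here is a toy generator with a deliberately tiny internal state -- just an integer modulo 16, so N = 16. The name and parameters are chosen purely for illustration; real generators, such as those Knuth analyzes, use enormously larger states:

    # A toy pseudo-random generator whose internal state is an integer modulo 16.
    # With only 16 possible states, the output must repeat with period at most 16.
    def toy_generator(seed, modulus=16, multiplier=5, increment=3):
        state = seed % modulus
        while True:
            state = (multiplier * state + increment) % modulus
            yield state >> 3             # emit the high bit of the 4-bit state

    gen = toy_generator(seed=7)
    bits = "".join(str(next(gen)) for _ in range(32))
    print(bits[:16])
    print(bits[16:])                     # identical to the first 16 bits
    # Anyone who knows the algorithm and the seed can reproduce every bit in
    # advance -- randomness is a property of the generator, not the output string.

With only 16 possible states the repetition is obvious; a modern generator's period is astronomically long, but it is still finite.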
Good modern pseudo-random generators can pass a series of stringent statistical tests, known as the Diehard tests, that examine millions of bits for a wide variety of correlation patterns.

For readers curious to learn more, an excellent starting point is the book Information Theory, Inference, and Learning Algorithms by David MacKay, which is free for download on the web, but which is also well worth purchasing, in part for its beautiful typography (courtesy of Knuth's mathematical typesetting system "TeX" -- yes, the same Knuth -- it is a very small world).

If you enjoyed this article, you might also like Algorithmic Art, from which comes the picture at the top of today's post. It is a visualization of algorithmic complexity and randomness all mixed together with (hopefully) some aesthetic interest. Also, as you may recall, Nikos wrote two previous Guest articles, Why Monotonous Repetition is Unsatisfying and Applications of the Golden Mean to Architecture. He has also written The "Life" of a Carpet: an Application of the Alexander Rules, which is the best (most "mathematically" oriented) article I have yet seen on the web on the topic of what characteristics of a work of art make it aesthetically pleasing.

As usual, please post questions, comments and other suggestions using the box below, or G-mail me directly at the address mentioned in the Welcome post. You can sign up for email alerts about new posts by entering your address in the widget on the sidebar, or follow @ingThruMath on Twitter for a tweet about each new post. The Contents page has a complete list of previous articles in historical order. You may also want to use the 'Topic' and 'Search' widgets in the sidebar to find other articles of related interest. If you enjoyed this article, click the "M" button below to email it to a friend, or the "t" button to Tweet the link, or the "f" button to share it on Facebook.