The Astana Challenge
Concealed within the Book are secrets and mysteries (the “Clues”) that, once solved, reveal a hidden message (the “Solution”). The first person, or group of persons, (the “Sleuth”) to deduce the Solution in its exact wording, must follow the Instructions (see below) to become the verified “Winner” and claim the Prize (see below).
I was given a few explicit hints. Firstly, there is a message hidden within the book. Secondly, a video on the challenge's website states that the message is "veiled in symbolism" and "hidden in plain sight".
That sounds daunting at first, but once you open the book you will find a big hint: there are glyphs at the bottom of some pages.
Glyphs from the bottom of page 51
These glyphs match all of the hints: they are within the book, they are symbols, and they are hidden in plain sight. However, they are abstract symbols that I had yet to match to any existing data sets. Knowing nothing about the glyphs makes this a bit of a rough starting point, but it is something. The rest of this article will outline my initial approach to analyzing the glyphs.
Separating The Glyphs
With glyphs in the center of my mind, I started working on moving them to a digital platform. I did this because there were 1,008 glyphs, which would make any type of analysis take ages to do by hand. I used a knife to cut the pages from the book, then fed them into the scanner. This was pretty tedious as the pages stuck together very easily, so I had to feed each one in by hand. It took 6 hours in Staples, and a lot of frustrating printer errors (I still have no idea what the "scan multiple" checkbox is for, but it definitely doesn't allow you to scan multiple pages), but I got all of the pages with glyphs scanned.
Once I was done scanning, I went straight to building a program that would extract the glyphs from pages.
Cleaning and Sorting
I needed to sort out the pages with glyphs and straighten the images, so that a slightly rotated scan would not cause similar glyphs to fail to match. Originally, I attempted to automate this process by matching four semi-vertical lines on the page (surrounding the left and right blocks), but this proved to be very difficult, and very inaccurate. I also tried using horizontal lines, but the columns were not aligned due to a difference in section header padding, which would have led to incorrect rotation results. As a result, I ended up doing it by hand. However, that would take ages in a standard image editor, so I decided to write a program to help me do it more quickly through keybindings. This also allowed me to sort the images as I cleaned them up.
A screenshot of the program I was using to clean up pages
Additionally, a few pages had white text on black backgrounds. I just inverted these ones in GIMP before sorting through them.
Extracting the Glyphs
Now that the pages were scanned, I had to figure out how to extract the glyphs from them. This was probably the most fun part of the project, as I had never done anything like it before.
It should be noted that from this point forward I am using thresholds for "white" and "black". As such, when I say that I was looking for "white" parts of the image I mean parts of the image that are a certain percent white, not parts of the image that are completely white (#FFFFFF).
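To make the idea of a threshold concrete, here is a minimal Python sketch. It is only an illustration, not the program used for this project: the cutoff value `WHITE_THRESHOLD` and the choice to treat every non-white pixel as black are assumptions, since the actual values are not given here.

```python
# Hypothetical cutoff: the real value used in the project is not stated.
WHITE_THRESHOLD = 0.8  # fraction of full brightness that counts as "white"


def is_white(pixel, threshold=WHITE_THRESHOLD):
    """Return True if an 8-bit grayscale pixel (0-255) is at least `threshold` white."""
    return pixel / 255 >= threshold


def is_black(pixel, threshold=WHITE_THRESHOLD):
    """Assumption: anything that is not "white" is treated as "black"."""
    return not is_white(pixel, threshold)
```

With a cutoff like this, a light gray pixel from a scan still counts as "white", which is the point of using a percentage instead of an exact #FFFFFF match.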
1. More Cleaning
My original sample was very grainy, so I increased the contrast of the images to remove some noise and shadows and to define lines more strongly. It also had some darkness around the edges, which I removed by trimming the edges of the images until there was no longer any black in a complete diagonal line. However, both of the above steps ended up being removed as the full scan was quite a lot cleaner.
2. Separating Lines
Now that the page was clean, I had to separate the lines from each other so that I could determine which one contained the glyphs.
An animated GIF showing how page lines are separated
I classified a line of text as a horizontal line of pixels with at least one black pixel, followed by 100 lines with only white pixels. The requirement for multiple lines of white was probably unnecessary, but I wanted to be completely sure I could avoid scanning errors. As you can see from the GIF above, the entire main block is counted as a single line, so the threshold for 100 horizontal white lines is not correct. However, that broken threshold actually worked out really well, as having the main block of text count as one line instead of 40 drastically reduced the chances that a line of text would register as glyphs, so I kept it.
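The rule above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the original program: the page is assumed to be already thresholded into rows of 0 (white) and 1 (black), the names `split_runs` and `text_lines` are made up for this example, and a line that runs into the bottom of the image is kept even without its trailing white rows.

```python
def split_runs(has_ink, min_gap):
    """Group indices that contain ink into (start, end) spans, end exclusive.

    A span is closed only once `min_gap` consecutive blank entries follow it,
    so smaller gaps stay inside the span.
    """
    spans = []
    start = None      # index where the current span began
    last_ink = None   # most recent index that contained ink
    for i, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = i
            last_ink = i
        elif start is not None and i - last_ink >= min_gap:
            spans.append((start, last_ink + 1))
            start = None
    if start is not None:
        # Assumption: keep a span that reaches the end of the image.
        spans.append((start, last_ink + 1))
    return spans


def text_lines(image, min_gap=100):
    """Split a binary page (rows of 0/1, 1 = black) into horizontal lines of text."""
    has_ink = [any(row) for row in image]
    return split_runs(has_ink, min_gap)
```

A large `min_gap` is what makes the whole main block collapse into a single span: the spacing between ordinary lines of text never reaches the gap needed to close it.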
After separating all of the lines, I manually measured the height of the average glyph line. I used this height to determine which of the lines was the line containing the glyphs. I also had the option to feed multiple lines into the next step, as it would have properly filtered through them and found the only glyph line, but using a simple height match worked fine so I left it as is.
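A height match like this could look something like the sketch below. The function name `pick_glyph_lines`, the `tolerance` parameter, and all numbers in the usage note are hypothetical; the measured glyph height is not given in this article.

```python
def pick_glyph_lines(spans, glyph_height, tolerance=5):
    """Return the (start, end) row spans whose height is within
    `tolerance` pixels of the measured glyph line height."""
    return [
        (start, end)
        for start, end in spans
        if abs((end - start) - glyph_height) <= tolerance
    ]
```

For example, with made-up spans for a header, the main block, and a footer line, `pick_glyph_lines([(0, 30), (150, 1900), (2000, 2062)], glyph_height=60)` keeps only the last span, because it is the only one close to 60 pixels tall.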
3. Finding Individual Glyphs
Two animated GIFs showing how glyphs are separated
Once I was able to find the line containing the glyphs, I scanned for vertical lines of pixels with at least one black pixel, followed by 10 lines with only white pixels. However, this included the glyphs and the page number. The glyphs may appear on both the left and right sides of the page, so I was unable to split by large amounts of whitespace and just stick to one edge.
An example of two pages with glyphs on either side
To get around this, I set a maximum number of completely white vertical lines in a row. If this number was reached, the current collection of glyphs would be considered. If it held fewer than 10 glyphs, I assumed that the group was actually the page number or something similar, discarded the list, and continued scanning. This worked very well, especially since I was able to set the threshold for glyph separation low enough that multiple characters (e.g. "CHAPTER") would be counted as a single "glyph", and as such would only count as one of the ten required for a group to register as a line of glyphs.
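Here is a minimal sketch of that grouping step. It assumes the individual glyphs have already been found as (start, end) column spans by a gap-based scan like the one described earlier; the function name `group_glyphs` and the `max_gap` value in the usage note are assumptions made for illustration.

```python
def group_glyphs(spans, max_gap, min_count=10):
    """Split glyph column spans into groups wherever the white gap between
    neighbours reaches `max_gap`, keeping only groups with at least
    `min_count` members (smaller groups are treated as page numbers etc.)."""
    groups = []
    current = []
    for span in spans:
        # Gap between the end of the previous span and the start of this one.
        if current and span[0] - current[-1][1] >= max_gap:
            if len(current) >= min_count:
                groups.append(current)
            current = []
        current.append(span)
    if len(current) >= min_count:
        groups.append(current)
    return groups
```

Given one isolated span for a page number followed, after a wide gap, by a dozen closely spaced spans, `group_glyphs(spans, max_gap=100)` would return a single group containing only the dozen glyphs, regardless of which side of the page they sit on.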
Matching the Glyphs
Now that the glyphs are separated from the text we can begin determining which ones are the same. Initially, I tried to do this by comparing them on a pixel-by-pixel basis. However, most glyphs that were the same came out as different sizes when scanned, and I found that the confidence threshold required to match these glyphs also returned false positives against completely different glyphs. I considered feature matching, where I would match what sharp angles and curves glyphs have, the directions of those angles, and the directions they are relative to other points. However, I had already spent so much time trying to match them on a per-pixel basis that I decided it would be a waste of time, especially since I was unsure if that method would work either.
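For reference, a pixel-by-pixel comparison can be sketched as below. This is not the code used for this project. It assumes each glyph is a binary grid (rows of 0/1) and works around the size differences by scaling both glyphs to a common grid with nearest-neighbour sampling, which is one common approach and not necessarily what was tried here. The names `resize` and `similarity` and the 16x16 grid are hypothetical.

```python
def resize(grid, width, height):
    """Nearest-neighbour resize of a 2D grid (a list of equally long rows)."""
    src_h, src_w = len(grid), len(grid[0])
    return [
        [grid[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def similarity(a, b, size=16):
    """Fraction of pixels that agree after scaling both glyphs to size x size."""
    ra = resize(a, size, size)
    rb = resize(b, size, size)
    same = sum(
        pa == pb
        for row_a, row_b in zip(ra, rb)
        for pa, pb in zip(row_a, row_b)
    )
    return same / (size * size)
```

Two glyphs would then be called "the same" when `similarity` exceeds some confidence threshold. The weakness described above shows up directly in this kind of score: scan noise pulls genuine matches down, so the threshold has to be lowered until unrelated glyphs start passing too.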
... So I ended up sorting out all 1,008 glyphs by hand... :(
I quickly wrote a program that would help me do this. It would show me a glyph, and I would either click the glyph it matched, or the "N/A" button if it did not match any. If it did not match any glyphs, it would add another button for me to press with that glyph as the picture.
A screenshot of the program I was using to compare the glyphs
I found that the book contained 55 different glyphs. I assigned each glyph its own ID from 1-55 for future reference.
I realized that a lot of the glyphs were just mirror images of each other, so I created another set of IDs where these flipped versions were considered to be the same. This resulted in 20 total glyphs, but due to the way I implemented the grouping they still contained IDs from 1-55.
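One way such a grouping can end up with IDs that still range from 1-55 is to label each group with the ID of one of its members. The sketch below illustrates that idea under several assumptions: glyphs are small binary grids, only left-to-right flips are considered, bitmaps match exactly, and the lowest ID in a group is used as its label. None of these details are confirmed by the description above, and `group_mirrors` is a made-up name.

```python
def mirror(grid):
    """Flip a glyph bitmap left to right."""
    return [row[::-1] for row in grid]


def group_mirrors(glyphs):
    """Map each glyph ID to the lowest ID whose bitmap is identical to it
    or to its mirror image."""
    representative = {}
    for gid in sorted(glyphs):
        grid = glyphs[gid]
        for rep in sorted(set(representative.values())):
            if glyphs[rep] in (grid, mirror(grid)):
                representative[gid] = rep
                break
        else:
            # No existing group matched, so this glyph starts its own group.
            representative[gid] = gid
    return representative
```

With this scheme the number of distinct group labels drops, but the labels themselves are still original glyph IDs, so gaps appear in the numbering.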
Certain languages have certain characters, or sets of characters, appear more frequently than others. For example, in a 900 million word text the two letters "TH" accounted for 2.71% of all two-letter pairs, aka bigrams, while "OF" only accounted for 0.71%. Using this knowledge, we can analyze the frequency of sets of 1, 2, 3, ..., glyphs in the book and compare them to known frequencies of characters in different languages.
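Counting these frequencies is straightforward once the glyphs are a list of IDs. The following is a minimal sketch, assuming the glyphs are stored as one flat sequence of integer IDs in reading order; whether n-grams should span page boundaries is not addressed here, and `ngram_frequencies` is a name chosen for this example.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Return {n-gram tuple: relative frequency} over all overlapping
    n-grams in `sequence`. Frequencies are decimals (0.10 means 10%)."""
    ngrams = [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
    if not ngrams:
        return {}
    counts = Counter(ngrams)
    total = len(ngrams)
    return {gram: count / total for gram, count in counts.items()}
```

Calling it with `n=1` gives monogram frequencies and `n=2` gives bigram frequencies, which can then be sorted and set side by side with published letter statistics for a language.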
The charts below show the frequency of each glyph in sets of 1 (monograms) and 2 (bigrams). Note that the frequency axis is in decimal percents (i.e. 0.10 is 10%), and X-axis labels are glyph ID(s) shown in the "Separated Glyphs" image.
Glyph monogram frequencies
Glyph bigram frequencies
I compared these results to English, Chinese (using chinese.py and SUBTLEX-CH-CHR; monograms only because I could not find any bigram data), Turkish (monograms, bigrams), German, Russian, and Spanish. However, I did not find any meaningful results from this. I also performed those steps on the grouped version of the glyphs, but those did not match any of the languages either.
The glyphs are separated, but we have yet to deduce any meaning from them. However, you will need more than that to stop the genius of Seekintoo!
Stay tuned for Part 2, where we dive deep into some alternative methods of decoding the book!
The source code for the project will be released once the series is complete.