“I have developed much too thick a skin to be surprised, let alone upset, by people telling me how that great CIA tells us the truth. If an organisation accustomed to lying tells you it is telling the truth, where does that leave you?”
Dr Jim Swire — Father of Flora (1965 – December 21 1988, Lockerbie)
Letter to INTEL TODAY
August 20 2017: The story of the (partial) KRYPTOS decoding by CIA David Stein is very beautiful and inspiring. But is it a true story or a fairy tale? In my opinion, this narrative simply cannot be trusted, for multiple reasons. Allow me to explain why. The reader will draw his/her own conclusions. PS: This post will soon be updated to explain several technical points carefully.
RELATED POST: The KRYPTOS Sculpture — An Introduction
RELATED POST: The KRYPTOS Code — How to Break a Vigenère Code
RELATED POST: The KRYPTOS Code — The Solution of Section II
RELATED POST : The KRYPTOS Sculpture — History of the NSA Involvement
The ciphertext on the left-hand side of the main sculpture (as seen from the courtyard) contains 869 characters in total (865 letters and 4 question marks).
The right-hand side of the sculpture comprises a keyed Vigenère encryption tableau, consisting of 867 letters.
In our last posts about KRYPTOS, we learned how to break a Vigenère code and we applied this knowledge to the entire section II. In a recent post, we looked at the inside story of the NSA people who took up the challenge to decrypt part of the KRYPTOS code.
In this post, we look a bit “deeper” into the story of “Section I” of KRYPTOS.
Between subtle shading and the absence of light lies the nuance of iqlusion.
The Story… According to David Stein
So, you remember that CIA Stein starts his research with an amazingly simple but true observation. Section II of KRYPTOS shows a group of three letters that repeats at an interval of 8 letters.
Yet he never points out that “Section I” shows a group of FIVE letters (VJYQT) that repeats 20 letters further on, that is, exactly two rows below when the text is written in rows of 10 letters.
AND, this is not an accident. A detailed mathematical analysis shows that the period is indeed 10. [The detailed calculations will be provided in various updates of this post. Coming soon! Probably next week.]
Worse. Much worse. As CIA Stein wrote the conclusion of his work, he actually mixed up the length of the passphrase for sections I and II. Could a person who spent several hundred hours on this code make such an error?
How Stein solved Section I, we do not know. He simply says that he applied the same technique as he did for Section II.
The problem is that this technique is obviously mistaken. It applies to a simple Vigenère code but NOT to a KEYED Vigenère Table. In fact, the whole point of KEYING the Vigenère alphabets is precisely to avoid this kind of direct attack against the ciphertext! [Again, this will need some explaining…]
So, this story raises many questions. One of these is this: Could a person really have solved “Section I” without a computer? I have my doubts. But, if one could have done so, this post is about my best guess as to how he/she may have done it!
Comment: I just learned that Ed Scheidt never said that the code could be broken with “pencil and paper”. Scheidt only said that it was decided that the coding would be done with “pencil and paper”. One does not necessarily imply the other. [Again, much to explain about this…]
First, the clue
It is hard, really hard, NOT to notice that FIVE letters repeat 20 positions apart, a multiple of 10.
Yet, that seems to be something that Stein overlooked? [And none of the NSA papers mention this obvious fact either…] That is very hard to believe, considering the five hundred or so hours CIA STEIN claims to have spent on the code….
What does it tell us? Assuming a Vigenère coding suggested by the two KRYPTOS copper plates on the right side, we can reasonably conclude that the passphrase is going to be 5 or 10 letters long. But we need to work it out. And yes, it is work. A lot of work. (For a human being!)
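For readers who want to check the clue themselves, here is a minimal sketch in Python. The 63-letter string K1 is my assumed transcription of Section I (the ciphertext is not quoted in this post), and the helper names are hypothetical. The sketch lists every repeated group of five letters, the distance between its occurrences, and the divisors of that distance, which are the candidate passphrase lengths.

```python
from collections import defaultdict

# Assumed transcription of the 63 letters of Section I (not quoted in this post).
K1 = ("EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ"
      "YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD")


def repeated_ngrams(text, n):
    """Map each n-letter group that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}


def divisors(distance):
    """Divisors of the distance (excluding 1): the candidate key lengths."""
    return [k for k in range(2, distance + 1) if distance % k == 0]


for gram, pos in repeated_ngrams(K1, 5).items():
    distance = pos[1] - pos[0]
    print(gram, pos, distance, divisors(distance))
```

With this transcription, the only repeated group of five letters is VJYQT, and both 5 and 10 appear among the divisors of the distance.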
Then, the Math
The text is very short. We only have 63 letters to work with. The IC [index of coincidence] is about 0.038. If our hunch about the interval is correct, we expect that once the text is reorganized in 10 columns, we will get an average IC equal to about 0.067.
For reasons explained before, we also expect such an IC for an interval of 20 (and other multiples of 10), as well as satellite ‘peaks’ at intervals 5, 15, and so on. All other intervals should have a random IC equal to about 0.038.
I have not yet done the calculation, except for L = 10: IC(L = 10) = 0.091. [We will show you the detailed result of these calculations soon.] This unusually large value is easily explained by the presence in the third column of 4 identical letters out of just seven! (IC = 0.29!)
Calculations for L = 2 to 10 (for L = 1, the whole text, the IC is about 0.038 as noted above)
L = 2: 0.0423 0.0366 | average 0.0394
L = 3: 0.0238 0.0381 0.0476 | average 0.0365
L = 4: 0.0333 0.0333 0.0333 0.0381 | average 0.0345
L = 5: 0.0641 0.0513 0.1282 0.1212 0.0303 | average 0.0790
L = 6: 0.0364 0.0545 0.0364 0.0222 0.0222 0.0222 | average 0.0323
L = 7: 0.0556 0.0556 0.0556 0.0000 0.0000 0.0556 0.0278 | average 0.0357
L = 8: 0.0357 0.0000 0.0000 0.0000 0.0357 0.0000 0.0000 0.0000 | average 0.0089
L = 9: 0.0000 0.0952 0.0000 0.0476 0.0476 0.0476 0.0000 0.0000 0.0476 | average 0.0317
L = 10: 0.0476 0.0476 0.2857 0.1333 0.1333 0.0667 0.0000 0.0000 0.2000 0.0000 | average 0.0914
For each row, the last number is the average of all previous columns. The averages for L = 11 to 15 are: 0.0273 0.0250 0.0744 0.0262 0.0622.
The detailed results of the index of coincidence confirm that the length of the passphrase is indeed 10. A rather unusually high value is also obtained for L = 13, as reported first by CIA David Stein. Although it may simply be a statistical fluke, that result is a bit odd.
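These numbers can be recomputed with a short script. What follows is only a sketch under stated assumptions: K1 is my assumed transcription of the 63 letters of Section I (it is not quoted in this post), and the function names are hypothetical. The text is split into L columns, the IC of each column is computed from its letter counts, and the column values are averaged.

```python
from collections import Counter

# Assumed transcription of the 63 letters of Section I (not quoted in this post).
K1 = ("EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ"
      "YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD")


def ic(letters):
    """Probability that two letters picked without replacement are identical."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def column_ics(text, period):
    """IC of each column when the text is written in rows of `period` letters."""
    return [ic(text[start::period]) for start in range(period)]


def average_ic(text, period):
    values = column_ics(text, period)
    return sum(values) / len(values)


for period in range(1, 16):
    print(period, round(average_ic(K1, period), 4))
```

For L = 10, the ten column values and their average agree with the last row of the table above.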
Important Comment Regarding CIA Stein’s Paper
Here is a comment from Stein about Section I of KRYPTOS:
“Although the graph was cruder than for Part II because it was obtained from a fewer number of letters, (Note that some of the I.C.s seem to exceed the theoretical limit of 0.063), it was still obvious that a key length either 5 or 10 was being used.
(These multiple high I.C. values were the result of the way the I.C.s were calculated. Because I. C. values are calculated by counting letter repetitions within a column, columns split up into multiples of the key length (5, 10, 15, and so forth) will show up as additional peaks at these values.)
I tried both, but it was 10 that proved to crack this part. I performed the same analysis as just shown previously for Parts II and III, and the final decryption for the entire top half of the code is presented here.”
Remember what the IC is!
The Index of Coincidence [IC] measures the probability that any two randomly chosen source-language letters are the same. This probability — also known as the index — is about 0.067 for monocase English while the probability of a coincidence for a uniform random selection from the alphabet is 1/26 = 0.0385.
\[ \mathrm{IC} = \frac{\sum_{i=1}^{c} n_i (n_i - 1)}{N (N - 1)} \]
where c is the size of the alphabet (26 for English), N is the length of the text, and n_1 through n_c are the observed ciphertext letter frequencies, as integers. [Tutorial]
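To see where the reference value for a random text comes from, here is a short derivation. The symbol p_i (the probability of the i-th letter of the alphabet) is introduced here for illustration only. It assumes that the letters are drawn independently: the expected IC is then the sum of the squared letter probabilities, and a uniform alphabet of 26 letters gives the 1/26 quoted above.

```latex
\[
\mathbb{E}[\mathrm{IC}] = \sum_{i=1}^{c} p_i^{2},
\qquad
p_i = \tfrac{1}{26}\ \text{for all } i
\;\Longrightarrow\;
\mathbb{E}[\mathrm{IC}] = 26 \cdot \left(\tfrac{1}{26}\right)^{2} = \tfrac{1}{26} \approx 0.0385
\]
```

The same sum taken over the uneven letter probabilities of English is necessarily larger than 1/26, which is why a correct column split stands out.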
IC is a Probability! Not a limit.
When David Stein writes: “some of the I.C.s seem to exceed the theoretical limit of 0.063”, he is badly mistaken.
First, the IC is a probability, and therefore — according to the Kolmogorov axioms — it ranges — by definition — from ZERO to ONE.
[NOTE: Keep this one in mind for the next time you read that “the CIA has concluded with HIGH PROBABILITY…” It is simply frightening that a CIA analyst in the Directorate of Intelligence does not understand such a fundamental concept.]
Secondly, the IC for English is NOT 0.063 but about 0.067. Remember, Stein claims to know the frequency of each letter in English by heart. This story is getting weirder and weirder….
IT IS MATH, Not voodoo.
Let us do just some examples together.
The last column reads F L I B E D. So the IC is obviously ZERO as the probability that two randomly chosen letters are identical is nil since these letters are all different!
Next, consider the case of column 9: R Z F Q R R.
If you pick the first letter (R), the probability to find another R is 2/5.
If you pick the second letter (Z), the probability to find another Z is 0.
If you pick the third letter (F), the probability to find another F is 0.
If you pick the fourth letter (Q), the probability to find another Q is 0.
If you pick the fifth letter (R), the probability to find another R is 2/5.
If you pick the sixth letter (R), the probability to find another R is 2/5.
And thus, the average is 3 times 2/5 divided by six. That is one fifth (0.2), which is way above average but is simply the result of 3 identical letters in just 6 symbols.
Finally, if you had a hypothetical sequence such as “AAAA”, the IC would obviously be ONE since any two letters randomly chosen in that sequence are necessarily identical!
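The three examples above can be checked with exact fractions. This is a small sketch with hypothetical function names: the first function follows the letter-by-letter averaging used above, the second uses the usual counting formula, and both give the same value.

```python
from collections import Counter
from fractions import Fraction


def ic_by_averaging(letters):
    """Average, over every position, the chance that a second pick matches it."""
    n = len(letters)
    counts = Counter(letters)
    chances = [Fraction(counts[ch] - 1, n - 1) for ch in letters]
    return sum(chances, Fraction(0)) / n


def ic_by_formula(letters):
    """Sum of n_i * (n_i - 1) over all letters, divided by N * (N - 1)."""
    n = len(letters)
    counts = Counter(letters)
    return Fraction(sum(c * (c - 1) for c in counts.values()), n * (n - 1))


for column in ("FLIBED", "RZFQRR", "AAAA"):
    print(column, ic_by_averaging(column), ic_by_formula(column))
```

The results are 0 for F L I B E D, 1/5 for R Z F Q R R and 1 for AAAA, so a value above 0.067 in a short column is nothing mysterious.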
Back to The Facts
If a human being can break the first section of this KRYPTOS code, the obvious thing to notice, once you have detected the 10-letter pattern, is that the third column has FOUR identical letters.
And now, the linguistics
There is more than one way to skin a cat. Of course, at this point, you may want to try a plaintext attack against a trigram as “VJY” repeats… You could succeed, but it would require a very serious amount of work. (Again, I will explain later why this would be a painful way to go for a human being…)
Thus, my guess is that the best chance for a real/normal human being to solve this one is to notice that the letter Y appears 4 times in the third column.
So, according to English letter frequency, it is likely to be an E, T, A, O … Should we start with an E? This would imply that the 3rd letter of the “10 letters long passphrase/word” is an L.
[COMMENT: As I will explain in a following post, the probability is very high that this letter “Y” in the cipher is an “E” in the plaintext because E is the most likely letter in English. In this post, we will simply assume that it is the case.]
And now what?
Now, it really makes sense to test a trigram. The most frequent trigrams are “the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men,…”
You will notice that there are not so many that end with an “E”. According to an internet resource: THE, NDE, and NCE are the most likely.
[NOTE: This ranking of the English trigrams is not entirely clear to me. Some resources rank NCE above NDE and others even rank ERE above both of these. Wikipedia suggests the list I mention in this post. But obviously, some communities (Python, Gutenberg Project) do not agree. This is a small but important point which I will discuss in an update.]
For now, just understand that the frequency of a letter or of an N-letter sequence such as a trigram will of course depend on the kind of texts you are sampling. X does not show up too much in an English paper unless you read a paper on X-rays!
THE, NDE, NCE …
With really little work, you will give up on THE and NDE. But NCE looks pretty good! (Again, details will be provided later in an update.)
This would imply that the first three letters of the “10 letters long passphrase/word” are “PAL”.
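Here is a minimal sketch of how a guessed trigram translates into passphrase letters. It assumes the usual description of the KRYPTOS tableau: both the plaintext and the key are read on the alphabet keyed with KRYPTOS, and the position of the cipher letter is the sum of the two positions modulo 26. The function names are hypothetical.

```python
# Alphabet keyed with the word KRYPTOS (assumed layout of the tableau).
KEYED_ALPHABET = "KRYPTOSABCDEFGHIJLMNQUVWXZ"


def key_letter(plain, cipher):
    """Key letter that turns `plain` into `cipher` in the keyed tableau."""
    shift = (KEYED_ALPHABET.index(cipher) - KEYED_ALPHABET.index(plain)) % 26
    return KEYED_ALPHABET[shift]


def key_fragment(plain, cipher):
    """Key letters implied by a guessed plaintext lined up with the ciphertext."""
    return "".join(key_letter(p, c) for p, c in zip(plain, cipher))


print(key_letter("E", "Y"))
for guess in ("THE", "NDE", "NCE"):
    print(guess, key_fragment(guess, "VJY"))
```

Under this convention, a plaintext E under a cipher Y gives the key letter L, and the guess NCE under VJY gives PAL, as stated above. The other two guesses give MYL and PSL.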
At this point the game is actually over. You will finish it the way you like. For instance, you know that the letter before NCE is very likely to be either A or E as in CHANCE or EVIDENCE.
Only one letter in a KRYPTOS keyed Vigenère table satisfies this request: T.
So, assuming that the passphrase is an English word, we know that it is 10 letters long and that it begins with PAL and ends with T.
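The search for the last passphrase letter can be sketched as follows. The assumptions are mine: K1 is my assumed transcription of Section I (not quoted in this post), the tableau works as an addition of positions in the KRYPTOS-keyed alphabet, and "satisfies this request" is read as "decrypts the letter in front of BOTH occurrences of VJY to an A or an E".

```python
# Alphabet keyed with the word KRYPTOS (assumed layout of the tableau).
KEYED_ALPHABET = "KRYPTOSABCDEFGHIJLMNQUVWXZ"

# Assumed transcription of the 63 letters of Section I (not quoted in this post).
K1 = ("EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ"
      "YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD")


def decrypt_letter(cipher, key):
    """Plaintext letter for one cipher letter and one key letter."""
    idx = (KEYED_ALPHABET.index(cipher) - KEYED_ALPHABET.index(key)) % 26
    return KEYED_ALPHABET[idx]


# Cipher letters sitting immediately before each occurrence of VJY.
starts = [i for i in range(len(K1) - 2) if K1.startswith("VJY", i)]
before = [K1[i - 1] for i in starts]

# Key letters that turn every one of them into an A or an E.
candidates = [k for k in KEYED_ALPHABET
              if all(decrypt_letter(c, k) in "AE" for c in before)]
print(before, candidates)
```

Both letters fall in the tenth column, so they share the same key letter, and T is the only candidate left.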
10 Letter words that start with PAL
We searched a large Scrabble dictionary for 10-letter words starting with PAL. Well, there are very few of these and only ONE ends with a T!
Palaestrae Palaestras Palankeens Palanquins Palatalize Palatially Palatinate
Palaverers Palavering Palenesses Paleoliths Palimonies Palimpsest Palindrome
Palisading Palladiums Pallbearer Palletised Palletises Palletized Palletizer
Palletizes Palliasses Palliating Palliation Palliative Palliators Pallidness
Palmations Palmerworm Palmettoes Palmisters Palmitates Paloverdes
Palpations Palpitated Palpitates Palsgraves Paltriness Palynology
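The filtering step is easy to reproduce. This sketch simply reuses the list above as its word list (a real dictionary file would work the same way) and keeps the 10-letter words that start with PAL and end with T.

```python
# The candidate words listed above, copied as-is.
WORDS = """
Palaestrae Palaestras Palankeens Palanquins Palatalize Palatially Palatinate
Palaverers Palavering Palenesses Paleoliths Palimonies Palimpsest Palindrome
Palisading Palladiums Pallbearer Palletised Palletises Palletized Palletizer
Palletizes Palliasses Palliating Palliation Palliative Palliators Pallidness
Palmations Palmerworm Palmettoes Palmisters Palmitates Paloverdes
Palpations Palpitated Palpitates Palsgraves Paltriness Palynology
""".split()

matches = [
    word.upper()
    for word in WORDS
    if len(word) == 10
    and word.upper().startswith("PAL")
    and word.upper().endswith("T")
]
print(matches)
```

The only word that survives the filter is PALIMPSEST.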
This could actually be useful to solve the last section and crack the riddle…
“In textual studies, a palimpsest is a manuscript page, either from a scroll or a book, from which the text has been scraped or washed off so that the page can be reused for another document
In colloquial usage, the term palimpsest is also used in architecture, archaeology, and geomorphology to denote an object made or worked upon for one purpose and later reused for another.”
What do we conclude about the CIA document from David Stein?
Here are the conclusions of Stein’s CIA paper.
“Even though it was not necessary to determine the keywords used in the substitution code in order to read Parts I, II and III, I was curious to find out what they were.
It was easy enough at this point: find the plaintext letters in the top row of the Vigenère Tableau in Figure 1, follow those columns down until the ciphertext letter is found, and then read off the first letters of those rows for the keyword letters.
(Because the Kryptos cipher actually uses a modified version of the Vigenère code, the alphabets down the left-hand side and along the top of Figure 1 first have to be removed for this to work properly.)
It turned out that the eight-letter keyword for the first two lines of Part 1 was “ABSCISSA,” and the 10-letter keyword for Parts II and III was “PALIMPSEST.”
Once again, Stein is wrong, dead wrong. He gets the logic backward. “PALIMPSEST” is the 10-letter word/passphrase for the first part of KRYPTOS while “ABSCISSA” is the 8-letter one for the second part of the code!
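Anyone can check which keyword belongs to Section I. Here is a minimal sketch, under the same assumptions as before: K1 is my assumed transcription of the 63 letters of Section I (not quoted in this post), and the tableau is modelled as a subtraction of positions in the KRYPTOS-keyed alphabet. The function name is hypothetical.

```python
# Alphabet keyed with the word KRYPTOS (assumed layout of the tableau).
KEYED_ALPHABET = "KRYPTOSABCDEFGHIJLMNQUVWXZ"

# Assumed transcription of the 63 letters of Section I (not quoted in this post).
K1 = ("EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ"
      "YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD")


def decrypt(ciphertext, keyword):
    """Keyed Vigenère decryption: subtract key positions in the keyed alphabet."""
    plain = []
    for i, cipher in enumerate(ciphertext):
        key = keyword[i % len(keyword)]
        idx = (KEYED_ALPHABET.index(cipher) - KEYED_ALPHABET.index(key)) % 26
        plain.append(KEYED_ALPHABET[idx])
    return "".join(plain)


print(decrypt(K1, "PALIMPSEST"))
print(decrypt(K1, "ABSCISSA"))
```

With PALIMPSEST the output is the sentence quoted at the top of this post, written without spaces. With ABSCISSA it is not.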
Again, how could a man who claims to have spent five hundred hours on this code get this conclusion so completely wrong?
At the very least, it shows clearly one of the many dangers of secrecy. Stein’s paper would never have been approved for a publication in a peer-reviewed magazine. That much is obvious. But I fear that there is more to that story… Stay tuned!
Kryptos: The CIA’s Unsolved Secret Code
Kryptos remains one of the most famous unsolved codes in the world today.
Since the encrypted sculpture was placed on display by American artist Jim Sanborn on the grounds of the Central Intelligence Agency (CIA) in Langley, Virginia, in 1989, there has been much speculation about the meaning of the encrypted messages it bears.
Of the four messages, three have been solved, with the fourth remaining a mystery. Over the years, hints and slight cracks have appeared in the armour of this puzzle, however its continuity of being one of the greatest enigmas of all time continues to provide a diversion for cryptanalysts, both amateur and professional, who are attempting to decrypt the final section.
Even after solving the final section, the final riddle of this enigma within an enigma must be worked out, which from the solutions so far seem to be connected with (1) Illusion in Darkness, (2) Using the Earth’s magnetic field being transmitted to something buried underground and (3) Ancient Egyptian tombs, with the clue (4) still waiting to be solved after more than 2 decades.
Kryptos — Wikipedia
Stein, David D. (1999). “The Puzzle at CIA Headquarters: Cracking the Courtyard Crypto” (pdf). Studies in Intelligence. 43 (1).
Vigenère cipher — Wikipedia