200-year-old ciphers may reveal the location of treasure buried in Virginia

rwallace · on June 17, 2018

> That attitude would reign among professional cryptanalysts until January 1970, when Dr. Carl Hammer, Director of Computer Sciences at Sperry-Univac, made a startling revelation at the Third Annual Simulation Symposium in Tampa, Florida. He had analyzed the Beale ciphers with a UNIVAC 1108 computer and compared the codes to the musings of a random number generator. The results showed signs of an intelligent pattern.

> “Beale Cyphers 1 and 3 are ‘for real,’” Hammer concluded. “They are not random doodles but do contain intelligence and messages of some sort. Further attempts at decoding are indeed warranted.”

Be careful of concluding that. The human brain is a bad random number generator. A hoaxer trying to write random gibberish in those days would do it by hand, not by rolling dice to ensure uniform randomness, and the handwritten gibberish would inevitably contain patterns.

Animats · on June 17, 2018

A book code. Someone could run it against every digitized book earlier than the date of the code. With enough off-peak AWS instances...

janwillemb · on June 17, 2018

The problem would be to define what is considered a successful outcome, measurable by an automated system.

ComplexSystems · on June 17, 2018

There are, I dunno, maybe 1,000,000 words in the English language. With a book cipher, almost all books will produce gibberish words not on the list. If you find one for which almost all words are magically on that list of 1,000,000 words, you win.

I get the appeal of the notion that some second grader could stumble on the winning book and claim the prize, but this article is like halfway between cryptographic fantasy and reality. Google suggests there are only 130,000,000 books in existence, which would be more than easy enough to brute force if we had access to all of them. Furthermore, there is a clear success criterion. The limiting factor is that not all books have been digitized, but if the book is relatively well known, it should be.

I'm surprised someone at Google hasn't just put this whole thing to bed by now.
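One way to make that success criterion measurable is sketched below. It is only a rough sketch: `dictionaryCoverage`, the two-letter minimum word length, and the `maxWordLen` default are my own assumptions, not anything from the article. Since a book-cipher decoding comes out as a run of letters without spaces, the function scores a candidate by the largest fraction of characters that can be covered by non-overlapping dictionary words.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Fraction of characters in `text` that can be covered by non-overlapping
// dictionary words (hypothetical scoring helper, not from the article).
// best[k] holds the maximum number of covered characters within text[0..k).
double dictionaryCoverage(const std::string& text,
                          const std::unordered_set<std::string>& words,
                          std::size_t maxWordLen = 12) {
  if (text.empty()) return 0.0;
  std::vector<std::size_t> best(text.size() + 1, 0);
  for (std::size_t end = 1; end <= text.size(); ++end) {
    best[end] = best[end - 1];  // option: leave text[end-1] uncovered
    // Try every word of length 2..maxWordLen that ends at position `end`.
    for (std::size_t len = 2; len <= std::min(maxWordLen, end); ++len) {
      if (words.count(text.substr(end - len, len)))
        best[end] = std::max(best[end], best[end - len] + len);
    }
  }
  return static_cast<double>(best.back()) / text.size();
}
```

A wrong book should score low and the right one close to 1.0, although any real threshold would have to tolerate miscounted words in the key.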

tobinfricke · on June 17, 2018

See my post below. Even using the supposedly-correct key, the decoding produces near-gibberish. It would certainly not pass a test that requires the decoded plaintext to consist of dictionary words.

tobinfricke · on June 17, 2018

Indeed, even if you decode the supposedly "cracked" Beale letter, you get nearly complete gibberish:

ihaie depos otedi nthec opntt olBed oorta boupf ourmi lesfr ombul ordsi nanep caiat ionor iault sipfe stbel owthe surla csoft hhgto undth sfotl owing artic issbe aongi ngjoi otltt othep artfe swhos lnamf sateg iieti nnumb erthr ffhtt ewith

https://nibot-lab.livejournal.com/tag/beale%20ciphers

It's almost certainly a hoax.

DoctorOetker · on June 17, 2018

Your frequency plot is very peculiar and almost unexplainable:

1) it can be explained if the author was intelligent enough to understand letter-frequency cryptanalysis, but how widespread would that idea have been in his day and age?

2) even though the symbol frequencies match up, the text seems gibberish, which for now I can only explain as follows:

The author of the pamphlet somehow obtained the 3 original ciphertexts and sincerely believed in their authenticity (even if his source may have scammed him). He identified the DOI as the correct key for ciphertext 2 (this insight too may have come from the potential source). Trivially, the same key is tried on ciphertext 1. The author notices the quasi-alphabet and otherwise gibberish. Unable to solve the puzzle, he notes that if he cuts up decipherable ciphertext 2 into, say, 7-letter chunks, they clearly form random parts of English phrases. Similarly, he cuts up ciphertext 1 into, say, 7-letter chunks (but leaves the alphabet part intact) and publishes it, in the hope that someone will find a "weird, convincing but failed decoding", such that he only has to restore the ordering of the letters (since he knows how he reordered the 7-letter chunks).

I.e. the pamphlet author might sincerely believe in the authenticity of the original ciphertext, but is trying to scam his audience into solving the puzzle for him!

mickronome · on June 17, 2018

It's almost certainly a hoax, but any book substitution cipher is very sensitive to errors both in counting and in exactly which book is used. From that perspective, if you remove the spaces, it's almost intelligible even to a non-native speaker. Observe:

  ihaiedeposotedinthecopnttolBedoortaboupfourmilesfrombulordsinanepcaiationoriaultsipfestbelowthesurlacsofthhgtoundthsfotlowingarticissbeaongingjoiotlttothepartfeswhoslnamfsategiietinnumberthr ffhttewith

i haie deposoted in the copntt ol Bedoort aboup four miles from bulords in an epcaiation or iault sip fest below the surlacs of thh gtound ths fotlowing articiss beaonging joiotltto the partfes whosl namfs ate giiet in number thrffhtte with

I have deposited in the county of Bedoort (Bedford?) about four miles from bulords in an epcaiation or vault/fault six feet below the surface of the ground.

Supposedly epcaiation is excavation, that one I didn't get. Some of the latter parts I also have trouble reading without guessing a lot. Like 'joiotltto' and 'thrffhtte'.

killaken2000 · on June 17, 2018

Also, in the past there was no standardized way of writing words, so it's possible that some spellings are off by today's standards.

I remember when the spellings of words became standardized.

paulie_a · on June 17, 2018

> Supposedly epcaiation is excavation

I buy it, spelling in that area at that time wasn't exactly top-notch. I know that is stereotypical, but come on.

DoctorOetker · on June 17, 2018

I also believe it was a hoax, but this is not a good argument for it being a hoax:

The perpetrator would expect buyers of the pamphlet to try to reproduce the decoding of the second ciphertext, so presumably there were versions of the declaration that correctly formed the key.

For the edits (compared to original declaration) see also: https://en.wikipedia.org/wiki/Beale_ciphers#Deciphered_messa...

I would like to see a Python (or other) script with the most important observations reproduced (such as the alphabet-like sequence).

tobinfricke · on June 17, 2018

> The perpetrator would expect buyers of the pamphlet to try to reproduce the decoding of the second ciphertext, so presumably there were versions of the declaration that correctly formed the key.

The version of the DoI that was used to decode ciphertext #2 was included with the pamphlet. It does not produce an error-free decoding.

> I would like to see a Python (or other) script with the most important observations reproduced (such as the alphabet-like sequence)

The link in the post to which you are replying contains, at the bottom, all three ciphertexts, the key text, and a C++ program that performs the decoding.

DoctorOetker · on June 18, 2018

The alphabetic sequences in the first "cipher" may simply be key material: you can speed up your work by constructing a look-up table that maps each letter of the alphabet to a list of suitable numbers for words starting with that letter.

While encoding is a work in progress this is a perfectly good tool to use, but it is dangerous to explicitly write the letters, e.g. "A: 107, 204, ... B: 48, 87, ..." (I made up the numbers), because if the lookup table is found with the ciphertext, the attacker no longer needs to find the correct book (DoI)!

An alternative to actually writing down the letters on the same piece of paper is to have a second strip of paper, to be held above/below the numbers, in an aligned fashion, so that one paper has "A [whitespaces] B [whitespaces] C ..." and the second has some numbers corresponding to the letters. (If you look at things like "indenture", the concept of aligning paper was a commonly used trick for verification and similar purposes.)

The grass is always greener on the other side:

1) whenever the encoder was working without a lookup table, he laboriously had to traverse the DoI looking for a suitable word, wishing he had a lookup table

2) whenever the encoder retried constructing a good lookup table, he (erroneously?) felt he was wasting his time constructing a lookup table instead of encoding the text

This could explain the restarting runs of the alphabet (with letters recurring, i.e. aaaabbbccc...), and it also explains why the number distribution is flatter compared to N and E, as in "Where are the N and E characters" at http://rogergrambihler.tripod.com/BealeHoax.htm
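A minimal sketch of the kind of look-up table I mean, assuming the key text is split on whitespace and words are numbered from 1 (`buildLookupTable` is a name I made up; a real run would feed it the DoI):

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Map each initial letter to the 1-based positions of the key-text words
// that start with it. `keyText` stands in for the DoI (an assumption).
std::map<char, std::vector<int>> buildLookupTable(const std::string& keyText) {
  std::map<char, std::vector<int>> table;
  std::istringstream words(keyText);
  std::string word;
  int position = 0;
  while (words >> word) {
    ++position;  // every whitespace-separated token counts as one word
    unsigned char first = static_cast<unsigned char>(word[0]);
    if (std::isalpha(first))
      table[static_cast<char>(std::tolower(first))].push_back(position);
  }
  return table;
}
```

Reading the table out letter by letter gives exactly the kind of a..., b..., c... run seen in the first "cipher": for each letter, a handful of numbers whose words begin with it.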

DoctorOetker · on June 18, 2018

You probably know of this page, but I think this has the clearest explanation for decoding errors:

http://rogergrambihler.tripod.com/BealeHoax.htm

although he doesn't notice his own mismatch of "hith" instead of "with" (I haven't checked/reproduced his mapping in code; I intend to implement his decoding with quirks by adapting your .c file. If you wish, I can send it over when done).

DoctorOetker · on June 18, 2018

Regarding the similar distribution you pointed out in the plot, I was considering that perhaps the cipher numbers were ordered in a gridlike fashion on the original, such that perhaps vertical or diagonal reading, or more exact positioning would reveal extra information.

So I was browsing the NSA PDF files "The Beale Papers" from archive.org, and apparently, according to these files, there are even different versions of the CIPHERTEXTS!

It is unclear to me why there would be different ciphertexts floating around, possibly:

1) the pamphlet author published multiple versions

2) after analysis, later authors "fixed" the ciphertexts as opposed to describing the encryption errors in the decoding mechanism

3) flat-out disinformation: by the publisher (how does the anonymous author defend the true ciphertext? would other publishers at the time dare to publish a second version of the cipher? how does this author prove he is the same anonymous person?), or by treasure hunters (you can recognize your own manipulation, but this confuses everyone else)

Do we know what happened with the publisher? Did it merge with others, and is it still in existence under a new name? Do they perhaps have any early original (from the box, or the original manuscript from the anonymous person, etc.)?

DoctorOetker · on June 18, 2018

I fixed your decryptor. The code is here; feel free to update, no credit needed:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

char quirkdecode(int i, const vector<string>& dict);

int main() {

  // Load the dictionary: the key text (DoI), one word per entry
  vector<string> dict;

  ifstream dictstream("doi.txt");
  string str;
  while (dictstream >> str)
    dict.push_back(str);

  // Read cipher numbers from stdin; print 5 letters per group, 75 per line
  int i, j = 0;
  while (cin >> i) {
    j++;

    cout << quirkdecode(i, dict);
    if (j % 5 == 0) cout << " ";
    if (j % 75 == 0) cout << endl;
  }

  cout << endl;
  return 0;
}

// Map a cipher number to the first letter of the matching key word,
// compensating for the numbering quirks of the pamphlet's DoI.
char quirkdecode(int i, const vector<string>& dict) {
  int delta = 0;

  if (i == 155) return 'a'; // Beale Paper DOI has extra 'a' in "institute a new government"
  if (i > 155)  delta -= 1; // Compensate for missing 'a' in doi.txt

  if (i >= 242) delta += 1;  // 241 decrypts correct 246 not so delta  +1 @ [241-245]
  if (i >= 480) delta += 10; // 466 decrypts correct 485 not so delta +10 @ [467-485] known misnumbering
  if (i >= 510) delta -= 1;  // TODO locate precisely
  if (i >= 621) delta += 1;  // 620 decrypts correct 643 is not so delta  +1 @ [621-643]
  if (i >= 667) delta += 1;  // 666 decrypts correct 807 is not so delta  +1 @ [667-807] ? double check ??

  // Hardcoded Jokers: NOTE 811 and 1005 are also the highest used codes, so it DOES seem to be intended as real
  if (i == 811) return 'y';  // fundamentallY
  if (i == 1005) return 'x'; // seXes

  i += delta;
  // Cipher numbers are 1-based; anything outside the key text decodes to '!'
  if (i >= 1 && static_cast<size_t>(i) <= dict.size())
    return dict.at(i - 1)[0];
  else
    return '!';
}
```

which gives:

```
ihave depos itedi nthec ounty ofBed forda boutf ourmi lesfr ombuf ordsi nanex cavat ionor
vault sixfe etbel owthe surfa ceoft hegro undth efoll owing artic lesbe longi ngjoi ntlyt
othep artie swhos ename sareg iveni nnumb erthr eeher ewith thefi rstde posit consi stcdo
ftenh undre dandf ourte enpou ndsof golda ndthi rtyei ghthu ndred andtw elvep ounds ofsil
verde posit ednov eight eenni netee nthes econd Wasma dedec eight eentw entyo neand consi
stedo fnine teenh undre dands evenp ounds ofgol dandt welve hundr edand eight yeigh tofsi
lvera lsoje welso btain edins tloui sinex chang etosa vetra nspor tatio nandv alued atthi
rteen rhous anddo llars theab oveis secur elypa ckedi niron potsw ithir oncov ersth evaul
tisro ughly lined withs tonea ndthe vesse lsres tonso lidst onean darec overe dwith other
spape rnumb erone descr ibest hcexa ctloc ality ofthe varlt sotha tnodi fficu ltywi llbeh
adinf indin git
```

analogmemory · on June 17, 2018

Wouldn't a successful outcome just be recognizable sentences?

DoctorOetker · on June 18, 2018

Wow!

What is the probability that a trivial variation on the decoding method results in otherwise unreadable text BUT STARTING with "sited at" ??? Roughly 1 in 26^7 (seven letters, if every letter were equally likely)?!?

1) I used/fixed Tobin Fricke's (@tobinfricke) doi-decode.c to first fix the indexing/versions of the Declaration of Independence. The result can be found in the thread: https://news.ycombinator.com/item?id=17337421

2) Due to the quasi-alphabetic sequences in beale.1 I came to the conclusion that beale.1 is in fact just scratchpad notes containing fragments of a look-up table to use while encrypting, so it is not the location file. Bummed at the lack of a location file, I just pressed on out of curiosity for the names file (beale.3)

3) I next tried some trivial variations on the cipher concept (second letter of each word, last letter, etc.) such that the Kolmogorov complexity increase of the decoding would be nearly zero, but always got rubbish.

4) Until I tried a trivial variation (not giving yet, perhaps after convincing myself it is absurd coincidence), giving "sitedat...." with "..." being rubbish text.

EDIT1: the same decoding also contains "twograil" somewhere in the middle, but not as convincing. Does "two grail" signify something in the area?

hagreet · on June 17, 2018

"As long as a key is available, a substitution cipher is a safe, simple way to encrypt a message." ...quality article

yorwba · on June 17, 2018

I can't tell whether you're being sarcastic, but the article is essentially correct. The security of an encryption algorithm doesn't depend on how complex it is if you never reuse the key, because a uniformly random key produces uniformly random output for any input. Only key reuse can introduce statistical regularities that allow cryptanalysis to be applied. The reason most encryption algorithms are more complex than simple substitution is exactly that they are intended to allow applying a relatively short key multiple times, both to encrypt messages longer than the key and to encrypt multiple messages.

ShorsHammer · on June 17, 2018

Also known as a One Time Pad for anyone wanting more info. Generally the key and message are XORed if it's digital, and that's the entirety of the encryption algorithm.

Used properly it's proven to be unbreakable.
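A minimal sketch of the digital version, assuming byte strings and a pad that is at least as long as the message (`xorWithPad` is my own name for it). Applying the same function a second time with the same pad recovers the message:

```cpp
#include <cassert>
#include <string>

// XOR each message byte with the pad byte at the same position.
// Running the output through this function again with the same pad
// returns the original message.
std::string xorWithPad(const std::string& message, const std::string& pad) {
  assert(pad.size() >= message.size());  // the pad must never wrap or be reused
  std::string out(message.size(), '\0');
  for (std::size_t k = 0; k < message.size(); ++k)
    out[k] = static_cast<char>(message[k] ^ pad[k]);
  return out;
}
```

The "used properly" part is everything the code does not show: the pad has to be truly random, kept secret, and thrown away after one message.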

empath75 · on June 17, 2018

It’s breakable if you use some publicly published document as the key, as in cipher 2.

olliej · on June 17, 2018

The algorithm isn’t broken, the definition of a good encryption scheme is that it can’t be decrypted unless you have the key, in this case the algorithm is to substitute each letter with a lookup into a document. The non-public key is the name of the document not the document itself.

Ostensibly the author decrypted the second document by doing a brute force search of the key space (that is, the set of documents available at the time). This is functionally the same as AES - just a much smaller key space.

shawnz · on June 17, 2018

> a uniformly random key produces uniformly random output for any input.

This is clearly not true for a simple substitution cipher though, otherwise it couldn't be attacked with frequency analysis
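To illustrate, here is a rough sketch (the helper names and the idea of comparing sorted count "profiles" are my own assumptions). A simple substitution only relabels the letters, so the sorted letter counts of the ciphertext are identical to those of the plaintext, and that is the handle frequency analysis grabs:

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <functional>
#include <string>

// Count occurrences of each letter a-z (case-insensitive).
std::array<int, 26> letterCounts(const std::string& text) {
  std::array<int, 26> counts{};
  for (unsigned char c : text)
    if (std::isalpha(c)) ++counts[std::tolower(c) - 'a'];
  return counts;
}

// Apply a simple substitution to lowercase letters; `alphabet` must be a
// permutation of a-z. Other characters are left unchanged.
std::string substitute(const std::string& text, const std::string& alphabet) {
  std::string out = text;
  for (char& c : out)
    if (c >= 'a' && c <= 'z') c = alphabet[c - 'a'];
  return out;
}

// Letter counts sorted in descending order: the frequency profile of a text.
std::array<int, 26> sortedProfile(const std::string& text) {
  auto counts = letterCounts(text);
  std::sort(counts.begin(), counts.end(), std::greater<int>());
  return counts;
}
```

For any plaintext and any permutation, `sortedProfile(substitute(text, alphabet))` equals `sortedProfile(text)`, so the most common ciphertext symbol is simply the most common plaintext letter in disguise.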

olliej · on June 17, 2018

A one time pad is a specific case of substitution cipher (it’s a generalization of Vigenère) where the key is as long as the document. It is provably secure - as in it is actually impossible to break.

The reason one time pads are not used in general is that you need a “perfect” rng, and you have to be able to get the random values to the recipient. That old “person traveling with a briefcase of secrets” trope was a real thing. Key distribution is the problem solved by public key cryptography. But you can’t use one time pads with public key crypto, because the weakness is then breaking public keys (which is probably possible).

Stream ciphers loosely act like a one time pad in that you generate a “random” stream and xor it with the message. But that doesn’t reach the actual requirement of security for a one time pad, because the key is the RNG seed, which means you can brute force the seed key space and only the correct key will produce a completely sensible decrypted output.

A true one time pad means that a brute force search of the key space for a message of length N will find every valid message of length N.

Eg an 11 letter message would produce (among others) “hello world” and “hello earth” as well as “die planet!”.
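A small sketch of why that holds for an XOR pad (function names are mine, and byte strings are assumed). For any candidate message of the right length you can compute a pad that "decrypts" the ciphertext to it, so the ciphertext alone cannot favor one candidate over another:

```cpp
#include <cassert>
#include <string>

// XOR two equal-length byte strings.
std::string xorBytes(const std::string& a, const std::string& b) {
  assert(a.size() == b.size());
  std::string out(a.size(), '\0');
  for (std::size_t k = 0; k < a.size(); ++k)
    out[k] = static_cast<char>(a[k] ^ b[k]);
  return out;
}

// The pad under which `ciphertext` would decrypt to `candidate`.
std::string padFor(const std::string& ciphertext, const std::string& candidate) {
  return xorBytes(ciphertext, candidate);
}
```

Given a ciphertext `c`, `xorBytes(c, padFor(c, "die planet!"))` returns "die planet!" just as readily as the right pad returns the real message.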

yorwba · on June 17, 2018

It is true for any cipher, but remember that you are not allowed to reuse the key. If you are just scrambling the alphabet, you can never encrypt more than a single character without key reuse.

mhluongo · on June 17, 2018

Two different techniques here. One-time pad is a strong random cipher, versus a typical "lookup" substitution cipher which is garbage.

raverbashing · on June 17, 2018

There's a world of difference between a substitution cypher that maps the same characters to the same code points and one that doesn't.

The former is trivially crackable and the latter is "one time pad" hard (which is how the texts were created).

AppleseedJenny · on June 17, 2018

Yeah. They should just use ROT13 as substitution. Then key availability is not an issue anymore.

onion2k · on June 17, 2018

For extra security it's important to run ROT13 twice.

olliej · on June 17, 2018

AppleseedJenny · on June 17, 2018

You get it.

flashman · on June 18, 2018

The first cipher gives directions in yards, in this order: east, south, west, north. When followed in order, this gives a location just north of an old dolomite quarry, 3.95 miles from Buford's Tavern. It's not far off the Appalachian Trail, but not so close that you would stumble on it accidentally.

wet_grass_sound · on June 17, 2018

What is the point of finding it if the govt is going to claim it?

nighthawk1 · on June 17, 2018

After Black Swan, it seems like big treasure hunting is close to hopeless: https://en.m.wikipedia.org/wiki/Black_Swan_Project

sethrin · on June 17, 2018

> "The ineffable truth of this case is that the Mercedes is a naval vessel of Spain and that the wreck of this naval vessel, the vessel's cargo, and any human remains are the natural and legal patrimony of Spain."

Legal opinion seems to be fairly solid on this point. Perhaps the lesson instead should be to not pillage national warships.

macintux · on June 17, 2018

Virginia has a finders keepers law per the article.

killaken2000 · on June 17, 2018

I hear that money has a connection to crime. Time for a civil asset forfeiture.

behringer · on June 17, 2018

Don't store your treasure in the US...