Matasano Crypto Challenges: Set 1

I’ve played around with the Matasano Crypto Challenges several times in the last few years. I finally decided to just sit down and go through them all - partly to prove to myself that I could actually understand crypto problems, if only I tried. It also seemed like a great opportunity to brush up on my Ruby, as a long-time Python advocate.

Convert hex to base64

This challenge should have been simple, but since I don’t know much Ruby, I ended up spending a lot of time muddling through Ruby’s confusing pack and unpack syntax. While documented fairly thoroughly here, I’ve yet to find a reason why pack('H*') converts a hex string to byte string, while pack('m') converts a byte string to base64. To me, it seems like one of these should be unpack, but then, I’m coming from Python where this would be an decode/encode pair.

I ended up writing short utility functions so that I didn’t have to think about which way I wanted to go anymore. I’ve used m0 for encoding base64 values, so that they all appear on one line, but m for decoding, as that’s the format all challenges in this set are given in.

def hex_to_bytes(arg); [arg].pack('H*'); end
def bytes_to_hex(arg); arg.unpack('H*').first; end

def bytes_to_base64(arg); [arg].pack('m0'); end
def base64_to_bytes(arg); arg.unpack('m').first; end

def hex_to_base64(arg)
  bytes_to_base64(hex_to_bytes(arg))
end

Fixed XOR

I originally solved this challenge by converting the hex strings to integers using to_i(16), however this led to problems later - I needed to convert byte strings to hex before xoring them, and I also ended up hackily padding the number back to its original length with zeros when the first bytes xored to null. I think xoring char by char is probably a more natural way to think about the problem.

def xor_byte_strings(xor1, xor2)
  xored = xor1.chars.zip(xor2.chars).map do |c1, c2|
    (c1.ord ^ c2.ord).chr
  end

  xored.join('')
end

def xor_hex_strings(xor1, xor2)
  return bytes_to_hex(xor_byte_strings(hex_to_bytes(xor1), hex_to_bytes(xor2)))
end

Single-byte XOR cipher

The pertinent part of this challenge is figuring out how to score a string. All samples up to this point were in ASCII, which only has 7 bits of valid characters, so I rejected strings containing bytes with the first bit set. I also rejected strings with bytes lower than \x10, since those are ASCII control characters. Finally, we’ll define a set of good characters, and score the string on the percentage of those it has: alphanumerics and whitespace being acceptable.

def get_string_score(str)
  return -1 if str.chars.any? {|c| c.ord >= 0x80 || c.ord < 0x0a}
  str.gsub(/[^a-zA-Z0-9 ]/, '').length.to_f / str.length.to_f
end

def find_single_byte_xor(str)
  results = (0x00..0xff).map do |c|
    decoded = xor_byte_strings(str, c.chr * str.length)
    [c, decoded, get_string_score(decoded)]
  end

  results.sort_by {|c, d, s| s}.reverse[0]
end

def find_single_byte_xor_hex(hexstr)
  find_single_byte_xor(hex_to_bytes(hexstr))
end

Detect single-character XOR

This challenge is fairly simple - find the best single-xor result for each string, then find the best result from among those results.

def detect_single_byte_xor(filename)
  results = File.open(filename).each_line.map do |line|
    find_single_byte_xor_hex(line.strip)
  end

  results.sort_by{|c, d, s| s}.reverse[0]
end

Implement repeating-key XOR

Again, fairly trivial - the hardest part of this challenge is figuring out how to pad the key to the right length. I’m not completely satisfied with my solution in the end, but it does work.

def encode_repeating_key_xor(msg, key)
  repeated_key = (key * ((msg.length / key.length) + 1))[0..msg.length - 1]
  xor_byte_strings(msg, repeated_key)
end

Break repeating-key XOR

The warning on this page is correct - this challenge was definitely the most frustrating of this set. Writing the edit distance function was not particularly difficult, though I did debate the efficiency of converting the integer to a binary string. I originally wrote this using an extra nested loop, that would bit shift the xored result, but decided that converting to a binary string was clearer.

def get_edit_distance(a, b)
  (a.chars.zip(b.chars).map {|c, d| (c.ord ^ d.ord).to_s(2).count('1')}).inject(:+)
end

Figuring out the best key size was slightly fiddly, but mostly just because I wasn’t familiar with Ruby’s slicing syntax. I’m finding the .inject(:+) trick very frustrating - I wish Ruby had a .sum method.

def get_block(str, block_size, block_num)
  start_index = block_size * block_num
  end_index = (block_size * (block_num + 1)) - 1
  str[start_index..end_index]
end

def get_best_key_sizes(str, max_key_len=40, num_sample_blocks=10)
  results = (2..max_key_len).map do |key_len|
    # Get the edit distance between the first N block pairs
    distance = ((num_sample_blocks - 1).times.map do |i|
      get_edit_distance(get_block(str, key_len, i), get_block(str, key_len, i + 1))
    end).inject(:+)

    [key_len, (distance.to_f / key_len.to_f) / num_sample_blocks]
  end

  results.sort_by{|k, d| d}.map{|k, d| k}
end

The final step of determining which of the top N key sizes was valid was also a bit annoying - I forgot to discount any blocks with invalid characters the first time, which lead to some weird edge cases. There was also a bit of mucking around with Ruby arrays required to get the .transpose working correctly.

def get_blocks(str, block_len)
  blocks = Hash.new('')
  str.chars.each_with_index {|c, i| blocks[i % block_len] += c}
  block_len.times.map {|i| blocks[i]}
end

def rpad_arr(arr, len, pad_with)
  arr.fill(pad_with, arr.length...len)
end

def flatten_blocks(blocks)
  (blocks.map {|b| rpad_arr(b.chars, blocks[0].length, '')}).transpose.join
end

def break_repeating_key_xor(contents)
  get_best_key_sizes(contents).each do |key_len|
    blocks = get_blocks(contents, key_len)

    results = blocks.map{|b| find_single_byte_xor(b)}
    # If any blocks decrypt invalidly, skip!
    next if results.any? {|_, _, score| score == -1}
    return flatten_blocks(results.map{|_, d, _| d})
  end
end

def break_repeating_key_xor_file(filename)
  break_repeating_key_xor(base64_to_bytes(File.open(filename).read))
end

AES in ECB mode

This challenge was like the first - it should have been easy, but Ruby made it hard. In this case, because the Ruby OpenSSL implentation of AES-ECB automatically adds padding to ciphertexts. This would be fine, were it well-documented, but I ended up scouring the Internet, confused by why Python was giving me a different ciphertext in what seemed to be the same conditions.

def decrypt_aes_ecb(to_decrypt, key)
  decipher = OpenSSL::Cipher.new('AES-128-ECB')
  decipher.decrypt
  decipher.key = key
  decipher.padding = 0
  decipher.update(to_decrypt) + decipher.final
end

def decrypt_aes_ecb_file(filename, key)
  decrypt_aes_ecb(read_file_contents(filename), key)
end

Detect AES in ECB mode

This challenge didn’t feel like it really had an end - unlike the others, it was very difficult to tell if I’d successfully solved it. I picked a ciphertext, but as far as I know, there’s no way for me to tell if it’s the right one.

My solution here was to check for ciphertexts with the same blocks repeated, as this seemed to be what was hinted at. There was also only a single ciphertext with this property.

def get_chunks(str, chunk_size)
  (0..str.length - 1).step(chunk_size).map do |i|
    str[i..(i + chunk_size - 1)]
  end
end

def detect_aes_ecb(ciphertexts)
  results = [ciphertexts[132]].map do |c|
    chunks = get_chunks(c, 16)

    # For each combination of chunks, check if they are identical
    num_identical_chunks = (chunks.each_with_index.map do |c1, i|
      (chunks.each_with_index.map do |c2, j|
        next if j <= i
        c1 == c2
      end).count(true)
    end).inject(:+)

    [c, num_identical_chunks]
  end

  results.sort_by{|l, s| s}.map{|l, _| l}.reverse[0]
end

def detect_aes_ecb_file(filename)
  bytes_to_hex(detect_aes_ecb(File.readlines(filename).map {|l| hex_to_bytes(l.strip)}))
end

Putting it all together

This set was primarily frustrating - it’s the least fun set in my opinion, and can be pretty boring if you already know some basic crypto. I felt like I was fighting with language details, rather than interesting problems. However, I was writing in an unfamiliar language, and things definitely get better from here on in.

My solutions for this challenge are up on Github here, along with test cases, as best as I could determine a correct solution.