Encoding and Extraneous Characters

Came across and interesting problem. We decode a base64 string that is received as a parameter and decode it to work out where to build a new path. The problem here is that the extra characters \xDB\x9D\x00 shouldn’t be there and cause YAML to blow up

2.2.2 :017 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A')
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n"
2.2.2 :023 > YAML.load("---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n")
Psych::SyntaxError: (<unknown>): control characters are not allowed at line 1 column 1

Somehow control characters are appearing in our Base64 string and we need to strip them. It just doesn’t seem as simple as

2.2.2 :025 > "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n".gsub("\xDB\x98\x00",'')
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:۝\u0000Project\n  attributes:\n    id: 8025\n" 

I found that if you call ord on a single string character you get the ASCII code

2.2.2 :042 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').chars[45..47].map(&:ord)
 => [219, 157, 0] 

So if I strip those ASCII characters from my string I should be good? Right?

class String
  def without_extended_characters
    chars.select {|s| (1..127).include?(s.ord) }.join

and then

2.2.2 :043 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').without_extended_characters
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n" 

That’s much better and now YAML can decode this successfully.

It is at this stage I notice…. %250A in the Base64 string and at the end of the string. This represents a carriage return, that can happen in parameters easily enough, so I am decoding a carriage return and then stripping it from the string. I should just strip the carriage returns in the first place.

2.2.2 :045 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A'.gsub('%250A',''))
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n" 

Oh well, I learnt about String#ord all the same.