Encoding and Extraneous Characters
I came across an interesting problem. We receive a Base64 string as a parameter and decode it to work out where to build a new path. The problem is that the extra bytes \xDB\x9D\x00 shouldn’t be there, and they cause YAML to blow up:
2.2.2 :017 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A')
=> "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n"
2.2.2 :023 > YAML.load("---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n")
Psych::SyntaxError: (<unknown>): control characters are not allowed at line 1 column 1
Somehow control characters are appearing in the decoded string and we need to strip them. It doesn’t seem to be as simple as a gsub, though:
2.2.2 :025 > "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n".gsub("\xDB\x98\x00",'')
=> "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\u0000Project\n  attributes:\n    id: 8025\n"
I found that calling ord on a single-character string returns its integer ordinal. decode64 returns a binary (ASCII-8BIT) string, so here each ordinal is simply a byte value:
2.2.2 :042 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').chars[45..47].map(&:ord)
=> [219, 157, 0]
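Two of those values are above 127, so they are not ASCII codes at all. To see what ord is really returning, it may help to compare a binary string with a UTF-8 one. This is only a small sketch; the constant names BINARY and UTF8 are made up for illustration and do not come from the application.

```ruby
# Illustrative only: BINARY and UTF8 are made-up names.
BINARY = "\xDB\x9D\x00".b   # String#b returns a copy tagged as binary (ASCII-8BIT)
UTF8   = "\u06DD"           # one UTF-8 character that is stored as the bytes DB 9D

p BINARY.chars.map(&:ord)   # => [219, 157, 0]  one "character" per byte
p BINARY.bytes              # => [219, 157, 0]  same values without going through chars

p UTF8.ord                  # => 1757           a code point, not a byte
p UTF8.bytes                # => [219, 157]
```

For a binary string, bytes gives the same numbers as chars.map(&:ord) and says more directly what is being inspected.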
So if I strip every character outside the ASCII range (and the NUL byte) from my string I should be good? Right?
class String
  def without_extended_characters
    chars.select { |s| (1..127).include?(s.ord) }.join
  end
end
and then
2.2.2 :043 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').without_extended_characters
=> "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n"
That’s much better and now YAML can decode this successfully.
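The same filter can be written without reopening String. The following is a rough sketch rather than the code we actually shipped; strip_non_ascii is a hypothetical name, and it assumes the payload is expected to be pure ASCII.

```ruby
# Hypothetical helper (not from the original code): keep only bytes 1..127,
# the same range as without_extended_characters, without reopening String.
def strip_non_ascii(str)
  kept = str.each_byte.select { |byte| byte.between?(1, 127) }
  # pack("C*") rebuilds a string from the byte values; every remaining
  # byte is ASCII, so tagging the result as US-ASCII is safe.
  kept.pack("C*").force_encoding(Encoding::US_ASCII)
end

decoded = "ActiveRecord:\xDB\x9D\x00Project".b
p strip_non_ascii(decoded)   # => "ActiveRecord:Project"
```

Like the String version, this would also throw away any legitimate multi-byte UTF-8 text, so it only suits data that should never contain it.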
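It is at this stage that I notice the %250A in the middle of the Base64 string and again at the end. This is a double URL-encoded line feed: %25 is an escaped %, so %250A unescapes to %0A, which is a newline rather than a carriage return. That can happen to parameters easily enough, most likely because Base64.encode64 inserts a line feed after every 60 characters of output, which is exactly where the first %250A sits. decode64 skips the % and decodes the four characters 250A into the three bytes \xDB\x9D\x00, so I am decoding an escaped newline and then stripping the result from the string. I should just strip the escaped newlines in the first place.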
2.2.2 :045 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A'.gsub('%250A',''))
=> "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n"
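To check where the three stray bytes come from, the %250A fragment can be unescaped and decoded on its own. The sketch below also shows one possible guard, assuming the caller would rather reject a mangled parameter than clean it up; decode_or_nil is a hypothetical helper, not something from our codebase.

```ruby
require "base64"
require "uri"

# %25 is an escaped "%", so one round of unescaping leaves %0A
# and a second round leaves a line feed.
once  = URI.decode_www_form_component("%250A")
twice = URI.decode_www_form_component(once)
p once    # => "%0A"
p twice   # => "\n"

# The lenient decoder skips "%" (it is not in the Base64 alphabet) and
# decodes the four characters "250A" into three bytes.
p Base64.decode64("%250A").bytes   # => [219, 157, 0]

# Hypothetical guard: strict_decode64 raises ArgumentError on anything
# outside the Base64 alphabet instead of silently skipping it.
def decode_or_nil(param)
  Base64.strict_decode64(param)
rescue ArgumentError
  nil
end

p decode_or_nil("aGVsbG8=")        # => "hello"
p decode_or_nil("aGVs%250AbG8=")   # => nil
```

Note that strict_decode64 also rejects plain newlines, so a value produced by encode64 would need its line feeds removed before it is passed in.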
Oh well, I learnt about String#ord all the same.