“...I've been working since 2008 with Ruby / Ruby on Rails, love a bit of Elixir / Phoenix and learning Rust. I also poke through other people's code and make PRs for OpenSource Ruby projects that sometimes make it. Currently working for InPay...”

Rob Lacey
Senior Software Engineer, UK

Encoding and Extraneous Characters

Came across and interesting problem. We decode a base64 string that is received as a parameter and decode it to work out where to build a new path. The problem here is that the extra characters \xDB\x9D\x00 shouldn’t be there and cause YAML to blow up

2.2.2 :017 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A')
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n"
2.2.2 :023 > YAML.load("---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n")
Psych::SyntaxError: (<unknown>): control characters are not allowed at line 1 column 1

Somehow control characters are appearing in our Base64 string and we need to strip them. It just doesn’t seem as simple as

2.2.2 :025 > "---\n- :polymorphic_path\n- !ruby/ActiveRecord:\xDB\x9D\x00Project\n  attributes:\n    id: 8025\n".gsub("\xDB\x98\x00",'')
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:۝\u0000Project\n  attributes:\n    id: 8025\n"

I found that if you call ord on a single string character you get the ASCII code

2.2.2 :042 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').chars[45..47].map(&:ord)
 => [219, 157, 0]

So if I strip those ASCII characters from my string I should be good? Right?

class String
  def without_extended_characters
    chars.select {|s| (1..127).include?(s.ord) }.join
  end
end

and then

2.2.2 :043 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A').without_extended_characters
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n"

That’s much better and now YAML can decode this successfully.

It is at this stage I notice…. %250A in the Base64 string and at the end of the string. This represents a carriage return, that can happen in parameters easily enough, so I am decoding a carriage return and then stripping it from the string. I should just strip the carriage returns in the first place.

2.2.2 :045 > Base64.decode64('LS0tCi0gOnBvbHltb3JwaGljX3BhdGgKLSAhcnVieS9BY3RpdmVSZWNvcmQ6%250AUHJvamVjdAogIGF0dHJpYnV0ZXM6CiAgICBpZDogODAyNQo=%250A'.gsub('%250A',''))
 => "---\n- :polymorphic_path\n- !ruby/ActiveRecord:Project\n  attributes:\n    id: 8025\n"

Oh well, I learnt about String#ord all the same.

GPK of the Day Mad MIKE