Ruby fuzzy matching
I'm running through a data set full of free-text answers and trying to make it consistent, since human input is always going to be wonky: everything from typos and spaces instead of hyphens to complete misspellings. Found this…
https://github.com/seamusabshere/fuzzy_match
2.3.3 :002 > require 'fuzzy_match'
=> true
2.3.3 :003 > fm = FuzzyMatch.new(['Uwe Rosenberg', 'X-Com'])
=>#<FuzzyMatch:0x007ff02b05f370 @read=nil, @groupings=[], @identities=[], @stop_words=[], @default_options={:must_match_grouping=>false, :must_match_at_least_one_word=>false, :gather_last_result=>false, :find_all=>false, :find_all_with_score=>false, :threshold=>nil, :find_best=>false, :find_with_score=>false}, @haystack=[w("Uwe Rosenberg"), w("X-Com")]>
2.3.3 :004 > fm.find('Use Roenberb')
=> "Uwe Rosenberg"
Perfect. Time to de-dupe some dodgy data.
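If you want to see roughly what is going on, or skip the gem dependency, here is a minimal stdlib-only sketch of the same de-dupe idea. It swaps in plain Levenshtein edit distance, which is not fuzzy_match's own scoring. It also adds a normalization pass so "X Com" and "X-Com" compare equal. `normalize`, `canonicalize`, `CANON` and the `max_ratio:` cutoff of 0.3 are hypothetical names and an arbitrary threshold, not part of the gem's API.

```ruby
# Hypothetical canonical list (same names as the IRB session above).
CANON = ['Uwe Rosenberg', 'X-Com'].freeze

# Hypothetical helper: lowercase, trim, and treat runs of spaces/hyphens/underscores
# as a single space, so "X Com", "x-com" and "X_Com" all normalize to "x com".
def normalize(str)
  str.downcase.strip.gsub(/[\s\-_]+/, ' ')
end

# Classic dynamic-programming Levenshtein distance:
# insertions, deletions and substitutions each cost 1.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: return the closest canonical name, or nil when even the
# best candidate needs more edits than max_ratio * (its normalized length).
def canonicalize(answer, canon, max_ratio: 0.3)
  needle = normalize(answer)
  best = canon.min_by { |c| levenshtein(needle, normalize(c)) }
  target = normalize(best)
  levenshtein(needle, target) <= target.length * max_ratio ? best : nil
end

answers = ['Use Roenberb', 'uwe rosenberg', 'X Com', 'xcom', 'Carcassonne']
# Keep unmatched answers as-is rather than dropping them.
cleaned = answers.map { |a| canonicalize(a, CANON) || a }
p cleaned.uniq # => ["Uwe Rosenberg", "X-Com", "Carcassonne"]
```

Falling back to the raw answer when nothing is close enough means genuinely new values survive for manual review instead of being forced onto the nearest canonical name. The ratio cutoff is something you would tune against your own data.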