I have a plain text table like this. I need to split each result line so that every value ends up in its own column.
If I split one line on a single space, I get an array like:
["2", "1/47", "M4044", "25:03*", "856", "12:22", "12:41", "17.52", "Some", "Name", "Yo", "Prairie", "Inn", "Harriers", "Runni", "25:03"]
I can also split on two spaces, which gets me close, but it is still inconsistent, as you can see with the name:
["2", " 1/47", "M4044", " 25:03*", "856", " 12:22", " 12:41", "17.52 Some Name Yo", "", "", "", "", "", "", "Prairie Inn Harriers Runni", " 25:03 "]
I can specify which indexes to join on, but I need to process possibly thousands of files just like this, and the columns are not always going to be in the same order.
The one constant is that the column data is never wider than the divider between the column names and the data (the ==== line). I tried to use this to my advantage, but found some loopholes.
I need to write an algorithm to detect what belongs in the name column and what belongs in the other 'word' columns. Any thoughts?
First we set up the problem:
data = <<EOF
Place Div/Tot Div   Guntime  PerF 1sthalf 2ndhalf 100m   Name                      Club                       Nettime
===== ======= ===== =======  ==== ======= ======= ====== ========================= ========================== =======
    1    1/24 M3034   24:46   866   12:11   12:35  15.88 Andy Bas                  Prairie Inn Harriers         24:46
    2    1/47 M4044   25:03*  856   12:22   12:41  17.52 Some Name Yo              Prairie Inn Harriers Runni   25:03
EOF
lines = data.split "\n"
I like to make a format string for String#unpack:
# One "A<width>" per run of "=", plus one "x" (skip a byte) per space after it.
# \s* rather than \s+ so the last column still matches without trailing whitespace.
format = lines[1].scan(/(=+)(\s*)/).map{|f, s| "A#{f.size}" + 'x' * s.size}.join
#=> A5xA7xA5xA7xxA4xA7xA7xA6xA25xA26xA7
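In case the directives are unfamiliar: `A<n>` reads n bytes and drops trailing spaces, and `x` skips one byte. Here is a small sketch on a made-up line (the row and its widths are my own illustration, not from the race data):

```ruby
# Made-up line: a 3-wide, a 7-wide and a 2-wide column, one space between each.
row = "  7 Ann Lee 42"

# "A" removes trailing spaces only, so right-aligned values keep their leading padding.
raw = row.unpack("A3xA7xA2")

# Stripping afterwards cleans up both sides; the space inside the name survives
# because the split is by position, not by whitespace.
fields = raw.map(&:strip)
```

Here `raw` still holds `"  7"` for the first column, which is why the values get a `strip` later on.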
The rest is easy:
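headers = lines[0].unpack(format)
# Unpack each data row with the same format, strip the padding, and pair the values with the headers.
lines[2..-1].each do |line|
  puts Hash[headers.zip(line.unpack(format).map(&:strip))]
end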
#=> {"Place"=>"1", "Div/Tot"=>"1/24", "Div"=>"M3034", "Guntime"=>"24:46", "PerF"=>"866", "1sthalf"=>"12:11", "2ndhalf"=>"12:35", "100m"=>"15.88", "Name"=>"Andy Bas", "Club"=>"Prairie Inn Harriers", "Nettime"=>"24:46"}
#=> {"Place"=>"2", "Div/Tot"=>"1/47", "Div"=>"M4044", "Guntime"=>"25:03", "PerF"=>"856", "1sthalf"=>"12:22", "2ndhalf"=>"12:41", "100m"=>"17.52", "Name"=>"Some Name Yo", "Club"=>"Prairie Inn Harriers Runni", "Nettime"=>"25:03"}
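Since you have thousands of files with the columns in varying order, it may help to wrap this in a method that re-reads the divider for every file. The sketch below is a variation, not the same technique: `unpack` counts bytes, so a name with multi-byte UTF-8 characters would shift the later columns, and `x` raises an ArgumentError on a line shorter than the format. Assuming either can happen in your data, slicing by character ranges avoids both. `parse_fixed_width` and the sample table are hypothetical names and data of my own.

```ruby
# Hypothetical helper: parse a fixed-width table whose second line is the "=" divider.
def parse_fixed_width(text)
  header, divider, *rows = text.lines.map(&:chomp)

  # Character range covered by each run of "=" in the divider line.
  ranges = []
  divider.scan(/=+/) { ranges << ($~.begin(0)...$~.end(0)) }

  # Slicing past the end of a short line gives nil or "", hence to_s.
  keys = ranges.map { |r| header[r].to_s.strip }
  rows.reject { |row| row.strip.empty? }.map do |row|
    keys.zip(ranges.map { |r| row[r].to_s.strip }).to_h
  end
end

# Made-up table: different column order, an accented name, and a short last line.
sample = <<~EOF
  Name       Place Time
  ========== ===== =====
  José Álvar     1 24:46
  Ann Lee        2
EOF

rows = parse_fixed_width(sample)
rows.first["Name"]  # => "José Álvar"
rows.last["Time"]   # => ""
```

The keys come from each file's own header line, so the column order does not matter as long as the divider line is present.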