How to sum repetitions of a value and add it in two values of a key in Ruby?


I'm trying to create a hash with one key for each file extension found in a directory. For every key I would like to store two values: the number of times that extension occurs and the total size of all the files with that extension.

Something similar to this:

{".md" => {"ext_reps" => 6, "ext_size_sum" => 2350}, ".txt" => {"ext_reps" => 3, "ext_size_sum" => 1300}}

But I'm stuck on this step:

hash = Hash.new{|hsh,key| hsh[key] = {}}
ext_reps = 0
ext_size_sum = 0

Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
  hash[File.extname(file)].store "ext_reps", ext_reps
  hash[File.extname(file)].store "ext_size_sum", ext_size_sum 
end

p hash

With this result:

{".md" => {"ext_reps" => 0, "ext_size_sum" => 0}, ".txt" => {"ext_reps" => 0, "ext_size_sum" => 0}}

And I can't find a way to increment ext_reps and ext_size_sum.

Thanks


Solution

  • Suppose the file name extensions and file sizes obtained are as follows.

    files = [{ ext: 'a', size: 10 },
             { ext: 'b', size: 20 },
             { ext: 'a', size: 30 },
             { ext: 'c', size: 40 },
             { ext: 'b', size: 50 },
             { ext: 'a', size: 60 }]
    

    You can use Enumerable#group_by and Hash#transform_values (both transform_values and Array#sum require Ruby 2.4 or later).

    files.group_by { |h| h[:ext] }.
          transform_values do |arr|
            { "ext_reps"=>arr.size, "ext_size_sum"=>arr.sum { |h| h[:size] } }
          end
            #=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
            #    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
            #    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
    

    Note that the first calculation is as follows.

    files.group_by { |h| h[:ext] }
      #=> {"a"=>[{:ext=>"a", :size=>10}, {:ext=>"a", :size=>30},
      #          {:ext=>"a", :size=>60}],
      #    "b"=>[{:ext=>"b", :size=>20}, {:ext=>"b", :size=>50}],
      #    "c"=>[{:ext=>"c", :size=>40}]}
    

    Another way is to use the forms of Hash#update (aka Hash#merge!) and Hash#merge that take a block to compute the values of keys that are present in both hashes being merged. (Ruby does not call that block when a key k being merged into the hash under construction, h, is not already a key of h.)

    See the docs for an explanation of the three parameters of the block that returns the values of common keys of hashes being merged.

    files.each_with_object({}) do |g,h|
       h.update(g[:ext]=>{"ext_reps"=>1, "ext_size_sum"=>g[:size]}) do |_k,o,n|
         o.merge(n) { |_kk, oo, nn| oo + nn }
       end
    end
      #=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
      #    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
      #    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
    
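    As a minimal illustration of those three block parameters, separate from the files example, the sketch below merges two small hashes. The names old_counts and new_counts are made up for this example.

    ```ruby
    # Illustrative data only (not from the original post).
    old_counts = { "a" => 1, "b" => 2 }
    new_counts = { "b" => 10, "c" => 20 }

    # The block receives the shared key, the receiver's value and the
    # argument's value, and returns the value to store for that key.
    merged = old_counts.merge(new_counts) { |_key, old_val, new_val| old_val + new_val }

    # "a" and "c" each occur in only one hash, so the block is not called for them.
    p merged
    ```

    Only "b" is common to both hashes, so it is the only key whose value is computed by the block; Hash#merge leaves old_counts itself unchanged.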

    I've chosen names for the common keys of the "outer" and "inner" hashes (_k and _kk, respectively) that begin with an underscore to signal to the reader that they are not used in the block calculation. This is common practice.

    Note that this approach avoids the creation of a temporary hash similar to that created by group_by and therefore tends to use less memory than the first approach.
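
Applied to the directory scan from the question, one possible sketch keeps the asker's default-block hash but seeds each new entry with zeroed counters and increments them for every file. The method name extension_stats and its root parameter are hypothetical, introduced only for illustration, and File.size (bytes) is assumed to be the intended measure of size.

```ruby
# Hypothetical helper (not from the original post): counts files and sums
# their sizes in bytes, per extension, for everything under +root+.
def extension_stats(root)
  # The default block gives every new extension its own zeroed counters.
  stats = Hash.new do |hsh, key|
    hsh[key] = { "ext_reps" => 0, "ext_size_sum" => 0 }
  end

  Dir.glob(File.join(root, "**", "*.*")).each do |path|
    next unless File.file?(path) # skip directories whose names contain a dot

    entry = stats[File.extname(path)]
    entry["ext_reps"] += 1
    entry["ext_size_sum"] += File.size(path)
  end

  stats
end
```

Calling extension_stats("/home/computer/Desktop") should then return a hash shaped like the one requested in the question, with the counters reflecting the files actually found.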