
Ruby diff two hashes and merge in loop with cumulative sum


help with ruby needed!

I have a loop in which I collect data about PHP processes (PID, utime). I have two hashes. The first, 'h1', looks like:

"domain1" => { :utime => 0, :last_seen => 0, :process_count => 0, :process_count_avg => 0 },
"domain2" => { :utime => 0, :last_seen => 0, :process_count => 0, :process_count_avg => 0 }

This is the code for it:

h1[vhostname] ||= { :utime => 0, :last_seen => 0, :process_count => 0, :process_count_avg => 0 }
h1[vhostname][:utime] += utime_proc 
h1[vhostname][:last_seen] = 0

'vhostname' is a string containing the domain name. 'utime_proc' is a utime value.

In every loop round I sum the utimes of all processes of a specific domain, and the output is the domain and its summed utime. But this distorts the real state, because the utime of processes that have already ended is lost.

What I need is a cumulative sum of utimes for a specific domain: the sum of utime for all current processes plus the sum of utime for all processes that have already ended. I will probably have to store the PID of each process together with its utime for each (sub)domain, and when a process disappears, add its last utime to the stored value for that domain (the utime in the 'h1' hash).
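The bookkeeping described above can be sketched in isolation for a single domain. This is only an illustration under assumed data: the names `round1`, `round2`, `banked` and `known` are hypothetical and do not appear in the original code.

```ruby
# Hypothetical sample data: two polling rounds for one domain, PID => utime.
round1 = { 101 => 10, 102 => 20 }
round2 = { 102 => 25, 103 => 5 }   # PID 101 has ended, PID 103 is new

banked = 0    # utime of processes that have already ended
known  = {}   # last seen utime of every live PID

[round1, round2].each do |current|
  # A PID seen earlier but missing now has ended: bank its last utime.
  (known.keys - current.keys).each { |pid| banked += known.delete(pid) }
  # Add new PIDs and refresh the utime of PIDs that are still running.
  known.merge!(current)
end

cumulative = banked + known.values.sum
puts cumulative   # 10 (ended PID 101) + 25 + 5 = 40
```

The cumulative value is the banked utime of ended processes plus the current utime of live ones, so it does not drop when a process exits.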

So I created two more hashes, 'h2' and 'h3'. 'h3' is reset in every loop round and stores all PIDs for a (sub)domain with their utimes, like this:

h3[vhostname] = []
h3[vhostname] << {:pid => pid, :utime => utime_proc}

The 'h2' hash stores PIDs from all previous loop rounds, in the same way as 'h3', but it is not reset:

if not h2.key?(vhostname)
h2[vhostname] = []
h2[vhostname] << {:pid => pid, :utime => utime_proc}
end

The output hash should look like:

{"domain1"=>[{:pid=>2, :utime=>20}, {:pid=>1, :utime=>10}], "domain2"=>[{:pid=>1, :utime=>10}, {:pid=>3, :utime=>30}]}

Now I need help with:

1. Diffing these two hashes: if a PID has disappeared, I need to remove it from the 'h2' hash and add its last utime value to the utime value stored in 'h1'.
2. If a new PID appears for a domain (it is in 'h3' but not yet in 'h2'), adding this PID with its utime to 'h2' under that domain.

These are the points I'm not able to do. I know I can simply do:

'h2-h3' or 'h3-h2', but I don't know what to do with the result or how to handle it.
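Note that Ruby's Hash does not define a `-` operator (Array does), so the subtraction has to happen on the PID lists of each domain. A minimal sketch of the two steps, assuming the question's structure (domain => array of `{:pid, :utime}` hashes) and made-up sample values:

```ruby
# Hypothetical sample data in the question's structure.
h1 = { "domain1" => { :utime => 0 } }
h2 = { "domain1" => [{ :pid => 1, :utime => 10 }, { :pid => 2, :utime => 20 }] }
h3 = { "domain1" => [{ :pid => 2, :utime => 22 }, { :pid => 3, :utime => 30 }] }

# Walk every domain known from earlier rounds or seen in this round.
(h2.keys | h3.keys).each do |vhostname|
  previous     = h2[vhostname] || []
  current      = h3[vhostname] || []
  current_pids = current.map { |entry| entry[:pid] }

  # 1. PIDs in h2 but not in h3 have disappeared: add their last utime to h1.
  ended = previous.reject { |entry| current_pids.include?(entry[:pid]) }
  h1[vhostname] ||= { :utime => 0 }
  h1[vhostname][:utime] += ended.sum { |entry| entry[:utime] }

  # 2. h3 holds every live PID (new and surviving) with fresh utimes,
  #    so it becomes the new remembered state for this domain.
  h2[vhostname] = current.map(&:dup)
end

p h1["domain1"][:utime]                        # 10, the last utime of ended PID 1
p h2["domain1"].map { |entry| entry[:pid] }    # [2, 3]
```

Replacing the remembered list with the current one covers both removing ended PIDs and adding new ones, so no separate merge step is needed in this variant.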

Guys, can you help me, please? A short version of my code is below. I'm still a Ruby newbie.

h1 = {}
# Hash to collect PID and its utimes
h2 = {}

loop do  

# Hash to temporarily store PIDs and their utimes - reset in each cycle
h3 = {}

# Here I collect processes
#############
#############

# Collect PIDs and their utimes
# Store PIDs and their utimes temporarily - only for this loop round
h3[vhostname] = []
h3[vhostname] << {:pid => pid, :utime => utime_proc}

# is h2 empty? if so, this is probably first loop round
if not h2.key?(vhostname)
h2[vhostname] = []
h2[vhostname] << {:pid => pid, :utime => utime_proc}
else
# h2 already has this domain, we can diff and sum
# PROBABLY THE PLACE I NEED HELP WITH

end
# Here I do some more magic with h1 and output the result with some delay
end

UPDATE

I changed the h2 and h3 hash structure to:

{:domain => "domain1.com", :pid => XXXX, :utime => YYYYY}
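With flat records like this, the diff can be done on the (domain, PID) pair. The following is only a sketch with made-up sample values; `previous`, `current` and `ended_utime` are hypothetical names standing in for 'h2', 'h3' and the amounts to add to 'h1'.

```ruby
# Hypothetical sample data in the flat structure from the update.
previous = [
  { :domain => "domain1.com", :pid => 1, :utime => 10 },
  { :domain => "domain1.com", :pid => 2, :utime => 20 },
  { :domain => "domain2.com", :pid => 3, :utime => 30 }
]
current = [
  { :domain => "domain1.com", :pid => 2, :utime => 25 },
  { :domain => "domain2.com", :pid => 3, :utime => 31 }
]

# Array#- compares whole hashes, so a PID whose utime changed would look
# "different". Compare on the [domain, pid] pair instead.
live  = current.map { |rec| [rec[:domain], rec[:pid]] }
ended = previous.reject { |rec| live.include?([rec[:domain], rec[:pid]]) }

# Sum the last utime of the ended processes per domain.
ended_utime = Hash.new(0)
ended.each { |rec| ended_utime[rec[:domain]] += rec[:utime] }

p ended_utime   # only domain1.com lost a process (PID 1, utime 10)
```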

Solution

  • I will not try to implement your exact logic, but I will try to show how to deal with the hashes. For the sake of simplicity, I haven't written the real collection loop and just work with test data:

    domains = ['domain1','domain2','domain3']
    
    h1 = {}
    h2 = {}
    # generate blank template for each domain
    domains.each { |vhostname|
      h1[vhostname] ||= { :utime => 0, :last_seen => 0, :process_count => 0, :process_count_avg => 0 }
      h2[vhostname] ||= {}
    }
    
    # dummy loop
    1.upto(5) {
      h3={}
      # start collecting data for each domain
      domains.each { |vhostname|
        # TEST DATA
        h3[vhostname] ||= {}
        1.upto(5) {
          _pid = rand(1..10)
          h3[vhostname][:"#{_pid}"] ||= {:utime => rand(9999)}
        }
        # TEST DATA
    
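        # Add new PIDs and refresh the utime of PIDs that are still running
        h2[vhostname].merge!(h3[vhostname])
        # PIDs known from earlier rounds but missing now have ended:
        # add their last utime to the domain's cumulative total
        h2[vhostname].each { |pid, details|
          unless h3[vhostname].key?(pid)
            h1[vhostname][:utime] += details[:utime]
          end
        }
        # Forget the ended PIDs (keep_if modifies the hash in place)
        h2[vhostname].keep_if { |pid, _details| h3[vhostname].key?(pid) }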
      }
    }
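In the code above, `h1[vhostname][:utime]` only accumulates the utime of processes that have ended, while the live ones stay in `h2`. A possible way to report the cumulative value per domain is sketched below, assuming the same structures and hypothetical sample state; `totals` is an illustrative name, not part of the answer.

```ruby
# Hypothetical state after one round, in the answer's structure:
# h1 holds utime banked from ended processes, h2 holds live PIDs.
h1 = { "domain1" => { :utime => 15 }, "domain2" => { :utime => 0 } }
h2 = {
  "domain1" => { :"4" => { :utime => 7 }, :"9" => { :utime => 3 } },
  "domain2" => {}
}

# Cumulative utime = banked utime of ended processes + utime of live ones.
totals = h1.each_with_object({}) do |(vhostname, stats), out|
  live = h2.fetch(vhostname, {}).values.sum { |details| details[:utime] }
  out[vhostname] = stats[:utime] + live
end

totals.each { |vhostname, utime| puts "#{vhostname}: #{utime}" }
# domain1: 25
# domain2: 0
```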