
How can I add to a hash value instead of overwriting it with the new value?


Basically, I have a set of files (MEDLINE records from NCBI). Each file is associated with a journal title and has zero, one, or more GenBank identification numbers (GBIDs). I can associate the number of GBIDs per file with each journal name. My problem is that more than one file may be associated with the same journal, and I don't know how to add up the per-file GBID counts into a total number of GBIDs per journal.

In my current code, jt stands for the journal title, which is extracted correctly from the file. GBIDs are added to the count as they are encountered.

Full code:

#!/usr/local/bin/ruby

require 'rubygems'
require 'bio'

Bio::NCBI.default_email = '[email protected]'

ncbi_search = Bio::NCBI::REST::ESearch.new
ncbi_fetch = Bio::NCBI::REST::EFetch.new

print "\nQuery?\s"

query_phrase = gets.chomp

puts "\nYou said \"#{query_phrase}\". Searching, please wait..."

pmid_list = ncbi_search.search("pubmed", "#{query_phrase}", 0)

puts "\nYour search returned #{pmid_list.count} results."

if pmid_list.count > 200
  puts "\nToo big."
  exit
end

gbid_hash = Hash.new
jt_hash = Hash.new(0)

pmid_list.each do |pmid|
  # EFetch returns the MEDLINE record as one String; each_line walks it
  # line by line (String#each only exists in Ruby 1.8).
  ncbi_fetch.pubmed(pmid, "medline").each_line do |pmid_line|
    if pmid_line =~ /JT.+- (.+)\n/
      jt = $1
      jt_count = 0
      jt_hash[jt] = jt_count

      ncbi_fetch.pubmed(pmid, "medline").each_line do |pmid_line_2|
        if pmid_line_2 =~ /SI.+- GENBANK\/(.+)\n/
          gbid = $1
          jt_count += 1
          gbid_hash["#{gbid}\n"] = nil
        end
      end

      if jt_count > 0
        puts "#{jt} = #{jt_count}"
      end

      jt_hash[jt] += jt_count
    end
  end
end

jt_hash.each do |key, value|
  # if value > 0
  puts "Journal: #{key} has #{value} entries associated with it."
  # end
end

# gbid_file = File.open("temp_*.txt","r").each do |gbid_count|
#   puts gbid_count
# end

My result:

 Your search returned 192 results.
 Virology journal = 8
 Archives of virology = 9
 Virus research = 1
 Archives of virology = 6
 Virology = 1

Basically, how do I get it to report Archives of virology = 15, and do the same for any other journal title? I tried a hash, but the second Archives of virology entry just overwrote the first. Is there a way to make the values for a repeated key add up in a hash?


Solution

  • I don't entirely follow what you are asking for here.

    However, you are overwriting the value for a given hash key because you are doing this:

    jt_count = 0
    jt_hash[jt] = jt_count
    

    You already initialized your hash earlier like this:

    jt_hash = Hash.new(0)
    

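    That is, any key that has not been assigned yet reads as 0, so there's no need to initialize jt_hash[jt] to 0 yourself. Worse, doing so throws away whatever total that journal had already accumulated from earlier files, so the later jt_hash[jt] += jt_count only ever stores the count for the current file.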
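To see the difference in isolation, here is a small sketch that replays the per-file counts from the output in the question, with no NCBI access needed. The names per_file, overwritten and totals are made up for illustration. Plain assignment keeps only the last count stored for a key, while += on a Hash.new(0) hash adds each count to the running total.

```ruby
# Per-file (journal, GBID count) pairs, copied from the output in the question.
per_file = [
  ["Virology journal", 8],
  ["Archives of virology", 9],
  ["Virus research", 1],
  ["Archives of virology", 6],
  ["Virology", 1]
]

# Overwriting: plain assignment replaces whatever was stored for the key.
overwritten = Hash.new(0)
per_file.each { |jt, count| overwritten[jt] = count }

# Accumulating: a missing key reads as 0, so += starts from 0 and keeps adding.
totals = Hash.new(0)
per_file.each { |jt, count| totals[jt] += count }

puts overwritten["Archives of virology"]  # => 6
puts totals["Archives of virology"]       # => 15
```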

    If you remove this line:

     jt_hash[jt] = jt_count
    

    Then the value of jt_hash[jt] will accumulate, via jt_hash[jt] += jt_count, on each pass through the loop:

    ncbi_fetch.pubmed(pmid, "medline").each do |pmid_line|
      ....
    end
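Putting it together, here is a minimal offline sketch of the corrected loop. It is a stand-in for the real script under some assumptions: instead of calling EFetch, it iterates over a few made-up MEDLINE-format strings (hypothetical data, not real PubMed records), it assumes each record has a single JT line, and gbids is a hypothetical replacement for gbid_hash. It reads the journal title and counts the GENBANK SI lines in one pass over each record rather than fetching the record a second time, and it only ever adds to jt_hash[jt].

```ruby
# Hypothetical MEDLINE-format records standing in for ncbi_fetch.pubmed(pmid, "medline").
records = [
  "PMID- 1\nJT  - Archives of virology\nSI  - GENBANK/AB000001\nSI  - GENBANK/AB000002\n",
  "PMID- 2\nJT  - Archives of virology\nSI  - GENBANK/AB000003\n",
  "PMID- 3\nJT  - Virus research\n"
]

jt_hash = Hash.new(0)  # journal title => total GBIDs, missing keys read as 0
gbids = {}             # hypothetical: unique GenBank IDs seen, used as a set

records.each do |record|
  jt = nil
  jt_count = 0         # GBIDs in this one record only

  record.each_line do |line|
    if line =~ /JT.+- (.+)\n/
      jt = $1
    elsif line =~ /SI.+- GENBANK\/(.+)\n/
      gbids[$1] = true
      jt_count += 1
    end
  end

  # Add this record's count to the journal's running total; never reset it.
  jt_hash[jt] += jt_count if jt
end

jt_hash.each do |journal, total|
  puts "Journal: #{journal} has #{total} entries associated with it."
end
```

Because the per-record counter is a separate local variable, resetting it to 0 for each record is harmless; only the hash entry needs to persist across records.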