I have extracted the first-level keys from a multidimensional hash; they look like this:
my @string = keys %hash;
print "@string\n";
Bacteroides fragilis (strain YCH46).Agrocybe aegerita (Black poplar mushroom) (Agaricus
aegerita).Parabacteroides distasonis (strain ATCC 8503 / DSM 20701 / CIP 104284 / JCM 5825 / NCTC
11152).Pelodictyon phaeoclathratiforme (strain DSM 5477 / BU-1).Clostridium kluyveri (strain NBRC
12016).Torpedo marmorata (Marbled electric ray).Aethionema grandiflorum (Persian stone-cress).Conus
consors (Singed cone).Saguinus labiatus (Red-chested mustached tamarin).Staphylococcus haemolyticus
(strain JCSC1435).Aeromonas salmonicida (strain A449).Acinetobacter genomosp. 13.Staphylococcus
aureus (strain USA300 / TCH1516).Loxosceles variegata (Recluse spider). and so on...
I am trying to count how many times the same organism is repeated (I know for sure that some of them are repeated many times).
I have tried this code:
my %count;
foreach my $os (@string)
{
$count{$os}++;
}
foreach my $os (sort keys %count)
{
print $os, " ", $count{$os}, "\n";
}
But the output shows every organism appearing only once, although I know that is not the case.
Strangely, when I tried to define a test string manually with some organisms repeated, the code worked.
What is happening with my hash keys?
I can access them individually in the list, so in principle they are well defined...
Any help?
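A likely explanation is that the keys of a Perl hash are unique: storing the same string as a key twice just overwrites the first entry, so `keys %hash` can never return a repeated organism and every count comes out as 1. The counting loop itself is fine, which is why the hand-made test list worked. The minimal sketch below reproduces both behaviours; the organism list and variable names are hypothetical test data, not the poster's real hash.

```perl
use strict;
use warnings;

# Hypothetical test data: a plain list with one repeated organism.
my @list = ('Homo sapiens (Human).', 'Caenorhabditis elegans.', 'Homo sapiens (Human).');

# Counting the list works: the repeated name is seen twice.
my %from_list;
$from_list{$_}++ for @list;

# Storing the same names as hash keys collapses the duplicates,
# because a Perl hash holds each key only once.
my %hash = map { $_ => 1 } @list;
my %from_keys;
$from_keys{$_}++ for keys %hash;

print "list: $_ $from_list{$_}\n" for sort keys %from_list;
print "keys: $_ $from_keys{$_}\n" for sort keys %from_keys;
```

The list count reports 2 for the repeated name, while the count over `keys %hash` reports 1 for every organism, which matches the symptom described above.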
Edit:
Data::Dumper output when the organisms are values:
'ACYP_SYNJB' => {
'94' => 'Synechococcus sp. (strain JA-2-3B\'a(2-13))
(Cyanobacteria bacterium Yellowstone B-Prime).'
},
'ACTM_STRPU' => {
'374' => 'Strongylocentrotus purpuratus (Purple sea
urchin).'
},
'A2ML1_HUMAN' => {
'1454' => 'Homo sapiens (Human).'
},
'ACTP_SALDC' => {
'549' => 'Salmonella dublin (strain CT_02021853).'
},
'ACBG2_XENLA' => {
'739' => 'Xenopus laevis (African clawed frog).'
},
'ACO1_AJECA' => {
'476' => 'Ajellomyces capsulatus (Darling\'s disease
fungus) (Histoplasma capsulatum).'
},
'ACTM_PISOC' => {
'376' => 'Pisaster ochraceus (Ochre sea star)
(Asterias ochracea).'
},
'3MGH_RHOPB' => {
'200' => 'Rhodopseudomonas palustris (strain
BisB18).'
}
};
And when the organisms are keys:
$VAR3585 = 'Geobacter sulfurreducens (strain ATCC 51573 / DSM 12127 / PCA).';
$VAR3586 = {
'ACPS_GEOSL' => 126,
'ACP_GEOSL' => 77,
'ACKA_GEOSL' => 421,
'ACYP_GEOSL' => 91,
'ACCA_GEOSL' => 319
};
$VAR3587 = 'Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis).';
$VAR3588 = {
'ACT3_BACDO' => 376,
'ACT5_BACDO' => 376,
'ACT1_BACDO' => 376,
'ACT2_BACDO' => 376
};
$VAR3589 = 'Caenorhabditis elegans.';
$VAR3590 = {
'ACH5_CAEEL' => 511,
'6PGD_CAEEL' => 484,
'ACM2_CAEEL' => 627,
'ACADM_CAEEL' => 417,
'ADAL_CAEEL' => 388,
'ACON_CAEEL' => 777,
'ACBP3_CAEEL' => 116,
'2AB1_CAEEL' => 495,
'3HIDH_CAEEL' => 299,
'ACH1_CAEEL' => 498,
'6PGL_CAEEL' => 269,
'2A51_CAEEL' => 542,
'2AAA_CAEEL' => 590,
'A16L2_CAEEL' => 534,
'ACH4_CAEEL' => 548,
'ACC2_CAEEL' => 445,
'ADA17_CAEEL' => 686,
'ACR5_CAEEL' => 598,
'ACTL1_CAEEL' => 360,
'ADBP1_CAEEL' => 217,
'ACH8_CAEEL' => 474,
'5NT3_CAEEL' => 376,
'ACT2_CAEEL' => 376,
'AAR2_CAEEL' => 357,
'ACH23_CAEEL' => 545,
'ACD11_CAEEL' => 617,
'ABF2_CAEEL' => 85,
'ABDH3_CAEEL' => 375,
'ABF1_CAEEL' => 85,
'ABH51_CAEEL' => 355,
'ACX15_CAEEL' => 659,
'ACC1_CAEEL' => 466,
'ABL1_CAEEL' => 1224,
'ACC3_CAEEL' => 517,
'ABH52_CAEEL' => 444,
'ACT4_CAEEL' => 376,
'ACH2_CAEEL' => 493,
'ACBP1_CAEEL' => 86,
'14332_CAEEL' => 248,
'ACR7_CAEEL' => 538,
'ACC4_CAEEL' => 408,
'ACE1_CAEEL' => 620,
'AATC_CAEEL' => 408,
'ACH6_CAEEL' => 502,
'ACH3_CAEEL' => 564,
'ACR3_CAEEL' => 487,
'ACMSD_CAEEL' => 401,
'ACH7_CAEEL' => 507,
'ACR2_CAEEL' => 575,
'ACASE_CAEEL' => 272,
'ACM3_CAEEL' => 611,
'AAPK2_CAEEL' => 626,
'ACN1_CAEEL' => 906,
'3HAO_CAEEL' => 281,
'ADAS_CAEEL' => 597,
'ACT1_CAEEL' => 376,
'A4_CAEEL' => 686,
'ADA10_CAEEL' => 922,
'A16L1_CAEEL' => 578,
'ACT3_CAEEL' => 376,
'ACP1_CAEEL' => 426,
'ACM1_CAEEL' => 713,
'AAPK1_CAEEL' => 589,
'ACOC_CAEEL' => 887,
'ACLY_CAEEL' => 1106,
'14331_CAEEL' => 248
};
$VAR3591 = 'Anopheles stephensi (Indo-Pakistan malaria mosquito).';
$VAR3592 = {
'ACES_ANOST' => 664
};
$VAR3593 = 'Bacillus thuringiensis subsp. konkukian (strain 97-27).';
$VAR3594 = {
'ACKA_BACHK' => 397,
'ACCD_BACHK' => 289,
'ACPS_BACHK' => 119,
'3MGH_BACHK' => 205,
'ACCA_BACHK' => 324,
'ACP_BACHK' => 77
};
More precisely, I want to know which organisms have more than 50 protein IDs in my hash, and keep only those, discarding the organisms with fewer proteins.
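With the second layout (organism name => { protein ID => number }), no counting loop is needed: the number of protein IDs for an organism is simply the number of keys in its inner hash. A minimal sketch of the "more than 50" filter, assuming that layout; the data below is an abbreviated, partly made-up stand-in (the 60 `PROT…_CAEEL` IDs are hypothetical filler), and `$min_proteins` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated data shaped like the second Dumper dump:
# organism name => { protein ID => number }.
my %hash = (
    'Anopheles stephensi (Indo-Pakistan malaria mosquito).' => { 'ACES_ANOST' => 664 },
);
# Fake 60 C. elegans entries so that one organism passes the threshold.
$hash{'Caenorhabditis elegans.'}{"PROT${_}_CAEEL"} = 100 + $_ for 1 .. 60;

my $min_proteins = 50;    # keep organisms with MORE than this many IDs

# The number of protein IDs for an organism is the number of keys
# in its inner hash.
my %selected;
for my $org (keys %hash) {
    my $n = scalar keys %{ $hash{$org} };
    $selected{$org} = $hash{$org} if $n > $min_proteins;
}

for my $org (sort keys %selected) {
    printf "%s %d\n", $org, scalar keys %{ $selected{$org} };
}
```

`%selected` reuses the original inner hashes, so the protein IDs of the organisms that pass the filter are kept, and the others are simply left out.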
I'm not sure I've fully understood your question, but it looks like you have the following kind of hash:
my %hash = (
'protein_id#1' => {
'some-number' => 'organism-name'
},
'protein_id#2' => {
'some-number' => 'same-or-other-organism-name',
},
...
);
And you want to count how many `protein_id#X` entries there are for each distinct `organism-name`.
In this case the following should work:
my %count;
# "outer" hash has protein_id as key
while (my ($protein, $h2) = each %hash) {
    # "inner" hash has organism-name as value;
    # the same organism could appear several times in one inner hash,
    # but it should be counted only once per protein_id
    my %organism;
    while (my ($some_number, $o) = each %$h2) {
        $organism{$o}++;
    }
    for (keys %organism) {
        $count{$_}++;
    }
}
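To finish the task from the question (keep only organisms with more than 50 protein IDs), `%count` can then be used to filter the original protein-keyed hash. A minimal sketch, assuming the protein_id => { number => organism } layout; the data is an abbreviated stand-in with 60 hypothetical `PROT…_CAEEL` IDs, and the counting step is a condensed version of the loop in this answer.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated data shaped like the first Dumper dump:
# protein ID => { number => organism name }.
my %hash = (
    'A2ML1_HUMAN' => { '1454' => 'Homo sapiens (Human).' },
    'ACES_ANOST'  => { '664'  => 'Anopheles stephensi (Indo-Pakistan malaria mosquito).' },
);
# Pad C. elegans with fake IDs so it crosses the threshold.
$hash{"PROT${_}_CAEEL"} = { 100 + $_ => 'Caenorhabditis elegans.' } for 1 .. 60;

# Condensed counting: each organism counts once per protein ID.
my %count;
for my $inner (values %hash) {
    my %seen = map { $_ => 1 } values %$inner;
    $count{$_}++ for keys %seen;
}

my $min_proteins = 50;    # keep organisms with MORE than this many IDs
my %keep = map { $_ => 1 } grep { $count{$_} > $min_proteins } keys %count;

# Drop every protein whose organism did not pass the threshold.
my %filtered;
for my $protein (keys %hash) {
    $filtered{$protein} = $hash{$protein}
        if grep { $keep{$_} } values %{ $hash{$protein} };
}

printf "%d proteins kept\n", scalar keys %filtered;
```

Building `%keep` first means each organism is compared with the threshold only once; the second loop is then a plain lookup per protein.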