Tags: duplicates, wolfram-mathematica, string-search

Print Different Output Values Corresponding to Duplicate Input in a Table?


For example, TableA:

     ID1    ID2   
     123    abc
     123    def
     123    ghi
     123    jkl
     123    mno
     456    abc
     456    jkl

I want to do a string search for 123 and return all corresponding values.

    pp = Cases[#, x_List /; 
     MemberQ[x, y_String /; 
       StringMatchQ[y, ToString@p, IgnoreCase -> True]], {1}] &@TableA

    {f4@"ID2", f4@pp[[2]]}

Above, p is the input (here, 123). This returns only one value for ID2. How do I get all values for ID2?


Solution

  • To complement other solutions, I would like to explore the high-performance corner of this problem: the case where the table is large and many queries have to be performed. Some kind of preprocessing can save a lot of execution time in such a case. I would like to show a rather obscure but IMO elegant solution based on a combination of Dispatch and ReplaceList. Here is a small table for illustration (I use strings for all the entries, to keep it close to the original question):

    makeTestTable[nids_, nelems_] :=
      Flatten[Thread[{"ID" <> ToString@#, 
             ToString /@ Range[#, nelems + # - 1]}] & /@ Range[nids], 1]
    
    In[57]:= (smallTable = makeTestTable[3,5])//InputForm
    Out[57]//InputForm=
    {{"ID1", "1"}, {"ID1", "2"}, {"ID1", "3"}, {"ID1", "4"}, {"ID1", "5"}, 
     {"ID2", "2"}, {"ID2", "3"}, {"ID2", "4"}, {"ID2", "5"}, {"ID2", "6"}, 
     {"ID3", "3"}, {"ID3", "4"}, {"ID3", "5"}, {"ID3", "6"}, {"ID3", "7"}}
    

    The preprocessing step consists of making a Dispatch-ed table of rules from the original table:

    smallRules = Dispatch[Rule @@@ smallTable];
    

    The code to get (say, for "ID2") the values is then:

    In[59]:= ReplaceList["ID2", smallRules]
    
    Out[59]= {"2", "3", "4", "5", "6"}
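
    The reason ReplaceList is needed here, rather than Replace or /., is that an ordinary replacement stops at the first rule that matches, while ReplaceList collects the result of every matching rule. A minimal sketch of the difference, using a plain rule list without Dispatch (demoTable, demoRules and the result names are made up for this illustration):

    ```mathematica
    (* A tiny table with duplicate IDs; names here are illustrative only *)
    demoTable = {{"ID1", "1"}, {"ID1", "2"}, {"ID2", "2"}, {"ID2", "3"}};
    demoRules = Rule @@@ demoTable;
    (* demoRules is {"ID1" -> "1", "ID1" -> "2", "ID2" -> "2", "ID2" -> "3"} *)

    (* Replace applies only the first rule whose left-hand side matches *)
    firstOnly = Replace["ID2", demoRules]      (* "2" *)

    (* ReplaceList returns the results of all matching rules, in rule order *)
    allValues = ReplaceList["ID2", demoRules]  (* {"2", "3"} *)

    (* A key that matches no rule gives an empty list *)
    noValues = ReplaceList["ID9", demoRules]   (* {} *)
    ```

    Wrapping the rule list in Dispatch, as above, is not expected to change these results; it only changes how fast the matching rules are found.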
    

    This does not look like a big deal, but let us move to larger tables:

    In[60]:= Length[table = makeTestTable[1000,1000]]
    Out[60]= 1000000
    

    The preprocessing step admittedly takes some time:

    In[61]:= (rules = Dispatch[Rule @@@ table]); // Timing
    
    Out[61]= {3.703, Null}
    

    But we only need it once. Now, all subsequent queries (perhaps except the very first) will be near instantaneous:

    In[75]:= ReplaceList["ID520",rules]//Short//Timing
    Out[75]= {0.,{520,521,522,523,524,525,<<988>>,1514,1515,1516,1517,1518,1519}}
    

    while an approach without the preprocessing takes a sizable fraction of a second for this table size:

    In[76]:= Cases[table,{"ID520",_}][[All,2]]//Short//Timing
    Out[76]= {0.188,{520,521,522,523,524,525,<<988>>,1514,1515,1516,1517,1518,1519}}
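
    Before relying on the timings, it is worth checking that the two approaches return the same lists, and timing a batch of queries rather than a single one. The following is only a sketch on a smaller table (200 IDs with 200 values each, an arbitrary size chosen so that it runs quickly); the names ids, viaRules and viaScan are made up for this illustration, and actual timings will depend on the machine and Mathematica version:

    ```mathematica
    (* Same generator as above, repeated so that the snippet runs on its own *)
    makeTestTable[nids_, nelems_] :=
      Flatten[Thread[{"ID" <> ToString@#,
            ToString /@ Range[#, nelems + # - 1]}] & /@ Range[nids], 1];

    table = makeTestTable[200, 200];
    rules = Dispatch[Rule @@@ table];   (* one-time preprocessing *)

    (* A few sample keys to query *)
    ids = ("ID" <> ToString[#]) & /@ {5, 50, 150};

    (* Preprocessed lookup versus a full scan of the table for each key *)
    viaRules = ReplaceList[#, rules] & /@ ids;
    viaScan = Cases[table, {#, _}][[All, 2]] & /@ ids;

    (* Both should give identical lists of values, in the same order *)
    sameResults = (viaRules === viaScan)

    (* Time the two approaches over the whole batch of keys *)
    First@AbsoluteTiming[ReplaceList[#, rules] & /@ ids;]
    First@AbsoluteTiming[Cases[table, {#, _}][[All, 2]] & /@ ids;]
    ```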
    

    I realize that this may be overkill for the original question, but tasks like this are rather common, for example when someone wants to explore a large dataset imported from a database directly in Mathematica.
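
    For completeness, here is how the same idea might be applied to the table from the question. This is a sketch under some assumptions: TableA is entered as a list of string pairs without the header row, the lookup is an exact match on the first column (the case-insensitive matching against any column from the question's code is not reproduced), and makeLookup and lookupA are hypothetical names introduced only for this example:

    ```mathematica
    (* TableA from the question, as strings; the header row is left out *)
    tableA = {{"123", "abc"}, {"123", "def"}, {"123", "ghi"}, {"123", "jkl"},
       {"123", "mno"}, {"456", "abc"}, {"456", "jkl"}};

    (* Hypothetical helper: build the Dispatch table once and return a
       query function; ToString lets the caller pass 123 or "123" *)
    makeLookup[tbl_List] :=
      With[{rules = Dispatch[Rule @@@ tbl]},
       ReplaceList[ToString[#], rules] &];

    lookupA = makeLookup[tableA];

    resultFor123 = lookupA[123]     (* {"abc", "def", "ghi", "jkl", "mno"} *)
    resultFor456 = lookupA["456"]   (* {"abc", "jkl"} *)
    ```

    The With wrapper injects the already-built Dispatch table into the returned function, so the preprocessing cost is paid once per table rather than once per query.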