Tags: regex, grep, match, julia, case-insensitive

DataArray case-insensitive match that returns the index value of the match


I have a DataFrame inside of a function:

using DataFrames

myservs = DataFrame(serverName = ["elmo", "bigBird", "Oscar", "gRover", "BERT"],
                    ipAddress = ["12.345.6.7", "12.345.6.8", "12.345.6.9", "12.345.6.10", "12.345.6.11"])
myservs
5x2 DataFrame
| Row | serverName | ipAddress     |
|-----|------------|---------------|
| 1   | "elmo"     | "12.345.6.7"  |
| 2   | "bigBird"  | "12.345.6.8"  |
| 3   | "Oscar"    | "12.345.6.9"  |
| 4   | "gRover"   | "12.345.6.10" |
| 5   | "BERT"     | "12.345.6.11" |

How can I write the function so that it takes a single parameter called server, matches it case-insensitively against the myservs[:serverName] DataArray, and returns the ipAddress in the matching row?

In R this can be done by using

myservs$ipAddress[grep(server, myservs$serverName, ignore.case = TRUE)]

I don't want it to matter if someone uses ElMo or Elmo as the server, or if the serverName is saved as elmo or ELMO.


Solution

  • I first tried to reproduce the R approach with the DataFrames pkg, but only because I'm coming from R and am just learning Julia. I asked my coworkers a lot of questions, and the following is what we came up with:

    This task is much cleaner once I stop thinking in terms of R-style vectors. Julia runs plenty fast iterating through a loop.

    Even so, looping isn't the best solution here. I was told to look into Dicts. Dict(), zip(), haskey(), and get() blew my mind, and they have many applications.
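
    As a rough illustration of how those four functions fit together, here is a minimal sketch. The variable names (servers, addresses, lookup) are made up for this example, and it assumes a Julia 1.x session.

    ```julia
    # Hypothetical sample data, reusing two names from the question.
    servers   = ["elmo", "bigBird"]
    addresses = ["12.345.6.7", "12.345.6.8"]

    # zip pairs each name with its address; Dict turns the pairs into a lookup table.
    # The keys are lowercased once, up front, so lookups can ignore case.
    lookup = Dict(zip(map(lowercase, servers), addresses))

    haskey(lookup, "bigbird")         # true: the key was stored in lowercase
    haskey(lookup, "bigBird")         # false: Dict keys are case-sensitive
    get(lookup, "elmo", "unknown")    # "12.345.6.7"
    get(lookup, "slimey", "unknown")  # "unknown", the default for a missing key
    ```

    The point of get() with a default is that a missing key doesn't throw an error, so no separate haskey() check is needed before the lookup.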

    My solution doesn't need the DataFrames pkg at all; it uses Julia's Matrix and Array data representations instead. Wrapping everything in a let block keeps the global environment free of clutter, and the server name/IP list stays hidden from anyone who only runs the function.

    In the sample code I'm recreating the server matrix every time, but in practice I'll have a permission-restricted delimited file that gets read every time. This is OK for now since the delimited files are small, but it may not be the most efficient or the best way to do it.

    # ONLY ALLOW THE FUNCTION TO BE SEEN IN THE GLOBAL ENVIRONMENT
    let
      global myIP
    
      # SERVER MATRIX
      myservers = ["elmo" "12.345.6.7"; "bigBird" "12.345.6.8";
                   "Oscar" "12.345.6.9"; "gRover" "12.345.6.10";
                   "BERT" "12.345.6.11"]
    
      # SERVER DICT (KEYS ARE LOWERCASED ONCE SO LOOKUPS IGNORE CASE)
      servDict = Dict(zip(map(lowercase, myservers[:, 1]), myservers[:, 2]))
    
      # GET SERVER IP FUNCTION: INPUT = SERVER NAME; OUTPUT = IP ADDRESS
      function myIP(servername)
        sn = lowercase(servername)
        get(servDict, sn, "That name isn't in the server list.")
      end
    end
    
    # Test it out
    myIP("SLIMEY")
    #> "That name isn't in the server list."
    
    myIP("elMo")
    #> "12.345.6.7"
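
    For comparison, the question asked for something closer to R's grep, which finds the index of the match and then uses it to pick the IP. A minimal sketch of that index-based approach with plain vectors might look like the following. The names serverNames, ipAddresses, and ip_by_index are made up for this example, and it assumes Julia 1.x, where findfirst returns nothing when no element matches.

    ```julia
    # Same data as in the question, held in two parallel vectors.
    const serverNames = ["elmo", "bigBird", "Oscar", "gRover", "BERT"]
    const ipAddresses = ["12.345.6.7", "12.345.6.8", "12.345.6.9",
                         "12.345.6.10", "12.345.6.11"]

    # Hypothetical helper: return the IP of the first server whose name equals
    # `server` when case is ignored, or `nothing` if no name matches.
    function ip_by_index(server)
        target = lowercase(server)
        i = findfirst(name -> lowercase(name) == target, serverNames)
        return i === nothing ? nothing : ipAddresses[i]
    end

    ip_by_index("ElMo")    # "12.345.6.7"
    ip_by_index("bert")    # "12.345.6.11"
    ip_by_index("SLIMEY")  # nothing
    ```

    Unlike grep, this compares whole names rather than matching a pattern. It also scans the vector on every call, which is why the Dict version above is the better fit for repeated lookups.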