How to check for valid file name format in kdb/q?


I'd like to check that the file names in my directory are all formatted properly. First I create a variable dir and then use the keyword key to see what files are listed...

q)dir:`:/myDirectory/data/files
q)dirkey:key dir
q)dirkey
`FILEA_XYZ_20190501_b233nyc9_OrderPurchase_000123.json
`FILEB_ABC_20190430_b556nyc1_OrderSale_000456.meta

I select and parse the .json file name...

q)dirjsn:dirkey where dirkey like "*.json"
q)sepname:raze{"_" vs string x}'[dirjsn]
"FILEA"
"XYZ"
"20190501"
"b233nyc9"
"OrderPurchase"
"000123.json"

Next I'd like to confirm that every character in sepname[0] and sepname[1] is a letter, that every character in sepname[2] is a digit (it is a date), and that sepname[3] is alphanumeric, i.e. a mix of letters and digits.

What is the best way to optimize the following sequential if statements for performance and how can I check for alphanumeric values, like in the case of sepname[3], not just one or the other?

q)if[not sepname[0] like "*[A-Z]";:show "Incorrect Submitter"];
  if[not sepname[1] like "*[A-Z]";:show "Incorrect Reporter"];
  if[not sepname[2] like "*[0-9]";:show "Incorrect Date"];
  if[not sepname[3] like " ??? ";:show "Incorrect Kind"];
  show "Correct File Format"

Solution

  • like will not work in this case: a pattern such as "*[A-Z]" only tests the last character, whereas we need to check each character. One way to do that is to use in and inter:

      q) a: ("FILEA"; "XYZ"; "20190501"; "b233nyc9")
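
    To see why the patterns in the question fall short: in a like pattern, * matches any run of characters and a bracket class matches exactly one character, so "*[A-Z]" only constrains the final character. A quick illustration, using made-up sample strings:

    ```q
    / "*[A-Z]" only tests the last character of the string
    "abC" like "*[A-Z]"      / 1b: passes despite the lowercase letters
    "ABc" like "*[A-Z]"      / 0b: fails only because the last character is lowercase
    "12_Z" like "*[A-Z]"     / 1b: everything before the last character is ignored
    ```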
    

    Create a character set

      q) c: .Q.a, .Q.A
    

    For the first three cases, check that each character belongs to the corresponding set:

      q) r1: all@'(3#a) in' (c;c;.Q.n)  / output 111b
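
    The snippet above uses only in. As a sketch of the inter form mentioned earlier: x inter y keeps the items of x that also occur in y, so a string made up entirely of allowed characters comes back unchanged and can be compared with ~ (match). The helper name okSet is hypothetical, not part of q.

    ```q
    c:.Q.a,.Q.A                            / letters, as defined above
    okSet:{[s;allowed] s~s inter allowed}  / hypothetical helper: 1b when every character of s is in allowed
    okSet["FILEA";c]                       / 1b
    okSet["20190501";.Q.n]                 / 1b
    okSet["2019-05";.Q.n]                  / 0b: "-" is dropped by inter, so the match fails
    ```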
    

    For the alphanumeric case, check that the string contains both letters and digits and no other characters:

      q)r2: (sum[b]=count a[3]) & all b:sum@'a[3] in/: (c;.Q.n) / output 1b
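
    It may help to evaluate that one-liner in steps. The sketch below uses the sample value "b233nyc9" and a scratch name a3, which is an assumption for illustration only:

    ```q
    a3:"b233nyc9"
    c:.Q.a,.Q.A
    a3 in/: (c;.Q.n)             / (10001110b;01110001b): letter mask, then digit mask
    b:sum each a3 in/: (c;.Q.n)  / counts per mask: 4 letters and 4 digits
    all b                        / 1b: both counts are non-zero, so both kinds are present
    sum[b]=count a3              / 1b: all 8 characters are letters or digits
    ```

    A letters-only string gives a zero digit count, so all b is 0b; a string containing any other character makes the two counts sum to less than the length, so the equality fails.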
    

    Print output/errors:

      q) errors: ("Incorrect Submitter";"Incorrect Reporter";"Incorrect Date";"Incorrect Kind")
      q) show $[0=count r:where not r1,r2;"All good";errors r]
      "All good"
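
To replace the sequential if statements, the steps above could be wrapped in one function that reports every failing field at once. This is only a sketch for a script file: the name checkName is hypothetical, and it assumes each file name has at least four "_"-separated fields.

```q
/ Hypothetical helper: returns the error messages for one file name
/ (an empty list means the first four fields look valid).
checkName:{[fname]
  p:"_" vs string fname;                 / split the name into fields
  c:.Q.a,.Q.A;                           / letters
  r1:all each (3#p) in' (c;c;.Q.n);      / fields 0-2: every character is in its set
  b:sum each p[3] in/: (c;.Q.n);         / field 3: letter count, digit count
  r2:(sum[b]=count p 3) & all b;         / letters and digits only, at least one of each
  errors:("Incorrect Submitter";"Incorrect Reporter";"Incorrect Date";"Incorrect Kind");
  errors where not r1,r2}

checkName `FILEA_XYZ_20190501_b233nyc9_OrderPurchase_000123.json   / empty list: no errors
checkName `FILEA_XYZ_2019x501_b233nyc9_OrderPurchase_000123.json   / one message: "Incorrect Date"
```

With the directory listing from the question, checkName each dirjsn would apply the same checks to every .json name.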