
Reading tabular data from flat file in J


I have a file whose contents look something like this:

A  12    17.5   3.2
B   7    12    11
C   6.2   9.3  13

The whitespace between cells can vary and is not significant, though there must be at least one space. Additionally, the first column only contains (or should only contain) one of those three letters, and I am content to work with 0-2 instead if it simplifies life with J (I suspect it would).

I'm not even sure how to approach this in J. Two approaches jump out at me:

  1. Use ;: to break the file contents into "words". This will produce something like this for me:

       (;: file)
    ┌─┬───────────┬─┬─┬───────┬─┬─┬──────────┐
    │A│12 17.5 3.2│ │B│7 12 11│ │C│6.2 9.3 13│
    └─┴───────────┴─┴─┴───────┴─┴─┴──────────┘
    

    This is interesting, because it has grouped the numeric values together. I could see then selecting out those columns like so:

    (0=3|i.#;:file)#;:file
    

    I could use ". to convert the other boxes to numbers. For some reason, doing it piecemeal like this feels hackish.

  2. Use the sequential machine (dyadic ;:)

    The documentation on this verb is making my head spin, but I think if I drew a state transition diagram I could get the words broken up. I don't know if it would be possible to convert any of the words to numbers at the same time though, or if it's possible to return a matrix this way. Is it?

I worry that I'm bringing too much of my experience with other languages to bear on this and it's actually a simple problem in J, if you know how to do it. Is that the case? What's a more idiomatic way to do this with J?


Solution

  • Things are a bit easier if the file is purely numeric, so I will replace your A B C with 1 2 3. I will also add a couple of rows to show how grouping by the first column can be done.

    Here file is the character string holding the file contents.

       [ file=.'1  12  17.5   3.2 2   7    12    11   3  6.2   9.3  13 2 2.3 3.6 12 1 3.4 2 3.4'    
    1  12  17.5   3.2 2   7    12    11   3  6.2   9.3  13 2 2.3 3.6 12 1 3.4 2 3.4
    

    Convert file to numbers with ". and then take them 4 at a time to build a table with _4 ]\ . This uses the dyad Infix \ ( http://www.jsoftware.com/help/dictionary/d430.htm ); the negative left argument makes the infixes non-overlapping.

       [ array=. _4]\ ". file
    1  12 17.5 3.2
    2   7   12  11
    3 6.2  9.3  13
    2 2.3  3.6  12
    1 3.4    2 3.4
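
    One caveat, shown as a small sketch on made-up data rather than your file: when the item count is not a multiple of 4, _4 ]\ still returns a table, with the short last row padded with zeros, so it may be worth checking the count first.

       _4 ]\ i. 10
    0 1 2 3
    4 5 6 7
    8 9 0 0
       0 = 4 | # i. 10      NB. 1 only when the items fill complete rows of 4
    0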
    

    Once that is done, you can group the rows by their first column and apply any operation you like using v/. , where v is any verb modified by the Key adverb /. http://www.jsoftware.com/help/dictionary/d421.htm

       ({."1 </. }."1) array
    +------------+----------+----------+
    | 12 17.5 3.2|  7  12 11|6.2 9.3 13|
    |3.4    2 3.4|2.3 3.6 12|          |
    +------------+----------+----------+
    

    For example, you can average each column within each category given by the first column.

       ({."1 (+/ % #)/. }."1) array
     7.7 9.75  3.3
    4.65  7.8 11.5
     6.2  9.3   13
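
    As a further sketch (the same array is rebuilt here so the lines can be run on their own), any other verb can take the place of the mean. Since Key orders its groups by the first appearance of each key, ~. on the key column recovers the label that belongs to each result row.

       array =. _4 ]\ ". '1 12 17.5 3.2 2 7 12 11 3 6.2 9.3 13 2 2.3 3.6 12 1 3.4 2 3.4'
       ({."1 #/. }."1) array                               NB. number of rows per category
    2 2 1
       (~. {."1 array) ,. ({."1 (+/ % #)/. }."1) array     NB. category, then its column means
    1  7.7 9.75  3.3
    2 4.65  7.8 11.5
    3  6.2  9.3   13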
    

    Following the comment below, the ;: trick from your question can also give you the shape and type you want directly from the original text, letters included (here mapped to 1 2 3).

       ;"1 ".each(('123'{~ 'ABC'&i.) each @:{. , }.)"1[ _2 [\ ;: 'A 1.1 2.2 3.3 B 3.4 4.5 5.6 C 6.7 7.8 8.9'
    1 1.1 2.2 3.3
    2 3.4 4.5 5.6
    3 6.7 7.8 8.9
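
    If the text still contains its line feeds as well as the letters, one possible sketch is to cut it into lines first and handle the label column separately. The literal below stands in for the file contents, and it assumes every line, including the last, ends with LF; the names lines, labels and nums are only illustrative.

       file =. 'A  12    17.5   3.2',LF,'B   7    12    11',LF,'C   6.2   9.3  13',LF
       lines =. <;._2 file              NB. one box per LF-terminated line
       labels =. 'ABC' i. {.@> lines    NB. first character of each line, as 0 1 2
       nums =. ".@}.@> lines            NB. drop the label, convert the rest to numbers
       labels ,. nums
    0  12 17.5 3.2
    1   7   12  11
    2 6.2  9.3  13

    With a real file, something like file =. 1!:1 <'data.txt' (a hypothetical file name) would supply the same kind of string.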