Parsing keyed lists from a file in Tcl?


I have a file full of records in the following format:

{TOKEN 
    { NAME {name of this token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { REPEATING {
        { MAX 3 }
        { TIME {nmin 30} }
    } }
    { WINDOW */*/*/* }
    { ACTION {
        { EXEC {code to run here} }
    } }
}
{TOKEN 
    { NAME {name of next token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { WINDOW 0/0:30-2:00,3:30-7:30/*/* }
    { HOST {localhost} }
    { ACTION {
        { email {
            { FROM [email protected] }
            { TO [email protected] }
            { SUBJ {email subject test} }
            { MSG {this is the email body} }
        } }
    } }
}

Not all of the records have the same keywords, but they are all nested keyed lists, and I need to parse them into a .csv file for easier review. However, when I read in the file, it comes in as a single string rather than as a list of keyed lists. Splitting on whitespace or newlines doesn't help, because both also occur inside the keyed lists. I tried inserting a pipe (|) between }\n and {T and splitting on the pipe, but I still ended up with strings.

I hope someone can point me in the right direction to parse these s-expression files.

thanks in advance!

J


Solution

  • That looks like a list of TclX keyed lists, which were an earlier attempt to do what modern Tcl does with dictionaries. Keyed lists nest quite nicely, which makes each record a tree rather than a table, so mapping to CSV will not be a perfect fit. Still, their syntax is such that the easiest way to handle them is with the TclX code.

    Preliminaries:

    package require TclX
    package require csv;        # From Tcllib
    

    List the columns that we're going to be interested in. Note the "." separating the levels of each name; that is how TclX addresses a key nested inside another keyed list.

    set columns {
        TOKEN.NAME TOKEN.GROUPS TOKEN.VALUE TOKEN.REPEATING.MAX TOKEN.REPEATING.TIME
        TOKEN.WINDOW TOKEN.HOST TOKEN.ACTION.EXEC TOKEN.ACTION.email.FROM
        TOKEN.ACTION.email.TO TOKEN.ACTION.email.SUBJ TOKEN.ACTION.email.MSG
    }
    # Optionally, put a header row in:
    puts [csv::join $columns]
    

    Loading the real data into Tcl:

    set f [open "thefile.dta"]
    set data [read $f]
    close $f
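
    On the "single string" worry from the question: in Tcl, any brace-balanced string can be passed straight to the list commands, so $data is already usable as a list of records without any splitting. A minimal sketch in plain Tcl (no TclX), where the inline sample is made up and only stands in for the file contents:

    ```tcl
    # Hypothetical sample standing in for the contents of the file.
    set data {
    {TOKEN
        { NAME {first token} }
        { VALUE 1 }
    }
    {TOKEN
        { NAME {second token} }
        { WINDOW */*/*/* }
    }
    }

    # The string can be used as a list directly; whitespace and newlines
    # outside braces act as element separators.
    set count [llength $data]     ;# number of top-level records
    set first [lindex $data 0]    ;# first record, itself a list
    set tag   [lindex $first 0]   ;# leading tag of that record
    set pair  [lindex $first 1]   ;# first key/value pair of that record
    puts "$count records; first is a $tag named '[lindex $pair 1]'"
    ```

    The braces do the grouping, which is why inserting a separator character and splitting on it is unnecessary.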
    

    Iterate over the lists, extract the info, and send to stdout as CSV:

    foreach item $data {
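        # Ugly hack to munge data into real TclX format: each record is
        # {TOKEN {k v} {k v} ...}, so rewrap it as a keyed list with a single
        # TOKEN key whose value is the keyed list of the remaining pairs.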
        set item [list [list [lindex $item 0] [lrange $item 1 end]]]
        set row {}
        foreach label $columns {
            if {![keylget item $label value]} {set value ""}
            lappend row $value
        }
        puts [csv::join $row]
    }
    

    Or something like that.
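
    If TclX is not available, the dotted-path lookup can be approximated in plain Tcl. This is only a sketch: kget is a hypothetical helper (it is not part of TclX or Tcllib), and it assumes every level is a list of {key value} pairs as in the records above. Unlike the code above, the paths here omit the TOKEN prefix because the tag is stripped off first.

    ```tcl
    # Hypothetical helper: look up a dotted path such as REPEATING.MAX in a
    # list of {key value} pairs. Returns 1 and sets varName in the caller on
    # success; returns 0 (leaving varName untouched) if any key is missing.
    proc kget {klist path varName} {
        upvar 1 $varName out
        set current $klist
        foreach key [split $path .] {
            set found 0
            foreach pair $current {
                if {[llength $pair] == 2 && [lindex $pair 0] eq $key} {
                    set current [lindex $pair 1]
                    set found 1
                    break
                }
            }
            if {!$found} {return 0}
        }
        set out $current
        return 1
    }

    # Sample record in the same shape as the question's data.
    set record {TOKEN
        { NAME {name of this token} }
        { REPEATING {
            { MAX 3 }
            { TIME {nmin 30} }
        } }
    }
    set fields [lrange $record 1 end]   ;# drop the leading TOKEN tag

    foreach label {NAME REPEATING.MAX REPEATING.TIME HOST} {
        if {![kget $fields $label value]} {set value ""}
        puts "$label = '$value'"
    }
    ```

    Missing keys such as HOST come out as empty strings, matching what the keylget loop does for absent columns.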