I recently came across the toolz repository and decided to give it a spin.
Unfortunately, I’m having some trouble using it properly, or at least understanding it.
My first simple task was to parse a tab-separated (TSV) file and get the second column of each row.
For example, given the file foo.tsv:
a b c
d e f
I’d like to return a list of ['b', 'e']. I successfully achieved that with the following piece of logic:
from toolz.curried import *

with open("foo.tsv", 'r') as f:
    data = pipe(f, map(str.rstrip),
                map(str.split),
                map(get(1)),
                tuple)
print(data)
However, if I change the foo.tsv file to use commas instead of tabs as the column delimiters, I cannot figure out the best way to adjust the above code to handle that. It’s not clear to me how best to pass a "," argument to the str.split function while using map with either the pipe or thread_first functions.
Is there existing documentation that describes this?
Don't be afraid of using lambdas.
map(lambda s: s.split(','))
It's maybe a bit less pretty than map(str.split), but it gets the point across.
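As a rough sketch of how the lambda slots into the pipeline from the question: the in-memory io.StringIO below is only an assumed stand-in for a comma-separated version of the file, so the snippet runs without anything on disk. The standard library's operator.methodcaller is shown as a lambda-free alternative, in case that reads better to you.

```python
import io
from operator import methodcaller

from toolz.curried import pipe, map, get

# Assumed sample contents of the comma-separated file.
CSV_TEXT = "a,b,c\nd,e,f\n"

# Variant 1: a lambda that passes "," to str.split.
data = pipe(
    io.StringIO(CSV_TEXT),        # iterating yields one line at a time, like an open file
    map(str.rstrip),              # drop the trailing newline
    map(lambda s: s.split(",")),  # split each line on commas
    map(get(1)),                  # take the second field
    tuple,
)
print(data)

# Variant 2: methodcaller("split", ",") calls s.split(",") on each item.
data_mc = pipe(
    io.StringIO(CSV_TEXT),
    map(str.rstrip),
    map(methodcaller("split", ",")),
    map(get(1)),
    tuple,
)
print(data_mc)
```

Both variants should give the same tuple; with a real file you would pass the open file object in place of the StringIO.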
Consider using pluck(...) rather than map(get(...)):
map(get(1)) -> pluck(1)
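For illustration, here is the same pipeline with pluck(1) in place of map(get(1)). This is a minimal sketch: the io.StringIO input is an assumed stand-in for the comma-separated file, not something from the question.

```python
import io

from toolz.curried import pipe, map, pluck

# Assumed sample contents of the comma-separated file.
f = io.StringIO("a,b,c\nd,e,f\n")

data = pipe(
    f,
    map(str.rstrip),              # drop the trailing newline
    map(lambda s: s.split(",")),  # split each line on commas
    pluck(1),                     # second field of every row
    tuple,
)
print(data)
```

pluck also accepts a list of indices, e.g. pluck([0, 2]), if you need several columns at once.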
If you have a CSV file you might consider just using Pandas, which is very fast and highly optimized for this kind of work.
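A minimal sketch of the Pandas route, assuming the file has no header row (as in the question's example) and again using io.StringIO as a stand-in for the real file:

```python
import io

import pandas as pd

# Assumed sample contents; with a real file, pass its path to read_csv instead.
f = io.StringIO("a,b,c\nd,e,f\n")

# header=None tells pandas the first line is data, so columns are labeled 0, 1, 2.
df = pd.read_csv(f, header=None)

second_column = df[1].tolist()
print(second_column)
```

For the original tab-separated file, passing sep="\t" to read_csv should be the only change needed.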