Looking for advice on the best approach.
I'm working with a text file that is colon delimited, with 4 columns:
user1:company1:QUOTE:printer1
user1:company2:INVOICE:printer2
user1:company1:PURCHASE:printer3
user1:company2:CREDIT:printer4
user2:company1:QUOTE:printer4
user2:company2:INVOICE:printer5
user2:company1:PURCHASE:printer5
user2:company2:CREDIT:printer1
user3:company1:QUOTE:printer2
user3:company2:INVOICE:printer3
user3:company1:PURCHASE:printer4
user3:company2:CREDIT:printer6
This file maps a user to a printer for a specific type of document.
I need to read and potentially manipulate this file.
When reading the file I want to be able to answer different questions:
So the access is somewhat random, i.e. there is no single query.
My current attempt is with nested dictionaries:
mydict[user][printer] = [list of documents]
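For reference, a minimal sketch of how that nested-dictionary layout might be built and queried. The names `SAMPLE` and `load_nested` are made up for illustration, and the sample is a shortened copy of the data above. Note that the company column is dropped, as in the structure described.

```python
from collections import defaultdict

# Shortened copy of the sample data, for illustration only.
SAMPLE = """\
user1:company1:QUOTE:printer1
user1:company2:INVOICE:printer2
user2:company2:CREDIT:printer1
"""

def load_nested(lines):
    """Build mapping[user][printer] -> list of document types."""
    mapping = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, company, doctype, printer = line.split(":")
        mapping[user][printer].append(doctype)
    return mapping

mydict = load_nested(SAMPLE.splitlines())

# "Which printers does user1 have?" is a direct lookup.
printers_for_user1 = sorted(mydict["user1"])

# "Who uses printer1?" has to scan every user, which is where this layout gets awkward.
users_of_printer1 = sorted(u for u, printers in mydict.items() if "printer1" in printers)
```

Lookups by user are cheap here, but any query keyed on printer or document type needs a full scan or a second index.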
I'm looking for a cleaner way to do this.
My current thinking is to use a dataclass and create an instance for every record. But how do I efficiently query these as per my examples above?
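A rough sketch of what the dataclass idea could look like, assuming one instance per line and a simple linear scan for queries. `Mapping`, `parse` and `select` are hypothetical names, not from any library, and the sample is a shortened copy of the data above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mapping:
    user: str
    company: str
    doctype: str
    printer: str

def parse(lines):
    """Turn colon-delimited lines into Mapping records, skipping blank lines."""
    return [Mapping(*line.strip().split(":")) for line in lines if line.strip()]

def select(records, **criteria):
    """Return the records whose fields equal every given keyword value."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]

records = parse([
    "user1:company1:QUOTE:printer1",
    "user1:company2:INVOICE:printer2",
    "user2:company2:CREDIT:printer1",
])

by_user = select(records, user="user1")
by_printer = select(records, printer="printer1")
by_both = select(records, user="user2", doctype="CREDIT")
```

Every query is a full pass over the list, which is usually fine for a file of this size. For much larger files an index per field would be needed.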
Thanks for reading, hope you can guide me.
pandas is made for such analyses.
import pandas as pd  # pip install pandas
df = pd.read_csv("path_to_your_file.txt",
                 sep=":",
                 names=['User', 'Company', 'Doctype', 'Printer'])
>>> df[df.User == "user1"].Printer
0 printer1
1 printer2
2 printer3
3 printer4
Name: Printer, dtype: object
>>> df[df.Printer == "printer1"].User
0 user1
7 user2
Name: User, dtype: object
>>> df[df.Doctype == "PURCHASE"].User
2 user1
6 user2
10 user3
Name: User, dtype: object
>>> df[(df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer2")]
Empty DataFrame
Columns: [User, Company, Doctype, Printer]
Index: []
(Note the obligatory(!) parentheses around each condition and the use of & rather than and in the last example. That's a major source of errors for pandas beginners.)
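Since the question also mentions manipulating the file, here is a hedged sketch of changing a mapping and writing it back in the same colon-delimited layout. It uses an in-memory buffer instead of a real path so it runs as is. The shortened sample and the name "printer9" are assumptions for illustration.

```python
import io
import pandas as pd

# Shortened copy of the sample data, for illustration only.
SAMPLE = """\
user1:company1:QUOTE:printer1
user1:company2:INVOICE:printer2
user2:company1:QUOTE:printer4
"""

df = pd.read_csv(io.StringIO(SAMPLE),
                 sep=":",
                 names=["User", "Company", "Doctype", "Printer"])

# Each condition gets its own parentheses and they are combined with &.
mask = (df.User == "user1") & (df.Doctype == "QUOTE")
df.loc[mask, "Printer"] = "printer9"  # hypothetical new printer

# Write back without header or index to keep the original file layout.
out = io.StringIO()
df.to_csv(out, sep=":", header=False, index=False)
result = out.getvalue()
```

With a real file, the `io.StringIO` objects would be replaced by the file path in both `read_csv` and `to_csv`.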