I have a textfile with hundreds of rows where each row looks like the following:
"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report
The separators are normally commas except when within quotes. The above row each needs to be separated into the following columns:
LastName, FirstName MiddleName
222555
X-150
2023
0.15
0.20
0.5
1,2,10
--
1.5
5.10
report
Try something like this if you want to use Pandas. Just using read_csv
should still work fine on your data, then you can split the columns that are in quotes into their own columns. Last, remove the columns with the quotes:
#### Making mock csv/text file
from io import StringIO
csv_file = StringIO('\n'.join(['"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report']*3))
#########################
# Load in the file as a CSV
df = pd.read_csv(csv_file, header = None)
# Split the names into separate name columns
df[['Last', 'First', 'Middle']] = [x for x in df[0].str.replace(',', '').str.split(' ')]
# Split the numbers in quotes into 3 separate columns
df[['num1', 'num2', 'num3']] = [x for x in df[7].str.split(', ')]
# Remove the columns that you split
df = df.drop(df.columns[[0, 7]], axis = 1)
Output:
1 2 3 4 5 6 8 9 10 11 Last First Middle num1 num2 num3
0 222555 X-150 2023 0.15 0.2 0.5 -- 1.5 5.1 report LastName FirstName MiddleName 1 2 10
1 222555 X-150 2023 0.15 0.2 0.5 -- 1.5 5.1 report LastName FirstName MiddleName 1 2 10
2 222555 X-150 2023 0.15 0.2 0.5 -- 1.5 5.1 report LastName FirstName MiddleName 1 2 10