I am getting data as follows:-
$0011:0524-08-2021
$0021:0624-08-2021
&0011:0724-08-2021
&0021:0924-08-2021
$0031:3124-08-2021
&0031:3224-08-2021
$0041:3924-08-2021
&0041:3924-08-2021
$0012:3124-08-2021
&0012:3324-08-2021
In $0011:0524-08-2021
, $
denotes start of string, 001
denotes ID, 1:05
denotes time, 24-08-2021
denotes date. Similarly &0011:0624-08-2021
everything is same except &
denotes end of string.
Taking the above data I want to create a data frame as follows:-
1. $0011:0524-08-2021 &0011:0724-08-2021
2. $0021:0624-08-2021 &0021:0924-08-2021
3. $0031:3124-08-2021 &0031:3224-08-2021
4. $0041:3924-08-2021 &0041:3924-08-2021
5. $0012:3124-08-2021 &0012:3324-08-2021
Basically I want to sort the entries into a data frame as shown above. There are few conditions that must be satisfied in doing so.
1.) Column1 should have only entries of $
and Column2 should have only &
entries.
2.) Both the columns should be arranged in increasing order of time. Column1 with $
entries
should be arranged in increasing order of time and same goes for column2 with &
entries.
If you're getting the lines as you shown in your example, you can try:
import pandas as pd
def process_lines(lines):
buffer = {}
for line in map(str.strip, lines):
id_ = line[1:4]
if line[0] == "$":
buffer[id_] = line
elif line[0] == "&" and buffer.get(id_):
yield buffer[id_], line
del buffer[id_]
txt = """$0011:0524-08-2021
$0021:0624-08-2021
&0011:0724-08-2021
&0021:0924-08-2021
$0031:3124-08-2021
&0031:3224-08-2021
$0041:3924-08-2021
&0041:3924-08-2021
$0012:3124-08-2021
&0012:3324-08-2021"""
df = pd.DataFrame(process_lines(txt.splitlines()), columns=["A", "B"])
print(df)
Prints:
A B
0 $0011:0524-08-2021 &0011:0724-08-2021
1 $0021:0624-08-2021 &0021:0924-08-2021
2 $0031:3124-08-2021 &0031:3224-08-2021
3 $0041:3924-08-2021 &0041:3924-08-2021
4 $0012:3124-08-2021 &0012:3324-08-2021