Create DataFrame from raw input

I am getting data as follows:-

$0011:0524-08-2021
$0021:0624-08-2021
&0011:0724-08-2021
&0021:0924-08-2021
$0031:3124-08-2021
&0031:3224-08-2021
$0041:3924-08-2021
&0041:3924-08-2021
$0012:3124-08-2021
&0012:3324-08-2021

In $0011:0524-08-2021, $ denotes start of string, 001 denotes ID, 1:05 denotes time, 24-08-2021 denotes date. Similarly &0011:0624-08-2021 everything is same except & denotes end of string.

Taking the above data I want to create a data frame as follows:-

1. $0011:0524-08-2021   &0011:0724-08-2021
2. $0021:0624-08-2021   &0021:0924-08-2021
3. $0031:3124-08-2021   &0031:3224-08-2021
4. $0041:3924-08-2021   &0041:3924-08-2021
5. $0012:3124-08-2021   &0012:3324-08-2021

Basically I want to sort the entries into a data frame as shown above. There are few conditions that must be satisfied in doing so.

1.) Column1 should have only entries of $ and Column2 should have only & entries.

2.) Both the columns should be arranged in increasing order of time. Column1 with $ entries should be arranged in increasing order of time and same goes for column2 with & entries.

Solution

If you're getting the lines as you shown in your example, you can try:

import pandas as pd


def process_lines(lines):
    buffer = {}
    for line in map(str.strip, lines):
        id_ = line[1:4]
        if line[0] == "$":
            buffer[id_] = line
        elif line[0] == "&" and buffer.get(id_):
            yield buffer[id_], line
            del buffer[id_]


txt = """$0011:0524-08-2021
$0021:0624-08-2021
&0011:0724-08-2021
&0021:0924-08-2021
$0031:3124-08-2021
&0031:3224-08-2021
$0041:3924-08-2021
&0041:3924-08-2021
$0012:3124-08-2021
&0012:3324-08-2021"""

df = pd.DataFrame(process_lines(txt.splitlines()), columns=["A", "B"])
print(df)

Prints:

                    A                   B
0  $0011:0524-08-2021  &0011:0724-08-2021
1  $0021:0624-08-2021  &0021:0924-08-2021
2  $0031:3124-08-2021  &0031:3224-08-2021
3  $0041:3924-08-2021  &0041:3924-08-2021
4  $0012:3124-08-2021  &0012:3324-08-2021