R: splitting a textfile into a workable dataframe

I am working with a text file containing localization data. Every 5 minutes there are multiple reports which can result into a calculated zone. If it resolves the zone it outputs a identified room ID (4260 and 4256 in the example):

[08/14/2021 05:05:59 600] - TagId: 4194912 Identified RoomId:4260
[08/14/2021 05:05:59 616] - TagId: 4194912 Last Monitorid:4195283
[08/14/2021 05:05:59 631] - TagId: 4194912 After RoomId:2199
[08/14/2021 05:05:59 631] - Localization RoomId: 2199
[08/14/2021 05:05:59 663] - TagId: 4194912 Reporting RoomId:2199
[08/14/2021 05:05:59 663] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:05:59 678] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:05:59 678] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:05:59 694] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_A8 RSSI: -83
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:05:59 709] - MacId: F0_5C_19_C6_88_2E RSSI: -54
[08/14/2021 05:05:59 913] - MacId: 40_E3_D6_CA_56_5C RSSI: -92
[08/14/2021 05:05:59 913] - MacId: F0_5C_19_C6_88_52 RSSI: -92
[08/14/2021 05:05:59 928] - MacId: F0_5C_19_C6_88_B8 RSSI: -80
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:06:00 288] - MacId: 40_E3_D6_CA_57_0A RSSI: -90
[08/14/2021 05:06:00 288] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:06:00 413] - MacId: F0_5C_19_C6_88_90 RSSI: -90
[08/14/2021 05:06:00 413] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_A8 RSSI: -83
[08/14/2021 05:06:00 428] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:00 974] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:11:01 006] - TagId: 4194912 Identified RoomId:4256
[08/14/2021 05:11:01 021] - TagId: 4194912 Last Monitorid:4195283
[08/14/2021 05:11:01 037] - TagId: 4194912 After RoomId:2199
[08/14/2021 05:11:01 052] - Localization RoomId: 2199
[08/14/2021 05:11:01 084] - TagId: 4194912 Reporting RoomId:2199
[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C7_86_54 RSSI: -83
[08/14/2021 05:11:01 084] - MacId: F0_5C_19_C6_88_78 RSSI: -90
[08/14/2021 05:11:01 099] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_A8 RSSI: -84
[08/14/2021 05:11:01 349] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:11:01 365] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:11:01 474] - MacId: 40_E3_D6_CA_56_5C RSSI: -93
[08/14/2021 05:11:01 490] - MacId: F0_5C_19_C6_88_52 RSSI: -90
[08/14/2021 05:11:01 490] - MacId: F0_5C_19_C6_88_BE RSSI: -89
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_88_A4 RSSI: -72
[08/14/2021 05:11:01 802] - MacId: 40_E3_D6_CA_57_0A RSSI: -90
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_89_3C RSSI: -45
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C6_88_78 RSSI: -89
[08/14/2021 05:11:01 802] - MacId: F0_5C_19_C7_86_54 RSSI: -82
[08/14/2021 05:11:02 006] - MacId: F0_5C_19_C6_88_90 RSSI: -89
[08/14/2021 05:11:02 006] - MacId: F0_5C_19_C6_88_22 RSSI: -80
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_A8 RSSI: -84
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_2E RSSI: -55
[08/14/2021 05:11:02 021] - MacId: F0_5C_19_C6_88_12 RSSI: -60
[08/14/2021 05:11:02 115] - MacId: F0_5C_19_C6_88_52 RSSI: -91
[08/14/2021 05:11:02 115] - MacId: F0_5C_19_C6_88_BE RSSI: -88

I would like to end up with data in the form of:

Image of desired dataframe

In case the RoomId is not resolved in the 5 minute timeframe (from the raw text file) the column RoomId can just be NA.

A very helpful member already showed how to split the columns in the right manner (R: Splitting a string into multiple columns)

So the main question is: How can I structure this raw text file in a way similar to the image where it is a workable dataframe although not all the lines from the raw text file are similar?

Solution

Assuming the data is stored in a text file called 'temp.txt', you can read it using readLines. Keep only the rows which has MacId and RSSI values and to get the RoomId keep the 'Identified RoomId' row as well. Split the data in sets and from each set extract Datetime, MacId and RSSI using the code from previous post and extract the room id by removing everything until RoomId. You can combine the output into one dataframe.

data <- readLines('temp.txt')
req_data <- grep('MacId.*RSSI|Identified RoomId', data, value = TRUE)
result <- do.call(rbind, by(req_data, cumsum(grepl('Identified', req_data)), 
          function(x) {
  room_id <- sub('.*RoomId:\\s*', '', x[1])
  cbind(strcapture('\\[(.*)\\] - MacId: (.*) RSSI: (.*)', x[-1], 
             proto = list(Datetime = character(), MacId = character(), 
                          RSSI = numeric())), RoomId = room_id)
}))
rownames(result) <- NULL

For the text data shared I get output as -

result
                  Datetime             MacId RSSI RoomId
1  08/14/2021 05:05:59 663 F0_5C_19_C6_88_A4  -72   4260
2  08/14/2021 05:05:59 678 F0_5C_19_C7_86_54  -82   4260
3  08/14/2021 05:05:59 678 F0_5C_19_C6_89_3C  -45   4260
4  08/14/2021 05:05:59 694 F0_5C_19_C6_88_22  -80   4260
5  08/14/2021 05:05:59 709 F0_5C_19_C6_88_12  -60   4260
6  08/14/2021 05:05:59 709 F0_5C_19_C6_88_A8  -83   4260
7  08/14/2021 05:05:59 709 F0_5C_19_C6_88_90  -89   4260
8  08/14/2021 05:05:59 709 F0_5C_19_C6_88_2E  -54   4260
9  08/14/2021 05:05:59 913 40_E3_D6_CA_56_5C  -92   4260
10 08/14/2021 05:05:59 913 F0_5C_19_C6_88_52  -92   4260
11 08/14/2021 05:05:59 928 F0_5C_19_C6_88_B8  -80   4260
12 08/14/2021 05:06:00 288 F0_5C_19_C6_88_A4  -72   4260
13 08/14/2021 05:06:00 288 F0_5C_19_C7_86_54  -82   4260
14 08/14/2021 05:06:00 288 40_E3_D6_CA_57_0A  -90   4260
15 08/14/2021 05:06:00 288 F0_5C_19_C6_89_3C  -45   4260
16 08/14/2021 05:06:00 413 F0_5C_19_C6_88_90  -90   4260
17 08/14/2021 05:06:00 413 F0_5C_19_C6_88_12  -60   4260
18 08/14/2021 05:06:00 428 F0_5C_19_C6_88_22  -80   4260
19 08/14/2021 05:06:00 428 F0_5C_19_C6_88_A8  -83   4260
20 08/14/2021 05:06:00 428 F0_5C_19_C6_88_2E  -55   4260
21 08/14/2021 05:11:00 974 F0_5C_19_C6_88_A4  -72   4260
22 08/14/2021 05:11:01 084 F0_5C_19_C7_86_54  -83   4256
23 08/14/2021 05:11:01 084 F0_5C_19_C6_88_78  -90   4256
24 08/14/2021 05:11:01 099 F0_5C_19_C6_89_3C  -45   4256
25 08/14/2021 05:11:01 349 F0_5C_19_C6_88_12  -60   4256
26 08/14/2021 05:11:01 349 F0_5C_19_C6_88_2E  -55   4256
27 08/14/2021 05:11:01 349 F0_5C_19_C6_88_A8  -84   4256
28 08/14/2021 05:11:01 349 F0_5C_19_C6_88_90  -89   4256
29 08/14/2021 05:11:01 365 F0_5C_19_C6_88_22  -80   4256
30 08/14/2021 05:11:01 474 40_E3_D6_CA_56_5C  -93   4256
31 08/14/2021 05:11:01 490 F0_5C_19_C6_88_52  -90   4256
32 08/14/2021 05:11:01 490 F0_5C_19_C6_88_BE  -89   4256
33 08/14/2021 05:11:01 802 F0_5C_19_C6_88_A4  -72   4256
34 08/14/2021 05:11:01 802 40_E3_D6_CA_57_0A  -90   4256
35 08/14/2021 05:11:01 802 F0_5C_19_C6_89_3C  -45   4256
36 08/14/2021 05:11:01 802 F0_5C_19_C6_88_78  -89   4256
37 08/14/2021 05:11:01 802 F0_5C_19_C7_86_54  -82   4256
38 08/14/2021 05:11:02 006 F0_5C_19_C6_88_90  -89   4256
39 08/14/2021 05:11:02 006 F0_5C_19_C6_88_22  -80   4256
40 08/14/2021 05:11:02 021 F0_5C_19_C6_88_A8  -84   4256
41 08/14/2021 05:11:02 021 F0_5C_19_C6_88_2E  -55   4256
42 08/14/2021 05:11:02 021 F0_5C_19_C6_88_12  -60   4256
43 08/14/2021 05:11:02 115 F0_5C_19_C6_88_52  -91   4256
44 08/14/2021 05:11:02 115 F0_5C_19_C6_88_BE  -88   4256