I have 15 minute interval barologger pressure data and 5 minute interval CT2X pressure data. I would like to match the CT2X data with the barologger data as close as possible. My timeseries runs from 2024-09-02 to 2024-10-17 with the specific time for each ending differently due to taking them out of the field to download the data (11:31 for CT2X and 12:05 for barologger).
I have run the following code with a two minute allowable threshold difference to match, and it works but only provides 1,409 rows and seems to end early on 2024-09-18 but it only provides me with results until 2024-09-18 3:50 for some reason. I don't believe there is a slippage in time as it ran continuously without interruption.
Here is a sample of the data before running the code:
Here is a sample of the data after running the code
Here is the code I used in the Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Barologger DateTime (24hr)", type datetime}, {"Barologger LEVEL (cm)", type number}, {"CT2X Date / Time (24 hour)", type datetime}, {"CT2X Pressure (psi)", type number}}),
#"Added Custom" =Table.AddColumn(#"Changed Type","Match CT2X", (r)=>
let
//generate a list of differences between each of the times
difs = List.TransformMany(
#"Changed Type"[#"CT2X Date / Time (24 hour)"],
each {_ -r[#"Barologger DateTime (24hr)"]},
(a,b)=> Number.Abs(Number.From(b))),
//determine the row numbers with the matching min times, using 2 minutes (2/1440) as the threshold
pos = if List.Min(difs) < 2/1440
then List.PositionOf(difs,List.Min(difs))
else null
in pos, Int64.Type),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Match CT2X] <> null and [Match CT2X] <> ""),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}),
#"Matching CT2X" = Table.TransformColumns(#"Removed Columns",{"Match CT2X",
each Table.SelectColumns(#"Added Custom",{"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}){_},type [#"CT2X Date/Time"=datetime, CT2X Pressure=number]}),
#"Expanded Match CT2X" = Table.ExpandRecordColumn(#"Matching CT2X", "Match CT2X", {"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})
in
#"Expanded Match CT2X"
Here is the link to the dataset that I'm working with:
I see you have started a new thread with the exact same question and data. I had mentioned in that thread that I was going to work out a more rapid method, but I will post it here instead.
I used the CombinedBaroCT2X!A:D range for the data source, in the code below.
One could easily use the raw data sheets if that is preferable. Here is a very fast Power Query method (and a sample of the output) that works on your data. It works by
Table.Group
to return times that are not more than 2 minutes apart.let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Barologger DateTime (24hr)", type datetime},
{"Barologger LEVEL (cm)", type number}, {"CT2X Date / Time (24 hour)", type datetime}, {"CT2X Pressure (psi)", type number}}),
//Split the tables and add ID column
Barologger = Table.AddColumn(
Table.SelectColumns(#"Changed Type",List.Range(Table.ColumnNames(#"Changed Type"),0,2)),
"ID", each "B",type text),
CT2X = Table.AddColumn(
Table.SelectColumns(#"Changed Type", List.Skip(Table.ColumnNames(#"Changed Type"),2)),
"ID", each "C", type text),
//Interleave and sort by times
t1 = Table.RenameColumns(Barologger,List.Zip({Table.ColumnNames(Barologger),{"DateTime","Pressure","ID"}})),
t2= Table.RenameColumns(CT2X,List.Zip({Table.ColumnNames(CT2X),{"DateTime","Pressure","ID"}})),
tbl = Table.Combine({t1,t2}),
#"Sorted Rows" = Table.Sort(tbl,{{"DateTime", Order.Ascending}}),
#"Filtered Rows" = Table.SelectRows(#"Sorted Rows", each ([DateTime] <> null)),
//Group less than 2 minutes apart
#"Group by Time" = Table.Group(#"Filtered Rows",{"DateTime"},{{
"All", (t)=> if Table.RowCount(t)=1 or List.Count(List.Distinct(t[ID])) = 1 then null
else if Table.RowCount(t)>2 then Error.Record("Too many matching rows")
else let
sortBC = Table.Sort(t,{"ID", Order.Ascending}),
Baro = Record.RemoveFields(Record.RenameFields(t{0},{
{"DateTime","Barologger DateTime (24hr)"},{"Pressure","Barologger LEVEL (cm)"}}),{"ID"}),
CT = Record.RemoveFields(Record.RenameFields(t{1},{
{"DateTime","CT2X Date / Time (24 hour)"}, {"Pressure","CT2X Pressure (psi)"}}),{"ID"})
in
Record.Combine({Baro,CT}),
type [#"Barologger DateTime (24hr)"=datetime,#"Barologger LEVEL (cm)"=number,
#"CT2X Date / Time (24 hour)"= datetime, #"CT2X Pressure (psi)"=number]
}}, GroupKind.Local,(x,y)=>Number.From(Number.Abs(Duration.TotalMinutes(x[DateTime]-y[DateTime]))>2)),
#"Removed Columns" = Table.RemoveColumns(#"Group by Time",{"DateTime"}),
#"Removed Nulls" = Table.SelectRows(#"Removed Columns", each [All]<>null),
#"Expanded All" = Table.ExpandRecordColumn(#"Removed Nulls", "All", {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})
in
#"Expanded All"
If you should want to start with the RawBarologger
and Raw CT2X
, you can use the following code:
let
//Read in the raw tables separately into two different queries, named as per Source and Source2
//Process the raw tables
Source = RawBarologger,
#"Remove Extra Columns" = Table.RemoveColumns(Source,{"ms","TEMPERATURE"}),
#"Combine Date Time" =
Table.TransformColumns(
Table.CombineColumnsToRecord(#"Remove Extra Columns","Date",{"Date","Time"}),
{"Date", each [Date] & [Time], type datetime}),
//Add ID Column and Rename Level => Pressure
#"Renamed Columns" = Table.RenameColumns(#"Combine Date Time",{{"LEVEL", "Pressure"}}),
baroID = Table.AddColumn(#"Renamed Columns","ID", each "B",type text),
//Process Raw CT2x
Source2 = #"Raw CT2X Y1",
#"Remove Extra Columns2" = Table.SelectColumns(Source2, {"Date / Time", "Pressure (psi)"}),
#"Renamed Columns1" = Table.RenameColumns(#"Remove Extra Columns2",{{"Date / Time", "Date"}, {"Pressure (psi)", "Pressure"}}),
ctID = Table.AddColumn(#"Renamed Columns1", "ID", each "C", type text),
//Combine tables and sort
sorted = Table.Sort(Table.Combine({baroID, ctID}),{"Date", Order.Ascending}),
//group with 2 minute separation max
#"Group by Time" = Table.Group(sorted,{"Date"},{{
"All", (t)=> if Table.RowCount(t)=1 or List.Count(List.Distinct(t[ID])) = 1 then null
else if Table.RowCount(t)>2 then Error.Record("Too many matching rows")
else let
sortBC = Table.Sort(t,{"ID", Order.Ascending}),
Baro = Record.RemoveFields(Record.RenameFields(t{0},{
{"Date","Barologger DateTime (24hr)"},{"Pressure","Barologger LEVEL (cm)"}}),{"ID"}),
CT = Record.RemoveFields(Record.RenameFields(t{1},{
{"Date","CT2X Date / Time (24 hour)"}, {"Pressure","CT2X Pressure (psi)"}}),{"ID"})
in
Record.Combine({Baro,CT}),
type [#"Barologger DateTime (24hr)"=datetime,#"Barologger LEVEL (cm)"=number,
#"CT2X Date / Time (24 hour)"= datetime, #"CT2X Pressure (psi)"=number]
}}, GroupKind.Local,(x,y)=>Number.From(Number.Abs(Duration.TotalMinutes(x[Date]-y[Date]))>2)),
//Remove Date column and null tables
#"Removed Columns" = Table.RemoveColumns(#"Group by Time",{"Date"}),
#"Removed Nulls" = Table.SelectRows(#"Removed Columns",each [All] <> null),
#"Expanded All" = Table.ExpandRecordColumn(#"Removed Nulls", "All", {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})
in
#"Expanded All"