Search code examples
exceltime-seriespowerquery

How can I match timeseries data using PowerQuery in Microsoft Excel?


I have 15 minute interval barologger pressure data and 5 minute interval CT2X pressure data. I would like to match the CT2X data with the barologger data as close as possible. My timeseries runs from 2024-09-02 to 2024-10-17 with the specific time for each ending differently due to taking them out of the field to download the data (11:31 for CT2X and 12:05 for barologger).

I have run the following code with a two minute allowable threshold difference to match, and it works but only provides 1,409 rows and seems to end early on 2024-09-18 but it only provides me with results until 2024-09-18 3:50 for some reason. I don't believe there is a slippage in time as it ran continuously without interruption.

Here is a sample of the data before running the code: enter image description here

Here is a sample of the data after running the code enter image description here

Here is the code I used in the Power Query:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Barologger DateTime (24hr)", type datetime}, {"Barologger LEVEL (cm)", type number}, {"CT2X Date / Time (24 hour)", type datetime}, {"CT2X Pressure (psi)", type number}}),

#"Added Custom" =Table.AddColumn(#"Changed Type","Match CT2X", (r)=>
let

//generate a list of differences between each of the times
    difs = List.TransformMany(
                #"Changed Type"[#"CT2X Date / Time (24 hour)"],
                each {_ -r[#"Barologger DateTime (24hr)"]},
                (a,b)=> Number.Abs(Number.From(b))),

//determine the row numbers with the matching min times, using 2 minutes (2/1440) as the threshold
    pos = if List.Min(difs) < 2/1440
            then List.PositionOf(difs,List.Min(difs))
            else null
    in pos, Int64.Type),

    
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Match CT2X] <> null and [Match CT2X] <> ""),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}),
#"Matching CT2X" = Table.TransformColumns(#"Removed Columns",{"Match CT2X", 
    each Table.SelectColumns(#"Added Custom",{"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}){_},type [#"CT2X Date/Time"=datetime, CT2X Pressure=number]}),
#"Expanded Match CT2X" = Table.ExpandRecordColumn(#"Matching CT2X", "Match CT2X", {"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})

in
#"Expanded Match CT2X"

Here is the link to the dataset that I'm working with:

CT2X and Barologger DataFile


Solution

  • I see you have started a new thread with the exact same question and data. I had mentioned in that thread that I was going to work out a more rapid method, but I will post it here instead.

    I used the CombinedBaroCT2X!A:D range for the data source, in the code below.

    One could easily use the raw data sheets if that is preferable. Here is a very fast Power Query method (and a sample of the output) that works on your data. It works by

    • Combine the CTX and Baro data into a single pair of columns (adding an ID column so we will be able to tell them apart at the end).
    • Sort by Date order
    • Group using the fourth and fifth arguments of Table.Group to return times that are not more than 2 minutes apart.
    • Check that the grouped tables have two rows with different ID's
      • Recombine
    • It does some error checking, but let me know if you run across errors that are not detected.
    let
        Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Barologger DateTime (24hr)", type datetime}, 
            {"Barologger LEVEL (cm)", type number}, {"CT2X Date / Time (24 hour)", type datetime}, {"CT2X Pressure (psi)", type number}}),
     
    //Split the tables and add ID column
        Barologger = Table.AddColumn(
            Table.SelectColumns(#"Changed Type",List.Range(Table.ColumnNames(#"Changed Type"),0,2)), 
            "ID", each "B",type text),
    
        CT2X = Table.AddColumn(
                    Table.SelectColumns(#"Changed Type", List.Skip(Table.ColumnNames(#"Changed Type"),2)),
                    "ID", each "C", type text),
    
    //Interleave and sort by times
        t1 = Table.RenameColumns(Barologger,List.Zip({Table.ColumnNames(Barologger),{"DateTime","Pressure","ID"}})),
        t2=  Table.RenameColumns(CT2X,List.Zip({Table.ColumnNames(CT2X),{"DateTime","Pressure","ID"}})),
        tbl = Table.Combine({t1,t2}),
        #"Sorted Rows" = Table.Sort(tbl,{{"DateTime", Order.Ascending}}),
        #"Filtered Rows" = Table.SelectRows(#"Sorted Rows", each ([DateTime] <> null)),
    
    //Group less than 2 minutes apart
        #"Group by Time" = Table.Group(#"Filtered Rows",{"DateTime"},{{
            "All", (t)=> if Table.RowCount(t)=1 or List.Count(List.Distinct(t[ID])) = 1  then null
                        else if Table.RowCount(t)>2 then Error.Record("Too many matching rows")
                else let 
                        sortBC = Table.Sort(t,{"ID", Order.Ascending}),
                        Baro = Record.RemoveFields(Record.RenameFields(t{0},{
                                {"DateTime","Barologger DateTime (24hr)"},{"Pressure","Barologger LEVEL (cm)"}}),{"ID"}),
                        CT = Record.RemoveFields(Record.RenameFields(t{1},{
                                {"DateTime","CT2X Date / Time (24 hour)"}, {"Pressure","CT2X Pressure (psi)"}}),{"ID"})
                    in 
                        Record.Combine({Baro,CT}),
                    type [#"Barologger DateTime (24hr)"=datetime,#"Barologger LEVEL (cm)"=number, 
                            #"CT2X Date / Time (24 hour)"= datetime, #"CT2X Pressure (psi)"=number]
        
        }}, GroupKind.Local,(x,y)=>Number.From(Number.Abs(Duration.TotalMinutes(x[DateTime]-y[DateTime]))>2)),
    
        #"Removed Columns" = Table.RemoveColumns(#"Group by Time",{"DateTime"}),
        #"Removed Nulls" = Table.SelectRows(#"Removed Columns", each [All]<>null),
        #"Expanded All" = Table.ExpandRecordColumn(#"Removed Nulls", "All", {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})
    in
        #"Expanded All"
    

    Partial Output
    enter image description here

    If you should want to start with the RawBarologger and Raw CT2X, you can use the following code:

    let
    
    //Read in the raw tables separately into two different queries, named as per Source and Source2
    //Process the raw tables
        Source = RawBarologger,
        #"Remove Extra Columns" = Table.RemoveColumns(Source,{"ms","TEMPERATURE"}),
        
        #"Combine Date Time" = 
            Table.TransformColumns(
                Table.CombineColumnsToRecord(#"Remove Extra Columns","Date",{"Date","Time"}),
                {"Date", each [Date] & [Time], type datetime}),
    
    //Add ID Column and Rename Level => Pressure
        #"Renamed Columns" = Table.RenameColumns(#"Combine Date Time",{{"LEVEL", "Pressure"}}),
        baroID = Table.AddColumn(#"Renamed Columns","ID", each "B",type text),
    
    //Process Raw CT2x
        Source2 = #"Raw CT2X Y1",
        #"Remove Extra Columns2" = Table.SelectColumns(Source2, {"Date / Time", "Pressure (psi)"}),
        #"Renamed Columns1" = Table.RenameColumns(#"Remove Extra Columns2",{{"Date / Time", "Date"}, {"Pressure (psi)", "Pressure"}}),
        ctID = Table.AddColumn(#"Renamed Columns1", "ID", each "C", type text),
    
    //Combine tables and sort
        sorted = Table.Sort(Table.Combine({baroID, ctID}),{"Date", Order.Ascending}),
    
    //group with 2 minute separation max
           #"Group by Time" = Table.Group(sorted,{"Date"},{{
            "All", (t)=> if Table.RowCount(t)=1 or List.Count(List.Distinct(t[ID])) = 1  then null
                        else if Table.RowCount(t)>2 then Error.Record("Too many matching rows")
                else let 
                        sortBC = Table.Sort(t,{"ID", Order.Ascending}),
                        Baro = Record.RemoveFields(Record.RenameFields(t{0},{
                                {"Date","Barologger DateTime (24hr)"},{"Pressure","Barologger LEVEL (cm)"}}),{"ID"}),
                        CT = Record.RemoveFields(Record.RenameFields(t{1},{
                                {"Date","CT2X Date / Time (24 hour)"}, {"Pressure","CT2X Pressure (psi)"}}),{"ID"})
                    in 
                        Record.Combine({Baro,CT}),
                    type [#"Barologger DateTime (24hr)"=datetime,#"Barologger LEVEL (cm)"=number, 
                            #"CT2X Date / Time (24 hour)"= datetime, #"CT2X Pressure (psi)"=number]
        
        }}, GroupKind.Local,(x,y)=>Number.From(Number.Abs(Duration.TotalMinutes(x[Date]-y[Date]))>2)),
    
    //Remove Date column and null tables
        #"Removed Columns" = Table.RemoveColumns(#"Group by Time",{"Date"}),
        #"Removed Nulls" = Table.SelectRows(#"Removed Columns",each [All] <> null),
        #"Expanded All" = Table.ExpandRecordColumn(#"Removed Nulls", "All", {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"}, {"Barologger DateTime (24hr)", "Barologger LEVEL (cm)", "CT2X Date / Time (24 hour)", "CT2X Pressure (psi)"})
    in
        #"Expanded All"