i have the following problem:
In Power Query I can combine multiple CSV files using the load-from-folder function. Unfortunately, the time series data we receive overlaps, and the values for the entire series can change between deliveries. For example, every Monday we get a file covering the last 30 days. The values within those 30 days change, so the values in last week's report may no longer be up to date. My goal is to overwrite the old data from last week with the new data from this week.
Example:
File A
Date Item Hits Solved
01.01.2018 A 100 50
01.01.2018 B 138 65
02.01.2018 A 124 70
...
07.01.2018 A 143 50
07.01.2018 B 147 57
...
30.01.2018 A 250 124
File B
Date Item Hits Solved
07.01.2018 A 143 80
07.01.2018 B 147 95
...
06.02.2018 A 341 148
The data from file A should be updated with the new information from file B, keeping only the values from (in this case) 01.01.2018 to 06.01.2018 from file A. The rest should come from file B.
So far I have tried numbering my files (the newer the file, the higher the number, so file A -> 1, file B -> 2), creating a new key column from Date+Item, sorting descending by file number, and then removing duplicates in the key column. This sometimes works (keeping only the most recent entries), but sometimes it does not. Removing duplicates from the Date column alone does not work for me either: there it just removes the newest entries and keeps the old ones.
Is there a better way to solve this problem?
Thanks a lot!
Go to the step where you are sorting the data by your file order and wrap it in Table.Buffer(), so it would look like
= Table.Buffer(Table.Sort(BLAH BLAH TABLE SORTING CODE))
I don't remember the exact technical reasons, but for our purposes here this pins the "internal" row order PQ uses when running Remove Duplicates to the visible sort order of the table. Without the buffer, the engine is free to reorder rows during query folding or lazy evaluation, so Remove Duplicates may keep an arbitrary row per key, which is why your approach only worked sometimes.
Also, it's not necessary to create a key column: you can just select your Item and Date columns together and remove duplicates, and it will accomplish the same thing.
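Putting it together, the whole approach could look something like the sketch below. This is a minimal example, not your exact query: the step names and the Combined and FileIndex identifiers are assumptions standing in for your appended folder table and your file-order column.

```
let
    // Combined = the appended table from the folder load, already carrying
    // a FileIndex column where a newer file has a higher number
    Sorted = Table.Buffer(
        Table.Sort(Combined, {{"FileIndex", Order.Descending}})
    ),
    // Table.Distinct keeps the first row it encounters per key; after the
    // buffered descending sort, that first row comes from the newest file
    Deduped = Table.Distinct(Sorted, {"Date", "Item"})
in
    Deduped
```

Table.Distinct with a column list is what the Remove Duplicates UI generates when you select both columns first, so no concatenated key column is needed.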