I have two columns that look like this:
Make/Model Parc
Abarth*500 15
Abarth*500 19
Abarth*500 2
Abarth*500 14
Abarth*500 15
Abarth*500 25
Abarth*500 20
Abarth*500 17
Abarth*500C 12
Abarth*500C 16
Abarth*500C 23
Abarth*500C 18
Abarth*500C 1
Alfa Romeo*145 405
Alfa Romeo*145 431
Alfa Romeo*146 36
Alfa Romeo*146 80
Alfa Romeo*146 121
Alfa Romeo*146 72
Alfa Romeo*146 39
etc...
The sum of the values in the "Parc" column is 10,475,387.
I perform a consolidate on this table to remove the repetitions in "Make/Model" and sum the values in the "Parc" column for each "Make/Model."
Once I perform the consolidation, everything looks as expected:
Make/Model Parc
Abarth*500 127
Abarth*500C 70
Alfa Romeo*145 836
Alfa Romeo*146 348
Alfa Romeo*147 3848
Alfa Romeo*155 29
Alfa Romeo*156 3148
Alfa Romeo*159 573
Alfa Romeo*164 111
etc...
So what's the problem? The new sum of the "Parc" column is 10,554,076! Verification against an alternate method indicates that the column of unique labels is incorrect in seemingly random places. I also do not understand how the total sum could have gone up. I have over 18,000 rows in this dataset so individually checking all of the data would be very time consuming.
You can find the full dataset here: Excel File
What have I tried?
I have tried removing all formatting. Also, these numbers are all integers and the option to "Automatically insert a decimal point" is not checked. I have also tried creating a new column of =INT() values to ensure they are all integers. My results are exactly the same in all cases.
I posted this question yesterday and it was put on hold because it could not be reproduced without the full dataset: Excel consolidate sum yields different column total
EDIT: As Mauricio Noris Freire has shown, you can create a unique item column using advanced filter, and then use SUMIF to solve this problem correctly. I still would like to know why the Excel data tool consolidate does not work as expected.
Your observations are correct. Both Consolidate
and SUMIFS
are being affected by the asterisks in your data. If you remove the asterisks, or, as I did, replace them with a pipe |
character, your totals will match.
You could also get a correct report by using a Pivot Table
Examine, for example the output for the Mercedes*190
In the raw data, we show these lines if we filter for that value:
In the Consolidation report, we see that the total for Mercedes*190 will include both that car, and also the Mercedes*A190. In other words the A190 is being counted twice, because of the wild card issue.
I suspect this is some kind of design flaw.