algorithm data-structures segment-tree range-query

CSES Range Query Question: Salary Queries

I'm trying to solve this problem: https://cses.fi/problemset/task/1144/

Given an array of up to 200000 elements, my task is to process up to 200000 queries, which either ask me to update a single value within the array or ask me to find the number of elements between a and b that lie in a given range (for example, a query would ask how many elements from indices 1 to 5 are in the range [2, 3]).

My current idea is to first use index compression on the values in the given array (since the values can be up to 10^9, so keeping a simple occurrence array would exceed storage limits), then keep another array that contains the number of occurrences of each compressed number. Then, processing and updating queries could be done using a sum segment tree.

However, I ran into a problem while trying to implement this approach. I realized that updating a single array value would force me to change the compressed array.

For example, given an array [1, 5, 3, 3, 2], I would define a compression function C such that

C[1] = 0;
C[2] = 1;
C[3] = 2;
C[5] = 3;

Then, the occurrence array would be [1, 1, 2, 1], and processing sum queries would be efficient. However, if I were instructed to update a value, say, change the third element to 4, then that throws everything out of balance. The compression function would have to change to

C[1] = 0;
C[2] = 1;
C[3] = 2;
C[4] = 3;
C[5] = 4;

which would force me to reconstruct my occurrence array, resulting in O(N) update time.

Since N can be up to 200000, my current approach will not work efficiently enough to solve the problem, although I think I have the right idea with index compression. Can somebody please point me in the right direction with my method?

Solution

You have the right idea in using index compression - great thinking! As N is only up to 200000, keeping an occurrence array will at most require 200000 elements for the initial values of the given array, instead of 10^9 array indices.

According to yourself, the problem you face is when you encounter new values during processing queries. You're right; this would throw the occurrence array off-balance and cause updates to have to run in O(N) time. The solution to this problem is just a tiny modification to your current method.

To solve the problem of encountering new values, we can just make sure that we'll never encounter any new values. We can do this by reading in all the queries before constructing the sum segment tree. This will result in a maximum of N + 2*Q unique values, or 600000 in the worst case, which is far enough to construct an occurrence array with the problem's 512MB storage limit. After that, a sum segment tree will be able to answer these queries efficiently.

So in the end, a strategy to solve this problem would be to input every unique number, then construct an index compression function, then use a sum segment tree to efficiently process sum queries.

In the future, remember that in these sorts of query-answering questions, it might be useful to read in ALL the input before precomputation. Good luck with your program.