How to find the rank of an element in a stream of numbers efficiently?

I'm trying to find the median of a stream of numbers under the following constraints:

3-pass algorithm
O(nlog(n)) time
O(sqrt(n)) space

The input is repeated 3 times. Each repetition consists of n, the number of integers, followed by n integers a_i such that:

n is odd
1≤n≤10^7
|a_i| ≤ 2^{30}

The format of the input data is as follows:

My code so far is shown as follows:

#ifdef STREAMING_JUDGE
#include "io.h"
#define next_token io.next_token
#else
#include<string>
#include<iostream>
using namespace std; 
string next_token()
{
    string s;
    cin >> s;
    return s;
}
#endif

#include<cstdio>
#include<cstdlib>
#include<vector>
#include<algorithm>
#include<iostream>
#include<cmath>
#include<ctime>

using namespace std;

int main()
{
    srand(time(NULL));
    //1st pass: randomly choose sqrt(n) numbers from the given stream of numbers
    int n = atoi(next_token().c_str());
    int p = (int)ceil(sqrt(n));
    vector<int> a;
    for(int i=0; i<n; i++)
    {
        int s=atoi(next_token().c_str());
        if( rand()%p == 0 && (int)a.size() < p )
        {
            a.push_back(s);
        }
    }
    sort(a.begin(), a.end());
    //2nd pass: find the k such that the median lies in a[k] and a[k+1], and find the rank of the median between a[k] and a[k+1]
    next_token();
    // One counter per sample: rank[j] = number of stream values <= a[j]
    vector<int> rank(a.size(),0);
    for( int i = 0; i < n; i++ )
    {
        int s=atoi(next_token().c_str());
        for( int j = 0; j < (int)rank.size(); j++ )
        {
            if( s<=a[j] )
            {
                rank[j]++;
            }
        }
    }
    int median = 0;
    int middle = (n+1)/2;
    int k = -1;
    if( (int)a.size() == 1 && rank.front() == middle )
    {
        median=a.front();
        cout << median << endl;
        return 0;
    }
    for( int j = 0; j < (int)rank.size(); j++ )
    {
        if( rank[j] == middle )
        {
            // a[j] itself is the median
            cout << a[j] << endl;
            return 0;
        }
        else if( j+1 < (int)rank.size() && rank[j] < middle && rank[j+1] > middle )
        {
            k = j;
            break;
        }
    }
    //3rd pass: sort the numbers in (a[k], a[k+1]) to find the median
    next_token();
    vector<int> FinalRun;
    if( rank.empty() )
    {
        for(int i=0; i<n; i++)
        {
            a.push_back(atoi(next_token().c_str()));
        }
        sort(a.begin(), a.end());
        cout << a[n>>1] << endl;
        return 0;
    }
    else if( rank.front() > middle )
    {
        for( int i = 0; i < n; i++ )
        {
            int s = atoi(next_token().c_str());
            if( s < a.front() )  FinalRun.push_back(s);
        }
        sort( FinalRun.begin(), FinalRun.end() );
        cout << FinalRun[middle-1] << endl;
        return 0;
    }
    else if ( rank.back() < middle )
    {
        for( int i = 0; i < n; i++ )
        {
            int s = atoi(next_token().c_str());
            if( s > a.back() )  FinalRun.push_back(s);
        }
        sort( FinalRun.begin(), FinalRun.end() );
        cout << FinalRun[middle-rank.back()-1] << endl;
        return 0;
    }
    else
    {
        for( int i = 0; i < n; i++ )
        {
            int s = atoi(next_token().c_str());
            if( s > a[k] && s < a[k+1] )  FinalRun.push_back(s);
        }
        sort( FinalRun.begin(), FinalRun.end() );
        cout << FinalRun[middle-rank[k]-1] << endl;
        return 0;
    }
}

But I still cannot reach O(n log n) time. I suspect the bottleneck is the ranking step in the 2nd pass, i.e. locating the median between a[k] and a[k+1] by computing the rank of every sampled a[i] in the input stream. This step takes O(n sqrt(n)) time in my code.

I have no idea how to make the ranking more efficient. Are there any suggestions? Thanks in advance!

Further explanation of "rank": the rank of a sampled number is the count of numbers in the stream that are less than or equal to it. For instance, in the input given above, if the numbers a[0]=2, a[1]=4, and a[2]=5 are sampled, then rank[0]=2 because two numbers in the stream (1 and 2) are less than or equal to a[0].
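To make the definition concrete, here is a brute-force sketch of it. The function name rank_of is made up for illustration, and the stream used in the note below is a hypothetical stand-in, not the actual sample input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rank as defined above: how many values in the stream are <= x.
// Linear scan, so this is only a reference for checking faster code.
int rank_of(int x, const std::vector<int>& stream) {
    return static_cast<int>(
        std::count_if(stream.begin(), stream.end(),
                      [x](int v) { return v <= x; }));
}
```

With the made-up stream {5, 1, 4, 2, 3}, rank_of(2, stream) counts the values 1 and 2, and rank_of(4, stream) counts 1, 2, 3, and 4.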

Thanks for all of your help. In particular, @alexeykuzmin0's suggestion does speed up the 2nd pass to O(n log n) time. One issue remains: in the 1st pass, I sample each number with probability 1/sqrt(n). If no number happens to be sampled (the worst case), the vector a is empty and the following passes cannot run (a segmentation fault (core dumped) occurs). @Aconcagua, what do you mean by "select all remaining elements, if there aren't more than required any more"? Thanks.

Solution

You're right, your second pass runs in O(n√n) time:

for( int i = 0; i < n; i++ )                    // <= n iterations
  ...
    for( int j = 0; j < (int)rank.size(); j++ ) // <= √n iterations

To fix this, we need to get rid of the inner loop. For example, instead of directly counting, for every threshold, the elements of the input that are less than it, we can first count the elements that fall into each interval between consecutive thresholds:

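// Same outer loop as in your code, but without the inner loop.
// rank needs a.size() + 1 zero-initialized entries, one per interval.
for (int i = 0; i < n; ++i) {
    int s = atoi(next_token().c_str());
    // Find the index of the interval containing s in O(log n) time.
    // Note: with upper_bound the prefix sum at j counts values < a[j];
    // use std::lower_bound instead to count values <= a[j].
    int idx = std::upper_bound(a.begin(), a.end(), s) - a.begin();
    // Increase the count of only that interval
    ++rank[idx];
}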

Then compute the ranks of your threshold elements as prefix sums of those counts:

std::partial_sum(rank.begin(), rank.end(), rank.begin());

The resulting complexity is O(n log n) + O(n) = O(n log n).
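The two steps can be put together as a minimal, self-contained sketch. The function name ranks_of_samples and the in-memory `stream` vector are assumptions made for illustration; in the real program the values would come from next_token() during the second pass. This sketch uses std::lower_bound so that the result matches the question's definition of rank (values less than or equal to the sample).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: for each sorted sample a[j], returns the number of
// stream values <= a[j]. `stream` stands in for one pass over the input.
std::vector<int> ranks_of_samples(const std::vector<int>& a,
                                  const std::vector<int>& stream) {
    assert(std::is_sorted(a.begin(), a.end()));
    // One bucket per sample, plus one for values above the largest sample.
    std::vector<int> bucket(a.size() + 1, 0);
    for (int s : stream) {
        // Index of the first sample >= s, found by binary search.
        const auto idx = std::lower_bound(a.begin(), a.end(), s) - a.begin();
        ++bucket[idx];
    }
    // Prefix sums turn per-bucket counts into cumulative counts.
    std::partial_sum(bucket.begin(), bucket.end(), bucket.begin());
    bucket.pop_back();  // the last entry is just the stream length
    return bucket;
}
```

Each stream value costs one binary search over about √n samples, and the prefix sums are computed once at the end, so the pass stays within O(n log n) time and O(√n) space.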

Here I used two STL algorithms:

std::upper_bound (from <algorithm>), which uses binary search to find, in logarithmic time, the first element of a sorted range that is strictly greater than a given value.
std::partial_sum (from <numeric>), which computes the running sums of a range in linear time.
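On the empty-sample problem raised in the question's update: one possible reading of "select all remaining elements, if there aren't more than required any more" is selection sampling (often cited as Knuth's Algorithm S), which keeps each value with probability needed/remaining and therefore always returns exactly the requested number of samples. The sketch below is an illustration under that assumption, not necessarily what the commenter had in mind; the name sample_exactly and the in-memory `stream` vector are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: picks exactly `want` of the stream values
// (want <= stream.size()) while storing only the picked ones.
// `stream` stands in for the first pass over the input.
std::vector<int> sample_exactly(const std::vector<int>& stream, int want,
                                std::mt19937& gen) {
    const int n = static_cast<int>(stream.size());
    assert(0 <= want && want <= n);
    std::vector<int> picked;
    picked.reserve(want);
    for (int i = 0; i < n; ++i) {
        const int remaining = n - i;  // values not yet decided, this one included
        const int needed = want - static_cast<int>(picked.size());
        // Keep this value with probability needed / remaining. Once
        // needed == remaining, every remaining value is kept.
        std::uniform_int_distribution<int> pick(0, remaining - 1);
        if (pick(gen) < needed) picked.push_back(stream[i]);
    }
    return picked;
}
```

Because n is read before the numbers arrive, `remaining` is known at every step, so the vector of samples can never end up empty when want is at least 1.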