Search code examples
javaalgorithmsortinglarge-data

Cannot short string array having large strings numbers


I have to sort large data-sets having long strings(length up to 1 million char) containing only digits. Also considering all the strings as large positive numbers only.

I have modified merge-sort code that works really fine for large data sets (with array size of 200000) if the length of string is within 18 (so I could convert it to long number for comparision).

I have also implemented a logic that should work theoretically for any length of string (string of numbers). But there is some glitch in my code that is not allowing to sort array with long (length > 18) strings. I have added a capital-latter comment in that code block below.

Note: the code is successfully executing within a few seconds for all length of data sets and gives not so correct output as shown at the end.

Below is my code :

    package algorithms;
    import java.math.BigInteger;
    import java.util.Scanner;
    public class BigSort {

    static void merge(String arr[], int l, int m, int r)
    {
        int n1 = m - l + 1;
        int n2 = r - m;

        String L[] = new String [n1];
        String R[] = new String [n2];

        for (int i=0; i<n1; ++i)
            L[i] = arr[l + i];
        for (int j=0; j<n2; ++j)
            R[j] = arr[m + 1+ j];

        int i = 0, j = 0;
        int k = l;

        while (i < n1 && j < n2){   
        if (L[i].length() <= R[j].length()){
            if(L[i].length()<=18 && R[j].length() <=18) {
                if(BigInteger.valueOf(Long.parseLong(L[i])).compareTo(BigInteger.valueOf(Long.parseLong(R[j]))) <=0){   
                    //this will convert strings to numbers and compare them.
                    //I have used it just to possibly decrease load of-
                    //comparing each characters for sorting smaller strings.
                    arr[k] = L[i];
                    i++;
                }else{
                    arr[k] = R[j];
                    j++;
                }
            }else{//THIS ELSE PART IS HAVING SOME PROBLEM.
                //if length of string is greater than 18digits
                //it will compare two string character by character to find 
                //the larger string or if they are equal.
                char[] c1 = L[i].toCharArray();
                char[] c2 = R[j].toCharArray();
                int c1leng= c1.length;
                int c2leng= c2.length;
                //int shorter= c1leng < c2leng ? c1leng : c2leng ;
                if(c1leng==c2leng){
                    for(int p=0; p<c1leng; p++){
                    if(c1[p]==c2[p]){
                        if(p == c1leng-1) {
                            arr[k] = L[i];
                            i++;
                            break;
                        }
                        continue;
                    }else if(c1[p]<c2[p]){
                        arr[k] = L[i];
                        i++;
                        break;
                    }else if(c1[p]>c2[p]){
                        arr[k] = R[j];
                        j++;  
                        break;
                    }
                    }
                }else{
                    arr[k] = R[j];
                    j++;     
                }
            }
        }else{
            arr[k] = R[j];
            j++;
        }
        k++;
        }

        while (i < n1){
            arr[k] = L[i];
            i++;
            k++;
        }

        while (j < n2){
            arr[k] = R[j];
            j++;
            k++;
        }
    }


    static void sort(String arr[], int l, int r)
    {
        if (l < r){
            int m = (l+r)/2;
            sort(arr, l, m);
            sort(arr , m+1, r);
            merge(arr, l, m, r);
        }
    }

    static String[] bigSorting(String[] arr) {
        sort(arr, 0, arr.length-1);
        return arr; 
    }

    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        int n = in.nextInt();
        String[] arr = new String[n];

        for(int arr_i = 0; arr_i < n; arr_i++){
              arr[arr_i] = in.next().trim();
        }

        System.out.println("result is:");
        String[] result = bigSorting(arr);
        for (int i = 0; i < result.length; i++) {
            System.out.print(result[i] + (i != result.length - 1 ? "\n" : ""));
        }
        in.close();
    }
}

these are the inputs that I was using (first row takes number of strings and then follows all the strings to be sorted. output is the sorted number-strings in each line):

input(1) 
10
5454545454
212101225515
51212
5141215
52
521
52145
5
5
5

Output(1)//correct
5
5
5
52
521
51212
52145
5141215
5454545454
212101225515

Input(2)
10
5454545454
212101225515
51212
5141215
52
5465156165164215612616546954512202496421
2121564
216451564561564651564561256065
11
55

Output(2)//incorrect
11
52
55
216451564561564651564561256065
51212
2121564
5465156165164215612616546954512202496421
5141215
5454545454
212101225515

Solution

  • Thanks a lot all of you for clearing my misconception about compareTo method.

    But here in case of Large data set, compareTo method was not feasible (was taking too much time with full CPU utilization) and that was also one of the reason that I had implemented this manual sorting code. As the main problem was in this code (that I wanted to solve and is solved now) I am now accepting my own answer. Thanks a lot @MrSmith42 and @OldCurmudgeon

               else{//THIS ELSE PART IS HAVING SOME PROBLEM.
                    //if length of string is greater than 18digits
                    //it will compare two string character by character to find 
                    //the larger string or if they are equal.
                    char[] c1 = L[i].toCharArray();
                    char[] c2 = R[j].toCharArray();
                    int c1leng= c1.length;
                    int c2leng= c2.length;
                    //int shorter= c1leng < c2leng ? c1leng : c2leng ;
                    if(c1leng==c2leng){
                        for(int p=0; p<c1leng; p++){
                        if(c1[p]==c2[p]){
                            if(p == c1leng-1) {
                                arr[k] = L[i];
                                i++;
                                break;
                            }
                            continue;
                        }else if(c1[p]<c2[p]){
                            arr[k] = L[i];
                            i++;
                            break;
                        }else if(c1[p]>c2[p]){
                            arr[k] = R[j];
                            j++;  
                            break;
                        }
                        }
                    }else{
                        arr[k] = L[i]; //here was the problem. I was assigning R[j] instead of L[i] which was pushing larger elements alternatively.
                        i++;       
                    }
                }