Tags: java, string, char, stringbuilder, biginteger

Access a character in a string that has more than 2^31 characters


I've been trying to access characters in a string that has 1 trillion characters. I am using BigInteger to hold the index and the charAt method to read each character from the String.

What I'm trying to do is count the occurrences of a specific character in a given String.

For example, String: aaaaaaa... up to 1 trillion characters of 'a'. Then I will count the occurrences of the character 'a' (the given character to count).

How can I access the characters of a string that has more than 2,147,483,647 (2^31 - 1) characters? Is there any other way of doing this?

Snippet of code:

    BigInteger String_Length = BigInteger.valueOf(n); //1,000,000,000,000
    BigInteger Occurence = BigInteger.valueOf(0);

    StringBuilder sb = new StringBuilder();
    char c; 

    for(BigInteger First_Counter = BigInteger.valueOf(0); First_Counter.compareTo(String_Length) <= 0; First_Counter = First_Counter.add(BigInteger.ONE)){
        for(BigInteger Char_Counter = BigInteger.valueOf(0); Char_Counter.compareTo(String_Length) <= 0; Char_Counter = Char_Counter.add(BigInteger.ONE)){
            c = s.charAt(Char_Counter);
            c = sb.append(c);
        }
    }

    for(BigInteger Second_Counter = BigInteger.valueOf(0); Second_Counter.compareTo(String_Length) <= 0; Second_Counter = Second_Counter.add(BigInteger.ONE)){
        c = sb.charAt(Second_Counter); 

        if(c == 'a')
            Occurence = Occurence.add(BigInteger.ONE);

    }

Errors

  1. error: no suitable method found for charAt(BigInteger) c = s.charAt(Char_Counter);
  2. method CharSequence.charAt(int) is not applicable (argument mismatch; BigInteger cannot be converted to int)
  3. error: incompatible types: StringBuilder cannot be converted to char c = sb.append(c);
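The compiler errors above come down to two facts about the standard API: charAt accepts only an int index, and StringBuilder.append returns the StringBuilder itself rather than a char. As a minimal sketch, assuming the input fits in an ordinary String (at most 2^31 - 1 characters), the count can be done without BigInteger or an intermediate StringBuilder copy. The class and method names here (CharCount, countOccurrences) are made up for illustration.

```java
// Hypothetical helper; only valid while the sequence length fits in an int.
final class CharCount {
    // Counts how many times target occurs in s.
    static long countOccurrences(CharSequence s, char target) {
        long count = 0;                       // long, so the counter itself cannot overflow
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {      // charAt takes an int index
                count++;
            }
        }
        return count;
    }
}
```

This does not solve the trillion-character case; it only shows what compiles for sequences within the int range.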

Solution

  • java.lang.String is not appropriate for very large strings: its length and indices are int, so it cannot hold more than 2^31 - 1 characters. BigInteger is not an appropriate type for indexing larger sequences either; use long.

    Constructing a String requires the data to already be in memory, so there are two copies at that point. Further, buffers will typically be resized before the String is constructed, and each resize requires at least twice the actual data size plus any extra capacity in the buffer.

    Further, the internal storage of String characters (typically char, but others are available) may not be appropriate for your data. Also, do you really want to create a BigInteger object, together with its internal array, for each index you access?

    Even java.nio uses int to index its buffers (currently).

    So you'll want to write your own BigString indexed by long and backed by an array of arrays, or more likely by an array (or List) of memory-mapped NIO buffers.
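The array-of-arrays idea can be sketched roughly as follows. This is only an illustration under stated assumptions: BigString is not an existing JDK class, and the method names (charAt, setCharAt, count) and the caller-chosen chunkSize are invented here. The memory-mapped variant would swap each char[] chunk for an NIO buffer but keep the same index arithmetic.

```java
// Hypothetical BigString sketch: a long-indexed char sequence stored as fixed-size chunks.
final class BigString {
    private final char[][] chunks;  // every chunk has chunkSize chars, except possibly the last
    private final int chunkSize;
    private final long length;

    BigString(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        // Round up so a partial final chunk is included.
        long chunkCount = length / chunkSize + (length % chunkSize == 0 ? 0 : 1);
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize too small for this length");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        this.chunks = new char[(int) chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new char[(int) Math.min(chunkSize, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() {
        return length;
    }

    char charAt(long index) {
        checkIndex(index);
        // The quotient selects the chunk, the remainder selects the slot inside it.
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void setCharAt(long index, char c) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = c;
    }

    // Counts occurrences of target by scanning chunk by chunk; no per-index objects are created.
    long count(char target) {
        long n = 0;
        for (char[] chunk : chunks) {
            for (char c : chunk) {
                if (c == target) {
                    n++;
                }
            }
        }
        return n;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

For example, new BigString(10, 4) allocates chunks of 4, 4 and 2 chars. Note that one trillion chars at 2 bytes each is roughly 2 TB, so on-heap chunks like these are unlikely to fit in practice; that is why the memory-mapped buffers mentioned above (or simply streaming the data and counting as it goes) are the more realistic route.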