
Using Hadoop Text Object toString() Method


I understand the difference between String and Text (see "Difference between Text and String in Hadoop").

My question is: suppose the maximum storage size of a String is 32767 bytes, as that post says, and I do this:

Text t = new Text("Hadoo... 2GB of content");
...
String c = t.toString();

How can "c" hold 2GB of data if a String has a size limit?

What am I missing here?


Solution

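  • The maximum size of a Java String is not 32k bytes. A String can hold up to Integer.MAX_VALUE characters, i.e. 2^31 - 1 (about 2.1 billion). Since each Java char is a 2-byte UTF-16 code unit, that is around 4GB of character data (see this post). In practice, the JVM's heap size is usually the tighter limit.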

    The post you mention refers to the size limit of Hadoop's deprecated UTF8 class, not Java's String class.

    Either way, if you need that much space for a single Text instance, I would advise you to reconsider your algorithm. As Peter Lawrey says in the aforementioned post, "I suspect all the works of J K Rowling would fit into one string."
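Below is a small sketch, using only the JDK with no Hadoop dependency, that works through the size arithmetic from the answer. The class and method names (StringSizeLimits, maxChars, maxUtf16Bytes, utf8Bytes) are made up for illustration. It also shows why a Text's contents fit in a String. Text holds its data as UTF-8 bytes in a Java byte array, so it has at most Integer.MAX_VALUE bytes. Decoding UTF-8 never yields more UTF-16 chars than there were bytes, so the resulting String stays within its own length limit, memory permitting. The sketch only computes the limits; it does not try to allocate a multi-gigabyte string.

```java
import java.nio.charset.StandardCharsets;

public class StringSizeLimits {
    // String.length() returns an int, so a String holds at most Integer.MAX_VALUE chars.
    static long maxChars() {
        return Integer.MAX_VALUE;
    }

    // Each Java char is a 2-byte UTF-16 code unit, so that many chars is roughly 4GB.
    // Real JVMs may hit heap or internal array limits well before this.
    static long maxUtf16Bytes() {
        return maxChars() * Character.BYTES;
    }

    // UTF-8 byte count for s, i.e. what a UTF-8 byte buffer (such as Text's) would hold.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("max chars: " + maxChars());
        System.out.println("max UTF-16 bytes: " + maxUtf16Bytes());

        // 1- to 3-byte UTF-8 sequences decode to 1 char, 4-byte sequences to 2 chars,
        // so the char count never exceeds the UTF-8 byte count.
        String[] samples = {"Hadoop", "h\u00e9", "\u6f22\u5b57", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(utf8Bytes(s) + " UTF-8 bytes -> " + s.length() + " chars");
        }
    }
}
```

Running it prints 2147483647 and 4294967294 for the two limits. It then shows, for example, that the two-character string "\u6f22\u5b57" takes 6 UTF-8 bytes, and that the single emoji takes 4 bytes but 2 chars.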