Please have a look at the following code:
public void createHash() throws IOException
{
    System.out.println("Hash Creation Started");
    StringBuffer hashIndex = new StringBuffer("");
    AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
    Region usWest2 = Region.getRegion(Regions.US_EAST_1);
    s3.setRegion(usWest2);
    strBuffer = new StringBuffer("");
    try
    {
        //List all the Buckets
        List<Bucket> buckets = s3.listBuckets();
        for(int i=0;i<buckets.size();i++)
        {
            System.out.println("- "+(buckets.get(i)).getName());
        }
        //Downloading the Object
        System.out.println("Downloading Object");
        S3Object s3Object = s3.getObject(new GetObjectRequest("JsonBucket", "Articles_4.json"));
        System.out.println("Content-Type: " + s3Object.getObjectMetadata().getContentType());
        //Read the JSON File
        BufferedReader reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()));
        while (true) {
            String line = reader.readLine();
            if (line == null) break;
            // System.out.println(" " + line);
            strBuffer.append(line);
        }
        JSONTokener jTokener = new JSONTokener(strBuffer.toString());
        jsonArray = new JSONArray(jTokener);
        System.out.println("Json array length: "+jsonArray.length());
        for(int i=0;i<jsonArray.length();i++)
        {
            JSONObject jsonObject1 = jsonArray.getJSONObject(i);
            //Add Title and Body Together to the list
            String titleAndBodyContainer = jsonObject1.getString("title")+" "+jsonObject1.getString("body");
            //Remove full stops and commas
            titleAndBodyContainer = titleAndBodyContainer.replaceAll("\\.(?=\\s|$)", " ");
            titleAndBodyContainer = titleAndBodyContainer.replaceAll(",", " ");
            titleAndBodyContainer = titleAndBodyContainer.toLowerCase();
            //Create a word list without duplicated words
            StringBuilder result = new StringBuilder();
            HashSet<String> set = new HashSet<String>();
            for(String s : titleAndBodyContainer.split(" ")) {
                if (!set.contains(s)) {
                    result.append(s);
                    result.append(" ");
                    set.add(s);
                }
            }
            //System.out.println(result.toString());
            //Re-Arranging everything into Alphabetic Order
            String testString = "acarus acarpous accession absently missy duckweed settling";
            String testHash = "058 057 05@ 03o dwr 6ug i^&";
            String[] finalWordHolder = (result.toString()).split(" ");
            Arrays.sort(finalWordHolder);
            //Navigate through text and create the Hash
            for(int arrayCount=0;arrayCount<finalWordHolder.length;arrayCount++)
            {
                Iterator iter = completedWordMap.entrySet().iterator();
                while(iter.hasNext())
                {
                    Map.Entry mEntry = (Map.Entry)iter.next();
                    String key = (String)mEntry.getKey();
                    String value = (String)mEntry.getValue();
                    if(finalWordHolder[arrayCount].equals(value))
                    {
                        hashIndex.append(key); //Adding Hash Keys
                        //hashIndex.append(" ");
                    }
                }
            }
            //System.out.println(hashIndex.toString().trim());
            jsonObject1.put("hash_index", hashIndex.toString().trim()); //Add the Hash to the JSON Object
            jsonObject1.put("primary_key", i); //Create the primary key
            jsonObjectHolder.add(jsonObject1); //Add the JSON Object to the JSON collection
            System.out.println("JSON Number: "+i);
        }
        System.out.println("Hash Creation Completed");
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}
I can't run this code, either on my local machine or on Amazon EC2. I get the following error
I am worried because this "test" is running on a 6 MB JSON file, while the original file will be terabytes in size. I am using a Linux instance on EC2, but I am not a Linux person. How can I get rid of this error?
You are declaring hashIndex outside of the loop:
StringBuffer hashIndex = new StringBuffer("");
...
for(int i=0;i<jsonArray.length();i++) {
    hashIndex.append(...);
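This means that the StringBuffer is never reset, so it keeps getting bigger and bigger as you iterate over the JSON array until it finally explodes! It also means that every JSON object stores a copy of everything appended so far in its "hash_index", not just its own hash.
I think you meant to declare hashIndex inside the loop.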
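To illustrate the suggestion, here is a minimal, self-contained sketch of the same lookup with the buffer declared inside the loop. It is not your full method: the class name HashIndexSketch, the helper sampleWordMap() (a stand-in for your completedWordMap) and the sample words are hypothetical, and it uses StringBuilder in place of StringBuffer on the assumption that only one thread touches the buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashIndexSketch {

    // Hypothetical stand-in for completedWordMap (hash key -> word).
    static Map<String, String> sampleWordMap() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("a1", "apple");
        map.put("b2", "banana");
        map.put("c3", "cherry");
        return map;
    }

    // Builds one hash string per document; each inner list plays the
    // role of finalWordHolder for one JSON object.
    static List<String> buildHashIndexes(List<List<String>> documents,
                                         Map<String, String> wordMap) {
        List<String> hashes = new ArrayList<>();
        for (List<String> words : documents) {
            // Declared inside the loop, so it starts empty for every document
            // and the previous one becomes eligible for garbage collection.
            StringBuilder hashIndex = new StringBuilder();
            for (String word : words) {
                for (Map.Entry<String, String> entry : wordMap.entrySet()) {
                    if (word.equals(entry.getValue())) {
                        hashIndex.append(entry.getKey()); // add the hash key
                    }
                }
            }
            hashes.add(hashIndex.toString());
        }
        return hashes;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
                Arrays.asList("apple", "banana"),
                Arrays.asList("cherry"));
        // Prints [a1b2, c3]
        System.out.println(buildHashIndexes(documents, sampleWordMap()));
    }
}
```

With a single buffer shared across iterations, as in your code, the second entry would come out as a1b2c3 instead of c3, and each later entry would be longer still.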