I have around 100 files in a folder. Each file contains data like the sample below, where each line represents a user id.
960904056
6624084
1096552020
750160020
1776024
211592064
1044872088
166720020
1098616092
551384052
113184096
136704072
And I am trying to keep merging the files from that folder into a new big file until the total number of user ids in that new big file reaches 10 million.
I am able to read all the files from a particular folder, and I keep adding the user ids from those files to a LinkedHashSet. I was then thinking of checking whether the size of the set is 10 million and, if it is, writing all those user ids to a new text file. Is that a feasible solution?
That 10 million number should be configurable: if I later need to change it from 10 million to 50 million, I should be able to do that.
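One way to make the cap configurable is to read it from a command-line argument or a JVM system property instead of hard-coding it. This is a minimal sketch; the property name `merge.limit` and the helper class are illustrative assumptions, not part of the original code:

```java
// Sketch: resolve the merge limit from (1) the first command-line argument,
// (2) the -Dmerge.limit system property, or (3) a built-in default.
// The property name "merge.limit" is a hypothetical choice for illustration.
public class MergeConfig {

    static final int DEFAULT_LIMIT = 10_000_000;

    static int resolveLimit(String[] args) {
        if (args.length > 0) {
            return Integer.parseInt(args[0]);
        }
        // Integer.getInteger reads a system property, falling back to the default.
        return Integer.getInteger("merge.limit", DEFAULT_LIMIT);
    }

    public static void main(String[] args) {
        System.out.println("Using limit: " + resolveLimit(args));
    }
}
```

Switching from 10 million to 50 million then becomes `java -Dmerge.limit=50000000 ...` or passing `50000000` as the first argument, with no code change.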
Below is the code I have so far:
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.apache.commons.io.FileUtils;

public class UserIdMerger {

    // Should be configurable; hard-coded to 10 million for now.
    private static final int LIMIT = 10_000_000;

    public static void main(String[] args) {
        File folder = new File("C:\\userids-20130501");
        File[] listOfFiles = folder.listFiles();
        Set<String> userIdSet = new LinkedHashSet<String>();
        for (int i = 0; i < listOfFiles.length; i++) {
            File file = listOfFiles[i];
            if (file.isFile() && file.getName().endsWith(".txt")) {
                try {
                    List<String> content = FileUtils.readLines(file, Charset.forName("UTF-8"));
                    userIdSet.addAll(content);
                    if (userIdSet.size() >= LIMIT) {
                        break;
                    }
                    System.out.println(userIdSet);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Any help would be appreciated. Is there a better way to do the same process?
Continuing from where we left. ;)
You can use FileUtils to write the file as well, via its writeLines() method.
Try this:
public static void main(String[] args) {
    final int limit = 10_000_000; // make this configurable as needed
    File folder = new File("C:\\userids-20130501");
    Set<String> userIdSet = new LinkedHashSet<String>();
    int count = 1;
    for (File file : folder.listFiles()) {
        if (file.isFile() && file.getName().endsWith(".txt")) {
            try {
                List<String> content = FileUtils.readLines(file, Charset.forName("UTF-8"));
                userIdSet.addAll(content);
                if (userIdSet.size() >= limit) {
                    File bigFile = new File("<path>" + count + ".txt");
                    FileUtils.writeLines(bigFile, userIdSet);
                    count++;
                    userIdSet = new LinkedHashSet<String>();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
If the purpose of saving the data in the LinkedHashSet is just to write it out again to another file, then I have another solution.
EDIT: to avoid an OutOfMemoryError
public static void main(String[] args) {
    final int limit = 10_000_000; // make this configurable as needed
    File folder = new File("C:\\userids-20130501");
    int fileNameCount = 1;
    int contentCounter = 0;
    File bigFile = new File("<path>" + fileNameCount + ".txt");
    for (File file : folder.listFiles()) {
        if (file.isFile() && file.getName().endsWith(".txt")) {
            try {
                List<String> content = FileUtils.readLines(file, Charset.forName("UTF-8"));
                contentCounter += content.size();
                if (contentCounter < limit) {
                    FileUtils.writeLines(bigFile, content, true); // append
                } else {
                    // roll over to a new big file and restart the counter,
                    // counting the lines just written to the new file
                    fileNameCount++;
                    bigFile = new File("<path>" + fileNameCount + ".txt");
                    FileUtils.writeLines(bigFile, content);
                    contentCounter = content.size();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
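The same memory-safe idea can be taken one step further by streaming line by line with plain java.nio, so not even a single input file has to be held in memory at once. This is a sketch under stated assumptions: the output file naming (`big-N.txt`) is illustrative, and note that, like the EDIT version above, it does not de-duplicate ids (the role the LinkedHashSet played), so uniqueness would need separate handling:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingMerger {

    // Copies every line from each *.txt file in `folder` into output files
    // under `outDir`, starting a new output file once `limit` lines are written.
    public static void merge(Path folder, Path outDir, int limit) throws IOException {
        int fileCount = 1;
        long written = 0;
        BufferedWriter out = Files.newBufferedWriter(outDir.resolve("big-" + fileCount + ".txt"));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path file : files) {
                try (BufferedReader in = Files.newBufferedReader(file)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (written == limit) {
                            // current big file is full: close it and open the next one
                            out.close();
                            fileCount++;
                            written = 0;
                            out = Files.newBufferedWriter(outDir.resolve("big-" + fileCount + ".txt"));
                        }
                        out.write(line);
                        out.newLine();
                        written++;
                    }
                }
            }
        } finally {
            out.close();
        }
    }
}
```

With this approach, memory use stays constant regardless of whether the cap is 10 million or 50 million lines.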