I want to store some values from a map task on the local disk of each data node. For example:
public void map(...) {
    // Process
    List<Object> cache = new ArrayList<Object>();
    // Add value to cache
    // Serialize cache to local file in this data node
}
How can I store this cache object on the local disk of each data node? If I serialize the cache inside the map function as shown above, performance will be terrible because of the I/O on every call.
Is there a way to wait until the map task on a data node has finished and only then write the cache to local disk? Or does Hadoop provide a function for this?
See the example below. The created file ends up somewhere under the directories that the NodeManager uses for containers. These are set by the yarn.nodemanager.local-dirs property in yarn-site.xml; if it is not set there, the default inherited from yarn-default.xml applies, which is under /tmp.
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // do some hadoop stuff, like counting words

    // Relative path: resolved against the task container's working directory
    String path = "newFile.txt";
    try {
        File f = new File(path);
        // Returns false (without throwing) if the file already exists
        f.createNewFile();
    } catch (IOException e) {
        System.out.println("Message easy to look up in the logs.");
        System.err.println("Error easy to look up in the logs.");
        e.printStackTrace();
        throw e;
    }
}
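On the "write once after the task finishes" part of the question: Hadoop's Mapper has setup(Context) and cleanup(Context) methods that run once per task, before and after all map() calls. One option is to keep the cache in a field of the mapper, append to it in map(), and serialize it a single time from cleanup(). Below is a minimal sketch of that buffering pattern in plain Java, so it runs without Hadoop on the classpath. LocalCacheWriter and its methods are hypothetical names invented for this illustration, not a Hadoop API, and it assumes the cached values fit in the task's memory.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): collects values in memory
// and serializes them to a local file in one write.
class LocalCacheWriter {
    private final List<String> cache = new ArrayList<>();

    // Intended to be called from map(): in-memory only, no disk I/O.
    void add(String value) {
        cache.add(value);
    }

    int size() {
        return cache.size();
    }

    // Intended to be called once from cleanup(): a single disk write per task.
    void flush(File target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target.toPath())))) {
            out.writeObject(new ArrayList<>(cache));
        }
        cache.clear();
    }

    // Reads the list back; included only to check the round trip.
    @SuppressWarnings("unchecked")
    static List<String> read(File source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(source.toPath())))) {
            return (List<String>) in.readObject();
        }
    }
}
```

In a real mapper, the writer would be a field, add(...) would be called inside map(), and flush(new File("cache.bin")) would be called inside cleanup(Context). The file name "cache.bin" is only an example; as with newFile.txt above, a relative path lands in the container's working directory.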