I will have multiple small text files, each around 10 KB in size, and I am confused about whether to store them in HBase or in HDFS. Which will be the optimized storage? To store them in HBase I need to parse each file first and then save it against some row key, whereas in HDFS I can simply create a path and save the file at that location. But everything I have read so far says you should not have many small files; instead you should create fewer big files. However, I cannot merge those files, so I cannot create big files out of the small ones.
Kindly suggest.
A large number of small files doesn't fit very well with Hadoop, since each file occupies its own HDFS block and each block requires one Mapper to process it by default.
There are several options/strategies to minimize the impact of small files; all of them require processing the small files at least once in order to "package" them in a better format. If you are planning to read these files several times, pre-processing them could make sense, but if you will use them only once, it doesn't matter.
To process small files, my suggestion is to use CombineTextInputFormat (here is an example): https://github.com/lalosam/HadoopInExamples/blob/master/src/main/java/rojosam/hadoop/CombinedInputWordCount/DriverCIPWC.java
CombineTextInputFormat uses one Mapper to process several files, but it may require transferring files to a different DataNode in order to group them on the DataNode where the map is running, and it can perform badly with speculative tasks; you can disable those if your cluster is stable enough.
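A minimal driver sketch along these lines (the job name, paths, and split-size value here are placeholders, not taken from the linked repository):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "small-files-job");
        job.setJarByClass(SmallFilesDriver.class);

        // Pack many small files into each split instead of one split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split (example value; tune to your block size / cluster).
        CombineTextInputFormat.setMaxInputSplitSize(job, 128 * 1024 * 1024L);

        // job.setMapperClass(YourMapper.class);    // plug in your own mapper
        // job.setReducerClass(YourReducer.class);  // plug in your own reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```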
Alternatives for repackaging the small files are:
If you cannot merge the files as options 2 and 3 require, my suggestion is to go with option 1.
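If repackaging does turn out to be feasible, one common approach is to write the small files into a single SequenceFile keyed by the original file name. A minimal sketch, where the input/output paths and the keying scheme are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path(args[0]);  // directory containing the ~10 KB files
        Path packed = new Path(args[1]);    // single output SequenceFile

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isDirectory()) {
                    continue;
                }
                // Read the whole small file into memory (fine for ~10 KB files).
                byte[] contents = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    in.readFully(contents);
                }
                // Key = original file name, value = raw bytes of the file.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}
```

The resulting SequenceFile can then be read by a single Mapper (or a few), instead of one Mapper per original file.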