I have a method which reads files from HDFS, and I am trying to test it.
I first tried the HDFS mini cluster without any success. Is this kind of method testable at all? If so, what dependencies are required to test it, and how can I mock the HDFS file system locally without installing Hadoop? There should be no dependency on a Hadoop installation; I cannot ask everyone who wants to run the tests to install Hadoop.
import java.io.{BufferedReader, InputStreamReader}
import org.apache.hadoop.fs.{FileSystem, InvalidPathException, Path}

import scala.collection.mutable

def readFiles(fs: FileSystem, path: Path): String = {
  val sb = new mutable.StringBuilder()
  var br: BufferedReader = null
  try {
    if (fs.exists(path)) {
      if (fs.isFile(path)) {
        br = new BufferedReader(new InputStreamReader(fs.open(path)))
        // Assignment evaluates to Unit in Scala, so the Java-style
        // `while ((line = br.readLine()) != null)` loop does not work; read ahead instead.
        var line = br.readLine()
        while (line != null) {
          sb.append(line.trim)
          line = br.readLine()
        }
      } else {
        throw new InvalidPathException(s"${path.toString} is a directory, please provide the full path")
      }
    } else {
      throw new InvalidPathException(s"${path.toString} is an invalid file path")
    }
  } finally {
    // Always close the reader, even if reading failed.
    if (br != null) br.close()
  }
  sb.toString
}
When dealing with org.apache.hadoop.fs.FileSystem (same goes for Spark) I usually store test data files in:
src/test/resources
For instance
src/test/resources/test.txt
This file is accessible to the local org.apache.hadoop.fs.FileSystem via a path relative to the root of your project, i.e. "src/test/resources/test.txt":
test("Some test") {
val fileSystem = FileSystem.get(new Configuration())
val fileToRead = new Path("src/test/resources/test.txt")
val computedContent = readFiles(fileSystem, fileToRead)
val expectedContent = "todo"
assert(computedContent === expectedContent)
}
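As for the dependency part of the question: a test like this only needs the Hadoop client artifact and a test framework on the test classpath; no local Hadoop installation is required. A minimal sketch of the sbt entries, assuming sbt and ScalaTest (the versions are illustrative, not prescriptive):
// build.sbt -- versions are placeholders; align them with your project
libraryDependencies ++= Seq(
  "org.apache.hadoop" %  "hadoop-client" % "3.3.6",
  "org.scalatest"     %% "scalatest"     % "3.2.18" % Test
)
Create src/test/resources/test.txt with known content and set expectedContent accordingly; note that since readFiles appends line.trim with no separator, the expected string is the file's trimmed lines concatenated together.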