I am trying to use test-driven development for an application that has to read a lot of data from disk. The problem is that the data is organized on the filesystem in a somewhat complex directory structure (not my fault). The methods I'm testing will need to see that a large number of files exist in several different directories in order for the methods to complete.
The solution I'm trying to avoid is just having a known folder on the hard drive with all the data in it. This approach sucks for several reasons, one reason being that if we wanted to run the unit tests on another computer, we'd have to copy a large amount of data to it.
I could also generate dummy files in the setup method and clean them up in the teardown method. The problem with this is that it would be a pain to write the code to replicate the existing directory structure and dump lots of dummy files into those directories.
I understand how to unit test file I/O operations, but how do I unit test this kind of scenario?
Edit: I will not need to actually read the files. The application will need to analyze a directory structure and determine what files exist in it. And this is a large number of subdirectories with a large number of files.
This has a Python perspective. You may not be working in Python, but the answer more-or-less applies to most languages.
With unit testing with any external resource (e.g. the os
module) you have to mock out the external resource.
The question is "how do mock out os.walk
?" (or os.listdir
or whatever you're using.)
Write a mock version of the function. os.walk
for example. Each mocked-out version returns a list of directories and files so that you can exercise your application.
How to build this?
Write a "data grabber" that does os.walk
on real data and creates a big-old flat list of responses you can use for testing.
Create a mock directory structure. "it would be a pain to write the code to replicate the existing directory structure" isn't usually true. The mocked directory structure is simply a flat list of names. There's no pain at all.
Consider this
def setUp( self ):
structure= [
"/path/to/file/file.x",
"/path/to/another/file/file.y",
"/some/other/path/file.z",...
]
for p in structure:
path, file = os.path.split( p )
try:
os.makedirs( path )
except OSError:
pass
with open( p, "w" ) as f:
f.write( "Dummy Data" )
That's all that's required for setUp
. tearDown
is similar.