Is there any tool to generate test data based on specific requirements?
e.g. size, file type.
Your question is quite open-ended...
You can find data sets useful for testing in many areas, usually related to natural language, by searching for the word "corpus".
If you want to generate random data, go and hack together a Perl script... but:
I have seen many people generate tons of data to test the performance of their code while forgetting to check that the results were correct.
If you are lucky enough to be able to do a round trip, for example compression followed by decompression, random data generation can be very useful and can catch corner cases you would not have thought of.
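As a rough illustration of the round-trip idea, here is a minimal sketch in Python (used instead of Perl), assuming the standard library's `zlib` as the compressor under test. The helpers `random_payload` and `round_trip_test` are hypothetical names made up for this example. The payloads mix incompressible noise with long repeated runs, and the test explicitly includes edge sizes such as 0 and 1 bytes. Every result is compared against the original input, so the test checks correctness and does not merely produce data.

```python
import random
import zlib


def random_payload(size: int, rng: random.Random) -> bytes:
    """Return `size` bytes of noise, a repeated byte, or a mix of both."""
    kind = rng.choice(["noise", "repeat", "mixed"])
    if kind == "noise":
        return rng.randbytes(size)  # Random.randbytes needs Python 3.9+
    if kind == "repeat":
        return bytes([rng.randrange(256)]) * size
    # "mixed": alternate short random chunks with runs of a single byte
    out = bytearray()
    while len(out) < size:
        chunk = rng.randint(1, 64)
        if rng.random() < 0.5:
            out += rng.randbytes(chunk)
        else:
            out += bytes([rng.randrange(256)]) * chunk
    return bytes(out[:size])


def round_trip_test(trials: int = 200, max_size: int = 10_000, seed: int = 0) -> int:
    """Compress and decompress random payloads; return how many cases passed."""
    rng = random.Random(seed)  # fixed seed, so any failure can be reproduced
    sizes = [0, 1, 2, max_size]  # corner cases are included explicitly
    sizes += [rng.randint(0, max_size) for _ in range(trials)]
    for size in sizes:
        data = random_payload(size, rng)
        level = rng.randint(0, 9)  # also vary the compression level
        restored = zlib.decompress(zlib.compress(data, level))
        # The important part: verify the result, not just that the code ran.
        assert restored == data, f"mismatch: size={size}, level={level}, seed={seed}"
    return len(sizes)


if __name__ == "__main__":
    print(round_trip_test())
```

Because the seed is fixed, a failing case can be replayed exactly. Changing `seed` explores different data, and replacing the `zlib` calls with your own encode/decode pair applies the same pattern to other round trips.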