parsing, entity-framework-6, storage, strongly-typed-dataset

How can I store large amounts of unorganized, related data and organize it as I receive it?


I'm writing a program for a client. The data they send us is essentially information from a relational database that got flattened, resulting in utterly gigantic comma-delimited text files that consist of extremely redundant information with only a few fields changing per line.

I am reading this into a typed dataset and essentially organizing the data into third normal form, which drastically cuts down on the redundancy. From there, I convert the data in the dataset to XML and send it off to another program to create forms and statements.
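To make the shape of that pipeline concrete, here is a minimal, language-neutral sketch in Python rather than the actual typed-dataset code. The column names (`customer_id`, `customer_name`, `invoice_id`, `amount`), the sample rows, and the XML element names are all hypothetical; the real files have a different layout. It only illustrates the idea: split each flat row into a parent table and a child table, reject rows that contradict what has already been seen, then emit XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flattened input: the customer fields repeat on every invoice line.
FLAT_CSV = """customer_id,customer_name,invoice_id,amount
C1,Acme Ltd,INV-1,100.00
C1,Acme Ltd,INV-2,250.50
C2,Birch & Co,INV-3,75.25
"""


def normalize(flat_text):
    """Split flattened rows into a customers table and an invoices table."""
    customers = {}  # customer_id -> customer_name
    invoices = {}   # invoice_id -> (customer_id, amount)
    for row in csv.DictReader(io.StringIO(flat_text)):
        cid, name = row["customer_id"], row["customer_name"]
        # Consistency check: a repeated customer must carry the same name.
        if customers.setdefault(cid, name) != name:
            raise ValueError(f"conflicting names for customer {cid}")
        iid = row["invoice_id"]
        # Uniqueness check: each invoice may appear only once.
        if iid in invoices:
            raise ValueError(f"duplicate invoice {iid}")
        invoices[iid] = (cid, row["amount"])
    return customers, invoices


def to_xml(customers, invoices):
    """Nest each customer's invoices under it and serialize to an XML string."""
    root = ET.Element("statements")
    for cid, name in customers.items():
        cust = ET.SubElement(root, "customer", id=cid, name=name)
        for iid, (owner, amount) in invoices.items():
            if owner == cid:
                ET.SubElement(cust, "invoice", id=iid, amount=amount)
    return ET.tostring(root, encoding="unicode")


customers, invoices = normalize(FLAT_CSV)
xml_text = to_xml(customers, invoices)
```

In this sketch the redundant customer fields are stored once per customer, and a row that disagrees with earlier rows fails before any XML is produced.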

However, I'm wondering if there's a better way to go about this. It might not be as bad as I think it is, but I can't shake the feeling that there's a better, faster way to do this. The important thing is that the data is organized and easily understood, and that it is constraint-checked and validated before I convert it to XML.

Since none of the data needs to persist (in fact, it shouldn't), an actual RDBMS didn't seem worth it if I was just going to end up clearing it after every use.

The program also needs to run in several environments: my workstation is Windows 7 64-bit, the testing server is Windows XP 32-bit, and the production server is Windows 7 64-bit or 32-bit, depending on which server it is deployed to.


Solution

  • IMHO, I would start with SQL Server Express. It is designed to work through those kinds of data volumes and will work across the different platforms you're running; it scales up to the larger editions if necessary; SSMS gives you a tool for easily examining intermediate results; and importing .csv files is straightforward. It's also free. For all of those reasons, I would give it a try and evaluate its real-world performance. Going back to your original question, my opinion, fwiw, is that this is a reasonable approach; I don't think you're missing anything.