Search code examples
entity-frameworkdatabase-designlinq-to-xmldata-access-layer

Using XML field Vs. creating a table for unstable organization


I am in the middle of design an application to issue and store invoices for an organization. The problem is the organization is not stable at all. There are many types of invoices and they may increase and change.

First, I tried to use tables in my DAL, one table to store invoices, one for invoice fields, and one for invoice field values. The problem was, this way needs Reflection to detect fields later, and this may slow the application when invoice contains plenty items.

Second, I tried to keep core of invoice data as two tables: invoices and invoice items. Other fields are fully catchable from other tables. I mean, Business Layer should provide different results per invoice type. It should choose the right queries, and process the results according to the type of invoice. Two problems with this solution are:

  1. I still have lots of join between other tables when I want to show an invoice, for per invoice item. I should recalculate everything every time for every invoice item.

  2. What if the invoice items table doesn’t support a new type of invoice? Then I probably should add a new table to store that kind of invoices items.

  3. My client asked me to keep all data related to an invoice, they do not want me to recalculate it every time. They want something like a snapshot of related data at the time invoice is creating.

Now, what I’m think can I use the benefit of XML in table? I can save the invoice, with whatever fields as xml.

  • I can save the different version of their invoices.
  • On changes, I only update my Business.dll and no DAL change required.
  • Linq to XML is not slow.

What do you suggest?


Solution

  • It is common that invoice system must take snapshot of invoice and all its relations. The reason are changes in company. For example:

    1. If product's price change it must not affect already processed invoices
    2. If product is discontinued it must not affect already processed invoices
    3. If client change its address it must not affect already processed invoices
    4. etc.

    Invoice always must show data in the system state from the date the invoice was created. You can achieve that either by some level of denormalization for invoice and invoice items tables or by massive of every record related to invoice.

    Now, what I’m think can I use the benefit of XML in table? I can save the invoice, with whatever fields as xml.

    Yes you can but for entity framework this field will be just a string because it doesn't understand SQL XML data type.

    Linq to XML is not slow.

    Linq-to-XML doesn't matter because Linq-to-XML is .NET feature for querying loaded XML document. It will querying XML documents stored in SQL server unless you load all invoices from your SQL server to your application and make Linq-to-XML query in memory of your application. That is slow, actually that is killer of your application.

    To query XML data in SQL server you must use native SQL with its XML features (XPath, XQuery). It is question for SQL guys how will XML and querying XML affect overall performance of your queries and how much does it suite to your needs.

    On changes, I only update my Business.dll and no DAL change required.

    Hardly. You will have to update your DAL as well because all your SQL queries querying your XMLs will be placed there. Unless you will have some very complex data structure for configuring your application for new invoice type (including complete UI configuration because new fields can have new rules and validations, it can be combo boxes filled from other data source, etc.) you will have to update UI as well.

    What are your options?

    • Use XML and remove EF
    • Use EF and architect your application so that every time new invoice type is created you will have to add small piece of code to DAL, BLL and UI to support it. There can be for example criteria defined that application must be architected in way which will allow new invoice type to be added within 5-10 MD. I used this way in reverse type of application where orders were processed (but at the end we found that it is not possible to fulfill business needs in allowed time because every new order type contained so many new requirements including integration with new external data sources that it required separate project release).
    • Use EF with virtual extensibility. Each your invoice and invoice line item will contain X additional fields (you can use different types for them or just a string - it depends on your expectation). These fields will be used for customization. You will also need some configuration tables which will say which invoice types uses which fields, what are validations for these fields and what are UI names and order of these fields. I saw this system successfully used with 5 string fields in invoice and 5 string fields in invoice line.
    • Use EF and fixed tables for invoice and invoice lines containing only data shared among all invoice types. Use additional tables for containing additional fields and configuration tables to describe field types. In your application don't use reflection but instead work with additional fields as with dictionary (key, value collection). This is quite common structure for simple extensibility.

    Neither of former cases will solve all problems you can have with new invoice type. For example if new field is some relational dependency on other field, advanced business validation or requires some external data it will really not be solved.

    Heaving fully universal system accepting any type of invoice they ever create is possible but the complexity and price of that system will be minimally 10x higher. It will also affect time to deliver the application. The complexity will also have impact on the usability because application will need very careful UX design to be easy to use.

    This is typical failure in project management where price of requirements is not communicated correctly and so the expectations from the project is much bigger than budget allows. Also developing the project for business which doesn't have optimized business processes and just follows chaos is nightmare.

    I saw projects where expectations was to have one application now which will serve (without any additional development) all possible future cases without knowing how these case will be defined. Sure the budget was always just enough to deliver currently known cases. All these projects failed this requirement.

    Btw. to make sure that you covered as much cases as you can you should not work with all your current invoice types from start. Just let your analyst to collect minimal set of fields for every invoice type and work with them. All real invoice types must be configured through your system instead of hardcoded.