From: Turbo-charge Your Report Speed – General Rules & Guidelines
One advantage as the CTO (and creator) at Windward is I have seen just about anything & everything someone can do with reports, document generation, dashboards, etc. And with Windward being free-form, I've probably seen more variations than my opposite number at our competitors. From this perspective I've learned a number of basic do's and don'ts that I think hold across all reporting, docgen, & dashboard systems.
Report Design
- If nothing else follow this rule – do not fight your report designer. If you use a system that assumes banded reports (Crystal, SSRS, etc.) then create a banded report. If you try to work at cross-purposes you will find it difficult, frustrating, and the final design will be a series of compromises.
- Use the appropriate approach. For example with Windward you can design in Word or Excel (or PowerPoint). Yet we have customers designing spreadsheets using Word tables and designing documents with a couple of tables in Excel. When they switch from Word to Excel (or vice-versa), then something that was hard becomes very easy.
- Understand and accept the limitations of your designer. For example, SSRS treats sub-reports as separate reports so you cannot render a sub-report in the context of the report pulling it in. You need to keep this in mind when architecting your template structure.
- K.I.S.S. – Keep It Simple (Stupid). I've had numerous cases where, when diving into the issues a customer had, the core issue was they were not clear about what information they were trying to get across, and what data represented that information. Try to be very clear on what you are presenting and what that information is.
- Pie charts are evil.
System Architecture
- If the reporting engine runs in the context of your application, create multiple threads for multiple simultaneous report requests. Report generation tends to be heavily I/O bound so you get a big win here. Our rule of thumb is have twice as many threads as you have cores on your system. (And Hyper-threading counts as distinct cores for this measure.)
- Buy sufficient memory. If you are generating a 10,000 page report, you need more than ½Gig.
- Watch your dependence on network I/O. If you are pulling in images or sub-reports over the network, that network connection can easily be the gating factor in your speed.
- Run a profiler against your system under load. Generally the largest time hit will be a surprise, often one you can address.
Data Access
- Only pull in the data you need. A SQL select should almost never start with "select * …" because you rarely need every column. Select only the columns you need to display in the report.
- You do not need to select columns you use in your joins, order by, or where criteria (I see this a lot). You may use columns in the select without returning them.
- For a complex join, add it to the database as a view if possible. If that cannot be done, create it as a dataset. A database is optimized to handle complex joins as a view and most reporting systems will build a dataset as a temporary view.
- For XML data, minimize the size of the XML dataset. Using XPath or XQuery requires that the entire XML structure be read in to memory as a DOM or XPath optimized structure. A larger dataset means larger memory usage and more nodes to traverse on a query. (For Windward we've found several hundred megabytes is usually fast, but when you get into gigabytes it starts to have an impact.)
- Don't pull data twice. If you are iterating through rows of data (or nodes in XML), then for each iteration pull all the needed data for that row/node in the main iterator and then access it directly. It should be rare that you have a select statement inside an iterative loop.
- For SQL make sure all columns used for joins and the key parts of a sort or where clause are indexed. (Most people will say all, and disk space is cheap. But if you always sort by last_name, first_name then while last_name must be indexed, first_name should be indexed if space allows. Life is trade-offs.)
- For XML create a schema with data typing. This avoids ambiguity where XPath/XQuery not only have to make assumptions, but they have to do extra processing determining what they should assume.