I am not skilled in statistics, so I hope this will be easy for someone. My lack of skill also made it hard to find the right search terms, so I may have missed an existing answer. Anyway, I am looking at arrays of data, CPU usage for example. How can I capture accurate information in as few data points as possible? Say I have a set of data containing CPU usage across the cores at 1-second intervals over 1 hour, where the first 30 minutes were at 0% and the second 30 minutes were at 100%. Right now, the only single data point I can think of is the mean, which is 50% and not useful at all in this case. Another case is when the usage graph is like a wave, evenly bouncing up and down between 0 and 100, yet still giving a mean of 50%. How can I capture this data? Thanks.
If I understand your question correctly, it is really more of a statistics question than a programming question. Are you asking what the best way is to capture a population curve with the fewest variables possible?
First, most standard statistics assume that the system is more or less stable (although, if the system is unstable, the numbers you get will let you know, because they will be nonsensical).
The main measures you need to know statistically are the mean, the population size and the standard deviation. From these, you can calculate the rough bell curve describing the population, and the size of the standard deviation tells you how tightly the data cluster around the mean.
This gives you a three-variable schema for a standard bell curve.
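As a rough illustration of that three-variable summary, here is a minimal Python sketch applied to the two cases from the question. The function name `summarize`, the 60-second wave period and the use of a one-pass (Welford-style) update are my own assumptions for the example, not something the question specifies.

```python
import math

def summarize(samples):
    """Return (count, mean, population standard deviation) in a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("no samples")
    return count, mean, math.sqrt(m2 / count)

# One hour of 1-second samples: 30 minutes at 0 %, then 30 minutes at 100 %.
step = [0.0] * 1800 + [100.0] * 1800

# One hour of a smooth wave between 0 % and 100 %.
# The 60-second period is an assumption made for this example.
wave = [50.0 + 50.0 * math.sin(2 * math.pi * t / 60) for t in range(3600)]

step_summary = summarize(step)
wave_summary = summarize(wave)
print("step:", step_summary)
print("wave:", wave_summary)
```

Both series have a mean of about 50, but the standard deviation comes out at about 50 for the step and about 35.4 for the wave, so the third number already separates them from each other and from a flat 50% load. Neither series is bell-shaped, though, so these numbers describe the spread rather than the shape.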
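If you want further detail, you can add Cpk and Ppk, which are calculated fields: process capability indices derived from the mean, the standard deviation and a pair of specification limits.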
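For reference, the usual textbook formulas are Cp = (USL - LSL) / (6 * sigma) and Cpk = min(USL - mean, mean - LSL) / (3 * sigma), where LSL and USL are the lower and upper specification limits. A minimal sketch follows; the function name `capability` and the limits of 0 and 90 are hypothetical values chosen only for illustration, since the question does not define any limits.

```python
def capability(mean, sigma, lower_limit, upper_limit):
    """Return (Cp, Cpk) for the given mean, standard deviation and limits."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Cp compares the width of the allowed range with the spread of the data.
    cp = (upper_limit - lower_limit) / (6 * sigma)
    # Cpk also penalises a mean that sits closer to one limit than the other.
    cpk = min(upper_limit - mean, mean - lower_limit) / (3 * sigma)
    return cp, cpk

# Hypothetical numbers: mean usage 40 %, standard deviation 10 %,
# acceptable range 0 % to 90 %.
cp, cpk = capability(mean=40.0, sigma=10.0, lower_limit=0.0, upper_limit=90.0)
print(cp, cpk)
```

Pp and Ppk are normally computed with the same formulas, but using the overall standard deviation of all the data rather than a within-subgroup estimate.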
Otherwise, you may need to get into non-linear regression and curve fitting, which are best handled on a case-by-case basis (not great for programming).
Check out the following sites for calculating Cp, Cpk, Pp and Ppk, and for variance and standard deviation:
http://www.qimacros.com/control-chart-formulas/cp-cpk-formula/
http://www.macroption.com/population-sample-variance-standard-deviation/