
Representative sample size calculation.


I want to manually analyze the bug reports of three large software projects. The three projects have 10,000, 12,000, and 8,000 bug reports, respectively. For each bug report I need to examine the report itself, its comments, and the files changed by the fix. Manually analyzing all of the bug reports would be time-consuming and difficult, so I would like to take a sample of bug reports from each project. Could you please suggest how many bug reports from each project I should analyze to get a representative sample?


Solution

  • It depends on the following two things:

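    Confidence level: It tells you how sure you can be that the true value lies within your margin of error. At the 95% confidence level, about 95 out of 100 samples drawn the same way would give a result within the margin of error; at the 99% confidence level, about 99 out of 100 would. Most researchers use the 95% confidence level.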

    Confidence interval (margin of error): It is the plus-or-minus figure that describes the acceptable deviation of the sample result from the true population value. Most researchers use a 5% confidence interval.

    Therefore, you can use a 95% confidence level and 5% confidence interval to generate your sample size.

    For example,

    Population size of project A = 10,000
    Confidence level = 95%
    Confidence interval = 5%
    So, representative sample size = 370 (that means you should analyze 370 bug reports for project A)
    

    I usually use an online sample size calculator to calculate the sample size (https://www.surveysystem.com/sscalc.htm#one).
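
If you would rather compute the numbers yourself, here is a minimal sketch. It assumes the calculator uses the common formula for estimating a proportion (Cochran's formula with a finite population correction) and the most conservative assumption that the proportion is 50%; I have not confirmed that the linked calculator works exactly this way. The function name `sample_size`, its parameters, and the project labels B and C are made up for this illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumption: Cochran's formula with finite population correction;
    proportion=0.5 is the worst case and gives the largest sample.
    """
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinitely large population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Shrink it to account for the finite number of bug reports.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up so the margin of error is not exceeded.
    return math.ceil(n)


# Hypothetical labels for the three projects in the question.
projects = {"A": 10_000, "B": 12_000, "C": 8_000}
for name, total in projects.items():
    print(name, sample_size(total))
# A 370
# B 373
# C 367
```

This reproduces the 370 for project A. Because the sketch rounds up, it can differ by one from a calculator that rounds to the nearest whole number. Note how little the sample size changes across the three projects: at these population sizes it is driven mostly by the confidence level and the margin of error.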