I want to manually analyze the bug reports of three large software projects. The three projects have 10,000, 12,000, and 8,000 bug reports, respectively. For each report I need to examine the report itself, its comments, and the files changed to fix the bug. Manually analyzing all the bug reports is a time-consuming and difficult task, so I would like to take a sample of bug reports from each project. Could you please suggest how many bug reports from each project I should analyze to obtain a representative sample?
It depends on the following two things:
Confidence level: It tells you how sure you can be that the true value for the whole population lies within your margin of error. A 95% confidence level means you can be 95% certain; a 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.
Confidence interval (margin of error): It is the plus-or-minus deviation from the true population value that you are willing to accept. Most researchers use a 5% confidence interval (±5%).
Therefore, you can use a 95% confidence level and a 5% confidence interval to calculate your sample size.
For example,
Population size of project A = 10,000
Confidence level = 95%
Confidence interval = 5%
Representative sample size = 370 (that is, you should analyze 370 bug reports from project A)
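The figure of 370 can be reproduced by hand. This is a sketch, assuming the calculator applies Cochran's formula with a finite population correction and the most conservative proportion p = 0.5 (used when nothing is known in advance about the proportion being estimated), with z = 1.96 for 95% confidence, e = 0.05, and N = 10,000:

```latex
\begin{aligned}
n_0 &= \frac{z^2\,p\,(1-p)}{e^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} = \frac{0.9604}{0.0025} = 384.16 \\
n   &= \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} = \frac{384.16}{1 + \dfrac{383.16}{10\,000}} = \frac{384.16}{1.038316} \approx 369.98 \;\rightarrow\; 370
\end{aligned}
```

Rounding up keeps the margin of error at or below 5%. Note that the population size has only a modest effect here: without the correction the answer would be about 385.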
I usually use this online sample size calculator to calculate sample size: https://www.surveysystem.com/sscalc.htm#one
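If you prefer to script it, here is a minimal sketch that computes a sample size for each of the three projects and then draws a simple random sample of bug reports from each one. It assumes Cochran's formula with a finite population correction and p = 0.5; `sample_size` and `draw_sample` are hypothetical helper names, "B" and "C" are placeholder labels for the other two projects, and the bug IDs `1..N` stand in for your real bug report IDs.

```python
import math
import random
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with a finite population correction.

    p = 0.5 is the most conservative assumption because it maximizes p*(1-p).
    """
    # Two-sided z-score for the confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return math.ceil(n)                           # round up to stay within the margin


def draw_sample(bug_ids, k: int, seed: int = 42) -> list:
    """Simple random sample of k distinct bug IDs, reproducible via the seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(bug_ids), k))


if __name__ == "__main__":
    # Project A is from the example above; B and C are placeholder labels
    projects = {"A": 10_000, "B": 12_000, "C": 8_000}
    for name, size in projects.items():
        n = sample_size(size)
        # Hypothetical IDs 1..size; replace with the real bug report IDs
        picked = draw_sample(range(1, size + 1), n)
        print(f"Project {name}: N={size}, sample size={n}, first IDs={picked[:5]}")
```

With the default settings this gives 370, 373, and 367 bug reports for the three projects. Because the sketch always rounds up, it may differ by one from calculators that round to the nearest whole number. Sampling each project separately keeps each project's sample representative of that project on its own.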