I have a commercial performance dataset such as the following:
I am measuring volume growth for a group of clients over a period of time. The only difference between them is the distribution channel (A or B) through which they reach the market. Clients differ between clusters (a big retailer goes to market via either A or B, never switching) and are quite homogeneous within each cluster. The table above is just a summary. I do have the full dataset with 2000+ clients and their individual growth rates, clusters, channels, etc. My goal is to establish whether there are significant differences in growth rate between channels for a given client type, i.e., whether channel choice has a bearing on performance. For example, is 9% significantly different from 7% for big retailers?
My initial take was a two-sample t-test (independent samples), first checking whether the two groups have equal variance and adjusting accordingly (if yes, using the standard t-test; if not, Welch's t-test). As a side note, I'm using Python's
I am currently unsure because I've always used the t-test for absolute attributes such as weight, size, speed etc. The fact that I am exploring growth rates now certainly makes me a bit uneasy about its correct usage.
Am I correct in using a t-test? Is there a better or more appropriate test?
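Yes, that is what I would do. I would not check for equality of variances, though, since that is overkill: Welch's t-test performs almost as well as the standard t-test when the variances are equal and remains valid when they are not. I would use Welch's t-test for everything.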
I would, though, first look at the distribution of growth rates within each level of the factor (channel, in your case). If they look normal by eye, use the above t-test. Otherwise, use the Mann–Whitney U test.
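As a rough sketch of how both tests could be run for a single client cluster, assuming SciPy and NumPy are available: the function name `compare_channels` and the synthetic growth rates are made up for illustration and are not taken from your dataset.

```python
import numpy as np
from scipy import stats


def compare_channels(growth_a, growth_b):
    """Compare growth rates of two channels within one client cluster.

    Hypothetical helper: returns the statistic and p-value of both
    Welch's t-test and the Mann-Whitney U test.
    """
    a = np.asarray(growth_a, dtype=float)
    b = np.asarray(growth_b, dtype=float)

    # Welch's t-test: equal_var=False drops the equal-variance assumption.
    t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)

    # Mann-Whitney U: rank-based alternative if the data do not look normal.
    u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

    return {
        "welch_t": float(t_stat),
        "welch_p": float(t_p),
        "mwu_u": float(u_stat),
        "mwu_p": float(u_p),
    }


# Synthetic data for illustration only (NOT the real client data):
# growth rates for big retailers in channel A and channel B.
rng = np.random.default_rng(0)
channel_a = rng.normal(loc=0.09, scale=0.05, size=300)
channel_b = rng.normal(loc=0.07, scale=0.08, size=250)

result = compare_channels(channel_a, channel_b)
print(result)
```

With the real data you would call this once per client cluster, passing the individual growth rates of the clients in channel A and channel B. A histogram or Q-Q plot of each group is the "by eye" check that decides which of the two p-values to report.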