Let's say you've just finished conducting traditional (moderated, one-on-one) user tests on a website. Naturally, you have noted whether or not each of your subjects managed to complete each assigned task. Perhaps you've even timed how long it took each subject to complete each task. My question is this: in writing up the results of the tests, how should you describe how your subjects performed?
I recently read a report that was loaded with references such as “67% of users” did such-and-such, and “83% of users” did this-or-that. Personally, I think it is a mistake to name precise numbers like this.
User tests as I've described above are not intended to be quantitative/inferential in nature. Rather, they are qualitative, employed to gain insight into how real users interact with a website. If you start naming precise numbers (particularly as percentages), you're implying the numbers are statistically meaningful. They aren't. User tests like these usually involve between 5 and 10 users, far too few to yield statistically significant results.
By implying significance where none exists, you risk destroying your credibility. If anyone reading your report has ever taken a course in statistics, such numbers will jump out at them. They'll know right away that, given the small sample size, the numbers you've quoted can't possibly be statistically significant. And even though your report may be full of valuable insights, everything you have written will be tarnished because, in the reader's view, your findings are suspect.
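To see how little a figure like “67% of users” pins down, it helps to run the arithmetic. The report didn't state its sample size, so assume a six-person test (67% and 83% are exactly what 4 of 6 and 5 of 6 would produce). The sketch below computes a 95% Wilson score interval, one standard way to put a confidence range around a proportion; `wilson_interval` is a hypothetical helper name used only for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The Wilson interval is centered on a shrunken estimate, not on p itself.
    center = (p + z2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - margin), min(1.0, center + margin)


# Assumed scenario (not stated in the report): six test subjects in total.
for successes in (4, 5):
    low, high = wilson_interval(successes, 6)
    print(f"{successes}/6 = {successes / 6:.0%}, 95% CI roughly {low:.0%} to {high:.0%}")
```

For 4 of 6 users, the interval runs from about 30% to about 90%; for 5 of 6, from about 44% to about 97%. A range that wide is consistent with anything from most users failing the task to nearly all of them succeeding, which is why quoting “67%” gives a false sense of precision.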
Naturally, in presenting your results, you need to make reference to user test performance. But I think it's much wiser (and safer) to keep such references broad and conversational. For example:
- “In our tests, only our least web-savvy test subject failed to complete this task in a reasonable time. All others breezed through it.”
- “Half of our subjects failed this task.”
- “Four of our test subjects didn't mind the multimedia presentation on the Home page. But two subjects found it very annoying and indicated that in a real scenario, they'd have left the site immediately.”
Note that in some of the examples above, I have in fact named numbers. But by keeping it conversational and not naming percentages, I'm not implying statistical significance. Not only is this more honest, it's also more credible: nobody can dispute my claims of significance, because I haven't made any.
The bottom line is this: in writing up user test reports, focus on the insights gained. Explain where users stumbled, and why they stumbled. Don't risk calling your recommendations into question by implying statistical significance.