The Statisticians’ Way

The role of classically trained Statisticians is to answer questions with data and to communicate the logic behind the results. Rarely does a statistician attempt to bridge the gap between statistical logic and practical interpretation unless a content expert is working closely with the team. The typical method for communicating statistical findings follows a seven-step process called Hypothesis Testing. There are many good places online to learn more about Hypothesis Testing (http://stattrek.com/hypothesis-test/hypothesis-testing.aspx).

Step 1 – General Question: Someone asks a question, wants an answer based on numerical evidence, and expects the closest thing to fact that is humanly possible. The questions may sound like these: Is there an HR problem in the Company? Do I need to hire new people? Why are sales higher in the Northeast? What does the public think of our new product? How can we improve our public image? None of these questions are statistically measurable until they are translated into research questions.

Step 2 – Research Question: This step involves translating general questions into a series of smaller, measurable questions. For example, General Question: Is there an HR problem in the Company? Research Question: How trustworthy are the employees in Company X, as measured by the Employee Trustworthiness Scale? Research Question: Does trustworthiness differ between genders in Company X, using the same measure?

Step 3 – Hypotheses: Statisticians use data to answer questions. Since 100% certainty is not possible, statistical answers are given with a measurable degree of certainty and are written as Hypotheses. Hypotheses are “plausible” explanations among many. For example, “There is no significant difference in Trustworthiness between genders” is a plausible Hypothesis to consider. A “no difference” statement like this is called the null hypothesis, and it is the statement the test looks for evidence against (I will write more about the mechanics of Hypothesis testing in a future article).
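Written in conventional notation, the gender example becomes a null hypothesis and an alternative. The symbols are introduced here for illustration: μ_M and μ_F stand for the population mean Trustworthiness scores of Male and Female employees. The alternative shown is two-sided; a one-sided version asking whether Males score higher would read H_1: μ_M > μ_F.

```latex
H_0:\ \mu_M = \mu_F \qquad \text{versus} \qquad H_1:\ \mu_M \neq \mu_F
```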

Step 4 – Analysis Plan: You may have many Hypotheses to test. Each Hypothesis may require its own calculation, and each calculation may have its own set of assumptions to consider. A well-written analysis plan is essential to understanding the statistical findings and communicating them in a way that is relevant to the audience.

Step 5 – Calculate a Statistic: The Hypothesis, the type of data, and the sample/population size dictate the appropriate statistical test. With hundreds of tests to choose from, there is no magic formula for knowing which test to use. However, several “cheat sheets” are available online (I will write more later about the mechanics of Hypothesis testing and how to use calculated statistics).
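To give a feel for how such a cheat sheet narrows the choice, here is a toy lookup covering only a few textbook cases. The function name `suggest_test` is hypothetical, and the sketch deliberately ignores the assumptions (normality, sample size, equal variances) that a real analysis plan would have to check.

```python
def suggest_test(outcome: str, groups: int, paired: bool = False) -> str:
    """Toy 'cheat sheet': suggest a common test from the data type and number of groups.

    outcome: "continuous" (e.g., a scale score) or "categorical" (e.g., yes/no).
    groups:  how many groups are being compared.
    paired:  True if the same people are measured more than once.
    """
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    if outcome == "categorical":
        # Counts in categories across groups
        return "chi-square test of independence"
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")
    if groups == 2:
        return "paired t-test" if paired else "independent-samples t-test"
    # Three or more groups
    return "repeated-measures ANOVA" if paired else "one-way ANOVA"


# Trustworthiness scores (continuous) compared between two genders (independent groups)
print(suggest_test("continuous", 2))
```

For the Trustworthiness-by-gender question, the lookup points to the independent-samples t-test, the same test used later in this post.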

Step 6 – Compute Probability: The calculated value of a statistical test, on its own, is not very informative. The Hypothesis testing process uses the calculated value to make inferences: the value is converted into a probability (the p-value), which is compared with a significance level chosen in advance (commonly 0.05), and that comparison forms the basis of the conclusion (I will write more about the mechanics of Hypothesis testing and probability in future articles).
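A minimal sketch of this step, assuming the sample is large enough that a t-statistic can be treated as approximately standard normal. The function names are hypothetical; for small samples you would use the t distribution itself (for example SciPy's `scipy.stats.t.sf`) rather than this approximation.

```python
from statistics import NormalDist


def two_sided_p(t_value: float) -> float:
    """Approximate two-sided p-value, treating the statistic as standard normal.

    Reasonable for a t-statistic only when the degrees of freedom are large.
    """
    # Probability of a value at least this extreme, in either tail.
    return 2 * NormalDist().cdf(-abs(t_value))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with a significance level chosen before the analysis."""
    if p_value < alpha:
        return "Reject the null hypothesis"
    return "Fail to reject the null hypothesis"


p = two_sided_p(2.5)  # hypothetical calculated statistic
print(f"p = {p:.4f}: {decide(p)}")
```

The statistic itself (2.5 here) says little until it is turned into a probability and compared against the chosen significance level.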

Step 7 – Present Results: Presenting statistical results is very different from interpreting them. Presenting results follows a structure that may vary slightly depending on the statistic but generally looks like this:

1. Choose a Test: e.g., t-test
2. Calculate a Result: e.g., t(df) = t-value, p = p-value
3. Significant? Yes / No
4. Null Hypothesis: Reject or Fail to Reject
5. Therefore: There IS or IS NOT a significant difference between the two means
6. Conclusion: Make a statement that summarizes all previous steps
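The six-part template above can be filled in mechanically once the test, degrees of freedom, statistic, and p-value are known. This is a minimal sketch with a hypothetical `report` function; the numbers in the usage line are made up for illustration and do not come from any real data.

```python
def report(test_name: str, df: float, t_value: float, p_value: float,
           alpha: float = 0.05) -> str:
    """Fill in the six-part presentation template for a two-group comparison."""
    significant = p_value < alpha
    return "\n".join([
        f"1. Test: {test_name}",
        f"2. Result: t({df:g}) = {t_value:.2f}, p = {p_value:.3f}",
        f"3. Significant? {'Yes' if significant else 'No'} (alpha = {alpha})",
        f"4. Null Hypothesis: {'Reject' if significant else 'Fail to Reject'}",
        f"5. Therefore: There {'IS' if significant else 'IS NOT'} "
        "a significant difference between the two means",
        f"6. Conclusion: The {test_name} {'found' if significant else 'did not find'} "
        f"a statistically significant difference between the group means at alpha = {alpha}.",
    ])


# Hypothetical numbers, for illustration only.
print(report("independent-samples t-test", 48, 0.85, 0.40))
```

Note that the template ends with the statistical conclusion; deciding what a difference means for the business is the separate interpretation step.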


What’s the Difference?

Continued from the previous blog post (Bad BI & Lying Charts – https://garfieldfisher.wordpress.com/2013/10/31/bi-dashbaords-need-analytics)

The bar chart “Average Ranking by Top Performance Categories” compares Males and Females on three characteristics: “Flexibility”, “Performance”, and “Trustworthy”. The Female bar is blue and the Male bar is brown.

When you look at the relative size of the bars, the difference between genders looks dramatic. On the “Trustworthy” scale, Males appear four times more Trustworthy than Females. Yet the calculated difference between the two averages is only 0.12.
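The post does not say how the chart was drawn. One common way a 0.12 gap can look like a fourfold gap is a value axis that starts well above zero, so only the tops of the bars are shown. Assuming that is what happened here (an assumption, not something the chart confirms), a little arithmetic shows what axis starting value would produce a 4:1 appearance, using the averages of 2.60 and 2.48 reported below. Both function names are hypothetical.

```python
def bar_ratio(mean_a: float, mean_b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    return (mean_a - baseline) / (mean_b - baseline)


def implied_baseline(mean_a: float, mean_b: float, apparent_ratio: float) -> float:
    """Axis starting value that makes bar A look `apparent_ratio` times taller than bar B.

    Solves (mean_a - b) / (mean_b - b) = apparent_ratio for b.
    """
    return (apparent_ratio * mean_b - mean_a) / (apparent_ratio - 1)


male, female = 2.60, 2.48  # Trustworthy averages given in the post
b = implied_baseline(male, female, 4.0)
print(f"axis starting at 0.00: ratio {bar_ratio(male, female):.2f}")
print(f"axis starting at {b:.2f}: ratio {bar_ratio(male, female, b):.2f}")
```

Drawn from zero, the Male bar would be only about 5% taller than the Female bar; an axis starting at 2.44 is enough to make it look four times taller.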

Misleading Bar Chart

  • Are Male employees really four times more trustworthy than Females?
  • Is a difference of 0.12 significant?
  • What decisions would you make based on this chart?

There’s a Test for That

It’s okay to answer “not sure” to each question, and it’s equally okay to want to know. To find out, a statistician would use a t-test. In this example, a t-test would determine whether Male employees (average = 2.60) are significantly more Trustworthy than Female employees (average = 2.48), as the bar chart suggests.
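Whether a 0.12 difference is significant depends on how spread out the scores are and how many employees were measured, which is exactly what the chart does not show. The sketch below runs Welch's two-sample t-test from summary statistics. The means are the ones above; the standard deviations and group sizes are HYPOTHETICAL, chosen only to show how the answer can change. The p-value uses a normal approximation, which slightly understates p at smaller sample sizes.

```python
import math
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t-statistic and degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_p(t):
    """Two-sided p-value using a normal approximation (adequate only for large df)."""
    return 2 * NormalDist().cdf(-abs(t))


# Means come from the post; the SDs and group sizes are HYPOTHETICAL.
MALE_MEAN, FEMALE_MEAN = 2.60, 2.48
for sd, n in [(0.5, 25), (0.5, 400)]:
    t, df = welch_t(MALE_MEAN, sd, n, FEMALE_MEAN, sd, n)
    print(f"SD={sd}, n={n} per group: t({df:.0f}) = {t:.2f}, p ~ {approx_p(t):.4f}")
```

With the same 0.12 gap, 25 employees per group gives t of about 0.85 (not significant at 0.05), while 400 per group gives t of about 3.39 (significant). Without the original data, neither answer can be ruled out.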

There are many online tutorials that teach the t-test; for instance, StatsCast: What is a t-test? (http://youtu.be/0Pd3dc1GcHc) is very good. Without the original data, I cannot perform the t-test myself, so I can only suggest that the bar chart is misleading… very misleading. In fact, it may even be a lie.

t-test defined: “In simple terms, the t-test compares the actual difference between two means in relation to the variation in the data” (http://www.britannica.com/EBchecked/topic/569907/Students-t-test).
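In symbols, one common form of the two-sample statistic is Welch's version, which does not assume the two groups have equal variances (Student's original version pools the two variances instead). Here the bar-x terms are the sample means, s the sample standard deviations, and n the group sizes for Males (M) and Females (F). The numerator is the “actual difference between two means”; the denominator measures the “variation in the data”.

```latex
t = \frac{\bar{x}_M - \bar{x}_F}{\sqrt{\dfrac{s_M^2}{n_M} + \dfrac{s_F^2}{n_F}}}
```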