…you just need to understand why we do it.
Mostly this post focuses on three specific measures of effect-size. In fact, we will mostly deal with two, but the third is important. Both r (the correlation coefficient) and d (Cohen’s d) are broadly familiar to people who use statistics in Psychology. The third, chi-squared, is also familiar, but not necessarily in this context.
Cohen’s d, first (d mainly because that letter hadn’t been used for anything else yet). This is a way of giving a standard value for the difference in mean scores between two groups. It is useful because it is standard: I have found that smokers are more interesting than non-smokers and, on my own interestingness scale, the difference is 7.42. Is that a lot? Well, you don’t know. If the scale runs from 0 to 10, it’s a lot; if the scale runs from 0 to 1000, it’s not worth noticing. However, if I say it has an effect-size of d=0.8, you can go and look up how to interpret this and find out that it is quite a big effect. That’s nice, and Cohen’s d is commonly used because of this. Cohen’s d looks at how different the groups are compared to how different the scores are within the groups.
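As a sketch of the arithmetic (with made-up interestingness scores, not real data; the pooled-SD formula below is the usual textbook one):

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: the difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)  # sample variances
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (ma - mb) / pooled_sd

# Hypothetical interestingness scores (illustrative only)
smokers = [8, 7, 9, 6, 8]
non_smokers = [5, 6, 4, 7, 5]
d = cohens_d(smokers, non_smokers)
```

The answer comes out in units of “within-group standard deviations”, which is what makes it comparable across scales.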
But suppose I have 3 groups: non-smokers, cigarette smokers and vapers. There isn’t a single number difference between my groups now. There are 3 group scores and therefore 3 differences. I could have 3 effect-sizes but, well, you know, life is short and we should always try and use single numbers when we can.
Cohen spotted this and invented Cohen’s f (probably short for ffs). A really nice and very old-fashioned way of talking about how different lots of numbers are from each other is the standard deviation. So instead of the difference between 2 group scores (d), we can use the standard deviation of the 3 (or more) group scores. The principle is the same as with d, just a slightly different implementation.
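A minimal sketch of that idea, assuming equal group sizes and using population SDs throughout (the three groups and their scores are invented):

```python
import statistics

def cohens_f(*groups):
    """Cohen's f: the SD of the group means divided by the within-group SD.
    A sketch assuming equal group sizes; uses population SDs."""
    means = [statistics.mean(g) for g in groups]
    sd_of_means = statistics.pstdev(means)  # spread of the group means
    # pool every score's deviation from its own group mean
    residuals = [x - statistics.mean(g) for g in groups for x in g]
    sd_within = statistics.pstdev(residuals)  # within-group spread
    return sd_of_means / sd_within

# Three hypothetical groups (non-smokers, cigarette smokers, vapers)
f = cohens_f([1, 2, 3], [3, 4, 5], [5, 6, 7])
```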
Three steps get us somewhere interesting. We do this with the assumption of equal-sized groups. If the groups aren’t of equal sizes, then we need a bit more fancy footwork, but the outcome is much the same:
Cohen’s d and his f turn out to be a comparison of model scores and residuals. This diagram shows the idea:
Now we can move very easily on to the older r. This is given by comparing model scores with the DV scores themselves, which we can call the total scores:
r = sd(model_scores)/sd(total_scores)
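A quick numerical check of that ratio, with made-up scores for two groups (the model score for each person is just their group’s mean):

```python
import statistics

# Hypothetical interestingness scores for two groups
group_a = [8, 7, 9, 6, 8]
group_b = [5, 6, 4, 7, 5]

total_scores = group_a + group_b
# the model predicts each person's score as their group mean
model_scores = ([statistics.mean(group_a)] * len(group_a)
                + [statistics.mean(group_b)] * len(group_b))

r = statistics.pstdev(model_scores) / statistics.pstdev(total_scores)
# this matches the (point-biserial) correlation between score and group membership
```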
As you can see, it is really quite similar. One final step finishes this.
We can say that:
total_score = model_score + residual_score
and (since variances add) therefore that:
var(total_score) = var(model_score) + var(residual_score)
var(residual_score) = var(total_score) - var(model_score)
or:
sd(total_score) = sqrt(sd(model_score)^2 + sd(residual_score)^2)
sd(residual_score) = sqrt(sd(total_score)^2 - sd(model_score)^2)
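This additivity is easy to verify numerically. A sketch with invented group scores, where the model score is the group mean and the residual is whatever is left over:

```python
import statistics

group_a = [8, 7, 9, 6, 8]  # hypothetical scores
group_b = [5, 6, 4, 7, 5]

total = group_a + group_b
model = [statistics.mean(group_a)] * len(group_a) + [statistics.mean(group_b)] * len(group_b)
residual = [t - m for t, m in zip(total, model)]

# variances add: var(total) == var(model) + var(residual)
assert abs(statistics.pvariance(total)
           - (statistics.pvariance(model) + statistics.pvariance(residual))) < 1e-9
```

(The variances add exactly here because the residuals are uncorrelated with the model scores.)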
A bit of playing around with these quantities (for anyone who enjoys algebra) gets us to the point where we can say that:
r^2 = Mvar/(Mvar+Rvar)
f^2 = Mvar/Rvar
so
Rvar = Mvar/f^2
r^2 = Mvar/(Mvar+Mvar/f^2)
we can drop Mvar completely from this (it’s on the top and the bottom of the equation):
r^2 = 1/(1+1/f^2)
and simplified:
r^2 = f^2/(f^2+1)
then, remembering that f = d/2 (for two equal-sized groups)
r^2 = d^2/(d^2+4)
and lastly:
r = d/sqrt(d^2+4)
or:
d = 2r/sqrt(1-r^2)
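Those last two formulas can be wrapped up as a pair of conversion functions; a small sketch (the example value 0.8 is just for illustration):

```python
import math

def r_from_d(d):
    """Convert Cohen's d to r: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d * d + 4)

def d_from_r(r):
    """Convert r to Cohen's d: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r * r)

# a "large" d of 0.8 corresponds to an r of about 0.37,
# and the two conversions undo each other
assert abs(d_from_r(r_from_d(0.8)) - 0.8) < 1e-12
```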
In stats classes, we are taught that there are 4 types of variable (and mysteriously given more than 4 names for them): nominal (also called categorical), ordinal, interval and ratio.
Then, having got that, ratio simply disappears never to be seen or heard of again. This is a loose-end in statistics teaching and loose-ends are highly undesirable.
The difference between ordinal and interval sounds like it matters; it is certainly the case that combining single and double cream doesn’t make triple cream. But…
The difference is a bogus fact, and bogus facts are highly undesirable. It is bogus in two ways, and so we can dispense with Ordinal, meaning we can work with just Categorical or Interval (labels or quantities). I hope you will agree that 2 is simpler than 4. It is admirably lazy. Admirably lazy is highly desirable.
Bogus because:
There are another two important things to be said. First, doing statistics with Interval variables is more precise and more general. Numbers matter. If you were given an exam grade as a number (69), that is more meaningful to you than being told that you came 19th in the class. Second, there is a much more interesting distinction lurking beneath the surface about what might count as a typical value of something: what is the typical value for an exam grade?
In practical terms, the choice of ordinal or interval variables determines whether you do the statistics on medians (ordinal) or means (interval). The mean is an average; the median is a middle value. This difference is interesting because the mean and the median are rarely the same. If I join a group of young people, the median age of the group is probably not changed, but my extreme age means that the mean age will go up. So median age and mean age have different meanings. That’s an interesting choice to make.
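The age example is easy to see in code; a sketch with invented ages (mine included, hypothetically):

```python
import statistics

young_group = [19, 20, 21, 22, 23]
print(statistics.median(young_group), statistics.mean(young_group))  # 21 21

# one extreme age joins the group
with_me = young_group + [60]
print(statistics.median(with_me), statistics.mean(with_me))  # 21.5 27.5
```

The median barely moves (21 to 21.5) while the mean jumps (21 to 27.5), which is exactly the difference in meaning described above.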
Here’s a reveal – but please keep it secret. This is very important – if everyone knew what I am about to say, statistics wouldn’t be hard enough.
There are only two statistical tests in Psychology. Everything else is just for pretend. Here’s the real list:
So what about all the others? Well, in BrawStats (and I suspect SPSS, Jamovi, etc etc) the software does just those two tests, but then reports the results as if it had done the “official” test. It hasn’t; it is just pretending.
A bit of an explanation: Psychology has issues. It collects statistical tests, but can’t bring itself to declutter. It’s exactly like internet shopping: it sees a new test and buys it, without asking (i) whether it needs it and (ii) whether it can get rid of older stuff.
The two tests? Think of tests as being about seeing how much of what is going on in the DV can be explained by the IV.
That’s it.
Imagine you have a set of data like this:
This is a nice place to be, but the analysis can be quite daunting. In the posts that follow, we will discuss how to approach this.
We will be using BrawStats. See here for details:
BrawStats