This post focuses on three specific measures of effect-size. In fact, we will mostly deal with two, but the third is important. Both r (the correlation coefficient) and d (Cohen’s d) are broadly familiar to people who use statistics in Psychology. The third, chi-square, is also familiar, but not necessarily in this context.
Cohen’s d, first (d mainly because that letter hadn’t got used for anything else yet). This is a way of giving a standard value for the difference in mean scores between two groups. It is useful because it is standard. Suppose I have found that smokers are more interesting than non-smokers and that, on my own interestingness scale, the difference is 7.42. Is that a lot? Well, you don’t know. If the scale runs from 0 to 10, it’s a lot; if the scale runs from 0 to 1000, it’s not worth noticing. However, if I say it has an effect-size d=0.8, you can go and look up how to interpret this and find out that it is quite a big effect. That’s nice. And Cohen’s d is commonly used because of this. Cohen’s d looks at how different the groups are compared to how different the scores are within the groups: the difference between the group means divided by the pooled standard deviation within the groups.
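As a rough sketch of that comparison, the Python below computes d as the difference in group means divided by the pooled within-group standard deviation. The scores and the `cohens_d` helper are made up for illustration; they are not data from any real study. It also rescales the scores by 100 to show why a standard value is handy: the raw difference changes, d does not.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled within-group SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / math.sqrt(pooled_var)


# Hypothetical interestingness scores, invented for this example.
smokers = [6.0, 7.0, 8.0, 9.0, 10.0]
non_smokers = [4.0, 5.0, 6.0, 7.0, 8.0]

d = cohens_d(smokers, non_smokers)

# Same data on a scale 100 times larger: the raw difference grows, d stays put.
d_rescaled = cohens_d([x * 100 for x in smokers], [x * 100 for x in non_smokers])

print(f"d = {d:.3f}, d on the rescaled data = {d_rescaled:.3f}")
```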
But, suppose I have 3 groups: non-smokers, cigarette smokers and vapers. There isn’t a single difference between my groups now. There are 3 group means and therefore 3 pairwise differences. I could have 3 effect-sizes, but well you know, life is short and we should always try and use single numbers when we can.
Cohen spotted this and invented Cohen’s f (probably short for ffs). A really nice and very old-fashioned way of talking about how different lots of different numbers are from each other is the standard deviation. So instead of the difference between 2 group scores (d), we can use the standard deviation of the 3 (or more) group scores. The principle is the same as with d, just a slightly different implementation.
Three steps get us somewhere interesting. We do this with the assumption of equal sized groups. If the groups aren’t of equal sizes, then we need a bit more fancy footwork, but the outcome is much the same:
- There’s a simple mathematical link between d and f, if you want to see it. If I have two equal sized groups, then the standard deviation of the group mean scores is exactly half of the difference between those means. So:
d = 2 x f
- Suppose we give each participant a score which is the mean score for their group. We will call this their model score: it’s the part of their score we think we can explain. Each person then has a model score plus their own personal extra variation from that model – this we will call their residual score: it is the part of their score we can’t explain.
- The standard deviation of group means is the same as the standard deviation of model scores across the whole sample.
- The bit that Cohen calls the pooled standard deviation is just the standard deviation of the residual scores.
- So we have two formulae:
f = sd(model_scores)/sd(residuals)
d = 2 x sd(model_scores)/sd(residuals)
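The steps above can be sketched in a few lines of Python. The data are made up, and the sketch assumes two equal sized groups and uses the population standard deviation (`statistics.pstdev`, dividing by n) throughout so that the identities hold exactly; the textbook pooled standard deviation uses slightly different denominators, so its d comes out a little smaller on small samples.

```python
import statistics

# Hypothetical scores for two equal sized groups (invented for illustration).
groups = {
    "non_smokers": [4.0, 5.0, 6.0, 7.0, 8.0],
    "smokers": [6.0, 7.0, 8.0, 9.0, 10.0],
}

model_scores = []  # each participant's group mean: the part we can explain
residuals = []     # each participant's deviation from it: the part we can't

for scores in groups.values():
    group_mean = statistics.mean(scores)
    for score in scores:
        model_scores.append(group_mean)
        residuals.append(score - group_mean)

# f compares the spread of model scores with the spread of residuals.
f = statistics.pstdev(model_scores) / statistics.pstdev(residuals)

# d computed directly: difference in group means over the residual SD.
mean_difference = (
    statistics.mean(groups["smokers"]) - statistics.mean(groups["non_smokers"])
)
d = mean_difference / statistics.pstdev(residuals)

print(f"f = {f:.3f}")  # f = 0.707
print(f"d = {d:.3f}")  # d = 1.414, i.e. 2 x f
```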
Cohen’s d and his f turn out to be a comparison of model scores and residuals. This diagram shows the idea:
Now we can move very easily on to the older r. This is given by comparing model scores with the DV scores themselves, which we can call the total scores:
r = sd(model_scores)/sd(total_scores)
As you can see, it is really quite similar. One final step finishes this.
We can say that:
total_score = model_score + residual_score
and (since model scores and residuals are uncorrelated, so their variances add) therefore that:
var(total_score) = var(model_score) + var(residual_score)
var(residual_score) = var(total_score) – var(model_score)
or:
sd(total_score) = sqrt(sd(model_score)^2 + sd(residual_score)^2)
sd(residual_score) = sqrt(sd(total_score)^2 – sd(model_score)^2 )
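A quick numerical check of this decomposition, again as a sketch on made-up data with two equal sized groups and population variances (`statistics.pvariance`). The variable names are just for this example.

```python
import statistics

# Hypothetical scores for two equal sized groups (invented for illustration).
groups = [
    [4.0, 5.0, 6.0, 7.0, 8.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
]

total_scores, model_scores, residuals = [], [], []
for scores in groups:
    group_mean = statistics.mean(scores)
    for score in scores:
        total_scores.append(score)                # the DV score itself
        model_scores.append(group_mean)           # explained part
        residuals.append(score - group_mean)      # unexplained part

var_total = statistics.pvariance(total_scores)
var_model = statistics.pvariance(model_scores)
var_resid = statistics.pvariance(residuals)

# Variances add: total = model + residual.
print(var_total, var_model + var_resid)

# r compares model scores with total scores.
r = statistics.pstdev(model_scores) / statistics.pstdev(total_scores)
print(f"r = {r:.3f}")
```

With these numbers the model variance is 1, the residual variance is 2 and the total variance is 3, so r^2 is one third.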
A bit of playing around with these quantities (for anyone who enjoys algebra), writing Mvar for var(model_score) and Rvar for var(residual_score), gets us to the point where we can say that:
r^2 = Mvar/(Mvar+Rvar)
f^2 = Mvar/Rvar
so
Rvar = Mvar/f^2
r^2 = Mvar/(Mvar+Mvar/f^2)
we can drop Mvar completely from this (it’s on the top and the bottom of the equation):
r^2 = 1/(1+1/f^2)
and simplified:
r^2 = f^2/(f^2+1)
then, remembering that f = d/2
r^2 = d^2/(d^2+4)
and lastly:
r = d/sqrt(d^2+4)
or:
d = 2r/sqrt(1-r^2)
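For anyone who would rather let a computer do the converting, here is a minimal sketch of these formulae as Python functions. The function names are made up for this post, and the d formulae carry the same assumption as the derivation: two equal sized groups.

```python
import math


def f_to_r(f):
    """r = f / sqrt(f^2 + 1)."""
    return f / math.sqrt(f ** 2 + 1)


def d_to_r(d):
    """r = d / sqrt(d^2 + 4), assuming two equal sized groups."""
    return d / math.sqrt(d ** 2 + 4)


def r_to_d(r):
    """d = 2r / sqrt(1 - r^2); not defined when r is exactly +1 or -1."""
    return 2 * r / math.sqrt(1 - r ** 2)


# A "large" d of 0.8 corresponds to an r of about 0.37.
r = d_to_r(0.8)
print(f"d = 0.8 -> r = {r:.3f}")

# Converting back recovers the original d (up to floating-point rounding).
print(f"r = {r:.3f} -> d = {r_to_d(r):.3f}")
```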