Effect sizes: relating Cohen’s d to r – Doing Statistics in Psychology

Mostly this post focuses on three specific measures of effect-size. In fact, we will mostly deal with two, but the third is important. Both r (correlation coefficient) and d (Cohen’s d) are broadly familiar to people who use statistics in Psychology. The third, chi-square, is also familiar but not necessarily in this context.

Cohen’s d, first (d mainly because that letter hadn’t got used for anything else yet). This is a way of giving a standard value for the difference in mean scores between two groups. It is useful because it is standard: I have found that smokers are more interesting than non-smokers and on my own interestingness scale, the difference is 7.42. Is that a lot? Well, you don’t know. If the scale runs from 0 to 10, it’s a lot; if the scale runs from 0 to 1000, it’s not worth noticing. However, if I say it has an effect-size d=0.8, you can go and look up how to interpret this and find out that it is quite a big effect. That’s nice. And Cohen’s d is commonly used because of this. Cohen’s d looks at how different the groups are compared to how different the scores are within the groups.
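To make the "standard" part concrete, here is a minimal sketch in Python. The function name `cohens_d` and the scores are made up for illustration, and it assumes the usual pooled standard deviation with n1 + n2 - 2 degrees of freedom:

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled within-group SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Pooled variance: within-group sums of squares over n1 + n2 - 2.
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    pooled_sd = np.sqrt(ss_within / (a.size + b.size - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Made-up interestingness scores, purely for illustration.
smokers = [62.0, 55.0, 71.0, 66.0, 58.0, 64.0]
non_smokers = [54.0, 49.0, 63.0, 57.0, 52.0, 60.0]

d_example = cohens_d(smokers, non_smokers)
# Stretching the scale by a factor of 10 changes the raw difference but not d.
d_rescaled = cohens_d([10 * x for x in smokers], [10 * x for x in non_smokers])
print(d_example, d_rescaled)
```

The two printed values agree, which is the whole point: d does not care whether the scale runs to 10 or to 1000.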

But, suppose I have 3 groups: non-smokers, cigarette smokers and vapers. There isn’t a single number difference between my groups now. There are 3 scores and therefore 3 differences. I could have 3 effect-sizes, but well you know, life is short and we should always try and use single numbers when we can.

Cohen spotted this and invented Cohen’s f (probably short for ffs). A really nice and very old-fashioned way of talking about how different lots of numbers are from each other is the standard deviation. So instead of the difference between 2 group mean scores (d), we can use the standard deviation of the 3 (or more) group mean scores. The principle is the same as with d, just a slightly different implementation.

Three steps get us somewhere interesting. We do this with the assumption of equal sized groups. If the groups aren’t of equal sizes, then we need a bit more fancy footwork, but the outcome is much the same:

There’s a simple mathematical link between d and f, if you want to see it. If I have two equal sized groups, then the standard deviation of the group mean scores is exactly half of the difference between those means. So:
d = 2 x f
Suppose we give each participant a score which is the mean score for their group. We will call this their model score: it’s the part of their score we think we can explain. Each person then has a model score plus their own personal extra variation from that model – this we will call their residual score: it is the part of their score we can’t explain.
The standard deviation of group means is the same as the standard deviation of model scores across the whole sample.
The bit that Cohen calls the pooled standard deviation is just the standard deviation of the residual scores.
So we have two formulae:
f = sd(model_scores)/sd(residuals)
d = 2 x sd(model_scores)/sd(residuals)
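The model-score idea can be sketched in a few lines of Python. The data are invented, and two assumptions matter: the groups are equal sized, and every standard deviation divides by n (NumPy's default). Cohen's pooled standard deviation divides by n1 + n2 - 2 instead, so a textbook d would come out slightly smaller than the one here.

```python
import numpy as np

# Made-up scores for two equal-sized groups (illustrative only).
group_a = np.array([62.0, 55.0, 71.0, 66.0, 58.0, 64.0])
group_b = np.array([54.0, 49.0, 63.0, 57.0, 52.0, 60.0])

scores = np.concatenate([group_a, group_b])
# Model score: every participant is given the mean of their own group.
model_scores = np.concatenate([
    np.full(group_a.size, group_a.mean()),
    np.full(group_b.size, group_b.mean()),
])
# Residual: what is left of each score once the model score is removed.
residuals = scores - model_scores

# np.std divides by n by default, so sd(model_scores) is exactly half
# of the difference between the two group means.
f = np.std(model_scores) / np.std(residuals)
d_from_f = 2 * f
d_direct = (group_a.mean() - group_b.mean()) / np.std(residuals)
print(f, d_from_f, d_direct)
```

Here `d_from_f` and `d_direct` match, which is just d = 2 x f again.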

Cohen’s d and his f turn out to be a comparison of model scores and residuals. This diagram shows the idea:

Now we can move very easily on to the older r. This is given by comparing model scores with the DV scores themselves, which we can call the total scores:
r = sd(model_scores)/sd(total_scores)
As you can see, it is really quite similar. One final step finishes this.
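If that definition of r looks unfamiliar, a quick check against the ordinary Pearson correlation may help. This is a sketch with invented scores; coding group membership as 1 and 0 is my own choice for the illustration.

```python
import numpy as np

# Made-up scores for two equal-sized groups (illustrative only).
group_a = np.array([62.0, 55.0, 71.0, 66.0, 58.0, 64.0])
group_b = np.array([54.0, 49.0, 63.0, 57.0, 52.0, 60.0])

scores = np.concatenate([group_a, group_b])
model_scores = np.concatenate([
    np.full(group_a.size, group_a.mean()),
    np.full(group_b.size, group_b.mean()),
])

# r as the ratio of the two standard deviations.
r_from_sds = np.std(model_scores) / np.std(scores)

# r as the Pearson correlation between the scores and a 1/0 group code.
group_code = np.concatenate([np.ones(group_a.size), np.zeros(group_b.size)])
r_pearson = np.corrcoef(group_code, scores)[0, 1]
print(r_from_sds, r_pearson)
```

The two agree here because the group coded 1 has the higher mean. Swap the coding and the Pearson version changes sign, while the ratio of standard deviations is always positive.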

We can say that:
total_score = model_score + residual_score
and, since the model scores and residual scores are uncorrelated, their variances add, so that:
var(total_score) = var(model_score) + var(residual_score)
var(residual_score) = var(total_score) - var(model_score)
or:
sd(total_score) = sqrt(sd(model_score)^2 + sd(residual_score)^2)
sd(residual_score) = sqrt(sd(total_score)^2 - sd(model_score)^2)
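The adding-up of variances is easy to check numerically, and it does not need only two groups. Below is a small sketch with three made-up groups (the names and numbers are hypothetical), using variances that divide by n throughout:

```python
import numpy as np

# Three made-up groups, purely for illustration.
groups = {
    "non_smokers": np.array([54.0, 49.0, 63.0, 57.0, 52.0]),
    "smokers": np.array([62.0, 55.0, 71.0, 66.0, 58.0]),
    "vapers": np.array([59.0, 61.0, 50.0, 65.0, 56.0]),
}

total = np.concatenate(list(groups.values()))
# Model score: each participant's own group mean.
model = np.concatenate([np.full(g.size, g.mean()) for g in groups.values()])
residual = total - model

var_total = np.var(total)
var_model = np.var(model)
var_residual = np.var(residual)
# The covariance between model and residual scores is zero, because the
# residuals sum to zero inside every group. That is why the variances add.
cov_model_residual = np.mean((model - model.mean()) * (residual - residual.mean()))
print(var_total, var_model + var_residual, cov_model_residual)
```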
A bit of playing around with these quantities (for anyone who enjoys algebra), writing Mvar for var(model_score) and Rvar for var(residual_score), gets us to the point where we can say that:
r^2 = Mvar/(Mvar+Rvar)
f^2 = Mvar/Rvar
so
Rvar = Mvar/f^2
r^2 = Mvar/(Mvar+Mvar/f^2)
we can drop Mvar completely from this (it’s on the top and the bottom of the equation):
r^2 = 1/(1+1/f^2)
and simplified:
r^2 = f^2/(f^2+1)
then, remembering that f = d/2
r^2 = d^2/(d^2+4)
and lastly:
r = d/sqrt(d^2+4)
or:
d = 2r/sqrt(1-r^2)
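As a convenience, the last two formulae can be wrapped up as a pair of functions. This is only a sketch: the names `d_to_r` and `r_to_d` are my own, and the conversion assumes two equal-sized groups, as in the derivation above.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to r, assuming two equal-sized groups."""
    return d / math.sqrt(d ** 2 + 4)

def r_to_d(r):
    """Convert r to Cohen's d, assuming two equal-sized groups; needs -1 < r < 1."""
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(d_to_r(0.8), 2))   # 0.37
print(r_to_d(d_to_r(0.8)))     # back to 0.8, up to rounding error
```

So the "quite a big effect" of d=0.8 corresponds to an r of about 0.37.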
