The Campaign to Abolish Ordinal Variables

In stats classes, we are taught that there are 4 types of variable (and, mysteriously, given more than 4 names for them):

  • Categorical (aka Nominal): categories, groups or labels (eg dog, cat, goat, parrot)
  • Ordinal: ordered values (eg single cream, double cream)
  • Interval: numbers where doing addition and subtraction makes sense (eg age – in 5 years’ time my age will be what it is now plus 5)
  • Ratio: numbers where doing addition/subtraction and also multiplication/division make sense, so zero corresponds to nothing and negative numbers mean something different from positive numbers (eg my bank balance – the sign positive/negative tells you whether I am owed money by the bank or vice versa)

Then, having got that, ratio simply disappears, never to be seen or heard of again. This is a loose end in statistics teaching, and loose ends are highly undesirable.

The difference between ordinal and interval sounds like it matters: it is certainly the case that combining single and double cream doesn’t make triple cream. But…

The difference is a bogus fact, and bogus facts are highly undesirable. It is bogus in two ways, and so we can dispense with Ordinal, meaning we can work with just Categorical or Interval (labels or quantities). I hope you will agree that 2 is simpler than 4. It is admirably lazy. Admirably lazy is highly desirable.

Bogus because:

  1. Reason number 1: no interesting quantity is ever truly interval. Take age and think about these two things. First, my experience is that time has sped up as I have got older, so 5 years when I was a teenager (last century) felt much longer than 5 years does now that I am a pensioner. So teenager+5 isn’t the same interval in age as pensioner+5. Second, add 5 years to my age right now and my health status will be much reduced (eg COVID-19), but add 5 years to my age as a teenager and nothing has changed: suddenly, at my end of life, 5 years really matter.
  2. Reason number 2: in statistics, whether you treat some quantity as ordinal or interval doesn’t normally matter (although people will tell you otherwise…).

There are two more important things to be said. First, doing statistics with Interval variables is more precise and more general. Numbers matter. If you were given an exam grade as a number (69), that is more meaningful to you than being told that you came 19th in the class. Second, there is a much more interesting distinction lurking beneath the surface about what might count as a typical value of something: what is the typical value for an exam grade?
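To see what is lost when a grade is replaced by a class position, here is a minimal sketch in Python. The names and scores are invented for illustration (they are not real data); the point is only that ranks flatten every gap between neighbours to 1, however large or small the gap in marks was.

```python
# Hypothetical exam scores, invented purely for illustration.
scores = {"Ana": 92, "Ben": 69, "Cat": 68, "Dev": 40}

# Rank 1 = highest score (there are no ties in this made-up data).
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Gaps between neighbouring students, in score units and in rank units.
score_gaps = [scores[a] - scores[b] for a, b in zip(ordered, ordered[1:])]
rank_gaps = [ranks[b] - ranks[a] for a, b in zip(ordered, ordered[1:])]

print(ranks)       # position in class
print(score_gaps)  # how far apart neighbours are in marks
print(rank_gaps)   # how far apart neighbours are in rank
```

In this toy class, Ben and Cat are one mark apart while Ana is 23 marks ahead of Ben, yet the ranks treat both steps as identical.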

In practical terms a choice of ordinal or interval variables determines whether you do the statistics on medians (ordinal) or means (interval). The mean is an average, the median is a middle value. This difference is interesting because the mean and the median are rarely the same. If I join a group of young people, then the median age of the group is probably not changed, but my extreme age means that the mean age will go up. So median age and mean age have different meanings. That’s an interesting choice to make.
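The age example can be checked with a few lines of Python. This is only a sketch: the ages of the young group and of the newcomer are made-up numbers, chosen so that the effect is easy to see.

```python
from statistics import mean, median

# Hypothetical ages of a group of young people (invented for illustration).
young_group = [18, 19, 20, 20, 21, 22]

# A much older person joins the group (also an invented age).
newcomer_age = 70
with_newcomer = young_group + [newcomer_age]

print("before:", median(young_group), mean(young_group))
print("after: ", median(with_newcomer), mean(with_newcomer))
```

With these numbers the median stays at 20 after the newcomer joins, while the mean rises from 20 to a little over 27. One extreme value is enough to pull the mean, but it barely touches the middle value.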

 
