Wulf's Webden

The Webden on WordPress

Long-tailed data


A common pattern in data is the long tail. In other words, a few things come up a lot and many more things come up infrequently. For example, if you asked people to name their favourite colour, you’d expect the top choices to be things like red, green and blue (probably also black, white and grey – even if you don’t count those as colours, I wouldn’t bet against them being relatively popular choices). You’d then start to get a few instances of colours like orange and, in a big enough sample, you’d probably have a smattering of additional options like puce (a dark, reddish brown, if you weren’t sure).

I’m pondering this because I maintain a list of songs sung at St Clement’s and I’ve just entered my set from Sunday morning. After nearly three years, we’re finally beginning to reach the point where, once in a while, we have a set like that one, where every song had been used before and I didn’t have to add any new entries to the list. Meanwhile, the data displays that same long-tail form: a few songs have seen heavy rotation (in the last year, “Reckless Love (Before I Spoke a Word)” wins out with 16 hits), while about 100 have been used only once apiece during the same period.

Even within my ‘nothing brand new here’ set, there were a couple of songs that were being used for the first time in more than a year, and the long-tail pattern was discernible, if not as pronounced.
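For anyone who wants to try the same kind of tally, here is a minimal sketch in Python. It is not how my own list is kept: the set lists and song names below are made up for illustration, and the function names `song_usage`, `long_tail_summary` and `new_songs` are hypothetical helpers I have invented for the example.

```python
from collections import Counter


def song_usage(setlists):
    """Count how many times each song appears across all set lists."""
    return Counter(song for setlist in setlists for song in setlist)


def long_tail_summary(counts):
    """Return the most-used song, its count, and the songs used exactly once."""
    top_song, top_count = counts.most_common(1)[0]
    singles = sorted(song for song, n in counts.items() if n == 1)
    return top_song, top_count, singles


def new_songs(previous_setlists, current_set):
    """List the songs in current_set that never appeared in earlier set lists."""
    seen = {song for setlist in previous_setlists for song in setlist}
    return [song for song in current_set if song not in seen]


# Made-up set lists, purely for illustration (not real data)
setlists = [
    ["Song A", "Song B", "Song C"],
    ["Song A", "Song D"],
    ["Song A", "Song B", "Song E"],
]

counts = song_usage(setlists)
top_song, top_count, singles = long_tail_summary(counts)
print(top_song, top_count)  # the head of the distribution
print(singles)              # the long tail: songs used only once

# A set with nothing brand new in it gives an empty list
print(new_songs(setlists, ["Song A", "Song E"]))
```

With a real list, the head would be the handful of heavily used songs and `singles` would be the long tail of songs used only once in the period.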
