Data Cube – Data Commons Scotland

Working on his human geography homework, Rory asks…

Which areas in Scotland are reducing their household waste?

This week, in a step towards supporting the above scenario, I investigated how we might generate choropleths to help us visualise the variations in the amounts of household-generated waste across geographic areas in Scotland.

The cube-to-chart executable notebook steps through the nitty-gritty of this experiment. The steps include:

1. Running a SPARQL query against statistics.gov.scot’s very useful data cubes to find the waste tonnage generated per council citizen per year.
2. Deriving 3 values for each council area:
  - recent – 2018’s tonnage of waste generated per council citizen.
  - average – 2011-2018’s average (mean) tonnage of waste generated per council citizen.
  - trend – 2011-2018’s trend in tonnage of waste generated per council citizen. Each trend value is calculated as the gradient of a linear approximation to the tonnage over the years. (A statistician might well suggest a more appropriate method for computing this trend value.)
  The derived data can be seen in this file.
3. Using Vega to generate 3 choropleths that help visualise the statistical values from the above step against the council-oriented geography of Scotland. (The geography data comes from Martin Chorley’s good curation work.)
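To make step 2 concrete, here is a minimal sketch of how the three values could be derived for one council area. It assumes the "linear approximation" is an ordinary least-squares straight-line fit, which is an assumption rather than something stated above. The tonnage figures and the function names (`least_squares_gradient`, `derive_values`) are hypothetical, for illustration only.

```python
from statistics import mean

# Hypothetical figures (tonnes per citizen per year) for a single council area.
# They are illustrative only and are not taken from statistics.gov.scot.
tonnage_by_year = {
    2011: 0.50, 2012: 0.49, 2013: 0.47, 2014: 0.46,
    2015: 0.45, 2016: 0.43, 2017: 0.42, 2018: 0.40,
}


def least_squares_gradient(points):
    """Slope of the least-squares straight line through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def derive_values(series):
    """Derive the recent, average and trend values from a {year: tonnage} series."""
    latest_year = max(series)
    return {
        "recent": series[latest_year],                 # latest year's tonnage
        "average": mean(series.values()),              # mean over all years
        "trend": least_squares_gradient(sorted(series.items())),
    }


values = derive_values(tonnage_by_year)
print(values)
```

A negative `trend` means the per-citizen tonnage is falling over the period, and a positive one means it is rising. The unit of the gradient is tonnes per citizen per year, per year.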

The resulting choropleths can be seen on this page.
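For readers curious about what a choropleth specification involves, the sketch below builds one as a Python dictionary and prints it as JSON. Note that it uses Vega-Lite (the higher-level grammar that compiles to Vega) rather than a full Vega spec, so it is not the notebook's actual specification. The geography URL, the TopoJSON feature name, the `properties.name` join key and the data values are all hypothetical placeholders.

```python
import json


def choropleth_spec(values_by_area, field, title, geo_url, topo_feature):
    """Build a Vega-Lite choropleth spec that colours each area by `field`."""
    rows = [{"area": area, field: value} for area, value in values_by_area.items()]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "width": 400,
        "height": 500,
        # Boundaries are loaded from a TopoJSON file (placeholder URL below).
        "data": {"url": geo_url, "format": {"type": "topojson", "feature": topo_feature}},
        # Join each boundary to its statistical value by area name.
        # Assumes each feature carries the council name in properties.name.
        "transform": [{
            "lookup": "properties.name",
            "from": {"data": {"values": rows}, "key": "area", "fields": [field]},
        }],
        "projection": {"type": "mercator"},
        "mark": {"type": "geoshape", "stroke": "white"},
        "encoding": {"color": {"field": field, "type": "quantitative"}},
    }


# Hypothetical trend values (tonnes per citizen per year, per year).
trend_by_area = {"Stirling": -0.012, "Fife": -0.008, "Highland": 0.003}

spec = choropleth_spec(
    trend_by_area,
    field="trend",
    title="2011-2018 trend in tonnage",
    geo_url="https://example.org/scotland-council-areas.topo.json",  # placeholder
    topo_feature="council_areas",                                    # placeholder
)
print(json.dumps(spec, indent=2))
```

The same function could be called three times, once per derived value, to produce the three choropleths.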

Rory looks at the “2011-2018 trend in tonnage” choropleth, and thinks…

It’s good to see that most areas are reducing waste generation but why not all…?

Looking at the “2018 tonnage” and “2011-2018 average tonnage” choropleths, Niamh wonders…

I wonder why urban populations seem to generate less waste than rural ones?

Kudos to Stirling Council for being the only Scottish local authority to have published household waste collection data as open data. This data is contained in their waste-management dataset. It consists of:

- Core data: per-year CSV files.
- Metadata that includes a basic schema for the CSV files, maintenance information and a descriptive narrative.

For that, Stirling Council have attained 3 stars on this openness measure.

To reach 5 stars, that data would have to be turned into linked open data, i.e. gain the following:

- URIs denoting things. E.g. have a URI for each waste type, each collection route and each measurement.
- Links to other data to provide context. E.g. reference commonly accepted identifiers/URIs for dates, waste types and route geographies.

This week I investigated aspects of what would be involved in gaining those extra two stars.

This executable notebook steps through the nitty-gritty of doing that. The steps include:

1. Mapping the data into the vocabulary for the statistical data cube structure – as defined by the W3C and used by the Scottish government’s statistics office.
2. Mapping the date values to the date-time related vocabulary – as defined by the UK government.
3. Defining placeholder vocabularies for waste types and collection routes. Future work would be to map waste types to (possibly “rolled-up”) values in a SEPA-defined vocabulary, and to map collection routes to a suitable geographic vocabulary.
4. Converting the CSV source data into RDF data in accordance with the above mappings. This results in a set of .ttl (RDF Turtle syntax) files.
5. Loading the .ttl files into a triplestore database so that their linked data graph can be queried easily.
6. Running a few SPARQL queries against the triplestore to sanity-check the linked data graph.
7. Creating an example infographic (showing the downward trend in missing bins) from the linked data graph.
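As a rough illustration of the CSV-to-Turtle conversion, the sketch below turns each CSV row into a data cube observation. The CSV columns, the `http://example.org/waste/` namespace and the `ex:` properties are hypothetical placeholders, not Stirling Council’s actual schema. The `qb:` and `sdmxd:` prefixes are the W3C Data Cube and SDMX dimension vocabularies. The year URIs assume that the UK government date-time vocabulary means the reference.data.gov.uk interval identifiers.

```python
import csv
import io

BASE = "http://example.org/waste/"  # hypothetical placeholder namespace

# Hypothetical per-year CSV content; the real files may use other columns.
CSV_TEXT = """year,route,waste_type,tonnes
2018,route-1,general-waste,12.5
2018,route-1,food-waste,3.2
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmxd: <http://purl.org/linked-data/sdmx/2009/dimension#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    f"@prefix ex: <{BASE}def/> .\n"
)


def observation_to_turtle(row):
    """Serialise one CSV row as a qb:Observation in Turtle syntax."""
    year, route = row["year"], row["route"]
    waste_type, tonnes = row["waste_type"], row["tonnes"]
    # Each measurement gets its own URI, built from its dimension values.
    subject = f"<{BASE}observation/{year}/{route}/{waste_type}>"
    return (
        f"{subject} a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset/waste-management> ;\n"
        # Link the date to a commonly used external identifier for the year.
        f"    sdmxd:refPeriod <http://reference.data.gov.uk/id/year/{year}> ;\n"
        # Placeholder vocabularies for route and waste type.
        f"    ex:route <{BASE}route/{route}> ;\n"
        f"    ex:wasteType <{BASE}waste-type/{waste_type}> ;\n"
        f'    ex:tonnes "{tonnes}"^^xsd:decimal .\n'
    )


def csv_to_turtle(text):
    """Convert CSV text into a Turtle document of observations."""
    reader = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(observation_to_turtle(r) for r in reader)


turtle = csv_to_turtle(CSV_TEXT)
print(turtle)
```

The printed text could be saved as a .ttl file and loaded into a triplestore. A fuller conversion would also describe the dataset’s structure (its dimensions and measures), which is omitted here for brevity.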

Conclusions

- It took a not insignificant amount of consideration to convert the 3-star non-linked data to (almost) 5-star linked data. But I expect that the effort involved will tail off if we similarly convert further datasets, because of the experience and knowledge gained along the way.
- Having a linked data version of the waste-management dataset promises to make its information more explicit and more composable. But for the benefits to be fully realised, more cross-linking needs to be carried out. In particular, we need to map waste types to a common (say, SEPA-controlled) vocabulary, and map collection routes to a common geographic vocabulary.
- We might imagine that if such a linked dataset were to be published and maintained, with other local authorities contributing data into it, then SEPA would be able to harvest its information directly and continuously, making periodic report preparation unnecessary.
- JimT and I have discussed how the Open Data Phase2 project might push for the publication of linked open data about waste, using common vocabularies, and how our Data Commons Project could aim to fuel its user interface using that linked open data. In other words, the linked open data layer is where the two projects meet.

Tag: Data Cube

The geography of household waste generation

Stirling Council’s waste-management dataset as linked open data
