Building linked open data about carbon savings

linked open data for carbon savings

We have written a research report which walks through how we might build linked open data (LoD) about carbon savings from dissimilar data sources.

It outlines (using small samples from the datasets) how the data pipeline that feeds our prototype-6 webapp works.

Building LoD about carbon savings - research report - coversheet

“What are my neighbours putting into their bins?!”

What do households put into their bins, and how appropriate are their disposal decisions?

To help provide an answer to that question, Zero Waste Scotland (ZWS) occasionally asks each of the 32 Scottish councils to sample their bin collections and to analyse their content. This compositional analysis uncovers the types and weights of the disposed of materials, and assesses the appropriateness of the disposal decisions (i.e. was it put into the right bin?).

Laudably, ZWS is considering publishing this data as open data. Click on the image below to see a web page that is based on an anonymised subset of this data.

household waste analysis

The Data Lab MSc data challenge event 2021

With Glasgow City hosting the UN Climate Change conference (COP26) later this year, it was appropriate that this year’s The Data Lab data analysis hackathon (held last week) had the theme “pollution reduction”.

Three organisations provided challenge projects for the hackathon teams: we provided a “waste management” project based on our easier-to-use datasets; Code the City provided an “air quality” project; and Scottish Power an “electric vehicle charging” project.

The hackathon was led by a young Scottish tech start-up company called Filament. They have an interesting product that is basically a shareable, cloud-hosted Jupyter Notebook.

Each day a new cohort of teams would tackle the project challenges. We helped by answering their questions about our datasets, and by suggesting ideas for investigation.
At the end of each day the teams presented their findings.

It was informative to see how the teams (each with a mix of skills that included programming, data analysis and business acumen) organised themselves for group working, handled the data, and applied learned analysis techniques.

The teams had a relatively short amount of time to work on their projects, so having easy-to-use datasets was a deciding factor in how much they could achieve. Therefore one take-away is clear, and helps substantiate an aim of our DCS project… open data needs to be easy to use, not just accessible. Making data easier to use for non-experts opens it to a much wider audience and to much more creativity.

The prototype’s architecture – revised

“Trialling Wikibase for our data layer” described how we evaluated the use of Wikibase as a key implementation component in our bi-layer architecture. The conclusion was that Wikibase, although a brilliant product, does not fit our immediate purpose.

In our revised architecture…​

Wikibase is replaced with dcs-easier-open-data: a simple set of data files (CSV and JSON) hosted in a public repository (GitHub). These data files are generated by the Waste Data Tool (dcs-wdt). Together, dcs-easier-open-data and dcs-wdt implement the architecture’s data layer.

In the architecture’s revised presentation layer, the webapp reads (CSV/JSON formatted) data from the dcs-easier-open-data repository, instead of reading (via SPARQL) data from the Wikibase.
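As a rough illustration of this simpler data access, the sketch below parses CSV and JSON text of the kind held in dcs-easier-open-data. It is written in Python for brevity rather than in the webapp's own language. The sample rows are made up, and the column names are an assumption based on the dimensions of the household-waste dataset.

```python
import csv
import io
import json

# Hypothetical sample standing in for a CSV file from the data repository.
# Column names are assumed from the household-waste dimensions; values are made up.
SAMPLE_CSV = """region,year,material,management,tonnes
Falkirk,2019,Glass wastes,Recycled,100.5
Falkirk,2019,Glass wastes,Landfilled,20.0
"""


def parse_csv(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])
        row["tonnes"] = float(row["tonnes"])
        rows.append(row)
    return rows


def parse_json(text):
    """Parse the JSON form of a dataset (assumed here to be a list of records)."""
    return json.loads(text)


records = parse_csv(SAMPLE_CSV)
```

In a real client the text would be fetched over HTTP from the public repository instead of being held in a string; no query language or server-side component is involved.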

The prototype’s bi-layered architecture - revised

Stirling’s bin collection data – revisited

Stirling Council set a precedent by being the first (and still only) Scottish local authority to have published open data about their bin collection of household waste.

The council are currently working on increasing the fidelity of this dataset, e.g. by adding spatial data to describe collection routes. However, we can still squeeze several interesting pieces of information from its current version. For details, visit the Stirling bin collection page on our website mockup.

“How is waste in my area?” – a regional dashboard

Introduction

Our aim in this piece of work is:

to surface facts of interest (maximums, minimums, trends, etc.) about waste in an area, to non-experts.

Towards that aim, we have built a prototype regional dashboard which is directly powered by our ‘easier datasets’ about waste.

The prototype is a webapp and it can be accessed here.

our prototype regional dashboard

Curiosities

Even this early prototype manages to surface some curiosities [1] …​

Inverclyde

Inverclyde is doing well.

Inverclyde’s household waste positions Inverclyde’s household waste generation Inverclyde’s household waste CO2e

In the latest data (2019), it generates the fewest tonnes of household waste (per citizen) of any of the council areas. And its same 1st position for CO2e indicates the close relation between the amount of waste generated and its carbon impact.

…​But why is Inverclyde doing so well?

Highland

Highland isn’t doing so well.

Highland’s household waste positions Highland’s household waste generation Highland’s household waste % recycled

In the latest data (2019), it generates the most (except for Argyll & Bute) tonnes of household waste (per citizen) of any of the council areas. And it has the worst trend for percentage recycled.

…​Why has Highland’s percentage recycled been getting worse since 2014?

Fife

Fife has the best trend for household waste generation. That said, it has still been generating an above-average amount of waste per citizen.
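One plausible way to rank regions by "trend" is to fit a straight line to each region's yearly values and compare the slopes. The sketch below does that with an ordinary least-squares slope. It is an assumption about how such a ranking could be computed, not a description of the dashboard's actual method; the region names and figures are made up.

```python
def trend_slope(points):
    """Return the least-squares slope of a list of (year, value) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Made-up tonnes of household waste per citizen, by year.
series = {
    "Region A": [(2017, 0.50), (2018, 0.48), (2019, 0.46)],
    "Region B": [(2017, 0.40), (2018, 0.42), (2019, 0.44)],
}

slopes = {region: trend_slope(points) for region, points in series.items()}

# For waste generation, the most negative slope is the "best" trend.
best_region = min(slopes, key=slopes.get)
```

Note that a region can have the best trend while still sitting above the average level, as in the made-up Region A here, because the slope says nothing about the starting point.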

Fife’s household waste positions Fife’s household waste generation

The graphs for Fife business waste show that there was an acute reduction in combustion wastes in 2016.

Fife’s business waste

We investigated this anomaly before and discovered that it was caused by the closure of Fife’s coal fired power station (Longannet) on 24th March 2016.

Angus

In the latest two years of data (2018 & 2019), Angus has noticeably reduced the amount of household waste that it landfills.

Angus' household waste management

During the same period, Angus has increased the amount of household waste that it processes as ‘other diversion’.

…​What underlies that difference in Angus’ waste processing?

Technologies

This prototype is built as a ‘static’ website with all content-dynamics occurring in the browser. This makes it simple and cheap to host, but results in heavier, more complex web pages.

  • The clickable map is implemented on Leaflet – with OpenStreetMap map tiles.
  • The charts are constructed using Vega-Lite.
  • The content-dynamics are coded in ClojureScript – with Hiccup for HTML, and Reagent for events.
  • The website is hosted on GitHub.

Ideas for evolving this prototype

  1. Provide more qualitative information. This version is quite quantitative because, well, that is the nature of the datasets that currently underlie it. So there’s a danger of straying into the “management by KPI” approach when we should be supporting the “management by understanding” approach.
  2. Include more localised information, e.g. about an area’s re-use shops, or bin collection statistics.
  3. Support deeper dives, e.g. so that users can click on a CO2e trend to navigate to a choropleth map for CO2e.
  4. Allow users to download any of the displayed charts as (CSV) data or as (PNG) images.
  5. Enhance the support of comparisons by allowing users to multi-select regions and overlay their charts.
  6. Allow users to choose from a menu, what chart/data tiles to place on the page.
  7. Provide a what-if? tool. “What if every region reduced by 10% their landfilling of waste material xyz?” – where the tool has a good enough waste model to enable it to compute what-if? outcomes.

1. One of the original sources of data has been off-line due to a cyberattack so, at the time of writing, it has not been possible to double-check all figures from our prototype against original sources.

A mock-up website for functionality & navigation

Introduction

A prototype website will be one of the outcomes of this research project. The website should help non-experts discover, learn about and understand the open data about waste in Scotland.

To date, we have built a couple of mock-ups [1]:

  1. functionality & navigation mock-up for exploring ideas about functionality and navigation for our eventual website.
  2. look’n’feel mock-up for exploring looks/visual aesthetics.

This document concentrates on the functionality & navigation mock-up…​

The splash page of the functionality & navigation mock-up

Functionality

This mock-up ties together a lot of the elements we’ve been working on:

  • Data: Direct access to download the underlying datasets. A simple, consistent set of CSV and JSON files.
  • Maps: Interactive, on-map depictions of the information from the datasets.
  • Data grids with graphs: A tool for slicing’n’dicing the datasets and visualising the result as a graph. To make this easier, this tool will provide useful slicing’n’dicing presets: starting points from which users can explore.
  • SPARQL: A query interface to a semantic web representation of the datasets. This is unlikely to be of use to our target audience, so we’ll probably remove it from the UI but may use its semantic graph internally.
  • Articles: Themed articles and tutorials that are based on evidence from the datasets. Uses Asciidoc mark-up to make the articles easy to format. The articles may incorporate data visualisations that are backed by our datasets.

Navigation

The mock-up provides 3 routes to information:

  • Themes: The clickable blocks on the splash page allow users to explore a waste theme by taking them to a specific set of articles and tutorials.
  • Navbar: The menu bar at the top of each page provides an orthogonal, more ‘functional’ classification of the website’s contents.
  • Search: At present, this is a very basic text & tag search. In the future, a predictive/auto-suggestion search, based on a semantic graph of the contents, will be provided.

Users’ navigation histories may help power a further-reading recommender subsystem.

Architecture

Building this mock-up has required some architectural decisions that may help inform the design of our eventual website.

Static website: The mock-up has been implemented as a so-called ‘static website’. This means that page content is not dynamically generated by (or saved to) the server-side. The server-side simply serves ‘static content files’.

  • Pros:
    • Implementation-wise, it is an order of magnitude simpler and more scalable than a ‘dynamic’ website.
    • There are several good, free, open source ‘static website generators/frameworks’.
    • Static websites can be served for free on hosting platforms such as GitHub (as used for this mock-up).
  • Cons:
    • It can’t support a whole class of functionality, including user uploads and on-line content editing.
    • Computation is forced towards the client-side (i.e. into users’ web browsers), which can sometimes have a negative impact on the speed of the UI.

Off-line updates: The content of the website can be updated – just not updated on-line. The website maintainers can add new/edit existing datasets, articles, etc. via off-line means. For off-line updates to this mock-up we use: (i) WDT – a rough’n’ready software script that helps us to curate the datasets that underlie this mock-up; (ii) Cryogen – a static website generator; (iii) Git – to upload updates to our GitHub hosting service.

Client-side computation: Page content is dynamically manipulated (e.g. datasets are sliced’n’diced) on the client-side (in users’ web browsers) using JavaScript. This enables, for example, the mock-up’s web pages to take the static content that is served by the server-side, and manipulate it so that it can support interactive data visualisations. Progress in client-side technology even makes it possible to implement a triple store, supporting a semantic graph, in a web browser!

Conclusion

This mock-up website…​

  • provides a concrete test-bed for evolving the functionality & navigation aspects of our eventual website, and
  • forces us to think about architectural trade-offs.

1. We use the term “mock-up” to mean an incomplete representation/model – useful for demonstration, design evaluation and acquiring user feedback.

‘Easier’ open data about waste in Scotland

Objective

Several organisations are doing a very good job of curating & publishing open data about waste in Scotland, but the published data is not always “easy to use” for non-experts. We have seen several references to this at open data conference events and on social media platforms:

Whilst statisticians/coders may think that it is reasonably simple to knead together these somewhat diverse datasets into a coherent knowledge, the interested layman doesn’t find it so easy.

One of the objectives of the Data Commons Scotland project is to address the “ease of use” issue over open data. The contents of this repository are the result of us re-working some of the existing source open data so that it is easier to use, understand, consume and parse, and is all in one place. It may not be as detailed as the source data or have all of its nuances – but it aims to be better for the purposes of making the information accessible to non-experts.

We have processed the source data just enough to:

  • provide value-based cross-referencing between datasets
  • add a few fields whose values are generally useful but not easily derivable by a simple calculation (such as latitude & longitude)
  • make it available as simple CSV and JSON files in a Git repository.

We have not augmented the data with derived values that can be simply calculated, such as per-population amounts, averages, trends, totals, etc.
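To illustrate what "simply calculated" means here, the sketch below derives tonnes per citizen by joining household waste records to population records on region and year. The field names are assumed from the datasets' dimensions; the regions and figures are made up.

```python
# Made-up records; field names are assumed from the datasets' dimensions.
household_waste = [
    {"region": "Region A", "year": 2019, "tonnes": 30000.0},
    {"region": "Region A", "year": 2019, "tonnes": 10000.0},
    {"region": "Region B", "year": 2019, "tonnes": 45000.0},
]
population = [
    {"region": "Region A", "year": 2019, "population": 100000},
    {"region": "Region B", "year": 2019, "population": 90000},
]


def tonnes_per_citizen(waste_rows, population_rows):
    """Sum tonnes per (region, year), then divide by the matching population."""
    totals = {}
    for row in waste_rows:
        key = (row["region"], row["year"])
        totals[key] = totals.get(key, 0.0) + row["tonnes"]
    people = {(r["region"], r["year"]): r["population"] for r in population_rows}
    # Keys with no matching population record are skipped.
    return {key: total / people[key] for key, total in totals.items() if key in people}


per_citizen = tonnes_per_citizen(household_waste, population)

# Regions ordered from fewest to most tonnes per citizen.
ranking = sorted(per_citizen, key=per_citizen.get)
```

The join only works this directly because both datasets label regions and years in the same way.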

The 10 easier datasets

The columns name, description, file and number of records describe the dataset (generated February 2021); the columns creator, supplier and licence describe the source data (sourced January 2021).

| name | description | file | number of records | creator | supplier | licence |
| household-waste | The categorised quantities of the (‘managed’) waste generated by households. | CSV JSON | 19008 | SEPA | statistics.gov.scot URL | OGL v3.0 |
| household-co2e | The carbon impact of the waste generated by households. | CSV JSON | 288 | SEPA | SEPA URL | OGL v2.0 |
| business-waste-by-region | The categorised quantities of the waste generated by industry & commerce. | CSV JSON | 8976 | SEPA | SEPA URL | OGL v2.0 |
| business-waste-by-sector | The categorised quantities of the waste generated by industry & commerce. | CSV JSON | 2640 | SEPA | SEPA URL | OGL v2.0 |
| waste-site | The locations, services & capacities of waste sites. | CSV JSON | 1254 | SEPA | SEPA URL | OGL v2.0 |
| waste-site-io | The categorised quantities of waste going in and out of waste sites. | CSV | 2667914 | SEPA | SEPA URL | OGL v2.0 |
| material-coding | A mapping between the EWC codes and SEPA’s materials classification (as used in these datasets). | CSV JSON | 557 | SEPA | SEPA URL | OGL v2.0 |
| ewc-coding | EWC (European Waste Classification) codes and descriptions. | CSV JSON | 973 | European Commission of the EU | Publications Office of the EU URL | CC BY 4.0 |
| households | Occupied residential dwelling counts. Useful for calculating per-household amounts. | CSV JSON | 288 | NRS | statistics.gov.scot URL | OGL v3.0 |
| population | People counts. Useful for calculating per-citizen amounts. | CSV JSON | 288 | NRS | statistics.gov.scot URL | OGL v3.0 |

(The fuller, CSV version of the table above.)

The dimensions of the easier datasets

One of the things that makes these datasets easier to use is that they use consistent dimension values/controlled code-lists. This makes it easier to join/link datasets.

So we have tried to rectify the inconsistencies that occur in the source data (in particular, the inconsistent labelling of waste materials and regions). However, this is still “work-in-progress” and we have yet to tease out & make consistent further useful dimensions.
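A minimal sketch of one way to make region labels consistent is shown below: each incoming label is cleaned up and then looked up against a controlled list, with known variants mapped to their canonical form. The alias entries are hypothetical examples of the kind of variation that might occur, not labels taken from the source data, and the controlled list is only a small subset.

```python
# A small subset of council area names, as they appear in the easier datasets.
CANONICAL_REGIONS = {
    "Falkirk",
    "Aberdeen City",
    "North Lanarkshire",
    "West Dunbartonshire",
}

# Hypothetical variant labels (lower-cased) mapped to their canonical form.
ALIASES = {
    "aberdeen, city of": "Aberdeen City",
    "n. lanarkshire": "North Lanarkshire",
}


def canonical_region(label):
    """Return the canonical region name for a label, or raise ValueError."""
    cleaned = " ".join(label.split())  # trim and collapse runs of whitespace
    if cleaned in CANONICAL_REGIONS:
        return cleaned
    alias = ALIASES.get(cleaned.lower())
    if alias is None:
        raise ValueError(f"unrecognised region label: {label!r}")
    return alias
```

Raising an error for unrecognised labels, instead of passing them through, means a new variant in the source data is noticed when the datasets are regenerated.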

| dimension | description | dataset | example value of dimension | count of values of dimension | min value of dimension | max value of dimension |
| region | The name of a council area. | household-waste | Falkirk | 32 | | |
| | | household-co2e | Aberdeen City | 32 | | |
| | | business-waste-by-region | Falkirk | 34 | | |
| | | waste-site | North Lanarkshire | 32 | | |
| | | households | West Dunbartonshire | 32 | | |
| | | population | West Dunbartonshire | 32 | | |
| business-sector | The label representing the business/economic sector. | business-waste-by-sector | Manufacture of food and beverage products | 10 | | |
| year | The integer representation of a year. | household-waste | 2011 | 9 | 2011 | 2019 |
| | | household-co2e | 2013 | 9 | 2011 | 2019 |
| | | business-waste-by-region | 2011 | 8 | 2011 | 2018 |
| | | business-waste-by-sector | 2011 | 8 | 2011 | 2018 |
| | | waste-site | 2019 | 1 | 2019 | 2019 |
| | | waste-site-io | 2013 | 14 | 2007 | 2020 |
| | | households | 2011 | 9 | 2011 | 2019 |
| | | population | 2013 | 9 | 2011 | 2019 |
| quarter | The integer representation of the year’s quarter. | waste-site-io | 4 | 4 | | |
| site-name | The name of the waste site. | waste-site | Bellshill H/care Waste Treatment & Transfer | 1246 | | |
| permit | The waste site operator’s official permit or licence. | waste-site | PPC/A/1180708 | 1254 | | |
| | | waste-site-io | PPC/A/1000060 | 1401 | | |
| status | The label indicating the open/closed status of the waste site in the record’s timeframe. | waste-site | Not applicable | 4 | | |
| latitude | The signed decimal representing a latitude. | waste-site | 55.824871489601804 | 1227 | | |
| longitude | The signed decimal representing a longitude. | waste-site | -4.035165962797409 | 1227 | | |
| io-direction | The label indicating the direction of travel of the waste from the PoV of a waste site. | waste-site-io | in | 2 | | |
| material | The name of a waste material in SEPA’s classification. | household-waste | Animal and mixed food waste | 22 | | |
| | | business-waste-by-region | Spent solvents | 33 | | |
| | | business-waste-by-sector | Spent solvents | 33 | | |
| | | material-coding | Acid, alkaline or saline wastes | 34 | | |
| management | The label indicating how the waste was managed/processed (i.e. what its end-state was). | household-waste | Other Diversion | 3 | | |
| ewc-code | The code from the European Waste Classification hierarchy. | waste-site-io | 00 00 00 | 787 | | |
| | | material-coding | 11 01 06* | 557 | | |
| | | ewc-coding | 01 | 973 | | |
| ewc-description | The description from the European Waste Classification hierarchy. | ewc-coding | WASTES RESULTING FROM EXPLORATION, MINING, QUARRYING, AND PHYSICAL AND CHEMICAL TREATMENT OF MINERALS | 774 | | |
| operator | The name of the waste site operator. | waste-site | TRADEBE UK | 753 | | |
| activities | The waste processing activities supported by the waste site. | waste-site | Other treatment | 50 | | |
| accepts | The kinds of clients/wastes accepted by the waste site. | waste-site | Other special | 42 | | |
| population | The population count as an integer. | population | 89800 | | 21420 | 633120 |
| households | The households count as an integer. | households | 42962 | | 9424 | 307161 |
| tonnes | The waste related quantity as a decimal. | household-waste | 0 | | 0 | 183691 |
| | | household-co2e | 251386.54 | | 24768.53 | 762399.92 |
| | | business-waste-by-region | 753 | | 0 | 486432 |
| | | business-waste-by-sector | 54 | | 0 | 1039179 |
| | | waste-site-io | 0 | | -8.56 | 2325652.83 |
| tonnes-input | The quantity of incoming waste as a decimal. | waste-site | 154.55 | | 0 | 1476044 |
| tonnes-treated-recovered | The quantity of waste treated or recovered as a decimal. | waste-site | 133.04 | | 0 | 1476044 |
| tonnes-output | The quantity of outgoing waste as a decimal. | waste-site | 152.8 | | 0 | 235354.51 |

(The CSV version of the table above.)

Waste sites and the quantities of incoming materials

The dataset

SEPA publish a “Site returns” dataset (accessible via their Waste sites and capacity tool) that says…​

  • how many tonnes
  • of each (EWC coded) waste material
  • was moved in or out
  • of each authorised waste site in Scotland.

Here is an extract…​

SEPA Site returns sample

This is impressive, ongoing data collection and curation by SEPA.

But might some of its information be made more understandable to the general public by depicting it on a map?

Towards answering that, we built a prototype webapp. (For speed of development, we considered only the materials incoming to waste sites during the year 2019.)
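The restriction to incoming materials for 2019 amounts to a filter followed by a sum. The sketch below shows one way to do it, assuming records with the fields permit, year, quarter, io-direction, ewc-code and tonnes. The permit codes are borrowed from elsewhere in our datasets purely as realistic-looking identifiers, "out" is an assumed label for the outgoing direction, and the quantities are made up.

```python
from collections import defaultdict

# Made-up site return records; the field names are assumptions.
site_io = [
    {"permit": "PPC/A/1000060", "year": 2019, "quarter": 1,
     "io-direction": "in", "ewc-code": "20 03 01", "tonnes": 120.0},
    {"permit": "PPC/A/1000060", "year": 2019, "quarter": 2,
     "io-direction": "in", "ewc-code": "20 03 01", "tonnes": 80.0},
    {"permit": "PPC/A/1000060", "year": 2019, "quarter": 2,
     "io-direction": "out", "ewc-code": "20 03 01", "tonnes": 50.0},
    {"permit": "PPC/A/1180708", "year": 2018, "quarter": 4,
     "io-direction": "in", "ewc-code": "20 03 01", "tonnes": 10.0},
]


def incoming_tonnes(rows, year):
    """Sum the incoming tonnes per (permit, ewc-code) for one year."""
    totals = defaultdict(float)
    for row in rows:
        if row["year"] == year and row["io-direction"] == "in":
            totals[(row["permit"], row["ewc-code"])] += row["tonnes"]
    return dict(totals)


incoming_2019 = incoming_tonnes(site_io, 2019)
```

Summing over the quarters gives one yearly figure per waste site and material, which is the granularity a per-site pie chart needs.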

Data mapping

To aid comprehension, SEPA often sorts waste materials into 33 categories. We do the same in our prototype, mapping each EWC coded waste material into 1 of the 33 categories…​
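The mapping step can be pictured as a lookup from EWC code to category, followed by a sum per category. In the sketch below the two code-to-category pairs are illustrative assumptions, not rows copied from the material-coding dataset, and the tonnages are made up. Codes with no entry in the lookup are collected under a fallback label so that they are not silently dropped.

```python
# Illustrative code-to-category pairs (assumed, not copied from the dataset).
MATERIAL_CODING = {
    "11 01 06*": "Acid, alkaline or saline wastes",
    "14 06 03*": "Spent solvents",
}


def tonnes_by_material(rows, coding, fallback="Unmapped"):
    """Sum tonnes per material category for (ewc_code, tonnes) pairs."""
    totals = {}
    for ewc_code, tonnes in rows:
        material = coding.get(ewc_code, fallback)
        totals[material] = totals.get(material, 0.0) + tonnes
    return totals


# Made-up incoming quantities for one waste site.
incoming = [
    ("11 01 06*", 10.0),
    ("14 06 03*", 5.0),
    ("11 01 06*", 2.5),
    ("00 00 00", 1.0),
]

totals = tonnes_by_material(incoming, MATERIAL_CODING)
```

Each entry of the result corresponds to one segment of a waste site's pie chart.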

33 materials, categorised

The “Site returns” dataset identifies waste sites by their Permit/Licence code. We want our prototype to show additional information about each waste site. Specifically, its name, council area, waste processing activities, client types, and location – very important for our prototype’s map-based display!

SEPA holds that additional information about waste sites, in a 2nd dataset: “Waste sites and capacity summary” (also accessible via their Waste sites and capacity tool). Our prototype uses the Permit/Licence codes to cross-reference between the 2 SEPA datasets.
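A minimal sketch of that cross-referencing is shown below: the site details are indexed by permit code, and each site return record is then enriched with the details of its site. The field names, the site name and the figures are assumptions for illustration; returns whose permit has no match are reported separately so they can be investigated.

```python
# Made-up records; field names are assumptions for illustration.
waste_sites = [
    {"permit": "PPC/A/1180708", "site-name": "Example Site",
     "region": "North Lanarkshire", "latitude": 55.82, "longitude": -4.04},
]
site_returns = [
    {"permit": "PPC/A/1180708", "tonnes": 154.55},
    {"permit": "PPC/A/0000000", "tonnes": 1.0},  # no matching site record
]


def attach_site_details(returns, sites):
    """Join returns to site details by permit; also list unmatched permits."""
    by_permit = {site["permit"]: site for site in sites}
    joined, unmatched = [], []
    for record in returns:
        site = by_permit.get(record["permit"])
        if site is None:
            unmatched.append(record["permit"])
        else:
            joined.append({**site, **record})
    return joined, unmatched


joined, unmatched = attach_site_details(site_returns, waste_sites)
```

The two datasets list different numbers of distinct permits, so some unmatched records should be expected.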

SEPA provides the waste site locations as National Grid eastings and northings. However, it is easier to use latitude & longitude coordinates in our chosen map display technology, so our prototype uses Colantoni’s library to perform the conversion.

The prototype webapp

A ‘live’ instance of the resulting prototype webapp can be accessed here.

Below is an animated image of it…​

our prototype webapp

UI & controls

  • Each pie chart depicts the amounts of materials incoming to a single waste site, or the aggregation of waste sites within a map area.
    • single waste site pie Depicts a single waste site.
    • multiple waste sites pie Depicts an aggregation of 26 waste sites.
  • no pie (I.e. a number without a surrounding pie chart) depicts a waste site with no incoming materials (probably because the site was not operational during 2019).
  • material details pop-up Hovering the cursor over a pie segment will pop-up details about incoming tonnes of the material depicted by the segment.
  • area highlighting Hovering the cursor over a pie that depicts an aggregation will highlight the map area in which the aggregated waste sites are located.
  • waste site pop-up Clicking on a single waste site will pop-up details about that waste site.
  • zoom control The webapp supports the usual zoom and pan controls. The user can also double-click on an aggregation pie to zoom into the area that it covers.
  • attributions Clicking on ‘attributions’ will display a page that credits:

Closing thoughts

But might some of its information be made more understandable to the general public by depicting it on a map?

For any good solution, the answer will be an obvious ‘yes’. But what about for our prototype webapp solution?…​

We think that it could help pique interest in the differences in the amounts & types of waste materials that are being disposed of in different areas of the country. For example…​

splash view

Glancing at our prototype’s map (image left; at the default zoom level), the seemingly disproportionate amount of soils & stones coming into north west Scotland waste sites catches our attention.

So we zoom in (right image) to find that almost all of it is accounted for by one landfill site on the Isle of Lewis.

Bennadrove landfill site

Future work could increase the utility of this prototype webapp by:

  • allowing the user to browse over the time-series aspect of this dataset using a time slider control (like our through time on a map prototype)
  • providing a means to switch the focus of interest from incoming material to: outgoing material, processing activities (landfill, composting, metal recycling, etc.), or facilities offered (household, commercial, special disposals, etc.)
  • supporting filtering over the various dimensions
  • providing the means for a user to open their current data selection in a tool (like our data grid & graph prototype) that allows them to explore the data in more detail.