Here’s a lightning demo of our proof-of-concept PASI knowledge graph being queried using SPARQL.
Annotating data points on our prototype website
On our requirements list is the need to weave interest-based navigation maps through our data site. Feedback from the recent SODU 2021 conference affirmed this:
I like the site’s tools and visualisations, but more needs to be done to help me navigate my path of interest through the prototype website.
In an exploratory step towards fulfilling that requirement, we have annotated some data points with explanations/narrative. The idea is that these annotations could become waymarks in navigation maps, guiding users between the datapoints which underpin data-based stories. We might even imagine how clicking a ‘next’ button on a waymark would visually ‘fly’ the user to the next datapoint in the story (which is, perhaps, on a different graph or a different page). But(!) back to our present, very simple proof-of-concept implementation…
Here’s how the annotations look in our present, proof-of-concept implementation:
Each annotation is depicted by an emoji which is plotted beside a datapoint (on a graph, or in a table). When the user hovers over (or clicks on) an annotation’s emoji, a pop-up will display some informative text.
We want to code annotations just as we would any other dataset – as a straightforward CSV file. So we have built a data-driven annotation mechanism. This has allowed us to specify annotations, as data, in a CSV file like this:
Each annotation record contains datapoint coordinates which specify the datapoint against which the annotation is to be plotted. The datapoint coordinates include a record-type which specifies the dataset against which the annotation is to be plotted. (In this example, the specified dataset household-waste-derivation-generation is a derived dataset, based on the …)
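To make the data-driven mechanism concrete, here is a sketch of how annotation records of this shape might be loaded. The column names and values below are illustrative assumptions, not the project’s actual schema:

```python
import csv
import io

# Illustrative annotation records; the real CSV's columns and values may differ.
ANNOTATIONS_CSV = """record-type,region,year,emoji,text
household-waste-derivation-generation,Fife,2016,⚡,Longannet power station closed in March 2016
household-waste-derivation-generation,Inverclyde,2019,🏆,Lowest tonnes of household waste per citizen
"""

def load_annotations(csv_text):
    """Parse annotation records, keyed by their datapoint coordinates
    (record-type, region, year) so they can be looked up while plotting."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["record-type"], row["region"], int(row["year"])): row
            for row in reader}

annotations = load_annotations(ANNOTATIONS_CSV)
# e.g. the chart code could look up the annotation for Fife's 2016 datapoint:
fife_2016 = annotations[("household-waste-derivation-generation", "Fife", 2016)]
```

A lookup table keyed on the datapoint coordinates keeps the plotting code simple: when a datapoint is drawn, one dictionary access decides whether an annotation emoji should be plotted beside it.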
This proof-of-concept data-driven annotation mechanism has been useful because it has:
- given us a model with moving parts to learn from,
- provided hints about how annotations can be used to help users understand and navigate the data,
- shown us that we need more structure around the naming and storage of derived datasets (and their annotations), and
- uncovered the difficulties of retro-fitting an annotations mechanism into our prototype-6 website. (Annotations are displayed using off-the-shelf Vega-lite tooltips and Bulma CSS dropdowns, but these don’t provide a satisfactory level of placement/control/interactivity. More customised webpage components will be needed to provide a better user experience.)
The Fair Share – the CO2e saved by this university-based reuse store
Discover how many cars’ worth of CO2e is avoided each year because of this university-based reuse store
The Fair Share is a university-based reuse store. It accepts donations of second-hand books, clothes, kitchenware, electricals, etc. and sells these to students. It is run by the Student Union at the University of Stirling. It meets the Revolve quality standard for second-hand stores.
The Fair Share is in the process of publishing its data as open data. Click on the image below to see a web page that is based on a draft of that work.
Stirling’s bin collection data – revisited
Stirling Council set a precedent by being the first (and still only) Scottish local authority to publish open data about its household waste bin collections.
The council are currently working on increasing the fidelity of this dataset, e.g. by adding spatial data to describe collection routes. However, we can still squeeze several interesting pieces of information from its current version. For details, visit the Stirling bin collection page on our website mockup.
“How is waste in my area?” – a regional dashboard
Our aim in this piece of work is:
to surface, for non-experts, facts of interest (maximums, minimums, trends, etc.) about waste in an area.
Towards that aim, we have built a prototype regional dashboard which is directly powered by our ‘easier datasets’ about waste.
The prototype is a webapp and it can be accessed here.
Even this early prototype manages to surface some curiosities…
Inverclyde is doing well.
In the latest data (2019), it generates the fewest tonnes of household waste (per citizen) of any of the council areas. Its matching 1st position for CO2e indicates the close relationship between the amount of waste generated and its carbon impact.
…But why is Inverclyde doing so well?
Highland isn’t doing so well.
In the latest data (2019), it generates more tonnes of household waste (per citizen) than any other council area except Argyll & Bute. And it has the worst trend for percentage recycled.
…Why has Highland’s percentage recycled been getting worse since 2014?
Fife has the best trend for household waste generation. That said, it has still been generating an above-average amount of waste per citizen.
The graphs for Fife business waste show that there was an acute reduction in combustion wastes in 2016.
We investigated this anomaly before and discovered that it was caused by the closure of Fife’s coal-fired power station (Longannet) on 24th March 2016.
In the latest two years of data (2018 & 2019), Angus has noticeably reduced the amount of household waste that it landfills.
During the same period, Angus has increased the amount of household waste that it processes as ‘other diversion’.
…What underlies that difference in Angus’ waste processing?
This prototype is built as a ‘static’ website with all content-dynamics occurring in the browser. This makes it simple and cheap to host, but results in heavier, more complex web pages.
- The clickable map is implemented on Leaflet – with OpenStreetMap map tiles.
- The charts are constructed using Vega-lite.
- The content-dynamics are coded in ClojureScript – with Hiccup for HTML, and Reagent for events.
- The website is hosted on GitHub.
Ideas for evolving this prototype
- Provide more qualitative information. This version is quite quantitative because, well, that is the nature of the datasets that currently underlie it. So there’s a danger of straying into the “management by KPI” approach when we should be supporting the “management by understanding” approach.
- Include more localised information, e.g. about an area’s re-use shops, or bin collection statistics.
- Support deeper dives, e.g. so that users can click on a CO2e trend to navigate to a choropleth map for CO2e.
- Allow users to download any of the displayed charts as (CSV) data or as (PNG) images.
- Enhance the support of comparisons by allowing users to multi-select regions and overlay their charts.
- Allow users to choose, from a menu, which chart/data tiles to place on the page.
- Provide a what-if? tool. “What if every region reduced its landfilling of waste material xyz by 10%?” – where the tool has a good enough waste model to enable it to compute what-if? outcomes.
Waste sites and the quantities of incoming materials
SEPA publish a “Site returns” dataset (accessible via their Waste sites and capacity tool) that says…
- how many tonnes
- of each (EWC-coded) waste material
- were moved in or out
- of each authorised waste site in Scotland.
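The “Site returns” records can be rolled up to a per-site total of incoming tonnes, which is what the map’s pie charts need. A minimal sketch, where the permit codes, EWC codes and tonnages are invented placeholders (not values from SEPA’s dataset):

```python
from collections import defaultdict

# Illustrative "Site returns"-style records: (permit_code, ewc_code, direction, tonnes).
site_returns = [
    ("PPC/A/100", "20 03 01", "in", 1250.0),
    ("PPC/A/100", "17 09 04", "in", 310.5),
    ("PPC/A/100", "19 12 12", "out", 90.0),   # outgoing movements are ignored here
    ("WML/L/200", "20 03 01", "in", 48.2),
]

def incoming_tonnes_by_site(records):
    """Sum the tonnes of material moved *into* each waste site."""
    totals = defaultdict(float)
    for permit_code, _ewc, direction, tonnes in records:
        if direction == "in":
            totals[permit_code] += tonnes
    return dict(totals)

totals = incoming_tonnes_by_site(site_returns)
```

For our prototype we restricted the aggregation to incoming movements during 2019; the same fold could equally be keyed on direction, material category or year.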
Here is an extract…
This is impressive, ongoing data collection and curation by SEPA.
But might some of its information be made more understandable to the general public by depicting it on a map?
Towards answering that, we built a prototype webapp. (For speed of development, we considered only the materials incoming to waste sites during the year 2019.)
To aid comprehension, SEPA often sorts waste materials into 33 categories. We do the same in our prototype, mapping each EWC-coded waste material into 1 of the 33 categories…
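That mapping is essentially a lookup table from EWC code to category. A sketch with just a few example entries (the category labels here are illustrative; SEPA’s actual mapping covers every EWC code across the 33 categories):

```python
# A few example entries only; the full mapping covers all EWC codes.
EWC_TO_CATEGORY = {
    "20 03 01": "Household and similar wastes",
    "17 09 04": "Mixed construction wastes",
    "19 01 12": "Combustion wastes",
}

def categorise(ewc_code):
    """Map an EWC-coded waste material to one of the summary categories."""
    return EWC_TO_CATEGORY.get(ewc_code, "Uncategorised")

category = categorise("20 03 01")
```

Keeping the mapping as plain data (rather than code) means it can be maintained as a CSV file alongside the other datasets.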
The “Site returns” dataset identifies waste sites by their Permit/Licence code. We want our prototype to show additional information about each waste site. Specifically, its name, council area, waste processing activities, client types, and location – very important for our prototype’s map-based display!
SEPA holds that additional information about waste sites, in a 2nd dataset: “Waste sites and capacity summary” (also accessible via their Waste sites and capacity tool). Our prototype uses the Permit/Licence codes to cross-reference between the 2 SEPA datasets.
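A minimal sketch of that cross-referencing, assuming simple dictionary-shaped records keyed by Permit/Licence code (the codes, names and figures below are invented for illustration):

```python
# Illustrative records from the two SEPA datasets, keyed by Permit/Licence code.
site_details = {
    "PPC/A/100": {"name": "Example Recycling Centre", "council": "Fife",
                  "easting": 330000, "northing": 690000},
}
incoming_totals = {"PPC/A/100": 1560.5, "WML/L/200": 48.2}

def join_on_permit(details, totals):
    """Attach the incoming tonnage to each site found in both datasets."""
    return {code: {**info, "incoming_tonnes": totals[code]}
            for code, info in details.items() if code in totals}

sites = join_on_permit(site_details, incoming_totals)
```

Sites that appear in only one of the two datasets drop out of the join, which is worth logging in a real implementation so that data-quality gaps in the cross-reference are visible.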
SEPA provides the waste site locations as National Grid eastings and northings. However, it is easier to use latitude & longitude coordinates in our chosen map display technology, so our prototype uses Colantoni’s library to perform the conversion.
The prototype webapp
A ‘live’ instance of the resulting prototype webapp can be accessed here.
Below is an animated image of it…
UI & controls
- Each pie chart depicts the amounts of materials incoming to a single waste site, or the aggregation of waste sites within a map area.
- A number without a surrounding pie chart depicts a waste site with no incoming materials (probably because the site was not operational during 2019).
- Hovering the cursor over a pie segment will pop-up details about incoming tonnes of the material depicted by the segment.
- Hovering the cursor over a pie that depicts an aggregation will highlight the map area in which the aggregated waste sites are located.
- Clicking on a single waste site will pop-up details about that waste site.
- The webapp supports the usual zoom and pan controls. The user can also double-click on an aggregation pie to zoom into the area that it covers.
- Clicking on ‘attributions’ will display a page that credits:
- SEPA for its “Site returns” dataset.
- OpenStreetMap for the map data.
- Leaflet, Leaflet.markercluster and Bård Romstad for the software libraries used to build the prototype webapp.
But might some of its information be made more understandable to the general public by depicting it on a map?
For any good solution, the answer will be an obvious ‘yes’. But what about for our prototype webapp solution?…
We think that it could help pique interest in the differences in the amounts & types of waste materials that are being disposed of in different areas of the country. For example…
Future work could increase the utility of this prototype webapp by:
- allowing the user to browse over the time-series aspect of this dataset using a time slider control (like our through time on a map prototype)
- providing a means to switch the focus of interest from incoming material to: outgoing material, processing activities (landfill, composting, metal recycling, etc.), or facilities offered (household, commercial, special disposals, etc.)
- supporting filtering over the various dimensions
- providing the means for a user to open their current data selection in a tool (like our data grid & graph prototype) that allows them to explore the data in more detail.
How I chanced on Longannet in the data
When I watched the waste amounts change through time on this map, Fife’s amounts really stood out…
Fife was generating so much more waste from business than the other council areas. But why?
To look at the data in more detail, I loaded it into the data grid & graph tool that we built a couple of months ago.
First, I filtered the data grid to show me Fife’s four largest business wastes vs their averages (link).
Fife’s combustion waste stands out from the average.
Secondly, I filtered the data grid to show me the business combustion waste quantities by sector (link).
Unfortunately this data isn’t broken down by council area, but it clearly shows that most of the combustion wastes are generated by the power industry.
An internet search with this information – i.e. “Fife combustion power” – returns a page full of references to Longannet – the coal-fuelled power station.
According to Wikipedia, Longannet power station was the 21st most polluting in Europe when it closed, so no wonder that its signature in the data is so obvious! It closed on 24th March 2016, which correlates with the sharp return towards the average, in 2016, of Fife’s combustion wastes graph line.
Of course this isn’t a real discovery – SEPA, Scottish Power and the people who lived around the power station will be very familiar with this data anomaly and its cause. But I think that it’s mildly interesting that a data layperson like me could discover this from looking at these simple data visualisations.
Waste quantities through time, on a map
Shortly before the end of 2020, I attended the Code The City 21: Put Your City on the Map hack weekend which explored ideas for putting open data onto geographic maps.
It ran several interesting projects. One was especially inspiring to me: the Bioregion Dashboard. Its idea is to tell an evidence-backed story-through-the-years, involving interactive data displays against a map. James Littlejohn introduces it in this YouTube video.
This got me thinking about new ways to depict the information that is bound up in the data about waste…
In particular, thinking about a means to convey at a glance, to the layperson, how council areas compare through time in respect of the amounts of (household solid) waste that they process. Now, the grid & graph prototype that we built a couple of months back conveys that same information very well (and with a greater fidelity than we will manage in this work) but, to a layperson like me, it isn’t attention grabbing. I like seeing something with movement and with features that I can relate to, such as animated charts and a geographical map.
The prototype webapp
Leveraging what I learnt at the Code the City 21 hack weekend, I hacked together a prototype webapp that shows how waste quantities change through time, on a geographic map.
The animated image of the webapp below conveys that landfilled waste is reducing over time whilst total waste is remaining fairly constant.
- The dataset of interest is chosen through the control, either:
- Tonnes of managed solid household waste per person per year.
- Tonnes of CO2 equivalent from household waste per person per year.
- Use the control to travel through time.
- Each chart depicts the waste-related quantities for a council area.
- The sizes of its slices, and its overall size, are related to the quantities that it depicts.
- Hover over a council area to see detailed metrics in the panel.
- The usual map zoom and pan controls are supported.
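One design point behind sizing the charts: for a chart’s overall size to fairly convey a quantity, its *area* (not its radius) should be proportional to that quantity, i.e. radius ∝ √quantity. We don’t claim this is exactly Minichart’s internal behaviour, but the idea can be sketched as:

```python
import math

def chart_radius(tonnes, max_tonnes, max_radius_px=40.0):
    """Radius (in pixels) such that the chart's area is proportional
    to the depicted quantity. max_radius_px is an arbitrary scale choice."""
    return max_radius_px * math.sqrt(tonnes / max_tonnes)

# A council area with a quarter of the maximum tonnage gets half the radius
# (and therefore a quarter of the area).
r = chart_radius(25.0, 100.0)
```

Scaling the radius linearly with quantity instead would visually exaggerate differences, because perceived size tracks area.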
Software and datasets
- The open source Leaflet and Minichart libraries take care of most of the heavy lifting (interactive graphics).
- The map’s base layer comes from Esri ArcGIS (although the images in this document show a Stadia Maps base layer, which can’t be used in a runtime without a licence).
- The map’s council area boundary data originates from the ONS, and has been curated by Martin Chorely.
- The datasets for the pie charts, are:
- “Population Estimates (Current Geographic Boundaries)” curated in the Scottish government’s linked-data store, and authored by NRS in 2020.
- “Generation and Management of Household Waste” curated in the Scottish government’s linked-data store, and authored by SEPA in 2020.
- “Carbon footprint” authored by SEPA in 2020.
A ‘live’ instance of this webapp can be accessed here.
I haven’t seen these datasets about waste shown in this way before, and I think that it usefully conveys aspects of the datasets in a catchy and easy-to-understand way. It is low fidelity when compared to a full data grid with graph solution, but the idea is to hold the attention of the average person in the street.
Future work could integrate additional waste-relevant datasets that have geography and time dimensions. Also, we should consider alternative metrics (such as ratios), alternative charts (such as bar or polar) and alternative statistics (such as deviation or trend). I went with the ‘most straightforward’, but user testing might indicate that an alternative is better.
A prototype data grid & graph over data about waste
The interactive data grid with a linked graph is a tool that is often used to aggregate, dissect, explore, compare & visualise datasets. Might such a tool help our users explore and understand open data about waste? To help answer this, I have hacked together a web-based prototype…
The working prototype
The working prototype can be accessed via this link.
The prototype pulls together 4 datasets:
- “Generation and Management of Household Waste” (SEPA).
- “Carbon footprint [CO2e]” (SEPA).
- “Population Estimates (Current Geographic Boundaries)” (NRS).
- “Mid-Year Household Estimates” (NRS).
The datasets are fetched from statistics.gov.scot and Wikidata, using SPARQL; then matched; and finally, the per-citizen and per-household values are calculated.
The result is 17,490 data records.
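The matching and derivation step can be sketched in Python. The record shapes and figures below are placeholders for illustration, not values from the real SPARQL result tables:

```python
# Illustrative rows: household waste tonnes and population estimates,
# keyed by (council area, year). The numbers are placeholders.
waste = {("Stirling", 2019): 42000.0, ("Fife", 2019): 180000.0}
population = {("Stirling", 2019): 94000, ("Fife", 2019): 373000}

def per_citizen(waste_by_key, pop_by_key):
    """Match the two datasets on (area, year) and derive tonnes per citizen."""
    shared_keys = waste_by_key.keys() & pop_by_key.keys()
    return {key: waste_by_key[key] / pop_by_key[key] for key in shared_keys}

rates = per_citizen(waste, population)
```

Matching on the shared (area, year) key before dividing means that rows present in only one dataset are silently dropped; a production-class pipeline would report such mismatches.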
The data was assembled using this executable Jupyter notebook. For a production-class implementation, that could easily be coded as an automated, periodic process.
The web app containing the interactive data grid with a linked graph was built using the DevExtreme web component library. Alternative libraries are viable, but the DevExtreme one is modern and free for non-commercial use.
The resulting data assembly and web app are stored as static files in the project’s GitHub repositories.
The prototype’s web page contains a graph and a configurable data grid. The graph automatically reflects the data selected in the data grid.
Detailed information about a graph’s data point is shown when the user hovers over it with the cursor.
The graph can be zoomed/unzoomed, and its current contents can be printed or saved as PNG, PDF, etc.
The data grid’s expand/collapse arrow-head icons allow the user to drill down into slices of data. Below, we’ve expanded the Recycled slice to reveal the data values…
The data grid’s “Show Field Chooser” icon pops up a control panel to allow the user to select data dimensions, axis assignments, value ranges, value filters, display order, etc., etc.
The data grid’s “Export to Excel file” icon will export the grid’s currently selected data to an Excel spreadsheet.
The resulting Excel files are nice because the export functionality preserves user-friendly fixed headers and some other formatting.
Finally, the prototype operates well on phones and tablets (although there is a sizing issue with pop-up panels that I haven’t investigated).
But, is it useful?
So, might (a production-class version of) such a tool, help our users to explore and understand open data about waste? Well, we won’t know until we have user tested it, but my guess is that:
- users with no data analysis experience will find its configurability difficult to navigate.
- users with low-to-medium data analysis experience may find it useful as a single tool containing multiple datasets.
- users with medium-to-high data analysis experience will prefer to use their own tools.
A presets feature has been added to the tool so that users can go to a particular configuration & data selection by simply clicking on a hyperlink. This supports an easy-access route to the tool for users with no data analysis experience, by answering their potential questions through presets such as:
- How does Aberdeen City compare with Dundee (and Scotland as a whole) for the amounts of household waste per citizen that it landfills?
- How many tonnes of each household waste material ended up recycled, landfilled, etc. in Stirling in 2018?
- What proportion of a tonne of household waste has ended up recycled, landfilled, etc. in Edinburgh through the years?
- What does the correlation look like between the amounts of household waste solids and their calculated carbon impacts?
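One simple way to implement such presets (a sketch of the general technique, not necessarily the prototype’s exact mechanism) is to encode the configuration & data selection in the hyperlink’s query string. The parameter names and base URL below are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder URL, not the prototype's real address.
BASE_URL = "https://example.org/data-grid"

def preset_link(**config):
    """Build a shareable hyperlink that captures a grid configuration."""
    return BASE_URL + "?" + urlencode(config)

link = preset_link(areas="Aberdeen City,Dundee City,Scotland",
                   measure="tonnes-per-citizen",
                   management="landfilled")

# On page load, the webapp would decode the query string back into a config.
config = {k, v[0]} if False else {k: v[0] for k, v in parse_qs(urlparse(link).query).items()}
```

Because the whole configuration lives in the URL, presets need no server-side state: each hyperlink in the list above is just a pre-built query string.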
The usefulness of putting datasets into Wikidata?
A week ago, I attended Ian Watt’s workshop on Wikidata at the Scottish Open Data Unconference 2020. It was an interesting session and it got me thinking about how we might upload some of our datasets of interest (e.g. amounts of waste generated & recycled per Scottish council area, ‘carbon impact’ figures) into Wikidata. Would having such datasets in Wikidata be useful?
There is interest in “per council area” and “per citizen” waste data, so I thought that I’d start by uploading into Wikidata a dataset that describes the populations per Scottish council area per year (source: the Population Estimates data cube at statistics.gov.scot).
This executable notebook steps through the nitty-gritty of doing that. SPARQL is used to pull data from both Wikidata and statistics.gov.scot; the data is compared; and the QuickStatements tool is used to help automate the creation and modification of Wikidata records. 2232 edits were executed against Wikidata through QuickStatements (taking about 30 mins). Unfortunately, QuickStatements does not yet support a means to set the rank of a statement, so I had to individually edit the 32 council area pages to mark, in each, its 2019 population value as the Preferred rank population value – indicating that it is the most up-to-date population value.
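The QuickStatements commands themselves can be generated mechanically from the dataset. A sketch: P1082 is Wikidata’s real ‘population’ property and P585 its ‘point in time’ qualifier, but the item identifier and population figures below are placeholders:

```python
def quickstatements_rows(item_qid, populations_by_year):
    """Emit tab-separated QuickStatements V1 commands, one population
    statement per year, each qualified with a year-precision date (/9)."""
    return [
        f"{item_qid}\tP1082\t{count}\tP585\t+{year}-00-00T00:00:00Z/9"
        for year, count in sorted(populations_by_year.items())
    ]

# 'Q_EXAMPLE' stands in for a council area's real Wikidata item identifier.
rows = quickstatements_rows("Q_EXAMPLE", {2018: 94210, 2019: 94330})
```

Generating the command batch from the source data, rather than typing it, is what makes a 2232-edit upload feasible in about 30 minutes.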
But, is having this dataset in Wikidata useful?
The uploaded dataset can be pulled (de-referenced) into Wikipedia articles quite easily. As an example, I edited the Wikipedia article Council areas of Scotland to insert into its main table the new column “Number of people (latest estimate)”, whose values are pulled (each time the page is rendered) directly from the data that I uploaded into Wikidata:
Visualisations based on the uploaded dataset can be embedded into web pages quite easily. Here’s an example that fetches our dataset from Wikidata and renders it as a line graph, when this web page is loaded into your web browser:
Concerns, next steps, alternative approaches.
Interestingly, there is some discussion about the pros & cons of inserting Wikidata values into Wikipedia articles. The main argument against is the immaturity of Wikidata’s structure, and therefore a concern about the durability of references into its data structure. The counterpoint is that early use & evolution might be the best path to maturity.
The case study for our Data Commons Scotland project, is open data about waste in Scotland. So a next step for the project might be to upload into Wikidata, datasets that describe the amounts of household waste generated & recycled, and ‘carbon impact’ figures. These could also be linked to council areas – as we have done for the population dataset – to support per council area/per citizen statistics and visualisations. Appropriate properties do not yet exist in Wikidata for the description of such data about waste, so new ones would need to be ratified by the Wikidata community.
Should such datasets actually be uploaded into Wikidata?… These are small datasets and they seem to fit well enough into Wikidata’s knowledge graph. Uploading them into Wikidata may make them easier to access, de-silo the data and help enrich Wikidata’s knowledge graph. But then, of course, there is the keeping-it-up-to-date issue to solve. Alternatively, those datasets could be pulled dynamically and directly from statistics.gov.scot into Wikipedia articles with the help of some new MediaWiki extensions.