The prototype’s architecture – revised

“Trialling Wikibase for our data layer” described how we evaluated the use of Wikibase as a key implementation component in our bi-layered architecture. The conclusion was that Wikibase, although a brilliant product, does not fit our immediate purpose.

In our revised architecture…​

Wikibase is replaced with a simple set of data files (CSV and JSON), hosted in a public repository on GitHub: dcs-easier-open-data. These data files are generated by the Waste Data Tool (dcs-wdt). Together, dcs-easier-open-data and dcs-wdt implement the architecture’s data layer.

In the architecture’s revised presentation layer, the webapp reads (CSV/JSON formatted) data from the dcs-easier-open-data repository, instead of reading (via SPARQL) data from the Wikibase.
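
To make that concrete, here is a minimal ClojureScript sketch of how the webapp might read one of those CSV files in the browser. The repository path, branch and file name are assumptions for illustration, not the actual layout of dcs-easier-open-data.

(ns webapp.data
  (:require [clojure.string :as str]))

;; Naive CSV parsing - adequate for simple, well-formed files
;; (no quoted commas or embedded newlines).
(defn parse-csv [text]
  (let [[header & rows] (str/split-lines text)
        columns         (str/split header #",")]
    (map #(zipmap columns (str/split % #",")) rows)))

;; Fetch a data file from the public GitHub repository (URL is illustrative).
(defn fetch-dataset [file callback]
  (-> (js/fetch (str "https://raw.githubusercontent.com/data-commons-scotland/"
                     "dcs-easier-open-data/master/data/" file))
      (.then #(.text %))
      (.then #(callback (parse-csv %)))))

;; e.g. (fetch-dataset "household-waste.csv" prn)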

The prototype’s bi-layered architecture - revised

“How is waste in my area?” – a regional dashboard

Introduction

Our aim in this piece of work is:

to surface, to non-experts, facts of interest (maximums, minimums, trends, etc.) about waste in an area.

Towards that aim, we have built a prototype regional dashboard which is directly powered by our ‘easier datasets’ about waste.

The prototype is a webapp and it can be accessed here.

our prototype regional dashboard

Curiosities

Even this early prototype manages to surface some curiosities [1] …​

Inverclyde

Inverclyde is doing well.

Inverclyde’s household waste positions
Inverclyde’s household waste generation
Inverclyde’s household waste CO2e

In the latest data (2019), it generates the fewest tonnes of household waste (per citizen) of any of the council areas. Its matching 1st position for CO2e indicates the close relation between the amount of waste generated and its carbon impact.

…​But why is Inverclyde doing so well?

Highland

Highland isn’t doing so well.

Highland’s household waste positions
Highland’s household waste generation
Highland’s household waste % recycled

In the latest data (2019), it generates more tonnes of household waste (per citizen) than any other council area except Argyll & Bute. And it has the worst trend for percentage recycled.

…​Why has Highland’s percentage recycled been getting worse since 2014?

Fife

Fife has the best trend for household waste generation. That said, it has still been generating an above-average amount of waste per citizen.

Fife’s household waste positions
Fife’s household waste generation

The graphs for Fife business waste show that there was an acute reduction in combustion wastes in 2016.

Fife’s business waste

We investigated this anomaly before and discovered that it was caused by the closure of Fife’s coal-fired power station (Longannet) on 24th March 2016.

Angus

In the latest two years of data (2018 & 2019), Angus has noticeably reduced the amount of household waste that it landfills.

Angus' household waste management

During the same period, Angus has increased the amount of household waste that it processes as ‘other diversion’.

…​What underlies that difference in Angus’ waste processing?

Technologies

This prototype is built as a ‘static’ website with all content-dynamics occurring in the browser. This makes it simple and cheap to host, but results in heavier, more complex web pages.

  • The clickable map is implemented with Leaflet – using OpenStreetMap map tiles.
  • The charts are constructed using Vega-Lite.
  • The content-dynamics are coded in ClojureScript – with Hiccup for HTML, and Reagent for events.
  • The website is hosted on GitHub.
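
To give a feel for how these pieces fit together, here is a minimal ClojureScript sketch of a chart tile: Reagent renders the Hiccup markup, and a Vega-Lite spec is handed to the vega-embed JavaScript library once the container element exists. The namespace, data shape and field names are illustrative rather than the webapp’s actual code.

(ns dashboard.tile
  (:require [reagent.core :as r]
            [reagent.dom :as rdom]
            ["vega-embed" :default vega-embed]))

;; A Vega-Lite line-chart spec over records like [{:year 2019 :tonnes 0.4} ...]
(defn waste-spec [region records]
  {:title    (str region "'s household waste generation")
   :data     {:values records}
   :mark     "line"
   :encoding {:x {:field "year"   :type "ordinal"}
              :y {:field "tonnes" :type "quantitative"}}})

;; A Reagent component that embeds the chart once its div is in the DOM.
(defn chart-tile [region records]
  (r/create-class
   {:component-did-mount
    (fn [this]
      (vega-embed (rdom/dom-node this)
                  (clj->js (waste-spec region records))))
    :reagent-render
    (fn [_ _] [:div.chart-tile])}))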

Ideas for evolving this prototype

  1. Provide more qualitative information. This version is quite quantitative because, well, that is the nature of the datasets that currently underlie it. So there’s a danger of straying into the “management by KPI” approach when we should be supporting the “management by understanding” approach.
  2. Include more localised information, e.g. about an area’s re-use shops, or bin collection statistics.
  3. Support deeper dives, e.g. so that users can click on a CO2e trend to navigate to a choropleth map for CO2e.
  4. Allow users to download any of the displayed charts as (CSV) data or as (PNG) images.
  5. Enhance the support of comparisons by allowing users to multi-select regions and overlay their charts.
  6. Allow users to choose, from a menu, which chart/data tiles to place on the page.
  7. Provide a what-if? tool. “What if every region reduced by 10% their landfilling of waste material xyz?” – where the tool has a good-enough waste model to enable it to compute what-if? outcomes. (A toy sketch of such a computation follows this list.)
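
The following is only a toy sketch, in Clojure, of the kind of what-if computation meant – it is not a real waste model, and the record shape and function name are invented for illustration.

;; Apply a fractional reduction to each region's landfilled tonnes of a
;; given material, and total the tonnes saved across all regions.
(defn what-if-reduction [landfill-records material fraction]
  (let [relevant (filter #(= material (:material %)) landfill-records)
        saved    (reduce + (map #(* fraction (:tonnes %)) relevant))]
    {:material material :reduction fraction :tonnes-saved saved}))

;; e.g. (what-if-reduction [{:region "Fife"  :material "Soils" :tonnes 1000}
;;                          {:region "Angus" :material "Soils" :tonnes 500}]
;;                         "Soils" 0.10)
;; => {:material "Soils", :reduction 0.1, :tonnes-saved 150.0}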

1. One of the original sources of data has been off-line due to a cyberattack so, at the time of writing, it has not been possible to double-check all figures from our prototype against original sources.

A mock-up website for functionality & navigation

Introduction

A prototype website will be one of the outcomes of this research project. The website should help non-experts discover, learn about and understand the open data about waste in Scotland.

To date, we have built a couple of mock-ups [1]:

  1. a functionality & navigation mock-up – for exploring ideas about functionality and navigation for our eventual website.
  2. a look’n’feel mock-up – for exploring looks/visual aesthetics.

This document concentrates on the functionality & navigation mock-up…​

The splash page of the functionality & navigation mock-up

Functionality

This mock-up ties together a lot of the elements we’ve been working on:

Data: Direct access to download the underlying datasets – a simple, consistent set of CSV and JSON files.

Maps: Interactive, on-map depictions of the information from the datasets.

Data grids with graphs: A tool for slicing’n’dicing the datasets and visualising the result as a graph. To make this easier, this tool will provide useful slicing’n’dicing presets: starting points from which users can explore.

SPARQL: A query interface to a semantic web representation of the datasets. This is unlikely to be of use to our target audience, so we’ll probably remove it from the UI but may use its semantic graph internally.

Articles: Themed articles and tutorials that are based on evidence from the datasets. These use Asciidoc mark-up to make the articles easy to format, and may incorporate data visualisations that are backed by our datasets.

Navigation

The mock-up provides 3 routes to information:

Themes The clickable blocks on the splash page allows users to explore a waste theme by taking the user to a specific set of of articles and tutorials.
Navbar The menu bar at the top of each page, provides an orthogonal, more ‘functional’ classification of the website’s contents.
Search At present, this is a very basic text & tag search. In the future, a predicative/auto-suggestion search based on a semantic graph of the contents, will be provided.
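
As a rough illustration of the ‘very basic’ end of that spectrum, here is a Clojure sketch of a text & tag search – case-insensitive substring matching over each page’s title and tags. The page-record shape is invented for the example.

(require '[clojure.string :as str])

;; Return the pages whose title or tags contain the search text.
(defn basic-search [pages text]
  (let [needle (str/lower-case text)
        hit?   (fn [s] (str/includes? (str/lower-case s) needle))]
    (filter (fn [{:keys [title tags]}]
              (or (hit? title) (some hit? tags)))
            pages)))

;; e.g. (basic-search [{:title "Household waste" :tags ["waste"]}
;;                     {:title "Recycling points" :tags ["recycling"]}]
;;                    "recycl")
;; => ({:title "Recycling points", :tags ["recycling"]})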

Users’ navigation histories may help power a further-reading recommender subsystem.

Architecture

Building this mock-up has required some architectural decisions that may help inform the design of our eventual website.

Static website The mock-up has been implemented as a so-called ‘static website’. This means that page content is not dynamically generated by (or saved to) the server-side. The server-side simply serves ‘static content files’.

Pros: Implementation-wise, it is an order of magnitude simpler and more scalable than a ‘dynamic’ website. There are several good, free, open source ‘static website generators/frameworks’. And static websites can be served for free on hosting platforms such as GitHub (as used for this mock-up).

Cons: It can’t support a whole class of functionality, including user uploads and on-line content editing. Also, computation is forced towards the client-side (i.e. into users’ web browsers), which can sometimes have a negative impact on the speed of the UI.

Off-line updates: The content of the website can be updated – just not updated on-line. The website maintainers can add new/edit existing datasets, articles, etc. via off-line means. For off-line updates to this mock-up we use: (i) WDT – a rough’n’ready software script that helps us to curate the datasets that underlie this mock-up; (ii) Cryogen – a static website generator; and (iii) Git – to upload updates to our GitHub hosting service.

Client-side computation: Page content is dynamically manipulated (e.g. datasets are sliced’n’diced) on the client-side (in users’ web browsers) using JavaScript. This enables, for example, the mock-up’s web pages to take the static content that is served by the server-side and manipulate it to support interactive data visualisations. Progress in client-side technology even makes it possible to implement a semantic-graph-supporting triple store in a web browser!
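
To make ‘client-side computation’ concrete, here is a minimal ClojureScript sketch of the kind of in-browser manipulation meant: filtering and aggregating the served records so that they can back a visualisation. Field names are illustrative.

;; Aggregate the served records into per-year totals for one council -
;; computed entirely in the user's browser.
(defn tonnes-by-year [records council]
  (->> records
       (filter #(= council (:council %)))
       (group-by :year)
       (map (fn [[year recs]]
              {:year year :tonnes (reduce + (map :tonnes recs))}))
       (sort-by :year)))

;; e.g. (tonnes-by-year [{:council "Fife" :year 2018 :tonnes 2.0}
;;                       {:council "Fife" :year 2018 :tonnes 1.5}]
;;                      "Fife")
;; => ({:year 2018, :tonnes 3.5})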

Conclusion

This mock-up website…​

  • provides a concrete test-bed for evolving the functionality & navigation aspects of our eventual website, and
  • forces us to think about architectural trade-offs.

1. We use the term “mock-up” to mean an incomplete representation/model – useful for demonstration, design evaluation and acquiring user feedback.

Trialling Wikibase for our data layer

Introduction

The architectural proposal for our WCS platform contains a data layer for collecting, linking, caching and making accessible the source datasets…​

bilayered architecture
Note
Our assumption is that, for our near-term aims, linked data provides the most useful foundation.

 

The idea that we’re trialling here is to use Wikibase as the core component in our data layer…​

implementation using wikibase

The case for using Wikibase

Wikibase is a proven off-the-shelf solution that makes it easier to work with linked data. It provides:

  • A linked data store.
  • An interface for humans to view and manually edit linked data.
  • An API that can be used by computer programs to (bulk) edit linked data.
  • SPARQL support.

So why not just use Wikidata? (Wikidata is Wikibase’s common, public instance.) …​Ideally we would, but:

  • Our domain specifics aren’t supported in Wikidata.
    • E.g. Wikidata doesn’t (yet) have a full vocabulary to describe waste management.
    • We want to experiment and move fast. Wikidata is sensibly cautious about change, therefore too slow for us.
  • We can still make use of Wikidata for some aspects of our work by referencing its data, using its vocabulary, and using it to store very general data.

Novelty…​ The use of a customised Wikibase instance is not novel, but our intended specific customisation and application does have some novelty:

  • Wikibase provides easier-to-use, more human-friendly access to linked data than typical triple stores.

    Will this facilitate more engagement and use, compared with sites with less human-oriented surface area? …Perhaps a worthwhile study.

    Also, the greater human-oriented surface area in this solution should directly help when it comes to implementing user-based features, such as a recommender system and community forums.

  • By their nature, wiki solutions support crowd sourcing.

    Our platform could support a limited form of this by encouraging councils, recycling shops, etc. to contribute their data about waste; data which currently isn’t open or linked.

  • Our platform will be built using open & inexpensive (often free) components and services.

    It should be straightforward to apply the approach to other domains of open data for Scotland.

Hosting on WBStack

WBStack is an alpha (software-as-a-service) platform created by Adam Shorland. It allows invitees to create their own, publicly accessible Wikibase instances.

Adam invited us to create our own Wikibase instance on his platform.
Our Wikibase is at https://waste-commons-scotland.wiki.opencura.com

screenshot wcs wikibase

Populating our Wikibase with data about waste

The datasets

In this trial, we want to populate our Wikibase with 4 datasets:

  1. area – reference data describing administrative areas
  2. population – reference data describing populations
  3. household waste – describing the tonnes of solid waste generated by households
  4. co2e – describing the tonnes of carbon dioxide equivalent (CO2e) from household waste

The data model

Representing a dataset record in Wikibase

Let’s consider a couple of records from the population dataset:

area           year   population
Aberdeen City  2018   227,560
Aberdeen City  2017   228,800

In Wikibase, we could represent each of those records as a statement on the “Aberdeen City” item. This is the approach that we took in our previous work about The usefulness of putting datasets into Wikidata?. This screenshot shows the resulting Wikidata statements…​

screenshot population statements wikidata

The problem with this approach is that it can result in an unwieldy number of statements per single item.

The alternative approach we’ve taken for our Wikibase is to represent each of those records as an item in its own right. So that first record is represented as the Wikibase item…​

screenshot popAc2018 wikibase

Some predicates and dimensions are common: they are used across most of the datasets.

Some predicates and dimensions are dataset specific. For example: the predicate has UK government code is used only to describe the area dataset; while the dimension end-state is used only to describe the household waste dataset.
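
To make that concrete, here is a hedged sketch, as Clojure data, of how the ‘Aberdeen City 2018’ population record might look when modelled as an item in its own right. The labels and predicate names are illustrative, not the exact ones in our Wikibase.

;; One dataset record, modelled as a Wikibase item (names illustrative).
(def population-aberdeen-2018
  {:label      "population Aberdeen City 2018"
   :statements {:instance-of  "population"     ;; dataset class item
                :for-area     "Aberdeen City"  ;; common dimension
                :for-time     2018             ;; common dimension
                :has-quantity 227560}})        ;; the measurement itself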

Loading the data

I’ve hacked together a software script – dcs-wdt – which writes the datasets into our Wikibase. It is very rough’n’ready (however, it might be the seed of something more generic for automatically re-loading our datasets of interest). Its outline is:

;; Order datasets & dataset-aspects, most independent first.
;; (records-of, in-wikibase? and write-to-wikibase! stand in for the
;; script's own helper functions.)
(doseq [dataset [:base :area :population :household-waste :co2e]
        aspect  [:class-item :predicates :supporting-dimensions :measurements]
        record  (records-of dataset aspect)]
  (when-not (in-wikibase? record)
    ;; write a property or item to represent the record
    (write-to-wikibase! record)))

Assessment

So, should we use a Wikibase as the core component in our data layer?

Pros

  • The bundled SPARQL query service and UI work well.

    There is an oddity w.r.t. implicit prefixes, but this can be worked around by explicitly declaring the prefixes (see the sketch after this list).

  • It has search functionality that works straight out-of-the-box: content is automatically indexed, and the search supports ‘completion-suggestion’.
    screenshot search wikibase

    It is primarily configured for searching items by their labels, but it does fall through to providing a more full-text style search capability.

  • In addition to the programmatically accessible SPARQL query service, it has a baked-in, very full and well documented HTTP-based API for reading & writing data.

    (The dcs-wdt script makes use of both its SPARQL query service and API.)

  • Its human-oriented web pages (UI) are sort of nice – making it easy to explore the data, and to perform data management tasks.
  • It comes with a raft of features for supporting community-contributed content, including: user accounts and permissions, discussion forums, and easy-ish to use bulk data uploads via QuickStatements. I haven’t explored these in any depth, but they are potentially useful if the project decides that supporting user content on the WCS platform is in-scope.
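
To illustrate the prefix workaround mentioned above, here is a minimal Clojure sketch that queries the Wikibase’s SPARQL service with an explicitly declared rdfs prefix (one that the query UI would normally treat as implicit). The endpoint path is an assumption about WBStack’s layout, and clj-http is just one convenient HTTP client.

(require '[clj-http.client :as http])

;; Declare the prefix explicitly rather than relying on implicit ones.
(def sparql-query "
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?item ?itemLabel
  WHERE { ?item rdfs:label ?itemLabel }
  LIMIT 10")

;; Endpoint URL is our best guess at the WBStack instance's layout.
(defn run-query []
  (:body (http/get "https://waste-commons-scotland.wiki.opencura.com/query/sparql"
                   {:query-params {"query" sparql-query "format" "json"}
                    :as :json})))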

Cons

  • It doesn’t come with all the bells’n’whistles I thought it would…​

    I think that I’ve been naive in thinking that many of the easy-to-use MediaWiki rendering features (especially over SPARQL queries) that I’ve read about (particularly those of LinkedWiki) would just be there. Unfortunately those are all extras: the LinkedWiki extension and its transitive dependencies need to be installed; the relevant templates imported; OpenStreetMap etc. access keys configured.

    Those bells’n’whistles are not supported by WBStack and the installation of them would take some expertise.

  • WBStack’s service has been running for one year now but, as a free alpha, it provides no guarantees.

    For example, a recent update of some of its software stack caused a short outage and an ongoing problem with label rendering on our Wikibase instance.

Conclusions

For the project, the case for using Wikibase is two-fold:

  1. Out-of-the-box support for a simple linked data model that can be SPARQL-ed.
  2. The use of the wiki’s data-table, graphing & mapping widgets, for rapid prototyping and for inclusion in WCS web pages.

As it stands, the WBStack Wikibase is useful for (1) but not (2).

I’m thinking that we should keep it on the back burner for now – while we find out what the front-end needs. Its support of (1) might turn out to be a good enough reason to use it, although there are alternatives – including the use of a standalone triple store; or, if we have just a few datasets, building our own linking software and file-based store. Not having (2) means extra work for us to build/configure widgets for graphing, mapping, etc.

A prototype data grid & graph over data about waste

The interactive data grid with a linked graph is a tool that is often used to aggregate, dissect, explore, compare & visualise datasets. Might such a tool help our users explore and understand open data about waste? To help answer this, I have hacked together a web-based prototype…​

The working prototype

The working prototype can be accessed via this link.

The data

The prototype pulls together 4 datasets:

  1. “Generation and Management of Household Waste” (SEPA).
  2. “Carbon footprint [CO2e]” (SEPA).
  3. “Population Estimates (Current Geographic Boundaries)” (NRS).
  4. “Mid-Year Household Estimates” (NRS).

The datasets are fetched from statistics.gov.scot and Wikidata, using SPARQL; then matched; and finally, the per-citizen and per-household values are calculated.
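
As an indication of the ‘matched’ and ‘calculated’ steps, here is a minimal Clojure sketch: waste records are joined to population records on area and year, and a per-citizen value is derived. The field names are illustrative, not the notebook’s actual code.

;; Join waste records to population records on [area year],
;; then derive tonnes per citizen.
(defn with-per-citizen [waste-records population-records]
  (let [population (into {} (map (juxt (juxt :area :year) :population)
                                 population-records))]
    (for [{:keys [area year tonnes] :as record} waste-records
          :let [pop (population [area year])]
          :when pop]
      (assoc record :tonnes-per-citizen (/ tonnes pop)))))

;; e.g. (with-per-citizen [{:area "Aberdeen City" :year 2018 :tonnes 97000}]
;;                        [{:area "Aberdeen City" :year 2018 :population 227560}])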

The result is 17,490 data records.

The build

The data was assembled using this executable Jupyter notebook. For a production-class implementation, that could easily be coded as an automated, periodic process.

The web app, containing the interactive data grid with a linked graph, was built using the DevExtreme web component library. Alternative libraries are viable, but the DevExtreme one is modern and free for non-commercial use.

The resulting data assembly and web app are stored as static files in the project’s GitHub repositories.

Its features

The prototype’s web page contains a graph and a configurable data grid. The graph automatically reflects the data selected in the data grid.


Detailed information about a graph’s data point is shown when the user hovers over it with the cursor.

screenshot graph hover

The graph can be zoomed/unzoomed, and its current contents can be printed or saved as PNG, PDF, etc.

screenshot graph saving

The data grid’s expand/collapse arrow-head icons allow the user to drill down into slices of data. Below, we’ve expanded the Stirling → Recycled slice to reveal the data values per-material.

screenshot drilldown

The data grid’s “Show Field Chooser” icon pops up a control panel to allow the user to select data dimensions, axis assignments, value ranges, value filters, display order, etc.

screenshot field chooser panel

The data grid’s “Export to Excel file” icon will export the grid’s currently selected data to an Excel spreadsheet.


The resulting Excel files are nice because the export functionality preserves user-friendly fixed headers and some other formatting.


Finally, the prototype operates well on phones and tablets (although there is a sizing issue with pop-up panels that I haven’t investigated).


But, is it useful?

So, might (a production-class version of) such a tool help our users to explore and understand open data about waste? Well, we won’t know until we have user-tested it, but my guess is that:

  1. users with no data analysis experience will find its configurability difficult to navigate.
  2. users with low-to-medium data analysis experience may find it useful as a single tool containing multiple datasets.
  3. users with medium-to-high data analysis experience will prefer to use their own tools.

A presets feature has been added to the tool so that users can go to a particular configuration & data selection simply by clicking on a hyperlink. This supports an easy-access route to the tool for users with no data analysis experience, by answering their potential questions through presets such as:

Mocking-up features in a placeholder WCS web application

The narratives in Anna and Hannah’s Scenarios document tantalise with mentions of the features supported by their fictional Waste Commons Scotland (WCS) web application. This week, mocked versions of some of those features have been added to the placeholder WCS web application (source code) – with the idea that animating them will make the features easier to understand and assess.