This project needs more presence online. Thanks for prompting this, Ian! (And be sure to read Ian’s posting on the state of open data.) So, this week…
- Anna & I have made a start on revamping this, the project’s public WordPress site, which Anna created late last year. This site is accessible at https://campuspress.stir.ac.uk/datacommonsscotland. The idea is that we’ll publish on it limited-lifespan information such as relevant happenings & blog postings and take feedback comments.
- Also, I’ve created a public GitHub site at https://github.com/data-commons-scotland for some of the project’s longer-lifespan outputs such as concepts/models, standards, research output and open source code. GitHub will continue to preserve these (hopefully useful) outputs beyond the lifespan of this project. I’ve made a start by adding some investigation reports (dcs-shorts) and example web application source code (dcs-wcs).
The narratives in Anna and Hannah’s “Scenarios” document tantalise with mentions of the features supported by their fictional Waste Commons Scotland (WCS) web application. This week, mocked versions of some of those features have been added to the placeholder WCS web application (source code) – with the idea that their animation will make the features easier to understand and assess.
Kudos to Stirling Council for being the only Scottish local authority to have published household waste collection data as open data. This data is contained in their waste-management dataset. It consists of:
- Core data, as per-year CSV files.
- Metadata that includes a basic schema for the CSV files, maintenance information and a descriptive narrative.
For that, Stirling Council have attained 3 stars on this openness measure.
To reach 5 stars, that data would have to be turned into linked open data, i.e. gain the following:
- URIs denoting things. E.g. have a URI for each waste type, each collection route and each measurement.
- Links to other data to provide context. E.g. reference commonly accepted identifiers/URIs for dates, waste types and route geographies.
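As a rough illustration of those two ingredients, the Python sketch below mints a URI for a waste type and a collection route, and links a year to an external identifier. The `http://example.org/waste-management/` namespace and the `mint_uri` and `year_uri` helpers are hypothetical, and the `reference.data.gov.uk` year pattern is an assumption that should be checked against the vocabulary actually used.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use a domain it controls.
BASE = "http://example.org/waste-management/"


def mint_uri(kind: str, label: str) -> str:
    """Build a URI for a thing from its kind and a human-readable label."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"


def year_uri(year: int) -> str:
    """Link to an external identifier for a calendar year (assumed pattern)."""
    return f"http://reference.data.gov.uk/id/year/{year}"


waste_type = mint_uri("waste-type", "Garden Waste")
route = mint_uri("route", "Route 12")
period = year_uri(2018)
```

Minting identifiers from labels like this is only a starting point; stable URIs would normally be assigned once and then maintained, rather than regenerated from text that may change.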
This week I investigated aspects of what would be involved in gaining those extra two stars.
This executable notebook steps through the nitty-gritty of doing that. The steps include:
- Mapping the data into the vocabulary for the statistical data cube structure – as defined by the W3C and used by the Scottish government’s statistics office.
- Mapping the date values to the date-time related vocabulary – as defined by the UK government.
- Defining placeholder vocabularies for waste types and collection routes. Future work would be to: map waste types to (possibly “rolled-up”) values in a SEPA-defined vocabulary; and map collection routes to a suitable geographic vocabulary.
- Converting the CSV source data into RDF data in accordance with the above mappings. This results in a set of .ttl (RDF Turtle syntax) files.
- Loading the .ttl files into a triplestore database so that their linked data graph can be queried easily.
- Running a few SPARQL queries against the triplestore to sanity-check the linked data graph.
- Creating an example infographic (showing the downward trend in missing bins) from the linked data graph.
- It took a not insignificant amount of consideration to convert the 3-star non-linked data to (almost) 5-star linked data. But I expect that the effort involved will tail off if we similarly convert further datasets, because of the experience and knowledge gained along the way.
- Having a linked data version of the waste-management dataset promises to make its information more explicit and more composable. But for the benefits to be fully realised, more cross-linking needs to be carried out. In particular, we need to map waste types to a common (say, SEPA-controlled) vocabulary; and map collection routes to a common geographic vocabulary.
- We might imagine that if such a linked dataset were to be published & maintained – with other local authorities contributing data into it – then SEPA would be able to directly and constantly harvest its information, so making periodic report preparation unnecessary.
- JimT and I have discussed how the Open Data Phase2 project might push for the publication of linked open data about waste, using common vocabularies, and how our Data Commons Project could aim to fuel its user interface using that linked open data. In other words, the linked open data layer is where the two projects meet.
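To make the CSV-to-RDF conversion step described above a little more concrete, here is a minimal Python sketch that turns CSV rows into Turtle observations using the W3C data cube vocabulary (`qb:`). It is not the notebook's actual code: the `example.org` namespace, the column names (`year`, `route`, `waste_type`, `tonnes`), the `ex:` properties and the sample rows are all made up for illustration, and the `reference.data.gov.uk` year URI pattern is an assumption.

```python
import csv
import io

# Hypothetical namespace; the real dataset and its schema may differ.
BASE = "http://example.org/waste-management/"

PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/waste-management/def/> .
"""

# Illustrative rows only; these are not real figures.
SAMPLE_CSV = """\
year,route,waste_type,tonnes
2018,Route 1,Garden Waste,12.5
2019,Route 1,Garden Waste,11.0
"""


def slug(text: str) -> str:
    """Lower-case a label and join its words with hyphens."""
    return "-".join(text.strip().lower().split())


def row_to_turtle(row: dict) -> str:
    """Render one CSV row as a qb:Observation in Turtle syntax."""
    year = int(row["year"])
    route = slug(row["route"])
    waste = slug(row["waste_type"])
    tonnes = row["tonnes"].strip()
    float(tonnes)  # raises ValueError if the measurement is not numeric
    return (
        f"<{BASE}observation/{year}/{route}/{waste}> a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset> ;\n"
        # Assumed URI pattern for linking a year to an external identifier.
        f"    ex:year <http://reference.data.gov.uk/id/year/{year}> ;\n"
        f"    ex:route <{BASE}route/{route}> ;\n"
        f"    ex:wasteType <{BASE}waste-type/{waste}> ;\n"
        f'    ex:tonnes "{tonnes}"^^xsd:decimal .\n'
    )


def csv_to_turtle(csv_text: str) -> str:
    """Convert CSV text into a Turtle document of observations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(r) for r in reader)


turtle_text = csv_to_turtle(SAMPLE_CSV)
```

The resulting text could be written to a .ttl file and loaded into a triplestore for querying. A fuller conversion would also declare the cube's structure (its dimensions and measures), which this sketch leaves out.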
As the Open Data Institute says in its Guide, What is ‘open data’ and why should we care?:
You can’t go 10 minutes without hearing about data these days. “Data blogs.” “Big data.” “Data protection.” “Data.” “Daahta.”
Data are described as the “new oil”, promising improvements in economic growth; transparency, accountability and governance; health; and more. The Open Data movement aims to put data, and therefore the associated benefits, into the “commons” – that is, the common wealth of resources that belong to all of us. Open Data are data that are available to everyone to access, use and share.
But although moves to make community, commercial, health, governance and scientific/social research data freely and publicly available hold out the possibility of improvements in many spheres, for many people, the number and range of people actually making use of Open Data remains limited. We believe this is at least in part because making data available is not enough to make it truly Open – data also need to be made useable.
Data Commons Scotland is a collaborative, interdisciplinary partnership, funded by the EPSRC, that is undertaking research and design work to try to find ways of addressing this problem.
We’re undertaking a case study around waste management and recycling data with the aim of better understanding the issues surrounding making data open in an effective way.