New tools and approaches for Coverage data on the Web

In the MELODIES project we are using environmental data of many different kinds, from individual point measurements of soil moisture to satellite imagery. Although these data are highly diverse, they can all be modelled as coverages. In this blog post, I’ll explain what a coverage is, outline some of the software we’re developing in MELODIES and talk about some of the challenges we still face.

What is a coverage?

A coverage is simply a data structure that maps points in space and time to values (this is defined in the ISO 19123 specification). For example, an aerial photograph can be thought of as a coverage that maps positions on the ground to colours. A river gauge maps points in time to flow values. A weather forecast maps points in space and time to values of temperature, wind speed, humidity and so forth. One way to think of a coverage is as a mathematical function, where data values are a function of coordinates in space and time.

(N.B. Sometimes you’ll hear the word “coverage” used synonymously with “gridded data” or “raster data” but this isn’t really accurate. You can see from the above paragraph that non-gridded data (like a river gauge measurement) can also be modelled as coverages. Nevertheless, you will often find a bias toward gridded data in discussions (and software) that concern coverages.)
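
To make the "coverage as a function" idea concrete, here is a minimal Python sketch. The names and sample values are invented for illustration and are not MELODIES data. It treats a river gauge and a small grid as coverages: in both cases you supply a coordinate and get a value back, whether or not the data are gridded.

```python
from datetime import datetime

# Hypothetical river gauge readings: the domain is a set of times.
gauge_readings = {
    datetime(2016, 1, 1, 0): 12.4,   # flow values are made up
    datetime(2016, 1, 1, 6): 13.1,
    datetime(2016, 1, 1, 12): 15.8,
}


def gauge_coverage(time):
    """Map a point in time to a flow value."""
    return gauge_readings[time]


# Hypothetical 2 x 3 grid: the domain is a set of (row, column) grid cells.
grid_values = [
    [281.0, 282.5, 283.1],
    [280.2, 281.7, 282.9],
]


def grid_coverage(row, col):
    """Map a grid cell to a data value."""
    return grid_values[row][col]
```

Calling `gauge_coverage(datetime(2016, 1, 1, 6))` or `grid_coverage(1, 2)` looks up one value. A real coverage would usually also say what happens between the recorded points, which this sketch leaves out.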

A coverage can be defined using three main pieces of information:

  • The domain of the coverage is the set of points in space and time for which we have data values. For example, in a river gauge measurement, the domain is the set of times at which the flow was measured. In a satellite image, the domain is the set of pixels. In a weather forecast, the domain is a set of grid cells.
  • The range of the coverage is the set of measured, simulated or observed data values. A single coverage may record values for lots of different quantities; for example, a weather forecast predicts values for many things (temperature, humidity etc.) on the same domain. So the range of a coverage often consists of a number of lists of data values, one for each measured variable. Each element within each list corresponds to one of the elements of the domain (e.g. a pixel or grid cell).
  • The range metadata describes the range of the coverage, to help users to understand what the data values mean. This may include links to definitions of variables, units of measure and other bits of useful information.
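
The three pieces listed above can be sketched as a small Python data structure. This is only an illustration under my own assumptions: the class name `Coverage`, its field names and the sample values are hypothetical and are not taken from ISO 19123 or from the MELODIES software.

```python
from dataclasses import dataclass, field


@dataclass
class Coverage:
    """Illustrative container; not a MELODIES or ISO 19123 class."""
    domain: list    # points in space and/or time
    ranges: dict    # variable name -> list of values, one per domain element
    range_metadata: dict = field(default_factory=dict)  # variable name -> description

    def __post_init__(self):
        # Every list of values must line up with the domain, element for element.
        for name, values in self.ranges.items():
            if len(values) != len(self.domain):
                raise ValueError(
                    f"range '{name}' has {len(values)} values "
                    f"but the domain has {len(self.domain)} elements"
                )

    def value_at(self, name, point):
        """Look up the value of one variable at one domain element."""
        return self.ranges[name][self.domain.index(point)]


# A made-up forecast on three grid cells, identified here by (longitude, latitude).
forecast = Coverage(
    domain=[(0.0, 51.5), (0.5, 51.5), (1.0, 51.5)],
    ranges={
        "temperature": [281.2, 281.9, 282.4],
        "humidity": [0.81, 0.78, 0.75],
    },
    range_metadata={
        "temperature": {"units": "K", "label": "Air temperature"},
        "humidity": {"units": "1", "label": "Relative humidity"},
    },
)
```

Here `forecast.value_at("humidity", (1.0, 51.5))` reads one value from the range, and `forecast.range_metadata["temperature"]` tells the user how to interpret the temperature values.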

Usually, the most complex piece of information in the coverage is the definition of the domain. This can vary quite widely from coverage type to coverage type, as the list above shows. For this reason, coverages are often defined by the spatiotemporal geometry of their domain. You will hear people talking about “multidimensional grid coverages” or “timeseries coverages” or “vertical profile coverages” for example.

There is much more to say about coverages, but I’ll leave that for future posts! For now, you can just think of a coverage as an overarching concept that describes lots of different kinds of scientific data.

Coverage tools in MELODIES

One of our goals in MELODIES is to bring environmental data to a wider community, using the Web. We’re devising a means to encode coverage data in the JSON format (JavaScript Object Notation), which is probably the primary format for data exchange favoured by modern web developers. We have produced a draft specification for this format, which is on GitHub, here. This is still very much in development but comments are welcome! The key features of this format include:

  • Homogeneous support for a range of coverage types
  • Support for both continuous (e.g. temperature) and categorical (e.g. land cover classification) data types
  • Use of JSON-LD to enable semantic metadata to be embedded where appropriate
  • Semantic grouping of range members (e.g. to identify the different components of a vector or tensor field)

It’s our hope that this will become a standard format for coverage data exchange on the web, simplifying the job of developing web applications using this kind of data. I’m not going to go into the details here, but the specification uses ideas from both the OGC Coverage Implementation Schema (CIS) and the Climate and Forecast conventions for NetCDF files, in an attempt to create a result that works for a wide community.
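
To give a feel for what a JSON encoding of a coverage might look like, here is a rough Python sketch. The layout and field names below are my own illustration and are not taken from the draft specification, so please consult the specification itself for the real structure. The only borrowed convention is the JSON-LD `@context` key, shown here with a placeholder `example.org` vocabulary.

```python
import json

# Hypothetical layout: a 2 x 2 grid carrying one continuous and one
# categorical variable. None of these field names come from the draft spec.
coverage_doc = {
    "@context": {"ex": "http://example.org/vocab#"},  # placeholder vocabulary
    "type": "Coverage",
    "domain": {"type": "Grid", "x": [0.0, 0.5], "y": [51.0, 51.5]},
    "parameters": {
        "temperature": {
            "kind": "continuous",
            "unit": "K",
            "definition": "ex:air_temperature",
        },
        "land_cover": {
            "kind": "categorical",
            "categories": {"1": "forest", "2": "water"},
        },
    },
    "ranges": {
        # One value per grid cell, in row-major order (an assumption).
        "temperature": [281.2, 281.9, 280.7, 282.4],
        "land_cover": [1, 1, 2, 1],
    },
}

# Serialise to a JSON string and parse it back again.
encoded = json.dumps(coverage_doc, indent=2)
decoded = json.loads(encoded)
```

Because the whole coverage is plain JSON, a web application can parse it with the tools it already has, and the embedded metadata travels with the data values.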

We’re building a suite of tools and specifications around this format. These are currently highly experimental, but include:

The video below gives a demonstration of these in action on the MELODIES technology demonstrator portal (source code here). You will see a MELODIES land cover dataset being loaded, reclassified, subsetted and analysed, all in the web browser using JavaScript operations.

We’re working with the OGC/W3C Spatial Data on the Web Working Group to promote and formalise these new approaches, and we encourage others to adopt them and experiment with us!

Some challenges

There’s a lot of work to do in this field, much of it concerned with refinement, standardisation and adoption. There are quite a few conceptual hurdles to get over when dealing with some of the finer points of environmental data (another potential future blog post!). However, MELODIES is primarily a practical project, and there are some specific challenges that we’re working on to improve our coverage-related software stack:

  • Efficient (and accurate) reprojection of gridded data within the web browser
  • Automated intercomparison of different datasets, including the problem of automating unit conversions
  • Visualisation of uncertain data
  • Effective use of semantic metadata to enable interoperability with other datasets
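
As a small illustration of the unit conversion problem mentioned in the list above, here is a Python sketch that compares two datasets recorded in different units. It assumes each dataset carries a unit string in its metadata and relies on a tiny hand-written conversion table. The table, function names and values are all hypothetical, and a real solution would need a proper units library or vocabulary rather than a lookup like this.

```python
# Hypothetical conversion table keyed by (from_unit, to_unit).
CONVERSIONS = {
    ("degC", "K"): lambda v: v + 273.15,
    ("K", "degC"): lambda v: v - 273.15,
}


def convert(values, from_unit, to_unit):
    """Convert a list of values between units known to the table."""
    if from_unit == to_unit:
        return list(values)
    try:
        fn = CONVERSIONS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no conversion from {from_unit} to {to_unit}") from None
    return [fn(v) for v in values]


def mean_difference(a, b):
    """Mean of (a - b), after converting b into the units of a."""
    b_values = convert(b["values"], b["unit"], a["unit"])
    if len(b_values) != len(a["values"]):
        raise ValueError("datasets must have the same number of values")
    return sum(x - y for x, y in zip(a["values"], b_values)) / len(b_values)


# Made-up model output in kelvin and observations in degrees Celsius.
model = {"unit": "K", "values": [283.15, 285.15]}
observed = {"unit": "degC", "values": [9.0, 13.0]}
bias = mean_difference(model, observed)
```

The hard part in practice is not the arithmetic but knowing reliably which units each dataset uses, which is where the semantic metadata comes in.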

We’re recruiting a software developer to help us with these kinds of tasks in the final months of the project. Click here if you’re interested in applying! (Note that the closing date is 21st March 2016.)
