Linking geospatial raster data
In the Ocean Status Assessment Service, our objective is to support users in evaluating the compliance of coastal waters with the Marine Strategy Framework Directive, using open data to answer questions like:
- what data relate to this Directive?
- what data exist about sea surface temperature?
- what data intersect this area/time of interest?
- what processing occurred on data from this source (e.g. MyOcean)?
We generate geospatial datasets that support these decisions, but to work effectively users need to find subsets of these data and process them. Linked data techniques support this. A common ICT concept is the DIKW pyramid (see image below). Geospatial data on their own yield Information (what, where, when), but one needs the context around the data to reach Knowledge (how) and Wisdom (why). That context is precisely what linking gives to geospatial data.
The Data -> Information -> Knowledge -> Wisdom pyramid
Using linked data techniques, a geospatial dataset (say a cadastral map) can be connected to other static geospatial data, such as a thematic cartography. Links can also point to dynamic measurements (e.g. meteorology, traffic) and to loosely georeferenced material, such as social, political and economic data. A system based on linked geospatial raster data will allow newcomers to find useful data in a specific domain and allow expert users to explore correlated domains such as climatology. Our aim is to build a system where finding the answers to the above questions is as straightforward as buying products on Amazon, where filters narrow down a search and product selection is enabled by data linking between thousands of servers.
The first step in linking data is to define an ontology - the figure below depicts ours for Oceanography:
This geospatial data ontology not only helps users discover the most appropriate dataset for their use case; it is also a tool for carrying out complex analyses that exploit the machine-understandable information in the semantic data. Using RDF as the universal language for modelling data links and SPARQL for processing queries, developers can build systems for data analysis and manipulation simply by gathering together the appropriate resources from the web.
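To illustrate the idea of describing datasets as links and then querying them by pattern, here is a minimal sketch in plain Python. It is a stand-in for an RDF store and a SPARQL query, not our implementation: the `ex:` identifiers, the predicate names and the helper functions are all hypothetical and are not terms from the ontology in the figure.

```python
# Hypothetical dataset descriptions as (subject, predicate, object) triples.
# Every identifier here is illustrative, not a term from the real ontology.
TRIPLES = [
    ("ex:sst_med_2014", "ex:observedProperty", "ex:SeaSurfaceTemperature"),
    ("ex:sst_med_2014", "ex:source", "ex:MyOcean"),
    ("ex:sst_med_2014", "ex:relevantTo", "ex:MarineDirective"),
    ("ex:sal_med_2014", "ex:observedProperty", "ex:Salinity"),
    ("ex:sal_med_2014", "ex:source", "ex:MyOcean"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def subjects(triples, p, o):
    """Sorted, de-duplicated subjects that have predicate p with object o."""
    return sorted({t[0] for t in match(triples, p=p, o=o)})


# "What data exist about sea surface temperature?"
sst_datasets = subjects(TRIPLES, "ex:observedProperty", "ex:SeaSurfaceTemperature")

# "What data come from this source (e.g. MyOcean)?"
myocean_datasets = subjects(TRIPLES, "ex:source", "ex:MyOcean")

# Combining two patterns is a join on the shared subject.
sst_from_myocean = sorted(set(sst_datasets) & set(myocean_datasets))
print(sst_from_myocean)
```

A real system would hold these statements in a triple store and express the same patterns in SPARQL; the point of the sketch is only that each question becomes a pattern over links rather than a search through files.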
There is, however, a complication. The datasets that must be used in analysing coastal waters (and that we are producing) are raster datasets. Geospatial raster data are grids of data points which typically originate from sophisticated sensing of a geographical area: each point or pixel represents a portion of that area, observed in a specific band. Raster data provide representations of the world in terms of images (EO scenes, aerial photographs) or physical measures that vary continuously with space (temperature, salinity). Raster images constitute an enormous source of information, which is still largely unexploited.
The complication is that each of these raster datasets typically consists of millions of pixels. While each individual object in a vector dataset can be linked with RDF, using the same technique to describe every single pixel in a raster image is enormously resource consuming and of limited value in most cases. Ontologies applied to raster data should instead be designed to answer questions like: What type of information can I extract from this data? How can I combine different datasets for better value?
An example of a raster dataset showing sea water potential temperature in the Mediterranean. These data have dimensions in both space and time.
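A back-of-the-envelope count makes the cost of per-pixel description concrete. The grid dimensions and the number of statements per pixel below are assumptions chosen purely for illustration; they are not the dimensions of the dataset shown above.

```python
# Hypothetical grid dimensions, chosen only for illustration.
n_lon, n_lat, n_time = 1000, 400, 365

# Assumed: each pixel needs one statement each for value, longitude,
# latitude and time if it is described individually.
triples_per_pixel = 4

pixels = n_lon * n_lat * n_time
per_pixel_triples = pixels * triples_per_pixel

# Assumed: a few tens of statements are enough to describe the dataset
# as a whole (variable, source, extent, processing history).
dataset_level_triples = 20

print(f"{pixels:,} pixels -> {per_pixel_triples:,} per-pixel triples")
print(f"dataset-level description: {dataset_level_triples} triples")
```

Under these assumptions a single year of daily grids would already need hundreds of millions of statements, compared with a few tens for a dataset-level description.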
Fortunately, there are several approaches that we can explore to link raster data.
If raster data represent images, one may be interested in features within the image (a church, a stadium, a park). In this case the first step is to extract information from the image (image classification) and create a new vector representation of that geographic area, allowing RDF mapping of vector data to be done as described above.
A more complex case is where raster images represent continuously variable physical fields, as in our Marine Quality Assessment service. In this case, a simple approach is to slice the image using pre-defined thresholds, thus identifying individual areas/polygons associated with features of interest (e.g. areas where the salinity exceeds a critical threshold). This approach again resolves the problem by translating raster data into a vector representation that is suitable for RDF. The drawback is that the method forces the developer to choose the slicing thresholds beforehand, thus losing the richness of the original information.
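The threshold-slicing idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not our production code: the salinity grid and the threshold are invented, regions are reported as bounding boxes in grid indices rather than true polygons in geographic coordinates, and `regions_above` is a hypothetical helper.

```python
from collections import deque

# Hypothetical 5x6 salinity grid (rows = latitude, columns = longitude).
GRID = [
    [35.0, 35.2, 38.6, 38.9, 35.1, 35.0],
    [35.1, 38.5, 38.8, 39.0, 35.2, 35.1],
    [35.0, 35.3, 35.2, 35.1, 35.0, 38.7],
    [35.2, 35.1, 35.0, 35.2, 38.6, 38.9],
    [35.0, 35.0, 35.1, 35.0, 35.1, 35.2],
]
THRESHOLD = 38.5  # assumed critical value, fixed before any query is made


def regions_above(grid, threshold):
    """Group 4-connected cells with value >= threshold.

    Returns one tuple per region:
    (min_row, min_col, max_row, max_col, cell_count).
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or grid[r][c] < threshold:
                continue
            # Breadth-first flood fill from this unvisited cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in seen and grid[nr][nc] >= threshold:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            rs = [cell[0] for cell in cells]
            cs = [cell[1] for cell in cells]
            regions.append((min(rs), min(cs), max(rs), max(cs), len(cells)))
    return regions


regions = regions_above(GRID, THRESHOLD)
for region in regions:
    print(region)
```

Each resulting region is a discrete object that can be given an identifier and described with RDF like any other vector feature. The sketch also shows the drawback: everything below `THRESHOLD` is discarded, so a question about a different threshold requires going back to the raster.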
A third and more flexible strategy builds on recent array databases such as Rasdaman or MonetDB, which allow users to query both metadata and raster data using the SQL/MDA (Multi-Dimensional Arrays) standard. The same approach could be used to implement an RDF representation of raster data and information extraction. A SPARQL-like approach would then make it possible to extract and combine semantics from raster data without prior conversion into vector format, thus retaining the full information content. We are currently working in MELODIES to achieve this challenging goal.
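The benefit of querying the array directly is that the threshold becomes a parameter of the query rather than a decision baked into the data. The following plain-Python sketch illustrates only that idea. It is not Rasdaman, MonetDB or SQL/MDA syntax; the temperature cube and the helpers `subset` and `fraction_above` are hypothetical.

```python
# Hypothetical temperature cube indexed as CUBE[time][row][col].
CUBE = [
    [[14.0, 15.0], [16.0, 17.0]],
    [[15.0, 16.0], [18.0, 19.0]],
    [[16.0, 18.0], [20.0, 21.0]],
]


def subset(cube, t_range, r_range, c_range):
    """Trim the cube to half-open index ranges along each axis."""
    return [
        [row[c_range[0]:c_range[1]] for row in grid[r_range[0]:r_range[1]]]
        for grid in cube[t_range[0]:t_range[1]]
    ]


def fraction_above(cube, threshold):
    """Share of cells whose value exceeds a threshold chosen at query time."""
    values = [v for grid in cube for row in grid for v in row]
    return sum(v > threshold for v in values) / len(values)


# Area/time of interest: the last two time steps, all cells.
window = subset(CUBE, (1, 3), (0, 2), (0, 2))

# The same stored data answer questions about different thresholds.
share_above_17 = fraction_above(window, 17.0)
share_above_19 = fraction_above(window, 19.0)
print(share_above_17, share_above_19)
```

Because the original values are kept, a second question with a different threshold needs no reprocessing, which is the property the array-database strategy aims to preserve.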