Linked Open Environmental Data and MELODIES

Linked Data

Imagine that a friend told you about the movie "Midnight in Paris" and you wanted to learn more about it. What would you do? Google it! Now imagine that you wanted to find the cinema showing the movie with the cheapest tickets, or the one showing it closest to you. What would you do? You could still use Google, but how many clicks would you need?

The solution to this kind of problem is Linked Data.

The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. By following these practices, data from diverse sources can use the same standard format, which allows them to be combined and integrated. This way of handling online data was not offered by the original Web, which concentrated mainly on the interchange of documents, but it is supported by the newly emerging Semantic Web or "web of data".

The LOD cloud

Open Data

Tim Berners-Lee, the inventor of the World Wide Web, proposed the "Five Stars of Open Data", a scheme that describes increasing levels of "openness". The five levels are the following:

  1. Data are available on the Web under an open license, even if only in human-readable form (e.g. a PDF document).
  2. Data are available as structured, machine-readable data (e.g. an Excel spreadsheet).
  3. Data use a non-proprietary format (e.g. a CSV file).
  4. Data use World Wide Web Consortium standards, such as URIs to identify important things and concepts.
  5. Data are linked to other people's data to provide context.
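
As a rough illustration of the last three levels, the sketch below starts from a small CSV table (3 stars), gives each row a URI (4 stars) and links it to an external resource (5 stars). The station records, the example.org namespace and the choice of DBpedia targets are assumptions made for this example; they are not MELODIES data.

```python
import csv
import io

# 3 stars: structured data in a non-proprietary format (hypothetical sample).
CSV_TEXT = """id,name,city
s1,Station One,Athens
s2,Station Two,Reading
"""

BASE = "http://example.org/station/"  # assumed namespace for our own data
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# External identifiers to link to (assumed for illustration).
CITY_LINKS = {
    "Athens": "http://dbpedia.org/resource/Athens",
    "Reading": "http://dbpedia.org/resource/Reading,_Berkshire",
}


def five_star_triples(csv_text):
    """Turn CSV rows into (subject, predicate, object) statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # 4 stars: every station is identified by a URI.
        subject = BASE + row["id"]
        triples.append((subject, RDFS_LABEL, row["name"]))
        # 5 stars: link the station to somebody else's data.
        city_uri = CITY_LINKS.get(row["city"])
        if city_uri is not None:
            triples.append((subject, BASED_NEAR, city_uri))
    return triples


if __name__ == "__main__":
    for triple in five_star_triples(CSV_TEXT):
        print(triple)
```

The lookup table stands in for the interlinking step; in practice the links would be discovered rather than written by hand.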

5-star steps by example

Linked Open Environmental Data

In the environmental domain, open data is becoming the norm: many datasets about the environment are being made available through various data portals, but only a small number of them can be found on the LOD (Linked Open Data) cloud. To accelerate the linking of these open datasets, the European Union has introduced the INSPIRE Directive, which requires public authorities across Europe to provide access to their environmental datasets through the adoption of a common framework. This framework means that datasets can be uniquely identified within a pan-European spatial data infrastructure.

Linking open environmental data is the main idea on which the MELODIES project is based. Using Linked Data, MELODIES will be able to collect data sources that reside in different data portals, query them and interconnect them with Semantic Web technologies, in order to build applications with great environmental value.

The RDF data model and the SPARQL query language

The common standard format used for expressing information about resources on the Semantic Web is the Resource Description Framework (RDF). The RDF data model allows us to exchange information among applications without loss of meaning. RDF is accompanied by the SPARQL query language, which is used for retrieving or modifying data expressed in RDF.
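
To make the data model concrete, here is a minimal sketch in plain Python that treats an RDF graph as a set of (subject, predicate, object) triples and answers the cinema question from the introduction by matching triple patterns. It imitates what a SPARQL engine does but is not one; the example.org vocabulary (shows, ticketPrice, title) and the prices are invented for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # assumed namespace and vocabulary

# An RDF graph is a set of (subject, predicate, object) triples.
GRAPH = {
    (EX + "MidnightInParis", RDF_TYPE, EX + "Movie"),
    (EX + "MidnightInParis", EX + "title", "Midnight in Paris"),
    (EX + "CinemaA", EX + "shows", EX + "MidnightInParis"),
    (EX + "CinemaA", EX + "ticketPrice", 9.5),
    (EX + "CinemaB", EX + "shows", EX + "MidnightInParis"),
    (EX + "CinemaB", EX + "ticketPrice", 7.0),
}

# The same question written as a SPARQL query (shown for comparison only,
# it is not executed by this sketch).
SPARQL_EQUIVALENT = """
PREFIX ex: <http://example.org/>
SELECT ?cinema WHERE {
  ?cinema ex:shows ex:MidnightInParis ;
          ex:ticketPrice ?price .
}
ORDER BY ?price
LIMIT 1
"""


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None plays the role of a variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def cheapest_cinema(graph, movie):
    """Join two patterns: ?c ex:shows <movie> and ?c ex:ticketPrice ?price."""
    prices = []
    for cinema, _, _ in match(graph, p=EX + "shows", o=movie):
        for _, _, price in match(graph, s=cinema, p=EX + "ticketPrice"):
            prices.append((price, cinema))
    return min(prices)[1] if prices else None


if __name__ == "__main__":
    print(cheapest_cinema(GRAPH, EX + "MidnightInParis"))
```

Because every statement has the same three-part shape, data from different sources can be merged simply by taking the union of their triple sets.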

For the representation of geospatial RDF data that changes over time, the stRDF data model can be used. This is an extension of RDF with geospatial and temporal capabilities. Similarly, the stSPARQL query language is used for querying geospatial linked data. A tutorial on stSPARQL, written by the Department of Informatics and Telecommunications at the University of Athens, is available online.
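
The kind of question such an extension is meant to answer can be sketched without any RDF tooling: "which observations lie inside this area and were valid on this day?" The Python below is only an analogy under stated assumptions. It does not use stRDF or stSPARQL syntax, and the observation records, coordinates and dates are made up.

```python
from datetime import date

# Each record carries a point geometry (lon, lat) and the period during
# which the observation is valid. All values are invented for illustration.
OBSERVATIONS = [
    {"id": "obs1", "lon": 23.7, "lat": 38.0,
     "valid": (date(2014, 7, 1), date(2014, 7, 3))},
    {"id": "obs2", "lon": 21.7, "lat": 38.2,
     "valid": (date(2014, 7, 2), date(2014, 7, 5))},
    {"id": "obs3", "lon": 23.8, "lat": 38.1,
     "valid": (date(2014, 8, 1), date(2014, 8, 2))},
]


def within_box(obs, min_lon, min_lat, max_lon, max_lat):
    """Spatial filter: is the point inside the bounding box?"""
    return (min_lon <= obs["lon"] <= max_lon
            and min_lat <= obs["lat"] <= max_lat)


def valid_on(obs, day):
    """Temporal filter: does the validity period contain the given day?"""
    start, end = obs["valid"]
    return start <= day <= end


def query(observations, box, day):
    """Combine both filters, as a spatio-temporal query would."""
    return [o["id"] for o in observations
            if within_box(o, *box) and valid_on(o, day)]


if __name__ == "__main__":
    area = (23.0, 37.5, 24.5, 38.5)  # (min_lon, min_lat, max_lon, max_lat)
    print(query(OBSERVATIONS, area, date(2014, 7, 2)))
```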

Linked Data in MELODIES

In the MELODIES project, auxiliary data that offers querying functionality is used to enrich the data produced by each of the environmental services. This allows the environmental data to be explored and manipulated in new and powerful ways. The datasets subsequently produced will follow the 5-star plan described by Tim Berners-Lee.

From a more technical point of view, the life-cycle of data in MELODIES is the following:

  1. Discover the right data sources (e.g., public sector data, Earth Observation data).
  2. Extract the schema of the data. This is done by constructing an OWL ontology that describes the data.
  3. Transform the source data into RDF. For this, new tools are being developed at the University of Athens that take Earth Observation data and other geospatial data as input and automatically transform them into RDF.
  4. Store the resulting data in the RDF store Strabon, which allows geospatial data that changes over time to be queried using the stSPARQL query language.
  5. Interlink the resulting RDF data with other available linked data, either within or outside MELODIES. For this step, new link discovery tools will be developed that connect RDF data by discovering useful topological and temporal relations that hold among them.
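
The interlinking step can be pictured with a small sketch: compare features from two datasets and emit a link whenever a topological relation (here, bounding boxes that intersect) or a temporal relation (validity periods that overlap) holds. This is not the MELODIES link discovery tooling; the datasets, the URIs and the relation vocabulary under example.org are hypothetical.

```python
from datetime import date
from itertools import product

REL = "http://example.org/relation/"  # hypothetical link vocabulary

# Features: bounding box (min_x, min_y, max_x, max_y) and validity period.
BURNT_AREAS = {
    "http://example.org/a/area1": {
        "box": (0, 0, 2, 2),
        "period": (date(2014, 7, 1), date(2014, 7, 10)),
    },
}
LAND_PARCELS = {
    "http://example.org/b/parcel1": {
        "box": (1, 1, 3, 3),
        "period": (date(2014, 7, 5), date(2014, 7, 20)),
    },
    "http://example.org/b/parcel2": {
        "box": (5, 5, 6, 6),
        "period": (date(2014, 7, 5), date(2014, 7, 20)),
    },
    "http://example.org/b/parcel3": {
        "box": (1, 1, 3, 3),
        "period": (date(2015, 1, 1), date(2015, 1, 2)),
    },
}


def boxes_intersect(a, b):
    """Topological test: do two axis-aligned boxes share any point?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def periods_overlap(a, b):
    """Temporal test: do two closed intervals share any instant?"""
    return a[0] <= b[1] and b[0] <= a[1]


def discover_links(source, target):
    """Return link triples for every pair where a relation holds."""
    links = []
    for (s_uri, s), (t_uri, t) in product(source.items(), target.items()):
        if boxes_intersect(s["box"], t["box"]):
            links.append((s_uri, REL + "intersects", t_uri))
        if periods_overlap(s["period"], t["period"]):
            links.append((s_uri, REL + "overlapsInTime", t_uri))
    return links


if __name__ == "__main__":
    for link in discover_links(BURNT_AREAS, LAND_PARCELS):
        print(link)
```

The sketch compares every pair of features, which is fine for a handful of records; for large datasets a link discovery tool would need some form of indexing to avoid the quadratic cost.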
