A Cloud Platform aimed at environmental services
Welcome on board! That’s the message I recently had the pleasure of sending to all the MELODIES project partners who are developing environmental services on the MELODIES Cloud Platform. Each of these services aims to make the most of linked open data, and the MELODIES Cloud Platform allows them to scale up their processing of environmental and Earth observation information efficiently.
I’ll detail below how these services are implemented on the MELODIES Cloud Platform using Virtual Machines (VMs), with Hadoop Sandbox VMs and Strabon Database VMs forming the core technology. Once developed on the Platform, these services are ready to run compute-intensive tasks on demand in the Cloud.
To achieve this service level, the MELODIES Platform builds on three major outcomes of recent developments in Computer Science and Web technology: Cloud Computing, Open Data repositories, and Linked Open Data web resources.
Cloud Computing enablers: a large number of organizations have now entered the IT market with offerings in virtualization hypervisors, virtual machine management and monitoring, virtual private servers and other virtualization-based solutions. Numerous Cloud service delivery models have matured over time, contributing their share of fancy new acronyms: SaaS (Software as a Service), PaaS (Platform as a Service) and IaaS (Infrastructure as a Service). Above all, they bring opportunities for innovation and even some market disruption.
Open Data opportunities: Open Data relies mostly on the ubiquity of the Web and the massive digitization of our economic and social activities. One particularly promising Open Data domain is open government data, which is forming the basis of new relationships between government, business and citizens. Governments have generated and collected vast amounts of publicly funded datasets, and opening up access to them should lead to greater efficiency and more business opportunities for building value-added services.
Linked Open Data advantages: Many kinds of geospatial data are becoming available as linked datasets, leveraging the proliferation of open geospatial information on the Web (e.g. GeoNames, OpenStreetMap, user-generated geospatial content). Since much of the data useful to the wider public has geospatial and temporal components (e.g. open government data), we expect this trend to continue in the near future. Linked Data offers powerful technology to mine, filter and combine large datasets from diverse sources, and Linked Open Data technologies are designed to operate natively at the scale of the Web. Linked Data relies on the existing protocols of the World Wide Web to publish and access structured data (using RDF encoding and HTTP URIs). Understanding Linked Open Data as a Web-scale database, the ‘Web of data’, is a game changer.
So let’s look at what MELODIES can do to support the creation of environmental services, what the MELODIES Cloud Platform is used for, and how a beginner could use it.
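To make the point about HTTP URIs concrete, here is a minimal Python sketch of how a client dereferences a Linked Data URI: it sends an ordinary HTTP GET with an Accept header asking for an RDF serialization (Turtle here), the content negotiation that many Linked Data servers support. The DBpedia URI is only an illustration of a public Linked Data resource and is not part of MELODIES; the helper names lod_request and fetch_rdf are hypothetical, and the example builds the request without sending it.

```python
import urllib.request

# Hypothetical helpers illustrating HTTP content negotiation for Linked Data.


def lod_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a plain HTTP GET for `uri` that asks for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, media_type: str = "text/turtle", timeout: float = 10.0) -> str:
    """Dereference `uri` and return the response body as text.

    urlopen follows HTTP redirects, including the 303 See Other responses
    that many Linked Data servers use to point from a resource URI to a
    document describing it.
    """
    with urllib.request.urlopen(lod_request(uri, media_type), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# Build (but do not send) a request for a public Linked Data resource.
req = lod_request("http://dbpedia.org/resource/Athens")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling fetch_rdf with the same URI would perform the actual network request; asking for another media type, such as application/rdf+xml, only changes the Accept header.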
A Platform and a team
Team members from Terradue and the University of Athens provide user support to MELODIES partners through a ticketing system on the MELODIES support site. We do our best to respond quickly to partners’ issues there. The ticketing system is a convenient tool to keep track of progress, share experience and work efficiently as a team. Through the support site, each partner can request access via the MELODIES Platform to one or more of the following: a Hadoop Sandbox VM for developing, testing and validating scalable, parallel processing workflows; a Strabon Database VM for Linked Data ingestion and querying; and additional resources according to the individual architecture requirements of the service (e.g. storage, software). These VMs are CentOS-based and can be extended with other software services, most of which users install themselves (via RPMs managed in Yum or GitHub repositories). Users can also find examples and guidance in the extensive online documentation listed at the end of this post.
The Hadoop Sandbox VM type is dedicated strictly to integrating processors for scalable distributed computing. It scales out as a cluster of one Master node and several Hadoop Worker nodes; the Master node runs the services that orchestrate task distribution across the cluster. The Hadoop VM runs on a Cloud provider’s infrastructure and is accessible as an independent domain once instantiated. It operates primarily in a simulation mode (sandbox mode) and can later be deployed at scale (cluster mode).
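The Master/Worker split can be illustrated with a toy Python simulation. This is a sketch under loose assumptions, not how Hadoop is implemented: a hypothetical "master" deals input product names round-robin into one job per worker, and threads stand in for Worker nodes. Running it with a single worker loosely mirrors sandbox mode and running it with several workers loosely mirrors cluster mode; the function names and product names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def make_jobs(inputs: List[str], n_workers: int) -> List[List[str]]:
    """Master side: deal the inputs round-robin into one job per worker."""
    jobs: List[List[str]] = [[] for _ in range(n_workers)]
    for i, item in enumerate(inputs):
        jobs[i % n_workers].append(item)
    return jobs


def process(worker_id: int, job: List[str]) -> List[Tuple[int, str]]:
    """Worker side: placeholder processing that tags each output with its worker."""
    return [(worker_id, f"{item}.out") for item in job]


def run(inputs: List[str], n_workers: int) -> List[Tuple[int, str]]:
    """Distribute the jobs, let the simulated workers run, and gather results."""
    jobs = make_jobs(inputs, n_workers)
    results: List[Tuple[int, str]] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process, wid, job) for wid, job in enumerate(jobs)]
        for future in futures:
            results.extend(future.result())
    return results


products = [f"product_{n}" for n in range(7)]  # hypothetical inputs
sandbox = run(products, n_workers=1)  # single node, loosely like sandbox mode
cluster = run(products, n_workers=3)  # several nodes, loosely like cluster mode
```

The point of the comparison is that the same workflow yields the same set of outputs in both modes; only the way the work is distributed changes, which is what makes developing in sandbox mode and deploying in cluster mode practical.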
The Strabon Database VM type provides a semantic spatiotemporal RDF store. You can use it to store linked geospatial data that changes over time and to pose queries using two popular extensions of SPARQL, stSPARQL and GeoSPARQL. Strabon supports spatial data types and the serialization of geometric objects in the OGC standards Well-Known Text (WKT) and Geography Markup Language (GML). It also offers spatial and temporal selections, spatial and temporal joins, a rich set of spatial functions and support for multiple Coordinate Reference Systems.
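As an illustration of a spatial selection, the Python sketch below turns a longitude/latitude bounding box into a closed WKT polygon and embeds it in a GeoSPARQL query that selects features whose geometry lies within that area. It only builds the query text. The geo:hasGeometry / geo:asWKT modelling follows the OGC GeoSPARQL vocabulary; whether a given MELODIES dataset is modelled this way, and how the Strabon VM’s endpoint is called, are assumptions to check against the Strabon documentation. Strabon’s own stSPARQL dialect can express the same selection with its own function vocabulary. The helper names are hypothetical.

```python
# Hypothetical helpers: build a WKT area and a GeoSPARQL "within" query.

QUERY_TEMPLATE = """PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?feature ?wkt
WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "__AREA__"^^geo:wktLiteral))
}"""


def bbox_to_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Closed WKT POLYGON in lon/lat order (the default CRS84 axis order of
    geo:wktLiteral); the first vertex is repeated to close the ring."""
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def within_query(area_wkt: str) -> str:
    """Insert the WKT area into the query template."""
    return QUERY_TEMPLATE.replace("__AREA__", area_wkt)


area = bbox_to_wkt(23.0, 37.5, 24.5, 38.5)
# area == "POLYGON((23.0 37.5, 24.5 37.5, 24.5 38.5, 23.0 38.5, 23.0 37.5))"
query = within_query(area)
print(query)
```

A plain string template keeps the example dependency-free; a production service would rather pass the area as a bound parameter through a SPARQL client library to avoid malformed literals.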
What kind of data processing chains can a developer implement on the Platform?
We usually use the term "processor" to refer to data processing tools that can be embedded in the Hadoop Sandbox VM (e.g. the NEST and BEAM processors provided by the European Space Agency). These processors scale out on the Cloud in order to process high volumes of EO data massively. Each node of the Hadoop cluster receives a job description from your workflow and then processes that job using these embedded tools.
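Hadoop Streaming is one common way to run existing command-line tools on a Hadoop cluster: each mapper reads records from standard input and writes tab-separated key/value lines to standard output. Assuming each line of the job description is a reference to an EO product, a mapper could look like the Python sketch below. run_processor is a hypothetical placeholder for a call to an embedded tool such as a NEST or BEAM processor, and the MELODIES Sandbox may well wrap this step differently; the Developer Cloud Sandboxes documentation describes the actual conventions.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO


def run_processor(product_ref: str) -> str:
    """Hypothetical stand-in for an embedded EO tool (e.g. a NEST or BEAM
    processor). Here it only derives an output name from the input reference."""
    name = product_ref.rstrip("/").rsplit("/", 1)[-1]
    return f"{name}.processed"


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Turn each job-description line into a 'key<TAB>value' record."""
    for line in lines:
        ref = line.strip()
        if not ref or ref.startswith("#"):
            continue  # skip blank lines and comment lines
        yield f"{ref}\t{run_processor(ref)}"


def main(stream: Optional[TextIO] = None, out: Optional[TextIO] = None) -> None:
    """Streaming entry point: read job lines from stdin, write records to stdout."""
    source = stream or sys.stdin
    sink = out or sys.stdout
    for record in map_lines(source):
        print(record, file=sink)
```

main() is deliberately not called at import time, so the mapping logic can be tested without a cluster; a streaming job would run a small script whose entry point calls it.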
Semantic geospatial RDF stores also play a prominent role as the core data repository in MELODIES. Strabon outperforms every other geospatial RDF store in terms of functionality and performance. Strabon helps to transform Earth observation products into RDF, to combine these products with other relevant linked geospatial data sources, and to query them in a user-friendly language that provides efficient array manipulation capabilities.
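To illustrate what transforming an EO product into RDF can look like, the Python sketch below describes a product’s acquisition date and footprint as N-Triples, attaching the footprint through the GeoSPARQL geo:hasGeometry / geo:asWKT pattern and typing it as geo:wktLiteral. The example.org namespace, the acquisitionDate property and the sample product are hypothetical; a real MELODIES service would use its own vocabulary and the loading procedure described in the Strabon documentation.

```python
from typing import Dict, List

GEO = "http://www.opengis.net/ont/geosparql#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/melodies/"  # hypothetical namespace


def product_to_ntriples(product: Dict[str, str]) -> List[str]:
    """Describe one EO product (id, date, WKT footprint) as N-Triples lines."""
    subj = f"<{BASE}product/{product['id']}>"
    geom = f"<{BASE}geometry/{product['id']}>"
    return [
        # Acquisition date as a typed xsd:date literal (hypothetical property).
        f"{subj} <{BASE}ontology#acquisitionDate> \"{product['date']}\"^^<{XSD}date> .",
        # Link the product to a separate geometry resource.
        f"{subj} <{GEO}hasGeometry> {geom} .",
        # The footprint itself, serialized as a GeoSPARQL WKT literal.
        f"{geom} <{GEO}asWKT> \"{product['footprint']}\"^^<{GEO}wktLiteral> .",
    ]


sample_product = {  # hypothetical product metadata
    "id": "scene-0001",
    "date": "2014-03-01",
    "footprint": "POLYGON((23.0 37.5, 24.5 37.5, 24.5 38.5, 23.0 38.5, 23.0 37.5))",
}
lines = product_to_ntriples(sample_product)
print("\n".join(lines))
```

Keeping the geometry as its own resource, rather than attaching the WKT literal directly to the product, follows the GeoSPARQL feature/geometry split, which is what lets spatial queries over the footprints work alongside queries on the other product properties.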
As you can see, you can easily test your linked open data architecture, involving both massive Earth observation data processing and linked open data management, at very low cost on the MELODIES Cloud Platform. Part of our support activity at Terradue and the University of Athens is to help test these architectures before deployment to a commercial Cloud provider. We then also support users in the on-demand deployment of their full processing chains on the Cloud of their choice.
So, ready to sing our MELODIES song? Welcome on board then!
- Terradue’s DevOps Platform, OpenNebula Conference, Berlin, 25th September 2013
- Terradue’s Developer Cloud Sandboxes - User guide
- Terradue’s Developer Cloud Sandboxes - Reference manual
- Terradue’s Developer Cloud Sandboxes - How-to guide with hands-on exercises
- NKUA’s Strabon Database and stSPARQL - User guide
- NKUA’s Strabon Database and stSPARQL - Reference manual
- NKUA’s Strabon Database and stSPARQL - How-to guide with hands-on exercises