FIWARE: Enabling IoT – Big Data Ecosystems

Apr 22, 2016 · Ecosystem

IoT and Big Data are the new wave of the Future Internet. On the one hand, IoT enables the generation of large volumes of data from mobile, connected assets. On the other hand, Big Data analysis extracts insights that were not possible in the past. The full potential of this data can be unlocked through the creation of an IoT Big Data Ecosystem (IoTBDE), which allows organizations to capitalize on the business opportunities inherent in the availability of massive amounts of data and facilitates the development of the next generation of smart services.

These new ecosystems are driven by the availability of data. Such data can be classified by origin and by nature. The former distinguishes between public, private and operator data. The latter categorizes it as static, real-time, historical or inferred data.

Public data refers to data owned, generated and exposed by governments, municipalities or public agencies. Private data, on the other hand, is data owned by businesses, for instance, occupancy levels in a private parking lot. Finally, operator data is the data managed by mobile network operators and relates to operations carried out on their communication networks. Private and operator data can be offered on a commercial basis.

Static data refers to structural aspects of the world (locations of points of interest, streets, roads) which change little over time. It is typically offered in the form of downloadable datasets. Real-time data, on the other hand, comes from an IoT infrastructure and reports the dynamically changing status of entities in the real world. Historical data captures the evolution of an entity's characteristics over a period of time and, combined with geospatial properties, enables 4D analysis and representations. Last but not least, inferred data is insight derived from analytics performed by Big Data processes.

Until recently the open data movement has focused on public, static data. It is not uncommon to find cities, governments or public agencies improving transparency and enabling innovation through data openness. The process consists of indexing and publishing different data assets (datasets) through a portal, usually CKAN, or a spatial data infrastructure catalogue. Those datasets are published in different formats and represent information using different conventions (units of measurement, coordinate systems, etc.). If a developer wants to make use of them, a process of Search, Download, Extract, Transform and Load has to be undertaken. That is not only expensive and error-prone, but the amount of resources needed to perform it grows linearly with the number of providers or, equivalently, with application coverage. What is worse, if the data changes frequently, the process has to be repeated or automated to some extent.

A first approach to overcoming the problems posed by the (SD)ETL approach is REST APIs. Instead of publishing datasets, data providers expose a REST endpoint which accepts queries over the data offered. For instance, different weather data providers currently offer APIs to access meteorological data throughout the world. APIs allow data providers to curate and mash up data, offering added value to data consumers. But there is still a plethora of different APIs, and provider lock-in becomes a big issue. If a provider changes, or a new provider is added, applications must be adapted. Last but not least, including multiple providers is hard and expensive, as a new API and format have to be learnt and integrated.
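To make the lock-in problem concrete, the sketch below shows what consuming two weather providers with different, non-harmonized APIs typically looks like. Both providers, their URLs and their response layouts are invented for illustration.

import requests

# Hypothetical illustration of provider lock-in: each weather API requires its
# own endpoint, parameters and response handling. Providers, URLs and payload
# layouts below are invented for the example.
def porto_temperature_provider_a():
    r = requests.get("https://api.weather-a.example/forecast",
                     params={"city": "Porto", "country": "PT"})
    return r.json()["forecast"]["temp_c"]               # provider A's layout

def porto_temperature_provider_b():
    r = requests.get("https://weather-b.example/v1/cities/porto/today")
    return r.json()["data"][0]["temperature"]["value"]  # provider B's layout

# Switching or adding a provider means writing and maintaining new glue code.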

API harmonization for data access is a first step towards enabling a real IoTBDE. The idea consists of defining a set of normalized operations, conventions and associated abstractions for querying data. As a result, the integration of different data sources becomes uniform and interoperable. Furthermore, developers themselves can expose additional data using the same API, expanding and leveraging the overall ecosystem.

FIWARE is a champion of API harmonization. In fact, NGSI version 2 is an open, RESTful API that allows providers to uniformly export data of different natures and origins using JSON representations. Below we describe how NGSI version 2 can be used to access, in a uniform manner, data of different natures and along different spatial or temporal dimensions:

Which Mercedes vehicles are currently within a radius of 10 km of Gangnam-gu?

GET /v2/entities?type=Vehicle&coords=37.496667,127.0275&geometry=point&georel=near;maxDistance:10000&q=manufacturer:'Mercedes Benz'

Tell me vehicle faults which happened today
GET /v2/entities?type=VehicleFault&q=startDate>=2016-04-20T00:00:00
 
Tell me the weather forecast for the city of Porto, Portugal
GET /v2/entities?type=WeatherForecast&q=country:PT;addressLocality:Porto
 
What were the ambient conditions observed at 11:00 AM at the "Plaza de España" air quality station?
GET /v2/entities?type=AmbientObserved&q=stationCode:28079004;hour:11
 


As shown above, data of different natures and origins can be exposed using the same API. That is a significant step forward. Data providers no longer need to create datasets; instead, they publish their NGSI version 2 endpoints in an API directory, such as CKAN.
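As an illustration, a developer could consume such an endpoint with a few lines of client code. The sketch below assumes a hypothetical NGSI version 2 broker URL and the Python requests library; the query parameters mirror the examples above.

import requests

BASE_URL = "https://broker.example.org"   # hypothetical NGSI v2 endpoint

def get_entities(entity_type, query=None, **geo_params):
    # Query GET /v2/entities for entities of a given type, optionally
    # filtered by the Simple Query Language (q) and geographical parameters.
    params = {"type": entity_type}
    if query:
        params["q"] = query
    params.update(geo_params)
    response = requests.get(BASE_URL + "/v2/entities", params=params)
    response.raise_for_status()
    return response.json()                # a JSON array of matching entities

# Weather forecast for Porto, Portugal (same query as above)
forecasts = get_entities("WeatherForecast", query="country:PT;addressLocality:Porto")

# Mercedes vehicles near Gangnam-gu (geo query parameters from the first example)
vehicles = get_entities(
    "Vehicle",
    query="manufacturer:'Mercedes Benz'",
    georel="near;maxDistance:10000",
    geometry="point",
    coords="37.496667,127.0275",
)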

However, even if two providers implement the same API, there can be differences in data structure and representation. For instance, is relative humidity given as a relative or as an absolute value? Is wind direction represented as an angle or as a cardinal point? What is the name of the property that represents relative humidity: 'relH', 'relativeHumidity' or 'relative_humidity'? Is the validity of a weather forecast grouped as an object with two properties ('from', 'to') or given as two top-level independent properties? These questions suggest that harmonized APIs are a necessary but not sufficient condition to foster developer-friendly IoT and Big Data Ecosystems. If data models are not harmonized, developers are, in practice, forced to change their application when porting it to another context (e.g. a different city).
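The following sketch illustrates the point with an invented example: two hypothetical providers report the same relative-humidity reading under different names and units, forcing per-provider glue code that a harmonized data model would make unnecessary.

# Illustrative only: invented payloads from two hypothetical providers.
provider_a = {"relH": 0.65}              # fraction, provider-specific name
provider_b = {"relative_humidity": 65}   # percentage, different name

def normalize_humidity(payload):
    # Per-provider adaptation code that harmonized data models would remove.
    if "relH" in payload:
        return payload["relH"] * 100     # fraction -> percentage
    if "relative_humidity" in payload:
        return float(payload["relative_humidity"])
    raise KeyError("unknown humidity representation")

assert normalize_humidity(provider_a) == normalize_humidity(provider_b) == 65.0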

Harmonizing data models means creating a shared vocabulary of terms and relationships that provides uniformity in the representation of different concepts (parking, public transport, weather, …). Together, harmonized APIs and data models enable the creation of applications that are portable at the data level. FIWARE has started an agile, implementation-driven process to devise these harmonized data models. In a previous post we described the first results obtained in cooperation with GSMA and Korea Telecom. In fact, FIWARE, Telefónica and other partners showcased at MWC2016, in the GSMA Innovation City, a car navigator capable of exploiting different real-time data (environmental and parking data) offered by several cities in different countries. This car navigator was built regardless of the city, as all of them exported harmonized APIs and data models for data coming from different sensors or external systems. Such an application is a salient example of the extraordinary opportunity behind IoT and Big Data ecosystems for smart applications, and of the potential for new businesses for telco operators, data providers, application developers and systems integrators in this vibrant space.
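To illustrate how harmonized data models and the harmonized API come together, the sketch below publishes a forecast entity through the same NGSI version 2 interface. The broker URL and the attribute names are assumptions made for illustration, not the official FIWARE data model.

import requests

BASE_URL = "https://broker.example.org"   # hypothetical NGSI v2 endpoint

# Illustrative entity; attribute names are assumed, not the official model.
entity = {
    "id": "Porto-WeatherForecast-2016-04-22",
    "type": "WeatherForecast",
    "address": {"type": "StructuredValue",
                "value": {"addressLocality": "Porto", "country": "PT"}},
    "temperature": {"type": "Number", "value": 16},
    "relativeHumidity": {"type": "Number", "value": 0.65},
}

# POST /v2/entities creates the entity in the context broker, making it
# queryable with the same GET requests shown earlier.
response = requests.post(BASE_URL + "/v2/entities", json=entity)
response.raise_for_status()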
