FIWARE: Enabling IoT – Big Data Ecosystems

Apr 22, 2016 | Ecosystem

IoT and Big Data are the new wave of the Future Internet. On the one hand, IoT enables the generation of large volumes of data from mobile connected assets. On the other hand, Big Data analysis yields insights that were not possible in the past. The full potential of this data can be unlocked through the creation of an IoT Big Data Ecosystem (IoTBDE), which will allow organizations to capitalize on the business opportunities inherent in the availability of massive amounts of data and will facilitate the development of the next generation of smart services.

These new ecosystems are driven by the availability of data. Such data can be classified by origin and by nature. The former classification distinguishes between public, private and operator data. The latter divides it into static, real-time, historical and inferred data.

Public data refers to data owned, generated and exposed by governments, municipalities or other public agencies. Private data, on the other hand, is data owned by businesses, for instance occupancy levels in a private parking lot. Finally, operator data is the data managed by mobile network operators and relates to operations performed on their communication networks. Private and operator data can be offered on a commercial basis.

Static data refers to structural aspects of the world (locations of points of interest, streets, roads) which do not change much over time. It is typically offered in the form of downloadable datasets. Real-time data, on the other hand, comes from an IoT infrastructure and reports the dynamically changing status of entities in the real world. Historical data captures the evolution of the characteristics of an entity over a period of time and, combined with geospatial properties, enables 4D analysis and representations. Last but not least, inferred data is insight derived from analytics performed by Big Data processes.

Until recently, the open data movement has focused on public, static data. It is not uncommon to find cities, governments or public agencies improving transparency and enabling innovation through data openness. The process consists of indexing and publishing different data assets (datasets) using a portal, usually CKAN, or a spatial data infrastructure catalogue. Those artifacts are published in different formats and represent information using different conventions (units of measurement, coordinate systems, etc.). A developer who wants to make use of them has to undertake a process of Search, Download, Extract, Transform and Load. That process is not only expensive and error-prone: the resources needed to perform it also increase linearly with the number of providers or, equivalently, with application coverage. Worse still, if the data changes frequently, the process has to be repeated or automated to some extent.

A first approach to overcoming the problems posed by (SD)ETL is the use of REST APIs. Instead of publishing datasets, data providers expose a REST endpoint which accepts queries over the data. For instance, different weather data providers currently offer APIs that give access to meteorological data throughout the world. APIs allow data providers to curate and mash up data, offering added value to data consumers. However, there is still a plethora of different APIs, which makes provider lock-in a big issue. If a provider changes, or a new provider is added, applications must be adapted. Last but not least, including multiple providers is hard and expensive, as a new API and format have to be learnt and integrated each time.

API harmonization for data access is a first step towards enabling a real IoTBDE. The idea consists of defining a set of normalized operations, conventions and associated abstractions for querying data. As a result, the integration of different data sources becomes uniform and interoperable. Furthermore, developers themselves can export additional data using the same API, expanding and leveraging the overall ecosystem.

FIWARE is a champion of API harmonization. NGSI version 2 is an open, RESTful API that allows providers to export data of different nature and origin uniformly, using JSON representations. The examples below show how NGSI version 2 can be used to access, in a uniform manner, data of different nature and across different spatial or temporal dimensions:

Which Mercedes vehicles are currently within a 10 km radius of Gangnam-Gu?

GET /v2/entities?type=Vehicle&coords=37.496667,127.0275&geometry=point&georel=near;maxDistance:10000&q=manufacturer:'Mercedes Benz'

Tell me vehicle faults which happened today

GET /v2/entities?type=VehicleFault&q=startDate>=2016-04-20T00:00:00

Tell me the weather forecast for the city of Porto, Portugal

GET /v2/entities?type=WeatherForecast&q=country:PT;addressLocality:Porto

What were the ambient conditions observed at 11:00 AM at the "Plaza de España" air quality station?

GET /v2/entities?type=AmbientObserved&q=stationCode:28079004;hour:11

As shown above, data of different nature and origin can be exposed using the same API. That is a significant step forward. Data providers no longer need to create datasets; they only need to publish their NGSI version 2 endpoints to an API directory, such as CKAN.
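To make the uniformity concrete, the four questions above can be served by one small helper. The following is only a minimal sketch: the broker address `http://broker.example.org` and the helper name `ngsi_query_url` are hypothetical, and the `maxDistance:10000` modifier form is assumed from the published NGSI v2 specification. It builds the request URLs and percent-encodes the spaces, quotes and separators that the queries contain; it does not send any request.

```python
from urllib.parse import urlencode

BROKER = "http://broker.example.org"  # hypothetical NGSI v2 endpoint


def ngsi_query_url(base_url, entity_type, q=None, geo=None):
    """Build an NGSI v2 entity query URL (illustrative helper, not part of any SDK).

    q   -- optional filter expression, passed through unchanged
    geo -- optional dict with the coords, geometry and georel parameters
    """
    params = {"type": entity_type}
    if q:
        params["q"] = q
    if geo:
        params.update(geo)
    # urlencode percent-encodes reserved characters such as ' ; : and spaces.
    return f"{base_url}/v2/entities?{urlencode(params)}"


urls = [
    # Mercedes vehicles within 10 km of Gangnam-Gu
    ngsi_query_url(
        BROKER,
        "Vehicle",
        q="manufacturer:'Mercedes Benz'",
        geo={
            "coords": "37.496667,127.0275",
            "geometry": "point",
            "georel": "near;maxDistance:10000",
        },
    ),
    # Vehicle faults since the start of the day
    ngsi_query_url(BROKER, "VehicleFault", q="startDate>=2016-04-20T00:00:00"),
    # Weather forecast for Porto, Portugal
    ngsi_query_url(BROKER, "WeatherForecast", q="country:PT;addressLocality:Porto"),
    # Ambient observation at 11:00 at one air quality station
    ngsi_query_url(BROKER, "AmbientObserved", q="stationCode:28079004;hour:11"),
]

for url in urls:
    print(url)
```

Only the entity type and the filter expression change between the four calls; the path, the parameter names and the encoding stay the same, which is the point of a harmonized API.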

However, even if two providers implement the same API, there can be differences in data structure and representation. For instance, is relative humidity given as a relative or as an absolute value? Is wind direction represented as an angle or as a cardinal point? What is the name of the property which represents relative humidity: ‘relH’, ‘relativeHumidity’ or ‘relative_humidity’? Is the validity of a weather forecast grouped as an object with two properties (‘from’, ‘to’) or given as two independent top-level properties? These questions suggest that harmonized APIs are a necessary but not a sufficient condition to foster developer-friendly IoT and Big Data ecosystems. If data models are not harmonized, developers are, in practice, forced to change their application when porting it to another context (e.g. a different city).
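The cost of non-harmonized data models can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the two providers, their field names (‘validFrom’, ‘validTo’), and the target shape (relative humidity as a fraction between 0 and 1, validity as a ‘from’/‘to’ object) are hypothetical and are not the FIWARE harmonized models. The point is that the application needs one adapter per provider even though both are queried through the same API.

```python
def normalize_provider_a(record):
    """Hypothetical provider A: 'relH' as a percentage, validity as two top-level fields."""
    return {
        "relativeHumidity": record["relH"] / 100.0,
        "validity": {"from": record["validFrom"], "to": record["validTo"]},
    }


def normalize_provider_b(record):
    """Hypothetical provider B: 'relative_humidity' as a 0..1 fraction, validity nested."""
    return {
        "relativeHumidity": record["relative_humidity"],
        "validity": {
            "from": record["validity"]["from"],
            "to": record["validity"]["to"],
        },
    }


# One adapter per provider: this table grows with every new data source.
ADAPTERS = {"a": normalize_provider_a, "b": normalize_provider_b}

forecast_a = {
    "relH": 65,
    "validFrom": "2016-04-22T00:00:00",
    "validTo": "2016-04-22T06:00:00",
}
forecast_b = {
    "relative_humidity": 0.65,
    "validity": {"from": "2016-04-22T00:00:00", "to": "2016-04-22T06:00:00"},
}

harmonized_a = ADAPTERS["a"](forecast_a)
harmonized_b = ADAPTERS["b"](forecast_b)
```

With harmonized data models, both providers would already return the same shape and the adapter table would not be needed.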

Harmonizing data models means creating a shared vocabulary of terms and relationships that provides uniformity in the representation of different concepts (parking, public transport, weather, …). Together, harmonized APIs and data models make it possible to create applications that are portable at the data level. FIWARE has started an agile, implementation-driven process to devise those harmonized data models. In a previous post we described the first results obtained in cooperation with GSMA and Korea Telecom. In fact, FIWARE, Telefónica and other partners showcased at MWC2016, in the GSMA Innovation City, a car navigator capable of exploiting different real-time data (environmental and parking data) offered by several cities in different countries. This car navigator was built independently of any particular city, as all of them exported harmonized APIs and data models for data coming from different sensors or external systems. Such an application is a salient example of the extraordinary opportunity behind IoT and Big Data ecosystems for smart applications, and of the potential for new business for telco operators, data providers, application developers and systems integrators in this vibrant space.
