Evolving Cosmos Global Instance: new storage cluster released

Feb 3, 2016 · Tech

The emergence of Cygnus as a tool for building historic collections of context data triggered the use of a wide variety of storage back-ends, the Global Instance of Cosmos among them. That global instance was designed as a shared Hadoop cluster where storage and computing capabilities coexisted: 10 virtual machines, a total of 1 TB of disk and 80 GB of distributed memory did the rest. And it was good. For a while. A couple of years later, more than 600 registered users had exhausted the instance. Thus, time for a new beginning.

Some months ago our Big Data team at FIWARE predicted this exhaustion, and they drafted the foundations for a new Global Instance of Cosmos: more powerful, more modern, with extended services, and designed to last. These are the main decisions they took:

  • The cluster would be split between storage and computing: a dedicated cluster would be deployed for storage, and a second dedicated cluster would be in charge of computing.
  • Each cluster would remain private, in the sense that users would not be able to access it directly; instead, a services node would be deployed for each cluster as the single entry point to the instance.
  • Even on the services nodes, ssh access to the infrastructure would be avoided, and only service APIs would be exposed to the users.

Why divide storage and computing? This was the most difficult decision we made, since it is well known that Big Data clusters perform best when networking is avoided, i.e. when storage and computing live on the same machines and the data does not have to be moved. Nevertheless, we saw some advantages we could not ignore. The main one is that putting the data in a storage-only cluster makes us independent of the analysis technology: HDFS is nowadays regarded as a general-purpose storage mechanism used by a wide variety of computing platforms, so this allows us to move very easily in the future to a technology other than MapReduce.

Why make the clusters private? Mainly for security reasons: our aim was that no part of any new cluster would be directly reachable, unlike the old one, where the Namenode was directly exposed since it was holding the available catalogue of services. Now that responsibility falls on dedicated servers running clients connected to the clusters behind them.

Why avoid ssh? Because there is no need for shell-based access: all the Big Data services can be run remotely, using custom or general-purpose clients. And because shell-based access is an open door to misuse (not necessarily malicious) of the shared resources (applications may be installed, data can be left in the local accounts…).
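
As an illustration of what remote, client-based access looks like in practice, here is a minimal Python sketch that lists an HDFS directory through the WebHDFS REST API instead of opening a shell. The endpoint, user name and token below are placeholders for illustration, not the instance's actual values.

    import requests  # third-party HTTP client

    # Hypothetical values: replace with the real services node, user and token.
    BASE = "http://storage.cosmos.example.org:14000/webhdfs/v1"
    USER = "myuser"
    TOKEN = "my-oauth2-token"  # OAuth2 token obtained from the identity manager

    # List the contents of the user's HDFS home directory: WebHDFS exposes
    # file system operations as plain HTTP calls, so no shell is needed.
    resp = requests.get(
        f"{BASE}/user/{USER}",
        params={"op": "LISTSTATUS", "user.name": USER},
        headers={"X-Auth-Token": TOKEN},  # assumed OAuth2 protection header
    )
    resp.raise_for_status()
    for entry in resp.json()["FileStatuses"]["FileStatus"]:
        print(entry["type"], entry["pathSuffix"])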

That being said, today we can announce the results of this design: the deployment of the storage cluster has been completed, while the computing cluster setup is making good progress. Regarding the storage cluster (there will probably be another entry in this blog for the computing one), we can go deeper into the details [1]:

  • It is based on 11 nodes: 1 services node, 2 Namenodes (high availability configuration) and 8 datanodes.
  • Total raw storage capacity of 52.5 TB (17.5 TB of effective capacity if we consider a replication factor of 3).
  • HDFS version: 2.4.0
  • Available services: WebHDFS/HttpFS for I/O, with OAuth2-based protection (see the usage sketch after this list).
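
To give a feeling of how the I/O services are consumed, the following Python sketch writes a small file and reads it back through the WebHDFS/HttpFS API, passing the OAuth2 token on every request. It is only a sketch: the host, port, user, token and header name are assumptions for illustration, and the single-call upload assumes an HttpFS-style gateway on the services node; check the deployment notes [1] for the real values.

    import requests  # third-party HTTP client

    # Hypothetical values: replace with the real services node, user and token.
    BASE = "http://storage.cosmos.example.org:14000/webhdfs/v1"
    USER = "myuser"
    TOKEN = "my-oauth2-token"
    AUTH = {"X-Auth-Token": TOKEN}  # assumed OAuth2 protection header

    path = f"{BASE}/user/{USER}/demo/sample.txt"

    # Write: with an HttpFS-style gateway, data=true lets us send the file
    # content in a single call instead of following a redirect to a datanode.
    put = requests.put(
        path,
        params={"op": "CREATE", "user.name": USER, "data": "true", "overwrite": "true"},
        headers={**AUTH, "Content-Type": "application/octet-stream"},
        data=b"hello from the new storage cluster\n",
    )
    put.raise_for_status()

    # Read the file back with the OPEN operation.
    get = requests.get(path, params={"op": "OPEN", "user.name": USER}, headers=AUTH)
    get.raise_for_status()
    print(get.text)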

Great, but you may be wondering… how does this affect me? Since many FIWARE users are only interested in storing large amounts of data for future analysis, this is good news for them: they can now move from the old Global Instance to the new one. This is something all users will have to do in the mid term, although the old cluster will keep running in the short term. Nevertheless, doing it right now has several advantages:

  • Pioneers will have a large HDFS quota.
  • Since the new storage cluster is based on bare metal, it is faster than the old one.
  • The services installed run modern versions of the software; this means new features not available in the old cluster.
  • Users will experience fewer errors due to lack of disk space.

So, if you are interested, please contact our team… Just begin to pack and move!

Big Data Team at FIWARE.

[1] https://github.com/telefonicaid/fiware-cosmos/blob/master/doc/deployment_examples/cosmos/fiware_lab.md
