Big Data and the Cloud serve different use cases: the former is about aggregation, the latter about division. Big Data requires a lot of processing power and storage, and it runs on clusters of computers that share the workload in parallel. In the Cloud, by contrast, you get your own partition. Virtualization in the Cloud is about partitioning different programs, along with their data, away from one another. The good news is that the Cloud scales well. But how far can it really scale?
Big Data is like a supercomputer setup that demands a lot. While we can host a Big Data setup on the Cloud, the question is whether the Cloud can provide what it needs. If you host your Big Data setup with Amazon or Google, you can probably count on the processing power and phenomenal storage they boast about. But for a small-time Cloud provider, Big Data can become its worst nightmare, because it has only a limited number of CPUs and limited storage.
The other problem with Big Data is streaming, which is simply online data being processed on the fly. For this, we need a feed directly into the Cloud that hosts the Big Data setup. If we have to transfer this data from our company to the Cloud, network speed becomes the bottleneck: the transfer may take a while, and the stream is no longer live. Scientists who gather petabytes of data often ship it to their Cloud providers. But that is data that does not need to be streamed.
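To put rough numbers on the network problem, here is a minimal Python sketch of the arithmetic. The link speeds, the decimal definition of a petabyte, and the ideal (100%) link efficiency are illustrative assumptions, not measurements; the function name `transfer_time_seconds` is made up for this example.

```python
def transfer_time_seconds(data_bytes: float,
                          link_bits_per_second: float,
                          efficiency: float = 1.0) -> float:
    """Estimate how long it takes to push data_bytes over a network link.

    efficiency is the fraction of the nominal link speed that is actually
    usable (assumed 1.0 here, which is optimistic for a real network).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    # 8 bits per byte, divided by the usable bits per second.
    return data_bytes * 8 / (link_bits_per_second * efficiency)


# Assumed units: decimal petabyte and decimal gigabit.
PETABYTE = 10**15
GIGABIT_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400

if __name__ == "__main__":
    for gbps in (1, 10):
        seconds = transfer_time_seconds(PETABYTE, gbps * GIGABIT_PER_SECOND)
        print(f"1 PB over {gbps} Gbit/s: about {seconds / SECONDS_PER_DAY:.1f} days")
```

Under these assumptions, one petabyte needs about 92.6 days on a 1 Gbit/s link and about 9.3 days on a 10 Gbit/s link, which is why shipping the data can be the practical choice when it does not have to be streamed.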
So what is the solution? The answer is a Private or Hybrid Cloud. The part that depends on Big Data can be put into a private cloud or the private part of a Hybrid Cloud. This setup can have all the processing and storage power that the Big Data initiative requires, so you are dedicating your own Big Data cluster to your requirements. It also helps protect privacy, since this may be data that is meant for your eyes only.
So are Big Data and the Cloud a match? I would say yes, if you assess the situation clearly.