Nowadays, every enterprise is surrounded by data from many different sources, and that data keeps growing second by second. Dealing with such a huge amount of data is what we call the Big Data problem. So is Big Data actually a hard problem? Fortunately, we have Hadoop and NoSQL. Today, we compare the two in order to find their common and distinguishing features, and to answer the question: "What happens when we combine these two technologies?"
Each of these is closely associated with big data, so there’s overlap in terms of what they are designed to do. For example, they’re both great for managing large and rapidly growing data sets, and also good at handling a variety of data formats, even if those formats change over time.
The issue of data volume: both Hadoop and NoSQL can leverage commodity hardware working together as a cluster. To handle larger data sets, you simply add more hardware to the cluster in a model known as horizontal scaling, also referred to as scaling out. Contrast this to scaling up, in which you upgrade your existing servers with more powerful hardware.
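To make the scaling-out idea concrete, here is a minimal sketch (not Hadoop or any specific NoSQL product) of how a cluster can partition data across commodity nodes by hashing each record's key; the node names are hypothetical. Adding a node to the list is the "horizontal scaling" step: the same records now spread over more machines.

```python
# Minimal sketch of horizontal scaling: records are partitioned across
# cluster nodes by hashing their keys. Node names are hypothetical.
import hashlib

def node_for(key: str, nodes: list) -> str:
    """Pick the node responsible for a key by hashing it."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["node-1", "node-2", "node-3"]
print(node_for("user:42", nodes))   # routed to one of the three nodes

# Scaling out = simply appending hardware to the cluster.
nodes.append("node-4")
print(node_for("user:42", nodes))   # keys are now spread over four nodes
```

Real systems typically use consistent hashing instead of simple modulo hashing, so that adding a node moves only a small fraction of the keys; the sketch above keeps only the core routing idea.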
Data formats: both technologies are suitable for the different kinds of data you want to manage, including log files, documents, and rich media. Just as importantly, if you have structured data whose structure differs between records, or whose structure is likely to change in the future, then NoSQL and Hadoop are appropriate technologies for your environment.
While each technology is great for big data, they are intended for different types of workloads.
NoSQL is about real-time, interactive access to data. NoSQL use cases often involve end-user interactivity, as in web applications, but more broadly they are about reading and writing data very quickly.
Hadoop is about large-scale processing of data. To process large volumes of data, you want to do the work in parallel, typically across many servers. Hadoop manages the distribution of work across those servers using a divide-and-conquer methodology known as MapReduce. Since each server houses a subset of your overall data set, MapReduce lets you move the processing close to the data, minimizing the network access that would otherwise slow down the task.
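The MapReduce model described above can be sketched in a few lines. This is a single-process toy illustrating the idea (it is not the Hadoop API): a map phase turns each record into key/value pairs, a shuffle groups values by key, and a reduce phase aggregates each group. The classic example is counting words.

```python
# Toy single-process sketch of the MapReduce model (not the Hadoop API).
from collections import defaultdict

def map_phase(record):
    for word in record.split():
        yield word.lower(), 1            # emit (word, 1) for every word

def reduce_phase(key, values):
    return key, sum(values)              # total count per word

def mapreduce(records):
    groups = defaultdict(list)
    for record in records:               # in Hadoop, map tasks run on the
        for k, v in map_phase(record):   # node that already stores the record
            groups[k].append(v)          # "shuffle": group values by key
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

print(mapreduce(["big data", "big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```

In a real Hadoop cluster the map calls run in parallel on the nodes holding each data block, which is exactly the "move the processing close to the data" point made above.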
What happens when we combine the two technologies?
NoSQL and Hadoop work well together as components of an enterprise data architecture, and they are often deployed side by side. In a typical architecture, NoSQL is responsible for interactive access to data, while the Hadoop cluster handles large-scale data processing and analytics. With NoSQL, you can manage user transaction data, sensor data, or customer profile data; Hadoop then helps you analyze that data for outcomes such as generating recommendations, performing predictive analytics, and detecting fraudulent activity.
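The division of labor just described can be sketched as two code paths over one data set. This is a hypothetical illustration, not a real product API: a fast write path stands in for the NoSQL role, and a periodic batch job over all stored events stands in for the Hadoop analytics role (here, a trivial recommendation).

```python
# Hypothetical sketch of the combined architecture: fast interactive
# writes (the NoSQL role) feed a batch analytics job (the Hadoop role).
from collections import Counter

event_store = []                         # stand-in for a NoSQL event table

def record_event(user, item):
    """Fast interactive write path (the NoSQL role)."""
    event_store.append({"user": user, "item": item})

def batch_recommend(top_n=1):
    """Offline batch analytics over all events (the Hadoop role)."""
    counts = Counter(e["item"] for e in event_store)
    return [item for item, _ in counts.most_common(top_n)]

record_event("alice", "book")
record_event("bob", "book")
record_event("alice", "lamp")
print(batch_recommend())                 # ['book']
```

The key design point survives the simplification: writes must be cheap and immediate, while the analytics pass can afford to scan everything because it runs in the background on the batch side.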
In the end, we can see that there are many technologies for dealing with Big Data. Some of them share a few common features but carry different responsibilities, and we usually combine several of them in one system to work more efficiently.
Source: MapR Blog