Thursday, December 31, 2015

What is CAP Theorem?





In this blog, you will find answers to questions like this one.
Knowing a subject in theory makes it easier to get hands-on experience with new technologies,
because you can understand why a technology behaves the way it does.

Today's subject is the CAP Theorem.

The CAP Theorem concerns database systems. In the Big Data world, databases have a very important place. They can be used as both a source and a target: you can write your data to a database, or you can feed your Big Data architecture from data stored in a database.

There are many kinds of databases, but I will not cover them today. Instead, I will explain how databases behave while they run and how the CAP Theorem can be used to classify that behaviour.

As you can see, this theorem is all about database behaviour.

First of all, please make sure you know what a Distributed System is. :) Today we cover only the CAP Theorem.
 
The CAP Theorem was actually formulated for Distributed Systems. It has 3 main points:

C for Consistency
A for Availability
P for Partition Tolerance

What are these terms?

Before explaining them, let's assume we have a Distributed System with 3 nodes, and that we have installed a NoSQL database on them (MongoDB, Cassandra, HBase, etc.).

Consistency: When you send a request to your new distributed NoSQL database, you should get the same response no matter which node you send it to.

For example, an update arrives at node1, and right after that update node1 suddenly dies.
At the same time, you send a request to node2.
What happens now?
If your system is Consistent, you get the updated response; if not, you get the older version.
It is as simple as that.
Who is Consistent?
Among NoSQL databases, MongoDB, HBase, MemcacheDB and Redis are popular examples that are usually classified as Consistent.
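To make the node1/node2 example concrete, here is a minimal Python sketch. It is only a toy model: the `Node` class and the `write`, `read` and `scenario` functions are names I made up for illustration, and no real database works exactly like this. The assumption is that a Consistent system copies the update to the other nodes before node1 dies, while an inconsistent one does not.

```python
class Node:
    """A hypothetical replica that keeps key/value pairs in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


def write(nodes, target, key, value, synchronous):
    """Apply a write on `target`; copy it to the other live nodes only if synchronous."""
    target.data[key] = value
    if synchronous:
        for node in nodes:
            if node is not target and node.alive:
                node.data[key] = value


def read(node, key):
    """Read a key from one node; a dead node cannot answer."""
    if not node.alive:
        raise RuntimeError(f"{node.name} is down")
    return node.data.get(key)


def scenario(synchronous):
    """Update node1, kill node1, then read the same key from node2."""
    nodes = [Node("node1"), Node("node2"), Node("node3")]
    for node in nodes:
        node.data["price"] = 10          # old value everywhere
    write(nodes, nodes[0], "price", 20, synchronous)
    nodes[0].alive = False               # node1 dies right after the update
    return read(nodes[1], "price")


consistent_result = scenario(synchronous=True)
stale_result = scenario(synchronous=False)
print(consistent_result, stale_result)   # prints: 20 10
```

With synchronous copying, node2 already has the new value when node1 dies. Without it, node2 still answers, but with the old value.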


Availability: This is the one I like most. Every node that is still running has to return a response, so the system as a whole always answers. To achieve this, the nodes can hold replicas of the data, or there can be a coordinating system (like ZooKeeper, details later).

For example, this time you send a request to the system through node2, and then node2 dies.
If your system has Availability, you still get a result in the end, because another node answers.

Let's change the situation and make it more difficult. You update data through node1, and node1 goes down at almost the same time. Your system has Availability, OK? So you get a response for sure.
Is it the updated data?
If your system is not also Consistent, unfortunately you get the older version of the data.
But if it is Consistent, you get the correct, updated response.

Therefore Availability is great, but by itself it is not enough :(

Who has Availability?
Among NoSQL databases, Cassandra, CouchDB and Riak are popular examples that are usually classified as Available.
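The Availability example can also be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: `read_from_any` is a hypothetical helper, and the nodes are plain dictionaries rather than a real database client. Here node1 received the update and then died, node2 is down too, and only node3 is left with the old value.

```python
def read_from_any(nodes, key):
    """Return (node_name, value) from the first live node in the list."""
    for node in nodes:
        if node["alive"]:
            return node["name"], node["data"].get(key)
    raise RuntimeError("no live node left to answer")


nodes = [
    {"name": "node1", "alive": False, "data": {"price": 20}},  # had the update, then died
    {"name": "node2", "alive": False, "data": {"price": 10}},  # also down
    {"name": "node3", "alive": True, "data": {"price": 10}},   # alive, but never got the update
]

answer = read_from_any(nodes, "price")
print(answer)   # prints: ('node3', 10)
```

The system is Available because we do get an answer, but the answer is the old value. That is exactly why Availability alone is not enough.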


Partition Tolerance: This property is very useful too. It means the system keeps working even if some nodes are down or cannot reach each other over the network. We have actually already met this property in the examples and questions above, because our nodes died but we still got a response. This property is a kind of prerequisite for being a Distributed System.

So ideally these 3 properties would all hold together. Unfortunately, guaranteeing all 3 at the same time is impossible, and the reason is Partition Tolerance.

If your system is distributed, it can be Available and Partition Tolerant (AP) or Consistent and Partition Tolerant (CP); if it is not distributed, it can be Consistent and Available (CA). Relational databases are the best-known examples of the CA structure.

Why do we have to choose 2 of these 3?
In a distributed system, when there is an inevitable network partition (and the cluster breaks into two or more “islands”), you can’t guarantee both Availability (for updates) and Consistency.
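Here is a small Python sketch of that trade-off. It rests on assumptions of my own: the hypothetical `handle_write` function models a CP system as one that accepts an update only on the island that still sees a majority of the cluster, and an AP system as one that accepts updates everywhere. Real databases use more detailed rules. The 3-node cluster is split into islands of 2 nodes and 1 node.

```python
def handle_write(mode, reachable_nodes, cluster_size):
    """Decide whether an island accepts an update during a network partition.

    mode            -- "CP" or "AP" (simplified behaviours, assumed for this sketch)
    reachable_nodes -- number of nodes in this island, including the one asked
    cluster_size    -- total number of nodes in the cluster
    """
    has_majority = reachable_nodes > cluster_size // 2
    if mode == "CP":
        # Stay consistent: refuse updates where a majority is not reachable.
        return "accepted" if has_majority else "rejected"
    if mode == "AP":
        # Stay available: always accept, even if the islands may now disagree.
        return "accepted"
    raise ValueError(f"unknown mode: {mode}")


CLUSTER_SIZE = 3
islands = {"island_a": 2, "island_b": 1}   # the cluster split into two islands

cp_results = {name: handle_write("CP", size, CLUSTER_SIZE) for name, size in islands.items()}
ap_results = {name: handle_write("AP", size, CLUSTER_SIZE) for name, size in islands.items()}

print(cp_results)   # prints: {'island_a': 'accepted', 'island_b': 'rejected'}
print(ap_results)   # prints: {'island_a': 'accepted', 'island_b': 'accepted'}
```

In CP mode the small island refuses updates, so it gives up Availability. In AP mode both islands accept updates, so they can end up with different data and give up Consistency.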

Finally, we arrive at the famous triangle picture.
It is a very useful summary of the CAP Theorem:


 
I believe we have all understood the CAP Theorem now.

Stay tuned for the next Big Data topics,

Best Regards,

OD

Tuesday, December 22, 2015

Intro



Welcome to Art of Big Data



First of all, this blog is dedicated to Big Data and Big Data related topics.
The goal is to make this blog a kind of dictionary of Big Data.

In particular, I will share my experiences, the subjects I study and my research topics here. If you want to become a certified Big Data expert, this blog is designed for you.

If you want to find the answer to the question "What is Big Data?", please google it.
Here you can find related topics and lessons, examples, use cases and the architectural side of the Big Data concept.

Every week, at least one new subject will be posted on this blog.


I hope this blog will be useful for all of us, and that Art of Big Data will become a significant resource for experts all around the world.


Best Regards,

OD