Have you ever thought about how Facebook, Google and other big MNCs manage your data?

Vipul Vats
2 min read · Sep 17, 2020


Data has become one of the most important assets for a business. Every company uses data to provide a better experience to its users. But have you ever thought about how these companies store, manage and manipulate thousands of terabytes of data with high speed and efficiency? What problems do they face in performing these tasks? And what strategies do they use to handle those problems?

The answer is NO! Hardly anyone cares about that. What we care about is how to upload something and what reactions we get on it.

In this article, I am going to discuss the questions mentioned above.

BIG DATA PROBLEM :

Facebook, one of the most popular social media platforms, handles 500+ terabytes of data per day on average in the form of photos, videos, comments and reactions. One of the big stats about Facebook is that its systems process 2.5 billion pieces of content each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour.

Those are the numbers for Facebook alone; now think about Google, YouTube, Twitter and Instagram.

In the technology world, data at this scale is known as BIG DATA. Managing such a huge volume of data brings challenges such as capture, storage, analysis, transfer and sharing.

The issues these companies face in handling the data of millions of users are collectively called the Big Data problem.
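To get a feel for the velocity side of the problem, the scan rate quoted above (roughly 105 terabytes each half hour) can be turned into a sustained throughput. The short Python sketch below does that arithmetic. The per-disk read speed of 150 MB/s is only an assumed, illustrative figure, and the function names are made up for this example.

```python
import math

# Figure quoted in the article.
SCANNED_TB_PER_HALF_HOUR = 105
SECONDS_PER_HALF_HOUR = 30 * 60

# Assumption for illustration only: sequential read speed of one disk.
ASSUMED_DISK_MB_PER_S = 150


def required_throughput_gb_per_s(terabytes, seconds):
    """Average throughput in GB/s, using decimal units (1 TB = 1000 GB)."""
    return terabytes * 1000 / seconds


def disks_needed(throughput_gb_per_s, disk_mb_per_s):
    """Minimum number of disks that must read in parallel to keep up."""
    return math.ceil(throughput_gb_per_s * 1000 / disk_mb_per_s)


throughput = required_throughput_gb_per_s(
    SCANNED_TB_PER_HALF_HOUR, SECONDS_PER_HALF_HOUR
)
print(f"{throughput:.1f} GB/s")  # 58.3 GB/s
print(disks_needed(throughput, ASSUMED_DISK_MB_PER_S), "disks in parallel")  # 389
```

Under that assumption, no single disk could keep up; hundreds of them would have to work at the same time.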

SOLUTIONS :

The answer to all of the above problems is Distributed Storage.

A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
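As a rough illustration of "splitting data across multiple physical servers", the Python sketch below cuts a byte string into fixed-size blocks and assigns each block to more than one server. The server names, the tiny block size, and the round-robin placement rule are all assumptions chosen for readability; real systems use much larger blocks and smarter placement policies.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, servers: list[str],
                 replicas: int = 2) -> dict[int, list[str]]:
    """Assign every block to `replicas` distinct servers, round-robin."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            servers[(block_id + r) % len(servers)] for r in range(replicas)
        ]
    return placement


# Hypothetical server names, used only for this example.
servers = ["server-a", "server-b", "server-c"]
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), servers)

for block_id, block in enumerate(blocks):
    print(block_id, block, placement[block_id])
```

Keeping each block on two servers means the data is still available if one of them fails.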

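A common model used by Distributed Storage is the master-slave model: one master node coordinates many slave nodes that actually hold the data. Because the data is spread over many machines that can read and write in parallel, this addresses both the volume problem and the velocity (I/O) problem.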

Master-Slave model

This whole system is known as a cluster.
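To make the master-slave idea more concrete, here is a minimal Python sketch of such a cluster. The class names `MasterNode` and `SlaveNode`, the file name, and the 4-byte block size are hypothetical and exist only for this example. For brevity the master also forwards the data itself, whereas real systems usually let the master keep only the bookkeeping.

```python
class SlaveNode:
    """Stores the actual data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class MasterNode:
    """Remembers which slave holds which block of which file."""

    def __init__(self, slaves, block_size=4):
        self.slaves = slaves
        self.block_size = block_size
        self.index = {}  # filename -> list of (block_id, slave)
        self._next_slave = 0

    def write(self, filename, data):
        locations = []
        for offset in range(0, len(data), self.block_size):
            # Pick slaves in turn so the blocks are spread over the cluster.
            slave = self.slaves[self._next_slave % len(self.slaves)]
            self._next_slave += 1
            block_id = f"{filename}#{offset // self.block_size}"
            slave.store(block_id, data[offset:offset + self.block_size])
            locations.append((block_id, slave))
        self.index[filename] = locations

    def read(self, filename):
        # Look up the block locations, then reassemble the file in order.
        return b"".join(
            slave.fetch(block_id) for block_id, slave in self.index[filename]
        )


slaves = [SlaveNode("slave-1"), SlaveNode("slave-2"), SlaveNode("slave-3")]
master = MasterNode(slaves)
master.write("photo.jpg", b"0123456789")
print(master.read("photo.jpg"))
print([(s.name, list(s.blocks)) for s in slaves])
```

No single slave holds the whole file, yet the master can always rebuild it from its index.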

HADOOP is one of the most popular software frameworks used to implement a Distributed Storage cluster. It is used by many big MNCs such as Facebook, and its storage layer, HDFS, follows a design that Google published for its own distributed file system.
