Data replication refers to distributing copies of data across the nodes of a system. It is best accomplished through a non-interactive (automated) and reliable process.
Replication is difficult to achieve in relational databases because they were not designed for horizontal scaling. For relational databases, replication and backup are typically carried out through a semi-manual process.
In big data settings, however, automatic live recovery of large, geo-distributed datasets is usually required.
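To make "automatic live recovery" more concrete, here is a minimal in-memory Python sketch, loosely in the spirit of the block re-replication performed by distributed file systems such as HDFS. The class `ReplicaManager`, its methods, the least-loaded placement rule and the default replication factor of 3 are all hypothetical choices for illustration. A real system would also need failure detection, networking and consistency handling.

```python
class ReplicaManager:
    """Hypothetical model of automatic replication and recovery.

    Each block is kept on `replication_factor` distinct nodes. When a node
    fails, every block it held is copied from a surviving replica to a
    healthy node that does not already store it, with no manual step.
    """

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replication_factor = replication_factor
        self.nodes = {n: {} for n in nodes}  # node -> {block_id: data}

    def _least_loaded(self, exclude):
        # Candidate nodes ordered by how many blocks they hold, then by name.
        candidates = [n for n in self.nodes if n not in exclude]
        return sorted(candidates, key=lambda n: (len(self.nodes[n]), n))

    def write(self, block_id, data):
        # Place the new block on the r least-loaded nodes.
        targets = self._least_loaded(exclude=set())[: self.replication_factor]
        for n in targets:
            self.nodes[n][block_id] = data
        return targets

    def holders(self, block_id):
        return sorted(n for n, blocks in self.nodes.items() if block_id in blocks)

    def fail_node(self, node):
        # Simulate a node crash, then restore the replication factor.
        lost = self.nodes.pop(node)
        repaired = []
        for block_id in lost:
            holders = self.holders(block_id)
            if not holders:
                continue  # every replica is gone; nothing to copy from
            data = self.nodes[holders[0]][block_id]
            missing = self.replication_factor - len(holders)
            for target in self._least_loaded(exclude=set(holders))[:missing]:
                self.nodes[target][block_id] = data
                repaired.append((block_id, target))
        return repaired
```

For example, with four nodes and a replication factor of 3, failing one node causes each block it held to be copied to a surviving node, so every block is back to three replicas without operator involvement. Least-loaded placement is only one simple policy; for geo-distributed data, placement would also need to account for rack or site topology.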
Traditional approaches to data redundancy focus on data mirroring: data is replicated to target storage arrays within the data centre or at a remote site. This method consumes a great deal of storage space, especially for datasets that run to petabytes or more. Storing large data streams (data in motion) as well as big data archives by traditional means therefore imposes significant overhead and cost on organisations.
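The storage cost of mirroring is simple arithmetic: every full copy needs the full dataset's worth of capacity. The sketch below assumes that `copies` counts the primary plus each mirror (for instance, a local array mirror and a remote-site mirror give 3); the function name and the example figures are illustrative, not measurements.

```python
def mirroring_footprint(dataset_pb: float, copies: int) -> tuple:
    """Return (total raw capacity, extra capacity beyond the primary) in PB.

    `copies` counts every full copy, including the primary.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    total = dataset_pb * copies
    return total, total - dataset_pb


# An assumed 2 PB dataset mirrored locally and to a remote site (3 copies)
# needs 6 PB of raw capacity, 4 PB of which is purely redundant.
print(mirroring_footprint(2.0, 3))
```

Because the overhead grows linearly with both dataset size and the number of mirrors, the extra capacity at petabyte scale is itself measured in petabytes.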