One of the important channels businesses use to provide their services are their websites and applications. However, as these businesses expand their portfolio of services, there is a need for these applications to support an increasing customer base. From an IT infrastructure standpoint, this scenario poses two major challenges.
First, there is need to add more web servers/application servers to be able to support more users accessing these websites. And second, the databases have to be scaled to handle more data.
When we specifically look at handling the increasing data, we have to introduce additional databases and integrate them with the existing ones. Towards scaling and maintaining the databases, some of the challenges to be addressed are achieving synchronization, reliability (through replication) and managing hardware failures etc. At the same time, these databases need to be equally efficient by having low latency, good performance, and providing high availability.
A framework created to address these issues is the Hadoop project. In simplistic terms, Hadoop is a framework that facilitates storing and analyzing of large data sets across clusters of commodity hardware which can scale depending on the needs of the business.