February 07, 2018

How Apache Hadoop 3 Adds Value Over Apache Hadoop 2

Thank you to Vinod Vavilapalli and Saumitra Buragohain for contributing to this blog.

This is the second post in the Hadoop Blog series (part 1, part 3, part 4, part 5). In this post, we show how Apache Hadoop 3 adds value over Apache Hadoop 2 in four areas: agility and time to market, lower total cost of ownership, scalability and availability, and new use cases.

Everyone is asking: what is the difference between Apache Hadoop 3 and Apache Hadoop 2? What does all this commotion mean? What is Hadoop 3 paving the way toward?

Where to start! Hadoop 3 combines the efforts of hundreds of contributors over the five years since Hadoop 2 launched. Several of these committers work at Hortonworks.

Let's start with the top value propositions of Hadoop 3 and how it can help your organization.

Agility & Time to Market
Although Hadoop 2 already uses containers, Hadoop 3 containerization adds the agility and package isolation of Docker. A container-based service makes it possible to build apps quickly and roll one out in minutes, which shortens time to market for services.

Total Cost of Ownership
Hadoop 2 has considerably more storage overhead than Hadoop 3. For example, in Hadoop 2, 6 blocks stored with 3x replication occupy 18 blocks of space.

With erasure coding in Hadoop 3, the same 6 blocks occupy 9 blocks of space: 6 data blocks plus 3 parity blocks. Instead of the 3x hit on storage, erasure coding incurs a 1.5x footprint while maintaining the same level of data recoverability. That halves the storage cost of HDFS while retaining data durability: storage overhead drops from 200% to 50%, which translates into substantial cost savings.
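
The arithmetic above can be checked with a short sketch. The function names are illustrative and not part of any Hadoop API. The defaults of 6 data units and 3 parity units match the example in the text, and the handling of a partial final group is a simplifying assumption.

```python
def replication_blocks(data_blocks: int, replicas: int = 3) -> int:
    """Total blocks stored when every data block is kept `replicas` times."""
    return data_blocks * replicas


def erasure_coded_blocks(data_blocks: int, data_units: int = 6, parity_units: int = 3) -> int:
    """Total blocks stored when each group of up to `data_units` data blocks
    gets `parity_units` parity blocks.

    Assumption of this sketch: a partial final group still carries a full
    set of parity blocks.
    """
    groups = -(-data_blocks // data_units)  # ceiling division
    return data_blocks + groups * parity_units


def overhead_percent(stored_blocks: int, data_blocks: int) -> float:
    """Extra space consumed beyond the raw data, as a percentage of the raw data."""
    return 100.0 * (stored_blocks - data_blocks) / data_blocks


data = 6
replicated = replication_blocks(data)
erasure_coded = erasure_coded_blocks(data)
print(replicated, overhead_percent(replicated, data))        # 18 200.0
print(erasure_coded, overhead_percent(erasure_coded, data))  # 9 50.0
```

Under the partial-group assumption, files much smaller than a full group save less, because their parity blocks are a larger share of the total.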

Scalability & Availability
Hadoop 1 uses a single NameNode to manage the entire namespace. With NameNode Federation, multiple NameNodes each manage an independent namespace, which improves scalability. Federation is also available in Hadoop 2, and Hadoop 3 continues to support it.

In Hadoop 2, there is only one standby NameNode.  Hadoop 3 supports multiple standby NameNodes. If one standby node goes down over the weekend, you have the benefit of other standby NameNodes so the cluster can continue to operate.  This feature gives you a longer servicing window.

Hadoop 2 uses an older timeline service that has scalability issues. Hadoop 3 introduces Timeline Service v2, which improves the scalability and reliability of the timeline service.

New Use Cases
Hadoop 2 doesn't support GPUs. Hadoop 3 enables scheduling of additional resource types, such as disks and GPUs, for better integration with containers, deep learning, and machine learning. This provides the basis for supporting GPUs in Hadoop clusters, which speeds up the computations required for data science and AI use cases.

Hadoop 2 cannot balance data across the disks within a node; Hadoop 3 adds intra-node disk balancing. If you repurpose a server or add new storage to an existing server with older-capacity drives, disk usage within that server becomes uneven. With intra-node disk balancing, data is redistributed so that usage is even across the disks.
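
To make the imbalance concrete, here is a minimal sketch of the idea. It is not the HDFS implementation: it assumes "even" means every disk on the node ends up filled to the same percentage, and the disk names and sizes are hypothetical.

```python
def balance_plan(disks):
    """Compute how much data each disk should gain or lose.

    disks: dict mapping disk name -> (capacity_gb, used_gb) for one node.
    Returns a dict mapping disk name -> GB to move, where a positive value
    means the disk should receive data and a negative value means it should
    give data up, so that all disks end at the same used fraction.
    """
    total_capacity = sum(capacity for capacity, _ in disks.values())
    total_used = sum(used for _, used in disks.values())
    target_fraction = total_used / total_capacity
    return {
        name: capacity * target_fraction - used
        for name, (capacity, used) in disks.items()
    }


# Hypothetical node: two older, nearly full disks plus one new, empty disk.
node = {
    "old_disk_1": (1000, 800),
    "old_disk_2": (1000, 800),
    "new_disk": (2000, 0),
}
for name, delta in balance_plan(node).items():
    print(f"{name}: {delta:+.0f} GB")
```

In this example, each old disk gives up 400 GB and the new disk receives 800 GB, leaving all three disks 40% full.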

Hadoop 2 supports only inter-queue preemption, that is, preemption across queues. Hadoop 3 introduces intra-queue preemption, which goes a step further by allowing preemption between applications within a single queue. This means you can prioritize jobs within a queue based on user limits and/or application priority.
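
As a rough illustration of priority-based preemption within one queue, consider the toy model below. It is not YARN's scheduler logic and it ignores user limits. The `App` class, the convention that a higher number means higher priority, and the sample applications are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    priority: int  # assumption: higher number = more important
    used: int      # containers currently held
    pending: int   # containers requested but not yet granted


def preempt_within_queue(apps):
    """Return {app_name: containers_to_preempt} for the apps of one queue.

    Pending demand is served from highest-priority claimant down, taking
    containers only from strictly lower-priority apps, lowest priority first.
    """
    to_preempt = {}
    remaining = {app.name: app.used for app in apps}
    for claimant in sorted(apps, key=lambda a: -a.priority):
        need = claimant.pending
        for victim in sorted(apps, key=lambda a: a.priority):
            # Victims are in ascending priority, so once one is not strictly
            # lower than the claimant, none of the rest are either.
            if need == 0 or victim.priority >= claimant.priority:
                break
            take = min(need, remaining[victim.name])
            if take:
                remaining[victim.name] -= take
                to_preempt[victim.name] = to_preempt.get(victim.name, 0) + take
                need -= take
    return to_preempt


queue = [
    App("etl", priority=1, used=8, pending=0),
    App("report", priority=2, used=2, pending=0),
    App("urgent", priority=3, used=0, pending=5),
]
print(preempt_within_queue(queue))  # {'etl': 5}
```

Here the high-priority `urgent` application gets its 5 containers from `etl`, the lowest-priority application in the same queue, and `report` is left untouched.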

In conclusion, we are very excited about the upcoming releases of Hadoop 3. The accelerated release schedule planned for this year will bring even more capabilities into the hands of users as soon as possible. If you look at the blog published last year, Data Lake 3.0: The Ez Button To Deploy In Minutes And Cut TCO By Half, you will see many of the Data Lake 3.0 architecture ideas and innovations from the Apache Hadoop community come to life in our next release of the Hortonworks Data Platform.




Syed Murtaza Saleem says:

How will existing users of Hadoop 2 leverage the advanced features of v3? It seems they have to set up a completely new environment (cluster) for Hadoop 3 and then migrate from Hadoop 2. Or will an upgrade do the job?

Saumitra Buragohain says:

The Hadoop 2 to Hadoop 3 upgrade will be a seamless in-place upgrade with no requirement for data migration. 3 replicas in Hadoop 2 will be retained as 3 replicas in Hadoop 3. If users want to reduce storage overhead for cold data, they can selectively decide which folders to erasure code.

Sam says:

When is Hadoop 3.0 slated for release from Hortonworks? Is there a product roadmap that you can share for 3.x+?
I’m looking forward to leveraging multiple NameNodes for multiple namespaces to achieve better multi-tenancy isolation.

Saumitra Buragohain says:

We announced HDP 3.0 on June 18th. Please refer to the following blog for more info on multiple namespaces with Name Node Federation.

Roni says:

Thank you for your interest. Unfortunately, future release dates for HDP have not been made public yet. We’re glad you’re excited about using multiple NameNodes to help with multi-tenancy isolation.

onebox app says:

Great read, thank you so much for the wonderful and I am waiting for the HDP. Thank you.

Raj Kumar says:

Thanks for the update and for the introduction of Hadoop 3.


I’ve started to learn Hadoop 2. This was a good read, though. Thanks.

Ankit says:

Hadoop 2 can also use multiple NameNodes through the Federation concept (segregating NameNodes to manage independent, individual namespaces), but it had only one standby NameNode. In Hadoop 3 you can add more than one standby NameNode.

Upendra says:

How does speculative execution work in Hadoop 3 if replicas are stored in erasure-coded format? Will it be the same?

Upendra says:

How does speculative execution work in Hadoop 3 if replicas are stored in erasure-coded format?
Also, if a task fails in Hadoop 2, the same task is executed on another available block after 2 or 3 attempts. How is this achieved in Hadoop 3?


shanjames says:

Very useful information, have achieved a great knowledge from the above content on Hadoop Training useful for all the aspirants of Hadoop Training.
