When do reduce tasks start in Hadoop? - Forum Topic View

When do reduce tasks start in Hadoop?

Tags : hadoop MapReduce reduce

Samar Patil

In Hadoop, when do reduce tasks start? Do they start after a certain percentage (threshold) of the mappers complete? If so, is this threshold fixed? What threshold is typically used?

December 30, 2021 1:27 PM IST

Viaan Prakash

461

The reduce phase can start long before any reducer's reduce() method is called. As soon as a single mapper finishes, its output goes through shuffling and sorting (which involves the partitioner and, if one is configured, the combiner). The reduce "phase" begins the moment this post-map processing starts, so you will see the reducers' progress percentage rise while it runs, even though reduce() has not been called on any reducer yet. Depending on the number of processors available, the nature of the data, and the number of reducers you expect, you may want to change the slow-start parameter (mapreduce.job.reduce.slowstart.completedmaps, named mapred.reduce.slowstart.completed.maps in older releases) as described by @Donald-miner above.
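
To make the "progress without reduce() being called" point concrete, here is a rough model of how a reduce task's progress figure is built up. It assumes the commonly described split into three equally weighted sub-phases (copy/shuffle, sort/merge, reduce); the function name and its arguments are made up for illustration and are not part of any Hadoop API.

```python
def reduce_task_progress(copy_done: float, sort_done: float, reduce_done: float) -> float:
    """Combine three sub-phase fractions (each 0.0 to 1.0) into one progress value.

    Assumption: each sub-phase contributes one third of the total,
    which is how reduce progress is usually explained.
    """
    for name, value in (("copy", copy_done), ("sort", sort_done), ("reduce", reduce_done)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction must be between 0 and 1, got {value}")
    return (copy_done + sort_done + reduce_done) / 3.0


# Half of the map output has been fetched; reduce() has not run at all.
# The task still reports non-zero progress (about 17%).
print(f"{reduce_task_progress(0.5, 0.0, 0.0):.0%}")
```

Under this model, any reduce progress at or below roughly 33% comes entirely from copying map output, and reduce() itself only accounts for the final third.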

December 31, 2021 12:11 PM IST

Vaibhav Mali

259

As far as I understand, the reduce phase starts alongside the map phase and keeps consuming records from the maps. However, because a sort and shuffle step follows the map phase, all map outputs have to be sorted and sent to the reducers before the actual reduce logic can run. So logically you can think of the reduce phase as starting only after the map phase, but for performance reasons the reducers are initialized while the mappers are still running.
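
How early the reducers are initialized is governed by the slow-start fraction, mapreduce.job.reduce.slowstart.completedmaps, whose default is 0.05. The sketch below is a simplified model of that threshold, not Hadoop's scheduler code: the function names are hypothetical, rounding up is an assumption, and a real cluster also takes available resources into account.

```python
import math


def maps_needed_before_reducers(total_maps: int, slowstart: float = 0.05) -> int:
    """Number of completed map tasks required before reducers may be scheduled.

    slowstart models mapreduce.job.reduce.slowstart.completedmaps.
    Rounding up is an assumption of this sketch.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0 and 1")
    return math.ceil(slowstart * total_maps)


def reducers_may_start(completed_maps: int, total_maps: int, slowstart: float = 0.05) -> bool:
    """True once enough maps have finished for reducers to begin fetching output."""
    return completed_maps >= maps_needed_before_reducers(total_maps, slowstart)


# With the default fraction, a 100-map job lets reducers start after 5 maps.
print(maps_needed_before_reducers(100))
# With the fraction set to 1.0, reducers wait for every map to finish.
print(reducers_may_start(39, 40, slowstart=1.0))
```

A low fraction lets reducers overlap the shuffle with the remaining maps, at the cost of holding reduce slots that sit idle while they wait; a value near 1.0 frees those slots for other jobs until the maps are almost done.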

January 1, 2022 2:13 PM IST
