When do reduce tasks start in Hadoop?

  • In Hadoop, when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?

      December 30, 2021 1:27 PM IST
    0
  • The reduce phase can start long before any reducer function is actually called. As soon as a mapper finishes, the data it generated undergoes sorting and shuffling (which includes calls to the combiner and the partitioner). The reducer "phase" kicks in the moment this post-map processing starts. As the processing proceeds, you will see the reducers' progress percentage increase, even though none of the reducers has been called yet. Depending on the number of processors available, the nature of the data and the number of expected reducers, you may want to change the parameter described by @Donald-miner above.

     
      December 31, 2021 12:11 PM IST
    0
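To make the threshold idea from the answer above concrete, here is a minimal sketch in Python. It assumes the parameter in question is `mapreduce.job.reduce.slowstart.completed.maps` (named `mapred.reduce.slowstart.completed.maps` in older releases), with a default of 0.05. The function name and the rounding rule are illustrative assumptions, not Hadoop's actual scheduler code.

```python
import math

# Assumed default of mapreduce.job.reduce.slowstart.completed.maps.
DEFAULT_SLOWSTART = 0.05


def reducers_may_start(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """Simplified model: reducers become eligible for scheduling once the
    fraction of completed map tasks reaches the slow-start threshold.

    This is a hypothetical helper for illustration only.
    """
    if total_maps == 0:
        # A job with no map tasks has nothing to wait for.
        return True
    # Number of maps that must finish first (rounding up is an assumption).
    required = math.ceil(slowstart * total_maps)
    return completed_maps >= required


if __name__ == "__main__":
    for done in (0, 4, 6, 100):
        print(done, reducers_may_start(done, 100))
```

In this model, a threshold of 1.0 makes reducers wait until every map task has finished, while 0.0 lets them be scheduled immediately. A higher value keeps reducer slots free for longer at the cost of less overlap between the map phase and the shuffle.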
  • As far as I understand, the reduce phase starts alongside the map phase and keeps consuming records from the maps as they finish. However, because a sort and shuffle step follows the map phase, all map outputs have to be sorted and sent to the reducers before they can be reduced. So logically you can think of the reduce phase as starting only after the map phase, but in practice, for performance reasons, the reducers are initialized while the mappers are still running.

     
      January 1, 2022 2:13 PM IST
    0
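The overlap described in the answers above can be sketched with a small model of how a reduce task's progress percentage is commonly reported. It assumes the usual split of a reduce task into three equally weighted phases (copy, sort, reduce) and assumes a reducer can only fetch output from maps that have already finished. The function and parameter names are hypothetical and do not correspond to any Hadoop API.

```python
def reduce_task_progress(completed_maps, total_maps, sort_done=0.0, reduce_done=0.0):
    """Simplified model of a reduce task's reported progress (0.0 to 1.0).

    Assumption: copy, sort and reduce each contribute one third.
    Sorting and the reduce function only count once every map output
    has been copied, i.e. once all map tasks have completed.
    """
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    if not 0 <= completed_maps <= total_maps:
        raise ValueError("completed_maps must be between 0 and total_maps")

    # Copy phase: at best, output of every finished map has been fetched.
    copy_done = completed_maps / total_maps

    if copy_done < 1.0:
        # The final merge and the reduce function cannot begin yet.
        sort_done = 0.0
        reduce_done = 0.0

    return (copy_done + sort_done + reduce_done) / 3.0


if __name__ == "__main__":
    # Half of the maps finished: progress is visible, reduce() has not run.
    print(f"{reduce_task_progress(50, 100):.1%}")
    # All maps finished and copied, sort complete, reduce() halfway through.
    print(f"{reduce_task_progress(100, 100, sort_done=1.0, reduce_done=0.5):.1%}")
```

Under these assumptions a reducer can show up to roughly a third of its progress while map tasks are still running, which matches the observation that the reducer percentage moves before any reduce function has been called.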