Monday, November 24, 2008

Improving MapReduce Performance in Heterogeneous Environments

In this paper, the authors propose LATE (Longest Approximate Time to End), a new scheduling algorithm for speculative execution that improves MapReduce performance in heterogeneous environments. The paper is interesting to read and well written. The Hadoop scheduler makes several assumptions, among them that nodes are homogeneous, that tasks progress at a constant rate, that launching a speculative task has no cost, and that each phase of a reduce task (copy, sort, and reduce) takes an equal share of its time. These assumptions break down in heterogeneous environments and degrade performance.

Pros:
  • Addresses heterogeneous environments directly.
  • Tasks to run speculatively are chosen based on their estimated time left to finish.
  • I liked that the authors discuss how their time-left estimation technique may fail in certain scenarios.
  • I really enjoyed reading the paper. It was very nicely written.
  • It solves a real problem for data-parallel applications, which are used in industry and are being adopted by more and more companies.
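
The selection rule in the second point can be sketched in Python. This is a simplified, hypothetical version, not the paper's code: the `Task` class, `pick_speculative_candidate`, and the `slow_fraction` cutoff are my own names and values. It estimates each running task's time left as (1 - ProgressScore) / ProgressRate, with ProgressRate = ProgressScore / elapsed time, keeps only tasks whose progress rate is at or below a percentile cutoff, and picks the one expected to finish last. LATE as described in the paper also limits how many speculative tasks run at once and avoids launching copies on slow nodes; this sketch omits both.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record for illustration.
    name: str
    progress_score: float  # fraction of work done, in (0, 1)
    elapsed: float         # seconds since the task started

    @property
    def progress_rate(self) -> float:
        # ProgressRate = ProgressScore / T
        return self.progress_score / self.elapsed

    @property
    def time_left(self) -> float:
        # Estimated time left = (1 - ProgressScore) / ProgressRate
        return (1.0 - self.progress_score) / self.progress_rate


def pick_speculative_candidate(tasks, slow_fraction=0.25):
    """Return the slow task with the longest estimated time left, or None."""
    # Skip tasks with no progress yet (rate 0 would divide by zero) or already done.
    running = [t for t in tasks if 0.0 < t.progress_score < 1.0 and t.elapsed > 0]
    if not running:
        return None
    rates = sorted(t.progress_rate for t in running)
    # Simple index-based percentile cutoff on progress rate (an assumption).
    cutoff = rates[int(slow_fraction * (len(rates) - 1))]
    slow = [t for t in running if t.progress_rate <= cutoff]
    # Among slow tasks, speculate on the one expected to finish last.
    return max(slow, key=lambda t: t.time_left)


if __name__ == "__main__":
    tasks = [
        Task("a", 0.8, 40),   # fast, nearly done: ~10 s left
        Task("b", 0.5, 100),  # slow: ~100 s left
        Task("c", 0.25, 100), # slowest: ~300 s left
        Task("d", 0.1, 5),    # just started but fast: ~45 s left
    ]
    print(pick_speculative_candidate(tasks, slow_fraction=0.5).name)  # prints c
```

Task "d" has the lowest progress score, so a rule based on progress score alone would favor it, even though it is running quickly. Ranking by estimated time left picks "c" instead, which is the intuition behind LATE.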

Thoughts:
  • The authors suggest that different methods can be used to estimate the time left for a task. Currently, they estimate it as (1 - ProgressScore) / ProgressRate, where ProgressRate is the progress score divided by the time the task has been running. Could the scheduler store a history of completed tasks, gather statistics from it, and use them to estimate how long a task will take?
  • Has there been follow-up work on more sophisticated methods for estimating finish times?
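
One way to explore the history question is sketched below in Python. This is purely hypothetical and not from the paper: `RateHistory`, its methods, and the 0.5 blending weight are my assumptions. It records the progress rate of each completed task per (node, task kind), then blends the historical mean rate for that node with the running task's current rate before applying the same (1 - ProgressScore) / rate formula. It assumes tasks of the same kind do roughly the same amount of work, which may not hold when input sizes vary.

```python
from collections import defaultdict
from statistics import mean


class RateHistory:
    """Hypothetical store of past progress rates, keyed by (node, task kind)."""

    def __init__(self):
        self._rates = defaultdict(list)

    def record(self, node, kind, duration):
        # A finished task completed all of its work (score 1.0) in `duration` seconds.
        self._rates[(node, kind)].append(1.0 / duration)

    def estimate_time_left(self, node, kind, progress_score, elapsed, weight=0.5):
        current_rate = progress_score / elapsed
        past = self._rates.get((node, kind))  # .get does not insert a new key
        if past:
            # Blend the node's historical rate with the task's current rate.
            rate = weight * mean(past) + (1 - weight) * current_rate
        else:
            # No history for this node and kind: fall back to the current rate.
            rate = current_rate
        return (1.0 - progress_score) / rate


if __name__ == "__main__":
    history = RateHistory()
    print(history.estimate_time_left("node1", "map", 0.5, 50))  # no history: 50.0
    history.record("node1", "map", 50)  # this node once finished a map in 50 s
    print(history.estimate_time_left("node1", "map", 0.5, 50))  # about 33.3
```

Keying the history by node lets the estimate reflect that some machines are consistently slower than others, which is exactly the heterogeneity the paper targets.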
