Distributed tuning of machine learning algorithms using MapReduce Clusters
Distributed tuning of machine learning algorithms using MapReduce Clusters is a 2011 conference paper, written in English, by Y. Ganjisaffar, T. Debeauvais, S. Javanmardi, R. Caruana, and C. V. Lopes, published in the Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications (LDMTA 2011), held in conjunction with ACM SIGKDD 2011.
Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce Clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters.
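The parallel parameter optimization described in the abstract follows the classic MapReduce grid-search pattern: each map task trains and evaluates one hyperparameter combination, and the reduce step selects the best-scoring configuration. A minimal single-machine sketch of that pattern is shown below; the parameter names (`learning_rate`, `num_leaves`) and the toy scoring function are illustrative assumptions, not the paper's actual experimental setup.

```python
# Hypothetical sketch of MapReduce-style grid search over hyperparameters.
# The grid values and scoring function are stand-ins, not the authors' setup.
from itertools import product

PARAM_GRID = {
    "learning_rate": [0.01, 0.05, 0.1],
    "num_leaves": [15, 31, 63],
}

def mapper(params):
    # On a real cluster, each map task would train a model (e.g. boosted
    # trees) with these hyperparameters and emit (params, validation score).
    # Here a toy function peaking at lr=0.05, num_leaves=31 stands in.
    score = (1.0
             - abs(params["learning_rate"] - 0.05)
             - abs(params["num_leaves"] - 31) / 100.0)
    return params, score

def reducer(scored):
    # The reduce step keeps the best-scoring configuration.
    return max(scored, key=lambda kv: kv[1])

# Enumerate every combination in the grid and run the map/reduce steps.
configs = [dict(zip(PARAM_GRID, vals)) for vals in product(*PARAM_GRID.values())]
best_params, best_score = reducer(map(mapper, configs))
print(best_params)  # → {'learning_rate': 0.05, 'num_leaves': 31}
```

Because every grid point is evaluated independently, the map phase is embarrassingly parallel, which is why the authors find MapReduce clusters well suited to it; their caution about overfitting applies equally here, since the reducer picks the maximum over many validation scores.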