Distributed tuning of machine learning algorithms using MapReduce Clusters
Abstract Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters.
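The approach described in the abstract amounts to an embarrassingly parallel grid search: each map task trains and evaluates one hyperparameter configuration independently, and a reduce step keeps the best-scoring one. The sketch below is a minimal, hypothetical illustration of that pattern (not the authors' code); it uses Python's multiprocessing pool to stand in for a MapReduce cluster, and a synthetic dataset and parameter grid that are assumptions for demonstration only.

```python
# Minimal sketch of MapReduce-style hyperparameter tuning.
# Assumptions (not from the paper): synthetic data, scikit-learn boosted trees,
# a small illustrative regularization grid, and multiprocessing as the "cluster".
from itertools import product
from multiprocessing import Pool

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid of (learning rate, tree depth, subsample fraction);
# the paper tunes analogous regularization parameters on large text corpora.
GRID = list(product([0.01, 0.05, 0.1], [2, 3, 4], [0.5, 0.8, 1.0]))


def map_task(params):
    """Train one configuration and emit (validation accuracy, params)."""
    lr, depth, subsample = params
    model = GradientBoostingClassifier(
        learning_rate=lr, max_depth=depth, subsample=subsample, random_state=0
    )
    model.fit(X_train, y_train)
    return model.score(X_val, y_val), params


def reduce_step(results):
    """Keep the configuration with the highest validation accuracy."""
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    with Pool() as pool:  # stands in for the cluster's mappers
        results = pool.map(map_task, GRID)
    best_score, best_params = reduce_step(results)
    print(f"best validation accuracy {best_score:.3f} with params {best_params}")
```

The abstract's overfitting warning maps onto this sketch directly: when thousands of configurations are scored against the same validation data, the winning score tends to be optimistically biased, which is why exploring parameter space too aggressively can yield inferior parameters.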
Bibtextype inproceedings
Doi 10.1145/2002945.2002947
Has author Yasser Ganjisaffar, Debeauvais T., Sara Javanmardi, Caruana R., Lopes C.V.
Has extra keyword Algorithm parameters, Hyper-parameter, Hyperparameters, Learning methods, Learning parameters, Machine learning, Map-reduce, Model accuracy, Overfitting, Parameter optimization, Parameter spaces, Random forests, Ranking problems, Regularization parameters, Wikipedia, Data mining, Decision trees, Learning systems, Optimization, Learning algorithms
Has keyword Hyper-parameter, Machine learning, MapReduce, Optimization, Tuning
Isbn 9781450308441
Language English
Number of citations by publication 0
Number of references by publication 0
Published in Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications, LDMTA 2011 - Held in Conjunction with ACM SIGKDD 2011
Title Distributed tuning of machine learning algorithms using MapReduce Clusters
Type conference paper
Year 2011
Creation date 7 November 2014 11:47:54
Categories Publications without license parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date 7 November 2014 11:47:54
Date 2011