
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization -- today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.
This book is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques.
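As a rough illustration of the idea of combining predictions from many trees, the following is a minimal sketch of bagging with one-split regression trees ("stumps"). It is written in Python using only the standard library and is not taken from the book, whose own snippets are in R. The function names (`fit_stump`, `bagged_stumps`, `predict_bagged`) and the toy step-function data are assumptions made purely for this example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing squared error.

    Returns (threshold, left_mean, right_mean).
    """
    best = None
    order = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # All x values are identical: fall back to predicting the overall mean.
        m = mean(ys)
        return (float("inf"), m, m)
    _, t, ml, mr = best
    return (t, ml, mr)


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """The ensemble prediction is the average of the individual predictions."""
    return mean(predict_stump(m, x) for m in models)


# Toy data (hypothetical): a step function that jumps from 0 to 1 at x = 1.0.
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]

models = bagged_stumps(xs, ys)
for x in (0.0, 1.0, 1.9):
    print(x, predict_bagged(models, x))
```

Each stump sees a slightly different resample of the data, so the individual thresholds vary; averaging them smooths the prediction near the jump. Random forests and boosting change how the individual trees are generated and weighted, but the final step of combining their predictions is analogous.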
Editorial Reviews
From the Inside Flap
"The practical implementations of ensemble methods are enormous. Most current implementations of them are quite primitive and this book will definitely raise the state of the art. Giovanni Seni's thorough mastery of the cutting-edge research and John Elder's practical experience have combined to make an extremely readable and useful book." -- Jaffray Woodriff, Quantitative Investment Management
Product details
- Publisher : Morgan and Claypool Publishers
- Publication date : February 24, 2010
- Language : English
- Print length : 126 pages
- ISBN-10 : 1608452840
- ISBN-13 : 978-1608452842
- Item Weight : 8 ounces
- Dimensions : 7.5 x 0.29 x 9.25 inches
- Best Sellers Rank: #2,733,715 in Books
- #860 in Database Storage & Design
- #877 in Data Mining (Books)
- #1,494 in Mathematical Analysis (Books)
About the authors
Dr. John Elder heads the US's most experienced data mining consulting team, with offices in Charlottesville, Virginia, Washington DC, Baltimore MD, and Raleigh NC (www.elderresearch.com). Founded in 1995, Elder Research, Inc. focuses on Federal, commercial, investment, and security applications of advanced analytics, including text mining, stock selection, image recognition, biometrics, process optimization, cross-selling, drug efficacy, credit scoring, risk management, and fraud detection.
John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he's an adjunct professor teaching Optimization or Data Mining. Prior to 20 years at ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice's Computational & Applied Mathematics department.
Dr. Elder has been named one of the "10 most influential people in Analytics". He's authored innovative data mining tools, is a frequent keynote speaker, and has chaired international Analytics conferences. John's courses on analysis techniques -- taught at dozens of universities, companies, and government labs -- are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by President Bush to guide technology for National Security. His book on practical Data Mining, with Bob Nisbet and Gary Miner, won the PROSE award for top book in 2009 in Mathematics. He was one of the discoverers of the powers of ensemble modeling, and co-authored a book on it with Giovanni Seni in February 2010. His book on Practical Text Mining, with colleague Andrew Fast and 4 others, won the PROSE award for top book in 2012 in Computation and Information Science.
John is grateful to be a follower of Christ and the father of 5.
Giovanni Seni is an active data mining practitioner in Silicon Valley; he has over 15 years of R&D experience in statistical pattern recognition, data mining, and human-computer interaction applications. He has been a member of the technical staff at large technology companies, and a contributor at smaller organizations. He holds five US patents and has published over twenty conference and journal articles.
Giovanni is an adjunct faculty member in the Computer Engineering Department of Santa Clara University, where he teaches an Introduction to Pattern Recognition and Data Mining class.
Giovanni received a B.S. in Computer Engineering from Universidad de Los Andes (Bogotá, Colombia) in 1989, and a Ph.D. in Computer Science from State University of New York at Buffalo (SUNY Buffalo) in 1995, where he studied on a Fulbright scholarship. He also holds a certificate in Data Mining and Applications from the Department of Statistics at Stanford University.
Customer reviews
- 5 star: 69%
- 4 star: 14%
- 3 star: 17%
- 2 star: 0%
- 1 star: 0%
Customers say
Customers find the book insightful and good for study, with one review noting its well-sequenced topics. They describe it as an absolutely delightful read.
Top reviews from the United States
- Reviewed in the United States on February 4, 2025
This book was published about 15 years ago, but it's still very insightful. Strongly recommended for those who have been practicing but have never carefully studied a book in this category on how ensemble forecasts work.
- Reviewed in the United States on December 29, 2012
An absolutely delightful read! This relatively short book is very well organized. It has excellent examples, including useful snippets of R code. The topics are sequenced very well. The selection of the material is brilliant. The text really worked for me. I cannot remember the last time I read a scientific book and learnt so much in such a short time. My previous knowledge of ensemble methods was only very shallow (I knew a little about most of them and somewhat more about bagging/random forests). But the general theoretical framework of this book really brought clarity to my understanding of ensemble methods. I liked the focus on the context of the methodology rather than a lot of math formulas or overly extensive examples. I appreciated that there were not too many unnecessary formulas or unexplained jargon. Highly recommended!
- Reviewed in the United States on May 21, 2017
There are very few books available that discuss general aspects of ensemble methods. One of them is Ensemble Methods from Seni, Elder and Grossmann. It provides a high-level overview of ensemble learning. However, the book contains a lot of equations, which make it hard to read from beginning to end. You will more likely pick a few sections and read them independently.
On one side, the book seems rather light for an academic audience (it only skims the surface of each topic). On the other side, it is too academic for industry practitioners. So it's not fully clear who the target audience is.
Note also that some figures are missing axis labels, and the quality of certain figures is really low. In conclusion, I would recommend it only if you need an overview of techniques in the field and are not scared of reading equations instead of plain English.
- Reviewed in the United States on September 30, 2017
This book explains ensemble methods in a very clear manner in only about 100 pages. What I would like even more is for the authors to offer a MOOC (for example on Coursera) or another book with more detailed examples (perhaps from Kaggle competitions).
But overall, this is a must-read book if you are in the data science field.
- Reviewed in the United States on October 30, 2010
This is a really great (short) book in my opinion. It contains the best "need to know" information found in The Elements of Statistical Learning and other good books on data mining. The included R code is a big bonus. I am enjoying reading it so far, and I highly recommend it. The only thing that frustrates me is that the online version on the publisher's website is in color, while the print version is not. This is the only reason I did not give it 5 stars. I saw the online version first, and thought that the print version would be in color as well. I was sadly mistaken. There are many graphics in this book that reference different colors, and it just looks really crappy in grayscale. If you are familiar with The Elements of Statistical Learning, imagine printing that out in grayscale and you will know what I mean.
- Reviewed in the United States on May 1, 2015
Good for study
- Reviewed in the United States on August 1, 2011
This book is an accessible introduction to the theory and practice of ensemble methods in machine learning. It is a quick read, has sufficient detail for a novice to begin experimenting, and copious references for those who are interested in digging deeper. The authors also provide a nice discussion of cross-validation, and their section on regularization techniques is much more straightforward, in my opinion, than the equivalent sections in The Elements of Statistical Learning (Elements is a wonderful, necessary book, but a hard read).
The heart of the text is the chapter on Importance Sampling. The authors frame the classic ensemble methods (bagging, boosting, and random forests) as special cases of the Importance Sampling methodology. This not only clarifies the explanations of each approach, but also provides a principled basis for finding improvements to the original algorithms. They have one of the clearest descriptions of AdaBoost that I've ever read.
The penultimate chapter is on "Rule Ensembles": an attempt at a more interpretable ensemble learner. They also discuss measures for variable importance and interaction strength. The last chapter discusses Generalized Degrees of Freedom as an alternative complexity measure; it is probably of more interest to researchers and mathematicians than to practitioners.
Overall, I found the book clear and concise, with good attention to practical details. I appreciated the snippets of R code and the references to relevant R packages. One minor nitpick: this book has also been published digitally, presumably with color figures. Because the print version is grayscale, some of the color-coded graphs are now illegible. Usually the major points of the figure are clear from the context in the text; still, the color to grayscale conversion is something for future authors in this series to keep in mind.
Recommended.
- Reviewed in the United States on March 21, 2015
Excellent introduction to Ensemble methods. Good for beginners.
Top reviews from other countries
- René Ostenfeld, reviewed in the United Kingdom on January 18, 2018
5.0 out of 5 stars Five Stars
Very important book.
- Trading Central, reviewed in Canada on November 25, 2013
3.0 out of 5 stars Very Short Introduction To Subject Area
If you are looking for detailed information this book is not for you.
If on the other hand you want a short introduction this may or may not work depending on your current knowledge of the area.
The book tries to highlight many areas and a definite shortfall is the lack of depth provided on each subject area covered.
The price is also steep for such a short title and, as another reviewer noted, the eBook format available free online is likely a better bet, especially for students.
At just over 90 pages of useful information, this book will be a quick read and, depending on the reader's level of expertise, either a quick intro or a succinct overview of the methods available in this evolving area of machine learning.
Better value with comparable coverage of the subject area is available for the practitioner in the Handbook of Statistical Analysis & Data Mining Applications also authored by one of the writers of this executive summary of ensemble methods.
- Dr. Chrilly Donninger, reviewed in Germany on February 11, 2012
3.0 out of 5 stars Good overview, miserable publisher.
This slim volume is a well-written overview of practically relevant ensemble methods. The authors do not go into every detail, but they present each algorithm with pseudocode. The text also contains numerous color graphics. At least, one reads in the text about green, blue, and red points and lines. But none of that is visible in the book: everything is gray on gray. To make matters worse, the gray values chosen for the different colors are practically identical, which makes the graphics largely useless. Evidently a PowerPoint presentation was printed 1:1 without any further processing. Apparently copyediting no longer exists. An editor would also have noticed that the references list both a Friedman, J. and a Friedman, J.H. Since the name Friedman is relatively common in academia, this could be one author or two different ones. I downloaded the papers: it is one and the same person. Consistent spelling of author names is apparently a luxury of days long past.