Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions

by Giovanni Seni (Author), John Elder (Author), Robert Grossman (Series Editor)

20 ratings

Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple models into one usually more accurate than the best of its components. Ensembles can provide a critical boost to industrial challenges -- from investment timing to drug discovery, and fraud detection to recommendation systems -- where predictive accuracy is more vital than model interpretability.

Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization -- today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.

This book is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques.

Editorial Reviews

From the Inside Flap

"This book by Seni and Elder provides a timely, concise introduction to this topic. After an intuitive, highly accessible sketch of the key concerns in predictive learning, the book takes the readers through a shortcut into the heart of the popular tree-based ensemble creation strategies, and follows that with a compact yet clear presentation of the developments in the frontiers of statistics, where active attempts are being made to explain and exploit the mysteries of ensembles through conventional statistical theory and methods." -- Tin Kam Ho, Bell Labs, Alcatel-Lucent

"The practical implementations of ensemble methods are enormous. Most current implementations of them are quite primitive and this book will definitely raise the state of the art. Giovanni Seni's thorough mastery of the cutting-edge research and John Elder's practical experience have combined to make an extremely readable and useful book." -- Jaffray Woodriff, Quantitative Investment Management

About the Author

The authors are industry experts in data mining and machine learning who are also adjunct professors and popular speakers. Although early pioneers in discovering and using ensembles, they here distill and clarify the recent groundbreaking work of leading academics (such as Jerome Friedman) to bring the benefits of ensembles to practitioners.

Product details

Publisher: Morgan and Claypool Publishers
Publication date: February 24, 2010
Language: English
Print length: 126 pages
ISBN-10: 1608452840
ISBN-13: 978-1608452842
Item Weight: 8 ounces
Dimensions: 7.5 x 0.29 x 9.25 inches

Best Sellers Rank: #2,733,715 in Books
- #860 in Database Storage & Design
- #877 in Data Mining (Books)
- #1,494 in Mathematical Analysis (Books)

About the authors

John Elder IV
Dr. John Elder heads the US's most experienced data mining consulting team, with offices in Charlottesville, Virginia, Washington DC, Baltimore MD, and Raleigh NC (www.elderresearch.com). Founded in 1995, Elder Research, Inc. focuses on Federal, commercial, investment, and security applications of advanced analytics, including text mining, stock selection, image recognition, biometrics, process optimization, cross-selling, drug efficacy, credit scoring, risk management, and fraud detection.
John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he's an adjunct professor teaching Optimization or Data Mining. Prior to 20 years at ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice's Computational & Applied Mathematics department.
Dr. Elder has been named one of the "10 most influential people in Analytics". He's authored innovative data mining tools, is a frequent keynote speaker, and has chaired international Analytics conferences. John's courses on analysis techniques -- taught at dozens of universities, companies, and government labs -- are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by President Bush to guide technology for National Security. His book on practical Data Mining, with Bob Nisbet and Gary Miner, won the PROSE award for top book in 2009 in Mathematics. He was one of the discoverers of the powers of ensemble modeling, and co-authored a book on it with Giovanni Seni in February 2010. His book on Practical Text Mining, with colleague Andrew Fast and 4 others, won the PROSE award for top book in 2012 in Computation and Information Science.
John is grateful to be a follower of Christ and the father of 5.
Giovanni Seni
Giovanni Seni is an active data mining practitioner in Silicon Valley; he has over 15 years of R&D experience in statistical pattern recognition, data mining, and human-computer interaction applications. He has been a member of the technical staff at large technology companies, and a contributor at smaller organizations. He holds five US patents and has published over twenty conference and journal articles.
Giovanni is an adjunct faculty member in the Computer Engineering Department of Santa Clara University, where he teaches an Introduction to Pattern Recognition and Data Mining class.
Giovanni received a B.S. in Computer Engineering from Universidad de Los Andes (Bogotá, Colombia) in 1989, and a Ph.D. in Computer Science from State University of New York at Buffalo (SUNY Buffalo) in 1995, where he studied on a Fulbright scholarship. He also holds a certificate in Data Mining and Applications from the Department of Statistics at Stanford University.

Customer reviews

20 global ratings

5 star: 69%
4 star: 14%
3 star: 17%
2 star: 0%
1 star: 0%

Customers say

Customers find the book insightful and good for study, with one review noting its well-sequenced topics. They describe it as an absolutely delightful read.

AI-generated from the text of customer reviews

7 customers mention "Information quality": 7 positive, 0 negative

Customers find the book insightful and good for study, with one customer noting that the topics are well-sequenced.

"This book was published about 15 years ago, but it's still very insightful...."

"...This relatively short book is very well organized. It has excellent examples that include useful snippets of R code...."

"...It provides a high-level overview of ensemble learning...."

"...It contains the best "need to know" information found in The Elements of Statistical Learning, and other good books on data mining...."

3 customers mention "Readability": 3 positive, 0 negative

Customers find the book delightful to read.

"An absolutely delightful read! This relatively short book is very well organized. It has excellent examples that include useful snippets of R code...."

"...But overall, this is a must-read book if you are in the data science field."

"This is a really great (short) book in my opinion...."

Top reviews from the United States

Y. Yuan
Still very insightful
Reviewed in the United States on February 4, 2025
Verified Purchase

This book was published about 15 years ago, but it's still very insightful. Strongly recommended for those who have been practicing but have never carefully studied a book on how ensemble forecasts work.

Moni Neradilek
Delightful read
Reviewed in the United States on December 29, 2012
Verified Purchase
An absolutely delightful read! This relatively short book is very well organized. It has excellent examples that include useful snippets of R code. The topics are sequenced very well. The selection of the material is brilliant. The text really worked for me. I cannot remember the last time I read a scientific book and learnt so much in such a short time. My previous knowledge of ensemble methods was only very shallow (I knew a little about most of them and somewhat more about bagging/random forests). But the general theoretical framework of this book really brought clarity to my understanding of ensemble methods. I liked the focus on the context of the methodology rather than a lot of math formulas or overly extensive examples. I appreciated that there were not too many unnecessary formulas and unexplained jargon. Highly recommended!

5 people found this helpful

Sandro Saitta
If you need an overview of techniques in the field
Reviewed in the United States on May 21, 2017
Verified Purchase
There are very few books available discussing general aspects of ensemble methods. One of them is Ensemble Methods by Seni, Elder, and Grossman. It provides a high-level overview of ensemble learning. However, the book contains a lot of equations, which make it hard to read from beginning to end. You will more likely pick a few sections and read them independently.

On the one hand, the book seems rather light for an academic audience (it only skims the surface of each topic). On the other hand, it is too academic for industry practitioners. So it's not fully clear who the target audience is.

Also worth noting: some figures are missing axis labels, and the quality of certain figures is really low. In conclusion, I would recommend it only if you need an overview of techniques in the field and are not scared of reading equations instead of plain English.

3 people found this helpful

Leon
Very good book for data scientist
Reviewed in the United States on September 30, 2017
Verified Purchase
This book explains ensemble methods very clearly in only about 100 pages. What I would like even more is for the authors to offer a MOOC (on Coursera, for example) or another book with more detailed examples (perhaps some from Kaggle competitions).

But overall, this is a must-read book if you are in the data science field.

Amazon Customer
Great Need to Know Info on Ensembles
Reviewed in the United States on October 30, 2010
Verified Purchase
This is a really great (short) book in my opinion. It contains the best "need to know" information found in The Elements of Statistical Learning, and other good books on data mining. The included R code is a big bonus. I am enjoying reading it so far, and I highly recommend it. The only thing that frustrates me is that the online version on the publisher's website is in color, while the print version is not. This is the only reason I did not give it 5 stars. I saw the online version first, and thought that the print version would be in color as well. I was sadly mistaken. There are many graphics in this book that reference different colors and it just looks really crappy in grayscale. If you are familiar with The Elements of Statistical Learning, imagine printing that out in grayscale and you will know what I mean.

12 people found this helpful

Seungkwan Nam
Good for study
Reviewed in the United States on May 1, 2015
Verified Purchase
Good for study

One person found this helpful

Nina Zumel
Clear, accessible introduction
Reviewed in the United States on August 1, 2011
This book is an accessible introduction to the theory and practice of ensemble methods in machine learning. It is a quick read, has sufficient detail for a novice to begin experimenting, and copious references for those who are interested in digging deeper. The authors also provide a nice discussion of cross-validation, and their section on regularization techniques is much more straightforward, in my opinion, than the equivalent sections in The Elements of Statistical Learning (Elements is a wonderful, necessary book, but a hard read).

The heart of the text is the chapter on Importance Sampling. The authors frame the classic ensemble methods (bagging, boosting, and random forests) as special cases of the Importance Sampling methodology. This not only clarifies the explanations of each approach, but also provides a principled basis for finding improvements to the original algorithms. They have one of the clearest descriptions of AdaBoost that I've ever read.

The penultimate chapter is on "Rule Ensembles": an attempt at a more interpretable ensemble learner. They also discuss measures for variable importance and interaction strength. The last chapter discusses Generalized Degrees of Freedom as an alternative complexity measure; it is probably of more interest to researchers and mathematicians than to practitioners.

Overall, I found the book clear and concise, with good attention to practical details. I appreciated the snippets of R code and the references to relevant R packages. One minor nitpick: this book has also been published digitally, presumably with color figures. Because the print version is grayscale, some of the color-coded graphs are now illegible. Usually the major points of the figure are clear from the context in the text; still, the color to grayscale conversion is something for future authors in this series to keep in mind.

Recommended.

16 people found this helpful

M. Shannon
excellent introduction
Reviewed in the United States on March 21, 2015
Verified Purchase
Excellent introduction to Ensemble methods. Good for beginners.

2 people found this helpful


Top reviews from other countries

René Ostenfeld
Five Stars
Reviewed in the United Kingdom on January 18, 2018
Verified Purchase

Very important book.

Trading Central
Very Short Introduction To Subject Area
Reviewed in Canada on November 25, 2013
Verified Purchase
If you are looking for detailed information this book is not for you.

If on the other hand you want a short introduction this may or may not work depending on your current knowledge of the area.

The book tries to highlight many areas and a definite shortfall is the lack of depth provided on each subject area covered.

The price is also steep for such a short title and, as another reviewer suggested, the eBook format available free online is likely a better bet, especially for students.

At just over 90 pages of useful information, this book will be a quick read and, depending on the reader's level of expertise, either a quick intro or a succinct overview of the methods available in this evolving area of machine learning.

Better value with comparable coverage of the subject area is available for the practitioner in the Handbook of Statistical Analysis & Data Mining Applications also authored by one of the writers of this executive summary of ensemble methods.

Dr. Chrilly Donninger
Good overview, dismal publisher.
Reviewed in Germany on February 11, 2012
Verified Purchase
This slim volume is a well-written overview of practically relevant ensemble methods. The authors do not go into every subtlety, but they present each algorithm with pseudocode. The text also contains numerous color graphics. At least, the text refers to green, blue, and red points or lines. You just cannot see any of that in the book. Everything is gray on gray. To make matters worse, the gray values chosen for the different colors are practically identical. This renders the graphics largely pointless. Apparently a PowerPoint presentation was printed 1:1 without any further processing. Evidently copy editing no longer exists. A copy editor should also have noticed that the references list both a Friedman, J. and a Friedman, J.H. Since the name Friedman is relatively common in academia, these could be one author or two different ones. I downloaded the papers. It is one and the same person. Consistent spelling of author names is apparently a luxury of days long past.
