Out of Print--Limited Availability.

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions

4.5 out of 5 stars 20 ratings

Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple models into one usually more accurate than the best of its components. Ensembles can provide a critical boost to industrial challenges -- from investment timing to drug discovery, and fraud detection to recommendation systems -- where predictive accuracy is more vital than model interpretability.
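
As a rough illustration of why a combined model can beat its best component, the sketch below computes the accuracy of a simple majority vote. It rests on idealized assumptions that are not taken from the book: every model is correct with the same probability p, and the models' errors are independent. The function name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n voters is correct.

    Assumes (for illustration only) that each voter is independently
    correct with probability p, and that n is odd so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


single = majority_vote_accuracy(0.6, 1)     # one model on its own
ensemble = majority_vote_accuracy(0.6, 11)  # eleven such models voting
print(single, ensemble)
```

Under these assumptions, eleven models that are each right 60% of the time vote correctly about 75% of the time. Real models make correlated errors, so the gain in practice is usually smaller.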

Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization -- today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.
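
Bagging, the simplest of the classic methods named above, can be sketched in a few lines. The example below is a minimal illustration in Python, not the book's R code. It uses one-split regression trees (stumps) as base learners; the names fit_stump and bagged_stumps, the toy data, and the default of 25 models are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x identical: fall back to a constant model
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Toy data (hypothetical): y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
predict = bagged_stumps(xs, ys)
print(predict(0.0), predict(1.0), predict(1.9))
```

Each stump is a crude step function, but because every bootstrap sample places the step slightly differently, the average is smoother than any single stump.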

This book is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques.

Editorial Reviews

From the Inside Flap

"This book by Seni and Elder provides a timely, concise introduction to this topic. After an intuitive, highly accessible sketch of the key concerns in predictive learning, the book takes the readers through a shortcut into the heart of the popular tree-based ensemble creation strategies, and follows that with a compact yet clear presentation of the developments in the frontiers of statistics, where active attempts are being made to explain and exploit the mysteries of ensembles through conventional statistical theory and methods." -- Tin Kam Ho, Bell Labs, Alcatel-Lucent

"The practical implementations of ensemble methods are enormous. Most current implementations of them are quite primitive and this book will definitely raise the state of the art. Giovanni Seni's thorough mastery of the cutting-edge research and John Elder's practical experience have combined to make an extremely readable and useful book." -- Jaffray Woodriff, Quantitative Investment Management

About the Author

The authors are industry experts in data mining and machine learning who are also adjunct professors and popular speakers. Although early pioneers in discovering and using ensembles, they here distill and clarify the recent groundbreaking work of leading academics (such as Jerome Friedman) to bring the benefits of ensembles to practitioners.

Product details

  • Publisher: Morgan and Claypool Publishers
  • Publication date: February 24, 2010
  • Language: English
  • Print length: 126 pages
  • ISBN-10: 1608452840
  • ISBN-13: 978-1608452842
  • Item Weight: 8 ounces
  • Dimensions: 7.5 x 0.29 x 9.25 inches

Customer reviews

4.5 out of 5 stars
20 global ratings

Customers say

Customers find the book insightful and good for study, with one review noting its well-sequenced topics. They describe it as an absolutely delightful read.

AI-generated from the text of customer reviews

Top reviews from the United States

  • Reviewed in the United States on February 4, 2025
    This book was published about 15 years ago, but it's still very insightful. Strongly recommend for those who have been practicing but have never carefully studied a book in this category for how ensemble forecasts work.
  • Reviewed in the United States on December 29, 2012
    An absolutely delightful read! This relatively short book is very well organized. It has excellent examples, including useful snippets of R code. The topics are sequenced very well. The selection of the material is brilliant. The text really worked for me. I cannot remember the last time I read a scientific book and learnt so much in such a short time. My previous knowledge of ensemble methods was only very shallow (knew a little about most of them and somewhat more about bagging/random forests). But the general theoretical framework of this book really brought clarity into my understanding of ensemble methods. I liked the focus on the context of the methodology rather than a lot of math formulas or too extensive examples. I appreciated that there were not too many unnecessary formulas and unexplained jargon. Highly recommended!
    5 people found this helpful
  • Reviewed in the United States on May 21, 2017
    There are very few books available discussing general aspects of ensemble methods. One of them is Ensemble methods from Seni, Elder and Grossmann. It provides a high level overview of ensemble learning. However, the book contains a lot of equations which make it hard to read from the beginning until the end. You will rather pick a few sections and read them independently.

    On one hand, the book seems rather light for an academic audience (it only scratches the surface of each topic). On the other hand, it is too academic for industry practitioners. So it’s not fully clear who the target audience is.

    Also worth noting: some figures are missing axis labels, and the quality of certain figures is really low. In conclusion, I would recommend it only if you need an overview of techniques in the field and are not scared of reading equations instead of plain English.
    3 people found this helpful
  • Reviewed in the United States on September 30, 2017
    This book explains ensemble methods very clearly in only about 100 pages. What I would like even more is for the authors to offer a MOOC (on Coursera, for example) or another book with more detailed examples (perhaps some from Kaggle competitions).

    But overall, this is a must-read book if you are in the data science field.
  • Reviewed in the United States on October 30, 2010
    This is a really great (short) book in my opinion. It contains the best "need to know" information found in The Elements of Statistical Learning, and other good books on data mining. The included R code is a big bonus. I am enjoying reading it so far, and I highly recommend it. The only thing that frustrates me is that the online version on the publisher's website is in color, while the print version is not. This is the only reason I did not give it 5 stars. I saw the online version first, and thought that the print version would be in color as well. I was sadly mistaken. There are many graphics in this book that reference different colors and it just looks really crappy in grayscale. If you are familiar with The Elements of Statistical Learning, imagine printing that out in grayscale and you will know what I mean.
    12 people found this helpful
  • Reviewed in the United States on May 1, 2015
    Good for study
    One person found this helpful
  • Reviewed in the United States on August 1, 2011
    This book is an accessible introduction to the theory and practice of ensemble methods in machine learning. It is a quick read, has sufficient detail for a novice to begin experimenting, and copious references for those who are interested in digging deeper. The authors also provide a nice discussion of cross-validation, and their section on regularization techniques is much more straightforward, in my opinion, than the equivalent sections in The Elements of Statistical Learning (Elements is a wonderful, necessary book, but a hard read).

    The heart of the text is the chapter on Importance Sampling. The authors frame the classic ensemble methods (bagging, boosting, and random forests) as special cases of the Importance Sampling methodology. This not only clarifies the explanations of each approach, but also provides a principled basis for finding improvements to the original algorithms. They have one of the clearest descriptions of AdaBoost that I've ever read.

    The penultimate chapter is on "Rule Ensembles": an attempt at a more interpretable ensemble learner. They also discuss measures for variable importance and interaction strength. The last chapter discusses Generalized Degrees of Freedom as an alternative complexity measure; it is probably of more interest to researchers and mathematicians than to practitioners.

    Overall, I found the book clear and concise, with good attention to practical details. I appreciated the snippets of R code and the references to relevant R packages. One minor nitpick: this book has also been published digitally, presumably with color figures. Because the print version is grayscale, some of the color-coded graphs are now illegible. Usually the major points of the figure are clear from the context in the text; still, the color to grayscale conversion is something for future authors in this series to keep in mind.

    Recommended.
    16 people found this helpful
  • Reviewed in the United States on March 21, 2015
    Excellent introduction to Ensemble methods. Good for beginners.
    2 people found this helpful

Top reviews from other countries

  • René Ostenfeld
    5.0 out of 5 stars Five Stars
    Reviewed in the United Kingdom on January 18, 2018
    Very important book.
  • Trading Central
    3.0 out of 5 stars Very Short Introduction To Subject Area
    Reviewed in Canada on November 25, 2013
    If you are looking for detailed information this book is not for you.

    If on the other hand you want a short introduction this may or may not work depending on your current knowledge of the area.

    The book tries to highlight many areas and a definite shortfall is the lack of depth provided on each subject area covered.

    The price is also steep for such a short title and, as another reviewer suggested, the eBook format available free online is likely a better bet, especially for students.

    At just over 90 pages of useful information, this book will be a quick read. Depending on the reader's level of expertise, it will serve as either a quick intro or a succinct overview of the methods available in this evolving area of machine learning.

    For the practitioner, better value with comparable coverage of the subject area is available in the Handbook of Statistical Analysis & Data Mining Applications, also authored by one of the writers of this executive summary of ensemble methods.
  • Dr. Chrilly Donninger
    3.0 out of 5 stars Good overview, miserable publisher.
    Reviewed in Germany on February 11, 2012
    This slim volume is a well-written overview of practically relevant ensemble methods. The authors do not go into every detail, but they present each algorithm with pseudocode. The text also contains numerous color graphics. At least, the text speaks of green, blue, and red points and lines. But you see none of that in the book. Everything is gray on gray. To make matters worse, the gray values chosen for the different colors are practically identical, which makes the graphics largely useless. Evidently a PowerPoint presentation was printed 1:1 without any further processing. Apparently copyediting no longer exists. An editor should also have noticed that the References list both a Friedman, J. and a Friedman, J.H. Since the name Friedman is relatively common in academia, these could be one author or two different ones. I downloaded the papers. It is one and the same person. Consistent spelling of author names seems to be a luxury of days long past.