Machine learning with Spark and Python : (Record no. 37394)
LIBRARY OF CONGRESS CONTROL NUMBER | |
---|---|
LC control number | 2019940771 |
INTERNATIONAL STANDARD BOOK NUMBER | |
International Standard Book Number | 9781119561934 |
DEWEY DECIMAL CLASSIFICATION NUMBER | |
Call number | 006.31 BO MA |
MAIN ENTRY--PERSONAL NAME | |
Authors | Bowles, Michael |
TITLE STATEMENT | |
Title | Machine learning with Spark and Python : |
Subtitle | essential techniques for predictive analytics |
Statement of responsibility, etc | Michael Bowles |
EDITION STATEMENT | |
Edition | 2nd ed. |
PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) | |
Place of publication | Indianapolis, IN : |
Publisher | John Wiley and Sons, |
Date | c2020. |
PHYSICAL DESCRIPTION | |
Extent | xxvii, 340 p. : |
Other Details | ill. ; |
Size | 25 cm. |
CONTENTS | |
Contents | • Machine generated contents note: ch. 1 The Two Essential Algorithms for Making Predictions • Why Are These Two Algorithms So Useful? • What Are Penalized Regression Methods? • What Are Ensemble Methods? • How to Decide Which Algorithm to Use • The Process Steps for Building a Predictive Model • Framing a Machine Learning Problem • Feature Extraction and Feature Engineering • Determining Performance of a Trained Model • Chapter Contents and Dependencies • Summary • ch. 2 Understand the Problem by Understanding the Data • The Anatomy of a New Problem • Different Types of Attributes and Labels Drive Modeling Choices • Things to Notice about Your New Data Set • Classification Problems: Detecting Unexploded Mines Using Sonar • Physical Characteristics of the Rocks Versus Mines Data Set • Statistical Summaries of the Rocks Versus Mines Data Set • Visualization of Outliers Using a Quantile-Quantile Plot • Statistical Characterization of Categorical Attributes • How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set • Visualizing Properties of the Rocks Versus Mines Data Set • Visualizing with Parallel Coordinates Plots • Visualizing Interrelationships between Attributes and Labels • Visualizing Attribute and Label Correlations Using a Heat Map • Summarizing the Process for Understanding the Rocks Versus Mines Data Set • Real-Valued Predictions with Factor Variables: How Old Is Your Abalone? • Parallel Coordinates for Regression Problems: Visualize Variable Relationships for the Abalone Problem • How to Use a Correlation Heat Map for Regression: Visualize Pair-Wise Correlations for the Abalone Problem • Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes • Multiclass Classification Problem: What Type of Glass Is That? • Using PySpark to Understand Large Data Sets • ch. 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data • The Basic Problem: Understanding Function Approximation • Working with Training Data • Assessing Performance of Predictive Models • Factors Driving Algorithm Choices and Performance: Complexity and Data • Contrast between a Simple Problem and a Complex Problem • Contrast between a Simple Model and a Complex Model • Factors Driving Predictive Algorithm Performance • Choosing an Algorithm: Linear or Nonlinear? • Measuring the Performance of Predictive Models • Performance Measures for Different Types of Problems • Simulating Performance of Deployed Models • Achieving Harmony between Model and Data • Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size • Using Forward Stepwise Regression to Control Overfitting • Evaluating and Understanding Your Predictive Model • Control Overfitting by Penalizing Regression Coefficients: Ridge Regression • Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets • ch. 4 Penalized Linear Regression • Why Penalized Linear Regression Methods Are So Useful • Extremely Fast Coefficient Estimation • Variable Importance Information • Extremely Fast Evaluation When Deployed • Reliable Performance • Sparse Solutions • Problem May Require Linear Model • When to Use Ensemble Methods • Penalized Linear Regression: Regulating Linear Regression for Optimum Performance • Training Linear Models: Minimizing Errors and More • Adding a Coefficient Penalty to the OLS Formulation • Other Useful Coefficient Penalties: Manhattan and ElasticNet • Why Lasso Penalty Leads to Sparse Coefficient Vectors • ElasticNet Penalty Includes Both Lasso and Ridge • Solving the Penalized Linear Regression Problem • Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression • How LARS Generates Hundreds of Models of Varying Complexity • Choosing the Best Model from the Hundreds LARS Generates • Using Glmnet: Very Fast and Very General • Comparison of the Mechanics of Glmnet and LARS Algorithms • Initializing and Iterating the Glmnet Algorithm • Extension of Linear Regression to Classification Problems • Solving Classification Problems with Penalized Regression • Working with Classification Problems Having More Than Two Outcomes • Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems • Incorporating Non-Numeric Attributes into Linear Methods • ch. 5 Building Predictive Models Using Penalized Linear Methods • Python Packages for Penalized Linear Regression • Multivariable Regression: Predicting Wine Taste • Building and Testing a Model to Predict Wine Taste • Training on the Whole Data Set before Deployment • Basis Expansion: Improving Performance by Creating New Variables from Old Ones • Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines • Build a Rocks Versus Mines Classifier for Deployment • Multiclass Classification: Classifying Crime Scene Glass Samples • Linear Regression and Classification Using PySpark • Using PySpark to Predict Wine Taste • Logistic Regression with PySpark: Rocks Versus Mines • Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings • Multiclass Logistic Regression with Meta Parameter Optimization • ch. 6 Ensemble Methods • Binary Decision Trees • How a Binary Decision Tree Generates Predictions • How to Train a Binary Decision Tree • Tree Training Equals Split Point Selection • How Split Point Selection Affects Predictions • Algorithm for Selecting Split Points • Multivariable Tree Training: Which Attribute to Split? • Recursive Splitting for More Tree Depth • Overfitting Binary Trees • Measuring Overfit with Binary Trees • Balancing Binary Tree Complexity for Best Performance • Modifications for Classification and Categorical Features • Bootstrap Aggregation: "Bagging" • How Does the Bagging Algorithm Work? • Bagging Performance: Bias Versus Variance • How Bagging Behaves on Multivariable Problem • Bagging Needs Tree Depth for Performance • Summary of Bagging • Gradient Boosting • Basic Principle of Gradient Boosting Algorithm • Parameter Settings for Gradient Boosting • How Gradient Boosting Iterates toward a Predictive Model • Getting the Best Performance from Gradient Boosting • Gradient Boosting on a Multivariable Problem • Summary for Gradient Boosting • Random Forests • Random Forests: Bagging Plus Random Attribute Subsets • Random Forests Performance Drivers • Random Forests Summary • ch. 7 Building Ensemble Models with Python • Solving Regression Problems with Python Ensemble Packages • Using Gradient Boosting to Predict Wine Taste • Using the Class Constructor for GradientBoostingRegressor • Using GradientBoostingRegressor to Implement a Regression Model • Assessing the Performance of a Gradient Boosting Model • Building a Random Forest Model to Predict Wine Taste • Constructing a RandomForestRegressor Object • Modeling Wine Taste with RandomForestRegressor • Visualizing the Performance of a Random Forest Regression Model • Incorporating Non-Numeric Attributes in Python Ensemble Models • Coding the Sex of Abalone for Gradient Boosting Regression in Python • Assessing Performance and the Importance of Coded Variables with Gradient Boosting • Coding the Sex of Abalone for Input to Random Forest Regression in Python • Assessing Performance and the Importance of Coded Variables • Solving Binary Classification Problems with Python Ensemble Methods • Detecting Unexploded Mines with Python Gradient Boosting • Determining the Performance of a Gradient Boosting Classifier • Detecting Unexploded Mines with Python Random Forest • Constructing a Random Forest Model to Detect Unexploded Mines • Determining the Performance of a Random Forest Classifier • Solving Multiclass Classification Problems with Python Ensemble Methods • Dealing with Class Imbalances • Classifying Glass Using Gradient Boosting • Determining the Performance of the Gradient Boosting Model on Glass Classification • Classifying Glass with Random Forests • Determining the Performance of the Random Forest Model on Glass Classification • Solving Regression Problems with PySpark Ensemble Packages • Predicting Wine Taste with PySpark Ensemble Methods • Predicting Abalone Age with PySpark Ensemble Methods • Distinguishing Mines from Rocks with PySpark Ensemble Methods • Identifying Glass Types with PySpark Ensemble Methods • Summary. |
SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical Heading | Machine learning |
ELECTRONIC LOCATION AND ACCESS | |
Uniform Resource Identifier | https://uowd.box.com/s/5tfcyofz1iagzl63whqgic1sfxmdzuij |
Public note | Location Map |
Permanent location | Current location | Shelving location | Date acquired | Source of acquisition | Full call number | Barcode | Date last seen | Price effective from | Koha item type | Public note |
---|---|---|---|---|---|---|---|---|---|---|
University of Wollongong in Dubai | University of Wollongong in Dubai | Main Collection | 2020-01-19 | AMAUK | 006.31 BO MA | T0064182 | 2020-01-14 | 2020-01-14 | REGULAR | Feb2020 |