Machine learning with Spark and Python : essential techniques for predictive analytics
Michael Bowles
- 2nd ed.
- Indianapolis, IN : John Wiley and Sons, c2020.
- xxvii, 340 p. : ill. ; 25 cm.
• Machine generated contents note:
ch. 1 The Two Essential Algorithms for Making Predictions • Why Are These Two Algorithms So Useful? • What Are Penalized Regression Methods? • What Are Ensemble Methods? • How to Decide Which Algorithm to Use • The Process Steps for Building a Predictive Model • Framing a Machine Learning Problem • Feature Extraction and Feature Engineering • Determining Performance of a Trained Model • Chapter Contents and Dependencies • Summary
ch. 2 Understand the Problem by Understanding the Data • The Anatomy of a New Problem • Different Types of Attributes and Labels Drive Modeling Choices • Things to Notice about Your New Data Set • Classification Problems: Detecting Unexploded Mines Using Sonar • Physical Characteristics of the Rocks Versus Mines Data Set • Statistical Summaries of the Rocks Versus Mines Data Set • Visualization of Outliers Using a Quantile-Quantile Plot • Statistical Characterization of Categorical Attributes • How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set • Visualizing Properties of the Rocks Versus Mines Data Set • Visualizing with Parallel Coordinates Plots • Visualizing Interrelationships between Attributes and Labels • Visualizing Attribute and Label Correlations Using a Heat Map • Summarizing the Process for Understanding the Rocks Versus Mines Data Set • Real-Valued Predictions with Factor Variables: How Old Is Your Abalone? • Parallel Coordinates for Regression Problems: Visualize Variable Relationships for the Abalone Problem • How to Use a Correlation Heat Map for Regression: Visualize Pair-Wise Correlations for the Abalone Problem • Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes • Multiclass Classification Problem: What Type of Glass Is That? • Using PySpark to Understand Large Data Sets
ch. 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data • The Basic Problem: Understanding Function Approximation • Working with Training Data • Assessing Performance of Predictive Models • Factors Driving Algorithm Choices and Performance: Complexity and Data • Contrast between a Simple Problem and a Complex Problem • Contrast between a Simple Model and a Complex Model • Factors Driving Predictive Algorithm Performance • Choosing an Algorithm: Linear or Nonlinear? • Measuring the Performance of Predictive Models • Performance Measures for Different Types of Problems • Simulating Performance of Deployed Models • Achieving Harmony between Model and Data • Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size • Using Forward Stepwise Regression to Control Overfitting • Evaluating and Understanding Your Predictive Model • Control Overfitting by Penalizing Regression Coefficients: Ridge Regression • Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets
ch. 4 Penalized Linear Regression • Why Penalized Linear Regression Methods Are So Useful • Extremely Fast Coefficient Estimation • Variable Importance Information • Extremely Fast Evaluation When Deployed • Reliable Performance • Sparse Solutions • Problem May Require Linear Model • When to Use Ensemble Methods • Penalized Linear Regression: Regulating Linear Regression for Optimum Performance • Training Linear Models: Minimizing Errors and More • Adding a Coefficient Penalty to the OLS Formulation • Other Useful Coefficient Penalties: Manhattan and ElasticNet • Why Lasso Penalty Leads to Sparse Coefficient Vectors • ElasticNet Penalty Includes Both Lasso and Ridge • Solving the Penalized Linear Regression Problem • Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression • How LARS Generates Hundreds of Models of Varying Complexity • Choosing the Best Model from the Hundreds LARS Generates • Using Glmnet: Very Fast and Very General • Comparison of the Mechanics of Glmnet and LARS Algorithms • Initializing and Iterating the Glmnet Algorithm • Extension of Linear Regression to Classification Problems • Solving Classification Problems with Penalized Regression • Working with Classification Problems Having More Than Two Outcomes • Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems • Incorporating Non-Numeric Attributes into Linear Methods
ch. 5 Building Predictive Models Using Penalized Linear Methods • Python Packages for Penalized Linear Regression • Multivariable Regression: Predicting Wine Taste • Building and Testing a Model to Predict Wine Taste • Training on the Whole Data Set before Deployment • Basis Expansion: Improving Performance by Creating New Variables from Old Ones • Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines • Build a Rocks Versus Mines Classifier for Deployment • Multiclass Classification: Classifying Crime Scene Glass Samples • Linear Regression and Classification Using PySpark • Using PySpark to Predict Wine Taste • Logistic Regression with PySpark: Rocks Versus Mines • Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings • Multiclass Logistic Regression with Meta Parameter Optimization
ch. 6 Ensemble Methods • Binary Decision Trees • How a Binary Decision Tree Generates Predictions • How to Train a Binary Decision Tree • Tree Training Equals Split Point Selection • How Split Point Selection Affects Predictions • Algorithm for Selecting Split Points • Multivariable Tree Training: Which Attribute to Split? • Recursive Splitting for More Tree Depth • Overfitting Binary Trees • Measuring Overfit with Binary Trees • Balancing Binary Tree Complexity for Best Performance • Modifications for Classification and Categorical Features • Bootstrap Aggregation: "Bagging" • How Does the Bagging Algorithm Work? • Bagging Performance: Bias Versus Variance • How Bagging Behaves on Multivariable Problem • Bagging Needs Tree Depth for Performance • Summary of Bagging • Gradient Boosting • Basic Principle of Gradient Boosting Algorithm • Parameter Settings for Gradient Boosting • How Gradient Boosting Iterates toward a Predictive Model • Getting the Best Performance from Gradient Boosting • Gradient Boosting on a Multivariable Problem • Summary for Gradient Boosting • Random Forests • Random Forests: Bagging Plus Random Attribute Subsets • Random Forests Performance Drivers • Random Forests Summary
ch. 7 Building Ensemble Models with Python • Solving Regression Problems with Python Ensemble Packages • Using Gradient Boosting to Predict Wine Taste • Using the Class Constructor for GradientBoostingRegressor • Using GradientBoostingRegressor to Implement a Regression Model • Assessing the Performance of a Gradient Boosting Model • Building a Random Forest Model to Predict Wine Taste • Constructing a RandomForestRegressor Object • Modeling Wine Taste with RandomForestRegressor • Visualizing the Performance of a Random Forest Regression Model • Incorporating Non-Numeric Attributes in Python Ensemble Models • Coding the Sex of Abalone for Gradient Boosting Regression in Python • Assessing Performance and the Importance of Coded Variables with Gradient Boosting • Coding the Sex of Abalone for Input to Random Forest Regression in Python • Assessing Performance and the Importance of Coded Variables • Solving Binary Classification Problems with Python Ensemble Methods • Detecting Unexploded Mines with Python Gradient Boosting • Determining the Performance of a Gradient Boosting Classifier • Detecting Unexploded Mines with Python Random Forest • Constructing a Random Forest Model to Detect Unexploded Mines • Determining the Performance of a Random Forest Classifier • Solving Multiclass Classification Problems with Python Ensemble Methods • Dealing with Class Imbalances • Classifying Glass Using Gradient Boosting • Determining the Performance of the Gradient Boosting Model on Glass Classification • Classifying Glass with Random Forests • Determining the Performance of the Random Forest Model on Glass Classification • Solving Regression Problems with PySpark Ensemble Packages • Predicting Wine Taste with PySpark Ensemble Methods • Predicting Abalone Age with PySpark Ensemble Methods • Distinguishing Mines from Rocks with PySpark Ensemble Methods • Identifying Glass Types with PySpark Ensemble Methods • Summary.
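The chapter 6 entries above name the basic principle of the gradient boosting algorithm: start from a constant prediction and repeatedly fit a small tree to the current residuals. A minimal self-contained sketch of that idea, using depth-1 "stumps" in plain Python for squared-error loss (illustrative only; the book itself works with scikit-learn and PySpark implementations, and none of this is the author's code):

```python
# Gradient boosting sketch: weak learners are depth-1 decision stumps,
# the loss is squared error, so each round's targets are the residuals.

def fit_stump(X, targets):
    """Exhaustively search one (feature, threshold) split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2.0          # midpoint guarantees both sides nonempty
            left = [t for row, t in zip(X, targets) if row[j] <= thr]
            right = [t for row, t in zip(X, targets) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((t - lmean) ** 2 for t in left)
                   + sum((t - rmean) ** 2 for t in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]                      # (feature, threshold, left value, right value)

def predict_stump(stump, row):
    j, thr, lval, rval = stump
    return lval if row[j] <= thr else rval

def gradient_boost(X, y, n_rounds=100, lr=0.1):
    """Each round fits a stump to the residuals (the negative gradient of squared error)."""
    base = sum(y) / len(y)               # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + lr * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return (base, lr, stumps)

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * predict_stump(s, row) for s in stumps)

# Demo: learn y = 2*x from ten one-feature points.
X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
model = gradient_boost(X, y, n_rounds=100, lr=0.2)
```

The `lr` and `n_rounds` arguments play the role of the "Parameter Settings for Gradient Boosting" entry above: a smaller step size with more rounds trades training time for smoother convergence.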