Machine learning with Spark and Python : (Record no. 37394)

LIBRARY OF CONGRESS CONTROL NUMBER
LC control number 2019940771
INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119561934
DEWEY DECIMAL CLASSIFICATION NUMBER
Call number 006.31 BO MA
MAIN ENTRY--PERSONAL NAME
Authors Bowles, Michael
TITLE STATEMENT
Title Machine learning with Spark and Python :
Subtitle essential techniques for predictive analytics
Statement of responsibility, etc Michael Bowles
EDITION STATEMENT
Edition 2nd ed.
PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication Indianapolis, IN :
Publisher John Wiley and Sons,
Date c2020.
PHYSICAL DESCRIPTION
Extent xxvii, 340 p. :
Other Details ill. ;
Size 25 cm.
CONTENTS
Contents • Machine generated contents note: ch. 1 The Two Essential Algorithms for Making Predictions
• Why Are These Two Algorithms So Useful?
• What Are Penalized Regression Methods?
• What Are Ensemble Methods?
• How to Decide Which Algorithm to Use
• The Process Steps for Building a Predictive Model
• Framing a Machine Learning Problem
• Feature Extraction and Feature Engineering
• Determining Performance of a Trained Model
• Chapter Contents and Dependencies
• Summary
• ch. 2 Understand the Problem by Understanding the Data
• The Anatomy of a New Problem
• Different Types of Attributes and Labels Drive Modeling Choices
• Things to Notice about Your New Data Set
• Classification Problems: Detecting Unexploded Mines Using Sonar
• Physical Characteristics of the Rocks Versus Mines Data Set
• Statistical Summaries of the Rocks Versus Mines Data Set
• Visualization of Outliers Using a Quantile-Quantile Plot
• Statistical Characterization of Categorical Attributes
• How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set
• Visualizing Properties of the Rocks Versus Mines Data Set
• Visualizing with Parallel Coordinates Plots
• Visualizing Interrelationships between Attributes and Labels
• Visualizing Attribute and Label Correlations Using a Heat Map
• Summarizing the Process for Understanding the Rocks Versus Mines Data Set
• Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
• Parallel Coordinates for Regression Problems: Visualize Variable Relationships for the Abalone Problem
• How to Use a Correlation Heat Map for Regression: Visualize Pair-Wise Correlations for the Abalone Problem
• Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
• Multiclass Classification Problem: What Type of Glass Is That?
• Using PySpark to Understand Large Data Sets
• ch. 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data
• The Basic Problem: Understanding Function Approximation
• Working with Training Data
• Assessing Performance of Predictive Models
• Factors Driving Algorithm Choices and Performance: Complexity and Data
• Contrast between a Simple Problem and a Complex Problem
• Contrast between a Simple Model and a Complex Model
• Factors Driving Predictive Algorithm Performance
• Choosing an Algorithm: Linear or Nonlinear?
• Measuring the Performance of Predictive Models
• Performance Measures for Different Types of Problems
• Simulating Performance of Deployed Models
• Achieving Harmony between Model and Data
• Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
• Using Forward Stepwise Regression to Control Overfitting
• Evaluating and Understanding Your Predictive Model
• Control Overfitting by Penalizing Regression Coefficients: Ridge Regression
• Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets
• ch. 4 Penalized Linear Regression
• Why Penalized Linear Regression Methods Are So Useful
• Extremely Fast Coefficient Estimation
• Variable Importance Information
• Extremely Fast Evaluation When Deployed
• Reliable Performance
• Sparse Solutions
• Problem May Require Linear Model
• When to Use Ensemble Methods
• Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
• Training Linear Models: Minimizing Errors and More
• Adding a Coefficient Penalty to the OLS Formulation
• Other Useful Coefficient Penalties: Manhattan and ElasticNet
• Why Lasso Penalty Leads to Sparse Coefficient Vectors
• ElasticNet Penalty Includes Both Lasso and Ridge
• Solving the Penalized Linear Regression Problem
• Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression
• How LARS Generates Hundreds of Models of Varying Complexity
• Choosing the Best Model from the Hundreds LARS Generates
• Using Glmnet: Very Fast and Very General
• Comparison of the Mechanics of Glmnet and LARS Algorithms
• Initializing and Iterating the Glmnet Algorithm
• Extension of Linear Regression to Classification Problems
• Solving Classification Problems with Penalized Regression
• Working with Classification Problems Having More Than Two Outcomes
• Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems
• Incorporating Non-Numeric Attributes into Linear Methods
• ch. 5 Building Predictive Models Using Penalized Linear Methods
• Python Packages for Penalized Linear Regression
• Multivariable Regression: Predicting Wine Taste
• Building and Testing a Model to Predict Wine Taste
• Training on the Whole Data Set before Deployment
• Basis Expansion: Improving Performance by Creating New Variables from Old Ones
• Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
• Build a Rocks Versus Mines Classifier for Deployment
• Multiclass Classification: Classifying Crime Scene Glass Samples
• Linear Regression and Classification Using PySpark
• Using PySpark to Predict Wine Taste
• Logistic Regression with PySpark: Rocks Versus Mines
• Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings
• Multiclass Logistic Regression with Meta Parameter Optimization
• ch. 6 Ensemble Methods
• Binary Decision Trees
• How a Binary Decision Tree Generates Predictions
• How to Train a Binary Decision Tree
• Tree Training Equals Split Point Selection
• How Split Point Selection Affects Predictions
• Algorithm for Selecting Split Points
• Multivariable Tree Training: Which Attribute to Split?
• Recursive Splitting for More Tree Depth
• Overfitting Binary Trees
• Measuring Overfit with Binary Trees
• Balancing Binary Tree Complexity for Best Performance
• Modifications for Classification and Categorical Features
• Bootstrap Aggregation: "Bagging"
• How Does the Bagging Algorithm Work?
• Bagging Performance: Bias Versus Variance
• How Bagging Behaves on Multivariable Problem
• Bagging Needs Tree Depth for Performance
• Summary of Bagging
• Gradient Boosting
• Basic Principle of Gradient Boosting Algorithm
• Parameter Settings for Gradient Boosting
• How Gradient Boosting Iterates toward a Predictive Model
• Getting the Best Performance from Gradient Boosting
• Gradient Boosting on a Multivariable Problem
• Summary for Gradient Boosting
• Random Forests
• Random Forests: Bagging Plus Random Attribute Subsets
• Random Forests Performance Drivers
• Random Forests Summary
• ch. 7 Building Ensemble Models with Python
• Solving Regression Problems with Python Ensemble Packages
• Using Gradient Boosting to Predict Wine Taste
• Using the Class Constructor for GradientBoostingRegressor
• Using GradientBoostingRegressor to Implement a Regression Model
• Assessing the Performance of a Gradient Boosting Model
• Building a Random Forest Model to Predict Wine Taste
• Constructing a RandomForestRegressor Object
• Modeling Wine Taste with RandomForestRegressor
• Visualizing the Performance of a Random Forest Regression Model
• Incorporating Non-Numeric Attributes in Python Ensemble Models
• Coding the Sex of Abalone for Gradient Boosting Regression in Python
• Assessing Performance and the Importance of Coded Variables with Gradient Boosting
• Coding the Sex of Abalone for Input to Random Forest Regression in Python
• Assessing Performance and the Importance of Coded Variables
• Solving Binary Classification Problems with Python Ensemble Methods
• Detecting Unexploded Mines with Python Gradient Boosting
• Determining the Performance of a Gradient Boosting Classifier
• Detecting Unexploded Mines with Python Random Forest
• Constructing a Random Forest Model to Detect Unexploded Mines
• Determining the Performance of a Random Forest Classifier
• Solving Multiclass Classification Problems with Python Ensemble Methods
• Dealing with Class Imbalances
• Classifying Glass Using Gradient Boosting
• Determining the Performance of the Gradient Boosting Model on Glass Classification
• Classifying Glass with Random Forests
• Determining the Performance of the Random Forest Model on Glass Classification
• Solving Regression Problems with PySpark Ensemble Packages
• Predicting Wine Taste with PySpark Ensemble Methods
• Predicting Abalone Age with PySpark Ensemble Methods
• Distinguishing Mines from Rocks with PySpark Ensemble Methods
• Identifying Glass Types with PySpark Ensemble Methods
• Summary.

SUBJECT ADDED ENTRY--TOPICAL TERM
Topical Heading Machine learning
ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://uowd.box.com/s/5tfcyofz1iagzl63whqgic1sfxmdzuij
Public note Location Map
Holdings
Permanent location University of Wollongong in Dubai
Current location University of Wollongong in Dubai
Shelving location Main Collection
Date acquired 2020-01-19
Source of acquisition AMAUK
Full call number 006.31 BO MA
Barcode T0064182
Date last seen 2020-01-14
Price effective from 2020-01-14
Koha item type REGULAR
Public note Feb2020
