Machine learning with Spark and Python : essential techniques for predictive analytics

By: Bowles, Michael

Material type: Book
Publisher: Indianapolis, IN : John Wiley and Sons, c2020.
Edition: 2nd ed.
Description: xxvii, 340 p. : ill. ; 25 cm.
ISBN: 9781119561934
Subject(s): Machine learning
DDC classification: 006.31 BO MA


Holdings (1)

Item type	Home library	Call number	Status	Notes	Date due	Barcode	Item holds
REGULAR	University of Wollongong in Dubai Main Collection	006.31 BO MA	Available	Feb2020		T0064182

Total holds: 0

Shelving location: Main Collection

Nearby items on shelf:
005.84 UL DI Digital world war :
005.88 LI RA Ransomware :
006.31 AW EF Efficient learning machines :
006.31 BO MA Machine learning with Spark and Python :
006.312 AA PR Process mining :
006.312 RA HI High-performance big-data analytics :
006.696 DE AU Autodesk 3DS Max 2015

Machine-generated contents note:
• ch. 1 The Two Essential Algorithms for Making Predictions
• Why Are These Two Algorithms So Useful?
• What Are Penalized Regression Methods?
• What Are Ensemble Methods?
• How to Decide Which Algorithm to Use
• The Process Steps for Building a Predictive Model
• Framing a Machine Learning Problem
• Feature Extraction and Feature Engineering
• Determining Performance of a Trained Model
• Chapter Contents and Dependencies
• Summary
• ch. 2 Understand the Problem by Understanding the Data
• The Anatomy of a New Problem
• Different Types of Attributes and Labels Drive Modeling Choices
• Things to Notice about Your New Data Set
• Classification Problems: Detecting Unexploded Mines Using Sonar
• Physical Characteristics of the Rocks Versus Mines Data Set
• Statistical Summaries of the Rocks Versus Mines Data Set
• Visualization of Outliers Using a Quantile-Quantile Plot
• Statistical Characterization of Categorical Attributes
• How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set
• Visualizing Properties of the Rocks Versus Mines Data Set
• Visualizing with Parallel Coordinates Plots
• Visualizing Interrelationships between Attributes and Labels
• Visualizing Attribute and Label Correlations Using a Heat Map
• Summarizing the Process for Understanding the Rocks Versus Mines Data Set
• Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
• Parallel Coordinates for Regression Problems: Visualize Variable Relationships for the Abalone Problem
• How to Use a Correlation Heat Map for Regression: Visualize Pair-Wise Correlations for the Abalone Problem
• Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
• Multiclass Classification Problem: What Type of Glass Is That?
• Using PySpark to Understand Large Data Sets
• ch. 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data
• The Basic Problem: Understanding Function Approximation
• Working with Training Data
• Assessing Performance of Predictive Models
• Factors Driving Algorithm Choices and Performance: Complexity and Data
• Contrast between a Simple Problem and a Complex Problem
• Contrast between a Simple Model and a Complex Model
• Factors Driving Predictive Algorithm Performance
• Choosing an Algorithm: Linear or Nonlinear?
• Measuring the Performance of Predictive Models
• Performance Measures for Different Types of Problems
• Simulating Performance of Deployed Models
• Achieving Harmony between Model and Data
• Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size
• Using Forward Stepwise Regression to Control Overfitting
• Evaluating and Understanding Your Predictive Model
• Control Overfitting by Penalizing Regression Coefficients: Ridge Regression
• Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets
• ch. 4 Penalized Linear Regression
• Why Penalized Linear Regression Methods Are So Useful
• Extremely Fast Coefficient Estimation
• Variable Importance Information
• Extremely Fast Evaluation When Deployed
• Reliable Performance
• Sparse Solutions
• Problem May Require Linear Model
• When to Use Ensemble Methods
• Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
• Training Linear Models: Minimizing Errors and More
• Adding a Coefficient Penalty to the OLS Formulation
• Other Useful Coefficient Penalties: Manhattan and ElasticNet
• Why Lasso Penalty Leads to Sparse Coefficient Vectors
• ElasticNet Penalty Includes Both Lasso and Ridge
• Solving the Penalized Linear Regression Problem
• Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression
• How LARS Generates Hundreds of Models of Varying Complexity
• Choosing the Best Model from the Hundreds LARS Generates
• Using Glmnet: Very Fast and Very General
• Comparison of the Mechanics of Glmnet and LARS Algorithms
• Initializing and Iterating the Glmnet Algorithm
• Extension of Linear Regression to Classification Problems
• Solving Classification Problems with Penalized Regression
• Working with Classification Problems Having More Than Two Outcomes
• Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems
• Incorporating Non-Numeric Attributes into Linear Methods
• ch. 5 Building Predictive Models Using Penalized Linear Methods
• Python Packages for Penalized Linear Regression
• Multivariable Regression: Predicting Wine Taste
• Building and Testing a Model to Predict Wine Taste
• Training on the Whole Data Set before Deployment
• Basis Expansion: Improving Performance by Creating New Variables from Old Ones
• Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
• Build a Rocks Versus Mines Classifier for Deployment
• Multiclass Classification: Classifying Crime Scene Glass Samples
• Linear Regression and Classification Using PySpark
• Using PySpark to Predict Wine Taste
• Logistic Regression with PySpark: Rocks Versus Mines
• Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings
• Multiclass Logistic Regression with Meta Parameter Optimization
• ch. 6 Ensemble Methods
• Binary Decision Trees
• How a Binary Decision Tree Generates Predictions
• How to Train a Binary Decision Tree
• Tree Training Equals Split Point Selection
• How Split Point Selection Affects Predictions
• Algorithm for Selecting Split Points
• Multivariable Tree Training: Which Attribute to Split?
• Recursive Splitting for More Tree Depth
• Overfitting Binary Trees
• Measuring Overfit with Binary Trees
• Balancing Binary Tree Complexity for Best Performance
• Modifications for Classification and Categorical Features
• Bootstrap Aggregation: "Bagging"
• How Does the Bagging Algorithm Work?
• Bagging Performance: Bias Versus Variance
• How Bagging Behaves on Multivariable Problem
• Bagging Needs Tree Depth for Performance
• Summary of Bagging
• Gradient Boosting
• Basic Principle of Gradient Boosting Algorithm
• Parameter Settings for Gradient Boosting
• How Gradient Boosting Iterates toward a Predictive Model
• Getting the Best Performance from Gradient Boosting
• Gradient Boosting on a Multivariable Problem
• Summary for Gradient Boosting
• Random Forests
• Random Forests: Bagging Plus Random Attribute Subsets
• Random Forests Performance Drivers
• Random Forests Summary
• ch. 7 Building Ensemble Models with Python
• Solving Regression Problems with Python Ensemble Packages
• Using Gradient Boosting to Predict Wine Taste
• Using the Class Constructor for GradientBoostingRegressor
• Using GradientBoostingRegressor to Implement a Regression Model
• Assessing the Performance of a Gradient Boosting Model
• Building a Random Forest Model to Predict Wine Taste
• Constructing a RandomForestRegressor Object
• Modeling Wine Taste with RandomForestRegressor
• Visualizing the Performance of a Random Forest Regression Model
• Incorporating Non-Numeric Attributes in Python Ensemble Models
• Coding the Sex of Abalone for Gradient Boosting Regression in Python
• Assessing Performance and the Importance of Coded Variables with Gradient Boosting
• Coding the Sex of Abalone for Input to Random Forest Regression in Python
• Assessing Performance and the Importance of Coded Variables
• Solving Binary Classification Problems with Python Ensemble Methods
• Detecting Unexploded Mines with Python Gradient Boosting
• Determining the Performance of a Gradient Boosting Classifier
• Detecting Unexploded Mines with Python Random Forest
• Constructing a Random Forest Model to Detect Unexploded Mines
• Determining the Performance of a Random Forest Classifier
• Solving Multiclass Classification Problems with Python Ensemble Methods
• Dealing with Class Imbalances
• Classifying Glass Using Gradient Boosting
• Determining the Performance of the Gradient Boosting Model on Glass Classification
• Classifying Glass with Random Forests
• Determining the Performance of the Random Forest Model on Glass Classification
• Solving Regression Problems with PySpark Ensemble Packages
• Predicting Wine Taste with PySpark Ensemble Methods
• Predicting Abalone Age with PySpark Ensemble Methods
• Distinguishing Mines from Rocks with PySpark Ensemble Methods
• Identifying Glass Types with PySpark Ensemble Methods
• Summary.

UOWD Library