Statistical disclosure control for microdata : Matthias Templ
By: Templ, Matthias
Material type: Book
Publisher: Cham, Switzerland : Springer, c2017.
Description: xix, 287 p. : ill. ; 25 cm.
ISBN: 9783319502700
Subject(s): Mathematical statistics -- Data processing | R (Computer program language) | MATHEMATICS / Applied | MATHEMATICS / Probability & Statistics / General
DDC classification: 519.5 TE ST

Item type | Home library | Call number | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|
REGULAR | University of Wollongong in Dubai Main Collection | 519.5 TE ST | Available | | T0056683 | |
Shelf browser, Shelving location: Main Collection
Call number | Title |
---|---|
519.5 SP ST | Statistics |
519.5 ST AT | Statistical and mathematical sciences and their applications |
519.5 ST AT | Statistics decisions through data |
519.5 TE ST | Statistical disclosure control for microdata |
519.5 TR BU | Business statistics |
519.5 UT MI | Mind on statistics |
519.5 UT MI | Mind on statistics |
Preface; Overview of the Book; Acknowledgements; Contents; Acronyms;
1 Software; 1.1 Prerequisites; 1.1.1 Installation and Updates; 1.1.2 Install sdcMicro and Its Browser-Based Point-and-Click App; 1.1.3 Updating the SDC Tools; 1.1.4 Help; 1.1.5 The R Workspace and the Working Directory; 1.1.6 Data Types; 1.1.7 Generic Functions, Methods and Classes; 1.2 Brief Overview on SDC Software Tools; 1.3 Differences Between SDC Tools; 1.4 Working with sdcMicro; 1.4.1 General Information About sdcMicro; 1.4.2 S4 Class Structure of the sdcMicro Package; 1.4.3 Utility Functions; 1.4.4 Reporting Facilities; 1.5 The Point-and-Click App sdcApp; 1.6 The simPop package; References;
2 Basic Concepts; 2.1 Types of Variables; 2.1.1 Non-confidential Variables; 2.1.2 Identifying Variables; 2.1.3 Sensitive Variables; 2.1.4 Linked Variables; 2.1.5 Sampling Weights; 2.1.6 Hierarchies, Clusters and Strata; 2.1.7 Categorical Versus Continuous Variables; 2.2 Types of Disclosure; 2.2.1 Identity Disclosure; 2.2.2 Attribute Disclosure; 2.2.3 Inferential Disclosure; 2.3 Disclosure Risk Versus Information Loss and Data Utility; 2.4 Release Types; 2.4.1 Public Use Files (PUF); 2.4.2 Scientific Use Files (SUF); 2.4.3 Controlled Research Data Center; 2.4.4 Remote Execution; 2.4.5 Remote Access; References;
3 Disclosure Risk; 3.1 Introduction; 3.2 Frequency Counts; 3.2.1 The Number of Cells of Equal Size; 3.2.2 Frequency Counts with Missing Values; 3.2.3 Sample Frequencies in sdcMicro; 3.3 Principles of k-anonymity and l-diversity; 3.3.1 Simplified Estimation of Population Frequency Counts; 3.4 Special Uniques Detection Algorithm (SUDA); 3.4.1 Minimal Sample Uniqueness; 3.4.2 SUDA Scores; 3.4.3 SUDA DIS Scores; 3.4.4 SUDA in sdcMicro; 3.5 The Individual Risk Approach; 3.5.1 The Benedetti-Franconi Model for Risk Estimation; 3.6 Disclosure Risks for Hierarchical Data; 3.7 Measuring Global Risks; 3.7.1 Measuring the Global Risk Using Log-Linear Models; 3.7.2 Standard Log-Linear Model; 3.7.3 Clogg and Eliason Method; 3.7.4 Pseudo Maximum Likelihood Method; 3.7.5 Weighted Log-Linear Model; 3.8 Application of the Log-Linear Models; 3.9 Global Risk Measures; 3.10 Quality of the Risk Measures Under Different Sampling Designs; 3.11 Disclosure Risk for Continuous Variables; 3.12 Special Treatment of Outliers When Calculating Disclosure Risks; References;
4 Methods for Data Perturbation; 4.1 Kind of Methods; 4.2 Methods for Categorical Key Variables; 4.2.1 Recoding; 4.2.2 Local Suppression; 4.2.3 Post-randomization Method (PRAM); 4.3 Methods for Continuous Key Variables; 4.3.1 Microaggregation; 4.3.2 Noise Addition; 4.3.3 Shuffling; References;
5 Data Utility and Information Loss; 5.1 Element-Wise Comparisons; 5.1.1 Comparing Missing Values; 5.1.2 Comparing Aggregated Information; 5.2 Element-Wise Measures for Continuous Variables; 5.2.1 Element-Wise Comparisons of Mixed Scaled Variables; 5.3 Entropy; 5.4 Propensity Score Methods
This book on statistical disclosure control presents the theory, applications and software implementation of the traditional approach to (micro)data anonymization, including data perturbation methods, disclosure risk, data utility, information loss and methods for simulating synthetic data. Introducing readers to the R packages sdcMicro and simPop, the book also features numerous examples and exercises with solutions, as well as case studies with real-world data, accompanied by the underlying R code to allow readers to reproduce all results.