A comprehensive overview of statistical data analysis research, featuring real-world case studies and applications
How should data analysis be taught? How valid are the results? How should one deal with inhomogeneous data? What kinds of computing languages should be used, if used at all? These are but a few of the many challenging questions surrounding the fundamentals of data analysis. Data Analysis: What Can Be Learned from the Past 50 Years explores the historical and philosophical implications inherent in any study of statistical data analysis. This book addresses the needs of researchers who are working with larger, complicated data sets by offering an understanding of the significance of robust data sets, the implementation of software languages, and the use of models.
Rather than focus on specific procedures, this book concentrates on general insights that can be drawn from data analysis research. The author utilizes case studies to explore the impact of technological advances on data analysis techniques and other thought-provoking issues, including:
- Homogeneous, unstructured data
- Statistical pitfalls
- Singular value decomposition
- Nonlinear weighted least squares
- Simulation of stochastic models
- Scatter- and curve-plots
With plentiful examples that showcase best practices for working with challenges in the field, Data Analysis is an excellent supplement for courses on data analysis, robust statistics, data mining, and computational statistics at the upper-undergraduate and graduate levels. It is also a valuable reference for applied statisticians working in the fields of business, engineering, and the life and health sciences.
Table of Contents
Preface.
1 What is Data Analysis?
1.1 Tukey's 1962 paper.
1.2 The Path of Statistics.
2 Strategy Issues in Data Analysis.
2.1 Strategy in Data Analysis.
2.2 Philosophical issues.
2.3 Issues of size.
2.4 Strategic planning.
2.5 The stages of data analysis.
2.6 Tools required for strategy reasons.
3 Massive Data Sets.
3.1 Introduction.
3.2 Disclosure: Personal experiences.
3.3 What is massive? A classification of size.
3.4 Obstacles to scaling.
3.5 On the structure of large data sets.
3.6 Data base management and related issues.
3.7 The stages of a data analysis.
3.8 Examples and some thoughts on strategy.
3.9 Volume reduction.
3.10 Supercomputers and software challenges.
3.11 Summary of conclusions.
4 Languages for Data Analysis.
4.1 Goals and purposes.
4.2 Natural languages and computing languages.
4.3 Interface issues.
4.4 Miscellaneous issues.
4.5 Requirements for a general purpose immediate language.
5 Approximate Models.
5.1 Models.
5.2 Bayesian modeling.
5.3 Mathematical statistics and approximate models.
5.4 Statistical significance and physical relevance.
5.5 Judicious use of a wrong model.
5.6 Composite models.
5.7 Modeling the length of day.
5.8 The role of simulation.
5.9 Summary of conclusions.
6 Pitfalls.
6.1 Simpson's paradox.
6.2 Missing data.
6.3 Regression of Y on X or of X on Y.
7 Create Order in Data.
7.1 General considerations.
7.2 Principal component methods.
7.3 Multidimensional scaling.
7.4 Correspondence analysis.
7.5 Multidimensional scaling vs. Correspondence analysis.
8 More Case Studies.
8.1 A nutshell example.
8.2 Shape invariant modeling.
8.3 Comparison of point configurations.
8.4 Notes on numerical optimization.
References.
Index.