Hands-On Exploratory Data Analysis with Python
Suresh Kumar Mukhiya, Usman Ahmed
Updated: 2021-06-24 16:45:36
Latest chapter: Leave a review - let other readers know what you think
Cover
Title Page
Copyright and Credits
Hands-On Exploratory Data Analysis with Python
About Packt
Why subscribe?
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: The Fundamentals of EDA
Exploratory Data Analysis Fundamentals
Understanding data science
The significance of EDA
Steps in EDA
Making sense of data
Numerical data
Discrete data
Continuous data
Categorical data
Measurement scales
Nominal
Ordinal
Interval
Ratio
Comparing EDA with classical and Bayesian analysis
Software tools available for EDA
Getting started with EDA
NumPy
Pandas
SciPy
Matplotlib
Summary
Further reading
Visual Aids for EDA
Technical requirements
Line chart
Steps involved
Bar charts
Scatter plot
Bubble chart
Scatter plot using seaborn
Area plot and stacked plot
Pie chart
Table chart
Polar chart
Histogram
Lollipop chart
Choosing the best chart
Other libraries to explore
Summary
Further reading
EDA with Personal Email
Technical requirements
Loading the dataset
Data transformation
Data cleansing
Loading the CSV file
Converting the date
Removing NaN values
Applying descriptive statistics
Data refactoring
Dropping columns
Refactoring timezones
Data analysis
Number of emails
Time of day
Average emails per day and hour
Number of emails per day
Most frequently used words
Summary
Further reading
Data Transformation
Technical requirements
Background
Merging database-style dataframes
Concatenating along with an axis
Using df.merge with an inner join
Using the pd.merge() method with a left join
Using the pd.merge() method with a right join
Using pd.merge() methods with outer join
Merging on index
Reshaping and pivoting
Transformation techniques
Performing data deduplication
Replacing values
Handling missing data
NaN values in pandas objects
Dropping missing values
Dropping by rows
Dropping by columns
Mathematical operations with NaN
Filling missing values
Backward and forward filling
Interpolating missing values
Renaming axis indexes
Discretization and binning
Outlier detection and filtering
Permutation and random sampling
Random sampling without replacement
Random sampling with replacement
Computing indicators/dummy variables
String manipulation
Benefits of data transformation
Challenges
Summary
Further reading
Section 2: Descriptive Statistics
Descriptive Statistics
Technical requirements
Understanding statistics
Distribution function
Uniform distribution
Normal distribution
Exponential distribution
Binomial distribution
Cumulative distribution function
Descriptive statistics
Measures of central tendency
Mean/average
Median
Mode
Measures of dispersion
Standard deviation
Variance
Skewness
Kurtosis
Types of kurtosis
Calculating percentiles
Quartiles
Visualizing quartiles
Summary
Further reading
Grouping Datasets
Technical requirements
Understanding groupby()
Groupby mechanics
Selecting a subset of columns
Max and min
Mean
Data aggregation
Group-wise operations
Renaming grouped aggregation columns
Group-wise transformations
Pivot tables and cross-tabulations
Pivot tables
Cross-tabulations
Summary
Further reading
Correlation
Technical requirements
Introducing correlation
Types of analysis
Understanding univariate analysis
Understanding bivariate analysis
Understanding multivariate analysis
Discussing multivariate analysis using the Titanic dataset
Outlining Simpson's paradox
Correlation does not imply causation
Summary
Further reading
Time Series Analysis
Technical requirements
Understanding the time series dataset
Fundamentals of TSA
Univariate time series
Characteristics of time series data
TSA with Open Power System Data
Data cleaning
Time-based indexing
Visualizing time series
Grouping time series data
Resampling time series data
Summary
Further reading
Section 3: Model Development and Evaluation
Hypothesis Testing and Regression
Technical requirements
Hypothesis testing
Hypothesis testing principle
statsmodels library
Average reading time
Types of hypothesis testing
T-test
p-hacking
Understanding regression
Types of regression
Simple linear regression
Multiple linear regression
Nonlinear regression
Model development and evaluation
Constructing a linear regression model
Model evaluation
Computing accuracy
Understanding accuracy
Implementing a multiple linear regression model
Summary
Further reading
Model Development and Evaluation
Technical requirements
Types of machine learning
Understanding supervised learning
Regression
Classification
Understanding unsupervised learning
Applications of unsupervised learning
Clustering using MiniBatch K-means clustering
Extracting keywords
Plotting clusters
Word cloud
Understanding reinforcement learning
Difference between supervised and reinforcement learning
Applications of reinforcement learning
Unified machine learning workflow
Data preprocessing
Data collection
Data analysis
Data cleaning, normalization, and transformation
Data preparation
Training sets and corpus creation
Model creation and training
Model evaluation
Best model selection and evaluation
Model deployment
Summary
Further reading
EDA on Wine Quality Data Analysis
Technical requirements
Disclosing the wine quality dataset
Loading the dataset
Descriptive statistics
Data wrangling
Analyzing red wine
Finding correlated columns
Alcohol versus quality
Alcohol versus pH
Analyzing white wine
Red wine versus white wine
Adding a new attribute
Converting into a categorical column
Concatenating dataframes
Grouping columns
Univariate analysis
Multivariate analysis on the combined dataframe
Discrete categorical attributes
3-D visualization
Model development and evaluation
Summary
Further reading
Appendix
String manipulation
Creating strings
Accessing characters in Python
String slicing
Deleting/updating from a string
Escape sequencing in Python
Formatting strings
Using pandas vectorized string functions
Using string functions with a pandas DataFrame
Using regular expressions
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think