Learn Python by Building Data Science Applications
Philipp Kats, David Katz
Last updated: 2021-06-24 13:07:03
Cover Page
Title Page
Copyright and Credits
Learn Python by Building Data Science Applications
About Packt
Why subscribe?
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Code in Action
Conventions used
Get in touch
Reviews
Section 1: Getting Started with Python
Preparing the Workspace
Technical requirements
Installing Python
Downloading materials for running the code
Installing Python packages
Working with VS Code
The VS Code interface
Beginning with Jupyter
Notebooks
The Jupyter interface
Pre-flight check
Summary
Questions
Further reading
First Steps in Coding - Variables and Data Types
Technical requirements
Assigning variables
Naming the variable
Understanding data types
Floats and integers
Operations with self-assignment
Order of execution
Strings
Formatting
Format method
F-strings
Legacy formatting
Formatting mini-language
Strings as sequences
Booleans
Logical operators
Converting the data types
Exercise
Summary
Questions
Further reading
Functions
Technical requirements
Understanding a function
Interface functions
The input function
The eval function
Variable properties
The help function
The type function
The isinstance function
dir
Math
abs
The round function
Iterables
The len function
The sorted function
The range function
The all and any functions
The max, min, and sum functions
Defining the function
Default values
Var-positional and var-keyword
Docstrings
Type annotations
Refactoring the temperature conversion
Understanding anonymous (lambda) functions
Understanding recursion
Summary
Questions
Further reading
Data Structures
Technical requirements
What are data structures?
Lists
Slicing
Tuples
Immutability
Dictionaries
Sets
More data structures
frozenset
defaultdict
Counter
Queue
deque
namedtuple
Enumerations
Using generators
Useful functions to use with data structures
The sum, max, and min functions
The all and any functions
The zip function
The map, filter, and reduce functions
Comprehensions
Summary
Questions
Further reading
Loops and Other Compound Statements
Technical requirements
Understanding if, else, and elif statements
Inline if statements
Using if in a comprehension
Running code many times with loops
The for loop
itertools
cycle
chain
product
Enumeration
The while loop
Additional loop functionality – break and continue
Handling exceptions with try/except and try/finally
Exceptions
try/except
try/except/finally
Understanding the with statement
Summary
Questions
Further reading
First Script – Geocoding with Web APIs
Technical requirements
Geocoding as a service
Learning about web APIs
Working with HTTPS
Working with the Nominatim API
The requests library
Starting to code
Caching with decorators
Reading and writing data
Geocoding the addresses
Moving code to a separate module
Collecting NYC Open Data from the Socrata service
Summary
Questions
Further reading
Scraping Data from the Web with Beautiful Soup 4
Technical requirements
When there is no API
HTML in a nutshell
Scraping with Beautiful Soup 4
CSS and XPath selectors
Developer console
Scraping WWII battles
Step 1 – Scraping the list of battles
Unordered list
Step 2 – Scraping information from the Wiki page
Key information
Additional information
Step 3 – Scraping data as a whole
Quality control
Beyond Beautiful Soup
Summary
Questions
Further reading
Simulation with Classes and Inheritance
Technical requirements
Understanding classes
Special (dunder) methods
__init__
__repr__ and __str__
Arithmetical and logical operations
Equality/relationship methods
__len__
__getitem__
__class__
Inheritance
Using super()
Data classes
Using classes in simulation
Writing the base classes
Writing the Island class
Herbivore haven
Harsh islands
Visualization
Summary
Questions
Further reading
Shell, Git, Conda, and More – at Your Command
Technical requirements
Shell
Pipes
Executing Python scripts
Command-line interface
Git
Concept
GitHub
Practical example
gitignore
Conda
Conda for virtual environments
Conda and Jupyter
Make
Cookiecutter
Summary
Questions
Section 2: Hands-On with Data
Python for Data Applications
Technical requirements
Introducing Python for data science
Exploring NumPy
Beginning with pandas
Trying SciPy and scikit-learn
Understanding Jupyter
Summary
Questions
Data Cleaning and Manipulation
Technical requirements
Getting started with pandas
Selection – by columns, indices, or both
Masking
Data types and data conversion
Math
Merging
Working with real data
Initial exploration
Defining the scope of work to be done
Getting to know regular expressions
Parsing locations
Geocoding
Time
Belligerents
Understanding casualties
Multilevel slicing
Quality assurance
Writing the file
Summary
Questions
Further reading
Data Exploration and Visualization
Technical requirements
Exploring the dataset
Descriptive statistics
Data visualization with matplotlib (and its pandas interface)
Aggregating the data to calculate summary statistics
Resampling
Mapping
Declarative visualization with Vega and Altair
Drawing maps with Altair
Storing the Altair chart
Big data visualization with datashader
Summary
Questions
Further reading
Training a Machine Learning Model
Technical requirements
Understanding the basics of ML
Exploring unsupervised learning
Moving on to supervised learning
k-nearest neighbors
Linear regression
Decision trees
Summary
Questions
Further reading
Improving Your Model – Pipelines and Experiments
Technical requirements
Understanding cross-validation
Exploring feature engineering
Failed attempts
Optimizing the hyperparameters
Using a random forest model
Tracking your data and metrics with version control
Starting with data
Adding code to the equation
Metrics
Summary
Questions
Further reading
Section 3: Moving to Production
Packaging and Testing with Poetry and PyTest
Technical requirements
Building a package
Bringing your own package
Using a package manager – pip and conda
Creating a package scaffolding
A few ways to build your package
Trying out code with Poetry
Adding actual code
Defining dependencies
Non-code resources
Publishing the package
Development workflow
Testing the code so far
Testing with PyTest
Writing our own tests
Automating the process with CI services
Generating documentation with Sphinx
Installing a package in editable mode
Summary
Questions
Further reading
Data Pipelines with Luigi
Technical requirements
Introducing the ETL pipeline
Redesigning your code as a pipeline
Building our first task in Luigi
Connecting the dots
Understanding time-based tasks
Scheduling with cron
Exploring the different output formats
Writing to an S3 bucket
Writing to SQL
Expanding Luigi with custom template classes
Summary
Questions
Further reading
Let's Build a Dashboard
Technical requirements
Building a dashboard – three types of dashboard
Static dashboards
Debugging Altair
Connecting your app to the Luigi pipeline
Understanding dynamic dashboards
First try with panel
Reading data from the database
Creating an interactive dashboard in Jupyter
Summary
Questions
Further reading
Serving Models with a RESTful API
Technical requirements
What is a RESTful API?
Python web frameworks
Building a basic API service
Exploring the service with OpenAPI
Finalizing our naive first iteration
Data validation
Sending data in with POST requests
Adding features to our service
Building a web page
Speeding up with asynchronous calls
Deploying and load-testing your API with Locust
Summary
Questions
Further reading
Serverless API Using Chalice
Technical requirements
Understanding serverless
Getting started with Chalice
Setting up a simple model
Externalizing medians
Building a serverless API for an ML model
When we're still out of memory
Building a serverless function as a data pipeline
S3-triggered events
Summary
Questions
Further reading
Best Practices and Python Performance
Technical requirements
Speeding up your Python code
Rewriting the code with NumPy
Specialized data structures and algorithms
Dask
Dask-ML
Numba
Concurrency and parallelism
Different types of concurrency
Two types of problems
Before you start rewriting your code
Using best practices for coding in your project
Code formatting with black
Measuring code quality with Wily
Writing tests with hypothesis
Beyond this book – packages and technologies to look out for
Different Python flavors
Docker containers
Kubernetes
Summary
Questions
Further reading
Assessments
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Other Books You May Enjoy
Leave a review - let other readers know what you think