Python Data Science Training Course

Learn the fundamentals of Python Data Science, and how to apply your knowledge


Our Intro To Programming level is required for entry into this course

Who will benefit

Python-literate coders who want to move into the Data Science Space


Attendance : If you have attended 80% of the sessions and completed all the class work, you qualify for the Attendance Certificate. Competency : If you have also completed all the practical projects as described the Outcomes section, you qualify for the Competency Certificate.

What you will learn

  • Know the Fundamentals of Data Science with Python and the libraries that enable it like Pandas, Numpy, MatPlotLib and more

What do I need?

Live Online Training : A laptop, and a stable internet connection. The recommended minimum speed is around 10 Mbps. Classroom Training : A laptop, please notify us if you are not brining your own laptop. Please see the calendar below for the schedule

Day One1.2.3.4

1. Preliminaries. 
    1.1 What Is This Book About? 

  • What Kinds of Data? 

1.2 Why Python for Data Analysis? 

  • Python as Glue 
  • Solving the “Two-Language” Problem 
  • Why Not Python? 

1.3 Essential Python Libraries 

  • NumPy 
  • pandas 
  • matplotlib 
  • IPython and Jupyter 
  • SciPy 
  • scikit-learn 
  • statsmodels 

1.4 Installation and Setup 

  • Windows 
  • Apple (OS X, macOS) 
  • GNU/Linux 
  • Installing or Updating Python Packages 
  • Python 2 and Python 3 
  • Integrated Development Environments (IDEs) and Text Editors 

1.5 Community and Conferences 

1.6 Navigating This Book 

  • Code Examples 
  • Data for Examples
  • Import Conventions 
  • Jargon 

2. Python Language Basics, IPython, and Jupyter Notebooks.

2.1 The Python Interpreter     

2.2 IPython Basics 

  • Running the IPython Shell 
  • Running the Jupyter Notebook 
  • Tab Completion 
  • Introspection 
  • The %run Command 
  • Executing Code from the Clipboard 
  • Terminal Keyboard Shortcuts 
  • About Magic Commands 
  • Matplotlib Integration 

2.3 Python Language Basics 

  • Language Semantics
  • Scalar Types 
  • Control Flow 

3. Built-in Data Structures, Functions, and Files.

3.1 Data Structures and Sequences 1

  • Tuple 
  • List 
  • Built-in Sequence Functions 
  • dict
  • set 
  • List, Set, and Dict Comprehensions 

3.2 Functions 

  • Namespaces, Scope, and Local Functions 
  • Returning Multiple Values 
  • Functions Are Objects 
  • Anonymous (Lambda) Functions
  • Currying: Partial Argument Application 
  • Generators 
  • Errors and Exception Handling 

3.3 Files and the Operating System 

  • Bytes and Unicode with Files 

3.4 Conclusion 

4. NumPy Basics: Arrays and Vectorized Computation.

 4.1 The NumPy ndarray: A Multidimensional Array Object 

  • Data Types for ndarrays 
  • Arithmetic with NumPy Arrays 
  • Basic Indexing and Slicing 
  • Boolean Indexing 
  • Fancy Indexing 
  • Transposing Arrays and Swapping Axes 

4.2 Universal Functions: Fast Element-Wise Array Functions 

4.3 Array-Oriented Programming with Arrays 

  • Expressing Conditional Logic as Array Operations 
  • Mathematical and Statistical Methods 
  • Methods for Boolean Arrays 
  • Sorting 
  • Unique and Other Set Logic 

4.4 File Input and Output with Arrays 

4.5 Linear Algebra 

4.6 Pseudorandom Number Generation 

4.7 Example: Random Walks 

Day Two5. 6.7.8

5. Getting Started with pandas. 

5.1 Introduction to pandas Data Structures

  • Series 
  • DataFrame
  • Index Objects 

5.2 Essential Functionality 

  • Reindexing 
  • Dropping Entries from an Axis 
  • Indexing, Selection, and Filtering 
  • Integer Indexes 
  • Arithmetic and Data Alignment 
  • Function Application and Mapping 
  • Sorting and Ranking 
  • Axis Indexes with Duplicate Labels 

5.3 Summarizing and Computing Descriptive Statistics 

  • Correlation and Covariance 
  • Unique Values, Value Counts, and Membership 

6. Data Loading, Storage, and File Formats. 

6.1 Reading and Writing Data in Text Format 

  • Reading Text Files in Pieces
  • Writing Data to Text Format 
  • Working with Delimited Formats
  • JSON Data 
  • XML and HTML: Web Scraping

6.2 Binary Data Formats 

  • Using HDF5 Format 
  • Reading Microsoft Excel Files

6.3 Interacting with Web APIs

6.4 Interacting with Databases 

7. Data Cleaning and Preparation. .

7.1 Handling Missing Data  

  • Filtering Out Missing Data 
  • Filling In Missing Data 

7.2 Data Transformation 

  • Removing Duplicates 
  • Transforming Data Using a Function or Mapping 
  • Replacing Values 
  • Renaming Axis Indexes
  • Discretization and Binning 
  • Detecting and Filtering Outliers 
  • Permutation and Random Sampling 
  • Computing Indicator/Dummy Variables 

7.3 String Manipulation

  • String Object Methods 
  • Regular Expressions 
  • Vectorized String Functions in pandas 

8. Data Wrangling: Join, Combine, and Reshape.

8.1 Hierarchical Indexing 

  • Reordering and Sorting Levels 
  • Summary Statistics by Level 
  • Indexing with a DataFrame’s columns 

8.2 Combining and Merging Datasets 

  • Database-Style DataFrame Joins 
  • Merging on Index 
  • Concatenating Along an Axis
  • Combining Data with Overlap

8.3 Reshaping and Pivoting 

  • Reshaping with Hierarchical Indexing 
  • Pivoting “Long” to “Wide” Format 
  • Pivoting “Wide” to “Long” Format 

Day Three9.10.11

9. Plotting and Visualization.

9.1 A Brief matplotlib API Primer 

  • Figures and Subplots 
  • Colors, Markers, and Line Styles
  • Ticks, Labels, and Legends 
  • Annotations and Drawing on a Subplot 
  • Saving Plots to File 
  • matplotlib Configuration 

9.2 Plotting with pandas and seaborn 

  • Line Plots
  • Bar Plots 
  • Histograms and Density Plots 
  • Scatter or Point Plots 
  • Facet Grids and Categorical Data 

9.3 Other Python Visualization Tools

10. Data Aggregation and Group Operations. 

10.1 GroupBy Mechanics 

  • Iterating Over Groups 
  • Selecting a Column or Subset of Columns 
  • Grouping with Dicts and Series 
  • Grouping with Functions 
  • Grouping by Index Levels 

10.2 Data Aggregation 

  • Column-Wise and Multiple Function Application 
  • Returning Aggregated Data Without Row Indexes 

10.3 Apply: General split-apply-combine 

  • Suppressing the Group Keys 
  • Quantile and Bucket Analysis 
  • Example: Filling Missing Values with Group-Specific Values 
  • Example: Random Sampling and Permutation 
  • Example: Group Weighted Average and Correlation
  • Example: Group-Wise Linear Regression

10.4 Pivot Tables and Cross-Tabulation 

  • Cross-Tabulations: Crosstab 

11. Time Series.

11.1 Date and Time Data Types and Tools 

  • Converting Between String and Datetime 

11.2 Time Series Basics 

  • Indexing, Selection, Subsetting 
  • Time Series with Duplicate Indices 

11.3 Date Ranges, Frequencies, and Shifting 

  • Generating Date Ranges 
  • Frequencies and Date Offsets 
  • Shifting (Leading and Lagging) Data 

11.4 Time Zone Handling 

  • Time Zone Localization and Conversion 
  • Operations with Time Zone−Aware Timestamp Objects 
  • Operations Between Different Time Zones

11.5 Periods and Period Arithmetic

  • Period Frequency Conversion
  • Quarterly Period Frequencies 
  • Converting Timestamps to Periods (and Back) 
  • Creating a PeriodIndex from Arrays 

11.6 Resampling and Frequency Conversion 

  • Downsampling 
  • Upsampling and Interpolation
  • Resampling with Periods 

11.7 Moving Window Functions 

  • Exponentially Weighted Functions
  • Binary Moving Window Functions 
  • User-Defined Moving Window Functions

Day Four12.13.14

12. Advanced pandas.

12.1 Categorical Data 

  • Background and Motivation 
  • Categorical Type in pandas
  • Computations with Categoricals
  • Categorical Methods 

12.2 Advanced GroupBy Use 

  • Group Transforms and “Unwrapped” GroupBys 
  • Grouped Time Resampling 

12.3 Techniques for Method Chaining

  • The pipe Method

12.4 Conclusion 

13. Introduction to Modeling Libraries in Python.

13.1 Interfacing Between pandas and Model Code 

13.2 Creating Model Descriptions with Patsy 

  • Data Transformations in Patsy Formulas 
  • Categorical Data and Patsy 

13.3 Introduction to statsmodels 

  • Estimating Linear Models 
  • Estimating Time Series Processes 

13.4 Introduction to scikit-learn 

13.5 Continuing Your Education 

14. Data Analysis Examples.

14.1 Data from Bitly 

  • Counting Time Zones in Pure Python 
  • Counting Time Zones with pandas 

14.2 MovieLens 1M Dataset 

  • Measuring Rating Disagreement 

14.3 US Baby Names 1880–2010 

  • Analyzing Naming Trends 

14.4 USDA Food Database 
14.5 2012 Federal Election Commission Database 

  • Donation Statistics by Occupation and Employer 
  • Bucketing Donation Amounts 
  • Donation Statistics by State 

Day Five

A. Advanced NumPy

A.1 ndarray Object Internals 

  • NumPy dtype Hierarchy 

A.2 Advanced Array Manipulation 

  • Reshaping Arrays 

C Versus Fortran Order 

  • Concatenating and Splitting Arrays
  • Repeating Elements: tile and repeat 
  • Fancy Indexing Equivalents: take and put 

A.3 Broadcasting 

  • Broadcasting Over Other Axes 
  • Setting Array Values by Broadcasting 

A.4 Advanced ufunc Usage 

  • ufunc Instance Methods 
  • Writing New ufuncs in Python 

A.5 Structured and Record Arrays Nested dtypes and Multidimensional Fields 

  • Why Use Structured Arrays? 

A.6 More About Sorting 

  • Indirect Sorts: argsort and lexsort 
  • Alternative Sort Algorithms 
  • Partially Sorting Arrays 
  • numpy.searchsorted: Finding Elements in a Sorted Array 

A.7 Writing Fast NumPy Functions with Numba 

  • Creating Custom numpy.ufunc Objects with Numba 

A.8 Advanced Array Input and Output 

  • Memory-Mapped Files 
  • HDF5 and Other Array Storage Options 

A.9 Performance Tips 

  • The Importance of Contiguous Memory 

B. More on the IPython System.
B.1 Using the Command History 

  • Searching and Reusing the Command History 
  • Input and Output Variables 

B.2 Interacting with the Operating System 

  • Shell Commands and Aliases
  • Directory Bookmark System 

B.3 Software Development Tools 

  • Interactive Debugger 
  • Timing Code: %time and %timeit 
  • Basic Profiling: %prun and %run -p 
  • Profiling a Function Line by Line 

B.4 Tips for Productive Code Development Using IPython 

  • Reloading Module Dependencies 
  • Code Design Tips 

B.5 Advanced IPython Features 

  • Making Your Own Classes IPython-Friendly
  • Profiles and Configuration
Back to top