# Foundation of Data Science


#### Course Information

Course Length: **9 weeks**

Estimated Effort: **5–6 hrs per week**

#### Course Overview

Data Science, together with artificial intelligence (AI) and its components such as statistical learning (SL), machine learning (ML) and deep learning (DL), is recognized as a main driver of organizational value creation. According to Dr Jim Gray, Data Science is the fourth paradigm, which drives innovative solutions to organizational problems.

In this course, we start with basic concepts in probability such as joint and conditional probabilities, and discuss how these concepts are implemented in ML algorithms for Market Basket Analysis and Recommender Systems. We then move on to random variables, discrete and continuous probability distributions, sampling, estimation and the central limit theorem.
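As a rough illustration of how joint and conditional probabilities show up in Market Basket Analysis, the sketch below computes support, confidence and lift for one rule. The transactions and the helper name `probability` are invented for this example and are not taken from the course material.

```python
from fractions import Fraction

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def probability(items, baskets):
    """Fraction of baskets that contain every item in `items` (the 'support')."""
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


p_bread = probability({"bread"}, transactions)          # P(bread)
p_milk = probability({"milk"}, transactions)            # P(milk)
p_joint = probability({"bread", "milk"}, transactions)  # P(bread and milk)

# Conditional probability: P(milk | bread) = P(bread and milk) / P(bread).
confidence = p_joint / p_bread

# Lift compares the joint probability with what independence would predict.
lift = p_joint / (p_bread * p_milk)

print(f"P(milk | bread) = {confidence}, lift = {lift}")
```

A lift below 1, as in this toy data, suggests the two items appear together slightly less often than independence would predict.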

An important step in ML model building is feature selection to avoid overfitting and underfitting. ML models such as regression and logistic regression use hypothesis testing to select features. We will discuss various hypothesis tests and how they are used in feature selection.
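One common way this plays out, sketched here under stated assumptions, is a Wald-style Z-test on each fitted coefficient: a feature is kept only if its coefficient differs significantly from zero. The coefficient estimates, standard errors, feature names and the 0.05 significance level below are hypothetical, and the normal approximation is a simplification of what a statistics package would report.

```python
from statistics import NormalDist


def wald_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = coefficient / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical (estimate, standard error) pairs from an already fitted model.
estimates = {
    "income": (0.80, 0.20),
    "shoe_size": (0.05, 0.10),
}

alpha = 0.05  # assumed significance level

# Keep only features whose coefficient is significantly different from zero.
selected = [
    name
    for name, (coef, se) in estimates.items()
    if wald_p_value(coef, se) < alpha
]

print(selected)
```

With these made-up numbers only `income` passes the test, since its estimate is four standard errors away from zero.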

Every ML model has an optimization stage, either to fine-tune the feature weights, or to find an optimal set of features. We will discuss important optimization techniques, and algorithms such as Gradient Descent, that play an important role in AI and ML model development.
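To make the idea of tuning feature weights concrete, here is a minimal sketch of batch gradient descent fitting a straight line by minimising mean squared error. The data, learning rate and step count are assumptions chosen for the example, not values prescribed by the course.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current weights.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move a small step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w ≈ 2, b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

A learning rate that is too large would make the updates diverge, which is why the value here is kept small relative to the scale of the data.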

Data must be represented in a matrix for AI and ML model development. Matrix operations such as matrix inverse and multiplication are elementary steps in model development. These fundamental concepts in linear algebra will be discussed.
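The two operations named above can be sketched in a few lines of plain Python. This is an illustrative example limited to a 2×2 matrix; the helper names `matmul` and `inverse_2x2` are hypothetical, and real projects would normally rely on a numerical library instead.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m):
    """Invert a 2x2 integer matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [
        [Fraction(d, det), Fraction(-b, det)],
        [Fraction(-c, det), Fraction(a, det)],
    ]


A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)

# Multiplying a matrix by its inverse should give the identity matrix.
identity = matmul(A, A_inv)
print(identity)
```

Exact fractions are used so that the product is exactly the identity rather than an approximation affected by floating-point rounding.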

#### Professor U Dinesh Kumar

**Course Instructor**

Decision Sciences & Information Systems

Indian Institute of Management Bangalore (IIMB)

#### Course Syllabus

##### Week 1: Descriptive Statistics and Data Visualization

- Introduction
- Data Types and Scales
- Population and Sample
- Measures of Central Tendency
- Measures of Variation
- Measures of Shape
- Data Visualization
- Demo Using Excel and Tableau

##### Week 2: Probability and Probability Distributions

- Introduction
- Probability Theory-Terminology
- Axioms of Probability
- Bayes’ Theorem
- Random Variables
- PDF & CDF of Continuous Random Variable
- Binomial Distribution
- Poisson Distribution
- Geometric Distribution
- Uniform Distribution
- Exponential Distribution
- Normal Distribution
- Chi-Square Distribution
- Student’s t-Distribution
- F-Distribution
- Tutorials

##### Week 3: Sampling and Estimation

- Introduction
- Population Parameter & Sample Statistic
- Sampling
- Probabilistic Sampling
- Non-Probability Sampling
- Sampling Distribution
- Central Limit Theorem
- Sample Size Estimation for Mean of the Population
- Estimation of Population Parameters
- Method of Moments
- Estimation of Parameters Using Maximum Likelihood Estimation

##### Week 4: Confidence Intervals

- Introduction
- CI for Population Mean
- CI for Population Proportion
- CI for Population Mean when Standard Deviation is unknown
- CI for Population Variance

##### Week 5: Hypothesis Testing

- Introduction
- Setting up a Hypothesis Test
- One-Tailed and Two-Tailed Test
- Type I Error, Type II Error, and Power of the Hypothesis Test
- Hypothesis Testing for Population Mean with Known Variance: Z-Test
- Hypothesis Testing for Population Proportion: Z-Test
- Hypothesis Testing for Population Mean with Unknown Population Variance: t-test
- Paired Sample t-test
- Two-Sample Z and t-test
- Two-Sample Z-Test for Proportions
- Effect Size: Cohen’s d
- Hypothesis Test for Equality of Population Variances
- Non-Parametric Tests: Chi-Square Tests
- Tutorials

##### Week 6: Analysis of Variance

- Introduction
- Multiple t-Tests for Comparing Several Means
- One-way ANOVA
- Two-way ANOVA
- Tutorials

##### Week 7: Correlation Analysis

- Introduction
- Pearson Correlation Coefficient
- Spearman Rank Correlation
- Point Bi-Serial Correlation
- The Phi-Coefficient

##### Week 8: Applied Linear Algebra

- Why do we need Linear Algebra?
- Matrix Algebra and Operations
- Eigenvalues and Eigenvectors
- Linear Algebra in Dimensionality Reduction
- Linear Algebra in Natural Language Processing
- Linear Algebra in Machine Learning

By the end of this course you will be able to:

- Describe the role of probability theory, optimization and linear algebra in the field of Artificial Intelligence.
- Define probability distributions such as the binomial and normal distributions and describe their applications in ML model development.
- Conduct hypothesis tests such as the Z-test and t-test and explain how they are used in ML model development.
- Explain optimization and linear algebra concepts and their applications in ML and AI.
- Conduct hypothesis testing, optimization and linear algebra using Excel.

This course is suitable for students and practitioners interested in improving their knowledge of the fundamental concepts of Data Science. The course will also prepare the learner for a career in the field of Data Analytics.