Data Science Series
Data play an integral role in all aspects of life, personal and professional, from individual online purchases to business and science. The deluge of data generated from many sources, so-called “Big Data”, presents enormous challenges for analysis, let alone for deriving the insights the data contain. The knowledge required to analyze such data is drawn from multiple disciplines, such as statistics, machine learning, information science, and computer science, and can take years to master.
The following series of Data Science courses provides hands-on learning and some theoretical background toward this end. By the end of the series, students should be well equipped to apply cutting-edge Data Science techniques, such as Artificial Neural Networks, Deep Learning, Reinforcement Learning, Random Forests, regression of various kinds, model selection, and predictive modeling, and to formulate real-world questions as a series of analyses, all the way through the basics of interpreting results. Throughout the series, students will analyze real-world data from various fields, such as finance, education, public health, and genetics, using the freely available programming languages R and Python.
The Data Science course series is organized into a three-year program. Below are some learning highlights from each course.
Data Science I (Year One)
In year one, students learn to program in R while focusing primarily on different types of statistical analysis, using a variety of datasets for context and interest. The statistical focus includes nonparametric vs. parametric statistics. Students learn about:
- Problem formulation, different types of statistical models for analyzing data, and how to determine which model is most applicable to the problem posed. Models to be explored include:
  - linear regression
  - linear mixed-effects models
  - logistic regression and survival analysis
  - prediction modeling
  - generalized linear regression
Analyzing outputs will require students to learn a series of analysis essentials and techniques, including:
- t-test and the theory behind it, as well as how to interpret other statistics (correlation coefficient r, ANOVA)
- Hazard ratio
- Kaplan-Meier curve
- Mann-Whitney U test
- Fisher’s test
- Kolmogorov-Smirnov test
- Basics of Bayesian analysis and the theory behind it, including:
  - Bayesian methods
  - Regression as a Bayesian problem
  - Acceptance-rejection method
  - Basics of Markov chains and/or random walks
  - Basics of the Markov chain Monte Carlo (MCMC) method
- Missing data imputation
  - MICE (Multivariate Imputation by Chained Equations) method
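As a taste of one topic from the list above, the acceptance-rejection method can be sketched in a few lines. The course itself uses R in year one; this sketch uses Python with only the standard library. The target density (Beta(2, 2)), the uniform proposal, and all function names are illustrative assumptions, not course material.

```python
import random


def target_density(x):
    """Beta(2, 2) density on [0, 1]: f(x) = 6 x (1 - x). Chosen for illustration."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, rng, bound=1.5):
    """Draw n samples from target_density using a Uniform(0, 1) proposal.

    `bound` must satisfy target_density(x) <= bound on [0, 1];
    the maximum of 6x(1 - x) is 1.5, reached at x = 0.5.
    """
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform draw used for the accept/reject decision
        if u * bound <= target_density(x):
            samples.append(x)  # accept the candidate
    return samples


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = rejection_sample(10_000, rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

For a Beta(2, 2) distribution the theoretical mean is 0.5 and the variance is 0.05, so the printed values should land close to those numbers.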
Data Science II Preview (Year Two)
In year two, students program in Python. The course focuses on machine learning, starting with the building blocks: the basics of information theory, Expectation-Maximization (EM) theory, and Bayesian methods as applied to supervised learning. Students continue to explore different types of computational learning methods, e.g., Principal Component Analysis (PCA), network approaches, clustering approaches, ensemble learning, and structural equation modeling (SEM). In so doing, students build the competencies needed to understand machine-learning theory and techniques, including Ridge regression, LASSO, and Elastic Net, through which they explore various tradeoffs and analytical methods.
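The tradeoffs among Ridge regression, LASSO, and Elastic Net can be illustrated with a minimal sketch for a single predictor and no intercept, where each estimator has a closed form. It assumes the objective (1/2) Σ (y − βx)² + l1·|β| + (l2/2)·β²; the toy data and the names `soft_threshold` and `fit_one_predictor` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_one_predictor(x, y, l1=0.0, l2=0.0):
    """Closed-form coefficient for one predictor without an intercept.

    l1 = 0 and l2 = 0 gives ordinary least squares,
    l2 > 0 alone gives Ridge, l1 > 0 alone gives LASSO,
    and both positive gives Elastic Net.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, l1) / (sxx + l2)


# Toy data, roughly y = 2x plus noise (made up for this sketch).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.1]

ols = fit_one_predictor(x, y)
ridge = fit_one_predictor(x, y, l2=10.0)
lasso = fit_one_predictor(x, y, l1=5.0)
lasso_strong = fit_one_predictor(x, y, l1=25.0)
enet = fit_one_predictor(x, y, l1=5.0, l2=10.0)
print(ols, ridge, lasso, lasso_strong, enet)
```

In this setting Ridge shrinks the coefficient proportionally but never to zero, while a large enough LASSO penalty sets it exactly to zero, which is why LASSO can act as a variable selector.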
Data Science III Preview (Year Three)
In year three, students take on Artificial Neural Network (ANN) theory and applications of ANNs (e.g., handwriting categorization, voice transcription, text mining, face recognition, and time series prediction problems, perhaps using stock or other financial data). Student learning covers topics such as:
- Boltzmann machine
- Single vs Multiple layered ANN
- Backpropagation algorithm
- Convergence / optimization issues
- Self-Organizing Map
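To make the backpropagation topic from the list above concrete, here is a minimal sketch for a network with one scalar input, two tanh hidden units, and a linear output, trained on squared error. The architecture, the initial weights, and the single training example are assumptions chosen for illustration. The sketch checks one backpropagated gradient against a finite-difference estimate and then takes one gradient-descent step.

```python
import math


def forward(params, x):
    """Return hidden activations and the prediction for scalar input x."""
    w1, b1, w2, b2 = params
    h = [math.tanh(w1[j] * x + b1[j]) for j in range(len(w1))]
    y_hat = sum(w2[j] * h[j] for j in range(len(w2))) + b2
    return h, y_hat


def loss(params, x, y):
    """Squared-error loss 0.5 * (y_hat - y)^2 for one example."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss with respect to every parameter, via the chain rule."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_out = y_hat - y  # dL/dy_hat
    g_w2 = [d_out * h[j] for j in range(len(w2))]
    g_b2 = d_out
    # Propagate back through tanh: d tanh(a)/da = 1 - tanh(a)^2.
    d_pre = [d_out * w2[j] * (1.0 - h[j] ** 2) for j in range(len(w1))]
    g_w1 = [d_pre[j] * x for j in range(len(w1))]
    g_b1 = list(d_pre)
    return g_w1, g_b1, g_w2, g_b2


def numeric_grad_w1(params, x, y, j, eps=1e-6):
    """Central finite-difference estimate of dL/dw1[j], used as a sanity check."""
    w1, b1, w2, b2 = params
    up, down = list(w1), list(w1)
    up[j] += eps
    down[j] -= eps
    return (loss((up, b1, w2, b2), x, y) - loss((down, b1, w2, b2), x, y)) / (2 * eps)


# Illustrative initial parameters and a single training example.
params = ([0.5, -0.3], [0.1, 0.2], [0.7, -0.4], 0.05)
x, y = 0.5, 1.0

g_w1, g_b1, g_w2, g_b2 = backprop(params, x, y)
check = numeric_grad_w1(params, x, y, 0)

# One gradient-descent step with a small learning rate.
lr = 0.1
w1, b1, w2, b2 = params
new_params = (
    [w - lr * g for w, g in zip(w1, g_w1)],
    [b - lr * g for b, g in zip(b1, g_b1)],
    [w - lr * g for w, g in zip(w2, g_w2)],
    b2 - lr * g_b2,
)
loss_before = loss(params, x, y)
loss_after = loss(new_params, x, y)
print(g_w1[0], check, loss_before, loss_after)
```

The analytic and numeric gradients should agree to several decimal places, and the loss after the step should be lower than before. Convergence and optimization issues arise once many such steps are taken on real data.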
The final focus of the course is Deep Learning: basic theory, applications (e.g., understanding TensorFlow, and vision and natural language processing applications), and exploration of areas including:
- Convolutional Neural Network
- Deep Belief Network
- Recurrent Neural Network
By the conclusion of the series, students understand and can use a range of models, analytical tools, and techniques.