MATH 4323 - Data Science and Statistical Learning - University of Houston

# MATH 4323 - Data Science and Statistical Learning

*This is a course guideline. Students should contact the instructor for updated information on the current course syllabus, textbooks, and course content.*

Prerequisites: MATH 3339

Course Description: This course covers the theory and applications of statistical learning techniques such as maximal margin classifiers, support vector machines, K-means clustering, and hierarchical clustering. Other topics may include algorithm performance evaluation, cluster validation, data scaling, and resampling methods. The R statistical programming language will be used throughout the course.

Textbook: While lecture notes will serve as the main source of material for the course, the following book constitutes a great reference:
• “An Introduction to Statistical Learning (with Applications in R)” by James, Witten, Hastie, and Tibshirani. ISBN: 978-1461471370

Learning Objectives: By the end of the course a successful student should:
• Have a solid conceptual grasp of the described statistical learning methods.
• Be able to correctly identify the appropriate techniques to deal with particular data sets.
• Have a working knowledge of R in order to apply those techniques and subsequently assess the quality of the fitted models.
• Demonstrate the ability to clearly communicate the results of applying selected statistical learning methods to the data.

Software: Make sure to download and install R and RStudio (RStudio requires R to be installed) before the course starts. Use the link https://www.rstudio.com/products/rstudio/download/ to download the version appropriate for your platform. Let me know via email if you encounter difficulties.

Tentative Course Outline:
• Review: Task of Statistical Learning. Supervised and unsupervised learning. Most ubiquitous statistical learning techniques.
• Support Vector Classifier. Maximal margin classifier: separating hyperplane, support vectors. Non-separable case: support vector classifier.
• Support Vector Machines. Non-linear decision boundaries. Kernels. One-versus-one and one-versus-all classification for K > 2 classes. Evaluating quality of classification.
• Clustering Methods: K-Means. Within-cluster variation. Computing centroids. Multiple starts. Selecting K.
• Clustering Methods: Hierarchical. Agglomerative clustering. Linkage. Interpreting dendrogram. Choice of dissimilarity measure. Data scaling.
• Evaluation of Clustering Solution. Is this a good clustering? Variance explained. Between- and within-cluster variation. Silhouette coefficient.

*Note: Students should visit their instructor's website for Course Structure and Grading Policies, as this information is typically included in the instructor's syllabus. This is a course guideline and merely an example of what the course structure may be; it should therefore be confirmed with the instructor. Students should contact the instructor for updated information on the current course syllabus, textbooks, and course content.

CSD Accommodations:

Accommodation Forms: Students seeking academic adjustments/auxiliary aids must, in a timely manner (usually at the beginning of the semester), provide their instructor with a current Student Accommodation Form (SAF) (paper copy or online version, as appropriate) from the CSD office before an approved accommodation can be implemented.

Details of this policy and the corresponding responsibilities of the student are outlined in The Student Academic Adjustments/Auxiliary Aids Policy (01.D.09) document under [STEP 4: Student Submission (5.4.1 & 5.4.2), Page 6]. For more information, please visit the Center for Students with Disabilities Student Resources page.

Additionally, if a student is requesting a CSD-approved testing accommodation, the student must also complete a Request for Individualized Testing Accommodations (RITA) paper form to arrange for tests to be administered at the CSD office. CSD suggests that the student meet with their instructor during office hours and/or make an appointment to complete the RITA form to ensure confidentiality.

*Note: RITA forms must be completed at least 48 hours in advance of the original test date. Please consult your counselor ahead of time to ensure that your tests are scheduled in a timely manner. Please keep in mind that if you run over the agreed upon time limit for your exam, you will be penalized in proportion to the amount of extra time taken.

UH CAPS

Counseling and Psychological Services (CAPS) can help students who are having difficulties managing stress, adjusting to college, or feeling sad and hopeless. You can reach CAPS by calling 713-743-5454 during and after business hours for routine appointments or if you or someone you know is in crisis. No appointment is necessary for the "Let's Talk" program, a drop-in consultation service at convenient locations and hours around campus.