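Language: English

Course dates: June 1 - June 8

Class time: 18:30-20:00

Location: Room 211, School of Transportation Engineering

Enrollment: no more than 40 students

Registration (closed): Yuliang Zhang (张玉梁), 18516191316, email 645023341@qq.com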

**Instructor:** Professor Satish Ukkusuri, Full Professor, Purdue University. Contact email: sukkusur@purdue.edu

**Teaching assistant:** Yuliang Zhang. Email: 645023341@qq.com

**Prerequisite:** Undergraduate calculus and undergraduate-level knowledge of probability, statistics, and linear algebra.

**Day/Time:** June 1-12, 1-2 lectures per day. Each lecture is 50 minutes.

**Location:** Room 211, School of Transportation Engineering

**Texts:**

Required handouts will be provided by the instructor.

Students interested in lecture notes can obtain notes from: here

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor. Morgan Kaufmann Publishers, 2006.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics series, M. Jordan, J. Kleinberg, and B. Schölkopf, Series Editors. Springer, 2006.

**Course Description:**

This course provides an introduction to data mining as applied to transportation systems. Topics range from basic statistical concepts and data cleaning to data mining techniques, with a focus on the analysis of large data sets. Students will be asked to apply the concepts to one problem of interest in their own research. The course makes extensive use of intuitive arguments and mathematical notation. Students taking this class are expected to have basic knowledge of linear algebra, optimization, and statistics.

**Course Objectives:**

A student completing this course is expected to be able to:

1. Prepare and clean datasets for analysis of patterns

2. Apply the concept of regression using datasets

3. Apply the concept of association rules in analyzing datasets

4. Apply the concept of clustering to identify trends within datasets

5. Be familiar with the concepts behind supervised learning algorithms for analyzing data

**Course Scheduling:**

**1. Introduction:**

**[June 1, 6:30 PM-8:00 PM]**

[lecture 1] What is data mining? What makes it a new and unique discipline?

Relationship between Data Warehousing, On-line Analytical Processing, and Data Mining.

**[June 2-3, 6:30 PM-8:00 PM]**

[lecture 2&3] Preliminaries of Statistics and Optimization.

**[June 4, 3:30 PM-5:00 PM]**

[lecture 4] Data warehousing and the data mining process: data preparation/cleansing, task identification.

**2. Association rules:**

**[June 4, 6:30 PM-8:00 PM]**

[lecture 5] Association rule mining: Apriori and FP-growth.

**3. Classification:**

**[June 6, 6:30 PM-8:00 PM]**

[lecture 6] Bayes classifier.

**4. Clustering and supervised learning:**

**[June 7, 3:30 PM-5:00 PM]**

[lecture 7] Basics of clustering and mixture models.

**[June 7, 6:30 PM-8:00 PM]**

[lecture 8] k-means and k-medoids.

**[June 8, 3:30 PM-5:00 PM]**

[lecture 9] Supervised learning: rule-based classification.

**5. [June 8, 6:30 PM-8:00 PM]**

**[lecture 10] Presentation by the instructor on data mining applications in transportation**

**Note: Some of these topics may be covered earlier or later than scheduled, depending on the questions and pace of the class.**

**Format:** Classes combine lecture and discussion. Students are expected to participate actively in class discussions. Reading will be assigned for each class, and students are expected to come prepared to answer questions.

**Homework:** In-class and take-home problems will be assigned, and students are expected to solve them. Group work is acceptable as long as each student can write up the solutions independently.

**Student Feedback:**

Throughout the course, students are especially encouraged to bring to the professor's attention any difficulties or issues encountered during the lectures. The primary purpose is to give the instructor continuous feedback on how to improve the classroom learning environment.

Copyright © 2012 - All Rights Reserved - Research Group of Professor Chao Yang (杨超教授课题组)

Address: 4800 Cao'an Highway (曹安公路), Shanghai. Postal code: 201804