Web Based Information Architectures
| Section | Topics | Who | Slides | Readings | Date | HW & Project |
|---|---|---|---|---|---|---|
| Introduction | Introduction | Li | [pdf-large] [pdf-small] | | Sept. 8 | |
| Web Information Crawling and Web Properties | Crawling the Web: 1. basic WWW technologies; 2. high-performance crawling; 3. parallel crawling | Peng | [ppt-zip] [pdf-zip] [mp3-1,-2] | MIW Ch. 2; MW Ch. 2; High-Performance Web Crawling (2001); Parallel Crawlers (2002) | Sept. 16 | |
| | Advanced Crawling Techniques: 1. focused crawling; 2. web dynamics; 3. hidden web | Peng | [ppt-zip] [pdf-zip] | MIW Ch. 6; MW Ch. 8.1, 8.3; Crawling the Hidden Web (2001); Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery (1999); Effective Page Refresh Policies for Web Crawlers (2003) | Sept. 23 | |
| | Modeling the Web: 1. web graphs; 2. generative models | Peng | [ppt-zip] [pdf-zip] [mp3-1,-2,-3] | MIW Ch. 3; Estimating the Relative Size and Overlap of Public Web Search Engines (1998); Graph Structure in the Web (2000) | Sept. 30 | Reading assignment |
| Web Information Processing | Information Retrieval: 1. modeling; 2. application | Li | [ppt-zip] [pdf-zip] [mp3] | MW Ch. 3.2; MIW Ch. 4.3, 4.4, 4.5; MIR Ch. 2, 3 | Oct. 14 | HW 1 assigned |
| | Index and Query Techniques: 1. inverted file index; 2. index construction; 3. querying | Peng | [ppt-zip] [pdf-zip] [mp3] | MG Ch. 3.2, 3.3, 4.1-4.4, 5; MIR Ch. 7, 8 | Oct. 21 | Announcement |
| | Information Extraction: 1. named entity recognition; 2. template filling; 3. hidden Markov models | Li | [ppt-zip] [pdf-zip] [mp3] | | Oct. 28 | HW 1 due |
| | Text Categorization I: 1. feature selection; 2. classifiers (DT, Rocchio, NB, kNN) | Wang | [ppt-full] [pdf-small] [mp3-1,-2] | MW Ch. 5.1-5.6; MIW Ch. 4.6; Machine Learning in Automated Text Categorization; A Comparative Study on Feature Selection in Text Categorization | Nov. 4 | |
| | Text Categorization II: 1. classifiers (NNet, SVM); 2. thresholding strategies; 3. evaluation | Wang | [ppt-full] [pdf-small] [mp3] | MW Ch. 5.7-5.10; MIR Ch. 3; A Study on Thresholding Strategies for Text Categorization; A Re-examination of Text Categorization Methods | Nov. 11 | |
| | Text Clustering I: 1. partitioning methods; 2. hierarchical methods; 3. density-based methods; 4. LSI/SVD and its applications | Wang | [ppt-full] [pdf-small] [mp3-1,-2] | MW Ch. 4.1-4.3; MIW Ch. 4.5, 4.8; Data Clustering: A Review; Using Linear Algebra for Information Retrieval | Nov. 18 | |
| | Text Clustering II: 1. criterion functions for clustering; 2. online clustering; 3. collaborative filtering | Wang | [ppt-full] [pdf-small] [mp3-1] | MW Ch. 4.5; MIW Ch. 8; Learning to Cluster Web Search Results; Web Document Clustering: A Feasibility Demonstration | Nov. 25 | HW 2 assigned |
| | Information Retrieval: 1. retrieval evaluation; 2. test collections | Li | [ppt-zip] [pdf-zip] [mp3-1,-2] | | Dec. 2 | |
| Web Information Mining | Social Network Analysis: 1. prestige model; 2. PageRank and HITS algorithms | Li | [ppt-full] [pdf-small] [mp3] | MW Ch. 7 | Dec. 9 | |
| | Web Mining: 1. link analysis; 2. modeling and understanding human behavior on the web | Wang | [ppt-full] [pdf-small] [mp3-1,-2] | MIW Ch. 5, 7; A Novel Web Usage Mining Approach for Search Engines | Dec. 16 | HW 2 due Dec. 18 |
| Reports | Reading Reports | Li | [ppt] [ppt-all] | | Dec. 23 | Member list |
| | Project Reports | Li | [ppt-all] | | Dec. 30 | Member list |