Web Based Information Architectures
Fall 2004

A graduate course for Computer Science majors, Fall 2004, Peking University
Instructor: Professor LI Xiaoming, lxm@pku.edu.cn


Syllabus:

Columns: Section | Topic | Who | Slides | Readings | Date | HW & Project
 

Section: Introduction

 

Introduction
  Who: Li
  Slides: [pdf-large] [pdf-small]
  Date: Sept. 8
 

 

Section: Web Information Crawling and Web Properties

Crawling the Web
  1. basic WWW technologies
  2. high-performance crawling
  3. parallel crawling
  Who: Peng
  Slides: [ppt-zip] [pdf-zip] [mp3-1,-2]
  Readings: MIW Ch. 2; MW Ch. 2; High-Performance Web Crawling (2001); Parallel Crawlers (2002)
  Date: Sept. 16
Advanced Crawling Techniques
  1. focused crawling
  2. web dynamics
  3. hidden web
  Who: Peng
  Slides: [ppt-zip] [pdf-zip]
  Readings: MIW Ch. 6; MW Ch. 8.1, 8.3; Crawling the Hidden Web (2001); Focused crawling: a new approach to topic-specific Web resource discovery (1999); Effective page refresh policies for Web crawlers (2003)
  Date: Sept. 23
Modeling the Web
  1. web graphs
  2. generative models
  Who: Peng
  Slides: [ppt-zip] [pdf-zip] [mp3-1,-2,-3]
  Readings: MIW Ch. 3; Estimating the Relative Size and Overlap of Public Web Search Engines (1998); Graph structure in the web (2000)
  Date: Sept. 30
  HW & Project: Reading Assignment

Section: Web Information Processing

Information Retrieval
  1. modeling
  2. application
  Who: Li
  Slides: [ppt-zip] [pdf-zip] [mp3]
  Readings: MW Ch. 3.2; MIW Ch. 4.3, 4.4, 4.5; MIR Ch. 2, 3
  Date: Oct. 14
  HW & Project: HW 1 Assignment
Index and Query Techniques
  1. inverted file index
  2. index construction
  3. querying
  Who: Peng
  Slides: [ppt-zip] [pdf-zip] [mp3]
  Readings: MG Ch. 3.2, 3.3, 4.1-4.4, 5; MIR Ch. 7, 8
  Date: Oct. 21
  HW & Project: Announcement
Information Extraction
  1. named entity recognition
  2. template filling
  3. hidden Markov models
  Who: Li
  Slides: [ppt-zip] [pdf-zip] [mp3]
  Date: Oct. 28
  HW & Project: Due date of HW 1
Text Categorization I
  1. feature selection
  2. classifiers (DT, Rocchio, NB, kNN)
  Who: Wang
  Slides: [ppt-full] [pdf-small] [mp3-1,-2]
  Readings: MW Ch. 5.1-5.6; MIW Ch. 4.6; Machine Learning in Automated Text Categorization; A Comparative Study on Feature Selection in Text Categorization
  Date: Nov. 4
Text Categorization II
  1. classifiers (NNet, SVM)
  2. thresholding strategies
  3. evaluation
  Who: Wang
  Slides: [ppt-full] [pdf-small] [mp3]
  Readings: MW Ch. 5.7-5.10; MIR Ch. 3; A Study on Thresholding Strategies for Text Categorization; A Re-examination of Text Categorization Methods
  Date: Nov. 11
Text Clustering I
  1. partitioning methods
  2. hierarchical methods
  3. density-based methods
  4. LSI/SVD and its applications
  Who: Wang
  Slides: [ppt-full] [pdf-small] [mp3-1,-2]
  Readings: MW Ch. 4.1-4.3; MIW Ch. 4.5, 4.8; Data Clustering: A Review; Using Linear Algebra for Information Retrieval
  Date: Nov. 18
Text Clustering II
  1. criterion functions for clustering
  2. online clustering
  3. collaborative filtering
  Who: Wang
  Slides: [ppt-full] [pdf-small] [mp3-1]
  Readings: MW Ch. 4.5; MIW Ch. 8; Learning to Cluster Web Search Results; Web Document Clustering: A Feasibility Demonstration
  Date: Nov. 25
  HW & Project: HW 2 Assignment
Information Retrieval
  1. retrieval evaluation
  2. test collections
  Who: Li
  Slides: [ppt-zip] [pdf-zip] [mp3-1,-2]
  Date: Dec. 2

Section: Web Information Mining

 

Social Network Analysis
  1. prestige model
  2. PageRank and HITS algorithms
  Who: Li
  Slides: [ppt-full] [pdf-small] [mp3]
  Readings: MW Ch. 7
  Date: Dec. 9
Web Mining
  1. link analysis
  2. modeling and understanding human behavior on the Web
  Who: Wang
  Slides: [ppt-full] [pdf-small] [mp3-1,-2]
  Readings: MIW Ch. 5, 7; A Novel Web Usage Mining Approach for Search Engines
  Date: Dec. 16
  HW & Project: Due date of HW 2: Dec. 18
 

Section: Reports

 

Reading Reports
  Who: Li
  Slides: [ppt] [ppt-all]
  Date: Dec. 23
  HW & Project: member list
Project Reports
  Who: Li
  Slides: [ppt-all]
  Date: Dec. 30
  HW & Project: member list

Last modified: 2005-01-14 11:09:40