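National Natural Science Foundation of China Key Project "New Theories and Methods for Web Search and Mining" (Project No. 60933004)
January 2010 to December 2012, 2 million RMB, led by Xiaoming Li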
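Research content of the project:
Targeting intelligent search engines for Web 3.0, the project investigates new theories and methods for Web search and mining: 1) study the properties and evolution of the Web, explore the basic characteristics of Web information, and propose new models and methods for Web information collection; 2) study the semantic representation of and reasoning over Web information, explore the logical foundations for associating, integrating, and restructuring heterogeneous Web information, and propose a semantic model for heterogeneous Web information; 3) study the mining and organization of Web data, explore patterns in Web information in terms of structure, content, and user behavior, and, given the heterogeneity and temporal nature of Web information, establish data organization schemes that support efficient access; 4) given the massive scale of Web information, study theories and methods for distributed and parallel mining, providing high-performance algorithms and a supporting environment for potential practical applications; 5) study automatic semantic annotation of images and video, combining concept semantics with associated information to improve the understanding of Web images and video, and propose new methods for Web multimodal retrieval. Building on these theoretical results, a prototype Web intelligent search engine will be developed to validate the new theories and methods proposed in the project.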
Papers
-
Rishan Chen, Jiaji Zhu, Wenhui Yao, Jian Zhou, Bo Peng. Majority Consensus Protocol for Strong Consistency and High Availability, 6th Annual Conference of China Grid.
-
Dongdong Shan, Wayne Xin Zhao, Jing He, Rui Yan, Hongfei Yan, Xiaoming Li: Efficient phrase querying with flat position index. CIKM 2011: 2001-2004
-
Yang Lu, Jing He, Dongdong Shan, Hongfei Yan: Recommending citations with translation model. CIKM 2011: 2017-2020
-
Rui Yan, Xiaojun Wan, Jahna Otterbacher, Liang Kong, Xiaoming Li and Yan Zhang.
Evolutionary Timeline Summarization: a Balanced Optimization Framework via Iterative Substitution.
In Proceedings of the 34th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR 2011), pages 745-754, Beijing, China, July 24-28, 2011.
-
Rui Yan, Liang Kong, Congrui Huang, Xiaojun Wan, Xiaoming Li and Yan Zhang.
Timeline Generation through Evolutionary Trans-Temporal Summarization.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011),
pages 433-443, Edinburgh, United Kingdom, July 27-31, 2011.
-
Rui Yan, Jian-Yun Nie, and Xiaoming Li.
Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011),
pages 1342-1351, Edinburgh, United Kingdom, July 27-31, 2011.
-
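A Static Index Pruning Method Based on Document Importance (in Chinese),
Xiaoming Li, Dongdong Shan
Journal of South China University of Technology (Natural Science Edition)
Vol. 39 No. 4, 2011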
-
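A Click Model Based on User Browsing Time (in Chinese),
Jing He, Wenqing Yuan, Hongfei Yan
Journal of South China University of Technology (Natural Science Edition)
Vol. 39 No. 4, 2011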
-
Xin Zhao, Jing Jiang, Jing He, Yang Song, Palakorn Achananuparp, Ee-Peng Lim and Xiaoming Li. Topical keyphrase extraction from Twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT'11) (long paper), pages 379-388, 2011. (26% acceptance)
-
Jing He, Wayne Xin Zhao, Baihan Shu, Xiaoming Li, Hongfei Yan. Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation. SIGIR 2011: 275-284
-
Xianling Mao, Xiaobing Liu, Nan Di, Xiaoming Li, Hongfei Yan.
SizeSpotSigs: An Effective Deduplicate Algorithm Considering the Size of Page Content.
PAKDD 2011: 537-548
-
Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li.
Comparing Twitter and traditional media using topic models.
In Proceedings of the 33rd European Conference on Information Retrieval (ECIR'11) (full paper),
pages 338-349, 2011. (20% acceptance)
- Lian'en Huang, Xiaoming Li, "HisTrace: A system for mining on news-related articles instead of web pages," 2010 IEEE 2nd Symposium on Web Society (SWS), pp. 30-37, 16-17 Aug. 2010
- Jing He, Baihan Shu, Xiaoming Li, and Hongfei Yan,
Effective Time Ratio: A Measure for Web Search Engines with Document Snippets
AIRS 2010, 2010.
- Geng Li, Bo Peng,
Improving Range Query Performance on Historic Web Page Data.
ChinaGrid 2010, Guangzhou, China, July 16-18, 2010
- Chong Chen, Feng Li, Xianling Mao, Jing He, Hongfei Yan,
Design and Implementation for Literature Search and Impact-based Summaries,
in Proceedings of the 2010 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2010),
Nov. 15-16, Hangzhou, China
- Chong CHEN, Jing HE, Dongdong SHAN, Hongfei YAN,
Optimize Document Identifier Assignment for Inverted Index Compression,
in Proceedings of the 2010 International Conference on Web Information Systems and Mining (WISM'10),
Sanya, China. Also published in Journal of Computational Information Systems
- Xin Zhao, Jing Jiang, Jing He, Dongdong Shan, Hongfei Yan, Xiaoming Li, "Context Modeling for Ranking and Tagging Bursty Features in Text Streams", CIKM 2010 (poster)
- Xin Zhao, Jing Jiang, Hongfei Yan, Xiaoming Li, "Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid", in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP'10), pages 56–65, MIT, Massachusetts, USA, 9-11 October 2010.
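- Jing He, Chong Chen, Hongfei Yan, "A Survey of Open-Domain Question Answering Systems" (in Chinese), 6th China Conference on Information Retrieval (CCIR 2010), August 12-15, 2010, Jingpo Lake, Heilongjiang, China.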
- Yi Li, Xiaoming Li and Jonathan J. H. Zhu,
PPS Sampling of Web Graph Using Preferential Jumping Strategy,
in Proceedings of the 2010 2nd IEEE Symposium on Web Society,
August 16-17, 2010
- Yi Li, Jonathan J. H. Zhu, Xiaoming Li, "A Survey of Major Techniques for Combating Link Spamming," Journal of Information & Computational Science 7: 2 (2010) 439-446.
- Xianling Mao, Jing He, Hongfei Yan, "Web Page Noise Removal: A Survey" (in Chinese), Journal of Computer Research and Development (to appear)
- Dongdong Shan, Dongsheng Zhao, Jing He, and Hongfei Yan, "PARADISE Based Search Engine at TREC 2009 Web Track," in TREC 2009.
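- Chong Chen, Feng Li, Xianling Mao, Jing He, Hongfei Yan, "Design and Implementation of a Literature Search and Impact-Based Summarization System" (in Chinese), Journal of Guangxi Normal University, vol. 28, no. 1, 2010, pp. 135-138.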