Curriculum Vitae

Personal data:

Yan Hongfei, Male
Born: October 1973, in Harbin, China
Title: Ph.D.


Working Experience:

  1. Chinese Web Information Retrieval Forum, since June 2004.
  2. Tianwang Search Engine, since March 2000 (C/C++, Solaris/Linux).
  3. Distance Learning, January 2000 (ASP, SQL Server, WinNT and IIS).
  4. Hospital Information Management System of the Fourth Hospital of Beijing, June 1999 (Delphi and SQL Server, WinNT and Win98 environment).
  5. Doctor's Advice Management System of Hospital, June 1998 (PowerBuilder and SQL Server, WinNT and Win95 environment).
  6. Campus-Wide Information Systems (CWISs) of Xinjiang Finance and Economics Institute, March 1998 (PowerBuilder and Ingres, WinNT and Win95 environment).
  7. Hospitalization Insurance System of Mudanjiang Society Labor Insurer, September 1997 (PowerBuilder and SQL Server, WinNT and Win95 environment).
  8. Multimedia Demonstration System of Harbin Engineering University, September 1996 (Visual Basic, Win3.1 environment).
  9. Multimedia Demonstration System of Harbin High New Technology Industry Development Area, March 1994 (DeLi Multimedia Tools and 3DS, DOS environment).


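Books:

  1. Research Group on the Curriculum System of Information Science and Technology at Peking University, Curriculum System of Information Science and Technology at Peking University: Tsinghua University Press, 2008.
  2. Xiaoming Li, Hongfei Yan, Jimin Wang, Search Engine: Principle, Technology and Systems: Science Press, 2005.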


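Patents:

  1. Hongfei Yan, Baihan Shu, Xin Zhao, Xiaoming Li, A System and Method for Mining Hot Words and Events in Social Networks, patent grant no. ZL201110434991.0, granted June 11, 2014.
  2. A Method for Extracting Dynamic Summaries in Search Engines, inventors: Hongfei Yan, Baihan Shu, Xiaoming Li, patent grant no. ZL200910076485.1, certificate no. 710623.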


Publications:

  1. Weizheng Chen, Xia Zhang, Jinpeng Wang, Yan Zhang, Hongfei Yan, Xiaoming Li: Non-Linear Smoothed Transductive Network Embedding with Text Information. ACML 2016: 1-16
  2. Xia Zhang, Weizheng Chen, Hongfei Yan: TLINE: Scalable Transductive Network Embedding. AIRS 2016: 98-110
  3. Chong Chen, Edgar Huang, Hongfei Yan: Detecting the association of health problems in consumer-level medical text. Journal of Information Science, 2016: 1-12
  4. Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen: A General SIMD-Based Approach to Accelerating Compression Algorithms. ACM Trans. Inf. Syst. 33(3): 15:1-15:28 (2015)
  5. Weizheng Chen, Jinpeng Wang, Yan Zhang, Hongfei Yan, Xiaoming Li: User Based Aggregation for Biterm Topic Model. ACL (2) 2015: 489-494
  6. Ming Zhang, Jile Zhu, Yanzhen Zou, Hongfei Yan, Dan Hao, and Chuxiong Liu. 2015. Educational Evaluation in the PKU SPOC Course "Data Structures and Algorithms".
    In Proceedings of the Second (2015) ACM Conference on Learning @ Scale (L@S '15). ACM, New York, NY, USA, 237-240. DOI:
  7. Xin Zhao, Yuexin Wu, Hongfei Yan, Xiaoming Li: Group based Self Training for E-Commerce Product Record Linkage. COLING 2014: 1311-1321
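  8. Xudong Zhang, Zhiming Sun, Ya'ning Liu, Dongdong Shan, Hongfei Yan: Inverted Index Compression Algorithms Based on 64-bit Architecture. Computer Engineering (in Chinese), 2014, 40(2): 71-76
  9. Han Jiang, Xin Zhao, Yuexin Wu, Hongfei Yan. Product Review Retrieval Based on Semantic Query Expansion. CCIR 2014, August 8-10, 2014, Kunming, Yunnan, China. (Recommended for publication in Journal of Frontiers of Computer Science and Technology)
  10. Yuexin Wu, Xin Zhao, Yanwei Guo, Hongfei Yan. Churn Prediction for Online Game Users: A Comparison of Sampling Methods on Imbalanced Data. CCIR 2014, August 8-10, 2014, Kunming, Yunnan, China. (Recommended for publication in Journal of Chinese Information Processing)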

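  11. Xudong Zhang, Dongdong Shan, Xianling Mao, Xin Zhao, Hongfei Yan: Inverted Index Compression Algorithms Based on Instruction-Level Parallelism. CCIR 2013 (Recommended for publication in Journal of Computer Research and Development)
  12. Yanwei Guo, Yuexin Wu, Xin Zhao, Hongfei Yan, Jianxing Huang: A Case Study of Online Games: User Behavior Analysis and Churn Prediction. CCIR 2013 (Recommended for publication in Journal of Chinese Information Processing)
  13. Ya'ning Liu, Rui Yan, Hongfei Yan: Personalized Citation Recommendation Based on User Preference and Language Models. CCIR 2013 (Recommended for publication in Journal of Chinese Information Processing)
  14. Weizheng Chen, Rui Yan, Hongfei Yan, Xiaoming Li: Enhancing Graph-Based Multi-Document Summarization with Wikipedia Entities. CCIR 2013 (Recommended for publication in Journal of Chinese Information Processing)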
  15. Jinpeng Wang, Wayne Xin Zhao, Haitian Wei, Hongfei Yan, Xiaoming Li: Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs. EMNLP 2013: 1337-1347
  16. Xudong Zhang, Wayne Xin Zhao, Dongdong Shan, Hongfei Yan: Group-Scheme: SIMD-based compression algorithms for web text data. BigData Conference 2013: 525-530
  17. Ya'ning Liu, Rui Yan, Hongfei Yan: Guess What You Will Cite: Personalized Citation Recommendation Based on Users' Preference. AIRS 2013: 428-439
  18. Yang Lu, Wayne Xin Zhao, Hongfei Yan, Xiaoming Li: A Metric Learning Based Approach to Evaluate Task-Specific Time Series Similarity. WAIM 2013: 314-325

  19. Xin Zhao, Rishan Chen, Kai Fan, Hongfei Yan, Xiaoming Li: A Novel Burst-based Text Representation Model for Scalable Event Detection. ACL (2) 2012: 43-47
  20. Dongdong Shan, Wayne Xin Zhao, Rishan Chen, Baihan Shu, Ziqi Wang, Junjie Yao, Hongfei Yan, Xiaoming Li: EventSearch: a system for event discovery and retrieval on multi-type historical data. KDD 2012: 1564-1567
  21. Xianling Mao, Jing He, Hongfei Yan, Xiaoming Li: Hierarchical topic integration through semi-supervised hierarchical topic modeling. CIKM 2012: 1612-1616
  22. Xianling Mao, Zhaoyan Ming, Zheng-Jun Zha, Tat-Seng Chua, Hongfei Yan, Xiaoming Li: Automatic labeling hierarchical topics. CIKM 2012: 2383-2386
  23. Xianling Mao, Zhaoyan Ming, Tat-Seng Chua, Si Li, Hongfei Yan, Xiaoming Li: SSHLDA: A Semi-Supervised Hierarchical Topic Model. EMNLP-CoNLL 2012: 800-809
  24. Dongdong Shan, Shuai Ding, Jing He, Hongfei Yan, Xiaoming Li: Optimized top-k processing with global page scores on block-max indexes. WSDM 2012: 423-432

  25. Dongdong Shan, Wayne Xin Zhao, Jing He, Rui Yan, Hongfei Yan, Xiaoming Li: Efficient phrase querying with flat position index. CIKM 2011: 2001-2004
  26. Yang Lu, Jing He, Dongdong Shan, Hongfei Yan: Recommending citations with translation model. CIKM 2011: 2017-2020
  27. Xianling Mao, Xiaobing Liu, Nan Di, Xiaoming Li, Hongfei Yan. SizeSpotSigs: An Effective Deduplicate Algorithm Considering the Size of Page Content. PAKDD 2011: 537-548
  28. Jing He, Wayne Xin Zhao, Baihan Shu, Xiaoming Li, Hongfei Yan. Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation. SIGIR 2011: 275-284
  29. Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan and Xiaoming Li. Comparing Twitter and traditional media using topic models. In Proceedings of the 33rd European Conference on Information Retrieval (ECIR'11) (full paper), pages 338-349, 2011. (20% acceptance)
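  30. Dongsheng Zhao, Dongdong Shan, Hongfei Yan, "Improving Relevance Based on Query Term Occurrence," Journal of the China Society for Scientific and Technical Information (in Chinese), vol. 30, no. 4, 2011.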

  31. Xianling Mao, Jing He, Hongfei Yan, "Web Page Noise Removal: A Survey," Journal of Computer Research and Development (in Chinese), 2010, (12).
  32. Xin Zhao, Jing Jiang, Jing He, Dongdong Shan, Hongfei Yan, Xiaoming Li. Context Modeling for Ranking and Tagging Bursty Features in Text Streams. CIKM'10, Toronto, Canada, October 25-29. (poster)
  33. Xin Zhao, Jing Jiang, Hongfei Yan, Xiaoming Li. Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid. In the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP'10), October 9-11, 2010, MIT, Massachusetts, USA.
  34. Jing He, Baihan Shu, Xiaoming Li, and Hongfei Yan, Effective Time Ratio: A Measure for Web Search Engines with Document Snippets, AIRS 2010: 73-84
  35. Chong Chen, Jing He, Dongdong Shan, Hongfei Yan, Optimize document identifier assignment for inverted index compression, Journal of Computational Information Systems, vol. 6, no. 2, pp. 339-346, February 2010 (EI:20103913260766)
  36. Chong Chen, Feng Li, Xianling Mao, Jing He, Hongfei Yan, Design and Implementation for Literature Search and Impact-based Summaries, in Proceedings of the 2010 International Conference on Intelligent Systems and Knowledge Engineering, Nov. 15-16, Hangzhou, China
  37. Chong Chen, Feng Li, Xianling Mao, Jing He, Hongfei Yan, "Design and Implementation of a Literature Search and Impact-Based Summarization System," Journal of Guangxi Normal University (in Chinese), vol. 28, no. 1, 2010, pp. 135-138.

  38. Dongdong Shan, Dongsheng Zhao, Jing He, and Hongfei Yan, "PARADISE Based Search Engine at TREC 2009 Web Track," in TREC 2009.
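  39. Baihan Shu and Hongfei Yan, "A Dynamic Summarization Algorithm for Search Engines," Journal of Zhengzhou University (in Chinese), vol. 41, pp. 56-59, 2009. (SEWM 2009 conference paper).
  40. Chong Chen and Hongfei Yan, "An Analysis of Web Resource Naming and Users' Naming Behavior," Journal of the China Society for Scientific and Technical Information (in Chinese), vol. 28, no. 4, pp. 582-592, 2009.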

  41. C. Chen, H. Yan, and X. Li, "Classifying Digital Resources in a Practical and Coherent Way with Easy-to-Get Features," in Proceedings of the 7th International Conference on Practical Aspects of Knowledge Management. Yokohama, Japan: Springer-Verlag, 2008, pp. 185-196.
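  42. Hongfei Yan and Chong Chen, "Teaching Practice and Reflections on the Course 'Introduction to Computing'," presented at the 4th University Computer Course Report Forum, Wuhan, China, 2008.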
  43. H. Yan, C. Chen, B. Peng, and X. Li, "On the Construction of a Large Scale Chinese Web Test Collection," presented at the Asia Information Retrieval Symposium 2008 (AIRS'08), Harbin, China, 2008. (full paper, 27% acceptance)
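  44. Jingjing Li and Hongfei Yan, "Construction, Analysis and Application of a Chinese Web Information Retrieval Test Collection," Journal of Chinese Information Processing (in Chinese), vol. 22, pp. 30-36, 2008.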

  45. T. Meng and H. Yan, "On the Peninsula Phenomenon in Web Graph and Its Implications on Web Search," Computer Networks, vol. 51(1), pp. 177-189, 2007.

  46. J. Li and H. Yan, "Peking University at the TREC 2006 Terabyte Track," in TREC 2006.
  47. Tao Meng, Jimin Wang, and Hongfei Yan, "Web Page Changes, Incremental Techniques, and Research Progress," Journal of Software (in Chinese), vol. 17, no. 5, pp. 1051-1067, 2006.

  48. H. Yan, J. Li, J. Zhu, and B. Peng, "Tianwang Search Engine at TREC 2005: Terabyte Track," in TREC 2005.
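  49. Hongfei Yan and Chong Chen, "The Effect of the Distance Between Words and the Head Word on Question Similarity Matching," Journal of Tsinghua University (Science and Technology) (in Chinese), vol. 45, no. S1, pp. 1873-1877, 2005.
  50. Bo Peng and Hongfei Yan, "Quality Evaluation of Search Engine Retrieval Systems," Journal of Computer Research and Development (in Chinese), pp. 1706-1711, 2005.
  51. Tao Meng, Hongfei Yan, and Jimin Wang, "The Temporal Locality of Web Page Changes and Its Verification," Journal of the China Society for Scientific and Technical Information (in Chinese), vol. 24, no. 4, pp. 398-406, 2005b.
  52. Tao Meng, Hongfei Yan, and Jimin Wang, "A System Model for Incrementally Collecting the Chinese Web and Its Implementation," Journal of Tsinghua University (Science and Technology) (in Chinese), vol. 45, no. S1, pp. 1882-1886, 2005a.
  53. Chong Chen, Bo Peng, and Hongfei Yan, "A Term Co-occurrence Algorithm and the Effect of Co-occurring Terms on Retrieval System Ranking," Journal of Tsinghua University (Science and Technology) (in Chinese), vol. 45, no. S1, pp. 1857-1860, 2005.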

  54. L. E. Huang, H. F. Yan, and X. M. Li, "Engineering of Web InfoMall: The Chinese Web Archive," presented at the World Engineers' Convention 2004, Shanghai, China, 2004.
  55. H. F. Yan, L. N. Huang, C. Chen, and Z. M. Xie, "A New Data Storage and Service Model of China Web InfoMall," presented at the 4th International Web Archiving Workshop (IWAW04) of 8th European Conference on Research and Advanced Technologies for Digital Libraries (ECDL08), Bath, UK, 2004.
  56. C. Chen, H. F. Yan, and X. M. Li, "CDAL: A Scalable Scheme for Digital Resource Reorganization," presented at The 3rd International Conference on Web-Based Learning (ICWL2004), Tsinghua University, Beijing, China, 2004. (LNCS 3143, Springer Berlin Heidelberg New York, 2004. ISBN 3-540-22542-0.)
  57. T. Meng, H. F. Yan, J. M. Wang, and X. M. Li, "The Evolution of Link-Attributes for Pages and Its Implications on Web Crawling," presented at the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), Beijing, 2004.
  58. J. J. Zhu and H. F. Yan, "A Model for Analyzing the Web of Multiple Dimensions and Its Application," Journal of the China Society for Scientific and Technical Information (in Chinese), vol. 23, pp. 553-560, 2004.
  59. X. M. Li, Z. M. Xie, L. Sun, and H. F. Yan, "Web InfoMall: the Concept and Design for a Mass Web Pages Storage System," accepted by the special book of China national key research grant, 2004.

  60. X. M. Li, J. J. Zhu, and H. F. Yan, "A Model for Collecting and Processing Topical Information in the Web and Its Application," Journal of Computer Research and Development (in Chinese), vol. 40, no. 12, pp. 1667-1671, December 2003.
  61. T. Meng, H. F. Yan, and X. Li, "An Evaluation Model on Information Coverage of Search Engines," Acta Electronica Sinica (in Chinese), vol. 31, no. 8, pp. 1168-1172, 2003.

  62. H. F. Yan and X. Li, "On the Structure of Chinese Web 2002," Journal of Computer Research and Development (in Chinese), vol. 39, pp. 958-967, August 2002.

  63. H. F. Yan, J. Y. Wang, and X. M. Li, "A dynamically reconfigurable model for a distributed Web crawling system," presented at International Conference on Computer Networks and Mobile Computing, Beijing, China, 2001b.
  64. H. F. Yan, J. Y. Wang, X. M. Li, and L. Guo, "Architectural design and evaluation of an efficient Web-crawling system," presented at Proceedings of 15th International Parallel and Distributed Processing Symposium, San Francisco, California, USA, 2001a. (Also published in Journal of Systems and Software, vol. 60, pp. 185-193, Feb 15, 2002.)