• I am a researcher at Huawei Noah's Ark Lab.
  • I focus on building cooperative multi-agent systems using Deep Reinforcement Learning.
  • This page will no longer be updated.
  • Education

    Ph.D. @ PKU
    Sept. 2015 - Jul. 2020

  • School of Electronics Engineering and Computer Science
  • Institute of Network Computing and Information Systems
  • Advisor: Prof. Zhen Xiao
  • Research Focus: Multi-agent Deep Reinforcement Learning
  • Ranking 1st among all students (including master's and Ph.D. students) in our class
  • B.S. @ NUAA
    Sept. 2011 - Jul. 2015

  • School of Computer Science and Technology
  • Department of Software Engineering
  • Ranking 1st among all students in our department
  • Recommended to Peking University (PKU)
  • Selected Research

    Research Intern
    2012-Noah's Ark Lab
    Mentors: Wulong Liu, Jianye Hao, Jun Wang & Jun Luo
    Jun. 2019 - Oct. 2019

    Neighborhood Cognition Consistent Multi-agent Reinforcement Learning.
  • I was awarded the Third Prize of the Innovation Pioneer Award on December 26, 2019 by Huawei Central Research Institute (the second-level department of Huawei).
    1. Of all the people who received the Innovation Pioneer Award, I was the only intern.
    2. Including the First, Second and Third Prize, there are only 13 awards in total at Huawei Noah's Ark Lab (the third-level department of Huawei).

  • The first author of NCC-MARL. Oral @AAAI20, top 5%.
    1. NCC-MARL is a general RL framework to handle large-scale multi-agent cooperative problems.
    2. We observe that it is crucial for agents to maintain consistent cognitions about their environments in order to achieve effective system-level cooperation. Conversely, it is hard to imagine that agents without a consensus on their shared environment can cooperate well.
    3. NCC-MARL decomposes all agents into much smaller neighborhoods. We further assume that each neighborhood has a true hidden cognitive variable; all neighboring agents then learn, by variational inference, to align their neighborhood-specific cognitive representations with this hidden variable. As a result, neighboring agents eventually form consistent neighborhood cognitions and thus achieve effective cooperation.
    4. NCC-MARL achieves much better performance than many baselines, e.g., VDN, QMIX, MADDPG and ATT-MADDPG.
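As a rough illustration of the cognition-alignment idea in item 3 above (not the implementation from the paper), the sketch below assumes that each agent encodes its observation into a diagonal Gaussian over the cognitive variable and that the hidden variable has a standard normal prior. Both assumptions, and the function names `kl_to_standard_normal` and `neighborhood_consistency_loss`, are hypothetical choices made for this example.

```python
import math


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) for one agent's cognition vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var))


def neighborhood_consistency_loss(agent_posteriors):
    """Average KL term over all agents in one neighborhood.

    agent_posteriors: list of (mean, log_var) pairs, one per neighboring agent.
    Pulling every agent toward the same prior indirectly aligns the agents
    with each other (assumed standard normal prior, see the lead-in).
    """
    total = sum(kl_to_standard_normal(m, lv) for m, lv in agent_posteriors)
    return total / len(agent_posteriors)


# Two hypothetical neighborhoods with two agents each.
aligned = [([0.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [0.0, 0.0])]
divergent = [([1.0, -1.0], [0.0, 0.0]), ([-2.0, 0.5], [0.0, 0.0])]

print(neighborhood_consistency_loss(aligned))    # zero: both agents match the prior
print(neighborhood_consistency_loss(divergent))  # positive: cognitions disagree
```

In a full training setup such a term would be added to the usual RL loss, so that agents are rewarded both for good returns and for consistent neighborhood cognition.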
  • Research Intern & Team Leader
    2012-NGIP Lab
    Mentor: Zhibo Gong
    Jan. 2017 - Jan. 2019

    Distributed Reinforcement Learning-based Network Traffic Control.
  • This is a one-million-CNY level project. It concluded with a rating of "EXCELLENCE" on January 8, 2019.

  • The first author of ACCNet.
    1. ACCNet is a general RL framework to learn the beneficial communication messages among multiple distributed agents (e.g., routers).
    2. It shares a similar idea with the well-known MADDPG: the centralized critic (or coordinator) can stabilize the training of multiple agents, while the independent actor is suitable for distributed execution.
    3. Some ideas informally proposed in this work (e.g., concurrent experience replay and current episode experience replay) are also found useful by other researchers.
  • The first author of ATT-MADDPG.
    1. ATT-MADDPG is a special RL framework to explicitly model the dynamic joint policy of teammates in an adaptive manner.
    2. It enhances MADDPG with a special attention mechanism, which introduces a good inductive bias on the network structure to approximate a multi-agent Q-value function. This is similar in spirit to the renowned single-agent Dueling Network.
    3. ATT-MADDPG not only outperforms the state-of-the-art RL-based methods and rule-based methods by a large margin, but also achieves better scalability and robustness.
  • The first author of Gated-ACML.
    1. Gated-ACML is an RL framework to learn the beneficial communication messages among multiple distributed agents (e.g., routers) under limited-bandwidth restriction.
    2. It introduces a gating mechanism to prune unprofitable messages adaptively to control the message quantity around a desired threshold.
    3. The proposed gating mechanism can prune a large share of messages with little impact on performance. Moreover, it is not tailored to any particular DRL architecture and is applicable to several DRL methods. As far as we know, it is the first formal method to achieve this.
  • The first author of a patent.
    1. A framework of sparse communication learning with model-free distributed cooperative multi-agent environment. Patent Number: 201811505121.6.
    2. It combines the ideas of Gated-ACML and ATT-MADDPG.
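The message-pruning idea behind Gated-ACML described above can be sketched as follows. This is a minimal illustration, not the published algorithm: it assumes each candidate message comes with an estimated gain from sending it (for example, a difference in Q-values), and it picks the gate threshold from observed gains so that roughly a desired fraction of messages is pruned. The names `choose_threshold`, `gate`, and `prune_rate` are hypothetical.

```python
def choose_threshold(gains, prune_rate):
    """Pick a threshold so that roughly `prune_rate` of the observed gains fall at or below it."""
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    ordered = sorted(gains)
    cut = int(len(ordered) * prune_rate)
    # With cut == 0 nothing should be pruned, so use a threshold below every gain.
    return ordered[cut - 1] if cut > 0 else float("-inf")


def gate(gain, threshold):
    """Return True when the message is worth sending."""
    return gain > threshold


# Hypothetical estimated gains for ten candidate messages.
observed_gains = [0.02, 0.90, 0.05, 0.40, 0.01, 0.03, 0.75, 0.04, 0.10, 0.06]
threshold = choose_threshold(observed_gains, prune_rate=0.8)
sent = [g for g in observed_gains if gate(g, threshold)]

print(threshold)  # 0.4
print(sent)       # [0.9, 0.75]
```

Because the gate only needs a per-message gain estimate, the same check can sit in front of different actor-critic architectures, which matches the architecture-agnostic claim above.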
  • Research Intern
    2012-NGIP Lab
    Mentor: Zhibo Gong
    Mar. 2016 - Apr. 2017

    Key Algorithms of Non-blocking Network.
  • The third author of a patent.
    1. A method of multiple-path flow distributary based on buffering queue model. Patent Number: 201610915269.1.
    2. We add a buffer in edge-routers so that the famous Flowlet can work. Multiple such routers are connected in series in our model.
    3. I am mainly responsible for the analysis of model complexity such as router queue length and average packet waiting time, based on queueing theory.
  • The second author of a patent.
    1. A method of network congestion propagation and traffic balancing based on Fourier transform. Patent Number: 201710681276.4.
    2. We use frequency-domain values to describe the original time-domain waveform statistics, resulting in a compression rate of 90%.
    3. I am mainly responsible for the design of interconversion between frequency-domain values and time-domain values.
  • The main proposer of "an elephant flow detection algorithm based on Hash and Count Sketch", which draws on ideas from ensemble methods.
  • The designer of "a traffic scheduling scheme based on edge decision", which is a variant of TeXCP.
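For readers unfamiliar with Count Sketch, the sketch below shows how it can flag elephant flows. It is a generic textbook-style Count Sketch, not the algorithm proposed in this project: the class name `CountSketch`, the BLAKE2 hashing, the flow keys, and the 5,000-packet threshold are all assumptions made for illustration. The median over several hashed rows is what resembles an ensemble vote.

```python
import hashlib
from statistics import median


class CountSketch:
    """Count Sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=256):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, key, row):
        # One salted hash per row yields both the bucket index and the +/-1 sign.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=str(row).encode()
        ).digest()
        value = int.from_bytes(digest, "big")
        bucket = (value >> 1) % self.width
        sign = 1 if value & 1 else -1
        return bucket, sign

    def add(self, key, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            self.table[row][bucket] += sign * count

    def estimate(self, key):
        # Each row gives a noisy estimate; the median acts as the ensemble vote.
        votes = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            votes.append(sign * self.table[row][bucket])
        return median(votes)


# Hypothetical traffic: one elephant flow and fifty small "mouse" flows.
sketch = CountSketch()
elephant = "10.0.0.1->10.0.0.2"
sketch.add(elephant, 10_000)
for i in range(50):
    sketch.add(f"mouse-{i}", 10)

threshold = 5_000  # assumed packet-count cutoff for an elephant flow
print(sketch.estimate(elephant) >= threshold)  # True
```

The memory cost is `depth * width` counters regardless of how many distinct flows pass through, which is why sketches of this kind suit router-side measurement.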

  • Data Analysis Intern
    Internal Audit Department
    Mentor: Pengfei Ye
    Jun. 2015 - Sept. 2015

    ARPU-value Prediction and Verification.
  • The main participant of a tree-based ARPU-value predictor.
    1. I was mainly responsible for designing and implementing a decision tree regressor that predicts the future order quantity of air tickets and hotels. The predicted quantity can then be used to verify the results reported by third parties: a large difference between the two data sources indicates that the order quantity from the third parties may be fraudulent.
    2. By the end of the internship, this model was in use at the company with satisfactory prediction performance.
    3. We chose a decision tree as our initial model because it is more interpretable and less sensitive to the scale and type of feature values.
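The verification workflow described above can be sketched in a few lines. This is a toy stand-in, not the production model: it fits a depth-1 regression tree (a stump) on a single numeric feature and flags a reported quantity that deviates too far from the prediction. The data, the 30% tolerance, and the names `fit_stump` and `is_suspicious` are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one numeric feature by minimizing squared error."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < split]
        right = [y for x, y in zip(xs, ys) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x < split else right_mean


def is_suspicious(predicted, reported, tolerance=0.3):
    """Flag a reported quantity that deviates from the prediction by more than `tolerance` (relative)."""
    return abs(reported - predicted) > tolerance * predicted


# Hypothetical history: day of week (1-7) versus daily order quantity.
days = [1, 2, 3, 4, 5, 6, 7]
orders = [100, 110, 90, 100, 300, 310, 290]
predict = fit_stump(days, orders)

print(predict(6))                      # 300.0
print(is_suspicious(predict(6), 520))  # True: far above the prediction
print(is_suspicious(predict(6), 310))  # False: within tolerance
```

A single split is easy to read off and explain to auditors, which reflects the interpretability argument for starting with a tree model.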
  • Selected Publications

  • DAACMP: Learning Multi-agent Communication with Double Attentional Deep Reinforcement Learning.  
           Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong and Yan Ni.   Journal of AAMAS 2020, regular paper.
  • NCC-MARL: Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning.  
           Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang and Zhen Xiao.   AAAI 2020, regular paper with oral presentation (top 5%).
  • Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning.  
           Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong and Yan Ni.   AAAI 2020, regular paper with poster presentation, acceptance rate: 1591/7737=20.6%.
  • ATT-MADDPG: Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG.  
           Hangyu Mao, Zhengchao Zhang, Zhen Xiao and Zhibo Gong.   AAMAS 2019, regular paper, acceptance rate: 189/781=24.2%.
  • TSTR: Topic-specific Retweet Count Ranking for Weibo.  
           Hangyu Mao, Yang Xiao, Yuan Wang, Jiakang Wang and Zhen Xiao.   PAKDD 2018, regular paper, acceptance rate: 57/592=9.6%.
  • ACCNet: Actor-Coordinator-Critic Net for Learning-to-Communicate with Deep Multi-agent Reinforcement Learning.  
           Hangyu Mao, Zhibo Gong, Yan Ni and Zhen Xiao.   arXiv preprint, 2017.
  • PIFB: Identifying Users' Professions via the Microblogs They Forward.  
           Yuan Wang, Hangyu Mao and Zhen Xiao.   SocInf Workshop on IJCAI 2017.
  • PROLTS: Predicting Restaurant Consumption Level through Social Media Footprints.  
           Yang Xiao, Yuan Wang, Hangyu Mao and Zhen Xiao.   COLING 2016.
  • Selected Patents

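  • A cluster resource management and task scheduling method and system based on deep reinforcement learning
  • A cluster resource scheduling method based on multi-agent deep reinforcement learning
  • A multi-agent cooperation method based on neighborhood cognitive consistency
  • A framework for learning sparse communication in model-free distributed cooperative multi-agent environments
  • A network traffic monitoring method and network device
  • A method and apparatus for multi-path traffic transmission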
  • Selected Awards

  • Outstanding Graduate of Beijing Regular Institutions of Higher Education (Ranking 1st among all students in our class)        Beijing Municipal Education Commission; Jul. 2020
  • Excellent Graduate (Ranking 1st among all students in our class)        PKU; Jun. 2020
  • Excellent Intern (Top 10/~200)        Huawei; May 2020
  • Student Travel Scholarship        AAAI 2020
  • Ming-Lue (明略) Technology Innovation Scholarship        NC&IS, EECS, PKU; Dec. 2019
  • Third Prize of Innovation Pioneer Award (Top 13/~500, the only intern)        Huawei; Dec. 2019
  • Award for Scientific Research (6/~100)        PKU; Dec. 2019
  • Student Travel Scholarship        AAMAS 2019
  • Miao-Zhen (秒针) Innovation Scholarship        NC&IS, EECS, PKU; Dec. 2018
  • Tian-Chuang (天创) Scholarship        EECS, PKU; Dec. 2018
  • Award for Scientific Research (9/~100)        PKU; Dec. 2018
  • Outstanding Communist        EECS, PKU; Jun. 2018
  • Zhi-Tang (智唐) Technology Scholarship        NC&IS, EECS, PKU; Dec. 2017
  • Award for Scientific Research (7/~100)        PKU; Dec. 2017
  • National Scholarship (Top 1%)        NUAA; Dec. 2014
  • First Prize of the 9th Programming Competition of NUAA (3/~150)        NUAA; Apr. 2014
  • Pacemaker to Merit Student (Top 1%)        NUAA; Dec. 2013
  • Finalist Award of the 3rd "Zhongke Cup" National Software Design Competition (Top 60/763)        CSIA & ISCAS; Nov. 2013
  • Special Annual Award for Knowing and Doing (Top 0.5%)        NUAA; Dec. 2012
  • Professional Activities & Services

  • PC member of IJCAI 2020.
  • PC member of ECAI 2020.
  • Invited Speaker of the 1st Distributed AI Conference.
  • Reviewer of IEEE Transactions on Cybernetics.
  • Volunteer of AAMAS 2019.
  • Invited Speaker of the 1st Huawei Multi-agent Reinforcement Learning Workshop.
  • Please do not hesitate to email me for any research purposes: hy.mao@pku.edu.cn.
  • Miscellaneous

  • Social Experience: I served as the secretary (or committee member) of the Student Party Branch from 2012 to 2019, during which we held many activities such as the Introduction of Web 2.0 and the Introduction of AI. I was awarded the title of Outstanding Communist by the School of EECS, PKU in June 2018.
  • Programming Skill: Python (preferred), TensorFlow (preferred), RL/MARL, NMT, GNN, VAE, LDA, GAN, FM/FFM, XGBoost/LightGBM, scikit-learn, C/C++, Go.
  • English Skill: I passed the College English Test-4 (CET-4) with a score of 531 in Jun. 2012, and CET-6 with a score of 506 in Dec. 2012.
  • Daily Pastime: I enjoy running, traveling, reading, swimming, badminton, table tennis, skating, driving and playing the guitar. I completed the Beijing Half Marathon in 2016 and 2017.