References
[1]ABADI M, BARHAM P, CHEN J, et al., 2016. TensorFlow: A system for large-scale machine learning[C]//12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).[S.l.:s.n.]:265-283.
[2]ALLGEUER P, BEHNKE S, 2018. Hierarchical and state-based architectures for robot behavior planning and control[J].arXiv preprint arXiv:1809.11067.
[3]AN B, 2017. Game theoretic analysis of security and sustainability[C]//Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI).[S.l.:s.n.]:5111-5115.
[4]ASADA M, STONE P, KITANO H, et al., 1997. The RoboCup physical agent challenge: Goals and protocols for phase I[C]//Robot Soccer World Cup.[S.l.]:Springer:42-61.
[5]ASADA M, STONE P, VELOSO M, et al., 2019. RoboCup: A treasure trove of rich diversity for research issues and interdisciplinary connections[J].IEEE Robotics & Automation Magazine, 26(3):99-102.
[6]BELLO I, PHAM H, LE Q V, et al., 2017. Neural combinatorial optimization with reinforcement learning[C]//ICLR.[S.l.:s.n.].
[7]BERNER C, BROCKMAN G, CHAN B, et al., 2019. Dota 2 with large scale deep reinforcement learning[J].arXiv preprint arXiv:1912.06680.
[8]BERNSTEIN D S, GIVAN R, IMMERMAN N, et al., 2002. The complexity of decentralized control of Markov decision processes[J].Mathematics of Operations Research, 27(4):819-840.
[9]BOND A H, GASSER L, 1988. Readings in distributed artificial intelligence[M].[S.l.]:Morgan Kaufmann.
[10]BROWN N, SANDHOLM T, 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals[J].Science, 359(6374):418-424.
[11]BROWN N, SANDHOLM T, 2019a. Superhuman AI for multiplayer poker[J].Science, 365(6456):885-890.
[12]BROWN N, LERER A, GROSS S, et al., 2019b. Deep counterfactual regret minimization[C]//ICML.[S.l.:s.n.]:793-802.
[13]CAMERON C, CHEN R, HARTFORD J, et al., 2020. Predicting propositional satisfiability via end-to-end learning[C]//AAAI.[S.l.:s.n.]:3324-3331.
[14]CAPPART Q, GOUTIERRE E, BERGMAN D, et al., 2019. Improving optimization bounds using machine learning: Decision diagrams meet deep reinforcement learning[C]//AAAI.[S.l.:s.n.]:1443-1451.
[15]CAPPART Q, CHÉTELAT D, KHALIL E, et al., 2021. Combinatorial optimization and reasoning with graph neural networks[J].arXiv preprint arXiv:2102.09544.
[16]CLAUS C, BOUTILIER C, 1998. The dynamics of reinforcement learning in cooperative multiagent systems[C]//AAAI.[S.l.:s.n.]:746-752.
[17]CONITZER V, SANDHOLM T, 2006. Computing the optimal strategy to commit to[C]//EC.[S.l.:s.n.]:82-90.
[18]DAI H, KHALIL E B, ZHANG Y, et al., 2017. Learning combinatorial optimization algorithms over graphs[C]//NeurIPS.[S.l.:s.n.]:6351-6361.
[19]DASKALAKIS C, GOLDBERG P W, PAPADIMITRIOU C H, 2009. The complexity of computing a Nash equilibrium[J].SIAM Journal on Computing, 39(1):195-259.
[20]DENG Y, YU R, WANG X, et al., 2021. Neural regret-matching for distributed constraint optimization problems[C]//IJCAI.[S.l.:s.n.]:146-153.
[21]FANG F, NGUYEN T H, 2016. Green security games: Apply game theory to addressing green security challenges[J].ACM SIGecom Exchanges, 15(1):78-83.
[22]FENNELL R, LESSER V, 1975. Parallelism in AI problem solving: A case study of Hearsay II[R].Pittsburgh, PA: Carnegie Mellon University, Department of Computer Science.
[23]FIKES R E, NILSSON N J, 1971. STRIPS: A new approach to the application of theorem proving to problem solving[J].Artificial Intelligence, 2(3-4):189-208.
[24]FIORETTO F, YEOH W, PONTELLI E, 2017. A multiagent system approach to scheduling devices in smart homes[C]//AAMAS.[S.l.:s.n.]:981-989.
[25]FOERSTER J, FARQUHAR G, AFOURAS T, et al., 2018. Counterfactual multiagent policy gradients[C]//AAAI.[S.l.:s.n.]:2974-2982.
[26]FUKUSHIMA T, NAKASHIMA T, AKIYAMA H, 2018. Mimicking an expert team through the learning of evaluation functions from action sequences[C]//Robot World Cup.[S.l.]:Springer:170-180.
[27]GAO S, OKUYA F, KAWAHARA Y, et al., 2019. Building a computer mahjong player via deep convolutional neural networks[J].arXiv preprint arXiv:1906.02146.
[28]GARCIA J, FERNÁNDEZ F, 2015. A comprehensive survey on safe reinforcement learning[J].Journal of Machine Learning Research, 16(1):1437-1480.
[29]GULCEHRE C, WANG Z, NOVIKOV A, et al., 2020. RL Unplugged: Benchmarks for offline reinforcement learning[J].arXiv e-prints:arXiv-2006.
[30]HE H, DAUMÉ III H, EISNER J, 2014. Learning to search in branch-and-bound algorithms[C]//NeurIPS.[S.l.:s.n.]:3293-3301.
[31]HEINRICH J, SILVER D, 2016. Deep reinforcement learning from self-play in imperfect-information games[J].arXiv preprint arXiv:1603.01121.
[32]HEINRICH J, LANCTOT M, SILVER D, 2015. Fictitious self-play in extensive-form games[C]//ICML.[S.l.:s.n.]:805-813.
[33]HEWITT C, 1977. Viewing control structures as patterns of passing messages[J].Artificial Intelligence, 8(3):323-364.
[34]HUHNS M N, 1987. Distributed artificial intelligence[M].London: Pitman Publishing Ltd.
[35]JAIN M, KORZHYK D, VANĚK O, et al., 2011. A double oracle algorithm for zero-sum security games on graphs[C]//AAMAS.[S.l.:s.n.]:327-334.
[36]JIANG Q, LI K, DU B, et al., 2019. DeltaDou: Expert-level doudizhu AI through self-play[C]//IJCAI.[S.l.:s.n.]:1265-1271.
[37]JOUPPI N P, YOUNG C, PATIL N, et al., 2017. In-datacenter performance analysis of a tensor processing unit[C]//Proceedings of the 44th annual international symposium on computer architecture.[S.l.:s.n.]:1-12.
[38]KADURI O, BOYARSKI E, STERN R, 2020. Algorithm selection for optimal multi-agent pathfinding[C]//ICAPS.[S.l.:s.n.]:161-165.
[39]KAIROUZ P, MCMAHAN H B, AVENT B, et al., 2019. Advances and open problems in federated learning[J].arXiv preprint arXiv:1912.04977.
[40]KOOL W, VAN HOOF H, WELLING M, 2018. Attention, learn to solve routing problems![C]//ICLR.[S.l.:s.n.].
[41]KURITA M, HOKI K, 2020. Method for constructing artificial intelligence player with abstractions to Markov decision processes in multiplayer game of mahjong[J].IEEE Transactions on Games, 13(1):99-110.
[42]LANCTOT M, ZAMBALDI V, GRUSLYS A, et al., 2017. A unified game-theoretic approach to multiagent reinforcement learning[C]//NeurIPS.[S.l.:s.n.]:4193-4206.
[43]LEDERMAN G, RABE M, SESHIA S, et al., 2020. Learning heuristics for quantified boolean formulas through reinforcement learning[C]//ICLR.[S.l.:s.n.].
[44]LESLIE D S, COLLINS E J, 2006. Generalised weakened fictitious play[J].Games and Economic Behavior, 56(2):285-298.
[45]LI J, KOYAMADA S, YE Q, et al., 2020. Suphx: Mastering mahjong with deep reinforcement learning[J].arXiv preprint arXiv:2003.13590.
[46]LI S, NEGENBORN R R, LODEWIJKS G, 2016. Distributed constraint optimization for addressing vessel rotation planning problems[J].Engineering Applications of Artificial Intelligence, 48:159-172.
[47]LI S, ZHANG Y, WANG X, et al., 2021. CFR-MIX: Solving imperfect information extensive-form games with combinatorial action space[C]//IJCAI.[S.l.:s.n.]:3663-3669.
[48]LITTMAN M L, 1994. Markov games as a framework for multi-agent reinforcement learning[C]//ICML.[S.l.:s.n.]:157-163.
[49]LIU S, LEVER G, WANG Z, et al., 2021. From motor control to team play in simulated humanoid football[J].arXiv preprint arXiv:2105.12196.
[50]LOWE R, WU Y, TAMAR A, et al., 2017. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//NeurIPS.[S.l.:s.n.]:6382-6393.
[51]LYU D, YANG F, LIU B, et al., 2019. SDRL: Interpretable and data-efficient deep reinforcement learning leveraging symbolic planning[C]//AAAI.[S.l.:s.n.]:2970-2977.
[52]MACALPINE P, TORABI F, PAVSE B, et al., 2018. UT Austin Villa: RoboCup 2018 3D simulation league champions[C]//Robot World Cup.[S.l.:s.n.]:462-475.
[53]MCMAHAN B, MOORE E, RAMAGE D, et al., 2017. Communication-efficient learning of deep networks from decentralized data[C]//Artificial intelligence and statistics.[S.l.:s.n.]:1273-1282.
[54]MCMAHAN H B, GORDON G J, BLUM A, 2003. Planning in the presence of cost functions controlled by an adversary[C]//ICML.[S.l.:s.n.]:536-543.
[55]MENDOZA J P, SIMMONS R, VELOSO M, 2016. Online learning of robot soccer free kick plans using a bandit approach[C]//ICAPS.[S.l.:s.n.]:504-508.
[56]MIZUKAMI N, TSURUOKA Y, 2015. Building a computer mahjong player based on monte carlo simulation and opponent models[C]//2015 IEEE Conference on Computational Intelligence and Games (CIG).[S.l.]:IEEE:275-283.
[57]MODI P J, SHEN W M, TAMBE M, et al., 2003. An asynchronous complete method for distributed constraint optimization[C]//AAMAS.[S.l.:s.n.]:161-168.
[58]MORAVČÍK M, SCHMID M, BURCH N, et al., 2017. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker[J].Science, 356(6337):508-513.
[59]MULLER P, OMIDSHAFIEI S, ROWLAND M, et al., 2019. A generalized training approach for multiagent learning[C]//ICLR.[S.l.:s.n.].
[60]NAIR V, BARTUNOV S, GIMENO F, et al., 2020. Solving mixed integer programs using neural networks[J].arXiv preprint arXiv:2012.13349.
[61]NAZARI M, OROOJLOOY A, TAKÁČ M, et al., 2018. Reinforcement learning for solving the vehicle routing problem[C]//NeurIPS.[S.l.:s.n.]:9861-9871.
[62]OCANA J M C, RICCIO F, CAPOBIANCO R, et al., 2019. Cooperative multiagent deep reinforcement learning in a 2 versus 2 free-kick task[C]//Robot World Cup.[S.l.]:Springer:44-57.
[63]PASZKE A, GROSS S, MASSA F, et al., 2019. PyTorch: An imperative style, high-performance deep learning library[J].NeurIPS, 32:8026-8037.
[64]PRATES M, AVELAR P H, LEMOS H, et al., 2019. Learning to solve NP-complete problems: A graph neural network for decision TSP[C]//AAAI.[S.l.:s.n.]:4731-4738.
[65]RABINOVICH Z, GOLDMAN C V, ROSENSCHEIN J S, 2003. The complexity of multiagent systems: The price of silence[C]//AAMAS.[S.l.:s.n.]:1102-1103.
[66]RASHID T, SAMVELYAN M, SCHROEDER C, et al., 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning[C]//ICML.[S.l.:s.n.]:4295-4304.
[67]RASHID T, FARQUHAR G, PENG B, et al., 2020. Weighted QMIX: Expanding monotonic value function factorisation[J].arXiv e-prints: arXiv-2006.
[68]RAZEGHI Y, KASK K, LU Y, et al., 2021. Deep bucket elimination[C]//IJCAI.[S.l.:s.n.].
[69]REIJNEN R, ZHANG Y, NUIJTEN W, et al., 2020. Combining deep reinforcement learning with search heuristics for solving multi-agent path finding in segment-based layouts[C]//2020 IEEE Symposium Series on Computational Intelligence (SSCI).[S.l.:s.n.]:2647-2654.
[70]RISLER M, VON STRYK O, 2008. Formal behavior specification of multi-robot systems using hierarchical state machines in XABSL[C]//AAMAS 2008 Workshop on Formal Models and Methods for Multi-Robot Systems.[S.l.:s.n.]:7.
[71]ROSENSCHEIN J S, GENESERETH M R, 1985. Deals among rational agents[C]//IJCAI.[S.l.:s.n.]:91-99.
[72]SAMVELYAN M, RASHID T, SCHROEDER DE WITT C, et al., 2019. The StarCraft multi-agent challenge[C]//AAMAS.[S.l.:s.n.]:2186-2188.
[73]SCHÖLKOPF B, LOCATELLO F, BAUER S, et al., 2021. Toward causal representation learning[J].Proceedings of the IEEE, 109(5):612-634.
[74]SCHRITTWIESER J, ANTONOGLOU I, HUBERT T, et al., 2020. Mastering Atari, Go, chess and shogi by planning with a learned model[J].Nature, 588 (7839):604-609.
[75]SELSAM D, LAMM M, BENEDIKT B, et al., 2018. Learning a SAT solver from single-bit supervision[C]//ICLR.[S.l.:s.n.].
[76]SEMNANI S H, LIU H, EVERETT M, et al., 2020. Multi-agent motion planning for dense and dynamic environments via deep reinforcement learning[J].IEEE Robotics and Automation Letters, 5(2):3221-3226.
[77]SILVER D, HUANG A, MADDISON C J, et al., 2016. Mastering the game of Go with deep neural networks and tree search[J].Nature, 529(7587):484-489.
[78]SILVER D, SCHRITTWIESER J, SIMONYAN K, et al., 2017. Mastering the game of Go without human knowledge[J].Nature, 550(7676):354-359.
[79]SILVER D, HUBERT T, SCHRITTWIESER J, et al., 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J].Science, 362(6419):1140-1144.
[80]SON K, KIM D, KANG W J, et al., 2019. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//ICML.[S.l.:s.n.]:5887-5896.
[81]STERN R, STURTEVANT N R, FELNER A, et al., 2019. Multi-agent pathfinding: Definitions, variants, and benchmarks[C]//Twelfth Annual Symposium on Combinatorial Search.[S.l.:s.n.].
[82]STUTZ D, HEIN M, SCHIELE B, 2020. Confidence-calibrated adversarial training: Generalizing to unseen attacks[C]//ICML.[S.l.]:PMLR:9155-9166.
[83]SUZUKI Y, NAKASHIMA T, 2019. On the use of simulated future information for evaluating game situations[C]//Robot World Cup.[S.l.:s.n.]:294-308.
[84]TAMBE M, 2011. Security and game theory: Algorithms, deployed systems, lessons learned[M].[S.l.]:Cambridge University Press.
[85]TAMBE M, JIANG A X, AN B, et al., 2014. Computational game theory for security: Progress and challenges[C]//AAAI spring symposium on applied computational game theory.[S.l.:s.n.].
[86]TUYLS K, OMIDSHAFIEI S, MULLER P, et al., 2021. Game plan: What AI can do for football, and what football can do for AI[J].Journal of Artificial Intelligence Research, 71:41-88.
[87]DWANGO Media Village. NAGA: Deep learning mahjong AI[EB/OL].https://dmv.nico/ja/articles/mahjong_ai_naga/.
[88]VINYALS O, FORTUNATO M, JAITLY N, 2015. Pointer networks[C]//NeurIPS.[S.l.:s.n.]:2692-2700.
[89]VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al., 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature, 575(7782):350-354.
[90]WANG J, REN Z, LIU T, et al., 2020. QPLEX: Duplex dueling multi-agent Q-learning[C]//ICLR.[S.l.:s.n.].
[91]WATKINSON W B, CAMP T, 2018. Training a robocup striker agent via transferred reinforcement learning[C]//Robot World Cup.[S.l.]:Springer:109-121.
[92]XU H, KOENIG S, KUMAR T S, 2018. Towards effective deep learning for constraint satisfaction problems[C]//CP.[S.l.:s.n.]:588-597.
[93]XUE W, ZHANG Y, LI S, et al., 2021. Solving large-scale extensive-form network security games via neural fictitious self-play[C]//IJCAI.[S.l.:s.n.]:3713-3720.
[94]YANG Q, LIU Y, CHEN T, et al., 2019. Federated machine learning: Concept and applications[J].ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1-19.
[95]YANG Y, LUO R, LI M, et al., 2018. Mean field multi-agent reinforcement learning[C]//ICML.[S.l.:s.n.]:5571-5580.
[96]YE D, CHEN G, ZHANG W, et al., 2020. Towards playing full MOBA games with deep reinforcement learning[J].arXiv preprint arXiv:2011.12692.
[97]YEOH W, YOKOO M, 2012. Distributed problem solving[J].AI Magazine, 33(3):53-53.
[98]ZHA D, XIE J, MA W, et al., 2021. DouZero: Mastering doudizhu with self-play deep reinforcement learning[J].arXiv preprint arXiv:2106.06135.
[99]ZHU L, HAN S, 2020. Deep leakage from gradients[M]//Federated learning.[S.l.]:Springer:17-31.
[100]ZINKEVICH M, JOHANSON M, BOWLING M, et al., 2007. Regret minimization in games with incomplete information[C]//NeurIPS.[S.l.:s.n.]:1729-1736.