

Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

Source: School of Computer and Artificial Intelligence | Date: 2023-11-28

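Lecture No.: jz-yjsb-2023-y016

Title: Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

Speaker: Prof. Li Xia, Sun Yat-sen University

Time: Friday, December 1, 2023, 3:30 p.m.

Venue: Fourth-floor conference room, Science and Education Building, East Area, Fucheng Road Campus, Beijing Technology and Business University

Audience: Graduate and undergraduate students of the Department of Information Management, School of Computer and Artificial Intelligence

Organizer: School of Computer and Artificial Intelligence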

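About the speaker:

Li Xia is a professor at the School of Business, Sun Yat-sen University. Xia received bachelor's and Ph.D. degrees from the Department of Automation, Tsinghua University, in 2002 and 2007, respectively, and was a jointly trained doctoral student at the Hong Kong University of Science and Technology. After the Ph.D., Xia worked in research at IBM Research China and at King Abdullah University of Science and Technology in Saudi Arabia. From 2011 to 2019 Xia taught in the Department of Automation, Tsinghua University, as lecturer and then associate professor (doctoral supervisor), and moved to Sun Yat-sen University in 2019. Xia's main research interests are the theory of Markov decision processes, reinforcement learning, queueing theory, and game theory, together with applications in energy, finance, and other fields. Xia has published more than 100 papers, holds 3 U.S. patents and 8 Chinese patents, and has led 4 National Natural Science Foundation of China projects, 3 sub-projects of the National Key R&D Program, and several collaborative R&D projects with Huawei and other companies. Academic service includes associate editor (AE) positions at international SCI journals such as IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems. Awards include the Second Prize of the Ministry of Education Natural Science Award for Higher Education Institutions in 2021 and 2014.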

Abstract:

CVaR (Conditional Value at Risk) is an important risk measure in financial engineering. Traditional studies on optimizing CVaR metrics usually address single-stage problems. In multi-stage settings, the CVaR risk function is not additive across stages, so it does not fit the standard MDP (Markov decision process) model and the principle of dynamic programming fails. In this talk, we study MDP optimization under the long-run CVaR criterion using a tool called sensitivity-based optimization. By introducing a pseudo CVaR metric, we convert the original problem into a bilevel MDP problem: the inner level is a standard MDP that optimizes the pseudo CVaR, and the outer level is an optimization problem over a single auxiliary variable. We derive a CVaR difference formula that quantifies the difference between the long-run CVaR values under any two randomized policies. With this difference formula, we prove the optimality of deterministic policies. We also obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies and only a necessary condition for globally optimal policies. We further develop a policy-iteration-type algorithm to optimize CVaR efficiently, and we prove that it converges to a local optimum in the mixed policy space. Finally, we present a numerical experiment on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
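For readers new to CVaR, the following minimal Python sketch may help make the "single auxiliary variable" idea concrete. It is not the speaker's algorithm, and the announcement does not define the pseudo CVaR metric. The sketch assumes the standard Rockafellar-Uryasev representation, CVaR_alpha(X) = min over y of { y + E[(X - y)^+] / (1 - alpha) }, applied to a finite sample of equally likely costs. The function names (`cvar_tail_average`, `pseudo_cvar`, `cvar_by_minimization`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def cvar_tail_average(costs, alpha):
    """CVaR as the mean of the worst (1 - alpha) fraction of equally likely costs.

    Assumes (1 - alpha) * len(costs) is a positive integer.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    n = len(costs)
    k = int(round((1 - alpha) * n))  # number of samples in the upper tail
    return costs[n - k:].mean()


def pseudo_cvar(costs, alpha, y):
    """Rockafellar-Uryasev objective for a fixed auxiliary variable y."""
    costs = np.asarray(costs, dtype=float)
    return y + np.maximum(costs - y, 0.0).mean() / (1 - alpha)


def cvar_by_minimization(costs, alpha):
    """Minimize the objective over y.

    The objective is convex and piecewise linear in y with kinks at the
    sample points, so it is enough to try each sample value as y.
    """
    return min(pseudo_cvar(costs, alpha, y) for y in costs)


costs = list(range(1, 11))  # illustrative costs 1, 2, ..., 10
print(cvar_tail_average(costs, 0.8))
print(cvar_by_minimization(costs, 0.8))
```

For the costs 1 through 10 and alpha = 0.8, both functions return 9.5 (the mean of the two largest costs) up to floating-point rounding. For a fixed y the objective is an ordinary expectation, which is what makes an inner problem of this kind compatible with standard MDP tools, while the outer problem searches over the scalar y alone.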