Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk (Event)

Name: Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk
Start: 2026-05-29

PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM

arXiv:2605.29788v1 (Announce Type: new)

Abstract: Critical sequential decisions are rarely single-timescale: a strategic decision causally shapes the context in which every subsequent tactical choice is made; standard bandit and reinforcement-learning theory does not capture this causal coupling between timescales. We formalise the problem class as Nested Contextual Causal Bandits (NCCBs), a hierarchical SCM where each lev

Artificial Intelligence
