Can you provide the paper corresponding to the CPPO implemented in your project?

Does the mathematical proof in the paper "Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk" support the CPPO code in this project? I cannot understand the variables cvarlam and nu. What is the relationship between cvarlam and CVaR? And why can nu be used as the threshold for a bad trajectory? I think nu is a cumulative reward over all the steps, whereas ep_ret + v - r is the reward of a single step. Are nu and ep_ret + v - r comparable?
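
To make the question concrete, here is a minimal sketch of the textbook lower-tail VaR/CVaR of a sample of whole-trajectory returns, together with the Rockafellar-Uryasev form `nu - E[(nu - X)^+] / alpha`, in which a threshold variable named nu appears. The function names, the sample data, and the convention that `alpha` is the tail fraction are assumptions made for illustration; they are not taken from the project, and the paper and the code may use different sign or alpha conventions (for example costs instead of returns). It says nothing about how cvarlam is used in the project.

```python
import numpy as np

def empirical_var_cvar(returns, alpha):
    """Lower-tail VaR and CVaR of a sample of whole-trajectory returns.

    alpha is the tail fraction (assumed convention), e.g. 0.2 = worst 20%.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))  # number of trajectories in the tail
    var = x[k - 1]       # threshold: largest return still inside the tail
    cvar = x[:k].mean()  # average return over the tail
    return var, cvar

def ru_objective(returns, nu, alpha):
    """Rockafellar-Uryasev form: nu - E[(nu - X)^+] / alpha."""
    x = np.asarray(returns, dtype=float)
    return nu - np.maximum(nu - x, 0.0).mean() / alpha

# Hypothetical sample: ten trajectory returns 1, 2, ..., 10.
sample = range(1, 11)
var, cvar = empirical_var_cvar(sample, alpha=0.2)
print(var, cvar)                               # 2.0 1.5
print(ru_objective(sample, nu=var, alpha=0.2)) # equals cvar when nu is the VaR
```

In this sketch nu is compared against returns of whole trajectories, so any quantity tested against nu would need to be on the same scale, which is exactly what the question above asks about ep_ret + v - r.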
