Can you provide the paper corresponding to the CPPO implemented in your project? #2

Description

@BigCakeLove

Does the mathematical proof in the paper "Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk" support the CPPO code in this project? I cannot understand the variables cvarlam and nu. What is the relationship between cvarlam and CVaR? And why can nu be used as the threshold for a bad trajectory? I think nu is a cumulative reward that includes all the steps, whereas ep_ret + v - r is the reward in one step. Are nu and ep_ret + v - r comparable?
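To make the question concrete, here is a minimal sketch of the standard lower-tail CVaR over whole-episode returns, together with the Rockafellar-Uryasev form in which a threshold nu appears. This is not the project's code: `lower_tail_cvar`, `ru_objective`, and the sample returns are hypothetical names and data chosen for illustration, and the sketch assumes nu plays the role of the VaR threshold on full-episode returns.

```python
import numpy as np


def lower_tail_cvar(returns, alpha):
    """Empirical VaR threshold and CVaR of the worst alpha-fraction of returns."""
    ordered = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(ordered))))
    var = ordered[k - 1]        # threshold: returns at or below it count as "bad"
    cvar = ordered[:k].mean()   # mean return over the bad episodes
    return var, cvar


def ru_objective(returns, nu, alpha):
    """Rockafellar-Uryasev form for the lower tail: nu - E[(nu - R)^+] / alpha."""
    r = np.asarray(returns, dtype=float)
    return nu - np.maximum(nu - r, 0.0).mean() / alpha


# Hypothetical whole-episode returns (one number per finished episode).
returns = [5.0, -3.0, 8.0, 1.0, 7.0, -1.0, 4.0, 6.0]
alpha = 0.25

var, cvar = lower_tail_cvar(returns, alpha)
ru_at_var = ru_objective(returns, var, alpha)
print(var, cvar, ru_at_var)
```

In this sketch the objective is maximised when nu equals the VaR threshold, and its value there is the CVaR. Here nu is compared against the return of a complete episode, which is why I am asking whether a quantity computed inside the episode can be compared with it.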
