We propose a prototypical reward network that enables data-efficient reinforcement learning from human feedback (RLHF) for large language models.
Jan 1, 2024
We propose a semi-offline reinforcement learning approach for optimizing text generation in language models, balancing exploration and exploitation effectively.
May 1, 2023
A novel subgraph reasoning paradigm for fake news detection that provides explainability while improving generalization through reinforcement learning and hierarchical graph attention networks.
Jun 1, 2022