Article

Nothing Comes Without Its World – Practical Challenges of Aligning LLMs to Situated Human Values through RLHF

Anne Arzberger (Delft University of Technology), Stefan Buijsman (Delft University of Technology), Maria Luce Lupetti (Polytechnic University of Turin), Alessandro Bozzon (Delft University of Technology), Jie Yang (Delft University of Technology)

Abstract

Work on value alignment aims to ensure that human values are respected by AI systems. However, existing approaches tend to rely on universal framings of human values that obscure the question of which values the systems should capture and align with, given the variety of operational situations. This often results in AI systems that privilege only a select few while perpetuating problematic norms grounded in biases, ultimately causing equity and justice issues. In this perspective paper, we unpack the limitations of predominant alignment practices of reinforcement learning from human feedback (RLHF) for LLMs through the lens of situated values. We build on feminist epistemology to argue that, at design time, RLHF has problems with representation among the subjects providing feedback and with implicitness in the conceptualization of the values and situations of real-world users, while at use time it lacks system adaptation to real user situations. To address these shortcomings, we propose three research directions: 1) situated annotation to capture information about crowdworkers’ and users’ values and judgments in relation to specific situations at both design time and use time, 2) expressive instruction to encode plural values for instructing LLM systems at design time, and 3) reflexive adaptation to leverage situational knowledge for system adaptation at use time. We conclude by reflecting on the practical challenges of pursuing these research directions and situated value alignment of AI more broadly.
