The State of Bias and Fairness in LLMs

Rapid advances in large language models (LLMs) have accelerated their integration into sociotechnical systems, where they connect with humans through modalities such as natural language and vision. Humans build these systems to solve complex problems, improve quality of life, and make the world more connected, intelligent, and responsive. Despite these successes, these models can perpetuate and amplify harm, introduce safety concerns, and reinforce social bias. This section surveys this research area and today's mitigation frameworks, and expands the notion of social bias and fairness in LLMs toward building Helpful, Harmless, and Honest AI systems.

Bias Mitigation

Bias mitigation spans several distinct areas. Mitigation can occur during three key stages of a model's development lifecycle: pre-processing, in-processing, and post-processing. Today's mitigation strategies are promising, but they only steer models toward reduced bias rather than eliminating it entirely. The intricacies of bias, fairness definitions, and model architectures mean that complete elimination of bias is not yet achievable.

Pre-Processing

This stage addresses bias in the training data before the model's initial training or fine-tuning. The process begins by evaluating the dataset and identifying the biases it contains. Techniques include re-sampling and/or perturbing data points to correct skewed representations of protected groups. Constructing data representations that are less biased with respect to a chosen ground-truth notion of bias is another common pre-processing strategy; a sketch of one such technique follows below.
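To make the re-sampling and perturbation idea concrete, here is a minimal sketch of counterfactual data augmentation, one common pre-processing technique. The SWAP_PAIRS lexicon, the function names, and the toy corpus are illustrative assumptions rather than a prescribed implementation; a production system would use a curated lexicon and handle grammar, names, and context.

```python
# Hypothetical term pairs for counterfactual augmentation; a real
# lexicon would be curated and context-aware.
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "his": "hers", "hers": "his",
    "man": "woman", "woman": "man",
}

def counterfactual(sentence: str) -> str:
    """Swap protected-group terms to produce a counterfactual example."""
    return " ".join(SWAP_PAIRS.get(tok, tok) for tok in sentence.lower().split())

def rebalance(corpus: list[str]) -> list[str]:
    """Augment the corpus so each example and its counterfactual are
    equally represented, reducing skew before training begins."""
    return corpus + [counterfactual(s) for s in corpus]

corpus = ["he is a doctor", "she is a nurse"]
print(rebalance(corpus))
# ['he is a doctor', 'she is a nurse', 'she is a doctor', 'he is a nurse']
```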

In-Processing

In-processing techniques are typically employed by model builders during the model's training phase. One approach adjusts hyperparameters or constrains the model to limit the learning of biased relationships during training. Another incorporates an adversarial model: the auxiliary model tries to predict protected attributes from the learned representation, and a min-max objective pushes the main model toward representations that do not encode those biases. Other in-processing techniques include training compositional models tailored to each protected group, or building an ensemble of models trained on different data subsets.
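As a concrete illustration of the adversarial approach, the following is a minimal PyTorch sketch of adversarial debiasing via gradient reversal, one common way to realize the min-max objective. The layer sizes, module names, and the gradient-reversal formulation are illustrative assumptions, not a canonical implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward
    pass, so minimizing the adversary's loss maximizes it with respect
    to the encoder (the min-max objective)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # shared representation
task_head = nn.Linear(64, 2)    # main task, e.g. classification
adversary = nn.Linear(64, 2)    # tries to predict the protected attribute

def training_loss(x, y_task, y_protected, lambd=1.0):
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y_task)
    # The adversary learns to recover the protected attribute; the
    # reversed gradient trains the encoder to hide it.
    adv_loss = F.cross_entropy(adversary(GradReverse.apply(z, lambd)), y_protected)
    return task_loss + adv_loss
```

Training then proceeds with a single optimizer over all three modules; because the gradient is reversed at the adversary's input, the encoder and adversary pull in opposite directions, which is the min-max dynamic described above.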

Post-Processing

In this phase, mitigation efforts focus on pre-trained models, detecting and measuring bias as it manifests in the model's inputs and outputs. Mitigation may involve adjusting those inputs and outputs to counter bias. Techniques include replacing certain elements with equivalents from non-protected groups, or modifying the model's weights directly.
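As one hedged example of such input/output adjustment, the sketch below averages a frozen model's score over an input and its counterfactual, making the result invariant to the swapped protected-group terms. The SWAPS lexicon, the toy_score stand-in, and all names here are illustrative assumptions, not a library API.

```python
# Hypothetical swap lexicon; a real system would be curated and
# context-aware.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him"}

def swap_terms(text: str) -> str:
    """Replace protected-group terms with their counterparts."""
    return " ".join(SWAPS.get(t, t) for t in text.lower().split())

def debiased_score(score_fn, text: str) -> float:
    """Average a frozen model's score over the original input and its
    counterfactual, so the output cannot depend on the swapped terms."""
    return 0.5 * (score_fn(text) + score_fn(swap_terms(text)))

def toy_score(text: str) -> float:
    # Deliberately biased stand-in for a frozen model's scoring call.
    return 0.9 if "he" in text.split() else 0.4

print(debiased_score(toy_score, "he is assertive"))   # 0.65
print(debiased_score(toy_score, "she is assertive"))  # 0.65
```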