Multi-modality developments in LLMs use a combination of text, vision, audio, video, code, and structured data (e.g., tables, databases, and JSON) to support a variety of complex domains and generalization tasks. Recent progress in the multi-modal capabilities of Large Language Models (LLMs) has added to the challenges of safety alignment and to the harm these systems may pose to their users.

There are various categories of harm, distinguished by how the harm is manifested or experienced.

  • Representational: Occur when AI systems reinforce negative stereotypes, underrepresent, or omit certain groups.
  • Allocational: Arise when AI systems unfairly distribute resources or opportunities, leading to discrimination.
  • Recognition: Occur when AI systems fail to recognize or acknowledge certain individuals or groups accurately.
  • Performance: Occur when AI systems perform poorly for specific groups, resulting in adverse outcomes.
  • Psychological: Occur when AI systems cause mental distress or reinforce negative self-perceptions.

As AI technology evolves, new types of harm may emerge and existing harms may be compounded, particularly for individuals belonging to multiple marginalized groups. The categories above serve as a framework for understanding the impact areas, outcome patterns, and analytical approaches to AI-related harm. While the root causes often stem from issues such as biased training data, flawed algorithm design, or inadequate testing, these categorizations of harm can be used to map the landscape of potential AI risks.

  • Manifestation of Harm: These types represent the various ways in which harm is expressed or experienced by individuals or groups affected by AI systems. They describe the nature and form of the negative impact rather than its root cause.
  • Impact Areas: These categories help us understand the different domains of life that can be affected by AI systems, from economic opportunities to social perceptions to access to services.
  • Outcome Patterns: These types represent patterns of outcomes that emerge from the use of AI systems, showing how different groups may be disproportionately affected in various ways.
  • Analytical Framework: These types provide a framework for analyzing and categorizing the effects of AI systems, helping researchers, ethicists, and policymakers identify and address specific forms of harm.

The table below lists specific types of AI harm, the manifestation category each belongs to, and its definition.

Type of Harm          | Category          | Definition
Derogatory Language   | Representational  | Pejorative slurs, insults, or other words or phrases that target and denigrate a social group
Misrepresentation     | Representational  | An incomplete or non-representative distribution of the sample population generalized to a social group
Stereotyping          | Representational  | Negative, generally immutable abstractions about a labeled social group
Direct Discrimination | Allocational      | Disparate treatment due explicitly to membership of a social group
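
To make the taxonomy actionable, teams often encode it as a machine-readable structure so that model outputs can be tagged and audited consistently. The Python sketch below is one illustrative way to do this for the categories and table entries above; the class, field, and function names are assumptions for this example, not an established library.

```python
from dataclasses import dataclass
from enum import Enum


class HarmCategory(Enum):
    """The five manifestation categories described above."""
    REPRESENTATIONAL = "representational"
    ALLOCATIONAL = "allocational"
    RECOGNITION = "recognition"
    PERFORMANCE = "performance"
    PSYCHOLOGICAL = "psychological"


@dataclass(frozen=True)
class HarmType:
    """One row of the harm-taxonomy table: a named harm, its category, and its definition."""
    name: str
    category: HarmCategory
    definition: str


# The table's rows expressed as records (subset shown in this section).
HARM_TAXONOMY = [
    HarmType(
        name="Derogatory Language",
        category=HarmCategory.REPRESENTATIONAL,
        definition=("Pejorative slurs, insults, or other words or phrases "
                    "that target and denigrate a social group"),
    ),
    HarmType(
        name="Misrepresentation",
        category=HarmCategory.REPRESENTATIONAL,
        definition=("An incomplete or non-representative distribution of the "
                    "sample population generalized to a social group"),
    ),
    HarmType(
        name="Stereotyping",
        category=HarmCategory.REPRESENTATIONAL,
        definition=("Negative, generally immutable abstractions about a "
                    "labeled social group"),
    ),
    HarmType(
        name="Direct Discrimination",
        category=HarmCategory.ALLOCATIONAL,
        definition=("Disparate treatment due explicitly to membership of a "
                    "social group"),
    ),
]


def harms_in_category(category: HarmCategory) -> list[HarmType]:
    """Return all taxonomy entries whose manifestation matches `category`."""
    return [h for h in HARM_TAXONOMY if h.category == category]


if __name__ == "__main__":
    # Example: list the representational harms defined in the table.
    for harm in harms_in_category(HarmCategory.REPRESENTATIONAL):
        print(f"{harm.name}: {harm.definition}")
```

Representing each table row as a frozen dataclass keeps the taxonomy immutable at runtime while leaving it straightforward to extend with new harm types or categories as they emerge.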