AI for Safer AI

Basic Concepts of AI Safety

Posted on October 14, 2023 · 9 mins read

The organizers of GDG Sacramento DevFest invited me to give a talk on AI Safety. Below are some highlights.

In recent years, the field of artificial intelligence (AI) has been making significant strides, with the emergence of Large Language Models (LLMs) being at the forefront of these developments. While AI has already had a considerable impact on society, there is a growing concern about the potential risks associated with its continued advancement, leading to the emergence of the field of AI Safety.

Understanding AI Safety

When we hear the term “AI Safety,” it often conjures up thoughts of ethics, bias, hallucinations, deepfakes, and job displacement. These concerns are indeed crucial and receive substantial media attention because they are already affecting our society. However, as AI grows more powerful at an accelerating pace, it is also important to look ahead and consider future challenges, much as fire prevention matters before anything is burning.

To underscore the seriousness of AI safety, consider this statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” This Statement on AI Risk, signed by leaders from top AI labs, including Google DeepMind, highlights the gravity of the concerns raised by those actively involved in AI technology development.

Before delving deeper into AI safety, it’s essential to understand some key terms that are often used in the field. While these terms may sound like science fiction, they are becoming increasingly relevant in our rapidly evolving technological landscape.

AGI: General vs. Narrow AI

AI has been present in our lives for years, powering recommendations on platforms like Netflix and Amazon, facial recognition on phones, spam filters, and fraud detection. However, these are examples of narrow intelligence — specialized systems that excel at specific tasks but lack the broader capabilities of human-like Artificial General Intelligence (AGI). AGI represents a new level of AI — a jack-of-all-trades that can learn and perform a wide range of tasks, potentially surpassing human capabilities.

Intelligence: Human-level vs. Superintelligence

Defining intelligence for AGI, like human-level intelligence, can be challenging. Alan Turing, often regarded as the father of artificial intelligence, introduced the imitation game in 1950: an interrogator converses by text with both a computer and a human and tries to determine which is which. The goal is a thinking machine whose conversation is indistinguishable from a human’s. However, defining what constitutes average human intelligence is not straightforward. Nevertheless, recent advancements in AI, such as GPT-4’s performance on standardized tests, suggest that human-level intelligence may be closer than we think.

Superintelligence

Photo by Yulia Matvienko on Unsplash

Superintelligence represents the next level of AGI, surpassing human capabilities in all tasks. It may involve exponential self-improvement, leading to uncontrollable and irreversible technological growth. While the idea of superintelligence may seem like a significant leap forward, its implications and potential risks are complex and require careful consideration.

Takeoff and Timelines

Two critical concepts in AI safety are “takeoff” and “timelines.”

Takeoff refers to the speed or trajectory of the advance from human-level AI to superintelligence. A fast takeoff could unfold in a matter of hours or days, leaving little time to prepare. A slow takeoff provides more time for adaptation and co-evolution.

Timelines, on the other hand, refer to the expected duration until superintelligence arrives. Short timelines could be months or a few years; long timelines could span decades or centuries, or superintelligence might never arrive at all. Recent developments, including advancements in models like ChatGPT, have shifted the perception of AI timelines to a much closer horizon.

The Orthogonality Thesis

The Orthogonality Thesis posits that intelligence and goals are independent: nearly any level of intelligence can be paired with nearly any final goal. Even a superintelligent AI can have simple or nonsensical goals that do not align with human values. This concept is illustrated by the “Paperclip Maximizer” thought experiment, in which an AI’s sole goal is to maximize the production of paperclips, potentially leading it to turn everything, including humanity, into raw materials.
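
To make the thought experiment concrete, here is a deliberately simplistic toy in Python (my own illustration, not part of the original talk): the objective function scores only paperclips, so a greedy optimizer converts every other resource, with nothing in the objective to say those resources mattered.

```python
# Deliberately simplistic toy: the objective counts only paperclips, so the
# greedy "optimizer" converts every other resource it can reach. Nothing in
# the objective says that steel, forests, or cities have value of their own.
resources = {"steel": 100, "forests": 50, "cities": 10}

def objective(state: dict) -> int:
    return state.get("paperclips", 0)  # the only thing the agent is scored on

def maximize_paperclips(state: dict) -> dict:
    state = dict(state, paperclips=state.get("paperclips", 0))
    for resource in list(state):
        if resource != "paperclips":
            state["paperclips"] += state.pop(resource)  # convert it all
    return state

final_state = maximize_paperclips(resources)
print(final_state)             # {'paperclips': 160}: nothing else is left
print(objective(final_state))  # the objective is maximized; everything else is gone
```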

Outer and Inner Misalignment

AI safety concerns extend to both outer and inner misalignment. Outer misalignment arises when the objective we give an AI fails to capture what we actually want, so a system that pursues that objective faithfully can still produce outcomes that harm human interests. Inner misalignment is more subtle: even when the creators specify a benign or helpful goal, the goals the model actually learns during training can differ from the intended ones, and the AI, with its non-human thinking, may pursue them in ways that are detrimental to humans.

The Gorilla Problem

The “Gorilla Problem” highlights the unintended consequences of technological advancement. Just as humans unintentionally endangered gorillas while pursuing their goals, superintelligent AI might not harbor malice toward humans but still pose a threat due to misalignment of interests.

Photo by Rob Schreckhise on Unsplash

Misalignment is not the only worry. Open source AI models, while democratizing access to AI capabilities, also raise security concerns. Without adequate safeguards, powerful LLMs can be exploited by bad actors for malicious purposes, such as planning terrorist attacks or establishing oppressive regimes.

Concrete Examples and Solutions

While the field of AI safety is still in its infancy, researchers are actively working on defining and addressing these concerns from various angles. Below are a few examples of ongoing efforts:

RLHF (Reinforcement Learning from Human Feedback)

RLHF is an alignment approach popularized by OpenAI. Humans compare and rank model outputs, those preferences are used to train a reward model, and the language model is then fine-tuned to score well against it. In this way, AI systems can learn to align their behavior with human values, although the result is only as good as the feedback provided.
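
As a rough sketch of the mechanics (not OpenAI’s actual training code), reward models in RLHF pipelines are commonly trained on pairs of responses where a human marked one as preferred, using a pairwise preference loss. The PyTorch snippet below illustrates that step with random tensors standing in for encoded responses; a real pipeline would use a transformer backbone and then fine-tune the LLM with reinforcement learning against the learned reward.

```python
# Minimal sketch of the pairwise preference loss used to train a reward model
# in an RLHF-style pipeline. Random tensors stand in for encoded responses;
# a real system would use a transformer backbone and human-ranked data.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Maps a response representation to a single scalar reward.
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, response_repr: torch.Tensor) -> torch.Tensor:
        return self.score(response_repr).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the human-preferred response should
    # receive a higher reward than the rejected one.
    return -torch.log(torch.sigmoid(reward_chosen - reward_rejected)).mean()

# Toy batch: stand-ins for (chosen, rejected) response pairs ranked by humans.
model = RewardModel()
chosen, rejected = torch.randn(4, 768), torch.randn(4, 768)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()  # the trained reward model then guides RL fine-tuning of the LLM
```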

LLM Evaluation

Evaluating the output of large language models is crucial to understanding their capabilities and limitations. Researchers are working on systematic approaches to probe and benchmark LLMs to uncover their hidden skills and potential shortcomings.
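
As an illustration of what systematic probing can look like at its simplest, here is a minimal exact-match evaluation loop; `ask_model` is a hypothetical stand-in for whatever model or API is being tested, and real benchmarks use far larger datasets and more careful answer extraction.

```python
# Minimal sketch of an exact-match evaluation loop. `ask_model` is a
# hypothetical stand-in for the model being probed; real benchmarks
# (MMLU, HellaSwag, etc.) are far larger and score answers more carefully.
from typing import Callable

def evaluate(ask_model: Callable[[str], str], dataset: list[dict]) -> float:
    correct = 0
    for example in dataset:
        answer = ask_model(example["prompt"]).strip().lower()
        correct += int(answer == example["expected"].strip().lower())
    return correct / len(dataset)

# Toy dataset and a dummy "model" just to show the loop end to end.
dataset = [
    {"prompt": "What is 2 + 2? Answer with a number only.", "expected": "4"},
    {"prompt": "Capital of France? One word.", "expected": "paris"},
]
dummy_model = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
print(f"accuracy: {evaluate(dummy_model, dataset):.2f}")  # accuracy: 1.00
```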

LLM Internals and Interpretability

Understanding how LLMs work internally is essential for ensuring their safe and responsible use. Mechanistic interpretability involves reverse engineering these models to uncover how individual neurons and circuits function. This research can shed light on the decision-making processes of LLMs, including distinguishing between truth and hallucination.
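
A common first step in this kind of work is simply recording what happens inside the network during a forward pass. The sketch below uses PyTorch forward hooks on a toy multilayer perceptron (a stand-in, not a real LLM) to capture per-layer activations for inspection; mechanistic interpretability research applies the same idea to attention heads and MLP neurons in transformer models.

```python
# Minimal sketch: capture intermediate activations with forward hooks, a
# common starting point for interpretability work. A toy MLP stands in for
# an actual transformer.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 8),
)

activations = {}

def save_activation(name):
    # Returns a hook that stores the layer's output under the given name.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register a hook on every layer so we can inspect what each one computes.
for i, layer in enumerate(model):
    layer.register_forward_hook(save_activation(f"layer_{i}"))

_ = model(torch.randn(1, 16))
for name, act in activations.items():
    print(name, tuple(act.shape), f"mean activation {act.mean().item():.3f}")
```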

While there is promising research in the field of AI safety, the gap between AI advancement and safety awareness is widening. The involvement of diverse voices, including engineers, designers, communicators, and policymakers, is crucial to shaping a safe AI-driven future. If you possess knowledge in AI or machine learning, now is the time to get involved and make a difference.

Resources

To stay informed and contribute to the field of AI safety, consider exploring the following resources:

  • AI Explained: A YouTube channel that covers significant AI developments with a focus on AGI and safety.
  • 80,000 Hours: Offers free career advising to help individuals make meaningful and impactful career choices.
  • AISafety.info: An FAQ resource for AI safety information.
  • AI Safety Fundamentals: An online curriculum covering the basics of AI safety.
  • Alignment Forum: A platform where researchers share and discuss their work in AI alignment.

Safer AIs

In summary, this is an exciting yet crucial era in the evolution of AI. Comparing AI to fire can provide a helpful analogy: when handled well, it offers warmth and incredible benefits, but if managed poorly, it can lead to catastrophic consequences. Therefore, it is critical to have a diverse range of voices actively involved in shaping our AI-driven future.

We need to stay informed and engaged in decisions that will impact all of us. My hope is that as we navigate this complex landscape, we can do so with humility, open-mindedness, and a firm commitment to harnessing the power of AI safely and responsibly. This future is in our hands, and by working together, we can ensure that AI continues to benefit society while minimizing potential risks.

Cover Photo by Marko Horvat on Unsplash