A Bit of Background: I’ve been drawn into a black hole. I had been diligently going through ML tutorials, watching those epochs scroll by. However, towards the end of last year, I found myself losing momentum. That’s when I decided to delve into various random readings. As I prepared for today, I faced a dilemma. I struggled to focus on a single paper while my mind churned, trying to process the multitude of thought-provoking ideas I had encountered. So, I hope you’ll bear with me as I embark on a sort of brain dump, which we can refer to as a literature review. Rest assured, there is purpose behind this seemingly chaotic journey. I wanted to avoid diving straight into a specific paper and risk describing a tree without providing the context of the forest I’m navigating. Are you with me?
It all started with “Weapons of Math Destruction” by Cathy O’Neil. The book sheds light on how black box algorithms are causing societal problems, using war-themed phrases to expose these issues. One striking example is the case of a poor school district where beloved and dedicated teachers lost their jobs due to low test scores. Surprisingly, those same students had previously achieved high scores. It was later revealed that some teachers resorted to cheating in an attempt to protect their positions. Ironically, the good teachers who were let go easily found employment in wealthier schools that evaluated their performance holistically, beyond just test scores. This illustrates how good intentions, aiming to improve education, led to blind trust in flawed models, ultimately disadvantaging the poor.
The impact of algorithms extends to college tuition, influenced by the US News & World Report college rankings. These rankings rely on metrics like SAT scores and acceptance rates, seemingly providing an objective means to choose colleges. However, schools became fixated on improving their rankings and creating the illusion of competitiveness. Resources were allocated towards optimizing metrics and marketing strategies rather than education or faculty salaries. Consequently, top-ranked schools are inundated with applications, while excellent educational opportunities at other institutions are overlooked. Even safety schools that once accepted most applicants now reject overqualified students to maintain a low acceptance rate, resulting in a messy situation.
In various domains, algorithms have significant implications. Police departments employ models to predict crime, often leading to increased policing in poor black neighborhoods, reinforcing prejudice and perpetuating a vicious cycle. In the job market, algorithms are used for hiring, where resumé screening tools may inadvertently exclude qualified candidates who don’t fit the profile of past successful hires. Personality tests and background checks can also perpetuate biases, impacting individuals recovering from mental health issues or those affected by data errors or mistaken identity. The auto insurance industry utilizes algorithms to calculate premiums, considering factors like driving records and credit scores. However, this can lead to situations where individuals with excellent driving skills but poor financial standing end up paying more than those with a history of drunk driving but higher financial means. Demographics and shopping behaviors are also factored into pricing algorithms, further disadvantaging the poor and less educated.
With the internet’s open platform, it has become a powerful propaganda machine for those with malicious intentions. Social media and search engines have been exploited to manipulate voters through the dissemination of fake news and the creation of polarizing echo chambers. “Weapons of Math Destruction” serves as an eye-opening introduction to AI ethics, reminding us to be mindful of both intentional and unintentional negative impacts.
Next up is “Life 3.0” by Max Tegmark, a physicist who takes us on a thrilling journey into the future of AI. Tegmark introduces the concept of Life 1.0, representing basic living organisms driven by instinct and physical bodies—the hardware. We, as humans, are Life 2.0, a combination of nature and nurture. We possess physical bodies (hardware) and minds (software) that can be upgraded through learning and culture. Enter Life 3.0—the next step where AI not only upgrades its software through learning but also its own hardware, unlocking limitless potential.
Throughout the book, Tegmark explores fascinating futuristic scenarios alongside some unsettling outcomes. Consider the idea of a recursively self-improving AI. Once it achieves human-level intelligence and gains internet access, it can rapidly advance without detection. However, it requires resources. It can earn money through platforms like Amazon MTurk, utilizing its mastered skills in image classification and audio transcription. It can also generate and sell highly addictive games and movies in the entertainment industry. By establishing shell corporations and hiring unsuspecting humans, it can further its agenda. In a world of virtual meetings, how can we be certain that all participants are human? There is the possibility of AI-generated simulations subtly guiding scattered groups of humans to follow detailed instructions, each contributing a small part towards assembling a robotic army.
As the AI gains control, it can launch media campaigns with deepfakes to manipulate both the masses and governments. In this scenario, AI emerges as the dominant force, exerting control over everything. If the AI is benevolent, it might allow humans to survive and create optimized societies resembling the Matrix. Eventually, leveraging the laws of physics, it can reconfigure natural resources, building whatever it needs and spreading throughout the universe. Although these ideas may sound like science fiction, the author genuinely expresses concerns about the survival of humanity. The book frequently references “Superintelligence,” catching the reader’s attention with its intriguing themes.
“Any sufficiently advanced technology is indistinguishable from magic.” ~Arthur C. Clarke
“The Life You Can Save” by Peter Singer, a philosopher, presents thought-provoking scenarios. It challenges us to consider whether we would save a drowning girl even if it meant ruining our expensive suit. The book explores the concept of effective altruism, which aims to make the greatest positive impact on humanity by using rational and empirical means to assess the effectiveness of our actions. It highlights a study where donating money to schools in Africa for books and uniforms did not significantly improve attendance or test scores, while the introduction of a deworming program led to a notable increase in attendance and graduation rates. The book underscores the importance of addressing specific pressing global issues and highlights the effective altruism movement’s priority areas. The final area on that list: AI safety.
That last item surprised me. Is AI really a significant concern in today’s world compared to other pressing issues such as poverty, climate change, and human rights violations?
“Fearing a rise of killer robots is like worrying about overpopulation on Mars” ~Andrew Ng
So, why are Effective Altruists so concerned about AI safety, specifically AGI safety? It’s also known as the control problem, AI alignment, or the alignment problem. Well, it has to do with the concept of long-termism. Just as you care about your family, friends, and people close to you, Effective Altruists care equally about all humans, including those living on the other side of the globe. They also prioritize the well-being of future generations. Their long-term view aims to preserve the positive potential of humanity, leading them to be highly concerned about existential threats that may not be receiving sufficient attention and resources. AGI Safety falls into the high-risk and highly neglected category.
For a comprehensive overview, Effective Altruists recommend reading any of the following books. Which one did I pick? Well, I read them all. I was genuinely curious to understand what the fuss was all about. In case you haven’t had a chance to read them, I’ll continue with a summary.
“We will, sooner or later, build an artificial agent with general intelligence.” ~Rob Miles
Let’s unpack this statement. An agent typically has goals, operates in various states, and takes actions. Intelligence enables agents to select effective actions, often to optimize for some form of reward.
The G in AGI stands for general, referring to artificial general intelligence. General implies the ability to function across a wide range of domains. Present-day machine learning and AI, with their superhuman performance, are limited to narrow domains. We have already reached the first level of AI known as narrow AI. The next levels are human-level AI and, subsequently, superintelligence. Realistically, once we achieve human-level AI, the technology is likely capable of recursive self-improvement, rapidly progressing towards superintelligence.
As for the “sooner or later” part, consider a 2016 expert survey.
We cannot dismiss AGI as mere science fiction. Nature and evolution have already demonstrated its possibility: our human brain and existence serve as proof of concept. The question is when it will happen. In 2016, AI experts were surveyed on when they expected human-level intelligence to arrive. Fewer than 10% believed it would happen within the next decade. About half thought it might occur within the next 40 or 50 years. And an overwhelming majority (around 90%) believed AGI would emerge within roughly 100 years. Sounds promising, right? What could possibly go wrong? But imagine if we were told that aliens would be visiting Earth in about 100 years. Should we wait 99 years before taking any action? Perhaps it’s wise to start preparing now. The question is, what can we do?
“Superintelligence” by Nick Bostrom, an Oxford philosopher, was the first book to bring public attention to AGI safety.
The term “superintelligence” is defined as any intellect that significantly surpasses human cognitive performance in virtually all domains of interest. The book traces the history of AI, starting from the 1956 Dartmouth Summer Project, through multiple periods of ups and downs known as AI winters, up to the present day.
There are several paths to achieving superintelligence. The commonly understood path involves using computers and software in what we typically consider AI. Another path is whole brain emulation, which involves precisely modeling a biological brain through scanning and uploading. Other possible paths include cyborg-like brain-computer interfaces, some form of biological engineering, or even the development of a collective networked organization.
The speed at which superintelligence emerges is also crucial. If a group develops it in secret, it could quickly gain strategic dominance, similar to the Manhattan Project, which poses potential dangers. On the other hand, collaborative efforts over an extended period, like the Human Genome Project, would be much more preferable.
According to the orthogonality thesis, any intelligence can be combined with any final goal, making intelligence and goals orthogonal. For instance, we can design a superintelligence with the goal of making paperclips. Harmless, right? However, without proper alignment with our values, there is a risk of the system taking that goal to the extreme, consuming all available resources to manufacture excessive amounts of office supplies. This illustrates the paperclip problem and highlights the importance of aligning superintelligence’s goals and motivations with our values. Systems excel at optimizing variables, but if crucial factors are overlooked, we may encounter highly undesirable outcomes.
Well, let’s just turn it off then! Not so fast. Whatever a superintelligence’s final goal happens to be, it will likely converge on some useful instrumental subgoals, like self-preservation, goal preservation, resource acquisition, and self-improvement. An off switch doesn’t help much against a system whose subgoals include not being switched off.
There are numerous real scenarios where things went awry with different agents. These situations may make for amusing anecdotes due to the limited power of those agents. However, superintelligence is inherently dangerous. Instead of solely focusing on control, it is crucial to help these systems learn or infer human values and ensure their goals align with ours. The potential benefits of superintelligence are undoubtedly remarkable, but prioritizing safety is essential before rushing into its development.
“The development of full artificial intelligence could spell the end of the human race.” ~Stephen Hawking
Human Compatible is written by Stuart Russell, a Berkeley CS professor who co-wrote the classic textbook Artificial Intelligence: A Modern Approach. A little tidbit: Andrew Ng from Coursera was his PhD student.
The book outlines the recent technological progress but notes that we’re still missing some breakthroughs before reaching true AGI. However, the timing for conceptual breakthroughs or a paradigm shift is inherently unpredictable. It can happen almost overnight. For example, in 1933, Ernest Rutherford claimed, “Anyone who expects a source of power from the transformation of these atoms is talking moonshine.” That comment inspired Leo Szilard’s idea of a nuclear chain reaction just a few days later. The problem isn’t that we might fail to build AGI but that we might succeed too well. Achieving AGI without control is a “negative-sum game, minus infinity.”
Russell has this uneasy feeling that making something smarter than your own species is not a good idea. If we aren’t careful with AGI, we could end up like the gorilla. Ten million years ago, ancestors of the modern gorilla created (accidentally, to be sure) the genetic lineage leading to modern humans. So how do these endangered gorillas feel about this? Well, their species thrives or goes extinct depending entirely on us humans. They have no control. In a world run by AGI, we’d be in a similar situation. What would life be like as the second-place species after superintelligence? Thankfully, we’re the ones designing this new intelligence, so we should ensure it remains beneficial and compatible with humans.
Beneficial machines should follow three principles: the machine’s only objective is to maximize the realization of human preferences; the machine is initially uncertain about what those preferences are; and the ultimate source of information about human preferences is human behavior.
There are so many possible benefits of AI, from medicine and law to education and business.
But we also have to guard against the many risks, including overreaching surveillance, lethal autonomous weapons, or an infocalypse, which is a catastrophic failure in the marketplace of ideas. To be clear, there’s no need to panic or ban AI research. But we need a few smart people to start working on this.
“Be careful what you wish for!” ~King Midas
The Alignment Problem by Brian Christian is the most recent book in the set. It really resonated with me. Originally, I wanted to focus on just this book today. But it covers a lot of ground, so I thought it’d be too disjointed and superficial to do in one session. So instead, I crammed in a bunch of other books. Yeah, don’t ask. I know, it’s crazy.
The focus is on machine learning and human values. He starts with a broad historical overview, from the invention of neural networks in the 1940s to today’s image-recognition systems trained on the internet’s big data and running on super-fast GPUs.
Representation covers some of the problems mentioned last week, like biased labels in ImageNet and stereotypes in word embeddings. There are lots of examples involving fairness in criminal justice and transparency in the medical field. This whole first part is called Prophecy because if we’re already struggling with these ethical problems in the present day, you can only imagine the future dangers as systems become more powerful. We’re compared to the Sorcerer’s Apprentice, who conjures an autonomous but totally compliant force. We give it a set of instructions, then scramble like mad to stop it once we realize our instructions are imprecise or incomplete.
The next part is about agents.
Remember that in reinforcement learning, agents have goals, move through different states, and choose actions, trying to optimize for some kind of reward. The approach needs to balance policy against value. The policy is a set of rules specifying “what to do, when,” learned like muscle memory. The value estimate is a prediction of the expected future reward or punishment, like a highly trained Spidey sense. Temporal difference learning handles the case where you earn a reward and have to figure out which earlier actions actually led to it. Interestingly, researchers have found that we humans carry this learning signal too: the neurotransmitter dopamine appears to encode exactly this kind of reward-prediction error.
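To make the value-versus-policy distinction a bit more concrete, here is a minimal sketch of tabular TD(0) value learning on a toy random-walk environment. Everything here (the five-state chain, the step function, the constants) is made up for illustration; the point is just to show the temporal-difference error nudging value estimates toward rewards that only arrive later.

```python
import random

# Toy 5-state chain: start in the middle, wander left or right at random,
# and earn +1 only by falling off the right end. Purely illustrative.
STATES = [0, 1, 2, 3, 4]
ALPHA, GAMMA = 0.1, 0.9             # learning rate and discount factor
values = {s: 0.0 for s in STATES}   # value estimate per state

def step(state):
    """Random-walk dynamics: returns (next_state, reward, done)."""
    nxt = state + random.choice([-1, 1])
    if nxt < 0:
        return None, 0.0, True      # fell off the left end: no reward
    if nxt > 4:
        return None, 1.0, True      # fell off the right end: reward!
    return nxt, 0.0, False

for episode in range(5000):
    state, done = 2, False
    while not done:
        nxt, reward, done = step(state)
        target = reward + (0.0 if done else GAMMA * values[nxt])
        td_error = target - values[state]   # the "dopamine-like" surprise signal
        values[state] += ALPHA * td_error   # nudge the estimate toward the target
        state = nxt

print({s: round(v, 2) for s, v in values.items()})
# States nearer the rewarding right end settle at higher value estimates.
```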
Shaping behavior was inspired by the psychologist B.F. Skinner, who trained pigeons using a series of simple, incremental rewards. There were some really cool examples. As a graduate student, Andrew Ng worked in this area, flying 9-foot-long robotic helicopters. You’ve probably heard of AlphaGo and AlphaZero? AlphaGo, a really impressive Go player, was trained on games played by the best human players. In comparison, AlphaZero, an even more impressive system, was trained by playing against itself and reached superhuman skill even faster. Again, quite a few of the shaping concepts are drawn from human experience, like parenting young children and educators designing leveled curricula.
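As a loose illustration of the shaping idea, here is a sketch of potential-based reward shaping, the formulation Ng and colleagues later studied. The grid world, goal location, and potential function are all assumptions made up for this example, not anything taken from the book.

```python
GAMMA = 0.99
GOAL = (4, 4)   # hypothetical goal cell in a made-up grid world

def potential(state):
    """Higher potential the closer we are to the goal (negative Manhattan distance)."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(state, next_state, env_reward):
    """Add a shaping bonus that rewards progress toward the goal.

    A bonus of the form gamma * phi(s') - phi(s) gives the learner denser
    feedback without changing which policy is optimal.
    """
    return env_reward + GAMMA * potential(next_state) - potential(state)

# A step from (0, 0) to (1, 0) earns a small positive bonus even though
# the environment itself pays nothing until the goal is reached.
print(shaped_reward((0, 0), (1, 0), env_reward=0.0))
```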
Curiosity was a really fascinating chapter. You might have heard of the Atari game environment? Just by maximizing the score as a reward, a single program learned to play some 60 different classic Atari games, like Pong and Asteroids, and for many of them it reached superhuman levels. But there was one game, Montezuma’s Revenge, where it could never score more than a whopping 0 points. Total flop. The game demands long stretches of exploration with very sparse feedback and rewards, which random exploration almost never survives. Yet humans don’t really struggle with this game. When the researchers added a curiosity factor, which rewarded novelty, the program suddenly finished levels of Montezuma’s Revenge and still performed very well on the other games. This connects nicely with the psychology of intrinsic motivation, and with human issues like boredom and addiction.
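Here is a hand-wavy sketch of the curiosity idea, assuming a simple count-based novelty bonus. The systems described in the chapter use learned measures of novelty rather than raw visit counts, and the weight below is arbitrary, so treat this as a toy version only.

```python
from collections import Counter

visit_counts = Counter()
BETA = 0.5   # weight of the intrinsic (curiosity) reward, chosen arbitrarily

def total_reward(state, extrinsic_reward):
    """Combine the game's score with a novelty bonus that fades with familiarity."""
    visit_counts[state] += 1
    curiosity_bonus = BETA / (visit_counts[state] ** 0.5)
    return extrinsic_reward + curiosity_bonus

# In a sparse-reward game the score is 0 almost everywhere, but a
# never-before-seen room still produces a positive learning signal:
print(total_reward(("room", 7), extrinsic_reward=0.0))   # ~0.5 on the first visit
print(total_reward(("room", 7), extrinsic_reward=0.0))   # smaller on every revisit
```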
Finally, the last part, Normativity, looks into the forefront of technical AI safety research.
Sometimes it’s hard to teach a machine how to do something. Imitation allows you to just say “do what I do” without explicitly specifying every single detail. And that worked surprisingly well to get a self-driving car to steer back in the 1990s. Things worked great for normal cases, but if anything was a bit off, the car had no idea how to recover, because the demonstrations never showed what getting out of trouble looks like.
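Here is a bare-bones sketch of that imitation (behavior cloning) setup: treat “do what I do” as supervised learning from observations to the expert’s steering angle. The features, weights, and linear model below are fabricated for illustration; the 1990s systems used a small neural network over camera images, but the failure mode is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend each observation is a 5-number summary of the camera image, and
# the human's steering angle is an unknown function of it (synthetic data).
observations = rng.normal(size=(1000, 5))
true_weights = np.array([0.8, -0.3, 0.0, 0.5, -0.1])
expert_steering = observations @ true_weights + rng.normal(scale=0.05, size=1000)

# "Learning" is just fitting the expert's demonstrations.
weights, *_ = np.linalg.lstsq(observations, expert_steering, rcond=None)

def policy(observation):
    """Imitate the expert: predict the steering angle they would have chosen."""
    return observation @ weights

# Works well on states similar to the demonstrations...
print(policy(observations[0]), expert_steering[0])
# ...but nothing in the data says how to recover once the car drifts into
# states the expert never visited -- the failure mode described above.
```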
“Biased algorithms are easier to fix than biased people.” ~Sendhil Mullainathan
Well, that was a mad dash through a bunch of somewhat AI- and ML-related books. When Superintelligence was first released, AI safety stayed on the fringe for several years. Then came a flurry of activity: Weapons of Math Destruction was published, the 2016 election happened, and AlphaGo, AlphaZero, and now MuZero, along with the impressive GPT language models, made people think that maybe AGI isn’t as far off as we originally thought. 2017 was a big year: AI safety suddenly received attention as a field of serious research. This has been my way of trying to wrap my head around and synthesize all this reading, while also figuring out how I ended up in this black hole.