A few years ago, a computer scientist named Yejin Choi gave a presentation at an artificial-intelligence conference in New Orleans. On a screen, she projected a frame from a newscast where two anchors appeared before the headline “CHEESEBURGER STABBING.” Choi explained that human beings find it easy to discern the outlines of the story from those two words alone. Had someone stabbed a cheeseburger? Probably not. Had a cheeseburger been used to stab a person? Also unlikely. Had a cheeseburger stabbed a cheeseburger? Impossible. The only plausible scenario was that someone had stabbed someone else over a cheeseburger. Computers, Choi said, are puzzled by this kind of problem. They lack the common sense to dismiss the possibility of food-on-food crime.
For certain kinds of tasks—playing chess, detecting tumors—artificial intelligence can rival or surpass human thinking. But the broader world presents endless unforeseen circumstances, and there A.I. often stumbles. Researchers speak of “corner cases,” which lie on the outskirts of the likely or anticipated; in such situations, human minds can rely on common sense to carry them through, but A.I. systems, which depend on prescribed rules or learned associations, often fail.
By definition, common sense is something everyone has; it doesn’t sound like a big deal. But it comes into clearer focus when you imagine living without it. Suppose you’re a robot visiting a carnival, and you confront a fun-house mirror; bereft of common sense, you might wonder if your body has suddenly changed. On the way home, you see that a fire hydrant has erupted, showering the road; you can’t determine if it’s safe to drive through the spray. You park outside a drugstore, and a man on the sidewalk screams for help, bleeding profusely. Are you allowed to grab bandages from the store without waiting in line to pay? At home, there’s a news report—something about a cheeseburger stabbing. As a human being, you can draw on a vast reservoir of implicit knowledge to interpret these situations. You do so all the time, because life is cornery. A.I.s are likely to get stuck.
Oren Etzioni, the C.E.O. of the Allen Institute for Artificial Intelligence, in Seattle, told me that common sense is “the dark matter” of A.I. It “shapes so much of what we do and what we need to do, and yet it’s ineffable,” he added. The Allen Institute is working on the topic with the Defense Advanced Research Projects Agency (DARPA), which launched a four-year, seventy-million-dollar effort called Machine Common Sense in 2019. If computer scientists could give their A.I. systems common sense, many thorny problems would be solved. As one review article noted, A.I. looking at a sliver of wood peeking above a table would know that it was probably part of a chair, rather than a random plank. A language-translation system could untangle ambiguities and double meanings. A house-cleaning robot would understand that a cat should be neither disposed of nor placed in a drawer. Such systems would be able to function in the world because they possess the kind of knowledge we take for granted.
In the nineteen-nineties, questions about A.I. and safety helped drive Etzioni to begin studying common sense. In 1994, he co-authored a paper attempting to formalize the “first law of robotics”—a fictional rule in the sci-fi novels of Isaac Asimov that states that “a robot may not injure a human being or, through inaction, allow a human being to come to harm.” The problem, he found, was that computers have no notion of harm. That sort of understanding would require a broad and basic comprehension of a person’s needs, values, and priorities; without it, mistakes are nearly inevitable. In 2003, the philosopher Nick Bostrom imagined an A.I. program tasked with maximizing paper-clip production; it realizes that people might turn it off and so does away with them in order to complete its mission.
Bostrom’s paper-clip A.I. lacks moral common sense—it might tell itself that messy, unclipped documents are a form of harm. But perceptual common sense is also a challenge. In recent years, computer scientists have begun cataloguing examples of “adversarial” inputs—small changes to the world that confuse computers trying to navigate it. In one study, the strategic placement of a few small stickers on a stop sign made a computer-vision system see it as a speed-limit sign. In another study, subtly changing the pattern on a 3-D-printed turtle made an A.I. program see it as a rifle. A.I. with common sense wouldn’t be so easily perplexed—it would know that rifles don’t have four legs and a shell.
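The mechanics behind such attacks can be sketched in a few lines of code. What follows is a minimal version of one standard recipe, the fast-gradient-sign method, written against a hypothetical PyTorch image classifier; the model, image, and label are placeholders, and the stop-sign and turtle studies relied on more elaborate, physically printable perturbations.

```python
# A minimal sketch of an adversarial perturbation (fast-gradient-sign method),
# assuming a PyTorch image classifier; "model", "image", and "label" are
# placeholders, not taken from any particular study.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Nudge every pixel slightly in the direction that most increases the
    classifier's loss; the change is nearly invisible to a person but can
    flip the model's prediction."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # The perturbation is bounded by epsilon per pixel.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```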
Choi, who teaches at the University of Washington and works with the Allen Institute, told me that, in the nineteen-seventies and eighties, A.I. researchers thought that they were close to programming common sense into computers. “But then they realized ‘Oh, that’s just too hard,’ ” she said; they turned to “easier” problems, such as object recognition and language translation, instead. Today the picture looks different. Many A.I. systems, such as driverless cars, may soon be working regularly alongside us in the real world; this makes the need for artificial common sense more acute. And common sense may also be more attainable. Computers are getting better at learning for themselves, and researchers are learning to feed them the right kinds of data. A.I. may soon be covering more corners.
How do human beings acquire common sense? The short answer is that we’re multifaceted learners. We try things out and observe the results, read books and listen to instructions, absorb silently and reason on our own. We fall on our faces and watch others make mistakes. A.I. systems, by contrast, aren’t as well-rounded. They tend to follow one route to the exclusion of all others.
Early researchers followed the explicit-instructions route. In 1984, a computer scientist named Doug Lenat began building Cyc, a kind of encyclopedia of common sense based on axioms, or rules, that explain how the world works. One axiom might hold that owning something means owning its parts; another might describe how hard things can damage soft things; a third might explain that flesh is softer than metal. Combine the axioms and you come to common-sense conclusions: if the bumper of your driverless car hits someone’s leg, you’re responsible for the hurt. “It’s basically representing and reasoning in real time with complicated nested-modal expressions,” Lenat told me. Cycorp, the company that owns Cyc, is still a going concern, and hundreds of logicians have spent decades inputting tens of millions of axioms into the system; the firm’s products are shrouded in secrecy, but Stephen DeAngelis, the C.E.O. of Enterra Solutions, which advises manufacturing and retail companies, told me that its software can be powerful. He offered a culinary example: Cyc, he said, possesses enough common-sense knowledge about the “flavor profiles” of various fruits and vegetables to reason that, even though a tomato is a fruit, it shouldn’t go into a fruit salad.
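Cyc itself is proprietary, but the general idea of chaining axioms can be illustrated with a toy example. The sketch below, in Python, uses invented facts and rules (nothing like Cyc’s actual representation) and reaches the bumper conclusion by forward chaining: applying the rules to the known facts until nothing new can be concluded. Cyc’s real difficulty, as Lenat’s remark suggests, lies in doing this at scale, with far more intricate, nested assertions.

```python
# A toy illustration of axiom-style reasoning, not Cyc itself; the facts,
# rules, and relation names are all invented for this example.
facts = {
    ("part_of", "bumper", "car"),
    ("owns", "you", "car"),
    ("made_of", "bumper", "metal"),
    ("made_of", "leg", "flesh"),
    ("softer_than", "flesh", "metal"),
}

rules = [
    # Axiom: owning something means owning its parts.
    lambda f: {("owns", owner, part)
               for (r1, part, whole) in f if r1 == "part_of"
               for (r2, owner, thing) in f if r2 == "owns" and thing == whole},
    # Axiom: a thing made of a hard material can damage a thing made of a softer one.
    lambda f: {("can_damage", hard_thing, soft_thing)
               for (r1, hard_thing, hard) in f if r1 == "made_of"
               for (r2, soft_thing, soft) in f if r2 == "made_of"
               for (r3, a, b) in f if r3 == "softer_than" and a == soft and b == hard},
]

# Forward chaining: apply every rule to the known facts until no new fact appears.
while True:
    new = set().union(*(rule(facts) for rule in rules)) - facts
    if not new:
        break
    facts |= new

print(("owns", "you", "bumper") in facts)        # True: you own the car's parts
print(("can_damage", "bumper", "leg") in facts)  # True: the metal bumper can hurt flesh
```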
Academics tend to see Cyc’s approach as outmoded and labor-intensive; they doubt that the nuances of common sense can be captured through axioms. Instead, they focus on machine learning, the technology behind Siri, Alexa, Google Translate, and other services, which works by detecting patterns in vast amounts of data. Instead of reading an instruction manual, machine-learning systems analyze the library. In 2020, the research lab OpenAI revealed a machine-learning algorithm called GPT-3; it looked at text from the World Wide Web and discovered linguistic patterns that allowed it to produce plausibly human writing from scratch. GPT-3’s mimicry is stunning in some ways, but it’s underwhelming in others. The system can still produce strange statements: for example, “It takes two rainbows to jump from Hawaii to seventeen.” If GPT-3 had common sense, it would know that rainbows aren’t units of time and that seventeen is not a place.
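What “detecting patterns” amounts to is easy to see firsthand. The sketch below uses GPT-2, an earlier model from OpenAI whose weights are openly available, as a stand-in for GPT-3, and simply asks it to continue a prompt; the result is fluent because fluency is what the patterns encode, not because the model knows what a rainbow is.

```python
# A minimal sketch of pattern-based text generation, using the openly available
# GPT-2 model (via the Hugging Face "transformers" library) as a stand-in for
# GPT-3; the prompt is arbitrary and the output is not guaranteed to make sense.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model extends the prompt with a statistically likely sequence of words;
# nothing in the procedure checks the resulting claim against the world.
result = generator("It takes two rainbows to", max_new_tokens=15, do_sample=True)
print(result[0]["generated_text"])
```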
Choi’s team is trying to use language models like GPT-3 as stepping stones to common sense. In one line of research, they asked GPT-3 to generate millions of plausible, common-sense statements describing causes, effects, and intentions—for example, “Before Lindsay gets a job offer, Lindsay has to apply.” They then asked a second machine-learning system to analyze a filtered set of those statements, with an eye to completing fill-in-the-blank questions. (“Alex makes Chris wait. Alex is seen as . . .”) Human evaluators found that the completed sentences produced by the system were commonsensical eighty-eight per cent of the time—a marked improvement over GPT-3, which was only seventy-three-per-cent commonsensical.
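The shape of that pipeline can be sketched as follows. Everything here is a stand-in: GPT-2 for GPT-3, an off-the-shelf RoBERTa model for the second system, and a trivial length check for the filtering step, which in the actual research is far more discriminating; the real second system also learns from the filtered statements rather than merely being prompted.

```python
# A rough sketch of the generate-filter-complete pipeline, not the lab's code.
# GPT-2 stands in for GPT-3, RoBERTa for the "second machine-learning system,"
# and the filter is a deliberately trivial placeholder.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in generator
filler = pipeline("fill-mask", model="roberta-base")   # stand-in second system

# Step 1: generate candidate common-sense statements from a seed phrase.
outputs = generator("Before Lindsay gets a job offer,",
                    max_new_tokens=12, do_sample=True, num_return_sequences=5)
candidates = [o["generated_text"] for o in outputs]

# Step 2: keep only candidates that pass a (placeholder) plausibility filter.
plausible = [c for c in candidates if len(c.split()) > 8]
print(len(plausible), "candidate statements survive the filter")

# Step 3: ask the second model to finish a fill-in-the-blank question.
for guess in filler("Alex makes Chris wait. Alex is seen as <mask>.", top_k=3):
    print(guess["token_str"], round(guess["score"], 3))
```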
Choi’s lab has done something similar with short videos. She and her collaborators first created a database of millions of captioned clips, then asked a machine-learning system to analyze them. Meanwhile, online crowdworkers—Internet users who perform tasks for pay—composed multiple-choice questions about still frames taken from a second set of clips, which the A.I. had never seen, along with multiple-choice questions asking for justifications of the answers. A typical frame, taken from the movie “Swingers,” shows a waitress delivering pancakes to three men in a diner, with one of the men pointing at another. In response to the question “Why is [person4] pointing at [person1]?,” the system said that the pointing man was “telling [person3] that [person1] ordered the pancakes.” Asked to explain its answer, the program said that “[person3] is delivering food to the table, and she might not know whose order is whose.” The A.I. answered the questions in a commonsense way seventy-two per cent of the time, compared with eighty-six per cent for humans. Such systems are impressive—they seem to have enough common sense to understand everyday situations in terms of physics, cause and effect, and even psychology. It’s as though they know that people eat pancakes in diners, that each diner has a different order, and that pointing is a way of delivering information.
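Strip away the details, and the evaluation has a simple skeleton: score each candidate answer against the frame and the question, then pick the highest-scoring one. The sketch below shows only that skeleton; score_answer is a hypothetical placeholder for the joint vision-and-language model that does the actual work, and the alternative answers are invented.

```python
# The skeleton of the multiple-choice setup; "score_answer" is a hypothetical
# placeholder for a model that reads a frame, a question, and one candidate
# answer and returns a plausibility score.
def pick_answer(frame, question, choices, score_answer):
    """Return the candidate answer the model scores as most plausible."""
    scored = [(score_answer(frame, question, choice), choice) for choice in choices]
    return max(scored)[1]

choices = [
    "telling [person3] that [person1] ordered the pancakes",
    "asking [person1] to pass the syrup",
    "warning [person1] that the plate is hot",
]
# best = pick_answer(frame, "Why is [person4] pointing at [person1]?",
#                    choices, score_answer)
```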