As soon as Tom Smith got his hands on Codex — a new artificial intelligence technology that writes its own computer programs — he gave it a job interview.
He asked if it could tackle the “coding challenges” that programmers often face when interviewing for big-money jobs at Silicon Valley companies like Google and Facebook. Could it write a program that replaces all the spaces in a sentence with dashes? Even better, could it write one that identifies invalid ZIP codes?
It did both instantly, before completing several other tasks. “These are problems that would be tough for a lot of humans to solve, myself included, and it would type out the response in two seconds,” said Smith, a seasoned programmer who oversees an AI startup called Gado Images. “It was spooky to watch.”
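The article does not reproduce Codex's answers. For readers curious what these two interview exercises involve, here is a hand-written Python sketch of both. It is not Codex output. It assumes that a "valid" ZIP code means the U.S. format of five digits, optionally followed by a hyphen and four more digits (ZIP+4). The function names are hypothetical.

```python
import re
from typing import List

# Assumption: a valid ZIP code is five digits, optionally followed by
# a hyphen and four more digits (the ZIP+4 format).
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def spaces_to_dashes(sentence: str) -> str:
    """Replace every space in the sentence with a dash."""
    return sentence.replace(" ", "-")


def invalid_zip_codes(codes: List[str]) -> List[str]:
    """Return the entries whose whole string does not match the assumed format."""
    return [code for code in codes if not ZIP_PATTERN.fullmatch(code)]


print(spaces_to_dashes("the quick brown fox"))  # the-quick-brown-fox
print(invalid_zip_codes(["94103", "1234", "10001-0001", "ABCDE"]))  # ['1234', 'ABCDE']
```

A format check like this only catches malformed entries. It cannot tell whether a well-formed ZIP code has actually been assigned, which would require a lookup against a list of real codes.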
At first, Codex seemed like a technology that would soon replace human workers. As Smith continued testing the system, he realized that its skills extended well beyond a knack for answering canned interview questions. It could even translate from one programming language to another.
Yet after several weeks working with this new technology, Smith believes it poses no threat to professional coders. In fact, like many other experts, he sees it as a tool that will end up boosting human productivity. It may even help a whole new generation of people learn the art of computer programming, by showing them how to write simple pieces of code, almost like a personal tutor.
“This is a tool that can make a coder’s life a lot easier,” Smith said.
Codex, built by OpenAI, one of the world’s most ambitious research labs, provides insight into the state of artificial intelligence. Although a wide range of AI technologies has improved by leaps and bounds over the past decade, even the most impressive systems have ended up complementing human workers rather than replacing them.
Thanks to the rapid rise of a mathematical system called a neural network, machines can now learn certain skills by analyzing vast amounts of data. By analyzing thousands of cat photos, for example, they can learn to recognize a cat.
This is the technology that recognizes the commands you speak into your iPhone, translates between languages on services like Skype and identifies pedestrians and street signs as self-driving cars speed down the road.
About four years ago, researchers at labs like OpenAI started designing neural networks that analyzed enormous amounts of prose, including thousands of digital books, Wikipedia articles and all sorts of other text posted to the internet.
By pinpointing patterns in all that text, the networks learned to predict the next word in a sequence. When someone typed a few words into one of these "universal language models," it could complete the thought with entire paragraphs. In this way, one system, an OpenAI creation called GPT-3, could write its own Twitter posts, speeches, poetry and news articles.
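GPT-3 learns these patterns with an enormous neural network, which no short example can reproduce. As a rough illustration of what "predicting the next word" means, the sketch below swaps in a far simpler technique: a bigram model that merely counts which word most often follows each word in a tiny sample text, then extends a prompt one word at a time. The sample text and function names are hypothetical, and this is not how GPT-3 itself works.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """For each word, count how often every other word immediately follows it."""
    words = text.lower().split()
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if nothing ever followed it."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def complete(followers: Dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt, appending one predicted word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        following = predict_next(followers, words[-1])
        if following is None:
            break  # no known continuation for the last word
        words.append(following)
    return " ".join(words)


# Hypothetical sample text standing in for a model's training data.
model = train_bigrams("the cat sat on the mat and the cat sat by the door")
print(predict_next(model, "the"))           # cat
print(complete(model, "the", max_words=2))  # the cat sat
```

A real language model looks at many preceding words rather than only the last one, and it learns from vast amounts of text rather than a single sentence. That is what lets it produce whole paragraphs that stay on topic.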
Much to the surprise of the researchers who built it, GPT-3 could also write its own computer programs, though they were short and simple. Apparently, it had learned from an untold number of programs posted to the internet. So OpenAI went a step further, training a new system, Codex, on an enormous array of both prose and code.
The result is a system that understands both prose and code — to a point. You can ask, in plain English, for snow falling on a black background, and it will give you code that creates a virtual snowstorm. If you ask for a blue bouncing ball, it will give you that, too.
“You can tell it to do something, and it will do it,” said Ania Kubow, another programmer who has used the technology.
Codex can generate programs in 12 computer languages and even translate between them. But it often makes mistakes, and though its skills are impressive, it cannot reason like a human. It can recognize or mimic what it has seen in the past, but it is not nimble enough to think on its own.
Sometimes, the programs generated by Codex do not run. Or they contain security flaws. Or they come nowhere close to what you want them to do. OpenAI estimates that Codex produces the right code 37% of the time.
When Smith used the system as part of a “beta” test program this summer, the code it produced was impressive. But sometimes, it worked only if he made a tiny change, like tweaking a command to suit his particular software setup or adding a digital code needed for access to the internet service it was trying to query.
In other words, Codex was truly useful only to an experienced programmer.
But it could help programmers do their everyday work a lot faster. It could help them find the basic building blocks they needed or point them toward new ideas. Using the technology, GitHub, a popular online service for programmers, now offers Copilot, a tool that suggests your next line of code, much the way “autocomplete” tools suggest the next word when you type texts or emails.
“It is a way of getting code written without having to write as much code,” said Jeremy Howard, who founded the artificial intelligence lab Fast.ai and helped create the language technology that OpenAI’s work is based on. “It is not always correct, but it is just close enough.”
Howard and others believe Codex could also help novices learn to code. It is particularly good at generating simple programs from brief English descriptions. And it works in the other direction, too, by explaining complex code in plain English. Some, including Joel Hellermark, an entrepreneur in Sweden, are already trying to transform the system into a teaching tool.
“We thought these tools were going to completely remove the need for humans, but what we learned after many years was that this wasn’t really possible; you still needed a skilled human to review the output,” Smith said. “The technology gets things wrong. And it can be biased. You still need a person to review what it has done and decide what is good and what is not.”
Codex extends what a machine can do, but it is another indication that the technology works best with humans at the controls.