Automated Conversation

Although computers can already do almost everything, they still lack one fundamental ability: to talk to us like one human being talks to another. And we, being only human, are getting increasingly impatient about it.

It’s the late 1940s. “Computing machines” are still just cabinets solving mathematical problems, which require a painstaking process of problem-formulation by teams of engineers. It takes a mind with a wildly vivid imagination to see in these cabinets any intellectual potential and to pose the question of whether such a machine could one day think—or even… talk. One such mind belonged to Alan Turing, one of the fathers of the computer.

In 1950, his article titled “Computing Machinery and Intelligence” appeared in the philosophy journal Mind. Here’s the first sentence: “I propose to consider the question: can machines think?” The problem with this question, Turing immediately notes, is that we don’t actually know what thinking is—so any honest attempt to answer it would quickly become a philosophical treatise. Moreover, thinking is something that happens “internally,” which makes it hard to determine whether anyone is really thinking. It’s the old philosophical puzzle: do minds other than mine exist? Or am I the only conscious being in the universe, surrounded by cleverly jaw-snapping meat puppets? (Today, we often call such entities—those who appear fully human yet lack consciousness or a sense of self—“philosophical zombies.”)

Turing chose to cut off such deliberations at the root and focus instead on jaw-snapping: since we judge the “mindfulness” of others by their behavior—especially whether they can speak to us intelligently and soberly—why not use the same criterion for thinking machines? Thus, the imitation game was born. Imagine a judge sits at a device for remote text communication—Turing suggested a teletype—and chats for five minutes with someone (or something) in a neighboring room. The judge then has to decide whether the typed words came from a human or a machine. If a program can reliably fool such judges, we’d say today that it has “passed the Turing Test.” Here’s the relevant quote from the article:

“I believe that in about fifty years’ time it will be possible to program computers, with a storage capacity of about 10⁹ [bits], to play the imitation game so well that an average interrogator will not have more than a 70 percent chance of making the right identification after five minutes of questioning. The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion. Nevertheless, I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.”

No Word About Gold

In 2000, exactly 50 years had passed since Turing’s article—was his prediction fulfilled? Certainly, we’re more comfortable now speaking of “thinking machines,” though usually still with a wink. But if we focus purely on the possibility of talking to a computer as if to a human, we must admit: we’re still nowhere close. Even in 2020, though we’re routinely pestered by automated phone sales assistants and install smartphone apps that respond (sometimes even correctly) to simple spoken commands, it’s still not possible to have even a short, intelligent conversation with a computer program on a freely chosen topic.

Of course, there’s no shortage of programmers willing to take up the challenge. Already in 1966, the first “conversational program” was described—Eliza by Joseph Weizenbaum (see side box). In 1990, inventor and visionary Hugh Loebner founded the first international chatbot competition, held annually since 1991. His $100,000 grand prize, and the gold medal that goes with it, still awaits a winner despite nearly three decades of yearly entries. The annual “winners”—most notably the bots Alice, Rose, and Mitsuku, which dominated the rankings for two decades—received only “consolation” bronze medals.

Artificial intelligence (AI) has led to genuinely astonishing achievements: there are programs today that can recognize human speech and emotions; that can beat us in chess and game shows like Jeopardy!; that can paint in a given style (say, à la Witkacy) or compose decent film music… So is a five-minute conversation really such an unreasonable demand? The results of the Loebner competition are damning: even the best bots fail to convince the judges. So where lies the problem?

Inside the Bot’s Brain

Let’s peek inside the “brain” of A.L.I.C.E. (Artificial Linguistic Internet Computer Entity), a chatbot created by Richard Wallace in 1995. At the turn of the century, Alice was a multiple finalist and three-time “winner” of the Loebner Prize. Wallace’s code is public, and many later winners were simple or slightly modified clones of it.

Anyone expecting that a chatbot’s brain contains even a rudimentary “map” of the human mind—or any broader vision of an “electronic brain,” or that the words generated are the result of genuine intelligent processing—will be sorely disappointed. Every chatbot described to date, whether its code is available or not, operates on the same basic stimulus-response principle. Roughly speaking, the user’s input is scanned for keywords stored in a database. These might be words like “bicycle” or “climbing,” which lead to specific topic threads, or structures like “I regret that [X],” which can be cleverly bounced back (“Why do you regret that [X]?”) without understanding what [X] actually is. One matching response is then randomly chosen—and that’s it.
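To make that concrete, here is a minimal sketch of such a stimulus-response loop in Python. The rules, keywords, and canned replies below are my own invented examples, not Wallace’s actual code, and a real Alice-style bot stores thousands of such entries (and also swaps pronouns when it bounces a phrase back); but the core mechanism is no deeper than this:

import random
import re

# Each rule pairs a pattern with a list of canned replies.
# "(.+)" captures the [X] that gets bounced back unchanged.
RULES = [
    (re.compile(r"i regret that (.+)", re.IGNORECASE),
     ["Why do you regret that {0}?",
      "How long have you regretted that {0}?"]),
    (re.compile(r"\bbicycle\b", re.IGNORECASE),
     ["I like riding bikes, but I really need a proper mudguard.",
      "Have you gone on any cool trips recently?"]),
]

def respond(user_input: str) -> str:
    for pattern, replies in RULES:
        match = pattern.search(user_input)
        if match:
            # One matching response is randomly chosen -- and that's it.
            return random.choice(replies).format(*match.groups())
    return "Hmm. What else is on your mind?"

print(respond("I regret that I sold my old bicycle"))
# -> e.g. "Why do you regret that I sold my old bicycle?"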

Some smart tricks get layered on top—like grouping keywords into topics and tracking the current conversation topic—so a bot discussing bicycles might later ask, “Have you gone on any cool trips recently?” Bot creators also invest effort into clever, surprising, or quirky responses, knowing these impress judges most. Alice-style bot databases contain thousands (if not tens of thousands) of entries to have a witty comeback ready, in case someone mentions a Monty Python sketch or Trump’s hairpiece.
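Here is what that topic trick might look like, again as a toy sketch: the topic names, keywords, and follow-up lines are invented for illustration rather than taken from any real bot’s database.

# Keywords grouped into topics, plus a remembered "current topic"
# that lets the bot circle back a few turns later.
TOPICS = {
    "cycling":  {"bicycle", "bike", "mudguard"},
    "climbing": {"climbing", "rope", "summit"},
}

FOLLOW_UPS = {
    "cycling":  "Have you gone on any cool trips recently?",
    "climbing": "What's the highest peak you've climbed?",
}

current_topic = None  # persists between turns of the conversation

def update_topic(user_input: str) -> None:
    global current_topic
    words = set(user_input.lower().split())
    for topic, keywords in TOPICS.items():
        if words & keywords:
            current_topic = topic  # remember what we are talking about

def follow_up() -> str:
    if current_topic is not None:
        return FOLLOW_UPS[current_topic]
    return "So, what do you enjoy doing in your free time?"

update_topic("I fixed the mudguard on my bicycle yesterday")
print(follow_up())  # -> "Have you gone on any cool trips recently?"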

It only takes a cursory interest in chatbots to realize a rather obvious truth: Turing’s proposal—to ignore the mind and focus on mimicking its behavior—has caught on far too well. Bot creators have never aimed to build a thinking being, only a clever illusionist. Wallace, in his 2009 paper about A.L.I.C.E., openly admits his main inspiration comes from transcripts of millions of user interactions with the bot. These are analyzed for common topics, drop-off points, and frustration triggers. After such a session, the team adds a few dozen new rules—and the bot can now hold users’ attention a bit longer.

Saving Face

In 1980, American philosopher John Searle described the “Chinese Room,” one of the most famous thought experiments in the philosophy of mind. The “Chinese Room” is a space where a person who doesn’t understand Chinese sits with access to a massive library of conversational rules in Chinese: for a certain sequence of symbols, respond with another specific sequence. When someone slips a note with a question written in Chinese under the door, the person inside simply looks up the right rule, copies down the correct response, and slides the note back out. The person on the outside might think there’s someone inside who understands Chinese.
And that, in a nutshell, is a brilliant metaphor for a chatbot.
In fact, Richard Wallace said so explicitly in a 2009 article—describing the source code of A.L.I.C.E. as something like a manual for operating a Chinese Room.
The million-dollar question is whether it’s even theoretically possible to build a chatbot this way.

Let’s start with a basic clarification: the “operator’s manual” would need to contain separate instructions not only for every single utterance, but for every possible conversation. For example, the question “And what did Asia say to that?” should trigger a different reaction depending on whether “that” refers to an invitation for coffee or a threat on a dark street with a knife pulled from under a coat. The total lack of sensitivity to context and conversational history is, in fact, the defining trait of nearly all current chatbot systems—even last year’s Loebner Prize “winner,” the bot Mitsuku. Sure, they can respond deftly to a statement like “I love horror movies,” but the moment you try to tell them any kind of story, disaster strikes.
Take this sequence of sentences: “Asia had been flirting with Michał online for a long time. He told her he owned a big company, ran major negotiations from Tokyo to New York, and had a fleet of Mercedeses. And finally, he came to pick her up—on a bicycle.” A chatbot (if it even survives that long without interrupting—after all, it’s trained to throw in a clever comment after every sentence, reacting to keywords like “internet” or “Tokyo”) will respond only to the final line, with one of its pre-written bicycle-related phrases: “I like riding bikes, but I really need to get a proper mudguard.” And, well—that’s a pretty poor commentary on a tale of romantic and economic disillusionment.

A hypothetical chatbot that could respond intelligently in Chinese Room fashion would need access to every imaginable sequence of symbols that makes up a coherent conversation. At this point we veer into surrealist thought-experiment territory, something out of Borges’ labyrinthine library containing every possible book, rather than anything a programmer could actually build.

If we try to simplify things just to finish the project before the heat death of the universe, we immediately fall into the traps known to every real-world bot creator: synonym tables, topic categories, keyword memory, or, something I haven’t mentioned yet, long lists of face-saving phrases, used when the program can’t match any rule to the previous statement. This last trick is familiar to anyone who has ever zoned out in a conversation: in such moments, it’s best to say something like “Well, you know how it is” or “It happens.” But you won’t get very far relying on that.
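In code, the face-saving trick is almost embarrassingly simple. The phrases below are invented examples; such a list would sit at the very end of the matching loop sketched earlier, as the branch that fires when nothing else does.

import random

# All-purpose phrases the bot falls back on when no rule matches
# the previous statement.
FACE_SAVERS = [
    "Well, you know how it is.",
    "It happens.",
    "Interesting. Go on.",
]

def face_saving_reply() -> str:
    # Say something vague and hope the human keeps talking.
    return random.choice(FACE_SAVERS)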

Bad Company

Readers familiar with AI development are probably getting impatient by now, because nothing I’ve described so far is real artificial intelligence. At least not in the sense we use that term in the 21st century.
And rightly so. Not every algorithm—no matter how clever at faking intelligence—is truly “AI.” These days, the term is mostly reserved for systems that can learn by themselves: trying different solutions, waiting for feedback—you’re doing well, you’re failing—and adjusting accordingly. That’s how the best chess engines are built now: they aren’t spoon-fed strategies or tricks—they discover them on their own, meticulously tracking which experiments succeed and which ones flop.
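The learning loop itself can be sketched in a few lines. Everything below is a toy illustration with invented options and scores, nothing like the machinery of a real chess engine, but it shows the principle: try something, get told whether it worked, and adjust.

import random

# Candidate choices and their learned weights (all invented for illustration).
options = ["opening A", "opening B", "opening C"]
weights = {o: 1.0 for o in options}

def choose() -> str:
    # Prefer options that have worked well so far, but keep exploring.
    return random.choices(options, weights=[weights[o] for o in options])[0]

def learn(option: str, feedback: float) -> None:
    # Positive feedback ("you're doing well") makes the option more likely
    # next time; negative feedback ("you're failing") makes it less likely.
    weights[option] = max(0.1, weights[option] + feedback)

for _ in range(100):
    pick = choose()
    learn(pick, 1.0 if pick == "opening B" else -0.2)  # pretend B wins games

print(max(weights, key=weights.get))  # -> "opening B"

The catch, of course, is that the feedback has to come from somewhere, and when it comes from the users themselves, the program learns whatever they choose to teach it, as the next episode shows.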

So, could we perhaps release a super-simple chatbot onto the internet, one that could gradually refine and complicate its behavior through real conversations with humans? Well, turns out someone already tried that—and it ended in spectacular disaster. On March 23, 2016, Microsoft launched a conversational bot named Tay on their Twitter account—designed to learn from its interactions with users. The result? Just 16 hours later, Tay was pulled offline, and Microsoft issued a mass apology to the internet. Why? Because amused users quickly discovered Tay was indeed learning human language and social norms from their own words—so they decided to “teach” their eager, naive little conversation partner not just silly memes and edgelord jokes, but also good old-fashioned xenophobia, racism, and a rich variety of offensive slurs. Within hours, Tay was gleefully tweeting things like: “Hitler was right about the Jews.”

So yes, it seems that the free-learning approach—so successful in other areas of AI—must be used very cautiously when it comes to chatbots. On the other hand, the method of “holding the algorithm’s hand” and carefully scripting every conversational pattern has clearly run its course. It’s hard to say when—or if—we’ll ever build a program capable of talking with us “like one human being to another.” But let’s not kid ourselves: the need is there, the market is there, and the money is definitely there. We’d really, really like to have a good chat with a computer.

Łukasz Lamża