Bess Won't Go There:

N2H2's Weak AI

by Jonathan Wallace jw@bway.net

Second of a series

September 1, 2001

EFF Pioneer Award winner Seth Finkelstein provided valuable information and research assistance in the preparation of this article

In the first article in this series profiling censorware vendor N2H2 Inc. of Seattle, I concluded that the company, which is unprofitable, running out of cash and facing NASDAQ delisting, must leverage its technology to survive. N2H2 is a tech company, and it has nothing else to sell.

How good is N2H2's technology? On N2H2's recently revamped website, the company states:

Using sophisticated Artificial Intelligence technology and expert human review, N2H2 harvests suspect URLs, analyzes and categorizes the content, then distributes daily list updates fast and efficiently to customers and partners around the globe.

There are several other references on the N2H2 site to AI, including a claim of "world class" artificial intelligence. In its most recent revamp of the site, which occurred the third week of August 2001, the company eliminated other references, including mentions of "robust" and "state of the art" AI technology.

What do these claims of "sophisticated", "state of the art", "world class" and "robust" AI really mean? These sound like classic marketing puffery, but are they possibly true?

I asked Michael Pazzani, the CEO of Adaptive Info and a professor (on leave) of computer science at the University of California, Irvine, how he would build "artificial intelligence" software for evaluating content. Dr. Pazzani's company is creating a product for the wireless web which "automatically and effortlessly prioritizes information displayed for each individual user." Dr. Pazzani's academic studies have also closely focused on the problem of creating software that can evaluate content; he sent me a copy of a paper he had done on creating intelligent agents to prioritize incoming email, and to reject not just spam, but ordinary email on topics of little interest.

Dr. Pazzani said that he would solve the problem by writing software to analyze the occurrence of keywords in text and their relationships to one another. For example, the software might detect that an essay about tolerance towards homosexuals was not porn, because the word "compassion" is unlikely to occur in pornographic text.

Dr. Pazzani acknowledged that his software would make mistakes. You would calibrate the program, he said, so as to inflict collateral damage (my phrase, not his) in one direction or another, either by allowing too much "evil" content to slip through, or by blocking too much innocuous content.
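The keyword-weighting approach Dr. Pazzani describes can be sketched in a few lines. This is a minimal illustration, not N2H2's actual method: the word lists, weights, and threshold below are invented for the example.

```python
import re

# Words that push a page toward "blocked" (positive weights) and
# words that pull it back (negative weights). Invented for illustration.
SUSPECT = {"xxx": 3, "hardcore": 3, "porn": 2, "nude": 1}
INNOCENT = {"compassion": -2, "tolerance": -2, "rights": -1}

def score(text):
    words = re.findall(r"[a-z]+", text.lower())
    weights = {**SUSPECT, **INNOCENT}
    return sum(weights.get(w, 0) for w in words)

def is_blocked(text, threshold=3):
    # Calibration happens here: lowering the threshold over-blocks
    # innocuous pages; raising it lets more "evil" content through.
    return score(text) >= threshold

essay = "An essay urging compassion and tolerance toward homosexuals."
ad = "XXX hardcore porn! Nude pics!"
print(is_blocked(essay))  # False: the negative-weight words dominate
print(is_blocked(ad))     # True
```

The single `threshold` parameter is where the collateral-damage trade-off Dr. Pazzani mentions lives: there is no setting that eliminates both kinds of error at once.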

Armed with the information provided by Dr. Pazzani, I wrote to N2H2's PR manager, and its CFO:

I would like to know the extent to which the software analyzes the statistical occurrence of particular words alone and in relationship to each other. In other words, does it contain rules along the lines of "If tolerance occurs several times within a few words of homosexuality, the page is probably not porn". Also, does it analyze the links to other sites and which sites link to the page under analysis.

No response. I turned to Google, where I found a cached N2H2 page, recently deleted from the company's site, which seemed to indicate that the N2H2 software incorporates at least some of the approaches described by Dr. Pazzani:

Artificial Intelligence: we have built an AI engine that uses a complex combination of search parasites, "spambait", spiders that crawl the web using a "birds of a feather" technique, user logs and industry-leading web databases such as those from Inktomi.

According to an IBM white paper, "search parasites" weight the occurrences of words to determine whether content meets the criteria set by the searcher. The reference to "spambait" was more puzzling, but suggests that N2H2 deliberately signs up for spam from pornographers, and then reviews the sites the email touts. "Birds of a feather" indicates that the company checks links to and from other sites, just as Dr. Pazzani recommended. Use of "user logs" may mean that the company checks the usage stats of its claimed 16 million users, looking for porn sites they hit which are not already blocked by its software. And the company uses search engines and web directories, which have already done some of the characterization for them.
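If "birds of a feather" does mean checking a page's links, as Dr. Pazzani recommended, the heuristic might look something like the sketch below: a page is flagged when enough of its link-graph neighbors are already on the blacklist. The graph, site names, and threshold are all invented; N2H2 has not disclosed how its crawler actually works.

```python
# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "newsite.example": ["pornhub.example", "xxxpics.example", "cnn.example"],
    "blog.example": ["cnn.example", "nytimes.example"],
}
BLACKLIST = {"pornhub.example", "xxxpics.example"}

def looks_suspect(site, min_fraction=0.5):
    # Flag a site when at least min_fraction of its outbound links
    # point to already-blacklisted sites.
    neighbors = LINKS.get(site, [])
    if not neighbors:
        return False
    bad = sum(1 for n in neighbors if n in BLACKLIST)
    return bad / len(neighbors) >= min_fraction

print(looks_suspect("newsite.example"))  # True: 2 of 3 neighbors blacklisted
print(looks_suspect("blog.example"))     # False
```

Note that nothing in this heuristic reads the page itself; a critical essay that links to porn sites in order to condemn them would be flagged just as readily as a porn directory.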

What does it mean to call this "AI"? I took a look at the "state of the art" in the industry to see if N2H2 was really in the forefront, as it claimed.

The American Heritage Dictionary of the English Language, 4th Edition defines "artificial intelligence" as follows:

artificial intelligence n. Abbr. AI The ability of a computer or other machine to perform those activities that are normally thought to require intelligence. The branch of computer science concerned with the development of machines having this ability.

The academic study of AI, where computer scientists, neurobiologists, and philosophers meet, is broken down into two subcategories, weak and strong AI. "Strong AI" is portrayed in movies like Steven Spielberg's recent AI, Blade Runner, and 2001: machines that really think like a human. "Weak AI", to put it pejoratively, deals with machines that fake human thought processes more or less successfully. No one has yet created a working version of the HAL 9000 computer. Commercial versions of artificial intelligence today without exception fall into the weak category.

A famous and successful example of "weak AI" is IBM's "Deep Blue" chess computer, which has beaten some human chess masters. Deep Blue has been described as the "brute force" approach to AI: it calculates 60,000,000,000 moves in 3 minutes, but:

Despite being huge and costing probably $100,000,000 or more, the computer is not smart enough to know who it is playing, where the chess board is, how to move a piece, what the pieces look like or how to tell that the other player just cheated. And don't even consider asking the computer to do anything else like play tic-tac-toe, figure out how to walk out of the room (assuming it had eyes and legs and was given a map of the room) or how to answer the question "Is the light on?".

(Source: Baylor Wetzel, Artificial Intelligence for Humans)

N2H2 claims that its software aids in understanding and categorizing human-generated content of almost every type, from the sociological and scientific to the merely prurient. Such a claim should not be lightly made. N2H2 does not offer a white paper explaining in broad strokes the company's use of AI technology (here are those of some other companies for comparison). The N2H2 site, which merely recites "robust", "sophisticated", "state of the art" like a marketing litany, never mentions what type of AI it uses. There are several choices. Denis Susac, the guide to the AI interest area on About.com, who hadn't heard of N2H2 before, commented:

Categorization & classification of Web content is one of the most complex AI problems recently tackled by several techniques: self-organizing (Kohonen) neural networks, artificial life, knowledge based approach using expert systems...

In fact, the understanding of colloquial or everyday human content by a computer--often referred to as "natural language processing"--is one of the most difficult problems that "strong AI" has yet to solve. For example, the phrase "Time flies like an arrow" (even without appending "fruit flies like a banana") is incredibly confusing to a computer lacking a human common-sense frame of reference. "It is impossible to understand the sentence about time (or even to understand that the sentence is indeed talking about time and not flies) without mastery of the knowledge structures that represent what we know about time, flies, arrows, and how these concepts relate to one another." (Raymond Kurzweil).

John Searle invented the Chinese room problem, which vividly describes the way weak AI attempts to work around the impossibility of understanding semantically complex natural language. Searle's neat little parable postulates a non-Chinese-speaking human, inside a closed room, given a set of rules explaining how to join together Chinese characters. By following the rules and passing the results out of the room, Searle's human would create valid Chinese sentences--but have no idea what he was saying. Weak AI utilizes ignorant rule-sets like Searle's rules for joining Chinese characters, or Dr. Pazzani's for evaluating text.

In a 1990 essay entitled Common Knowledge or Superior Ignorance, Christopher Locke of the Carnegie Mellon Robotics Institute doubted whether AI would be able to analyze language in common sense terms any time soon. "The closer we look at linguistic sense," Locke said, "the less we find that is unequivocally common."

I emailed Locke asking whether he felt any differently eleven years later. He replied that the proposition that "over-hyped but disembodied 'AI software' is capable of making even rudimentary determinations of meaning-in-context is laughable.... or it would be laughable were it not for the fact that so many people blindly accept such groundless claims."

N2H2 claims that its censorware can distinguish "pornography" from other categories of human content, yet determining what is "porn" is an extremely fuzzy area of human decision-making admitting of much disagreement. In fact, the ambiguity of the word "pornography" is rooted in its dictionary definition:

por·nog·ra·phy n. Sexually explicit pictures, writing, or other material whose primary purpose is to cause sexual arousal....

Lurid or sensational material: "Recent novels about the Holocaust have kept Hitler well offstage [so as] to avoid the... pornography of the era" (Morris Dickstein).

From The American Heritage Dictionary of the English Language, Fourth Edition.

I decided to see how people who fight porn define it. The Men Against Pornography think the Sports Illustrated swimsuit issue is porn, and the Concerned Women for America think the same of the Abercrombie and Fitch catalog.

It gets even worse. The recently-enacted Children's Internet Protection Act forces public libraries receiving federal E-rate funding to use censorware to block content which is legally "harmful to minors" or "obscene" as to adults. (Yes, the mis-titled act requires blocking of content seen by adult users, not just children.) These are complex, subjective legal standards. N2H2 has successfully marketed its software to libraries affected by the act. In the library, N2H2's AI purports to make determinations which the law delegates only to judges.

Said Chris Hansen of the American Civil Liberties Union, who is challenging CIPA's constitutionality in federal court in Philadelphia:

Because decisions about obscenity are subjective (and can differ in different parts of the country), the Supreme Court insists that judges make the decisions only after a full trial.

Speech, said Hansen, is not obscene until a judge has held it to be. "Of course," censorware companies "do not have judges review their decisions either before or after they are made."

First amendment attorney Jim Tyre, a co-founder of The Censorware Project, agrees:

Applying legal standards such as the definitions of obscenity or harmful to minors involves complex, and often subjective, human decisions which are difficult to make even by the most experienced jurists. Words and pictures must be looked at in context, a context which no software has the ability to provide.

The proof of the pudding is of course in the eating. N2H2 may refuse to answer questions about the workings of its software, but anyone can get a sense of how "robust" or "state of the art" the software is by seeing how well (or poorly) it works.

A good test for finely tuned AI software would be: can it distinguish between sexual speech, and speech about sexuality? Is N2H2's software precise enough that it will block a prurient, explicit story, while letting through an academic essay on sexual mores in America? After all, you wouldn't want to block the latter in a library, even by accident.

Earlier in this article, I quoted Dr. Pazzani as giving an example of software detecting that a site is not porn because of the use of the word "compassion". He didn't make up that example. He was looking at an essay of mine entitled Congressman Dornan's Perversion, which says:

Congressman Robert Dornan of Orange County is a perverted human being. He is twisted up and hateful inside....He believes that all HIV victims are homosexuals, and that all homosexuals deserve punishment (he once said that "every lesbian spear-chucker" is against him).

Dr. Pazzani, looking at this essay, noted the word "compassion" and said that he would use it as an indicator that the essay was not porn.

Read the essay. There are no sex acts or organs, no four letter words, just a few occurrences of "homosexual" and "lesbian" and "perversion". But N2H2 blocks the page as pornography. A program that identifies that essay as porn is very weak AI.

This is not a freak accident. N2H2 blocks numerous pages as porn that are not. Later in the series, I'll give many more examples. But here is a second one that was entertaining to find.

In 1999, the Censorware Project published a report on use of the Smartfilter censorware in Utah libraries and schools. Pro-filter activist David Burt immediately counter-attacked, examining all the sites which the Censorware Project had maintained were inappropriately blocked, and claiming that most of them in fact were rightfully blacklisted. He posted a page analyzing his findings:

http://offspring.com/lyrics/: "These songs have ideas PLUS drugs, sex and ass-kicking"

The page contains numerous references like "Hardcore Amateur Lesbian Teens", but not one four letter word, no Anglo-Saxon phrases for sexual organs, no descriptions of sex acts. Yet the N2H2 software categorizes the page as porn.

David Burt, former pro-filter activist, is now N2H2's public relations manager.

AI or hype? "Simply adding advertising claims proclaiming 'Now with Artificial Intelligence' doesn't make the software in the box or the computer any smarter." --Dick Ainsworth.

Seth Finkelstein knows the workings of censorware as well as anyone; he received the EFF Pioneer Award this year for his research into the way these products operate, and what they actually block. "Computers are dumb," Seth said. "They do not understand context. If by some chance, N2H2 had made a Nobel prize level advance in computer science, it would not be a company in danger of going bankrupt. Many entries on N2H2's censorware blacklist show a simple-minded word searching, which is often peddled as AI snake-oil."
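The "simple-minded word searching" Finkelstein describes is easy to demonstrate. The sketch below, with an invented banned-word list, blocks any page containing a listed word, with no notion of context--and so fails in both directions at once.

```python
# A context-free banned-word filter. The word list is invented
# for illustration; it is not N2H2's actual blacklist.
BANNED = {"lesbian", "homosexual", "perversion"}

def blocked(text):
    words = set(text.lower().split())
    return bool(words & BANNED)

# A political essay with no sexual content at all gets blocked...
print(blocked("congressman dornan called every lesbian a threat"))  # True
# ...while a pornographic come-on that avoids the list slips through.
print(blocked("hot barely legal adult action pics"))                # False
```

This is exactly the failure pattern seen with the Dornan essay: the words that trigger the block are the words of political commentary, not of pornography.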

With all of its "search parasites" and "birds of a feather", N2H2's software cannot tell the difference between Hustler Magazine and a Supreme Court opinion. This incredibly weak AI should not be making decisions about what material is fit to be viewed in public libraries.

In fact, what role is the N2H2 software really playing? Is it simply categorizing sites for human reviewers to look at--or is it adding sites directly to the blacklist, without human intervention?

Anti-censorware campaigners have always believed that all the products blacklist some sites without human review. The Censorware Project specializes in reporting ludicrous examples: pages about teen soccer, Liza Minnelli, the Quakers, all blocked as porn by various products. In these cases, it is inconceivable that a human looked at these sites and (even granting the wide variety of possible human opinions) concluded they were porn.

Human review of all blacklisted sites is impractical because the web is too large. In February 1999, a study published in Nature concluded that the web contained 800 million pages. Today, Google claims to search 1,387,529,000 pages--and it's doubtful it searches the entire net. Imagine if Google wouldn't give you access to a web page until a human reviewer had looked at it. At its height, N2H2 had just 125 people reviewing the web; only 40 were full-time. N2H2, like all other censorware companies, is under tremendous pressure to short-cut the human review process just to keep up with the sheer size and explosive growth of the web.
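The arithmetic makes the point starkly. Assuming (my assumption, for the sake of the estimate) one minute per page and eight-hour days, and using the article's figures of 1,387,529,000 pages and 125 reviewers:

```python
# Back-of-envelope estimate of how long one full human-review pass
# over the web would take N2H2's review team at its largest.
pages = 1_387_529_000   # pages Google claimed to index in 2001
reviewers = 125         # N2H2's review staff at its height
per_day = 8 * 60        # pages one reviewer vets at one minute each

days = pages / (reviewers * per_day)
print(round(days / 365))  # roughly 63 years for a single pass
```

And that is a single pass over a snapshot of a web that was doubling in size far faster than any review team could grow.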

According to sources close to the company, the human review team has been slashed in N2H2's recent lay-offs, from 40 full-time people to ten. As a result, the company is thought to be leaning much harder on the AI.

N2H2 used to claim "100% human review" of sites added to the blacklist. But every use of this phrase has been deleted from the N2H2 site. Today, the company refers only to its "unique combination of powerful technology and expert human review"--a phrase which carefully does not promise anything.

I emailed David Burt, the PR manager, and Paul Quinn, the CFO:

I believe that your business process involves the software prioritizing some sites for human review, while adding others directly to the blocked list.

I wanted to give you the opportunity to correct this if inaccurate. Conversely, if I don't hear from you, I will assume that the statement in the preceding paragraph is correct.

No response.

Librarians go to school to learn how to categorize and make other decisions about content. It's really quite remarkable that they would then step back and let the N2H2 software do their job for them, or that the congressmen who passed the Children's Internet Protection Act would think that computers could do a librarian's task. Why have so many otherwise intelligent people bought the idea that N2H2's software can play a role that is far beyond the "state of the art" in AI? I'll let Christopher Locke have the last word:

The alacrity with which these non-[computer]professionals abdicate any role in a debate which will certainly affect their lives is somewhat frightening. The cause for alarm here is the attribution of near-omniscience and a sort of universal compassion to technology.

Part I of this series was on N2H2's financial condition.

Next installment: The N2H2 human review team