The Unz Review • An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
John Derbyshire
Drilling Through Data
The Numerati, by Stephen Baker

The world is buried in data, great banks and drifts of the stuff. In recent years a new technology has emerged: computer programs that will drill through it all to pick out hidden patterns and trends — information that may be useful to marketers, politicians, employers, doctors, match-makers, or national-security analysts. Such programs are extraordinarily sophisticated, and their creators need to be very clever indeed. A doctorate in math or computer science is pretty much required. Stephen Baker calls such whizzes the Numerati. Using “data mining,” they seek out veins of useful ore in the mountains of facts that computers accumulate every day.

In The Numerati, Mr. Baker offers a highly readable and fascinating account of the number-driven world we now live in. He shows us, for instance, how political consultants, mining databases that track consumer and “lifestyle” preferences, sort us into tribes by behavioral proxy. Cat owner? Likely Democrat. NRA member? Probably Republican. Mailings and phone calls can then be targeted more accurately. Health professionals, especially when treating older patients, are now monitoring such things as weight, body temperature and pulse by having a computer follow data streams from sensors on clothing or even from sensor-laden “magic carpets” laid around the house. Disturbing patterns prompt the computer to signal a problem. The Numerati are taking over dating services, too. How do you find that special one in a million? By mining the data of the million. How do you improve your own chances of being found? By the same techniques that companies use to show up first in a Google inquiry — “search engine optimization,” now a flourishing industry.

The Numerati are even mining the output of bloggers, those stream-of-consciousness online diarists and self-promoters. “What makes the blog world especially valuable to marketers,” Mr. Baker writes, is “its unfiltered immediacy.” What do consumers think of your new product? What desires are still not satisfied by products of this kind? You can commission a poll or wait for the sales figures to come in … or you can read the blogs. Better yet, you can hire Numerati to write programs that will read them for you, since there are now more than 20 million blogs in the U.S. alone.

There is active advertising to be done on blogs, too. If you read these things, or write one, you know that Google’s AdSense service will automatically place context-related ads on a blog page, splitting the click-measured revenue with the blogger. So far, so good. But AdSense has set in motion an ugly arms race online as robot bloggers (clever computer programs) have generated hundreds of thousands of spam blogs, or “splogs.”

A splog, though unreadable, is seeded with words that will attract Google ads. A computer user may be annoyed at finding himself staring at a screen full of gibberish but click on an ad anyway, allowing the robot blogger to harvest revenue. This sleight of hand has the Numerati hard at work getting their software to distinguish between a blog and a splog. Mr. Baker gives a helpful sketch of the math involved, with each blog reduced to a vector in a space of several dozen dimensions.
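To give a feel for the vector idea, here is a minimal sketch in Python. It is not the method described in the book: it uses a plain nearest-centroid rule with Euclidean distance, and it works in three dimensions rather than several dozen. The feature names (ad-keyword density, repeated-phrase fraction, links per post) and every number in the sample data are assumptions invented for illustration.

```python
import math

# Hypothetical features per blog (all values invented for illustration):
# [ad-keyword density, repeated-phrase fraction, outbound links per post]
KNOWN_BLOGS = [[0.05, 0.10, 0.2], [0.08, 0.05, 0.3]]
KNOWN_SPLOGS = [[0.60, 0.70, 0.9], [0.70, 0.80, 0.8]]


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify(vector, blogs=KNOWN_BLOGS, splogs=KNOWN_SPLOGS):
    """Label a vector by whichever class centroid lies closer to it."""
    to_blog = math.dist(vector, centroid(blogs))
    to_splog = math.dist(vector, centroid(splogs))
    return "splog" if to_splog < to_blog else "blog"


if __name__ == "__main__":
    print(classify([0.65, 0.70, 0.90]))
    print(classify([0.04, 0.06, 0.25]))
```

A real classifier would need many more dimensions and a far larger set of labeled examples; the point here is only that once each blog is a point in space, "blog or splog?" becomes a question of which cluster the point sits nearer to.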

In Mr. Baker’s chapter on terrorism, we meet Numerati who seek traces of the abnormal and unexpected in their data sets and who must then try to identify the individual “subjects of interest” who are generating those traces. The task of matching abnormal data to actual individuals, though, presents problems — their names, for example. Researching a book about math once, I turned up 32 different Latin-alphabet spellings of the Russian name “Chebyshev.” Arabic, Indian, Chinese and African names present especially daunting challenges. Mr. Baker quotes a Numeratus, a Ph.D. in computational linguistics, who has researched the electronic recognition of names for more than 20 years: “Untangling global names,” he says, “will continue to confound us for generations.”
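The spelling problem can be illustrated with a small Python sketch. It reduces each transliteration to a crude consonant "key" so that variant spellings of one name collide on the same key. The substitution rules below are hand-picked assumptions tuned to a handful of Chebyshev variants; they are not the technique used by the researcher Mr. Baker quotes, and they would fail on most other names.

```python
import re
from collections import defaultdict

# Hand-written, illustrative rules (assumptions): fold common
# transliteration digraphs into one symbol, and treat f/ff/w as v.
_RULES = [
    ("tsch", "C"),
    ("tch", "C"),
    ("sch", "C"),
    ("ch", "C"),
    ("sh", "C"),
    ("ff", "v"),
    ("f", "v"),
    ("w", "v"),
]


def name_key(name):
    """Reduce a name to a rough consonant skeleton for comparison."""
    text = name.lower()
    for old, new in _RULES:
        text = text.replace(old, new)
    # Drop vowels (and y) so that e/o/i/y variations do not matter.
    return re.sub(r"[aeiouy]", "", text)


def group_names(names):
    """Group spellings that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    spellings = ["Chebyshev", "Tchebycheff", "Tschebyscheff",
                 "Tchebichef", "Chebyshov", "Chekhov"]
    for key, members in group_names(spellings).items():
        print(key, members)
```

Even this toy version shows the trade-off: rules loose enough to merge every Chebyshev spelling will sooner or later merge two different people, which is part of why the problem is so stubborn.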

To make things worse, terrorists themselves are data-savvy and skillful exploiters of the Internet. “Hundreds of Dutch Web Sites Hacked by Islamic Hackers” reads the headline on a technical news site I was just reading. Jihadists may want to take us back to the seventh century, but they are willing to detour through the 21st to get us there. It doesn’t help that our National Security Agency, the proper home of anti-terrorist Numerati, is restricted to hiring U.S. citizens and paying civil-service salaries while its competitors in recruitment (Yahoo, Google, IBM Research) can cast their nets worldwide and engage in bidding wars for top talent.

So the Numerati follow the electronic trails that we all now leave behind us as we work, shop, travel, date, trade, or fall sick: What then of our privacy? What if the NSA, having scrutinized my data and determined that I am not a terrorist, sees that I may be cheating on my taxes? Or that I am running for public office while subscribing to a pornography service? Mr. Baker cites Jeff Jonas, a security Numeratus who got his start working for casinos (places also keen to spot “subjects of interest”). “We technologists,” Mr. Jonas warns, “had better spend a little more time thinking about what we’re creating.” Mr. Baker acknowledges that privacy is a problem — we are, after all, the raw material of data mining. Are we also its beneficiaries? He offers a qualified “yes.”

(Republished from The Wall Street Journal by permission of author or representative)
 
• Category: Science • Tags: Review, Statistics 