Friday, December 17, 2010

New Study Using Google Books

The Boston Globe reports on a fascinating cooperative effort in which Google Books has created a new tool, the Google Books Ngram Viewer, which lets a researcher sift through the materials scanned into the Google Books project, automatically calculate how often a word appears in each year's books, and watch that frequency change over time. You can then compare the changing frequencies of different words across decades or centuries.

It can be very interesting. The demonstration at Google Labs, linked above, compares "Atlantis" and "El Dorado." But perhaps meatier questions (ha, ha) are raised by the examples in the Globe article. The online article reproduces the chart I saw in my print paper, and it is easier to read online. They looked at the changing frequencies of food terms: sausage, ice cream, hamburger, steak, pizza, pasta, and sushi. As you might imagine, in English-language publications pizza and sushi in particular, and to a lesser extent pasta, really only began appearing after returning World War II veterans popularized them. Growing acceptance of ground beef, perhaps improved food inspection, and certainly the rise of fast-food chains have all increased the frequency of "hamburger."

They also tracked the changing names for types of influenza, and they looked at the frequency of the word "God." This last example in particular allows the reporter to explain that the new tool is merely that: a new addition to the scholar's tool chest. It does not take the place of the scholar, who eventually will have to sit down and read at least a portion of the literature. It makes a great difference whether "God" appears in a prayer, an exclamation, or a discussion of theology. So the graphs carry a certain amount of meaning, but to really understand WHAT they mean, the scholar still needs to visit the literature.

It's tempting to play with the data, but you really need to download a whole bunch of tiny files to begin. You have to be dedicated to this.
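For anyone tempted to dig into those files, here is a rough Python sketch of roughly the calculation behind the viewer's graphs: a word's count in a given year divided by the total number of words scanned for that year. The tab-separated layout (word, year, match count, then other columns) and the function names are my own assumptions for illustration, not Google's documented format, and the numbers in the usage lines are invented.

```python
from collections import defaultdict


def parse_counts(lines):
    """Parse tab-separated 1-gram lines into {word: {year: match_count}}.

    Assumed layout (an illustration, not an official spec): each line starts
    with word, year, match_count; any further columns are ignored.
    """
    counts = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word, year, matches = fields[0], int(fields[1]), int(fields[2])
        # Add to any count already seen for this word and year.
        counts[word][year] = counts[word].get(year, 0) + matches
    return counts


def relative_frequency(counts, totals, word):
    """Return {year: word's share of all words scanned that year}.

    `totals` maps year -> total words scanned that year (assumed to come
    from a separate totals file). Years with a missing or zero total are
    skipped rather than divided by zero.
    """
    series = {}
    for year, matches in sorted(counts.get(word, {}).items()):
        total = totals.get(year, 0)
        if total:
            series[year] = matches / total
    return series


if __name__ == "__main__":
    # Invented sample lines and totals, for illustration only.
    sample = [
        "pizza\t1940\t5\t5\t3",
        "pizza\t1960\t40\t30\t20",
        "sushi\t1960\t2\t2\t1",
    ]
    totals = {1940: 1_000_000, 1960: 2_000_000}
    print(relative_frequency(parse_counts(sample), totals, "pizza"))
    # {1940: 5e-06, 1960: 2e-05}
```

Dividing by each year's total matters because far more books were printed (and scanned) in recent decades; raw counts alone would make nearly every word look more popular over time.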
