Monday, July 11, 2011

Data-Mining Facebook?

Jason Kaufman, a fellow of Harvard's Berkman Center for Internet and Society, put together a research team of five sociologists in 2006: four from Harvard, himself included, and one from UCLA. The team set out to study the Facebook pages of the Harvard College students who would graduate in 2009. They hired student assistants at Harvard to help collect the data as they followed the students through their four years of college. They collected information such as home state, major, political views, network of friends, gender, romantic relationships and preferences. They believed they were redacting the information in a way that would adequately protect the identities of the subjects.

But Kaufman did remark in a 2008 videotaped talk, "Considering the Sociology of Facebook: Harvard Research on Collegiate Social Networking," that using Harvard students as research assistants created an "interesting wrinkle... from a legal point of view..." Suppose a student in the study had set his or her privacy settings to show information only to friends, and a research assistant could see that page only because he or she had been "friended." That is a different situation from the undergraduate whose privacy settings showed his or her pages to the whole world. Nobody at the Berkman Center ever told the undergraduates that their Facebook pages were being studied. The researchers did not want to alarm them. Mr. Kaufman says, "We all agreed that it was not necessary either ethically or legally."

What the sociologists have gathered is a fabulous collection of useful data. They released a portion of it to the public as the "Tastes, Ties and Time" Facebook data release of September 2008.

The Chronicle of Higher Education reports on critics' attacks on the project in "Harvard Researchers Accused of Breaching Students' Privacy," by Marc Parry, July 10, 2011. As early as 2008, Michael Zimmer, a privacy scholar and co-director of the Center for Information Policy Research at the University of Wisconsin, Milwaukee, raised alarms over how easy it was to identify the supposedly "anonymous" university that was the source of the data, and possibly to identify individual students within the study. For instance, only 3 students in the dataset were from Utah, so the Chronicle was able to pinpoint and contact one of them and ask what she thought. (She did not mind her data being out there, but she would rather have been asked.) The concern is that future employers, for instance, might use such data to discriminate.
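The Utah example shows why removing names is not enough: a rare value in an ordinary field can narrow a "redacted" dataset down to a handful of people. As a rough illustration only, here is a small Python sketch of the kind of check that flags such small groups. The records, the field names, and the `small_groups` helper are all made up for this post. They are not the Harvard data, and this is not the method the researchers or Zimmer used.

```python
from collections import Counter

def small_groups(records, fields, k):
    """Return each combination of values for `fields` shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Invented records for illustration; no names, only "harmless" attributes.
records = [
    {"home_state": "Utah", "major": "Economics"},
    {"home_state": "Utah", "major": "History"},
    {"home_state": "Utah", "major": "Biology"},
    {"home_state": "New York", "major": "Economics"},
    {"home_state": "New York", "major": "Economics"},
    {"home_state": "New York", "major": "Economics"},
    {"home_state": "New York", "major": "History"},
    {"home_state": "New York", "major": "History"},
]

# Home state alone already isolates a group of three.
print(small_groups(records, ["home_state"], 5))

# Adding one more field makes each Utah record unique.
print(small_groups(records, ["home_state", "major"], 2))
```

Any group this check reports is a place where someone who already knows a little about a student, such as a classmate or an employer, could guess which row belongs to that student.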

The Chronicle article describes a very unclear battleground, where grant funders push for more data sharing and the privacy standards surrounding social networks are shifting from moment to moment. The author, Marc Parry, does a great job of explaining the increasing risks and difficulties that even researchers with the best of intentions face in trying to safeguard research subjects while doing internet research. He relates a story in which a researcher gathered the Twitter stream of a whole group of people who were protesting a meeting, the Group of 20 Summit in Pittsburgh, probably using the hashtags. During the research period, though, the police cracked down and began to investigate the protesters. One of the major subjects of the Twitter research suddenly deleted all his tweets. But the researchers had already stored them in a separate archive and planned to use the data in a paper. The researchers had not sought prior approval from an Institutional Review Board (IRB), thinking of tweets more as newspaper publications than as utterances that might come under 5th Amendment protections. The researcher told the Chronicle author that he doubted his IRB would have predicted the issue any better than he did. This is a developing area where it is very difficult to imagine how things might blow up.

On the other hand, the Harvard/Facebook data was a unique and difficult-to-gather dataset that many sociologists have wanted to use. Apparently many scholars have applied to use the data. Perhaps scholarship, says Parry, is the biggest casualty.

