A recent survey performed by Princeton senior Sauhard Sahi under the supervision of Edward Felten attempted to discover the types of files available on BitTorrent, a popular peer-to-peer file distribution protocol. According to the study, 10 percent of the shared files contained music:
“For the music category, the predominant encoding format for music was MP3, there were some albums ripped to WMA (Windows Media Audio, a Microsoft codec), and there were also ISO images and multi-part RAR archives. There is still a bias towards recent albums and songs, but it is not as strongly evident as it is for movies—perhaps because people are more willing to continue seeding music even after it is no longer new, so these torrents are able to stay alive longer in the DHT. In descending order, we found that 78% of music torrents in our sample were in English, 6% were in Russian, 4% were in Spanish, 2% were in Japanese and Chinese each, and other infrequent languages appeared 1% each.”
The survey analyzed a uniform, random sample of files “via the trackerless variant of BitTorrent, using the Mainline DHT.” The files, totaling 1021, were then organized based on their file type, language, and apparent copyright status. Before providing their results, Sahi and Felten also explain that their results only apply to the Mainline trackerless version of BitTorrent (admitting that “other parts of the BitTorrent ecosystem might be different”) and that “all files that were available were equally likely to appear in the sample.”
Depending on your personal experience with BitTorrent, the survey’s results may or may not surprise you. With regard to file types, Sahi discovered that non-pornographic movies and shows made up 46% of the sample, games and software 14%, pornography 14%, books and guides 1%, and images 1%, leaving only 10% of the files as music, with the remaining 14% unclassifiable. No category necessarily had a dominant file format, although every category was dominated by files in English.
Each file was also analyzed based on apparent copyright status and was classified as “likely non-infringing” if it “appeared to be in the public domain, [was] freely available through legitimate channels, [or was] user-generated content,” an admitted judgment call. Based on this definition, Sahi and Felten concluded that only 10 of the files in the entire sample were “likely non-infringing,” meaning that roughly 99% of the sample was “likely infringing,” including all 98 of the music files.
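As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in the summary (1021 files, 10 likely non-infringing, 98 music files); the variable names are made up for the example and nothing here comes from Sahi and Felten's own analysis.

```python
# Figures quoted in the survey summary; variable names are illustrative only.
sample_size = 1021
likely_non_infringing = 10
music_files = 98

# Everything not judged "likely non-infringing" counts as "likely infringing".
likely_infringing = sample_size - likely_non_infringing
infringing_share = likely_infringing / sample_size

# Music's share of the whole sample, which the summary rounds to 10%.
music_share = music_files / sample_size

print(f"Likely infringing: {likely_infringing} of {sample_size} ({infringing_share:.1%})")
print(f"Music files: {music_files} of {sample_size} ({music_share:.1%})")
```

The 98 music files work out to a little under 10% of the sample, which is consistent with the rounded figure given for the music category.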
Although Sahi and Felten deserve to be commended for their work, the study either omits or ignores a number of important factors. Because the survey focused only on the trackerless version of BitTorrent, it’s likely that the users are knowingly distributing copyright-infringing files, so the percentage of “likely infringing” files is not very surprising. The post, which includes only a summary of the survey’s results, also fails to provide a time frame, a geographic focus, or specifics regarding the surveying process, all of which could be very helpful in further analyzing the available data.
Furthermore, although this survey (well, at least the summary) provides a decent amount of information about file availability on BitTorrent, it should not be considered representative of torrents or “likely infringing” material in general, as it offers a very limited scope of information and fails to mention torrenting websites completely dedicated to a specific file type (all music, all movies, etc.).