US Navy researchers want to build a global social media archive containing 350 billion digital records as part of an ongoing research effort at the Naval Postgraduate School in Monterey, California, run through the Defense Department.
As detailed in the project summary, the military research project aims to provide a better understanding of fundamental social dynamics, the evolution of linguistic communities, and new modes of collective expression, over time and across countries.
In addition, the archive should include messages written in at least 60 languages, with at least 50% of messages in languages other than English. However, the project summary also says the information collected “must be composed solely of publicly available information.”
The remaining minimum requirements for the archive’s 350 billion records include:
Each record in the archive must provide the full text of a social media post, unaltered from its original content and formatting, with all publicly available metadata that was associated with the original posting, including country, language, hashtags, location, handle, timestamp, and URLs.
All records must include the time and date at which each message was sent and the public user handle associated with the message.
Approximate location information, drawn from self-reported user hometowns or other publicly available geolocation information, must be included for at least 20% of the records.
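To make the thresholds concrete, the stated minimums can be expressed as a simple check over a collection of records. The following is only an illustrative sketch: the `Record` class, its field names, the `check_archive` function, and the use of `"en"` as the English language code are all assumptions made for this example, not part of the Navy's specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    """Hypothetical record shape; field names are assumptions for illustration."""
    text: str                       # full, unaltered post text
    handle: str                     # public user handle
    timestamp: datetime             # when the message was sent
    language: str                   # language code, e.g. "en" (assumed convention)
    location: Optional[str] = None  # approximate location, if publicly available


def check_archive(records, min_languages=60, min_non_english=0.5, min_located=0.2):
    """Check a list of records against the minimums quoted in the article."""
    total = len(records)
    if total == 0:
        # An empty archive cannot satisfy any of the minimums.
        return {"languages": False, "non_english": False, "located": False}

    languages = {r.language for r in records}
    non_english = sum(1 for r in records if r.language != "en")
    located = sum(1 for r in records if r.location)

    return {
        "languages": len(languages) >= min_languages,
        "non_english": non_english / total >= min_non_english,
        "located": located / total >= min_located,
    }
```

A real archive of 350 billion records would need a distributed or streaming version of these counts, but the three conditions being checked would be the same.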
According to the project summary, the data will also be used for teaching purposes, giving students new opportunities for thesis research and for developing “big data” analytical skills.
The military research team wants to acquire a large global archive of social media data that provides the full text of all social media posts across all countries and languages, the project's lead researcher, Camber Warren, told Bloomberg.