Researchers, scared by their own work, hold back 'deepfakes for text - [technology]
05:19 PM EST - Feb,15 2019 - post a comment
OpenAI, a non-profit research company investigating "the path to safe artificial intelligence," has developed a machine learning system called Generative Pre-trained Transformer-2 (GPT-2 ), capable of generating text based on brief writing prompts. The result comes so close to mimicking human writing that it could potentially be used for "deepfake" content. Built based on 40 gigabytes of text retrieved from sources on the Internet (including "all outbound links from Reddit, a social media platform, which received at least 3 karma"), GPT-2 generates plausible "news" stories and other text that match the style and content of a brief text prompt.
The performance of the system was so disconcerting, now the researchers are only releasing a reduced version of GPT-2 based on a much smaller text corpus. In a blog post on the project and this decision, researchers Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever wrote:
Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights. Nearly a year ago we wrote in the OpenAI Charter: "we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research," and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.