Statistical Method Developed to Detect Fake or Computer-Generated Text
Researchers at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and IBM have developed a statistical method for distinguishing computer-generated, or fake, text from text written by humans.
Researchers Sebastian Gehrmann and Hendrik Strobelt (IBM) started from the observation that natural-language generators are trained on tens of millions of online texts and mimic human language by predicting which words most often follow one another. For example, 'I' is frequently followed by 'have' or 'am'. Because a generator tends to pick such highly likely words, machine-written text is more predictable than human writing. Building on this idea, they developed a method that identifies predictable text rather than flagging errors. Gehrmann and Strobelt's method, called GLTR, is built on the public version of OpenAI's GPT-2 model, which was trained on 45 million texts from websites. Because it uses GPT-2 to detect generated text, GLTR works best against GPT-2, but it also performs well against other models.
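To make the idea of "predictable text" concrete, here is a minimal Python sketch of a rank-based check in the spirit of GLTR; it is not GLTR's actual code. As an assumption made so the example is self-contained, a tiny bigram model trained on four made-up sentences stands in for GPT-2, and the function names (`train_bigram`, `token_ranks`, `bucket`, `predictability`) are hypothetical. For each word, the sketch asks where that word ranks among the model's guesses for the next word given the previous one. Text in which most words are the model's top guesses looks machine-like. The `bucket` thresholds mirror the top-10, top-100 and top-1,000 bands that GLTR's demo uses to colour words.

```python
from __future__ import annotations

from collections import Counter, defaultdict


# Hypothetical toy stand-in for GPT-2: a bigram model that only knows which
# word tends to follow which in a tiny training corpus.
def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def token_ranks(text: str, follows: dict[str, Counter]) -> list[int | None]:
    """Rank of each word (1 = model's top guess) given the word before it.

    Returns None for a word the model never saw after the previous word.
    """
    words = text.lower().split()
    ranks: list[int | None] = []
    for prev, nxt in zip(words, words[1:]):
        # Candidate next words, most frequent first (ties keep insertion order).
        predictions = [w for w, _ in follows.get(prev, Counter()).most_common()]
        ranks.append(predictions.index(nxt) + 1 if nxt in predictions else None)
    return ranks


def bucket(rank: int | None) -> str:
    """GLTR-style band for a rank: top 10, top 100, top 1000, or other."""
    if rank is not None and rank <= 10:
        return "top-10"
    if rank is not None and rank <= 100:
        return "top-100"
    if rank is not None and rank <= 1000:
        return "top-1000"
    return "other"


def predictability(text: str, follows: dict[str, Counter], k: int = 1) -> float:
    """Fraction of words that fall within the model's top-k guesses."""
    ranks = token_ranks(text, follows)
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


corpus = ["I am happy", "I am here", "I have a dog", "I am tired"]
model = train_bigram(corpus)

print(token_ranks("I am happy", model))      # [1, 1]
print(token_ranks("I have a dog", model))    # [2, 1, 1]
print(token_ranks("I like cats", model))     # [None, None]
print(predictability("I have a dog", model))  # 0.6666666666666666
print([bucket(r) for r in token_ranks("I like cats", model)])  # ['other', 'other']
```

With a real language model, the previous-word lookup would be replaced by the model's full next-token distribution given all of the preceding text, and ranks would span thousands of positions rather than the handful seen in this toy corpus.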