Fakebox doesn’t detect fake news. It detects articles that are factual/real and labels everything else “fake”.
Where’s the dataset? How did you verify the ground truth? Where are the annotation/labeling guidelines?
What’s the definition of factual/real articles? The dataset appears to have been created by the author, which isn’t necessarily wrong, but to paraphrase Karl Popper (in the context of human knowledge and scientific endeavors):
> There are no ‘pure’ facts available; all observations are functions of subjective factors such as interests, expectations, wishes, etc.
This article is the 5%.
A more accurate way of detecting "fake news" would be interesting, but I fail to see how such a thing could be designed, past simple detection of wishy-washy and avoidant word patterns.
Accuracy alone is not a sufficient measure of a classifier. Better to report precision and recall, or one of the many combined measures (F1, for example).
The OP does not state the label distribution of the training data; it's entirely possible that the split is not balanced 50/50, which would make "95% accuracy" misleading as an indicator of quality.
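To make the imbalance point concrete, here's a minimal sketch in plain Python. The 95/5 split, the labels, and the always-"real" classifier are all made up for illustration; none of this comes from the article or its data. The helper names (`confusion_counts`, `report`) are hypothetical too.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count true/false positives and negatives for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def report(y_true, y_pred, positive="fake"):
    """Return accuracy, precision and recall; ratios with a zero denominator become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical, deliberately imbalanced data: 95 "real" articles, 5 "fake".
y_true = ["real"] * 95 + ["fake"] * 5
# A useless classifier that never flags anything as fake.
y_pred = ["real"] * 100

metrics = report(y_true, y_pred)
print(metrics)  # accuracy is 0.95, precision and recall for "fake" are both 0.0
```

So a model that never catches a single fake article still scores "95% accuracy" on that split. That's why the per-class numbers matter, and why the label distribution needs to be reported alongside them.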
This is one of the reasons why I recommend that Medium thought pieces disclose their data and code instead of just saying "I did AI magic!" to sell a product (and they do charge for their product on their website).
> I found myself drifting in my own interpretation of fake news, getting angry as I came across articles that I didn’t agree with, fighting hard against the urge to only pick ones I thought were right. What was right or wrong anyway?
A good question and I'm not surprised he went a bit crazy.
> The problem of truth is in a way easy to state: what truths are, and what (if anything) makes them true. But this simple statement masks a great deal of controversy. Whether there is a metaphysical problem of truth at all, and if there is, what kind of theory might address it, are all standing issues in the theory of truth. We will see a number of distinct ways of answering these questions.
He’s not detecting fake news. He’s detecting articles that match the writing style of a handful of publications and labeling everything else “fake”.
The model is not based on domains, just the text of the article. Can confirm there was an even number of real and notreal news examples. The data set was eventually split into two categories: written with bias or written without bias. For example, a NYT Opinion piece was considered notreal news.
Where's the demo?
He needs to release/train at least 3 versions with whitelist-blacklist variations for RT, Al Jazeera and Fox News.
The author describes a "fake news detector AI", that is actually a "typically legitimate source of news" data model, combined with a fake news domain blacklist. It doesn't detect fake news. It detects whether a story possibly came from a source you find to typically be legitimate.
This article is fake news.