Daniel Brandt helping the Wikipedians Do No Harm
Posted: Mon May 14, 2018 1:42 pm
If you can believe it, Daniel Brandt is back, engaging with the Wikipedians on his favourite topic: how Wikipedia supposedly protects living people from harm. Or, more accurately, how he can stop Wikipedia being a conduit that lets the world learn anything about him via Google (he has already successfully ensured there is no actual Wikipedia result when his name is entered directly).
Firstly, he has spotted a flaw in how they make sure Google doesn't scrape the contents of talk pages, namely by ensuring the page includes the magic word "__NOINDEX__" where deemed appropriate. The bug occurs when talk pages are archived, a process which apparently drops the marker, thereby allowing Google to index whatever the archive contains. As we know, once that happens, it's out there, forever.
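For what it's worth, the regression wouldn't be hard to test for. Below is a rough sketch (mine, not Wikimedia's, and the page title is just a placeholder) that checks whether an archived talk page still carries the noindex signal, both in the raw wikitext and in the rendered HTML that Google actually sees:

[code]
# Hedged sketch (not Wikimedia's actual tooling) of a check for the bug
# described above: does an archived talk page still tell search engines
# not to index it? The page title below is only a placeholder.
import re

import requests

SITE = "https://en.wikipedia.org"
PAGE = "Talk:Example/Archive_1"  # placeholder, not a real target page

def is_noindexed(title: str) -> bool:
    """True if the rendered page carries a robots meta tag mentioning noindex.

    Checking the rendered HTML matters more than grepping the wikitext,
    because the magic word is often supplied indirectly by a template,
    and the final HTML is what Google actually sees.
    """
    html = requests.get(f"{SITE}/wiki/{title}", timeout=10).text
    return re.search(r'<meta[^>]*robots[^>]*noindex', html, re.I) is not None

def has_magic_word(title: str) -> bool:
    """True if the raw wikitext itself contains the __NOINDEX__ magic word."""
    wikitext = requests.get(
        f"{SITE}/w/index.php",
        params={"title": title, "action": "raw"},
        timeout=10,
    ).text
    return "__NOINDEX__" in wikitext

if __name__ == "__main__":
    print("robots noindex in HTML:", is_noindexed(PAGE))
    print("__NOINDEX__ in wikitext:", has_magic_word(PAGE))
[/code]

If the first check fails on an archive page whose parent talk page passes it, that's the bug.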
A more effective method might have been to develop, by now, a culture, technology framework and safeguarding system that gives reasonable assurance that libel or other harmful content doesn't even make it onto their servers, let alone stick around long enough to be dumped into an internal archive page. Even the busiest talk pages don't get archived for days (and, worryingly, those are probably the places where harmful content is most likely to be seen, since on Wikipedia talk pages controversy equates to traffic), so they really have no excuse, other than the one they use to explain away all their faults: it's a big website with many pages that anyone can edit, and they're all just volunteers.
Until the US adopts some effective legislative means of ensuring people have a right to be forgotten, Brandt is instead engaging with the Wikipediots through their bug-reporting system, as a first attempt at getting it fixed.
https://phabricator.wikimedia.org/T194561
He also has higher aspirations, namely to modify Wikipedia's Biographies of Living Persons policy so that content becomes eligible for immediate removal if the supporting source is dead or is only available via an internet archive. He is suggesting this on the basis that Wikipedia's use of automated technology to rescue dead links with internet archive URLs, wherever they exist, doesn't take into account the reasons why a source might have gone dead (such as a retraction). And because internet archive snapshots are themselves often generated automatically and rarely removed, even in cases like a retraction, any automated technology used to add their links to Wikipedia will simply end up perpetuating information that a victim might previously have succeeded in getting taken down at the original source.
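To make the mechanism concrete, here is a hedged sketch of how this kind of automated dead-link rescue works, using the Wayback Machine's public availability API. It is not the actual code of any Wikipedia bot, and the function names are mine; the point is simply that a 404 caused by a deliberate retraction looks identical to ordinary link rot:

[code]
# Hypothetical sketch of automated "dead link rescue" - NOT the code of any
# actual Wikipedia bot. It simply shows that the approach cannot tell a
# retracted article from an ordinary broken link.
from typing import Optional

import requests

WAYBACK_API = "https://archive.org/wayback/available"

def find_archive_url(source_url: str) -> Optional[str]:
    """Return a Wayback Machine snapshot URL for source_url, if one exists."""
    resp = requests.get(WAYBACK_API, params={"url": source_url}, timeout=10)
    resp.raise_for_status()
    snapshot = resp.json().get("archived_snapshots", {}).get("closest", {})
    return snapshot.get("url") if snapshot.get("available") else None

def rescue_if_dead(source_url: str) -> Optional[str]:
    """If the source looks dead (HTTP error or no response), return an archive link.

    The flaw: a 404 caused by a deliberate retraction is treated exactly the
    same as link rot, so the retracted content gets resurrected.
    """
    try:
        status = requests.head(source_url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        status = None
    if status is None or status >= 400:
        return find_archive_url(source_url)
    return None  # source still alive, nothing to replace
[/code]

The real bots are considerably more sophisticated than this, but nothing in the rescue step asks why the original went away, which is precisely Brandt's point.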
Again, the question has to be asked: surely a better system would be for any and all sources to be proactively archived by Wikipedia at the point of insertion (which could in turn be integrated with a reliable-source whitelist feature). If a periodic automated review then finds that the live source no longer matches their internal copy (or its hash, given it doesn't need to be human readable), their system should remove the citation unless or until it can be ascertained that the change was for non-harmful reasons, which would typically be the source simply being updated or moved. Whatever the reason, under their own model it is unlikely that summoning a human eye to the change would be a waste of their time, and in a small number of cases it would actually ensure harm is prevented.
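A minimal sketch of that hash-at-insertion idea, assuming a simple in-memory registry (all names here are hypothetical, and a real system would hash the extracted article text rather than the raw bytes, which shift with every ad and timestamp):

[code]
# Minimal sketch of the proposed scheme: fingerprint a source when it is
# first cited, then periodically re-check it and flag any mismatch for
# human review instead of silently swapping in an archive copy.
import hashlib

import requests

registry = {}  # url -> SHA-256 hash recorded at the time of insertion

def fingerprint(url: str) -> str:
    """Fetch the source and return a SHA-256 hash of its body."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return hashlib.sha256(resp.content).hexdigest()

def record_on_insertion(url: str) -> None:
    """Called when an editor first adds the source as a citation."""
    registry[url] = fingerprint(url)

def needs_human_review(url: str) -> bool:
    """True if the live source no longer matches its recorded fingerprint.

    A mismatch (or a failed fetch) would see the citation pulled pending a
    human decision on whether the change was harmless (moved, updated) or
    something like a retraction.
    """
    try:
        return fingerprint(url) != registry[url]
    except (requests.RequestException, KeyError):
        return True
[/code]

However heavy that sounds, the fetching and hashing is trivially automatable; only the mismatches would ever need a human.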
What is obvious about the Wikipedia cult is that, for reasons of PR, they will do anything to meet their aspiration of Do No Harm, as long as it doesn't interfere with the basic business model. Which, of course, necessarily makes it easy to do harm, and hard to prevent it. Doing anything remotely responsible undercuts the very thing that made them a success, namely not having the overheads of a traditional encyclopedia, or indeed of any kind of media company that doesn't hide behind Section 230. Those overheads cost money. Lots of money.
As such, Daniel may well get the archiving bug fixed for free by the volunteer code monkeys, if it really is as easy to fix as it sounds, but not the BLP change he wants. Or anything remotely like it. He seems to think there is a legal imperative for the WMF to intervene here and force such a change, but I would argue that, for all the usual reasons, they'll be more than happy to stand by the position that they are only obliged to act in individual cases brought to their attention, and that they have no jurisdiction over how Wikipedia uses internet archive services as a matter of policy, not even where they have arguably contributed to the problem as a liable publisher, through initiatives such as this....
https://blog.wikimedia.org/2016/10/26/i ... ken-links/