My writing is “like” Arthur C. Clarke’s. A few posts are written “like” David Foster Wallace, James Joyce, Kurt Vonnegut, and Anne Rice. So proclaims the I Write Like website. Some of these writers (Wallace, Joyce) I have never read. The others I haven’t read in this century. Yet a similar writing-analysis site by Mark Allen Thornton of Princeton gives different results.
For example, iwl.me says my post on Sri Lanka’s immortals is “like” Arthur C. Clarke’s. Yet Mark’s system doesn’t list Clarke as a probable author for the same text. This is the obvious result of the different criteria and data used by each system.
The I Write Like site compares input text against the work of fifty writers, using a Naive Bayes classifier as its algorithm. Mark Thornton’s system uses data from Project Gutenberg. Its algorithms and toolsets are as different as its results.
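A Naive Bayes text classifier of this sort is simple enough to sketch in a few dozen lines. Below is a minimal, illustrative Python version; it is not iwl.me’s actual code, and the author names and training snippets are made up for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Crude tokenisation for illustration; real systems do far more.
    return text.lower().split()

class NaiveBayesStyleClassifier:
    """A minimal multinomial Naive Bayes classifier over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # author -> word frequencies
        self.doc_counts = Counter()              # author -> training samples
        self.vocab = set()

    def train(self, author, text):
        words = tokenize(text)
        self.word_counts[author].update(words)
        self.doc_counts[author] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_author, best_score = None, float("-inf")
        for author in self.doc_counts:
            # Log prior: how common this author is in the training data.
            score = math.log(self.doc_counts[author] / total_docs)
            counts = self.word_counts[author]
            total_words = sum(counts.values())
            for word in words:
                # Laplace smoothing so unseen words don't zero out the score.
                p = (counts[word] + 1) / (total_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_author, best_score = author, score
        return best_author

# Toy usage with invented training snippets (not real quotations):
clf = NaiveBayesStyleClassifier()
clf.train("Clarke", "the stars wheeled overhead in the silent dark of space")
clf.train("Vonnegut", "so it goes and so it goes listen billy has come unstuck")
result = clf.classify("the silent stars of space")  # -> "Clarke"
```

The point of the sketch is how much rides on the choices baked in: the fifty authors, the training texts, the tokenisation. Change any of them and the verdict changes, which is exactly why two such systems disagree.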
In any other sphere of life, such variations in criteria would raise howls of “bananas vs mangoes” comparisons. Such concerns are lost in the “magic” of machine learning. There is no public discussion of the criteria or logic behind the algorithms in such systems. It’s deemed too complicated for “public” “layman” discussion. So the “public” response is to marvel at the computer magic without a word about their inner workings.
Systems powered by machine learning and AI algorithms are spreading through the hidden plumbing of modern life (third world or not). They have a say in everything from financial systems that can wreck nations, to the ads we see on social media, to how our digital lives are scrutinised by global spy agencies. It extends to how data-consuming organisations classify us: whether we are ideal targets for ads, gay, potential terrorists, pregnant, ill, and much more.
Yet there is no widespread push for scrutiny of how these systems work.
iwl.me and Mark’s system have their code and thinking open to the public. The larger systems impacting society are hidden behind a veil of proprietary secrecy. Once it’s coded by a FEW guys in a FEW offices in Northern California and a FEW other places, that’s it. The cycles of machine learning will then evolve the core algorithms until they reach a stage where no human can understand them. By then we will be grovelling before our digital gods and the priests who control them.
Thankfully the likes of Sara Wachter-Boettcher have made a start. I have yet to read her book Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech. The title has the ring of making the issue accessible to the “non-techie”. For the sake of all of us, I hope it does.
It was Yudhanjaya’s blog post that got me writing this post in the first place. So it’s all his fault.