Why Microsoft’s Copilot AI Falsely Accused Court Reporter of Crimes He Covered

by Simon Thorne, Activist Post:

When German journalist Martin Bernklau typed his name and location into Microsoft’s Copilot to see how his articles would be picked up by the chatbot, the answers horrified him.

Copilot’s results had asserted that Bernklau was an escapee from a psychiatric institution, a convicted child abuser and a conman preying on widowers. For years, Bernklau had served as a court reporter and the artificial intelligence (AI) chatbot had falsely blamed him for the crimes he had covered.

The accusations against Bernklau are not true, of course, and are examples of generative AI “hallucinations”. These are inaccurate or nonsensical responses to a prompt provided by the user and are alarmingly common with this technology. Anyone attempting to use AI should always proceed with great caution, because information from such systems needs validation and verification by humans before it can be trusted.

But why did Copilot hallucinate these terrible and false accusations?

Copilot and other generative AI systems like ChatGPT and Google Gemini are large language models (LLMs). The underlying information processing system in LLMs is known as a “deep learning neural network”, which uses a large amount of human language to “train” its algorithm.

From the training data, the algorithm learns the statistical relationship between different words and how likely certain words are to appear together in a text. This allows the LLM to predict the most likely response based on calculated probabilities. LLMs do not possess actual knowledge.
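The idea of predicting the most likely next word from counted probabilities can be sketched in a few lines of Python. This is a toy illustration only, not how Copilot or ChatGPT is implemented: real LLMs use neural networks trained on vastly more text. The three-sentence corpus and the function names (`train_bigrams`, `next_word_probabilities`, `predict_next`) are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = [
    "the court heard the case",
    "the court heard the appeal",
    "the court heard the case today",
]

def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}

def predict_next(counts, word):
    """Return the single most probable follower, or None for an unseen word."""
    probs = next_word_probabilities(counts, word)
    return max(probs, key=probs.get) if probs else None

model = train_bigrams(corpus)
print(next_word_probabilities(model, "the"))
print(predict_next(model, "the"))
```

The model picks whichever word most often followed "the" in its training text. It has no notion of whether the resulting sentence is true, only of which words tend to appear together.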

The data used to train Copilot and other LLMs is vast. While the exact details of the size and composition of the Copilot or ChatGPT corpora are not publicly disclosed, Copilot incorporates the entire ChatGPT corpus plus Microsoft’s own specific additional articles. The predecessors of ChatGPT4 – ChatGPT3 and 3.5 – are known to have used “hundreds of billions of words”.

Copilot is based on ChatGPT4, which uses a “larger” corpus than ChatGPT3 or 3.5. While we don’t know exactly how many words this is, each new version of ChatGPT has tended to use a corpus orders of magnitude larger than the one before. We also know that the corpus includes books, academic journals and news articles. And herein lies the reason that Copilot hallucinated that Bernklau was responsible for heinous crimes.

Bernklau had regularly reported on criminal trials of abuse, violence and fraud, and his reports were published in national and international newspapers. His articles were presumably included in the language corpus, and they use specific words relating to the nature of the cases.

Since Bernklau spent years reporting in court, when Copilot is asked about him, the most probable words associated with his name relate to the crimes he has covered as a reporter. This is not the only case of its kind and we will probably see more in years to come.
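A rough sketch of how such an association can arise, again in Python and again a deliberate simplification: the headlines below are invented, “Reporter X” is a hypothetical byline, and the helper `words_near_name` is made up for the example. It simply counts which words appear in the same text as a name.

```python
from collections import Counter

# Invented headlines; "Reporter X" is a hypothetical byline.
articles = [
    "Reporter X reports on fraud trial",
    "Reporter X covers abuse trial verdict",
    "fraud trial ends, writes Reporter X",
    "Reporter X wins journalism award",
]
NAME = "reporter x"
# Small hand-picked list of filler words to ignore (an assumption of this sketch).
STOPWORDS = {"on", "reports", "covers", "writes", "ends", "wins"}

def words_near_name(texts, name, stopwords):
    """Count the words that appear in the same text as `name`."""
    counts = Counter()
    for text in texts:
        lowered = text.lower().replace(",", "")
        if name in lowered:
            remainder = lowered.replace(name, " ")
            counts.update(w for w in remainder.split() if w not in stopwords)
    return counts

associations = words_near_name(articles, NAME, STOPWORDS)
print(associations.most_common(3))
```

The counts record that the name appears alongside “trial” and “fraud” more often than alongside “award”, but nothing in them records that the person wrote about the trial rather than stood in the dock. A system that works only from such statistics has no way to tell the reporter from the accused.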

In 2023, US talk radio host Mark Walters sued OpenAI, the company which owns ChatGPT, for defamation. Walters hosts a show called Armed American Radio, which explores and promotes gun ownership rights in the US.
