Personal photos of Australian children secretly used to train AI


Is your child's image being used to generate AI pictures? Source: SBS News


It's emerged that photos of Australian children have been included in a massive dataset used to train Artificial Intelligence. Researchers from Human Rights Watch made the discovery, following a supply chain investigation linked to the release of deep-fake images of up to 50 girls from a high school in Melbourne last month.



TRANSCRIPT

A new report from Human Rights Watch has revealed that personal photos of Australian children are being used to train artificial intelligence tools, without the consent or knowledge of either the children or their family.

The analysis from Human Rights Watch has found that a data set used to train powerful A-I tools, which was built by scraping data from most of the internet, contains links to identifiable photos of Australian children.

Hye Jung Han is the Human Rights Watch children's rights and technology researcher; she told SBS that many of the photos they found in the data set were of intimate family moments.

“They were not intended for anyone other than family and friends to see. I mean, one of the photos that I'm thinking about is, it's the first seconds of a child being born, you know, the doctor's holding the baby in their hands and the baby is covered in amniotic fluid, still connected to its mother. You know, this is not a photo that a family member took in order for it to be scraped into an AI tool to then be weaponised against other kids in creating sexual deep-fakes. Not at all.”

Earlier this year, about 50 girls from a high school in Melbourne were victims of the non-consensual creation of sexually explicit deep-fakes created using A-I image tools.

Attorney General Mark Dreyfus recently introduced in parliament reforms banning the non-consensual creation or sharing of sexually explicit deep-fakes of adults.

“Digitally created and altered sexually explicit material that's shared without consent is a damaging and deeply distressing form of abuse. This insidious behaviour is degrading, humiliating and dehumanising for victims. Such acts are overwhelmingly targeted at women and girls, and perpetuate harmful gender stereotypes and gender-based violence. This bill delivers on a commitment made by the Albanese government following the national cabinet held in May to address gender-based violence.”

The reforms, however, note that such imagery of children would continue to be treated as child abuse material under the Criminal Code.

“The new offences will apply to material depicting adults, with child abuse material continuing to be dealt with comprehensively in a separate division of the Criminal Code, which includes detailed offences and heavy penalties.”

Hye Jung Han says this approach misses the deeper problem: children's personal data remains unprotected from misuse.

“You know, actually children and families absolutely have the right to post joyful moments of their lives online, and at the same time, they absolutely have the right to expect that their personal images will be protected by law, by the government, against any kind of misuse. So it's actually not fair to expect children and parents to have to try and protect themselves against a technology that's fundamentally impossible to protect against. It's actually the responsibility of the government to finally pass a comprehensive child data privacy law, which on paper they said they would do soon, in August. That would really protect children's data privacy.”

Human Rights Watch says that included with some of the photos were children's full names and ages, with some data even including the name of the school or pre-school they attended.

The findings show that most of these photos are not available anywhere else on the internet, meaning they were likely scraped from private sites like personal blogs, school photographer uploads and other private photo sharing sites.

Simon Lucey is the director of the Australian Institute for Machine Learning at the University of Adelaide. He says web scraping tools now collect massive amounts of data, making it even more difficult to monitor.

“It's unfortunate, but it's the nature of how some of these models are built, in the sense that they use these things called web crawlers that essentially sort of, like, go through the internet and collect images and text. And the reality of modern AI is the more data that you get, the better the results. So we're in this sort of weird scenario where, say 10 years ago, it would have been much easier to go and check a data set because of the smaller scale. But with these data sets, it's sort of extraordinarily big.”

Another concerning finding in the Human Rights Watch report was the use of images of First Nations children in Australia.

Hye Jung Han says many of the photos have been scraped from parts of the internet that aren't usually accessible to everyone and the use of First Nations children's photos in the data set poses specific harms to First Nations communities.

“Some of these photos come from school websites, where schools wanted to share with parents and kids the images they'd taken at school events, and they posted these images of kids on parts of the website that are not really publicly accessible. And yet this data set, this AI data set, was scraping from that. And the same thing has been happening, whether it's on YouTube or Flickr, or other video and photo sharing platforms as well. And the other thing that is really astonishing to me is that this is the first time that anyone's reported that images of First Nations people, much less First Nations children, are scraped into these data sets, and that, of course, poses particular harms for First Nations children.”

Human Rights Watch says that once data is scraped, even if the original collector of that data removes it from their dataset, the A-I model has already been trained with it and will not unlearn the information.

Of even further concern is the potential use of A-I tools in the creation and manipulation of sexually explicit imagery of children.

Simon Lucey says that even when companies say they protect the data, the rapid scale of expansion in the A-I space means there are often data leaks and misuse.

“There's been some recent studies, too, around how these new AI models can potentially leak data. I think a lot of people in AI have sort of stood behind this wall, thinking, 'All right, well, we're essentially just getting statistics, we're just averaging things, and so there's no way for individual images to leak out'. And there's been some very interesting research showing that in some instances that can happen, and I think the confluence of these two things together has got people worried.”

Simon Lucey says the issue can be worsened by the fact that legislation around new technology often lags, with harm reduction and prevention measures only able to fully address the problems once potential harms are realised.

He says that if Australia is to effectively deal with the rise of A-I, it can't simply stand back and follow someone else's lead.

“We actually have to be driving the innovation piece on this. We actually have to be actively investing in AI, in responsible AI research: how can we make sure that AI is being used in a responsible way? Because part of this is actually that we can't just turn it off. We want to make sure that we are actually ingesting data and using data in a way that makes sure that people's privacy is preserved, but also that we're getting the benefit. AI is doing some wonderful things at the moment, finding new drug therapies, new ways of building catalysts and batteries to battle climate change. And so our society and the world would actually be poorer if we were just to pause all this, but it has to be a balanced approach between legislation and innovation.”

The government is looking at making changes to the Privacy Act which would better protect children.

So, while the report's findings raise many concerns among parents about the privacy of their children online, Hye Jung Han says the promise of new legislation in the coming months will hopefully ensure better protections.

“I think it is promising. Yes, usually laws take a long time to be adopted and to be enforced. But this is one instance in which the government has promised that next month they would announce a child data privacy law, and I think everyone has to hold the government accountable to actually doing so, in a way that would actually really respect children's rights.”
