The Digital Deal Podcast

Data Lords

Hito Steyerl & Karen Hao | Season 1, Episode 1


In this episode, we unpack the ownership and production of data that feeds today's hungry algorithms. What has thus far been described as a process of extraction reveals itself more and more as production. We talk to artist, filmmaker, and writer Hito Steyerl and award-winning journalist Karen Hao about the hidden labour behind the so-called data 'extraction', its appropriation through practices reminiscent of colonialism, and what needs to change for the AI industry to stop perpetuating harmful practices in which data is yet another commons turned into a commodity.

Resources:
AI Colonialism Series by Karen Hao, Heidi Swart, Andrea Paola Hernández, Nadine Freischlad
Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity by Daron Acemoglu & Simon Johnson


Host & Producer: Ana-Maria Carabelea
Music, Editing & Mixing: Karl Julian Schmidinger
________________________


The Digital Deal Podcast is part of European Digital Deal, a project co-funded by Creative Europe and the Austrian Federal Ministry for Arts, Culture, the Civil Service and Sport. Views and opinions expressed in this podcast are those of the host and guests only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor the European Education and Culture Executive Agency (EACEA) can be held responsible for them.


Ana-Maria Carabelea: Welcome to The Digital Deal Podcast. My name is Ana Carabelea, and today I am joined by Hito Steyerl and Karen Hao. Hito Steyerl is a German filmmaker and writer whose prolific work occupies a highly discursive position between the fields of art, philosophy, and politics and is a deep exploration of late capitalism's social, cultural, and financial imaginaries.

Karen Hao is an award-winning journalist covering the impacts of artificial intelligence on society. She's a contributing writer at The Atlantic and was formerly a foreign correspondent covering China Tech for The Wall Street Journal, as well as a senior editor for AI at MIT Technology Review.  

In this episode, Data Lords, we try to draw a parallel between old capitalist practices - like primitive accumulation or accumulation by dispossession, even colonialism - and the datafication of everything that we're witnessing today. This brings up the question of ownership and its relation to information in the digital age. Can truth be owned? Can we think of who the truth owners are? Or is it more the case that what is owned is the ‘raw material’ - the data? In which case, what does that make us, humans?

As I started thinking about this, two clichés that we often hear came to my mind: Data is the new oil, and If you're not paying for the product, you are the product. One speaks about the value of data, and the other about the value and commodification of human experience, which is being turned into products that can be sold and implicitly owned. Do you think these clichés simply obscure the different relationships between human experience, data, and those who act as rightful owners of these commodities?

Karen Hao: I still find these clichés quite useful. I feel like Data is the new oil in particular is a gift that keeps on giving. I've been spending a lot of time recently researching the mining that data centers require and the emissions they generate. In this context, data as the new oil becomes a great parallel for understanding the environmental impact of data production and AI production. Maybe it obscures the relationship in some ways, but I still find it helpful in framing the parallels between what we see in AI now and industries past. Similarly with If you're not paying for the product: I do think it succinctly continues to capture what we're seeing with new AI technologies today. In the conversation around the new version of AI that we're seeing today, I sometimes feel people have forgotten all of the conversations we had before, critiquing how AI should be developed and the limitations of the technology. So, in a way, it's useful for me to have these clichés to remind us that these things haven't gone away, and that these are themes we should still be thinking about.

Ana-Maria Carabelea: So, in a sense, they're useful in identifying the patterns that we have kept repeating and, apparently, keep repeating.

Hito Steyerl: I also think that they are still quite useful. The first one, Data is the new oil, is still very much relevant if we think through the infrastructure of digital industries, especially in Austria. I think we should even modify it to the question of whether data is the new gas. Because if we look at gas pipelines and the modes of extraction and the dependency on gas - especially Austria's dependency on Russian gas - and its implications for the war machine, we see that it's very, very tricky to be unilaterally dependent on one sort of infrastructure that is extremely unequal and lopsided. So, in the same way that we are talking about gas pipelines, we should consider data pipelines and pipelines of digital workflows as infrastructures that can perpetuate dependency and exploitation.

The second one, If you're not paying for the product, could now be upgraded to If you're not getting paid, you are the producer, because this is what's usually being obscured - that the production of data is not like exploiting a natural resource; data is the product of labour. Look also at the people who are making all of this infrastructure: folding data into training sets gets rid of all the authorship, annotation, origin, etcetera. It's like a washing process - afterwards it looks, if not necessarily clean, then like some kind of supernatural artificial general intelligence. But what it does is basically remove the social connections of the training data and the infrastructure.

Ana-Maria Carabelea: So, if you start mentioning manufacturing or production, then you're forced to think of the labour that goes into it, which is exactly what you were saying, Hito - that we're producers, not necessarily products. Karen, you've written extensively about ghost workers and the perpetuation of colonial practices. Can you tell us a bit about that?

Karen Hao: With the AI industry today, we're just seeing so many patterns where Global North companies are going to the Global South to get their labour and their resources and claim ownership over labour and resources that are not theirs to have. And unfortunately, because a lot of Global South countries have underdeveloped economies from the legacies of colonialism, they are in situations where they have these large bases of cheap labour and a lot of resources that they're willing to sell to companies because they need the money, and they need the opportunity.  

I really love what you said about thinking of the labour and data as production, not just as a natural resource to be extracted. That is so true because there's hundreds of thousands of workers around the world, predominantly in the Global South, that are part of the ‘manufacturing’ of this so-called pristine data that then gets fed into these systems. And the way that these workers are treated and the way they're told to process the data gives us a lot of insight into how these companies think about the role of technology and about the human experience. Like we're seeing now with workers in Kenya where there's this huge movement among those working for the data annotation industry. They are starting to organize and demand labour rights because they have not had any kind of basic minimum protections around their wages or their working hours or their working conditions while working in this industry. I actually think there is a perfect parallel between that experience and what we're seeing now with Hollywood protesters and the kind of devaluing of the labour that is happening among writers and actors in Hollywood. So looking towards the workers within the AI industry is sort of a canary in a coal mine for helping us understand what technology will be developed from this process and how will that technology ultimately affect all of us with its specific worldview. 

Ana-Maria Carabelea: Indeed, we now speak about faux automation, which is exactly this veiling of the actual, quite manual labour that goes into it - labour that it's more convenient not to talk about.

Hito Steyerl: Can I just add a few things? Thank you so much, Karen, for opening up this huge field of research, which was foundational for me in even figuring out that this was happening. This was and still is such an important area of investigation. Now, let me tell you, this has spread much further. Within the European Union, for example, we were able to find out that this kind of ghost work, especially in the form of content moderation, now also falls to people with refugee status. Migrant legislation - or anti-migrant legislation, let's call it - is being exploited to create cheap labourers with, let's say, inadequate legal protection, to be exploited by data companies. We talked to Syrian refugees who are now doing content moderation inside the EU, in German workplaces. This pattern of exploitation that you have uncovered is proliferating into the so-called Global North in the form of these pockets of insufficient legislation and labour rights protection. So, it's super important to keep monitoring this spread of underpaid professions that benefit the AI industry.

Karen Hao: That's such an important point - the use of refugees, and also the use of prisoners. This is starting to happen too. A lot of prisoners are being asked to provide data annotation services for basically no money because they don't have any other option. It reminds me of this German professor I interviewed when I was doing these stories, Florian Alexander Schmidt. He made a really great point. At the time, I was looking at how this was playing out in Venezuela, when Venezuela was going through a devastating economic crisis. And he said: this is a playbook. It's not a coincidence that companies will go to a country in crisis to find this labour. The political incentives and the economic incentives all point them in that direction. And as more crises proliferate around the world - economic crises, climate crises, and other types of crises - we will start to see more and more populations fall under this exploitation. The refugees are such a perfect example. When a geopolitical crisis or climate crisis creates more refugee populations, that all becomes more labour for the data annotation industry.

Hito Steyerl: Let me give you another example. Recently, we've been talking to people who do this kind of annotation labour for self-driving cars. They are in a refugee camp in the Middle East, and they have very, very limited mobility. They cannot travel extensively because there is no bus service. It's not that they are prisoners, but their mobility is extremely limited. Yet they are being shown images from cities like Berlin, for example - we saw our own environment in the images being annotated - and they make the mobility of other people possible, or the mobility of self-driving cars. So, who is actually driving the cars? In fact, it's the people with extremely reduced mobility - people who are also prevented by visa regulations from even leaving the country - who are enabling other people's so-called mobility.

Ana-Maria Carabelea: Can we see two layers of this? On the one hand, this obvious colonial pattern is being replicated, and as you said, Karen, the more you see countries in crisis, the more it will be replicated wherever the crisis is. But can we also see the same sort of pattern in how our day-to-day is being treated, in terms of us providing our data: the day-to-day as the new frontier, the new space that's being colonized, whether it's going to the supermarket or whatever activity we do on a daily basis that seems insignificant, but somehow becomes part of this big data set? 

Hito Steyerl: Whoever produces data one way or another gets treated like this so-called natural resource. As a filmmaker, it's kind of obvious to me that any kind of data you produce, whether it's images or not, ends up as training data and is thus basically divorced from your authorship, from any copyright claims. I'm not necessarily the IP person. I'm fine sharing. But if I know that this is being used to develop models that inflict harm on other people, then I'm not okay with that.

Ana-Maria Carabelea: Then comes the issue of consent, where you might be open to sharing, but with whom and for what? 

Karen Hao: Indeed. I did a story as part of my AI colonialism series about the Māori people in New Zealand. There was a group of journalists who ran an organization called Te Hiku Media. It was a nonprofit radio station, and they did a lot of work to try and figure out how to develop AI technologies that are not colonial. A huge part of that was trying to figure out how to preserve consent and agency around data. One of the main leads of that project always said data is the final frontier of colonization: if we just allow people to take our data and repurpose it for any means, ultimately it always comes back to harm people like us. And he was speaking as an indigenous person from Hawaii who had moved to New Zealand with his Māori partner. It was incredible reporting that story, because they went through so much painstaking effort to figure out how they wanted to develop AI language technologies that would help revitalize the Māori language. They needed the AI technology for the purpose of decolonizing, but they wouldn't use any big tech tools. They didn't want to partner with any big tech platforms or anything like that. They were building everything from scratch. They ended up creating a special data certificate licensing mechanism whereby anyone who wants to partner with them and use that data to develop their own technologies has to go through a rigorous application process and review. It involves all the elders in the community - they hold elder council meetings to discuss every new application for the data and decide whether or not it will ultimately benefit or harm them. It also involves compensation to all the producers of the data: everyone they collected data from consented and had their name documented, so that any time their data is used, it should return to them as some kind of benefit, whether that's the product or service being developed being given to them for free, or actual financial compensation. Watching them re-engineer everything to try not to be colonial, or not perpetuate colonial dynamics, made me see even more how much the global norm around data collection and data use is extremely colonial.

Hito Steyerl: I think that's cutting-edge science and technology: to develop new social forms of sharing and cooperating. I guess what you're describing is a form of data commons, whereby a common repository of data is held and governed in common, a kind of constitution is made for and around it, and the revenue is probably shared. That's a really cutting-edge, revolutionary, technologically innovative social application, and this is where progress could be and hopefully will be made. But this whole process of talking to one another and figuring out the new rules is labour that is not being acknowledged. It's much simpler and cheaper to simply expropriate data and privatize it than to go through this painstaking process of acknowledging the cooperative nature of machine learning models and applications.

Ana-Maria Carabelea: And perhaps it's not in the interest of the current economic system that we operate in, which I think is probably the biggest issue. I think we would all be more willing to take the time and put in the effort to figure out these new models if we knew we were working in a system that was conducive to them.

Karen Hao: The one other dimension that I would add, which I don't talk about very often, is that it's not just the economic incentives of these companies. It's also the political incentives of countries that see AI as some sort of way to derive more state power. We're seeing this very much now with the US-China clash. The US does not even want to implement data privacy regulations, in part because it thinks that would somehow weaken its ability to develop AI, derive power from that, and continue to be the leading global power against China. And there's this persistent desire, not just from Global North countries - you see this even more in the Global South, where governments have this deep-seated desire to “catch up” and to participate in this. The workers I interviewed in Kenya, for example, now have a lawyer who represents them and tries to push legislation through the Kenyan government. She was saying that this is a huge problem: Kenyan legislators are worried about implementing labour protections for these workers, because then what if the companies go to Uganda instead? What if they go back to Venezuela instead? So, in providing their country's labour, in providing their workforce to big tech, they frame it as being part of the technology revolution, as having a seat at the table. I think there are huge political and geopolitical dynamics at play that continue to perpetuate the way we are developing AI now. It's hard to stop when both the political and the economic incentives are pointing in the same direction. There's so little incentive right now for countries to regulate certain things, because they think that somehow this is going to strengthen them, strengthen their economies, and bring benefits to everyone. I think that's a narrative we need to slowly unravel.

Ana-Maria Carabelea: Indeed, and it's a case of either all or none. If it's going to be done, it's got to be done by all. Otherwise, it will always be this imbalance of power. 

Hito Steyerl: In the meantime, this sort of arms race you're describing is creating an even more dangerous situation, like the rapid integration of so-called AI into military applications, now also amplified by the war in Ukraine, where companies just barge in with some kind of new solutions. I mean, just imagine ChatGPT going to war. You don't really want that, right? On all fronts, it's not something one wants to implement at all - not just not in a hasty way; it shouldn't be done at all. But in the wake of the technological arms race you're describing, these are the developments that are happening right now.

Ana-Maria Carabelea: To wrap up, I usually ask my guests to give our listeners a bit of homework and name an article, a research piece, a book, an artwork, or whatever comes to mind that you've recently come across, that you think everyone should know about, and that is relevant to today's episode topic.

Karen Hao: I would recommend a book that I'm reading right now called Power and Progress. It looks at the last thousand years of technological development and questions what we mean when we say technological progress. Who is actually getting the progress from technological advancement? Re-examining the last thousand years through a political economy frame, the authors conclude that essentially we've always seen the same story: it's the people in power who get the progress, at the expense of the people.

Hito Steyerl: Unless everyone else organizes and claims part of the technological benefits - then it's a different story. But this part is not automatic. There is no automatic process by which technology comes to benefit everyone.

Ana-Maria Carabelea: That was also going to be one of my last questions. Can you think of any ways out? 

Hito Steyerl: Organization on every level, taxation, regulation. And the dismantling of this ludicrous narrative of automatic progress through technology, which is just not working anymore. It's also grown so stale - we've been hearing it for the past 30 years now, or, as you just mentioned, for a thousand more.

Karen Hao: In the book, the authors mention that labour cuts across so many things. Having labour rights for workers - for the data producers, for the ghost workers, for everything else that can essentially be framed as labour in the production of this system - is one of the baseline things the authors push for as a way to counterbalance the power being created. This was a new frame for me: thinking about labour as the constant minimal unit, the fundamental building block, of so many of the things the AI industry needs in order to perpetuate itself. I guess that would be what I would encourage as an exit route: better labour laws internationally, as a way to protect artists' work and protect the workers within the industry. And we can go from there.

Ana-Maria Carabelea: Thank you so much for joining us today. 

Karen Hao: Thank you so much for having us. 

Hito Steyerl: Thank you. Thank you.

Ana-Maria Carabelea: That's it for today. Thank you for listening! New episodes of The Digital Deal Podcast come out every two months, wherever you get your podcasts. If you like what you heard, please subscribe, follow us, or share the show with someone you think might like it. The Digital Deal Podcast is part of European Digital Deal, a three-year project co-funded by Creative Europe that investigates the accelerated and often unconsidered adoption of new technologies and their impact on society. If you want to find out more about the project, check out our website: www.ars.electronica.art/eudigitaldeal.
