The Digital Deal Podcast

Truth Makers (Part I)

Kasia Chmielinski & Ndapewa Onyothi Season 1 Episode 2


In this episode, we discuss how machine learning and artificial intelligence adopt hegemonic discourse and existing biases and, what's more, amplify them, and we talk to some of the people out there who are fighting back. Our guests are Kasia Chmielinski (they/them) - Co-Founder of the Data Nutrition Project, an initiative that builds tools to mitigate bias in artificial intelligence - and Ndapewa Onyothi Wilhelmina Nekoto - an independent researcher & community builder, part of Masakhane.

Resources:
QT.bot - Sitting here with you in the future by Lucas LaRochelle (CA)
Ceux sans qui la terre ne serait pas la terre by David Shongo (CD)

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell
Braiding Sweetgrass by Robin Wall Kimmerer


Host & Producer: Ana-Maria Carabelea
Editing: Ana-Maria Carabelea
Music: Karl Julian Schmidinger
________________________


The Digital Deal Podcast is part of European Digital Deal, a project co-funded by Creative Europe and the Austrian Federal Ministry for Arts, Culture, the Civil Service and Sport. Views and opinions expressed in this podcast are those of the host and guests only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor the European Education and Culture Executive Agency (EACEA) can be held responsible for them.


Ana-Maria Carabelea: Welcome to The Digital Deal Podcast, the series where we talk about how new technologies reshape our democracies and how artists and critical thinkers can help us make sense of these changes. We're recording this episode at the Ars Electronica Festival 2023 in our beloved location, Post City, so at times you'll hear trains in the background. My name is Ana Carabelea, and for the first part of this episode, I'm joined by Kasia Chmielinski and Ndapewa Onyothi.

Kasia is the co-founder of the Data Nutrition Project, an initiative that builds tools to mitigate bias in artificial intelligence, and a fellow at Stanford University focused on building responsible data systems. Ndapewa is an independent researcher and community builder, part of Masakhane, a grassroots organization whose mission is to strengthen and spur NLP research in African languages for Africans by Africans.

In our last episode, we talked about how machine learning and artificial intelligence, as technologies coming out of a particular political and economic system, perpetuate a logic of extraction, accumulation, and appropriation, but also give rise to new forms of these practices. In this episode, we talk about how machine learning and artificial intelligence integrate hegemonic discourse and existing biases, and what's more, amplify them, and we talk to some of the people out there who are fighting back.

Kasia and Ndapewa, welcome! Thank you so much for joining me. I wanted to start from the very basics: the problems with the data sets that we're feeding algorithms. Let's try to frame that for our listeners.

Kasia Chmielinski: Well, first, thanks for having us. It's really nice to be able to meet in person. Still feels special. So, the problems with the data sets.

Ana-Maria Carabelea: Where do you begin?

Kasia Chmielinski: Where do you begin? Yeah, it's such a real question. I think it makes sense maybe to talk a little bit about the ecosystem of data that we have, because there are different problems in different parts of the ecosystem. I think we're living in an age that we're going to look back on pretty soon, I hope, and say, that was crazy, and I'm so glad that it's changed. But we're currently still living in an age where there's just data everywhere. And the ability to access data is pretty universal: if you have certain skill sets, you're from certain parts of the world, you speak certain languages, there's a lot of data out there, and there's a lot more, kind of, every day. So, there's a lot of found data and there are a lot of ways to create data sets. You can scrape the web, you can build little robots that go out and capture "everything" that's on the web. Again, this is very specific as to what parts of the Internet, and even just the Internet itself and who's able to contribute to that as a repository of information. And then there is a lot of data that you can buy on people, and that's collected by corporations, mostly aggregated, and then sold. So, you have lots of different problems across this. There are issues about who's actually represented in this data and who's not there. We have massive issues around digital divides globally - who's there and who's not is kind of a basic question. Then you have access to that - who has the money and the ability to get that data? The fact that a lot of the data is being bought and sold means it doesn't really belong to the people that it's about. It really belongs to the corporations. And then that data is used to do things, "for us or on us", that have to do with marketing and advertising and statistical analysis. And all this means that if you are building systems on data - which is what most of our applications, our software, AI are doing - the quality of those applications will never be better than the quality of the data. So, all those problems are just going to get perpetuated and propagated through all the systems that are built on that data.

Ndapewa Onyothi: I totally agree. I mean, I come from the Global South. We don't have data that represents us, we don't have data with which apps can work for us in the ways that they should. Apps do not have the significant impact that they should. And because the data does not exist in the quality that we want, because of our history and being a young continent with a young population, there's that need to create data rather than collect it. And that becomes a struggle because you're looking at the costs of going out to capture these data sets, create these data sets, communicate with the knowledge holders, but also embody these data sets so that they are a reflection of the society - of who we are and how we have developed - and there's an appreciation, an appropriateness to culture, to the society, to our norms, to who we are. In urban settings, the worry is also that the knowledge holders are severely threatened, some nearly extinct. And that is very worrying in terms of what our future will look like. So, it's not a situation of adopt, adapt, and it'll work, because we've seen that, and that reality is very scary.

Ana-Maria Carabelea: So, we spoke about working with the data that's already out there or having to go out and collect the data, which very much speaks to our last episode, where we spoke about the appropriation of data and how it's just being grabbed - we drew this parallel between the land grabs and the data grabs. In terms of having to collect or 'manufacture' this data, there's, again, an imbalance there when, as you were saying, coming from the Global South, you're forced to actually 'manufacture' this data, which is costly, whereas Big Tech just took it, making it so much more difficult to break that pattern of grabbing and collecting data and using it for your own profits and purposes. What I would like to talk about now - after we've talked about the collection side of things - is the preprocessing side of things, where we use words like 'cleaning' the data, which sort of alludes to this need to make order in something messy - and that's something that needs to be done to make these algorithms more inclusive and improve the way they see and make sense of the world. Following that, we have the part where they [the algorithms] identify patterns within the data. I'd like us to unpack the two processes and sort of see how they affect the very notion of truth or how they affect the production of knowledge.

Ndapewa Onyothi: I want to start with the appropriation of data. If you had asked me three months ago, I would have agreed with you that Big Tech just takes whatever it is that they find. And in these last three months, I realized that small tech within our communities would do the very same thing if our data was online. And I'm so glad we're at Ars Electronica, where I saw this booth with empathy, community, unity. These are qualities we no longer embody. And we see that in young startups: people from a certain cultural community, wanting to become the next big tech in their country, don't embody these values of where they come from, either. Forgotten. And if our data was online, I have no doubt that many of them would do the same. And that is worrying. But fortunately, it's not online, and there is still hope to rectify that, where we're able to remind them of what it is to be human and do it right and do it better.

Coming back to cleaning data and preprocessing, we really worry about this stage. I think many of us follow this design thinking concept, or systems thinking concept, which starts with empathy, and that becomes the source of our intent. And as we go out to then create these data sets, we're limiting the preprocessing because that process becomes so natural. We have this quality data set because, through empathy, we are challenged to think, we're challenged to unlearn and relearn certain things. And when we then create these data sets or capture the narratives of all stakeholders, we're minimizing the preprocessing stage. We're minimizing so many other things and ensuring that there's integrity, ethics, empathy as part of that. And we see that in large corporations: because either they don't invest in the communities to shape these technologies, or they don't learn from the past that has harmed so many societies, it becomes a sort of colonialism 2.0, just forced extraction, and I'm not sure how we got back there.

Kasia Chmielinski: That's really good. Yeah, so many things you said. I feel like this notion of more data is better - that's even being challenged right now in some of the scientific literature with the large language models, where that has honestly been the biggest shift. It's not necessarily a mathematical one. It's really about how much data they can shove into these models when they train them. And at the same time, they're starting to realize that more is not necessarily better and actually, higher quality data is more important. And so we're seeing a shift away from just putting lots of random data in, towards something that looks more like a blended model, where they're also feeding it higher quality information that's been looked at, that's been labelled, that's been categorized and annotated by people in the Global South, actually. So there's a very real labour component to data extraction, where it's extracted from communities, and then the same communities are given very little value back to work on categorizing and adapting that for use in technology. And I think you're totally right. This colonialism 2.0, it's absolutely the way it has been and the way it continues to be. When I think about the tension as well between openness and rights, there's also a notion that openness is good: open science is the best way to do it, open source is the best way to release something. And I think that in the West, we have to get a little bit smarter about that and critical about what it means to just release things or what it means to grab things that have been released. Because there's a trade-off. If you put the data out there and someone uses it in a way that doesn't align with the original intention, if you gather a bunch of data that's personally identifiable or identifiable within a community, or even just makes certain knowledge that was community or locally specific now available for anyone to use, that's inappropriate. Right? And so, there is a real trade-off between openness and respecting people's rights. And there's just a whole bunch of mechanisms that we don't yet have around guardrails for the generation and use of data. I run a project that believes this: that we should be collecting data only after determining why we want to collect the data. And so there needs to be, I mean, you use the word intent, there needs to be some kind of intent that drives the collection of the data, rather than the haphazard collection of things, the scraping. I'm not saying that there's never a case where you would want to or have to do that, but generally speaking, having an intention before you do something ... we should probably do that.

And then on the other side, when you're using the data, there should be uses that the data was intended for. And then there's what you might call off-label usage, which is not sanctioned and is irresponsible. And we need big tech and small tech - I agree with you, small tech is even less regulated and flying under the radar because they're not in the public view - we need all of those folks to then actually abide by the usage restrictions or agreements as they exist. And that just doesn't exist either. We have all these tensions and trade-offs around the processing, the cleaning of the data, and then the usage of that. And we have no mechanisms or cultural mechanisms to use the data responsibly.

Ana-Maria Carabelea: There are obviously people like you who do this type of work. But what would you need for your work to become more visible and be inscribed into these practices? Is regulation a solution, and if so, what sort of regulation, what would it look like? I think that's the question that a lot of people have. When you say regulation, it can sound scary. Obviously, big tech makes it sound scary, but there are perhaps also [real] traps there. What other type of support do you need?

Ndapewa Onyothi: We have a number of stakeholders that should be involved. And what we've seen traditionally is government - which represents the people - often coming up with these policies, but they're only at the top. And a top-down approach has often failed. And that is why we have grassroots communities addressing the problem and really advancing language technologies, data sets, and building these technologies from the ground up. When we apply that same approach, communities creating data sets that are relevant to them - either for documentation or to be used for revitalization, or to be used to power a particular service that they truly need - should be leading discussions on regulation, and they should be part of these steering committees. So, it's a good thing that we have traditional councils, and then we have various union councils, and then we have government, and then we have civil society. If we start at that very level with a traditional council, we then move into more applied uses - for example, identifying farm boundaries, predicting weather patterns to inform local farmers, and making informed decisions that are aligned with their traditional ways of living to keep their norms alive. The problem with regulation, and where regulations do not serve the society but benefit large corporations with money, is that the custodians of data are not part of the discussion. They're eliminated. Their voices, their opinions, their concerns, and their values are not protected. They're exposed. Different stakeholders need to be part of the conversation when we talk about regulation, because if you have an intent on why this data needs to be generated, there are people who agree to it and communities who know the benefits of this data creation (the end result, use case and the ways that it will benefit them), but they're often eliminated from regulation.

Kasia Chmielinski: Yeah, I agree with all of that. I think maybe I can talk about two examples to kind of illustrate some of what you're saying, because I completely agree. The first example isn't necessarily about data, but I think it highlights that people, when consulted, have opinions about their own data and have very strong opinions about what they want to have happen. It's often just a problem of education, or not even education but transparency into what's happening, right? So, there's not even a way to be educated about it because we don't know what's happening. One example of that is when Apple introduced this privacy label on every phone or device: when you downloaded an app, it suddenly started asking people, do you want to allow this app to track you? And it's really the first time that a lot of people realized that when you download an app, you're giving them permission to track you, to track your device, but that device is probably on you or on some people that share your device with you, so it's now tracking wherever that device goes, which is a good proxy for tracking where you go. And something like 80% of people said, no, I do not want them to track me. It's such a strong indication that when people are given a choice and they understand what's happening, they don't want to be tracked, they don't want that data to be collected on them. That, to me, highlights the importance of giving people information in a way that is consumable and then giving people meaningful choices. In this case, you could still download the app, you just tell them not to track you. Prior to that, if you knew that you were going to be tracked and you didn't want them to, the only thing you could do was not download the app. That's not an actual choice, right? If you need that app for something, if your work requires that you download something, you have to do it anyway. So that's one example of how people should be informed. They should be part of the process, and they have opinions about their own data.
Another example is the analogy that started the organization that I run, which is around nutrition labels for data sets. When you look at food in the US and many other countries, we have this fact label on the back of all food packages. It allows you to make a decision about what food you want to buy and put in your body. And the example I use there is, when I now walk into a bakery, a local bakery, and there's a bunch of really delicious-looking cakes. I love a cake. They have not printed out a label and put it on a cake, right? I mean, they don't do that because the cake is in the window, and they made it there. But I'm thinking about the nutritional value of the cake. And that, to me, is a combination of top-down regulation - where all the foods that are packaged and sold must have this standardized information - and culture change. And I think we need the same thing. Regulation isn't going to fix everything, but those companies are not going to tell you what's in their food unless they have to. Same thing that companies are not going to tell you what's in their data unless they have to. Will that solve everything? No. But what it will do is help people start to get their minds around what they should know and what they should be asking about a data set, just like I now, unfortunately, am thinking about the nutritional content of the cake. And when I encounter the cake, I think, okay, what's actually in this? Is this good for me? I'll know to ask if I have allergies, whether it contains these ingredients, and so on and so forth. So those are two examples, maybe, to support all the things that my colleague here said that I agree with.

Ana-Maria Carabelea: This has sort of already addressed my next question about citizenship and political representation and whether we can have any sort of agency within this. Can we refuse to be tracked, to be then analysed? Can we refuse to become data basically, and be part of data sets? Do we have any sort of agency?

Ndapewa Onyothi: So, there's data creation, there's data processing, there's data storage. Yes, we've created it ethically. We've created it with a community, but where are we going to store it? Do we understand the T's & C's that come with where we choose to store it? Will it still be our data, or do they have the right to make use of it however they wish? Then there's accessibility - open source to whom? It was probably funded by the government or an NGO, and it needs to be archived somewhere, and it's open source to anyone for academic purposes or whatnot. But open source to whom? This is where the intent comes in. If your intent is not aligned with the reason we created the data sets or how they should benefit the custodians of the data, then, no, it does not apply to you.

Kasia Chmielinski: Yeah, I would add to that too, that in many places - my context is the US - our rights framework is very individual based, and so all of our rights come from the individual. So, if you have a problem, you can sue somebody ... very litigious, and you as a person can prove that there was harm that was caused to you personally. And then you can try to seek recourse in some way. And we don't have a great way of understanding collective rights. And I think that the government is supposed to work in the public interest, but which public, which part of the public? We do have that as kind of a way to come down on organizations, companies, technologies that are causing harm to a community. But our rights framework is not actually that strong, at least in the US, around what to do if you believe that a technology is systemically or systematically disadvantaging or harming a community. You actually have to find - and I'm not a lawyer but my understanding is - you have to find individual cases of harm to prove that there was harm against a group, as opposed to being able to say before the fact that there are harms that are going to be, that there are risks, let's say, and act on behalf of a group at large. It has to be through individual cases. And so that's really tough in an era of data because you're talking about harms that happen at a collective level or decisions that are made at a collective level. That's one thing that I think is really challenging.
But to your question about agency, there's also intentionally messing with a system, right? I mean, you can intentionally create noise. You can create data that's crappy, data that throws people off. You can also intentionally be absent or somehow evade data collection, perhaps if you're part of an identity group or a community that is not easily categorized. Yes, there's a sadness and a real issue with not existing in the data, but there's also sometimes a beauty in that, because then you're out of the system. So, I think that there is some agency in that direction as well, which is either messing up the system a little bit by injecting noise and reducing the signal, or reducing the signal by not being there.

Ana-Maria Carabelea: In terms of the beauty of the uncertainty, the chaos, and the incomputable life that one can have, versus what comes out of trying to compute everything - compute life in general - and the implications of that for society, politics, and the future: what kind of futures can we build on past data and patterns (which may be past patterns, but might also just be made-up patterns in the data, or illusions of patterns in the data)? And where does that leave us in imagining a way out?

Kasia Chmielinski: Unfortunately, a lot of these systems are optimized for profit, and so they will go where the money is. And so that's a little bit challenging. It depends on what the systems are optimized for and what patterns they're looking for. But sure, I think that there are "erroneous" patterns that show up all the time. A classic example is when Amazon tried to build that hiring algorithm and they fed it all of the historical resumes, and the pattern was that women weren't hired. And so, the result was that the algorithm said, don't hire women. And it's not erroneous in the sense that, in the past, women weren't hired, but it's erroneous in the sense that it concluded you should never hire women. I do think that, obviously, it'll pick up on patterns whether or not they're accurate. I don't even know what the right word is: unbiased, accurate, true. Whatever's in the data is seen as the ground truth, right? It is seen as the truth regardless of whether it's socially accurate or representative or appropriate.

Ana-Maria Carabelea: Or even desirable in the future. Because, as you say, that might be the case now, and it might have been the case in the past, but that's not to say that that should be the case from here on.

Ndapewa Onyothi: I'm trying to think of three scenarios. One, I imagine a veteran or a war hero or a freedom fighter, as we would call them, narrating their 20 years liberating a country - how they survived, what they were fighting for, how they lived, and whether their methods were sustainable. That historical data and narrative is thriving because it has learned from past patterns, and it has not been optimized for profit but for sustainability. Another example is treasured communities who've lived for centuries as they have, like hunter-gatherers or nomads, and their traditional ways of tracking and living are what is shaping sustainability efforts in the world.

Ana-Maria Carabelea: Yeah, that's a very good point, that past data can mean different things in different contexts. To end on a more positive note, maybe we can talk about the work that you two do. I think it's really important that you're coming to it from very different angles. Kasia, you're more focused on how data misrepresents certain things in certain communities. And Ndapewa, with Masakhane, you're focused on how communities are underrepresented, and you're actively collecting data. So, it'd be interesting to hear more about that.

Kasia Chmielinski: Our project is called the Data Nutrition Project. We're about five years old, but we became a nonprofit just a year or two ago. It was a research project that was sparked by a hypothesis. It was me and some folks sitting around a table, eating lots of cookies and cakes, and talking about the problem of AI. This was like 2018. And it was around that time, that a number of other people and organizations were having very similar conversations - we're by far not the only people in the space - about how everyone is so focused on the AI being biased, AI being racist and sexist, and no one was talking about the fact that the AI was just doing what we trained it to do. And it was trained on data that was unrepresentative or biased in some way. But those of us sitting around the table - a very interdisciplinary group - had personal experience in building, auditing and using these systems. And so, all of us were just kind of scratching our heads and saying, why is no one talking about the data? And at this point, I feel like we and others have put a lot of work in because people do talk about the data. And I feel very happy about that. I'm guessing that no one listening to this is going to be surprised about bias in, bias out, garbage in, garbage out, and that kind of a thing.
So, what we decided that we would try to do, or what we thought would be fun to try, is build a nutrition label, like for food, but for data. A simple analogy is that you should know what your machine is about to eat before it eats it. You should understand how it should or should not be eaten, because we also believe that there's no data set that's perfect and there's no way to eliminate all bias, but it's about knowing what the bias is. We spent a lot of time and worked with others in the space - so, there are data sheets for data sets, there are model cards, system cards, there are fact sheets. There's all kinds of parallel work that has happened in the space, for sure. There are even some that are domain-specific, like language processing, in particular. We borrowed from others and lent to others and came up with a structure of information that we thought was really important. It goes along the pipeline from intent, collection, preprocessing, and then usage. Although storage is a really good point, and we should probably add that. So, we have all this standardized information that we gather on every data set that wants to build a label. And then we also worked with some very talented designers to create a standardized visual component that makes it very easy to communicate what we learn from asking all those questions. And our hope is that people can glean information immediately - if they have 10 seconds, because I know, as a product manager for almost 20 years, you don't have that much time to choose a data set. So, within 10 seconds or 10 minutes or 10 hours, you're going to find different levels of information. And what we hope this will do is raise the floor so that your documentation for data exists - in a lot of cases, it doesn't yet exist or doesn't exist on these data sets that are just out in the world - and, on the other hand, it isn't hitting you over the head with a 60-page PDF of documentation - which is amazing, that there's so much information, but very hard to consume. That's what we do, and our hope is that we can drive culture change and be part of these top-down conversations, too, about regulation, about how data should be understood and presented so that people can make meaningful choices.
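[Editor's note: as a rough illustration of the kind of standardized record Kasia describes - information gathered along the pipeline from intent to collection, preprocessing, and usage - here is a minimal sketch in Python. The field names and structure are assumptions made for illustration, not the Data Nutrition Project's actual schema.]

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a dataset "nutrition label" record, loosely following
# the pipeline stages mentioned in the conversation: intent, collection,
# preprocessing, and usage. Field names are illustrative assumptions only,
# not the Data Nutrition Project's schema.
@dataclass
class DatasetLabel:
    name: str
    intent: str                 # why the data was collected in the first place
    collection: str             # how and from whom it was gathered
    preprocessing: str          # cleaning, labelling, annotation steps applied
    intended_uses: list[str] = field(default_factory=list)       # sanctioned uses
    known_limitations: list[str] = field(default_factory=list)   # gaps, biases, caveats

# Example: a label a practitioner could scan in ten seconds before choosing a data set.
label = DatasetLabel(
    name="example-speech-corpus",
    intent="Document everyday spoken language with speaker consent",
    collection="Recorded conversations with consenting community members",
    preprocessing="Transcribed and annotated by community linguists",
    intended_uses=["speech recognition research for this language"],
    known_limitations=["small speaker pool", "urban dialect over-represented"],
)
print(label)
```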

Ndapewa Onyothi: At Masakhane we have linguists and technologists. For the first part, which is the data set creation, we have language experts and community linguists who deal with creating the data model, advising on the data model documentation, and advising on various methods to trigger conversation and capture, collect, or create the data. And then we have the techies who come in to optimize the models and develop the evaluation metrics. So, this is where the participatory approach comes in: we don't do everything.
I always start the day by recording my parents and this is how design thinking happens. I'm like, okay, so how did you sleep? And I'll find something interesting. It just flows this natural dialogue, but I'm always recording with consent, and I have to pay them. I think to date, we have over 40 hours just talking to my parents. And that is how it starts. We start talking about a topic. Let's say it's a national holiday or something I saw in the newspaper. It would then flow like that: How do you feel about this? What are your opinions, perspectives, share? And then I'd trigger them, I'd be on the opposing side. My grandmother will go off. But that is what I want because it's very hard for me to find a rich morphology or command of the language anywhere else in everyday life because it's all English or some other colonial language. If I'm interacting with my peers, there's a lot of code-switching because we're not learning the language in school and we are far out somewhere in the world or a different part of the country where we don't get to interact with the language so much. And these triggers, these methods, or stimuli work to get them to express themselves so profoundly with a rich command of the language.
And I really do want to say that communities should be well compensated for creating data. We're looking towards things like data conservancies, where, if the data is from a particular community and it is used by X organization, for the time that this data is powering their model, they should be compensated. The money does not come to me. Yes, Masakhane will receive the money on their behalf, like an NGO, and keep a very small fee to keep it running, because someone from Masakhane was leading this project, there's a data set, X amount of hours, whether it's speech or text, that was used. This money goes back to the community and the language experts who worked on it. Having the architects of the system present, advising on how labour laws should look now and how they imagine the future of labour law - these are the directions we want to go in. Because at the end of the day, the money comes back to them, they're defining how the data needs to be used, who needs to use it. The fundamental thing is really to get all stakeholders involved to agree and build consensus. And this is the same thing I've done with the youth that I've worked with. It's to agree on certain definitions.

Ana-Maria Carabelea: It's a very different type of data that you're getting from these interactions with your family than the data that you find on the Internet. The way people use language on the Internet is very specific, whereas this is much more like language in its natural habitat, in the way it's being used. I think it'd be quite interesting to compare, to see what comes out of this and how it's different. Because the Internet has its own language, and it's not necessarily how we speak.

Kasia Chmielinski: Totally, the way that we communicate has been shaped by the Internet. I mean, the way that I google something, I just keyword-stuff. It's not a real sentence. I say, "Linz, traffic, app" to Google. And what I'm trying to say is: Google, find me Linz's transportation - I know there's an app, but I forget what it's called. And it's not a real sentence. It's just, like, words, right? And normally I speak in sentences, you know? We've adapted to the technology, and then if we go back and we scrape from the technology and assume that's language, I mean, that's an echo chamber where we're going to then build technology that communicates in a way that we've adapted to the technology. Oh, my God. Right?

Ndapewa, it sounds like you're building an archive. And I feel like archives are cultural. First and foremost, they're not for monetary gain, and they're not so you can build something on them. It's an archive of culture, right? So that we can learn from our past and so that we can have empathy, so we can engage with people who came before us. But first and foremost, you get to hang out with your grandmother and learn stories.

Ndapewa Onyothi: It's amazing.

Ana-Maria Carabelea: At the end of the podcast, I ask my guests to send our listeners off with some homework: name an article, a research piece, whatever you can think of, an artwork, an exhibition that you've seen and think people might be interested in, that's sort of connected to what we spoke about today.

Kasia Chmielinski: I'm getting some looks across the table, so I guess I'll go first. An artwork that I think is showing this year, and has also shown in previous years - I think it's a continuation of a longer project - is by Lucas LaRochelle. It's called Queering the Map, and QT.bot. I think QT.bot is this year, and Queering the Map is an existing project. It's a map, a webpage, and on it you can drop a pin anywhere and just mention a queer memory of some sort. But importantly, you don't sign in, you don't give your information, and no one checks whether it's true, or whether the thing happened in the place, or whether you chose the place in which it happened. It's just this collective mapping project of queer memories. And I think it's really beautiful.
And then this year's artwork is using that as a data set to generate new memories. So, it becomes the foundation on which all of these "new", but obviously informed, memories, voices, pictures, all of this kind of show up in this video. What I love about it is that it really complicates and maybe questions what a data set is, what truth is, and how to collect memories and culture without policing it. And so, whatever is true for the person is true for the person. We don't know who they are. We don't know how old they are. We don't know what their race is or their gender. It's just this beautiful kind of human experience. And then using that to generate new information as a way to also protect, I think, the actual memories from being rebroadcast in their entirety. There's kind of a safety mechanism in there. And there's also just, like, a beautiful, dreamy quality of creating, from that as a basis, additional things that may or may not be true, just like the originals may or may not be true. I think that's a really beautiful work. And then, if I can, if people want to go in a different direction and read a research paper, I would recommend that folks take a look at Stochastic Parrots, which is a paper written recently by Bender, Gebru, and McMillan-Major. I think it's fairly accessible, talking about the dangers of some of the AI that is being generated today from an environmental perspective and a social perspective - a lot of things that are still kind of flying under the radar and I think are really important. They're also just exacerbated issues from previous generations of AI; it's a continuation of some of the risks and the harms. So, I think that's also a great paper to check out.

Ndapewa Onyothi: I'll start with art. I liked a piece yesterday by David Shongo. I don't know if he has a name for it, but it was beautiful. It pictures kids playing in waterfalls with their hands and feet in the water, splashing. And he brings various instruments into it. And it's just kids, carefree. I'm not sure if it's Nigeria or Senegal, somewhere in Africa. And that was just beautiful. It reminded me of me. Just carefree. Hope for a future. You don't worry about the issues of the now. It's just feeling, music, and then imagining. There's a quote I love that went with it: My gift of fantasy has meant more to me than my talent for absorbing positive knowledge.
And then a book that I read recently is Braiding Sweetgrass. Have you heard of it? I love this book. It's by a researcher from a treasured community who attends school, varsity, in the West. And then she comes back, she wants to document things, and she's using these methods, and she realizes that they would not work. And she's just unlearning and relearning so much. And it's just a very, very beautiful book for anyone who's looking into creating quality data sets, working back to basics, starting conversations with senior citizens, or just biocultural values - going back to that and seeing travel differently when we go to distant places with so much culture, history, and wisdom, what we can learn, how to be more sustainable. The wisdom is just really great and just lovely. So that was uplifting because I read it recently, and I've been recommending it to everyone.

Ana-Maria Carabelea: Thank you so much, Kasia and Ndapewa, for being here with me today. And thank you for listening. The Digital Deal Podcast is part of the European Digital Deal, a three-year project co-funded by Creative Europe. If you want to find out more about the project, check out our website, https://ars.electronica.art/eudigitaldeal.
