E127

REAL Resistance: The AI Information Warp

Show Notes

Klaudia Jaźwińska, Aliya Bhatia, and Yanick Kemayou come together to answer: What happens when a small collection of AI companies start colonising the truth?

More like this: The Toxic Relationship Between AI & Journalism w/ Nic Dawes

This is REAL Resistance, a collection of conversations produced in collaboration with Real ML, featuring the experts and advocates who make up Real ML’s global network.

In this conversation, three guests from the network share all the ways in which AI is shaping our relationship with news, language, and learning:

  • People are increasingly using AI for search and to get their news; but Klaudia Jaźwińska’s research finds that chatbots have a huge citation problem
  • Many LLMs claim to be ‘multi-lingual’, but are also only tested in US English? Aliya Bhatia explains that porting outputs from English into all other languages is not gonna cut it
  • Higher education in Africa is completely disconnected from local knowledge, which is why Yanick Kemayou founded Kabakoo: an alternative learning platform that leverages and preserves local knowledge with the use of technology, and without exploitation

As Yanick says in this episode, “openness without governance is extraction with extra steps”.

**Subscribe to our newsletter to get more stuff than just a podcast — we run events and do other work that you will definitely be interested in!**

Computer Says Maybe is produced by Georgia Iacovou, Kushal Dev, Marion Wellington, Sarah Myles, Van Newman, and Zoe Trout

Hosts

Alix Dunn

Release Date

June 26, 2026

Episode Number

E127

Transcript

This is an autogenerated transcript and may contain errors.

Klaudia: [00:00:00] The overwhelming finding of our study was that chatbots are really bad at identifying the original sources of information, and they're bad at being transparent about when they can't answer something correctly.

Aliya: Language is so tied to identity, and I, I fear that we're using linguistic diversity to actually get away with just putting users between a rock and a hard place.

Yanick: Openness without governance is just extraction with extra steps.

Alix: A handful of tech companies have enormous power over the systems we rely on, and the choices these companies make and the networks of power that surround them are impacting people all over the world. And yet conversations about that impact are often super vague and superficial.

So we hear, like, really highfalutin language like "humanity" and "It's gonna benefit everybody" and, you know, "There may be harms." But we need experts and advocates focused on understanding what's actually happening, not in [00:01:00] abstractions, but in specific contexts and communities around the world. Wouldn't it be great if those people existed?

And better still, were connected with one another to knit together a more global understanding that could empower resistance? This is a series platforming that very network. The network is called Real ML, and this series is called Real Resistance. We talk with members of that community who have the receipts, the stories, the research, the insights to challenge the most extractive uses of technology.

These are both critics of the tech industry and also people working deeply inside these questions to test ideas, build communities, and create tools and practices that point toward a future that isn't governed by tech billionaires. We made this series to share their stories, their strategies, and what it might take to challenge tech business as usual.

In this conversation, we're exploring how AI is shaping our relationship with news, language, and learning. And one note before we jump in: we recorded with Yanick separately. Um, dastardly internet access issues when you're working with a global [00:02:00] network. So you'll hear him pop in throughout the episode, sharing some stories about how and why he built Kabakoo.

Yanick: I'm Yanick Kemayou. I am co-founder and chief learning officer of Kabakoo Academies based in West Africa, part-time in Bamako in Mali, and if I'm not in Bamako, I am in Lomé in Togo.

Aliya: Hi, I'm Aliya Bhatia, senior policy analyst at the Center for Democracy and Technology.

Klaudia: My name is Klaudia Jaźwińska. I am a journalist and researcher at the Tow Center for Digital Journalism at Columbia Journalism School.

Alix: Klaudia, at risk of asking you to give us, like, a eulogy to the state of journalism, do you wanna talk a little bit about how LLMs have reshaped what journalism is and kind of how people access information online?

Klaudia: So basically, AI has been integrated into a lot of the way people search now. So some people do go directly to ChatGPT or Perplexity to [00:03:00] ask questions about anything and about the news specifically.

A more subtle way that people engage with AI in search is through AI overviews and AI mode and all the different iterations that it takes on Google. So if you ask a question or even plug in a couple of keywords, it is likely that you'll get an AI-generated summary at the top of your search results. And the effect for news publishers has been a lot of them, especially smaller news publishers, have seen significant declines in click-through traffic.

Historically, for the last 20 or so years, news publishers have built a business model around getting click-through traffic primarily from platforms like Google. Now, that traffic isn't coming anymore or there's significantly less of it, which means that there's a lot less advertising revenue coming to news publishers.

They're also getting a lot less visibility into who is accessing their content and when. At the same time, the results can be really dubious. We've all seen examples of, like, Google telling people to put, like, [00:04:00] glue on pizza or to eat rocks or whatever. I think the results have gotten a little bit better some of the time, but even when they're citing news content, they're not necessarily citing it accurately.

You know, there could be issues about which details are included and which ones aren't, or whether things are in the right order temporally, or whether they're pulling information from an opinion piece and presenting it as fact, those sorts of things. So my colleagues at the Tow Center and I have tried to pursue a couple of auditing projects where we've tried to get a sense of, like, how reliable are these tools actually when it comes to things like citation of news sources.

And so there was a big study that we ran last year where we basically asked a bunch of chatbots to do a very simple task, which can be accomplished by a search engine, which is if you take an excerpt of text, um, in our case it was quotes from news articles, and plug it in and you ask for the original source of those quotes, can it answer correctly?

Can it identify the original [00:05:00] source correctly? And we were trying to test multiple things in this experimental design. In addition to seeing whether these chatbots can perform a simple task that a search engine can, we wanted to see what happens if they can't access content. Because one of the challenging dynamics is there are some publishers who, as I said, have like content licensing agreements, and there's some who are in lawsuits, and there's a lot who block the crawlers of AI companies.

And so we wanted to see what happens if they can't access a bit of information because of a crawler block or a lawsuit. Do they just say no, or do they make something up? And the overwhelming finding of our study was that chatbots across the board are really bad at identifying the original sources of information, and they're bad at being transparent about when they can't answer something correctly.

So one major trend that we observed was confident inaccuracy, and this trend was worse if you used a higher tier [00:06:00] model. Like, one that you would pay for was more likely to be confidently incorrect than, like, a free model. So all of this is to say people are either turning to chatbots because they find them more convenient for their search queries or they're using them without realizing because they just, like, appear at the top of your Google search results.
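The audit design described above (give a chatbot a quote, ask for the original source, then check both whether the citation is right and whether the reply admitted uncertainty) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Answer` fields, the four label names, and the example URLs are hypothetical and are not taken from the Tow Center study's actual coding scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    expected_url: str          # the article the quote really came from
    cited_url: Optional[str]   # what the chatbot cited; None if it declined
    hedged: bool               # True if the reply signalled uncertainty


def label(answer: Answer) -> str:
    """Assign one hypothetical audit label to a single chatbot answer."""
    if answer.cited_url is None:
        return "declined"
    if answer.cited_url == answer.expected_url:
        return "correct"
    # Wrong source: separate hedged errors from confident ones.
    return "hedged_wrong" if answer.hedged else "confidently_wrong"


def summarize(answers):
    """Count how many answers fall under each label."""
    return Counter(label(a) for a in answers)


# Made-up sample data, for illustration only.
sample = [
    Answer("https://news.example/a", "https://news.example/a", False),
    Answer("https://news.example/b", "https://other.example/b", False),
    Answer("https://news.example/c", "https://other.example/c", True),
    Answer("https://news.example/d", None, True),
]
print(summarize(sample))
```

Keeping "confidently wrong" separate from "hedged wrong" is what lets an audit like this speak to transparency as well as accuracy.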

Right now there's a lot of reasons why they shouldn't trust that information, but there's emerging evidence that people do tend to like the user experience of using a chatbot, so there's concern that people will continue using those tools. It will lead to a degradation of the kind of information that is put into the information ecosystem.

I think what's tough, and I haven't studied this myself exactly, so this is more speculation, but I think there's a concern that one is, like, practically if publishers don't have the money to continue existing, their work will no longer feed into the information ecosystem. But the other is if you're using a [00:07:00] tool that is intended to aggregate information, oftentimes it tries to find the middle or the average of a bit of information.

And so if you're trying to understand, like, a deeply contentious topic about which there is reporting across the spectrum, the summary that it might provide is likely to be somewhere in the middle. As you said, there's like a loss of texture or nuance, and it might result in, like, a flattening of different ideas and experiences that you wouldn't get if you went directly to a bunch of different news sources and you tried to understand how, you know, they relate to one another.

Similarly, I think one of the things we've seen, I, I alluded to this a little bit, but, um, there's been complaints in certain countries, like in Australia, a study found that if you ask questions about what's going on in Australia, you're likely to get American news sources. I think because a lot of the deals that have taken place are with American or sometimes European publishers, and their content is safer for the AI companies to use than any content from publishers that they don't [00:08:00] have deals with.

That tends to affect the kind of news that they get and the perspectives into the world that they're living in, that they have access to. I've experienced this, like, anecdotally, like if I ask ChatGPT a question about my home state of New Jersey, for some reason it'll cite The Wall Street Journal and The Associated Press and the New York Post, and I know for sure that there's a fairly robust local news ecosystem in my state.

And it's even tougher when you live in a news desert and there's only one or two outlets for a multi-county radius. And if those outlets maybe don't have deals or they deliberately try to block those companies from crawling their content then you might not get any information about where you live at all.

Alix: Which is such a terrifying prospect that we might have dramatically increased the amount of connection we have, or like the feeling of connection to information, but the actual, like, quality and relevance of information just becomes this kind of mush [00:09:00] that's, that's been constructed, um, about where we are, where we care about, the issues we care about, et cetera.

Klaudia: Yeah. Yeah. I think, like, these companies are making a ton of tiny little editorial decisions that we have no insight into, and I think that's the most troubling bit about kind of like having them be the intermediaries between the producers of news and the audience.

Alix: User-generated content.

Klaudia: Mm.

Alix: Well, I feel like this is a good-- I mean, as we start talking about the extent to which people can access information about what's happening directly around them, it feels like language becomes an extremely important dimension of that.

So maybe we can turn to you, Aliya, and hear a little bit about the research that you've done to understand the way that language plays a role in the outputs within large language models and kind of how you approach that, that question of localization.

Aliya: Yeah. Thank you so much, Alix, and great to be on the show.

Big fan. So why don't I start off by talking a little bit [00:10:00] about some of our research that sort of answered the question you're asking, and then where we are today. My co-author and former colleague, Gabriel Nicholas, and I were super excited to present some work that we published in 2023 called Lost in Translation back at Real ML in 2022.

And this research really began by first asking the question of why do social media companies, and just, like, those intermediaries that we're thinking about that make information around the world available to users, sort of why can't they get it right when it comes to content moderation in languages other than English?

So we went into this study trying to understand what the sort of technical limitations were. At the end of 2022, we found that actually companies, through a lot of, like, their, like, industry research arenas, were releasing papers that sort of looked at large language models used not only in generating text, but also analyzing text.

They were really enthusiastic about the same LLMs that power ChatGPT or [00:11:00] Claude, their utility in analyzing and moderating content. And so we were like, you know, we can sort of examine the claims that a lot of these systems are making and, you know, they were really breathlessly enthusiastic about these systems' ability to solve the, like, multilingual content moderation problem.

That sort of assertion rested on a very real issue in the community, which was that there just wasn't a lot of high quality labeled data, structured data in languages other than English. And this was really well documented by Pratik Joshi, Sebastin Santy, a bunch of researchers in a paper called State and Fate of Linguistic Diversity in NLP, where they said that there's a real dearth of high quality labeled data sets, high quality and, and high quantity of data sets in languages other than English, despite just, like, the sheer millions, if not more, of people who [00:12:00] are speaking these languages.

And that paper sort of coins this concept that we rely on in our paper called The Resourcedness Gap, where there's a lot more data to train systems in English than in any other language in the world. And so this resourcedness gap that's been around for a long time had previously limited these companies, according to them, from building sort of like machine learning based classifiers to detect harmful speech, hate speech, et cetera.

And so what multilingual LLMs promise to do, or the LLMs sort of architecture promise to do, was learn a lot of just like universal, quote unquote, "universal linguistic rules" using sort of predominantly English language data and then apply them into non-English language settings or those settings where there wasn't enough data.

And that sort of premise or that sort of logic these systems were built on sort of relied on two really flawed assumptions. One, that every [00:13:00] language in the world operates the same as English. So for example, if you're learning that like a generic English language sentence is like subject-verb-object, you can train these systems and then use that like syntactic rule and apply it in a non-English language setting.

And then it also assumed that people were speaking like that online, or people wanted to receive information like that online. Those two sort of assumptions have now resulted in this paradigm where there's extensive research and also anecdotal insights about how a lot of outputs of these large language models sort of just failed to consider cultural or multilingual nuance in a lot of world languages, including indigenous languages.

It sort of results in this paradigm where you have something that researchers Francisco Kerche and Mark Graham are calling the Silicon Gaze, where the outputs of large language models or the applications of large language models in content moderation settings [00:14:00] sort of follow the logic of either Western or Anglo-centric assumptions.

They sort of apply the rules that are present in the West, in Silicon Valley, in the global north onto these other settings, potentially leaving up a lot of harmful speech or putting out outputs that are just not that relevant or, you know, have the sheen of Western viewpoints, even if it's in legible sounding text in another language, which in many cases that's also not true.

So that was sort of what we were trying to cover in this paper. You know, again, this was in 2023, and since then we've seen a lot more interest and focus on this topic of language. But the question of how companies are like adding on or sort of making more robust these multilingual capabilities is, is still very much in question.

Yanick: The story of Kabakoo is quite linked to my own story. I was born and raised in Cameroon, in Central Africa. [00:15:00] I come from a small town, so, uh, not far away from the border to Nigeria. After high school, I went to the capital for university, and coming from a small town, it was a big deal. I get to the capital, big city for university, and you think that you made it.

So, uh, the first six months at university was kind of great, so I was really learning, enjoying. Then six, seven months in, I started, like, making new friends there and especially talking to older students, and I realized that a lot of them were graduating and still remaining at the university. And I asked one of them, and it was like a naive question, like, "You already graduated. Why are you still around here with us?"

He just look at me and he smile, he told me, "You will understand." That was the answer. So I was like, "Okay, okay." Long story short, I ended up understanding what he meant, especially the fact that there are no jobs.

So basically, you go [00:16:00] to university, you study, and then, then you don't find job because the formal sector is just not producing enough job for everyone. If I take the case of Mali, for example, we know that the formal sector produce every year 15,000 jobs, and we also know that every single year we have 300,000 young people get in the job market.

So 300,000 and 15,000. The math doesn't add up. I had then the chance to, uh, go to Germany. I did my PhD there, my PhD in economics. Then in one of my research project, I was trying to understand or to analyze the productivity of African companies and so on. That's how I really get hooked with some friends about this idea of reconceptualizing learning for the African context, because we believe that right now it's not working.

It's not working because learning is disconnected from the local context. So local languages, local knowledges are kept outside of the formal education system, and that's basically [00:17:00] what we are working towards. We are working towards the integration of the local languages, the local knowledge system inside the educational system, because that's how we believe that we can actually make learning work.

We develop an approach that we call highdigenous. So highdigenous for high-tech and indigenous. And the core of the highdigenous approach basically is to develop context-attuned learning pathways around digital skills, also what we call mindset work. So basically this idea of working on your aspiration, on your beliefs and, and so on.

So basically we are... so digital upskilling and mindset work and all of that in a contextually and culturally attuned way. So that's basically our work. We collaborate with public organization, with governments, and right now we are working with the University of Lomé in Togo. So really trying to bring this idea, this indigenous idea within public [00:18:00] systems.

Alix: Aliya, do you want to talk a little bit about translation versus localization and, and more specifically potentially like what we would see if models were not, um, just sort of being English plus plus, um, but were instead sort of taking seriously the communities that were trying to use these models in languages other than English?

Aliya: I've talked to researchers who have used chatbots in Bengali, and they'll get a completely different output. They'll actually get output in Hindi. They're getting outputs that are not necessarily actually words that are in use in that language, or words that are in use in the context in which the language is spoken.

Which then speaks to that question of like, are these systems doing the thing where the linguists have like documented in the past where there's this push towards the use of a dominant language by the use of these systems when you realize the system actually [00:19:00] can't operate in your language, and so you just end up using your dominant language rather than your mother tongue when you use these systems.

There are also a lot of gaps in the knowledge of these large language models. So for example, if a system is not trained in non-English language data on concepts of pharmacy or astronomy, they, these systems may learn that there are no analogous concepts in those languages, which is not true and is potentially also just stereotype inducing, right?

That's perpetuating some sort of like notion of which communities have these like scientific breakthroughs and inquiry and which communities don't. So this can have like a really blinder-like perspective on the world and just like a homogenizing perspective on like how the world works. Like English is supreme and everything else needs to comport to the like logics of English and comport to the sort of like inquiry [00:20:00] or political views or whatever of English.

I think we're seeing that change a tad from translation to localization. The way we're seeing that is, you know, the most recent AI summit or in other arenas, governments are now pressing these companies to offer localized versions of chatbots powered by these large language models, and they're citing linguistic diversity.

As an advocate in the linguistic diversity space, like, yes, I want more linguistic diversity, but the way companies are complying gives me a lot of pause. You know, most recently we saw Semafor write about how OpenAI wants to work with the UAE government to create a UAE version of ChatGPT and other models.

What that entails is not completely clear, but in a paradigm where what that entails is the government creating data sets to document or sort of digitize languages spoken [00:21:00] in the UAE and then give them over to train or fine-tune existing systems, that could potentially create a very narrow version of what it means for a system to work in a specific language.

If the government is deciding the contours of what belongs in a dataset and what doesn't, that could very much mediate the level of access of information users in that region get. And so just one example of that is the UAE. It is illegal in the UAE to be and engage in, like, homosexual activities, like to be in homosexual relationships.

Does that mean that if you, you know, this UAE dataset, this, like, government-powered dataset, the government decides sort of like what is the contours of the information that is available, what is and is not culturally aligned, what is and is not aligned with the languages that are spoken there? Like, that I think is very different [00:22:00] from a sort of community-powered or language, like linguistic community-powered paradigm for these systems.

Klaudia: One question I have, Aliya, and I, I don't know how much you've thought about this, is when someone asks, let's say there's like a contentious conflict, for example, across like two different cultural groups or in two different countries, and someone asks a chatbot about some, you know, like ongoing or historical event in one language versus another, like, will they be presented with a different narrative based on how they phrase the question and the language in which they ask it if the chatbots are trained separately?

Aliya: No, that's a really good question, and it's certainly one we're asking as well because increasingly companies are saying that they're grounding responses with sources, like both because people are increasingly, like, becoming more sophisticated about how they ask chatbots for questions. Like "Hey, give me information about this, but cite your source," or, "Give me information on this based on [00:23:00] XYZ, based on coverage here."

And so they are being asked to be more transparent or have more structured, systemic ways to ground information. One way we're seeing multilingual capabilities of these language models improve is through the development of alternative, smaller regional models that are trying to compete with OpenAI's and Anthropic's services.

So we've seen, like, AfroLM in Africa. We are seeing Sunbird AI just released a model that works in 200 Ugandan languages. Bhashini in India. These are smaller, regional, compute-economical, like, cost-effective versions of models. That's all cool, but what happens when we create, like, a fragmentation of information sources where I don't trust ChatGPT because it's producing, like, Western-biased outputs, um, but I'm gonna use this, like, other system in my language.

Like, what happens to, like, the sort [00:24:00] of shared information source of truth in the world? So I feel like we're putting... at least in the language context, like we're putting users in this bind of having to choose between their language and integrity, news integrity or rights or... you know? And it's just, like, an impossible trade-off that is really just upsetting.

Like, language is so tied to identity, and I, I fear that we're using linguistic diversity to actually get away with just putting users between a rock and a hard place.

Yanick: On Fluxide, what you realize when we introduce Bambara is that of course the quality was not there. We had to go back in the community, collect some voice data ourselves, some audio data, build the model, and try to fine-tune and so on based on the data we collected.

So we had to do a lot of engineering work to make the model kind of useful for our people. Once we did that, we realized that language, like, [00:25:00] structure the way you think or even the question you ask. Because once we introduce the Bambara, uh, option in our, in our product, we realized that the question the learner will ask in Bambara, the topic they will discuss were not the same topic they will discuss in French.

Talking about digital skills, for example, like, okay, I want to use TikTok for my e-commerce business and so on, this will more likely be in French. But, uh, the questions related to, let's say, uh, I have this, I have this issue with my co... with my friend, I'm facing this issue at home with my parents, and so on.

Uh, or I need help, I need help structuring myself, my thinking. So all of those type of, let's say, social, personal issues were in Bambara. We still see that in the data today. So that mean that when we talk about language issue, it's much more than translation, right?

It's also about, like, which ideas even come to the mind of people [00:26:00] depending on the language they're talking with. So the community contributed the data, right, the audio data. We, in one of our community sessions, we asked them, "Okay, how do you want this to be governed? What should we do with the data?" And everyone, most of the community was like, "Oh, we want it to be open.

We want everyone to use it." Their instinct is generous, like, share it. We, we were in this awkward situation where we had to tell them, "Well, if you put this data online to share with everyone, then some billion-dollar company can use the data, then develop product and sell it back to you." And yeah, this is the type of, let's say, uh, tension that we have to deal with, that we are dealing with basically.

Aliya: One question I have for you is, so there's a lot of will questions here, right? Like, do companies have the political will? Do they have the, like, social moral will to do right by people? But [00:27:00] there's also a question of, like, what are ways to facilitate access to the communities or the experts we want them to talk to?

And at least in the language world, like, if I just take their claim that it's really hard to find the open data sets to train these systems, to test these systems, it's really hard to solicit consent from participants. If I take that at its face, one of the issues there is that there isn't an intermediary that makes this available, right?

And, like, a bunch of groups are trying to do that. Are there similar efforts in the news media ecosystem that are trying to create norms about how they want their data used or not used by AI companies or intermediaries sort of doing bargaining on behalf of the smaller actors in the ecosystem?

Klaudia: We're seeing some of that emerge.

There's been a couple of initiatives that have taken rise, like, just in the last couple of months that are trying to come up with collective standards. [00:28:00] The group in the UK, Standards for Publisher Usage Rights, SPUR, is one, one effort at that. But I think historically publishers are like... have been really competitive with one another, and there's been a reluctance to get together and determine shared value and things like that.

And I think there's a recognition now, you know, from The Telegraph to The Guardian, that there's going to need to be a little bit more, like, cross publisher collaboration and conversation in order to decide, like, "Hey, what's good for our industry and our audience that these companies are not thinking about?"

I know for myself, the citation research that I referenced before, I had reached out to all the companies before we published the piece. Most didn't respond. The ones that did gave us just, like, standard PR statements. And then when the study came out and it got a ton of coverage and attention, I got several emails [00:29:00] from people at the companies, and it wasn't their legal team or their comms team or their PR team.

It was their engineers who were like, "Wow, thanks for doing this. We literally never thought about this. Like, we didn't realize that citation was so important to publishers." It's tough because, like, the news business is... it's very much, like, in the moment. If you are a journalist, especially if you're, like, a breaking news journalist, like, you're just thinking about what's happening right now.

And I think I'm in a lucky position, and I'm one of a lucky few who get to research this industry. The institutions that study what I study are fewer and fewer, but I get to zoom out and look at the big picture and kind of, like, produce work that I hope informs some of their decision making. But it's really tough when you're just trying to survive.

It seems like financially, like, things are just constantly getting worse and worse, and there's so many layoffs and there's so many political and financial and technological forces that seem to be on the other side.

Alix: Like how, again, back to the hubris, like how much do you have to [00:30:00] not be paying attention or like try and understand the world of journalism to be surprised that citations are an important part of it?

Also, I hear you on the competition. I think that's a really interesting challenge of collective bargaining for media companies, is that they are... There's, there's something... I think being in retreat makes reinvention really hard, and it feels like media's been in retreat for quite some time. And I think it can feel really zero-sum and getting in defensive posture when you're struggling.

Klaudia: And I think the defensiveness, like I understand where it comes from, but I do think it could contribute to like a fracturing of the information ecosystem as well. I, I mentioned the Robots Exclusion Protocol, which is like this handshake agreement where website owners can add some text or code to their website that says like, "Don't crawl this", which a lot of AI companies aren't honoring anyway.

But when they are, that means that, you know, when you go on ChatGPT, that is pulling from a different pool of sources than Perplexity is, than Claude is, than Google is, and people don't realize [00:31:00] that they're only getting a fraction of the internet when they ask questions of those tools, and not everything that is contained on the internet.

And I think that also, um, it, it's not benefiting anyone in that case, not the companies, not the audiences, and not the publishers. And so I don't think that that's a long-term solution.

Aliya: Do you think people are becoming just more skeptical about the outputs they're receiving because of this? Like, you know, back to the hubris point, like even in the West, like I see a lot of analysis of like ChatGPT vernacular as distinct from how people speak.

ChatGPT uses the em dash, and so people are like, "Don't use the em dash when you write now because your boss will know you use this." And that to me suggests that there's this like growing amount of, like, literacy, desire for the human touch. Even in English, like these systems pursue a very standard American institutional Western view of English at the expense of all the different Commonwealth Englishes [00:32:00] that exist, or like post-colonial Englishes that exist, at the expense of like African American Vernacular English that like uses the habitual be, like "I be working," rather than like other forms of semantic organization.

Klaudia: There's a study that came out recently from the Center for News Technology and Innovation where they interviewed a couple dozen users of chatbots, people who use chatbots for news, and generally the sentiment was like almost universally positive. And there's an acknowledgement that these tools are unreliable, and yet they're so convenient.

I think people are really inundated by news anywhere you turn on social media and in your inbox and everywhere. I think people have, like, a strong sense that, like, all these forces are probably trying to manipulate us. Like, the algorithm is showing me this thing and, you know, there's so much going on.

There's so much bad stuff that's going on. I wanna have, like, a little bit of, like, a sense of, like, control over my, like, information [00:33:00] diet, and I know not to trust these things 100%, but this gets me closer to where I wanna be. And so I think there's, like, an acknowledgement of limitation, but there's still, like, a sense that this is better than the alternative.

Yanick: At the beginning, I was, I don't know, maybe less knowledgeable or more naive or both. You realize how complicated it is actually to really get this model working in your language, in your own terms, or at least in the terms of your community. One of the mentors in our community is Mamadou Kone. Mamadou is, uh, is an architect, so he's an architect.

He has spent years, or actually decades, restoring the UNESCO World Heritage Site of Timbuktu. Uh, Timbuktu is one of, in one similar city in, in northern Mali, and Mamadou Kone is the architect who has been res- restoring these old buildings in the last 20 years. So basically, he has a lot of drawings, documentation about the indigenous architecture of Timbuktu and, uh, of the Sahel. [00:34:00]

And here's the good news. He gave us the complete archives, drawings, documentation, construction knowledge accumulated over, like, a lifetime. O- our first idea with the archives was like, okay, we can use this, you know, this treasure to build an AI system for indigenous architecture and design. We have the technical capacity, you know.

If we decide to do it today, our engineer will ship it probably, like, in one month. But right now, as we are talking, the drawings are in a safe in Bamako because we are not sure of what happens when we feed decades of this architectural knowledge into LLMs that we don't control. Because at this point, you know, we just think that openness without governance is just extraction with extra steps.

Klaudia: And that's why I think there's, like, such a need for external auditing of these systems, and s- sort of like journalism related benchmarking.

Like benchmarking to test how accurately they represent the original information. But also, [00:35:00] like, over time, if you ask a question about elections or about a given topic in your area, how does that answer change? How do the types of sources that are being cited change? How does personalization play into it?

I think that's really challenging because a lot of these models are now, like, using your interaction history to determine the kinds of outputs that you get. So that it's like there's all this possibility for people to just, like, have their own... be in their own universe of information that no one else is privy to and doesn't know where it's coming from.

Alix: Except the platforms themselves.

Klaudia: Yeah.

Alix: Um, because they actually see the training data and the outputs and the inputs. They then, how are they gonna leverage that unique vantage point of understanding globally how everyone is engaging with these models? And also, just the fact that they're non-deterministic systems, which means you ask the same question twice, you get a different answer.

I just find it so bizarre that we, we've created the feeling of an answer machine, but it's actually constructed in a way that isn't designed to give you answers. It's designed to give you something approximating an answer, and [00:36:00] what that's gonna do to us seems bad. This all seems very bad.

Aliya: There is... I do think that there is a little bit of, like, putting your head in the sand element of, like, you know, when we say companies know and can fix it.

Like, on language, like, I actually don't... Like, they know it's a problem. I don't think they're doing enough rigorous testing to know the extent of the problem, and that was something that we saw even from, like, the technical reports that came out between GPT-3.5 coming out and 4, where there was this, like, decline in reporting what type of multilingual data went into the systems.

There was a decline in the reporting of what the sources of multilingual data were for testing the systems, whether those tests were done. Like, in GPT-4, I think in the white paper that was released, um, that accompanied the release of the model, one of the footnotes was like, "We think the system works in X, Y, Z languages.

We [00:37:00] only tested on standard American English prompts." And I was like, "So this claim is based on nothing," you know? And then under the hood, I think there's, like, very little either inquiry into or fundamental understanding of what the, like, language mapping looks like within the model, like, how data interacts with each other, because it's based on this system of, like, we'll just train around English language data, and we'll rely on cross-lingual transfer, where the rules of the English language will apply to low resource settings.

So there's a lot of, like, it'll work because languages around the world work the way English works and-

Alix: And we want it to.

Aliya: And we want it to.

Alix: So therefore, it must be possible because we want it.

Aliya: Precisely.

Alix: I think it'd be interesting to talk about what you all want to see if this were all, um, I don't know, turning around into a better direction? Um, like what would you see? And I think maybe let's start with you, Klaudia, on the [00:38:00] journalism front. If this all started to more effectively support quality production of quality journalism, um, financially support the robust ecosystem of journalists and media outlets that would need to be supported to be able to produce the kind of knowledge we need to function as a society, um, in parallel to a continued explosion of generative AI tools, like what, what would you wanna see?

Klaudia: I think I don't want to take for granted that LLMs or like generative search is the future of news engagement for everybody. But at the same time, I do think it's really important for journalists to meet people where they're at. And so I think at least a portion of the user experience or like the news consumption experience will be intermediated by these companies.

And so I think there are design choices that can be made differently, and there's also choices that can be made around like how AI companies choose to interact with both audiences and news publishers that could make a lot of difference. I think from the design [00:39:00] perspective, just giving publishers more insight into when and how their content is used.

Right now, unless someone decides to click on a link in an AI-generated summary, a publisher has no insight into how often their content is appearing, in what context it's being presented, if it's presented accurately, and they're also losing like really valuable insight into who their audience is. I think another one is the establishment of clear and more consistent citation standards.

You know, right now, most of the time you're not seeing a date on the source that is linked, and so you don't know when it was produced, if it's still up to date. I think there's a lot more signals that could be incorporated into the citation practices of these companies to help people understand the reliability of the citation or maybe even some sort of like UX design where if given a paragraph or, you know, a bulleted list, if you hover over a specific bit of information, maybe you can like have a window that shows [00:40:00] you where in the context of the original source that information appeared so that you can like assess for yourself whether or not it's a reliable summarization, things like that.

I think also some sort of clear linguistic or visual signals of confidence. I'm not an engineer. I don't know exactly what this would look like or if it's possible, but it would be really helpful from the audience perspective to know whether you're seeing an output that is truly based on robust sourcing or if the chatbot is just trying to deliver an answer because it's designed to like answer either way.

So I think that would be really helpful. And then one thing that's like a little bit more in the weeds, but I think can make a really big difference in terms of reliability of the outputs as well, is based on background conversations that I've had with folks that work at some of these companies, it seems that users might get different answers depending on the pricing tier of the model that they're using.

And so sometimes a query requires more [00:41:00] compute than whatever they're paying for or not paying for at all, and the chatbot will still give them an answer, and sometimes that answer will just not be correct instead of signaling like, "Hey, you need to like upgrade to get a quality answer for this question."

And I think like if additional computational resources are required to get a higher quality response, they should just explicitly signal that. I don't think we can always expect them to like answer every query, but like I think there should just be like a lot more transparency into how the process works.

And then, yeah, I think these systems should be more willing to engage on more equal terms with publishers and just acknowledge the value that robust reporting brings to LLMs, and to be more willing to compensate not just the largest, the publishers that are most relevant for, like, you to fight some sort of, like, get ahead of any sort of, like, regulation or something like that and, like, in good faith, try to license or compensate publishers [00:42:00] if you're using their work.

Alix: I had never thought about the upgrade to get higher quality information. That's just such a dark... Oh, God. 'Cause the premise is that individual information consumption is a consumer good rather than a society that knows things is a good thing for all of us in a way that's really, uh... Um, Aliya, do you have a similar sort of set of prescriptions you would like to see as and if models continue to try and position themselves as, you know, localized or translated, wholly translated, um, products in different contexts?

Aliya: Yeah. So I think the first would be companies should disclose a lot more about their data sources. Where is data coming from? What's the share of the data that's in each language, and where is that data coming from? Where is the data from that tests systems? You know, like, more transparency into, like, the data that train systems, but also test systems and disclosure of sources of that data.

I think that would go far to just, like, [00:43:00] enable the public, but also, like, NLP professionals who are going to be interrogating these systems or building on top of them understand sort of what the limitations are. So I think that's one big bucket of it. And then a second is that companies need to do a lot more to, like, engage with language speakers, native language speakers, including the subject matter experts who speak those languages and are familiar with the context, because language is really, like, inextricable from culture, from context, right?

Like, it's one thing for a system to work in French. It's another for it to work for French-speaking people in France or in Quebec or Senegal. Those are three different contexts and will require three different troves of data to contextualize and make outputs relevant. I always like to give this example, but currently, even multilingual evaluation mechanisms over-index on sort of [00:44:00] Western assumptions or perspectives.

So for example, like, if you are testing whether a system works in a language and you use the, like, MMLU, which is, like, the, like, main benchmarking tool to measure performance, it's a set of multiple choice questions. So you're testing the model's performance on whether the model can spew out legible sounding text in a certain language about the First Amendment, which is one of the questions in the MMLU, and not on the similar article that enshrines freedom of expression in whatever context, linguistic context that system works in.

So, for example, you're checking if a system can answer the question about the First Amendment in Tagalog, not checking about the article about freedom of expression in the Philippines. So that's a huge gap. You can try to address that by, A, disclosing whether multilingual/multicultural benchmarks are used.

You can disclose that by engaging subject matter language experts. You know, that's just [00:45:00] again, you know, as I said earlier, like that's just the tip of the iceberg. You need to be engaging these experts at all stages of model development and application development, both at the pre-training level, the conception level, the design of the system, and then the evaluation and the deployment.

So these communities really are gonna be critical in determining, like what data is important for the types of prompts that are gonna be used and, and what sort of safety means and how refusals should be handled, et cetera. They are all social questions as much as they are technical questions, and so people should be part of that.

And I think finally, like we need to, as a society, like caution against a belief that these systems are working the way they should or that they are magical systems that work in these languages. I think I see a lot of, you know, enthusiasm around how these systems are legible or legible sounding in languages other than English.

And sure, [00:46:00] like, you know, Google Translate has, for example, like become a lot better than it used to be, and it's so helpful when you're like traveling and need to like translate like where the bank is. But that is not the same as using it as like a primary information intermediary. I've seen governments try to use similar LLMs to make emergency services available to constituents.

I've seen similar systems be used to translate articles to expand the audience base for a journalistic platform. Um, and now unfortunately, we're seeing LLMs be sort of weaponized for social media monitoring and immigration contexts. And I think there's a wholesale belief that these systems, because the like language that they output is like impressive sounding or legible, um, there's like a desire to believe that they can do more than they can.

So I think exercising caution, as like a call for everyone, is what I'll leave us with.

Yanick: We refer to ourselves as a learning [00:47:00] organization. What does that imply? It implies that we have to build, we have to shape, but you also have to research and learn. And that's why, you know, we have engineers or we have researchers in the team.

So we have a team with researchers at Kabakoo. When I was, uh, in grad school, there was this line that there's nothing more practical than a good theory. And this kind of stayed with me, you know, and that's how we're building Kabakoo. That's how we're working, you know, like really integrating the research, integrating the sh- the, the building and the, and the learning.

Now, our shining house on the hill is quite simple. It's a world where a random young woman, young man in Bamako, in Lomé, in, in Niamey is able to live their lives on their own terms. So this is what we are working towards. So, and living their life on their own terms basically means first reclaiming their agency, being able to make sense of their surroundings, and then acting on that sense [00:48:00] making.

Based on that, being able to enjoy socioeconomic mobility. This is basically what we are, what we are working on every day.

Alix: Thanks again to Yanick, Klaudia, and Aliya for joining us. This series was produced in collaboration with Real ML. For the past six years, Real ML has brought together people around the world working to challenge the power and inequities built into AI systems, not just through critique, but also through practice, and many of the people you hear from in this series met or developed their work directly through Real ML workshops where ideas are tested and collaborations are formed, and actually last year someone said soulmates were found.

I've also served on Real ML's board, and it's one of my favorite communities, and I mean that sincerely. I really love any time this group of people gets together because magic always ensues. And to learn more about Real ML and future workshops, you can check out the link in our show notes. A special thanks to Anna Bacciarelli, Isha Keegan-Nushabadi, and Shazeda Ahmed from Real ML.

And thank you to our production team, Sarah Myles, Georgia Iacovou, Kushal Dev, Marion Wellington, Van [00:49:00] Newman, and Zoe Trout.
