Newsjunkie.net is a resource guide for journalists. We show who's behind the news, and provide tools to help navigate the modern business of information.
Lynda Kellam is a data librarian and director of research at an Ivy League institution, and a founding member of the Data Rescue Project (DRP). This interview was conducted on December 19th, 2025.
“All of those databases are gone. We didn't want that to happen again.”
Morgan Kriesel: I wanted to start with the founding of the Data Rescue Project: when it began, the motivation for starting it. Could you tell me a bit about that?
Lynda Kellam: Most of the DRP members and people working with our steering committee are data librarians and academics. It's a small community, really. A lot of us know each other through different organizations, and we had been communicating individually about how to respond to what we thought might happen, starting in October of 2024.
A lot of us were involved in the data rescue efforts that were happening in 2017, so we had some connection. In 2017, it was primarily focused on concerns about environmental data and climate data. So we had a little bit of a wait-and-see attitude [in October 2024], because a lot of that [2017] data was never taken down, so things were rescued, but it didn't go away. So when January [2025] came around and we saw the CDC data go down, that weekend a lot of us started getting emails from our patrons, from our researchers and our institutions, panicking about the loss of the CDC data. And then for me, I was getting questions about USAID data and the website, because a lot of people I work with use that website or use that data.
What we're seeing this year is—it's not just environmental data that's been targeted. And I don't think anybody really expected what happened. So what was really disconcerting was seeing this attack on social data, and anything that was like, sexual orientation, gender identity, ethnicity, race, education, all of it, being at risk, if not potentially altered or taken down.
Initially, we were just trying to gather all of the information out there for people who were wanting assistance, either finding alternative sources or backing up data, because we knew that there were efforts already existing. Originally, it was really focused on our patrons, trying to help them. And then it became bigger, because nobody else was really responding in that kind of way.
It started gathering momentum when we had this Google Doc that went viral, that had a bunch of lists of all the different kinds of institutions that were backing up data, and that's where we started connecting with others across the movement. In that first week in early February, we got together and decided to create what became known as the Data Rescue Project. We had some support from a group called Saving Ukrainian Cultural Heritage Online (SUCHO). A member of their steering committee came to us and helped us with the infrastructure. He helped us set up a website really quickly, and using his lessons learned from SUCHO, helped us set up the process for doing asynchronous data rescues. So it was a confluence of all the right people in the same place at the right time.
When we were meeting in February, we decided that we would try to coordinate, because we could see that there was really no one taking the lead in terms of coordinating who was doing what. Like, if somebody came to us and asked, “Oh, should we rescue this?” we could say, “No, the PEDP or this other group has it. Don't worry about that particular thing.” So we were trying to serve as that central point (a clearing house is the word we use) where people could learn more about what was going on.
We work with a large number of groups who are connected with Internet Archive, or connected with the End of Term web archive. PEDP (which is the Public Environmental Data Partners), EDGI, Data Index, ESIP. I mean, you name it, we probably know it. On our website, under Current Efforts, that page has all of the groups we are in contact with, almost all of them. Some of them do things a little more on the back end, they're not as public about what they're doing for various reasons.
And then also communicating out. We have a big Bluesky presence communicating out about what was happening or who was out there, what efforts were pre-existing, and what was being created.
And then the final thing we wanted to do was any asynchronous data rescues we could do for social data, so data about people. Because we knew that other groups were working on environmental and climate data, and they're the experts in that, so we didn't want to duplicate their work. They needed us to stay out of their way and focus on things we knew more about, so we started with Department of Education data, assuming that it was going to be at risk. So it was really motivated by seeing USAID going down and realizing that we couldn't wait. If they were going to take down an entire website and dismantle an entire agency, we had to act.
You mentioned USAID data. Who was it that you were working with that was missing this data?
There's a researcher I work with who had been using the Demographic and Health Surveys [DHS]. And the Demographic and Health Surveys are primarily a restricted access dataset that you have to apply to get access to, and it suddenly went down—that application process was closed. It wasn't clear if people would still be able to get it. So that was the person I worked with most closely.
But then we had people asking us about other indicators, and where they were backed up and who had them. That's why we did a backup of the API for the DHS indicators, just to ensure that if that API went down, we would have it covered, so people would still have access to the data.
And then there were people trying to get access to what was in the DEC archive, which is all of the PDF documents from USAID’s Development Experience Clearinghouse—I mean, there's thousands and thousands of these things. We didn't create this, there was another group that just happened to have crawled them all, I think for AI, for a model learning purpose. They created DECfinder, which is available for people to get access to those.
And that's the thing—we can go into Wayback Machine and see the page, but we can't actually use the database. That's one of the saddest things about this for me, the fact that we weren't able to respond quickly enough to the USAID situation. All of those databases are gone.
We didn't want that to happen again. So that's why we started with the Department of Education.
So there's [data], like from USAID, that's just not gonna come back. It's gone.
Yeah, it's not accessible through anything that we can find. We've tried to ask through the communities, if people know, if somebody might have backed it up somewhere. And the truth is, some of the data, like climate.gov, went down, but we know that that data is backed up. What was on climate.gov was visualizations of the data, we know that real data is backed up somewhere. It still exists in either NOAA or NASA. But some agencies, I don't even know where their data would be. I don't know if the State Department has that [USAID data]. It's just not clear.
It is gone. And I know there are stories about how they destroyed some records too. So it's, yeah, it's really troubling.
We only work with public data, publicly accessible data. Actually there was a 404 Media article that came out. It was Sam Cole’s, and she’s right in this title: “Massive, Unarchivable Datasets of Cancer, Covid, and Alzheimer's Research Could Be Lost Forever.” That article is interesting, because it was talking about restricted access data that is no longer—the application processes were no longer open. PRAMS is one of the major ones, the Pregnancy Risk Assessment Monitoring System. After that article went out, it went viral. Like, real viral. And people were starting to call for, like, “We need to get our datasets.” So we put out posts saying, “Please do not try to get this. If you do this, it could risk making personally identifiable information public.” That's why these things are restricted.
So we do not have the capacity to do that. All the data that we go after is publicly accessible and in the public domain. But that is a concern. Those restricted access datasets, like PRAMS, like Demographic and Health Surveys, like a lot of BLS surveys, are just—the application processes are not continuing because they don't have the staffing to continue their application process. Some of them come back, like DHS came back after the Gates Foundation gave them some money.
But yeah, it's one place that was really troubling to me, because that data is so critical for researchers to be able to ask complex questions about various things, and there's nothing we could do about it.
“It's more complex than just disappearing data because there's also altered data, or data that's just at risk. Our data expertise is at risk.”
How has the media coverage been, from your perspective? Are we getting it right? Are there holes in the coverage?
I think the framing has been one of the challenges. I think the other thing is it's such a complex issue. Using terms like “disappearing data” is not—it's a catchy title, but it's not necessarily the truth. It's more complex than just disappearing data because there's also altered data, or data that's just at risk. Our data expertise is at risk. So, yeah, I think that's been the hardest part for the media, is being able to actually capture the complexity.
It's also ever-changing. Somebody wanted to create a [project] that would highlight all of the different things that were going on, but it never got off the ground. Nobody ever did it that I know of. We've tried to do that with our efforts page. But I'm looking at it today and seeing mistakes. And I mean, we had our tracker, and then three months later, we were able to create the [overlay], and people were still citing the tracker.
One of the things I've been trying to do is create a clearinghouse of my own, just for news media about what's going on to capture that history, in a sense, but also to keep track of when I see something new coming up, like a new organization. It's just an old Zotero library now, so like, if you wanted to understand what was going on, here's the essential reading, here's where you should start. But that's a librarian talking.
I think that's the challenge: finding a way for people to get caught up if they haven't been paying attention. It's interesting, because lately I've been asked to go into classes and talk about this: what's happened in the past year. I actually had somebody ask me if there were any readings the students could do, and I was like, “Wow, okay. We've actually gotten to that point where I could suggest, here are some things you could read that would help you understand the moment.”
“We have to move away from this idea that there's one record, and that's it.”
Tell me a little bit about what you were doing before the Data Rescue Project.
So this [position at DRP] is a volunteer position. I'm still a manager in the library department of an Ivy League institution. I've been working as a government information data librarian since 2007, and now I manage a group of people who do that work. My research is on government information preservation, it has been for a very long time.
Wow. Okay, so you've been prepared. Like, nobody could have been prepared for this, because it's insane, but you were the closest.
I mean, there are communities of people who have been pointing [this out]. The PEGI [Preservation of Electronic Government Information] project has been talking about this issue of electronic government information for a long time.
In the past, the dissemination of government information was primarily in print, right? And so you could put out information to various libraries, called depository libraries, and people’s [local] libraries are members of that. It's called the FDLP, the Federal Depository Library Program. And then, since the '90s, there's been this migration to more electronic information. We've seen indicators of challenges ahead when it comes to that, and certainly for data (which never really went through the Depository Library Program; it was a little bit separate), there have been increasing challenges in knowing how we are going to access and preserve this information for the long term.
It's definitely an ongoing, known challenge when it comes to the ephemeralness of electronic government information. And now we have, on top of that, the political attacks on electronic government information. So that's our perspective, the librarian perspective: this is more on how we preserve and disseminate and make sure people have access.
You have others, like the American Statistical Association, which just put out a report called The Nation’s Data at Risk, and that really is looking at the stability of the Federal Statistical system. If you haven't read that report, you really need to look at it. They do a great job of showing the gutting of the Department of Education, and talking about the challenges facing the overall statistical system. But they're looking at it more from the perspective of protecting the Federal Statistical system, and from that expertise perspective. Our community approaches it a little bit differently.
Yeah, tell me about that—about preserving as much of this federal system as we can, versus trying to build things independently, building a separate infrastructure. Where do you fall on that?
I mean, I think there needs to be reform of the Federal Statistical system. We know that there needs to be reform of that. It’s not that we need to—I don't think there's going to be a shadow system, or anything like that. I think that what we're realizing is that a lot of us defer to, or look to the government as being the keeper of that data.
Even ICPSR. So ICPSR is one of the oldest archives of data in the country, and they have government data in their catalog, but they're not backing it up, they're just pointing people to those access points, so they're referring people over to government websites, right? And the challenge now is that we don't know that we can do that anymore.
So where I'm coming from is that, we've wanted to say that there needs to be one backup of record. And that's not going to be a sustainable model in the future. We have to move away from this idea that there's one record, and that's it. Because of the ephemeralness of the internet.
“When you have a partnership of nonprofit organizations coming together, one thing I think about is the stability of these structures we're creating—how long [will they last]?”
So what is your idea of a sustainable model?
Having mirrors outside of the United States for the larger datasets, especially environmental ones. I think those mirrors are a possibility, and especially if the establishment of those relationships considers the long term preservation aspect.
I'll give you a concrete example, PANGAEA is a repository based out of Germany, and they were very interested in mirroring NOAA data because it's important for their work. And so they reached out to NOAA to try and establish a relationship. And I think that's the model for what we really need in the future—having mirrors of datasets across borders.
The challenge, though, is—what happens to PANGAEA’s data if it goes down? Who comes in and takes over, right? So having some idea of continuity of resources, I think is one thing.
Another aspect of it is, if the government is taking down NOAA data or making it inaccessible, who steps in to kind of bolster that? Nobody has the resources to recreate what the government does. That’s the part I’m not sure about. No one institution has those resources. There are a lot of efforts to talk through what that could look like right now, but I don't think anybody has an answer. Certainly, PEDP has done a great job of recreating tools based on the backed up data. But we have to think about the long term sustainability of those tools. When you have a partnership of nonprofit organizations coming together, one thing I think about is the stability of these structures we're creating—how long [will they last]? How?
I'm not sure what the answer is. There are different answers out there, but I'm not really sure which one yet.
What have you been hearing? You said there's no concrete answer, but what's being thrown out there?
ESIP has been putting out information this week about creating resiliency scorecards for repositories, which is actually more useful on an academic side of things. A lot of universities have repositories of different kinds, and so they've created the scorecard for understanding the resiliency of your repository over time. Resiliency in terms of security—the way they focus it is how well it can withstand crisis conditions. This is a new thing that they just put out there to get community feedback on before it's launched. But I think this would be a helpful thing for the future. Because what people don't understand is there are a lot of repositories that are either funded by the government, or partially funded by the government, housed at universities. I mean, there's just tons of these, and there's so much data that's beyond what we've been capturing with the DRP. There's a lot of data out there that sits in different sites around the country, and around the world as well.
[As for continuing data collection,] I just don't know, it’s so early on right now. I think there are places where groups are having those discussions. It's just—it’s a big beast.
I would say the most productive discussions are the ones happening per sector, so per disciplinary area: environmental data, education data, or demographic data. I think those [areas] are where we're seeing a little bit more productive conversations. We're not going to come up with a solution for the entire system.
“That's what I'm really hopeful about—the coming together of all the different communities. None of that would’ve happened before this year.”
I feel like there's an opportunity for all of these local data collection efforts and, like you were saying, these sector conversations to be brought together to, maybe not get back what we had, but at least get the ball rolling. Am I being too optimistic? Why not a shadow structure? I guess that’s what I'm trying to ask.
Well, there are shadow structures that already exist. A concrete example is education data, which is reported from the states to the federal government. So that's something that I would be optimistic about: if states were to get together to create their own kind of reporting mechanisms. I think California could probably be a leader in that. The challenge is getting all 50 states to do that, especially in this environment. But, yeah, there are definitely efforts that are promising, that use those alternative sources.
When it comes to retaining access, I do still have faith in federal workers to do their work, to do the work that they are committed to do, and to try to do the best they can under the conditions. Part of that's because I come from a federal employee family, but I hope that there are still people doing the things that they need to do within the agencies, or at least, trying to keep the data going as best they can.
I think what I have hope in, is that bringing attention to the problem will put pressure on our leaders to at least try and do something. And that might be way too optimistic. I still have hope for democracy in some ways.
When we got together, we were like, “we’re focused on preservation.” That is what we do as librarians, and what we're going to constantly talk about. But it's really morphed over time to saying, “No, public data is a public good. It should be publicly accessible.” That's our mantra for the Data Rescue Project. It's a little bit more advocacy-focused than what we were seeing before, but it's because we recognize that the government creates this data for the people, and those people should have access. It's a fundamental tenet of democracy to have an informed electorate, and to have an informed electorate, you have to have access to this information.
So I'm hopeful that getting those kinds of messages out there to the general public—they can put more pressure on public and local leaders, who can probably put more pressure on the federal government to knock it off. Anyway, I try not to be completely hopeless about this, otherwise I wouldn’t be able to do any work [she laughs].
This week we got information about NCAR [National Center for Atmospheric Research], and the American Geophysical Union just had a big meeting, so they were doing all this advocacy around it, and now there's an open letter. That's what I'm really hopeful about—the coming together of all the different communities. Not just librarians, but also activists, people working in NGOs, former feds, think tanks. None of that would’ve happened before this year. We have contact with such a wide range of people now that we didn't have before when it comes to this question of electronic government information.
Edited for sequencing and clarity.
Sources
Newsjunkie. Lynda Kellam, interviewed by Morgan Kriesel, December 19, 2025
© Newsjunkie 2026
Virtual
College Park, Maryland
The Public Environmental Data Partners are committed to preserving and providing public access to federal environmental data
San Francisco
The Internet Archive is an independent digital library. With a collection spanning billions of websites, books, movies, music and software, IA is among the largest digital repositories in the world. Its stated mission is to provide “Universal Access to All Knowledge.”
Latest news on efforts to remedy the erasure of government/public data and web sites
Get involved: The Internet Archive is once again leading a coalition to collect and preserve government documents and data that are at risk during presidential transition periods. This page provides info and tools to help the End of Term Harvest.
San Francisco
American research group focused on highlighting the public's Environmental Right to Know (ERTK).
Virtual
dataindex.us is a collaborative effort dedicated to monitoring changes in federal datasets. Its team is building tools to track a broad range of changes in federal datasets to build actionable insights for policymakers, advocates, journalists, and data users.
Virtual
Preservation of Electronic Government Information (PEGI) is an initiative to address national concerns regarding the preservation of electronic government information by cultural memory organizations for long term use by the public.
Virtual
ESIP is a home for Earth science data professionals. As a nonprofit funded by cooperative agreements with NASA, NOAA, and USGS, it brings together interdisciplinary collaborations to share technical knowledge and engage with data users.
Alexandria, Virginia
Bremen/Bremerhaven, Germany
PANGAEA works at archiving, publishing and distributing georeferenced data from earth system research.
Ann Arbor, Michigan
ICPSR is a unit within the Institute for Social Research at the University of Michigan and maintains its headquarters in Ann Arbor, Michigan.
Washington
International nonprofit scientific association supporting a global community interested in advancing discovery in Earth and space sciences for the benefit of humanity.