This blog entry is a collaboration that connects graduate and faculty digital humanities practitioners from the Scholars' Lab's Praxis Fellowship and the IATH Resident Fellowship, both housed in the UVA Library's Digital Humanities Center. Through conversations, a public event, and writing, the 2025-26 Praxis Fellows engaged with IATH Fellow Professor Mark Sicoli's scholarship and explored the technical and intellectual processes involved in designing, implementing, and sharing complex and ambitious digital scholarship.
In the common imagination, archives are dusty rooms, tightly packed with boxes, papers, folders, and books. They are a symbol of the desperate human attempt to stop the flow of time, to freeze history so that it might be accessed in the present. They are static and ghostly. They can only speak through yellowed images or text on brittle paper. Their doors are often open only to scholars and researchers. Those who are represented by the data housed in archives are too often denied access. These critiques are not new, but solutions are not easy to enact. How do you build a living archive, one that grows, talks, and changes? How can it be a space for community, instead of secluded research?
Mark Sicoli's IATH Fellowship project, "Data Back!", has created one such archive, a "living infrastructure" and a tool for collaboration, community building, and preservation. This work was made possible through IATH's Resident Fellowship, which provided Sicoli with dedicated technical support and institutional infrastructure to transform years of fieldwork into a sustainable digital resource. Sicoli, along with student assistant Emma Broadwater and technologists Shayne Brandon and Doug Ross from IATH, spoke at UVA earlier this month and shared their process for building a digital living archive of the Zapotec language, spoken in the Mexican state of Oaxaca. Sicoli, whose book Saying and Doing in Zapotec explores the relationship between language and social action in Zapotec communities, developed the project in collaboration with scholars and with the Indigenous and international Zapotec-speaking community. The project aims to construct a linguistic survey of Zapotec and Chatino languages. It also includes a lexical database of Lachixío Zapotec, which shares the data with speaker communities and supports both teaching and research of the language.
Native speakers are participants, not only in the creation of the archive, but also in its upkeep, growth, purpose, and use. This approach differentiates the project from other language documentation initiatives, which are often created by both researchers and Indigenous language speakers but are archived and made accessible almost exclusively for the benefit of scholars.
Rather than relying only on professional linguists, the project trains community members to document the language of their own regions. Within nine months, participants completed 122 surveys, resulting in more than 300,000 recorded utterances. The findings revealed patterns of language endangerment and discrepancies between census reports and field observations. Using Mukurtu, a free, open-source platform designed for community-managed digital heritage and created by Washington State University's Center for Digital Scholarship and Curation, the project is building an archive governed by community-defined cultural protocols. These protocols give the Zapotec community granular control over who can view, edit, or publish specific materials, rather than defaulting to the open-or-closed binary typical of institutional archives. The platform also supports the addition of audio recordings and example sentences, and the generation of language-learning materials.
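To make the idea of protocol-based access concrete, here is a minimal sketch in Python. It is not Mukurtu's actual data model or API. The class names, the "any"/"all" sharing rule, the protocol names, and the example users are all hypothetical, chosen only to show how access can be decided per item and per action instead of by a single open-or-closed switch.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """A hypothetical community-defined protocol (not Mukurtu's real model)."""
    name: str
    # Maps a member's name to the set of actions that member may perform.
    members: dict = field(default_factory=dict)

    def allows(self, user, action):
        return action in self.members.get(user, set())


@dataclass
class Item:
    """An archive item governed by one or more protocols."""
    title: str
    protocols: list
    # "any": one permitting protocol is enough; "all": every protocol must permit.
    sharing: str = "any"


def can(user, action, item):
    """Return True if the user may perform the action on the item."""
    checks = [p.allows(user, action) for p in item.protocols]
    if not checks:
        return False  # an item with no protocol is closed by default
    return all(checks) if item.sharing == "all" else any(checks)


# Illustrative data only: these protocols and people are invented.
teachers = Protocol("teachers", {"ana": {"view", "edit"}, "luis": {"view"}})
editors = Protocol("editors", {"ana": {"view", "edit", "publish"}})

recording = Item("Greeting exchange (audio)", [teachers, editors], sharing="any")
draft_entry = Item("Draft dictionary entry", [teachers, editors], sharing="all")

print(can("luis", "view", recording))    # luis is covered by one protocol
print(can("luis", "view", draft_entry))  # but not by both
print(can("ana", "edit", draft_entry))
```

The point of the sketch is that the decision depends on three things at once (the person, the action, and the item), which is what allows a community to set different rules for different materials.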
Making this possible required significant technical labor. Shayne Brandon converted legacy lexical data from a 1990s-era format (Shoebox) into standardized Unicode and then CSV for import into Mukurtu. He prepared roughly 6,000 dictionary entries while also writing scripts to extract over 300,000 individual audio clips from long survey recordings using ELAN annotation files and UVA's Rivanna high-performance computing cluster. Doug Ross, meanwhile, emphasized that the project's reliance on well-structured data standards, particularly XML-based ELAN files, gives the archive a durability that outlasts any single software platform, allowing audio, transcription, and timecode to remain linked and transformable into new systems over time. Sicoli described the archive as an ongoing, multi-author resource shaped by decades of collaboration and continued community consultation. Importantly, the online nature of the archive allows diasporic Zapotec-speaking communities to contribute to and draw from the collected data, and it fosters cultural-linguistic links beyond borders.
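For readers curious about what extracting clips from ELAN files can look like, here is a minimal sketch in Python. It is not the project's actual script. The sample annotation document, the tier name "utterance", the file name survey.wav, and the output folder are all assumptions made for illustration. The sketch reads the time slots and aligned annotations from an ELAN (.eaf) XML document, then builds, without running it, an ffmpeg command that would cut each utterance out of a long recording.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written ELAN-style document, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2250"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4000"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_segments(eaf_text, tier_id):
    """Return (annotation_id, start_ms, end_ms, text) tuples for one tier."""
    root = ET.fromstring(eaf_text)
    # Time slots map an ID to a position in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations without usable timecodes
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            segments.append((ann.get("ANNOTATION_ID"), start, end, text))
    return sorted(segments, key=lambda seg: seg[1])


def ffmpeg_command(source_wav, segment, out_dir="clips"):
    """Build (but do not run) an ffmpeg command that cuts one clip."""
    ann_id, start_ms, end_ms, _text = segment
    return [
        "ffmpeg", "-i", source_wav,
        "-ss", f"{start_ms / 1000:.3f}",
        "-to", f"{end_ms / 1000:.3f}",
        "-c", "copy", f"{out_dir}/{ann_id}.wav",
    ]


segments = read_segments(SAMPLE_EAF, "utterance")
for seg in segments:
    print(seg[3], "->", " ".join(ffmpeg_command("survey.wav", seg)))
```

Because each clip keeps its annotation ID, the audio, the transcription, and the timecode stay linked after the cut, which is the kind of durability Ross described. At the scale of hundreds of thousands of clips, a list of commands like this could be distributed across many machines, which is one reason a computing cluster is useful for the job.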
Language is ever changing. It shapes cultural concepts and habits, how one thinks and lives. It calls for an archive capable of capturing its vitality without crystallizing it. Data Back! shows that building a living archive is not only possible but necessary. When one replaces the image of the dusty room with a living network of connections, voices, and inputs, the archive ceases to be a walled, frozen institution and becomes a breathing organism whose growth and preservation are sustained through a communal, collaborative effort.
Thanks to the Praxis Fellows, 2025-2026!