LDaCA Newsletter Quarter 3 2024




Welcome

Welcome to the third issue for 2024 of this newsletter about the activities of the Language Data Commons of Australia (LDaCA) and the Australian Text Analytics Platform (ATAP). This quarter, we announce the new phase of the project, advise of several workshops and other events we will be running, and report back from a symposium hosted by the Australian Research Data Commons (ARDC). If you have any questions or feedback, please email us at ldaca@uq.edu.au or message us on our LinkedIn page.

News

New phase of LDaCA

We are delighted to announce that the ARDC’s investment in LDaCA will continue until June 2028. In this next phase, LDaCA will:

  • develop the social and technical foundations for a national, distributed archival repository of language materials

  • continue securing vulnerable and nationally significant collections of Aboriginal and Torres Strait Islander languages, Indigenous languages in Australia’s Pacific region, varieties of Australian English and migrant languages, and sign languages of Australia and its region

  • continue to develop the LDaCA data portal for accessing and repurposing language data of significance to researchers and communities, including data which is held in galleries, libraries, archives and museums (GLAM)

  • establish workflows which link repositories and analytics environments so that researchers can create fully described, reproducible research on written, spoken, multimodal and signed language

  • provide training and develop resources for researchers and communities which support best practice in archiving, sharing, accessing and analysing language data in line with FAIR and CARE principles.


In this phase, a new Chief Investigator (CI), Dr Rose Barrowcliffe, joins The University of Queensland (UQ) team (Prof Michael Haugh and Dr Martin Schweinberger) and three new partners will join LDaCA: Batchelor Institute of Indigenous Tertiary Education (CI: Prof Kathryn Gilbey), the Australian Digital Observatory (ADO) (CI: Dr Marissa Takahashi) and the University of Western Australia (CI: Prof Clint Bracknell). Monash University will not continue as a partner, at least in the short term, and we would like to take this opportunity to thank CI Assoc Prof Louisa Willoughby and her team for the wonderful contribution they have made over the past two years. Our other partners will continue their involvement: Australian National University (ANU) (CI: Prof Catherine Travis), The University of Melbourne (CI: Assoc Prof Nick Thieberger), The University of Sydney (USyd) (CI: Prof Monika Bednarek), AARNet (CI: Adam Bell) and First Languages Australia (CI: Beau Williams).


The project is part of the ARDC’s HASS and Indigenous Research Data Commons (RDC), which is establishing long-term, enduring national digital research infrastructure. It supports researchers in harnessing research data to enhance Australian social and cultural wellbeing, and helps us understand and preserve our culture, history and heritage. In partnership with research institutions and government, the HASS and Indigenous RDC has achieved joint resourcing of more than $40 million over the coming four years. LDaCA also welcomes the new focus areas which will increase the size and value of the RDC: the social sciences project (currently in co-design), the Australian Internet Observatory and Australian Creative Histories and Futures.

PILARS

As reported in our previous newsletter, LDaCA hosted a workshop in Naarm/Melbourne in February with the aim of creating an agreed framework for talking about data management across the partner projects in the ARDC’s HASS and Indigenous RDC. A working group led by Peter Sefton was formed at that event, and the result of their endeavours is the Protocols for Implementing Long-term Archival Repository Services (PILARS). Project lead Michael Haugh described PILARS as a guide to implementing “CAREful FAIRness”, referring to the CARE and FAIR principles. The current version of the Protocols is available along with further background information (see also below under New online content).

New online content

We have added several new pieces of content online:


Voices of Country themes represented as a native wattle tree.

Image Source: Gilimbaa with cultural elements created by David Williams (Wakka Wakka)

New publications

There have been several publications released that feature, were authored or were co-authored by LDaCA team members:

Team updates

Harriet Sheppard (formerly a Research Data Analyst) has left to work as a linguistics lecturer at the University of the South Pacific in Suva, Fiji, and Maria Weaver (also formerly a Research Data Analyst) has left our team. We wish them both the best!


Also, in the last edition, we advised that we had two new team members at the ANU and included an introduction from the first team member. Here is a belated introduction from the second team member:


Hi, I’m Anisa Puri, and I’m a historian and research manager who lives on Boonwurrung Country in Naarm/Melbourne. I have been working as a historian, within and beyond academia, for the last 14 years. I have a Master of Public History and a PhD in Historical Studies from Monash University, and I specialise in oral history, Australian history and migration history. I’m delighted to have joined the LDaCA team at ANU as a Research Project Manager. My current focus is on developing a new project which aims to enhance the findability of existing oral history collections in Australia. I look forward to sharing more details about this project in the coming months!

Events

Upcoming Events

Summer School 2025 topics workshop

When: 31 July 2024, 2 – 3:30 pm

Where: Online

Run by: The ARDC


The ARDC is hosting an online workshop to gather ideas for topics that should be covered in the HASS and Indigenous RDC Summer School 2025, to be held 3–6 February 2025 in Brisbane. Summer School 2025 aims to help attendees understand the HASS and Indigenous RDC infrastructure and gain skills in using it in future work, as well as in Indigenous Data Governance and data management. Registrations open.

Data migration skills workshop

When: 3–5 September 2024

Where: ANU

Run by: LDaCA


The LDaCA infrastructure is based on widely accepted standards such as Research Object Crates (RO-Crates) and the Oxford Common File Layout (OCFL). Our project has been running for more than three years, and the processes and tools for aligning data with these standards are now well developed. This workshop aims to show how those tools can be applied to data in a variety of formats to efficiently migrate material to the LDaCA standards.

UQ R&I Week 2024 events

When: 30 September – 4 October 2024

Where: UQ

Run by: UQ Research and Innovation (R&I) Week


LDaCA will be involved in two events during UQ R&I Week 2024:

  • 30 September 2024, 4–6 pm at the UQ Anthropology Museum — A panel discussion about issues in Indigenous Data Governance. The panel will include Dr Rose Barrowcliffe, Robert McLellan and Senior Manager Lesley Acres from Aboriginal and Torres Strait Islander Collections and Services at UQ Library. The discussion will be moderated by Grant Sarra. Registrations open.

  • 1 October 2024, 11 am – 1 pm at Room 275, Global Change Institute, St Lucia campus — A contribution to the session ‘Showcasing the UQ HASS&I Research Infrastructure Capabilities’. Registrations open.

Joint ADO workshops

When: October 2024 (exact dates TBC)

Where: Online

Run by: LDaCA and ADO


A glamorous introduction to text analytics: In this two-part workshop series, participants will learn to collect posts from Reddit — using the very sparkly example of the Eurovision Song Contest — and apply a range of text analytic techniques to them. Some experience with R coding preferred, no experience with social media data collection or text analysis required.

ALS 2024 masterclass

When: 26–29 November 2024 (exact date TBC)

Where: ANU

Run by: The Australian Linguistic Society (ALS)


Martin Schweinberger and Sam Hames will present a masterclass at the ALS 2024 conference called ‘Improving Transparency and Reproducible Results in Linguistics’. More details are forthcoming once the conference schedule is announced.

Recent Events

Pilot of new ‘Introduction to Computational Text Analytics’ workshop

Martin Schweinberger and Sam Hames piloted their new introductory workshop with 10 people at UQ on 23 and 24 May. Drawing on the Language Technology and Data Analysis Laboratory (LADAL) tools and a set of interactive exercises, this workshop is intended to provide a starting point for people with no background in programming or data analysis to use text analytics approaches.


Participants in an introductory text analytics workshop with Sam Hames (back l.) and Martin Schweinberger (back r.).

Image Source: Sam Hames

ICA 2024 presentation

Chao Sun (Sydney Informatics Hub, USyd) and Olga Boichak (Computational Social Science Lab, USyd) attended the 74th Annual International Communication Association (ICA) Conference (20–24 June 2024, Gold Coast). On 23 June, Chao Sun demonstrated ATAP tools for corpus processing and analysis in a presentation titled ‘Integrated Text Analytics: Unveiling Insights With ATAP Corpus and Diverse Analytic Tools’. This presentation was part of the Computational Methods Tool Development session organised by the Computational Methods division that brought together a group of international presenters and attendees working on text analytics.

Chao Sun presenting at ICA 2024.

Image Source: Olga Boichak

Corpus Spotlight

This issue’s spotlight is on the corpus of archived materials from the Morehead documentation project, which aimed to extensively document two undescribed languages, Nen and Kómnzo, from the Western Province of Papua New Guinea.


Aside from linguists, there was also an ethnobiologist and a botanist on the documentation team. To collect both language and ethno-ornithological data, the team went on ‘bird walks’ with language consultants and local bird enthusiasts. They recorded birdsong alongside linguistic content, such as narratives about bird behaviour and bird lore, or elicitation of bird names.


Julia Colleen Miller was an ANU Postdoctoral Research Fellow and PARADISEC archivist when she worked on the team in 2012 and 2013. Julia has made a fascinating video, ‘Linguistics and Ethno-ornithology: Bird songs and bird stories from the Western Province of Papua New Guinea’, weaving together “photos, videos, audio soundscapes, birdsong, short stories in the Nen language, and visualisations of acoustic data”.


Screen captures relating to the bird tibrom (‘Greater Black Coucal’) from the ‘Linguistics and Ethno-ornithology: Bird songs and bird stories from the Western Province of Papua New Guinea’ video.

Image Source: Julia Colleen Miller

All materials have been archived in the DOBES Archive, part of The Language Archive located at the Max Planck Institute for Psycholinguistics in Nijmegen, Netherlands. Read the project page to learn more and visit the archive page to access the materials, including recordings, photos, elicitation materials, field notebooks and dictionaries.

Team Member’s Tip

Catherine’s tip: My tip relates to preparing ethics applications for collecting language data to maximise the possibility of reuse. This means gaining approval for (1) data sharing, with appropriate restrictions as relevant to the collection, and (2) long-term archiving of the recordings (as it is often assumed by ethics boards that the raw data should be destroyed once the research has been completed).


These can be addressed in questions around data storage, with statements like:

  • The de-identified transcriptions and audio files of those participants who give their permission may be incorporated into web-based corpora and made available to others, with approval by the investigators.

  • The de-identified audio recordings will be stored indefinitely. It is not realistically possible to note everything that occurs in the speech of the participant(s) to the point at which the primary data is no longer necessary, and thus it is essential that the raw audio files be archived.


In the consent form, participants can be given the option of selecting different levels of data sharing (for an example, see Figure 6 in the Sydney Speaks case study), such as:

  • with members of the research team

  • with other approved researchers

  • incorporation into web-based corpora.

Learn More

No Office Hours

The Joint Office Hour run by LDaCA and ADO will not take place in 2024. The teams from the two projects are working towards an alternative way to provide targeted advice to researchers — watch this space!

We welcome any feedback to make future issues more useful for you. If the newsletter was forwarded to you, you can subscribe here.

LDaCA acknowledges Traditional Owners of Country throughout Australia and recognises the continuing connection to lands, waters and communities. We pay our respects to their Ancestors and their descendants, who continue cultural and spiritual connections to Country.




Republishing is encouraged — CC BY text and infographics.

If you have questions about republishing, please contact ldaca@uq.edu.au

©LDaCA — 2024
