| LDaCA Newsletter — Quarter 3 2024 |
|
|
|
| Welcome to the third issue for 2024 of this newsletter about the activities of the Language Data Commons of Australia (LDaCA) and the Australian Text Analytics Platform (ATAP). This quarter, we announce the new phase of the project, advise of several workshops and other events we will be running, and report back from a symposium hosted by the Australian Research Data Commons (ARDC). If you have any questions or feedback, please email us at ldaca@uq.edu.au or message us on our LinkedIn page. |
|
|
|
| | | We are delighted to announce that the ARDC’s investment in LDaCA will continue until June 2028. In this next phase, LDaCA will: develop the social and technical foundations for a national, distributed archival repository of language materials; continue securing vulnerable and nationally significant collections of Aboriginal and Torres Strait Islander languages, Indigenous languages in Australia’s Pacific region, varieties of Australian English and migrant languages, and sign languages of Australia and its region; continue to develop the LDaCA data portal for accessing and repurposing language data of significance to researchers and communities, including data which is held in galleries, libraries, archives and museums (GLAM); establish workflows which link repositories and analytics environments so that researchers can create fully described, reproducible research on written, spoken, multimodal and signed language; and provide training and develop resources for researchers and communities which support best practice in archiving, sharing, accessing and analysing language data in line with FAIR and CARE principles.
In this phase, a new Chief Investigator (CI), Dr Rose Barrowcliffe, joins The University of Queensland (UQ) team (Prof Michael Haugh and Dr Martin Schweinberger) and three new partners will join LDaCA: Batchelor Institute of Indigenous Tertiary Education (CI: Prof Kathryn Gilbey), the Australian Digital Observatory (ADO) (CI: Dr Marissa Takahashi) and the University of Western Australia (CI: Prof Clint Bracknell). Monash University will not continue as a partner, at least in the short term, and we would like to take this opportunity to thank CI Assoc Prof Louisa Willoughby and her team for the wonderful contribution they have made over the past two years. Our other partners will continue their involvement: Australian National University (ANU) (CI: Prof Catherine Travis), The University of Melbourne (CI: Assoc Prof Nick Thieberger), The University of Sydney (USyd) (CI: Prof Monika Bednarek), AARNet (CI: Adam Bell) and First Languages Australia (CI: Beau Williams). The project is part of the ARDC’s HASS and Indigenous Research Data Commons (RDC), which is establishing long-term, enduring national digital research infrastructure. It supports researchers in harnessing research data to enhance Australian social and cultural wellbeing, and helps us understand and preserve our culture, history and heritage. In partnership with research institutions and government, the HASS and Indigenous RDC has achieved joint resourcing of more than $40 million over the coming four years. LDaCA also welcomes the new focus areas which will increase the size and value of the RDC: the social sciences project (currently in co-design), the Australian Internet Observatory and Australian Creative Histories and Futures.
|
| | As reported in our previous newsletter, LDaCA hosted a workshop in Naarm/Melbourne in February with the aim of creating an agreed framework for talking about data management across the partner projects in the ARDC’s HASS and Indigenous RDC. A working group led by Peter Sefton was formed at that event, and the result of their endeavours is the Protocols for Implementing Long-term Archival Repository Services (PILARS). Project lead Michael Haugh described PILARS as a guide to implementing “CAREful FAIRness”, referring to the CARE and FAIR principles. The current version of the Protocols is available along with further background information (see also below under New online content). |
| | We have added several new pieces of content online: |
| | Voices of Country themes represented as a native wattle tree. Image Source: Gilimbaa with cultural elements created by David Williams (Wakka Wakka) |
| | There have been several publications released that feature, were authored or were co-authored by LDaCA team members: |
|
|
|
| Harriet Sheppard (formerly a Research Data Analyst) has left to work as a linguistics lecturer at the University of the South Pacific in Suva, Fiji, and Maria Weaver (also formerly a Research Data Analyst) has left our team. We wish them both the best!
Also, in the last edition, we advised that we had two new team members at the ANU and included an introduction from the first team member. Here is a belated introduction from the second team member:
Hi, I’m Anisa Puri, and I’m a historian and research manager who lives on Boonwurrung Country in Naarm/Melbourne. I have been working as a historian, within and beyond academia, for the last 14 years. I have a Master of Public History and a PhD in Historical Studies from Monash University, and I specialise in oral history, Australian history and migration history. I’m delighted to have joined the LDaCA team at ANU as a Research Project Manager. My current focus is on developing a new project which aims to enhance the findability of existing oral history collections in Australia. I look forward to sharing more details about this project in the coming months! |
| | | | Summer School 2025 topics workshop |
| When: 31 July 2024, 2 – 3:30 pm Where: Online Run by: The ARDC
The ARDC is hosting an online workshop to gather ideas for topics that should be covered in the HASS and Indigenous RDC Summer School 2025, to be held 3–6 February 2025 in Brisbane. Summer School 2025 aims to help attendees understand the HASS and Indigenous RDC infrastructure and gain skills in using it in future work, as well as skills in Indigenous Data Governance and data management. Registrations open. |
| Data migration skills workshop |
| When: 3–5 September 2024 Where: ANU Run by: LDaCA
The LDaCA infrastructure is based on widely accepted standards such as Research Object Crates (RO-Crates) and the Oxford Common File Layout (OCFL). Our project has been running for more than three years, and the processes and tools for aligning data with these standards are now well developed. This workshop aims to show how those tools can be applied to data in a variety of formats to migrate material efficiently to the LDaCA standards. |
| | When: 30 September – 4 October 2024 Where: UQ Run by: UQ Research and Innovation (R&I) Week
LDaCA will be involved in two events during UQ R&I Week 2024:
30 September 2024, 4–6 pm at the UQ Anthropology Museum: a panel discussion about issues in Indigenous Data Governance. The panel will include Dr Rose Barrowcliffe, Robert McLellan and Senior Manager Lesley Acres from Aboriginal and Torres Strait Islander Collections and Services at UQ Library. The discussion will be moderated by Grant Sarra. Registrations open.
1 October 2024, 11 am – 1 pm at Room 275, Global Change Institute, St Lucia campus: a contribution to the session ‘Showcasing the UQ HASS&I Research Infrastructure Capabilities’. Registrations open.
|
| |
|
|
When: October 2024 (exact dates TBC) Where: Online Run by: LDaCA and ADO
A glamorous introduction to text analytics: In this two-part workshop series, participants will learn to collect posts from Reddit (using the very sparkly example of the Eurovision Song Contest) and apply a range of text analytic techniques to them. Some experience with R coding is preferred; no experience with social media data collection or text analysis is required. |
| | When: 26–29 November 2024 (exact date TBC) Where: ANU Run by: The Australian Linguistic Society (ALS)
Martin Schweinberger and Sam Hames will present a masterclass at the ALS 2024 conference called ‘Improving Transparency and Reproducible Results in Linguistics’. More details are forthcoming once the conference schedule is announced. |
| | Online seminar on social meaning of language variation in Australian English |
| On 2 May, Catherine Travis presented an online seminar titled ‘What’s in an accent? Understanding the social meaning of language variation in Australian English’. She covered some of the fascinating research findings from the Sydney Speaks project, which looked at speech features, such as word-final -er, across Australian English speakers from different ethnic backgrounds. It was clear that ethnicity as a standalone variable did not account for language variation; rather, it was the intersection of ethnicity with other social variables, like socio-economic class and age, that did. |
| Pilot of new ‘Introduction to Computational Text Analytics’ workshop |
| Martin Schweinberger and Sam Hames piloted their new introductory workshop with 10 participants at UQ on 23 and 24 May. Drawing on the Language Technology and Data Analysis Laboratory (LADAL) tools and a set of interactive exercises, this workshop is intended to provide a starting point for people with no background in programming or data analysis to use text analytics approaches. |
| | Participants in an introductory text analytics workshop with Sam Hames (back l.) and Martin Schweinberger (back r.). Image Source: Sam Hames |
| Community Connect workshop |
| Otis Carmichael and Simon Musgrave delivered the ‘Improving access to language data’ workshop, developed under the ARDC’s Community Connect program, at UQ on 29 May. This was the first time the materials had been used, and it was a valuable learning experience that will lead to some revisions and enhancements for future use. One important outcome of this workshop was learning of potential audiences for the material that we had not previously considered. |
| HASS and Indigenous RDC Symposium 2024 |
| The symposium took place in Naarm/Melbourne on 18 and 19 June and was attended by around 200 people (including online attendees). The program highlighted the achievements of the existing streams of work in the RDC (LDaCA, Social Sciences, Improving Indigenous Research Capability and the Community Data Lab) and also introduced the two streams which are commencing from July (the Australian Internet Observatory and Australian Creative Histories and Futures). Michael Haugh, Robert McLellan and Simon Musgrave presented on the outcomes and achievements of LDaCA, and Robert was also involved in a presentation about the Indigenous Data Governance Framework. |
| | Attendees of the ARDC’s HASS and Indigenous RDC Symposium 2024. Image Source: The ARDC |
| | Chao Sun (Sydney Informatics Hub, USyd) and Olga Boichak (Computational Social Science Lab, USyd) attended the 74th Annual International Communication Association (ICA) Conference (20–24 June 2024, Gold Coast). On 23 June, Chao Sun demonstrated ATAP tools for corpus processing and analysis in a presentation titled ‘Integrated Text Analytics: Unveiling Insights With ATAP Corpus and Diverse Analytic Tools’. This presentation was part of the Computational Methods Tool Development session organised by the Computational Methods division, which brought together a group of international presenters and attendees working on text analytics. |
| | Chao Sun presenting at ICA 2024. Image Source: Olga Boichak |
| | | This issue’s spotlight is on the corpus of archived materials from the Morehead documentation project, which aimed to extensively document two undescribed languages, Nen and Kómnzo, from the Western Province of Papua New Guinea.
Aside from linguists, there was also an ethnobiologist and a botanist on the documentation team. To collect both language and ethno-ornithological data, the team went on ‘bird walks’ with language consultants and local bird enthusiasts. They recorded birdsong alongside linguistic content, such as narratives about bird behaviour and bird lore, or elicitation of bird names. Julia Colleen Miller was an ANU Postdoctoral Research Fellow and PARADISEC archivist when she worked on the team in 2012 and 2013. Julia has made a fascinating video ‘Linguistics and Ethno-ornithology: Bird songs and bird stories from the Western Province of Papua New Guinea’ weaving together “photos, videos, audio soundscapes, birdsong, short stories in the Nen language, and visualisations of acoustic data”.
|
| | | Screen captures relating to the bird tibrom (‘Greater Black Coucal’) from the ‘Linguistics and Ethno-ornithology: Bird songs and bird stories from the Western Province of Papua New Guinea’ video. Image Source: Julia Colleen Miller |
| | | | Catherine Travis is a CI on LDaCA, and Professor of Modern European Languages at the ANU. She has been putting together language collections throughout her career, and these include: |
|
| | Catherine Travis giving a presentation. Image Source: Catherine Travis |
|
Catherine’s tip: My tip relates to preparing ethics applications for collecting language data to maximise the possibility of reuse. This means gaining approval for (1) data sharing, with appropriate restrictions as relevant to the collection, and (2) long-term archiving of the recordings (as it is often assumed by ethics boards that the raw data should be destroyed once the research has been completed).
These can be addressed in questions around data storage, with statements like: The de-identified transcriptions and audio files of those participants who give their permission may be incorporated into web-based corpora and made available to others, with approval by the investigators. The de-identified audio recordings will be stored indefinitely. It is not realistically possible to note everything that occurs in the speech of the participant(s) to the point at which the primary data is no longer necessary, and thus it is essential that the raw audio files be archived.
In the consent form, participants can be given the option of selecting different levels of data sharing (for an example, see Figure 6 in the Sydney Speaks case study), such as: with members of the research team; with other approved researchers; incorporation into web-based corpora.
|
| | | AARNet is Australia’s Academic and Research Network and is one of the partner organisations in the LDaCA project. AARNet is widely regarded as the founder of the internet in Australia and is a provider of high-speed network infrastructure for the research and education communities. AARNet is committed to supporting collaborations with academia and technologists to develop solutions for researchers, and to helping them make the most of AARNet’s powerful network and services to achieve their project goals. |
| | | The Joint Office Hour run by LDaCA and ADO will not take place in 2024. The teams from the two projects are working towards an alternative way to provide targeted advice to researchers — watch this space! |
|
|
|
| We welcome any feedback to make future issues more useful for you. If the newsletter was forwarded to you, you can subscribe here. |
| | |
|
|
LDaCA acknowledges Traditional Owners of Country throughout Australia and recognises the continuing connection to lands, waters and communities. We pay our respects to their Ancestors and their descendants, who continue cultural and spiritual connections to Country.
Republishing is encouraged — CC BY text and infographics. If you have questions about republishing, please contact ldaca@uq.edu.au ©LDaCA — 2024 |
| | |