| LDaCA Newsletter — Quarter 2 2024 |
| News | New content on website | We have added two new pieces of content to our website: a case study about data management in language technology, based on projects at tech company Appen; and a blog post about why the Australian National Corpus is not a single collection within LDaCA.
| LDaCA draft project plan | On 14 March, the ARDC released draft project plans for the next phase of the HASS and Indigenous Research Data Commons (HASS&I RDC), including a plan for the LDaCA project. They were seeking feedback from the public on project activities planned for the next four years. Feedback closed on 22 March, and once this feedback has been reviewed and incorporated, final project plans will be released in the coming months. Meanwhile, the draft plans are still available to view online. | Co-designing a digital future for Indigenous language materials | Robert McLellan and his team are seeking Indigenous language champions who are working with language materials for a research project called “Co-designing a digital future for Indigenous language materials”. We plan to run a series of Zoom and face-to-face interviews to find out what works well and what doesn’t work when finding, accessing and using Indigenous language materials.
We know that there are challenges involved with accessing and using language materials stored in various locations. We want to contribute to making that data more findable and usable so that it can support and enhance the language work currently being undertaken. So, we are seeking input to help build language platforms, spaces and tools that suit Indigenous language workers and communities.
If you are interested or would like to know more, please take a minute to get in touch through our Contact Form. All interviewees will receive a gift voucher worth $70 as a thank you for their time. | Graduate Digital Research Fellowship program | The Graduate Digital Research Fellowship (GDRF) program ran very successfully in 2023. We are hoping to build on this success, with three fellows from The University of Queensland (UQ) participating in the 2024 program:
Lu Jin is a PhD candidate in architecture and urban design. Her multidisciplinary work will design urban green infrastructural networks in a circular food system for city resilience and sustainability. In the GDRF program, Lu will be applying machine learning methods to identify and assess green cover in street view images.
David Gilchrist is a PhD candidate in journalism. His research looks at how journalists connect with audiences, and in the GDRF program, he hopes to explore methods for finding and gathering relevant data from digital platforms.
Quy Pham is a PhD candidate in applied linguistics. He is researching the errors produced by learners of English, working with an already-existing corpus of recordings. In the GDRF program, he will be experimenting with using speech recognition methods to automate coding the data, especially automatic identification of pauses.
New team member | We have two new team members at the Australian National University (ANU). Here is an introduction from the first team member; the second introduction will follow in the next issue:
Greetings, I am Gan Qiao, and I am thrilled to be appointed as a Research Data Officer with the LDaCA team, located in Canberra on Ngunnawal Country. As a variationist linguist, my passion lies in language variation and change, learner corpus research, and language technology. Having recently achieved my PhD in Linguistics from ANU, I am eager to bring my expertise to LDaCA, where I aim to streamline language data onboarding processes, and create resources, such as scripts and notebooks, to enhance data management for linguists and beyond. | | Events | Upcoming Events | Online seminar on social meaning of language variation in Australian English |
When: 2 May 2024, 1:00 pm AWST/3:00 pm AEST
Where: Online seminar
Run by: The University of Western Australia (UWA) Linguistics and Language Lab
LDaCA Chief Investigator Catherine Travis (ANU) will present a seminar titled “What's in an accent? Understanding the social meaning of language variation in Australian English”. Use password 250801 to access the seminar online. Note that sessions in this seminar series are usually not recorded. | ARDC Digital Research Skills Summit 2024 |
When: 21–23 May 2024 (Day 1: 1–5 pm AEST; Days 2 and 3: 9:30 am – 4 pm AEST)
Where: Woodward Conference Centre, Law Building, The University of Melbourne, Carlton, VIC, or online
Run by: The ARDC
Find out how digital infrastructure providers and research communities are upskilling researchers in emerging research technologies through a three-day summit in Naarm/Melbourne or online. Share, learn and network with thought leaders, digital skills trainers and researchers by registering online for all three days or the days that interest you most:
Day 1: ARDC Skills Leadership Forum (The Skilled Research Infrastructure workforce: Pathways and support to enable effective research). Explore digital research skills challenges and opportunities with thought leaders.
Day 2: Researcher Challenges. Hear from researchers and learn how they navigate their skills needs and gaps.
Day 3: Carpentry Connect. Participate in regional conversations with digital research skills trainer communities.
| Australian Historical Association Conference – Digital History Stream |
When: 1–4 July 2024
Where: Flinders University, South Australia, or online
Run by: The Australian Historical Association (AHA)
Through the HASS&I RDC, the ARDC is sponsoring the digital history stream of the 2024 AHA Conference. The stream will explore the possibilities and pitfalls of using digital tools and methods to explore historical data. Registration information for the conference can be found online. | Recent Events | Repositories and Workspaces workshop | The ARDC’s Repositories and Workspaces workshop was held on 5–6 February on the lands of the Wurundjeri people of the Kulin Nation at The University of Melbourne. It brought together over 60 people representing Aboriginal and Torres Strait Islander groups (including language centres), academia, the GLAM (galleries, libraries, archives and museums) sector and other stakeholder groups, with the aim of creating an agreed framework for talking about data management across the partner projects in ARDC’s HASS&I RDC. There were presentations from the LDaCA and Indigenous Data Network (IDN) streams, as well as from the Community Data Lab and integration projects. A report summarising the presentations and discussion is currently being prepared. |
| HASS and Indigenous Research Data Commons Computational Skills Summer School 2024 | The ARDC’s HASS&I RDC held a Computational Skills Summer School on the lands of the Kulin Nation in Naarm/Melbourne on 7–9 February. More than 100 participants had the opportunity to learn about research infrastructure through talks, case studies and workshops. LDaCA team members Ben Foley and Simon Musgrave delivered content in a stream shared with IDN, with the invaluable assistance of Levi Murray (IDN) and Karen Manton (Batchelor Institute).
On the first day, LDaCA presented two sessions on making data FAIR into the future, discussing long-term storage of data (spoiler alert — there are not many suitable solutions in Australia) and data governance decisions, with a special emphasis on properly documenting access conditions. Throughout the discussion, Levi (IDN) ensured that we took CARE of how these issues apply when handling Indigenous data. | | Levi Murray, Karen Manton and Ben Foley explaining RO-Crates Image Source: LDaCA |
| On the second day, we presented a collaborative case study to highlight the challenges faced by data custodians. Karen (Batchelor Institute) sketched the history of the Centre for Australian Languages and Linguistics (CALL) Collection, emphasising the importance of the material to multiple communities. Ben (LDaCA) explained the analogue and digital infrastructure underpinning the Collection and LDaCA’s approach to improving its technological sustainability. Levi (IDN) provided insightful commentary on the complexity involved in meeting both technological and social needs in this context.
| On the third day, we held a practical workshop session on some basics of working with geospatial data in relation to languages, using wordlists for Indigenous languages collected by Daisy Bates in the early twentieth century (available online with interactive maps). Participants used a Jupyter Notebook as a tool to explore the data underlying the online resource, and reused that data to make maps of their own. Along the way, we encouraged them to think about when it might be inappropriate to tie aspects of language data to specific locations.
The ARDC generously funded a number of travel bursaries, and we can share feedback from some bursary holders about what they found valuable: “Such in-depth discussion which deepened my understanding of key data and archiving concepts, as well as providing practical advice.” “The opportunity to be acquainted with, and reflect on, different ways of knowing/carrying knowledge, from an Indigenous perspective.” “Speaking and connecting with a range of people who are interested in research data and investigating ways to share our knowledge and skills.”
We are proud to have been part of this very successful event and look forward to its next iteration. |
| | Participants of the ARDC Computational Skills Summer School 2024 Image Source: The ARDC |
| Co-design workshops for LDaCA | The ARDC held two co-design workshops for the LDaCA project on 22 February and 7 March. We hoped to better understand current digital research challenges by consulting directly with researchers and managers of language data. The first workshop aimed to refine our understanding of the challenge to be addressed and find out what outcomes participants wanted to achieve. The second workshop aimed to explore how these outcomes could be achieved, and understand requirements and practical considerations for the potential solutions.
There were 35 participants at each workshop, representing Australian universities, as well as GLAM and research infrastructure organisations. Participants provided their thoughts through notes added to an online whiteboard tool. These notes informed the development of a draft project plan, which was released for public feedback (see “LDaCA draft project plan” under News above) alongside the publication of a report from the workshops. Both documents are available online. |
| Making Meaning 2024: Collections as Data symposium | The 2024 “Making Meaning: Collections as Data” symposium was hosted by the State Library of Queensland on 8 March. Robert McLellan delivered a keynote address with the title: “Digital data for HASS and Indigenous researchers: ‘people’ behind the data”. The presentation highlighted the individuals, families and cultural groups represented within ‘people’ data, giving the audience insights into the challenges associated with reclaiming cultural heritage, achieved through rigorous exploration of Aboriginal languages, stories, art and identity. Additionally, the talk explored the recontextualisation of historic research within a contemporary framework. | | Robert McLellan (second from l.) with other keynote speakers and CEO Vicki McDonald (second from r.) Image Source: State Library of Queensland | Sam Hames (UQ) also presented a lightning talk along with collaborator Naomi Barnes (Queensland University of Technology). They spoke about their work combining computational and qualitative inquiry to make sense of some of the billion words recorded in the Proceedings of Federal Parliament since 1901. | Tech team visit to Batchelor Institute |
| Batchelor Institute recently invited our technology team to visit their campus one hour south of Darwin, to get a sense of what it’s like to work with archival material in the extreme conditions of the tropics. Peter Sefton, Moises Sacal Bonequi and Ben Foley spent a few days with Karen Manton, Jo Wood and Will Wood, browsing items in the collection and having scintillating conversations about metadata. Batchelor and LDaCA are formally starting a new partnership in 2024 to continue the work of ensuring long-term access to the CALL Collection, a significant archive of physical and digital Indigenous language material.
| | (l. to r.) Peter Sefton, Karen Manton, Ben Foley and Will Wood Image Source: Moises Sacal Bonequi |
| While in the Top End, Ben also started a pilot project with the Aboriginal Resource Development Service (ARDS), long-time champions of the importance of language and culture in developing self-determination for Aboriginal people. The pilot project will work on organising, RO-Crating and building an access interface for a Yolŋu language audio collection that was recorded in the early 2000s. |
| Online seminar on corpus linguistics approach to language variation in media | LDaCA Chief Investigator Monika Bednarek (The University of Sydney) gave a seminar titled “Language variation in the media: A corpus linguistic approach” (online, 21 March) in a webinar series organised by UWA Linguistics and Language Lab. She discussed her recent study examining a corpus of dialogue from Australian fictional television series through lexical profiling, a method that allows users to retrieve words in a corpus that occur or do not occur in selected reference word lists. She explored the extent to which this method is useful for identifying language variation in Australian narrative mass media, with a focus on lexis. Although no recording is available, interested parties can contact Monika to learn more. |
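As a rough illustration of the lexical profiling idea described above, the following Python sketch sorts the words of a short text into those that occur in a reference word list and those that do not. The function name `lexical_profile`, the simple tokenisation rule, the sample sentence and the word list are all assumptions made for this example; they are not taken from Monika's study or from any particular profiling tool.

```python
import re
from collections import Counter


def lexical_profile(text, reference_words):
    """Split word counts into (in_list, off_list) relative to a reference list."""
    reference = {w.lower() for w in reference_words}
    # Simple tokenisation (an assumption): lowercase runs of letters,
    # allowing internal apostrophes as in "g'day".
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    in_list = Counter({w: n for w, n in counts.items() if w in reference})
    off_list = Counter({w: n for w, n in counts.items() if w not in reference})
    return in_list, off_list


# Hypothetical data, for illustration only.
sample_dialogue = "G'day mate, chuck a few snags on the barbie this arvo."
general_word_list = ["a", "few", "on", "the", "this", "mate", "chuck"]

in_list, off_list = lexical_profile(sample_dialogue, general_word_list)
print(sorted(off_list))  # words not covered by the reference list
```

In a sketch like this, the words left outside the reference list are the candidates for closer inspection. A real profile would use much larger, carefully chosen reference lists and a more robust tokeniser.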
| | Corpus Spotlight | The PAC Corpus (Phonologie de l’Anglais Contemporain ‘Phonology of Contemporary English’ Corpus) is based on reading and conversational tasks completed in native and non-native varieties of English spoken worldwide. The research program that produced the corpus is led by a network of French universities that partnered with international institutions, including Griffith University in QLD.
The PAC-Australia sub-corpus is part of the PAC Corpus and is based on data collected from 2003 to 2023 in Australia from speakers reading wordlists, reading a text, and participating in semi-guided interviews. The 240 spoken recordings in the corpus come with orthographic and phonetic transcriptions, as well as the place of recording. Most of the recordings can be accessed freely online and downloaded as WAV or MP3 files; only access to the semi-guided interviews is restricted. | | Team Member’s Tip | Simon Musgrave is Engagement Lead for the LDaCA project. He was previously part of the linguistics program at Monash University. Simon’s research covered various topics, including the use of digital technology in linguistic research, an interest which led to his involvement in the Australian National Corpus project, a precursor to LDaCA. In an earlier life, Simon made his living playing the violin.
Simon’s tip: If you want to use a particular tool to analyse data, make sure that you understand what input format the tool expects — ideally, find an example that you can look at. This advice can be relevant in ways that you might not expect. I once lost a few hours of time because I did not know that a tool I was trying to use required the input data to be saved with Unicode (UTF-8) encoding rather than ANSI. |
| | Simon and his violin Image Source: LDaCA |
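For readers who run into the encoding problem Simon describes, here is a minimal Python sketch of one way to check whether a file is valid UTF-8 and to re-save it as UTF-8. It assumes that "ANSI" means the Windows-1252 code page (`cp1252`), which is common on Western European Windows systems but is not guaranteed. The function names and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(source, target, source_encoding="cp1252"):
    """Re-save a text file as UTF-8, decoding it with the stated source encoding.

    The default source encoding is an assumption; check what your file uses.
    """
    text = Path(source).read_text(encoding=source_encoding)
    Path(target).write_text(text, encoding="utf-8")
    return text


# Demonstration with a throwaway file, so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "wordlist_ansi.txt"
    converted = Path(tmp) / "wordlist_utf8.txt"
    legacy.write_bytes("café naïve".encode("cp1252"))

    print(is_valid_utf8(legacy))
    convert_to_utf8(legacy, converted)
    print(is_valid_utf8(converted))
```

Writing to a new file rather than overwriting the original keeps the source data intact in case the assumed encoding turns out to be wrong.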
| Learn More | One of the partner institutions in the LDaCA project is First Languages Australia (FLA), a national peak body working to strengthen Aboriginal and Torres Strait Islander languages. They support a network of language centres and community programs, connecting often-isolated language communities to share knowledge, resources and skills, as well as facilitating communication with government and non-government agencies.
FLA has an extremely useful resources page, which includes the Gambay — First Languages Map. This interactive map reflects the names and groupings of Aboriginal and Torres Strait Islander languages as favoured by language communities. The map also lists language centres associated with different languages, which serve as important points of contact in Indigenous language work. | | Gambay — First Languages Map Image Source: First Languages Australia |
| | No Office Hours | The Joint Office Hour run by LDaCA and the Australian Digital Observatory (ADO) will not take place in 2024. The teams from the two projects are working towards an alternative way to provide targeted advice to researchers — watch this space! |
| We welcome any feedback to make future issues more useful for you. If the newsletter was forwarded to you, you can subscribe here. | | |
LDaCA acknowledges Traditional Owners of Country throughout Australia and recognises the continuing connection to lands, waters and communities. We pay our respects to their Ancestors and their descendants, who continue cultural and spiritual connections to Country.
Republishing is encouraged — CC BY text and infographics. If you have questions about republishing, please contact ldaca@uq.edu.au ©LDaCA — 2024 | | |