Access Conditions | Conditions that specify who can access data and what they can do with that data. A well-governed archival repository has mechanisms in place to administer and implement these conditions, which are specified in a data license. |
ADA | Australian Data Archive. A national service for the collection and preservation of digital research data. More information |
ADM+S | ARC Centre of Excellence for Automated Decision-Making and Society. It brings together universities, industry, government and the community to support the development of responsible, ethical and inclusive automated decision-making. More information |
ADO | Australian Digital Observatory. An ARDC platform working to establish a national infrastructure to support a diverse array of researchers, especially in the humanities, in accessing and working with dynamic digital data. More information |
AIATSIS | Australian Institute of Aboriginal and Torres Strait Islander Studies. Australia’s only national institution focused exclusively on the diverse history, culture and heritage of Aboriginal and Torres Strait Islander Australia, with a growing collection of over one million items dedicated to Australian Aboriginal and Torres Strait Islander cultures, histories and contemporary stories. More information |
API | Application Programming Interface. A way for computer programs to communicate with each other. It is a way for one computer or system to ask another computer or system to do something, like provide a dataset. |
ARC | Australian Research Council. Its purpose is to grow knowledge and innovation for the benefit of the Australian community through funding the highest quality research, assessing the quality, engagement and impact of research, and providing advice on research matters. More information |
Archival Repository | A location for the storage of data that has an appropriate governance regime in place. |
ARCP | Archive and Packaging ID. A globally unique, searchable ID with zero management overhead that can be used like a URL in linked data systems but does not resolve to content in a browser. |
ARDC | Australian Research Data Commons. The ARDC is Australia’s leading research data infrastructure facility accelerating Australian research and innovation by driving excellence in the creation, analysis and retention of high-quality data assets. More information |
ARDS | Aboriginal Resource and Development Services (ARDS Aboriginal Corporation). Its work champions the importance of language and culture in developing self-determination for Aboriginal people, and supports Aboriginal communities to increase control and understanding of mainstream services and systems. More information |
Arkisto | A scalable, standards-based platform for sustainable data. Data on an Arkisto deployment is always available on disc (or object storage) with a complete description independently of any services such as websites or APIs. Once the data is safe and well-described, Arkisto has a flexible model for how data can be accessed using a variety of services. Built on top of RO-Crate and OCFL. More information See also: Oxford Common File Layout See also: RO-Crate |
ASR | Automatic Speech Recognition. ASR enables computers to process human spoken language into readable text, allowing users to operate devices through speech or facilitate translation of that speech into other languages. |
ATAP | Australian Text Analytics Platform. An open source environment that provides researchers with tools and training for analysing, processing and exploring text. More information |
AustLang | Provides a controlled vocabulary of persistent identifiers, a thesaurus of languages and peoples, and information about Aboriginal and Torres Strait Islander languages which has been assembled from referenced sources. Alphanumeric codes are used as persistent identifiers, while associated text strings are changeable and can reflect community preferences (including alternative names and spellings). In AustLang, Warlpiri has two codes: C15 for the language in general, and C15.1 for the variety named as Wakirti Warlpiri. More information |
BI | Batchelor Institute of Indigenous Tertiary Education. The only First Nations dual sector tertiary education provider in Australia. The Institute gives precedence to its philosophy of Both Ways: positioning First Nations peoples as knowledge holders in all educational transactions with Western knowledge systems as well as privileging First Nations ways of learning and teaching to underpin engagement with mainstream education systems and society more broadly. More information |
BinderHub | A Kubernetes-based cloud service that allows users to share reproducible interactive computing environments from code repositories. It is the primary technology behind Binder. ATAP notebooks are made available using a Binder instance maintained by AARNet/Nectar. More information |
CADRE | Coordinated Access for Data, Researchers and Environments. A shared and distributed sensitive data access management platform for the social sciences and related disciplines. More information |
CARE | Four principles developed by the Global Indigenous Data Alliance (GIDA) to ensure that Indigenous communities have control over the application and use of Indigenous data and Indigenous Knowledge for collective benefit. The principles are Collective Benefit, Authority to Control, Responsibility and Ethics. More information |
CDL | Community Data Lab. CDL shares tools and datasets for collaborative HASS research projects that use data from archives, libraries and collections. More information |
CDU | Charles Darwin University. More information |
CLARIN | CLARIN is a digital infrastructure offering data, tools and services to support research based on language resources. It is a European Research Infrastructure Consortium (ERIC). More information |
Class | In linked data, a resource that represents a concept or entity. Classes in the LDAC Metadata Schema include CollectionEvent, CollectionProtocol, DataDepositLicense, DataLicense and DataReuseLicense. |
CMDI | Component Metadata Infrastructure. Provides a standard for metadata within CLARIN. It draws on the earlier ISLE Metadata Initiative (IMDI), but CMDI adopts a more flexible approach where components are assembled into reusable profiles. More information |
Collection | A group of related Objects. Examples of collections include corpora and sub-corpora, as well as aggregations of cultural objects such as PARADISEC collections, which bring together items collected in a region or a session with consultants. |
Confidentiality | The obligation to protect identity and privacy as recognised under Australian Law in the Privacy Act 1988. More information |
Copyright | The legal right of the owner of intellectual property. In simpler terms, copyright is the right to copy. This means that the original creators of products and anyone they give authorisation to are the only ones with the exclusive right to reproduce the work. |
Copyright Owner | The creator of the work, and the person/institution who has the exclusive right to reproduce, publish, perform, communicate, and adapt or modify the work, for both commercial and non-commercial purposes. The copyright owner may be the same as the Data Steward. |
Corpus | A sizable collection of real-life examples of language selected to be a fair representation of the language or a particular linguistic genre. Use of the term generally implies that the material is in a form which can be read and manipulated by a computer. |
Crate-O | A browser-based editor that allows you to create and update RO-Crates using a web interface, and with metadata spreadsheets. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description. More information |
Creative Commons Licenses | A set of licenses that allow for data reusability under specified conditions regarding attribution, data sharing, commercialisation and data adaptation. |
Data Collection | A set of data collected under similar conditions and brought together in a shared framework. |
Data Commons | Cloud-based infrastructure coupled with governance strategies and principles that allow a community to use, share, manage and analyse its data. LDaCA is a language data commons serving researchers and community groups that are interested in language data. |
Data Governance | The policies and processes by which data is managed through its life cycle to ensure the quality, reliability, security, and sustainability of the data. |
Data License | A legal arrangement between the creator of the data and the end-user specifying what users can do with the data. More information |
Data Management Plan | A document that (1) outlines key information about a research project and its data, including the access conditions and ownership, storage, and future use and (2) sets out roles and responsibilities in its management. |
Data Onboarding | The process by which language collections are catalogued in LDaCA, carried out collaboratively by the Data Steward and LDaCA. |
Data Packaging | The application of widely used standards, for example, in terms of formats, metadata, and access conditions, to the collection data. See also: Data Transformation |
Data Steward | An individual or organisation with the authority to make decisions regarding the collection. |
Data Transformation | The process of converting, cleansing, and structuring data into a usable format. Sometimes used as a synonym for Data Packaging. See also: Data Packaging |
Defined Term | In linked data, a metadata category that allows for a) accurate definitions of the values assigned to Properties, and b) grouping such definitions in DefinedTermSets, which can function as controlled vocabularies. DefinedTerms in the LDAC Metadata Schema include DerivedMaterial, PartOfSpeech, SignedLanguage, SpokenLanguage, etc. |
Describo | A tool that allows you to create and update RO-Crates. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description. Superseded for project purposes by Crate-O. |
DOI | Digital Object Identifier. A type of Persistent Identifier (PID) that is becoming the default identifier for research datasets, providing a long-lasting reference to the collection. It consists of a unique prefix and suffix separated by a forward slash, and it can be resolved when written as a link, e.g. https://doi.org/10.1000/182 |
ELAN | A software tool to make time-aligned annotations (which may be transcriptions) of audio and video recordings. The tool is commonly used by linguists and others who work with language. More information |
Elpis | A tool to obtain a first-pass transcription of untranscribed audio. It brings cutting-edge speech recognition technology within reach of language workers and researchers who don’t have backgrounds in speech engineering. More information |
FAIR | Four key principles developed in 2016 with the aim of supporting the discovery and reuse of research data. The principles encourage us to make data Findable, Accessible, Interoperable and Reusable. More information |
Field Notebook/Journal | A collection of fieldnotes compiled while completing fieldwork. |
Fieldnotes | Notes taken by a researcher while conducting fieldwork that record their observations and other relevant information. |
Fieldwork | The collection of data from an environment where the data is likely to occur naturally or organically without the intervention of researchers. In linguistics, this typically involves studying a language as it is spoken by a community of speakers in a particular location. |
FLA | First Languages Australia. A national organisation working to ensure the strength of all Aboriginal and Torres Strait Islander languages. More information |
GitHub | A developer platform that allows developers to create, store, manage and share their code, using Git software. More information |
GLAM | Galleries, Libraries, Archives and Museums. |
GLAM Peak | A representative national body that brings together the representative bodies for Australia’s galleries, libraries, archives, museums, historical societies, cultural heritage organisations and research peak bodies. More information |
GLAM Workbench | A suite of Jupyter notebooks developed by Tim Sherratt to help with exploring and using data from GLAM institutions. Primarily, the notebooks use data from Trove newspaper and magazine collections, but have some extensions beyond this. More information |
Glottolog | An alternative catalogue of the world’s languages, language families and dialects; Glottolog uses the term languoid to cover all of these. Each languoid is assigned a unique identifier consisting of four alphanumeric characters followed by four digits. For example, (standard) French has the code stan1290, and Warlpiri is warl1254. More information |
HASS | Humanities, Arts and Social Sciences. |
HMI | Human Machine Interface. A user interface that connects a person to a machine, system or device. For example, in-car HMIs allow drivers to interact with their vehicle. |
IAD | Institute for Aboriginal Development (Aboriginal Corporation). An Aboriginal community-controlled organisation established as a cross-cultural adult education and training centre serving all Aboriginal people in Central Australia. More information |
IDIL | International Decade of Indigenous Languages. The United Nations General Assembly has declared the period between 2022 and 2032 as the International Decade of Indigenous Languages, to draw global attention to the critical status of Indigenous languages worldwide and encourage action for their revitalisation, promotion and ongoing use. More information |
IDN | Indigenous Data Network. A national network of Aboriginal community-controlled organisations, university research partners, Indigenous businesses and government agencies and departments established to support and coordinate the governance of Indigenous data for Aboriginal and Torres Strait Islander peoples and empower Aboriginal and Torres Strait Islander communities to decide their own local data priorities. More information |
IIRC | Improving Indigenous Research Capability. A project supporting the creation of an Aboriginal and Torres Strait Islander Research Data Commons. More information |
Intellectual Property | Creative works protected by law via patents, copyright and trademarks. |
Interoperability | The ability of computer systems or software to exchange and make use of information. The relevant FAIR principle uses the term specifically in relation to data. |
IPA | International Phonetic Alphabet. An alphabetic system of phonetic notation based primarily on the Latin script, designed as a standardised representation of speech sounds in written form. |
ISO-639 | A standard by the International Organization for Standardization (ISO) concerned with representation of languages and language groups. An earlier version of this system used two-letter codes to identify languages; more recent versions use three-letter codes (referred to as ISO 639-3). The ISO 639-3 code for French is fra, and Warlpiri is wbp. More information |
JSON | JavaScript Object Notation. A data-interchange text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages. More information |
Jupyter Notebook | Interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media. More information |
LADAL | Language Technology and Data Analysis Laboratory. A free, open-source, collaborative support infrastructure for digital and computational humanities assisting anyone interested in working with language data in matters relating to data processing, visualization and analysis, and offering guidance on matters relating to language technology and digital research tools. More information |
LDAC | Language Data Commons. LDAC can refer to the schema, the profile or the modes associated with it. |
LDaCA | Language Data Commons of Australia. LDaCA is making nationally significant language data available for academic and non-academic use and providing a model for ensuring continued access with appropriate community control. Our preferred pronunciation of the name is el-dakka (and that is why you may find the odd alpaca on this website). More information |
Legacy (File) Format | An old, outdated or obsolete file format that is no longer supported by modern hardware and/or software systems. |
Lexicon | A list of forms in a language with associated information, such as meanings, pronunciations or word class assignments. |
Licensing | A process that allows the copyright owner of a work to share the right to access and use some material from the work without reassigning the ownership of the copyright. License terms establish the conditions for that access and use. A license for a data collection is the legal agreement between the creator of the data and the end-user specifying who can access, share and reuse the data, and other conditions as required. |
Linked Data | Structured data that is interlinked with other data and published in a machine-readable way to maximise interoperability and improve the precision of metadata. |
Metadata | The information that defines and describes data. It provides data users with information about the purpose, processes, and methods involved in the data collection. (Source: Australian Bureau of Statistics). |
Mode | Also called a Mode file. An implementation of an RO-Crate Profile consisting of a set of lightweight syntactic rules for combining Schema.org Style Schema (SOSS) Classes, Properties and DefinedTerms in a JSON file. Modes can be loaded to an editor such as Crate-O, used for RO-Crate validation or used to summarise rules for RO-Crate Profiles. |
MT | Machine Translation. |
NCRIS | National Collaborative Research Infrastructure Strategy. It provides strategic funding for national-scale research infrastructure, driving collaboration to bring economic, environmental, health and social benefits for Australia. More information |
NER | Named-Entity Recognition. NER locates and classifies named entities in unstructured text into predefined categories such as person names, organisations and locations. |
NFSA | National Film and Sound Archive. Australia’s national audiovisual cultural institution which collects, preserves and shares Australia’s audiovisual culture. More information |
Nyingarn | A 3-year Australian Research Council funded project that will provide digital access to early sources of Australia’s Indigenous languages, using various ways to turn images of manuscripts into text, including Optical Character Recognition (OCR), and crowdsourced transcription (using DigiVol). More information |
Object | A single resource or a group of tightly related resources; for example, a work (document) in a written corpus, or the files associated with a dialogue or session in a speech study (recordings, transcriptions etc.). |
OCFL | Oxford Common File Layout. An application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories. More information |
OCR | Optical Character Recognition. The electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text. |
OLAC | Open Language Archives Community. An international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. More information |
Oni | A portal for discovery of RO-Crated data. It is a web application which provides indexing, searching and access to secure data repositories which follow the Arkisto model. More information |
Oral History | The gathering, recording and preserving of historical information, based on interviews about the experiences, memories and opinions of people who participated in or observed past events. |
ORCID | Open Researcher and Contributor ID. A registry providing globally unique persistent identifiers (PIDs) for researchers, authors and contributors of scholarly works. More information |
Orthographic Transcription | A transcription method that employs the standard spelling system of each target language. |
PARADISEC | Pacific and Regional Archive for Digital Sources in Endangered Cultures. A digital archive that works to digitise, preserve and make accessible recordings that are at risk of loss, particularly for languages in the Pacific region. More information |
Phonemic Transcription | A representation of speech in terms of the sound contrasts made in a language, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA. |
Phonetic Transcription | A representation of speech in terms of the sounds actually produced in specific instances, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA. |
PID | Persistent Identifier. A digital identifier that is permanently assigned and provides a long-lasting reference to an object or entity, for example a Digital Object Identifier (DOI). |
Profile | Specifies a subset of a metadata standard for a particular use case, such as for describing language resources. LDaCA uses RO-Crate profiles, which are sets of conventions, types and properties that are required in RO-Crates. Specifically, the LDAC RO-Crate Metadata Profile provides the minimum structural metadata for describing language data resources. |
Property | In linked data, a metadata category which is an attribute of an instance of a Class. Properties in the LDAC Metadata Schema include author, communicationMode, linguisticGenre, speaker, signer, etc. |
Provenance | The documented history or chain of custody of materials from their creation to their current location within a collection. The full history and ownership of an item from the time of its discovery or creation to the present day, through which authenticity and ownership are determined. |
Python | A high-level, general-purpose programming language with an emphasis on code readability. More information |
QUT | Queensland University of Technology. More information |
R | A programming language and environment for statistical computing and graphics. More information |
RDC | Research Data Commons. See also: ARDC |
Research Data Management | The handling of data during and after a research activity including generating, collecting, organising, accessing, using, analysing, storing, disclosing, documenting, preserving, disposing of, sharing and re-using data. |
REMS | Resource Entitlement Management System. A tool to help researchers browse resources such as datasets relevant to their research and to manage the application process for access to those resources. |
Research Infrastructure | The facilities, systems, tools, platforms, equipment, instruments and other resources and services that are needed for research communities to conduct research. This can include both tangible assets, like supercomputers, and intangible assets, like data collections. |
RIIP | Research Infrastructure Investment Plan (NCRIS). It provides continued support for Australia’s National Research Infrastructure facilities, as well as investment in emerging research priorities. |
RO-Crate | Research Object Crate. A way of packaging research data that stores the data together with its associated metadata and other component files, such as the data license. More information |
Schema | Specifies a metadata vocabulary of Classes and Properties, based on the RO-Crate specification’s use of Schema.org classes. |
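Sensitive Data | Data that, as a result of research, contains confidential or other ‘sensitive information’, which is defined in the Privacy Act as information or opinion about an individual’s personal attributes, such as racial or ethnic origin, political opinions, religious beliefs or affiliations, sexual orientation or practices, or criminal record. More information |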
Takedown Policy | The policy according to which data may be removed, or access may be adjusted in some way, and the steps by which this is implemented. |
TK Labels | Traditional Knowledge Labels. An initiative for Indigenous communities and local organisations, allowing communities to express local and specific conditions for sharing and engaging in future research and relationships in ways that are consistent with already existing community rules, governance and protocols for using, sharing and circulating knowledge and data. More information |
Tools | Code or software developed in order to support or enhance (language) data accessibility and use. |
Transcoding | The process of converting one digital encoding format to another, such as from a high-resolution image to a lower-resolution one. |
TTS | Text-to-Speech. TTS generates an artificial spoken audio version of a written text and can be used to improve accessibility. |
UoM | The University of Melbourne. More information |
UQ | The University of Queensland. More information |
USC | University of the Sunshine Coast. More information |
USyd | The University of Sydney. More information |
UWA | The University of Western Australia. More information |
VoIP | Voice over Internet Protocol. A technology allowing phone calls to be made through the Internet using a broadband connection, rather than through a landline or mobile network. |
Wangka Maya | Wangka Maya Pilbara Aboriginal Language Centre. It aims to be recognised as a leading Aboriginal language and resource centre in Australia, using expertise, knowledge and sensitivity to record and foster Aboriginal languages, culture and history. More information |
Work Plan | An agreement between LDaCA and the Data Steward establishing the terms according to which the data will be onboarded to LDaCA, including the goals and responsibilities of each party, and the steps and timeline for carrying out the onboarding process. |
WP | Work package within a funded project. |
X-SAMPA | Extended Speech Assessment Methods Phonetic Alphabet. A phonetic script designed to extend SAMPA to cover the range of characters in the International Phonetic Alphabet (IPA). |
XML | Extensible Markup Language. A markup language and file format for storing, transmitting, and reconstructing data. More information |
Zenodo | A multi-disciplinary open data repository maintained by CERN. More information |
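
Several entries above (AustLang, DOI, Glottolog, ISO-639) describe identifier formats by example. As a minimal sketch, the Python snippet below guesses which of those formats a string matches. The patterns are assumptions inferred only from the examples in this glossary (for instance, the AustLang pattern is based solely on C15 and C15.1, and the DOI pattern assumes a prefix beginning with 10. followed by four to nine digits); they check shape only, not whether an identifier is actually registered. The function name `identifier_kinds` is hypothetical.

```python
import re

# Shape-only patterns inferred from the glossary's examples (assumptions,
# not the official grammar of any of these identifier schemes).
PATTERNS = {
    "DOI": re.compile(r"10\.\d{4,9}/\S+"),        # prefix/suffix, e.g. 10.1000/182
    "Glottolog": re.compile(r"[a-z0-9]{4}\d{4}"),  # e.g. stan1290, warl1254
    "ISO 639-3": re.compile(r"[a-z]{3}"),          # e.g. fra, wbp
    "AustLang": re.compile(r"[A-Z]\d+(\.\d+)?"),   # e.g. C15, C15.1
}

DOI_RESOLVER = "https://doi.org/"


def identifier_kinds(value: str) -> list[str]:
    """Return the names of every pattern that the whole string matches."""
    text = value.strip()
    # A DOI is often written as a resolvable link; drop the resolver part.
    if text.startswith(DOI_RESOLVER):
        text = text[len(DOI_RESOLVER):]
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(text)]


if __name__ == "__main__":
    samples = ["https://doi.org/10.1000/182", "warl1254", "wbp", "C15.1", "Warlpiri"]
    for sample in samples:
        print(sample, "->", identifier_kinds(sample))
```

A check like this can help catch a language name typed where a code was expected, but a real workflow would look the code up in the relevant registry rather than rely on its shape.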