Glossary


Access Conditions
Conditions that specify who can access data and what they can do with that data. A well-governed archival repository has mechanisms in place to administer and implement such conditions, which are specified in a data license.
ADA
Australian Data Archive. A national service for the collection and preservation of digital research data.
More information
ADM+S
ARC Centre of Excellence for Automated Decision-Making and Society. It brings together universities, industry, government and the community to support the development of responsible, ethical and inclusive automated decision-making.
More information
ADO
Australian Digital Observatory. An ARDC platform working to establish a national infrastructure to support a diverse array of researchers, especially in the humanities, in accessing and working with dynamic digital data.
More information
AIATSIS
Australian Institute of Aboriginal and Torres Strait Islander Studies. Australia’s only national institution focused exclusively on the diverse history, culture and heritage of Aboriginal and Torres Strait Islander Australia, with a growing collection of over one million items dedicated to Australian Aboriginal and Torres Strait Islander cultures, histories and contemporary stories.
More information
API
Application Programming Interface. A way for computer programs to communicate with each other: one computer or system asks another to do something, such as provide a dataset.
ARC
Australian Research Council. Its purpose is to grow knowledge and innovation for the benefit of the Australian community through funding the highest quality research, assessing the quality, engagement and impact of research, and providing advice on research matters.
More information
Archival Repository
A location for the storage of data that has an appropriate governance regime in place.
ARCP
Archive and Packaging ID. A globally unique, searchable ID with zero management overhead. ARCP IDs can be used like URLs in linked data systems, but they do not resolve to content in a browser.
ARDC
Australian Research Data Commons. The ARDC is Australia’s leading research data infrastructure facility accelerating Australian research and innovation by driving excellence in the creation, analysis and retention of high-quality data assets.
More information
ARDS
Aboriginal Resource and Development Services (ARDS Aboriginal Corporation). Its work champions the importance of language and culture in developing self-determination for Aboriginal people, and supports Aboriginal communities to increase control and understanding of mainstream services and systems.
More information
Arkisto
A scalable, standards-based platform for sustainable data. Data on an Arkisto deployment is always available on disc (or object storage) with a complete description independently of any services such as websites or APIs. Once the data is safe and well-described, Arkisto has a flexible model for how data can be accessed using a variety of services. Built on top of RO-Crate and OCFL.
More information
See also: Oxford Common File Layout
See also: RO-Crate
ASR
Automatic Speech Recognition. ASR enables computers to process human spoken language into readable text, allowing users to operate devices through speech or facilitate translation of that speech into other languages.
ATAP
Australian Text Analytics Platform. An open source environment that provides researchers with tools and training for analysing, processing and exploring text.
More information
AustLang
Provides a controlled vocabulary of persistent identifiers, a thesaurus of languages and peoples, and information about Aboriginal and Torres Strait Islander languages which has been assembled from referenced sources. Alphanumeric codes are used as persistent identifiers, while associated text strings are changeable and can reflect community preferences (including alternative names and spellings). In AustLang, Warlpiri has two codes: C15 for the language in general, and C15.1 for the variety named as Wakirti Warlpiri.
More information
BI
Batchelor Institute of Indigenous Tertiary Education. The only First Nations dual sector tertiary education provider in Australia. The Institute gives precedence to its philosophy of Both Ways: positioning First Nations peoples as knowledge holders in all educational transactions with Western knowledge systems as well as privileging First Nations ways of learning and teaching to underpin engagement with mainstream education systems and society more broadly.
More information
BinderHub
A Kubernetes-based cloud service that allows users to share reproducible interactive computing environments from code repositories. It is the primary technology behind Binder.
ATAP notebooks are made available using a Binder instance maintained by AARNet/Nectar.
More information
CADRE
Coordinated Access for Data, Researchers and Environments. A shared and distributed sensitive data access management platform for the social sciences and related disciplines.
More information
CARE

Four principles developed by the Global Indigenous Data Alliance (GIDA) to ensure that Indigenous communities have control over the application and use of Indigenous data and Indigenous Knowledge for collective benefit.
The principles specify four aspects of the respectful use of data:

  • Collective Benefit
  • Authority to Control
  • Responsibility
  • Ethics

More information
CDL
Community Data Lab. CDL shares tools and datasets for collaborative HASS research projects that use data from archives, libraries and collections.
More information
CDU
Charles Darwin University.
More information
CLARIN
CLARIN is a digital infrastructure offering data, tools and services to support research based on language resources. It is a European Research Infrastructure Consortium (ERIC).
More information
Class
In linked data, a resource that represents a concept or entity. Classes in the LDAC Metadata Schema include CollectionEvent, CollectionProtocol, DataDepositLicense, DataLicense and DataReuseLicense.
CMDI
Component Metadata Infrastructure. Provides a standard for metadata within CLARIN. It draws on the earlier ISLE Metadata Initiative (IMDI), but CMDI adopts a more flexible approach where components are assembled into reusable profiles.
More information
Collection
A group of related Objects. Examples of collections include corpora and sub-corpora, as well as aggregations of cultural objects such as PARADISEC collections, which bring together items collected in a region or a session with consultants.
Confidentiality
The obligation to protect identity and privacy, as recognised under Australian law in the Privacy Act 1988.
More information
Copyright
The legal right of the owner of intellectual property. In simpler terms, copyright is the right to copy. This means that the original creators of products and anyone they give authorisation to are the only ones with the exclusive right to reproduce the work.
Copyright Owner
The creator of the work, and the person/institution who has the exclusive right to reproduce, publish, perform, communicate, and adapt or modify the work, for both commercial and non-commercial purposes. The copyright owner may be the same as the Data Steward.
Corpus
A sizable collection of real-life examples of language selected to be a fair representation of the language or a particular linguistic genre. Use of the term generally implies that the material is in a form which can be read and manipulated by a computer.
Crate-O
A browser-based editor that allows you to create and update RO-Crates using a web interface, and with metadata spreadsheets. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description.
More information
Creative Commons Licenses
A set of licenses that allow for data reusability under specified conditions regarding attribution, data sharing, commercialisation and data adaptation.
Data Collection
A set of data collected under similar conditions and brought together in a shared framework.
Data Commons

Cloud-based infrastructure coupled with governance strategies and principles that allow a community to use, share, manage and analyse its data.

LDaCA is a language data commons serving researchers and community groups that are interested in language data.

Data Governance
The policies and processes by which data is managed through its life cycle to ensure the quality, reliability, security, and sustainability of the data.
Data License
A legal arrangement between the creator of the data and the end-user specifying what users can do with the data.
More information
Data Management Plan
A document that (1) outlines key information about a research project and its data, including access conditions, ownership, storage and future use, and (2) sets out roles and responsibilities in its management.
Data Onboarding
The process by which language collections are catalogued in LDaCA, carried out collaboratively by the Data Steward and LDaCA.
Data Packaging
The application of widely used standards (for example, for formats, metadata and access conditions) to the collection data.
See also: Data Transformation
Data Steward
An individual or organisation with the authority to make decisions regarding the collection.
Data Transformation
The process of converting, cleansing, and structuring data into a usable format. Sometimes used as a synonym for Data Packaging.
See also: Data Packaging
Defined Term
In linked data, a metadata category that allows for a) accurate definitions of the values assigned to Properties, and b) grouping such definitions in DefinedTermSets, which can function as controlled vocabularies. DefinedTerms in the LDAC Metadata Schema include DerivedMaterial, PartOfSpeech, SignedLanguage, SpokenLanguage, etc.
Describo

A tool that allows you to create and update RO-Crates. It provides researchers with a relatively simple way to describe their data using the best practices in formal metadata description.

Superseded for project purposes by Crate-O.

DOI
Digital Object Identifier. A type of Persistent Identifier (PID) which is becoming the default identifier for research datasets, as a long-lasting reference to the collection. It comprises a unique number made up of a prefix and a suffix separated by a forward slash, resolvable by displaying it as a link, e.g. https://doi.org/10.1000/182
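The prefix/suffix structure described above can be illustrated with a short Python sketch. The function names `split_doi` and `doi_link` are hypothetical, introduced only for this example; the sketch assumes the `https://doi.org/` link form shown in the entry and does not check that a DOI is actually registered.

```python
RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI, given bare or as an https://doi.org/ link, into (prefix, suffix)."""
    # Accept both the link form and the bare form.
    bare = doi[len(RESOLVER):] if doi.startswith(RESOLVER) else doi
    # The prefix ends at the first forward slash; the suffix may contain further slashes.
    prefix, sep, suffix = bare.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_link(prefix: str, suffix: str) -> str:
    """Rebuild the resolvable link form from a prefix and a suffix."""
    return f"{RESOLVER}{prefix}/{suffix}"


print(split_doi("https://doi.org/10.1000/182"))  # ('10.1000', '182')
```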
ELAN
A software tool to make time-aligned annotations (which may be transcriptions) of audio and video recordings. The tool is commonly used by linguists and others who work with language.
More information
Elpis
A tool to obtain a first-pass transcription of untranscribed audio. It brings cutting-edge speech recognition technology within reach of language workers and researchers who don’t have backgrounds in speech engineering.
More information
FAIR

Four key principles developed in 2016 with the aim of supporting the discovery and reuse of research data.

The principles encourage us to make data:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

More information
Field Notebook/Journal
A collection of fieldnotes compiled while completing fieldwork.
Fieldnotes
Notes taken by a researcher while conducting fieldwork that record their observations and other relevant information.
Fieldwork
The collection of data from an environment where the data is likely to occur naturally or organically without the intervention of researchers. In linguistics, this typically involves studying a language as it is spoken by a community of speakers in a particular location.
FLA
First Languages Australia. A national organisation working to ensure the strength of all Aboriginal and Torres Strait Islander languages.
More information
GitHub
A developer platform that allows developers to create, store, manage and share their code, using Git software.
More information
GLAM
Galleries, Libraries, Archives and Museums.
GLAM Peak
A representative national body that brings together the representative bodies for Australia’s galleries, libraries, archives, museums, historical societies, cultural heritage organisations and research peak bodies.
More information
GLAM Workbench
A suite of Jupyter notebooks developed by Tim Sherratt to help with exploring and using data from GLAM institutions. Primarily, the notebooks use data from Trove newspaper and magazine collections, but have some extensions beyond this.
More information
Glottolog
An alternative catalogue of the world’s languages, language families and dialects. Glottolog uses the term languoid to cover all of these. Each languoid is assigned a unique identifier consisting of four alphanumeric characters followed by four digits. For example, (standard) French has the code stan1290, and Warlpiri is warl1254.
More information
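As a rough illustration of the identifier shape described in the Glottolog entry, the following Python sketch checks whether a string looks like a Glottolog code. The function name `looks_like_glottocode` is hypothetical, and the restriction to lowercase letters is an assumption based on the two examples given. The check covers only the shape of the code, not whether it exists in the catalogue.

```python
import re

# Four alphanumeric characters followed by four digits.
# Assumption: letters are lowercase, as in the examples stan1290 and warl1254.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def looks_like_glottocode(code: str) -> bool:
    """Return True if the whole string has the shape of a Glottolog code."""
    return GLOTTOCODE.fullmatch(code) is not None


for candidate in ("stan1290", "warl1254", "wbp", "C15"):
    print(candidate, looks_like_glottocode(candidate))
```

The last two candidates are the ISO 639-3 and AustLang codes for Warlpiri quoted elsewhere in this glossary, which do not have the Glottolog shape.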
HASS
Humanities, Arts and Social Sciences.
HMI
Human Machine Interface. A user interface that connects a person to a machine, system or device. For example, in-car HMIs allow drivers to interact with their vehicle.
IAD
Institute for Aboriginal Development (Aboriginal Corporation). An Aboriginal community-controlled organisation established as a cross-cultural adult education and training centre serving all Aboriginal people in Central Australia.
More information
IDIL
International Decade of Indigenous Languages. The United Nations General Assembly has declared the period between 2022 and 2032 as the International Decade of Indigenous Languages, to draw global attention to the critical status of Indigenous languages worldwide and encourage action for their revitalisation, promotion and ongoing use.
More information
IDN
Indigenous Data Network. A national network of Aboriginal community-controlled organisations, university research partners, Indigenous businesses and government agencies and departments established to support and coordinate the governance of Indigenous data for Aboriginal and Torres Strait Islander peoples and empower Aboriginal and Torres Strait Islander communities to decide their own local data priorities.
More information
IIRC
Improving Indigenous Research Capability. A project supporting the creation of an Aboriginal and Torres Strait Islander Research Data Commons.
More information
Intellectual Property
Creative works protected by law via patents, copyright and trademarks.
Interoperability
The ability of computer systems or software to exchange and make use of information. The relevant FAIR principle uses the term specifically in relation to data.
IPA
International Phonetic Alphabet. An alphabetic system of phonetic notation based primarily on the Latin script, designed as a standardised representation of speech sounds in written form.
ISO-639
A standard by the International Organization for Standardization (ISO) concerned with representation of languages and language groups. An earlier version of this system used two-letter codes to identify languages; more recent versions use three-letter codes (referred to as ISO 639-3). The ISO 639-3 code for French is fra, and Warlpiri is wbp.
More information
JSON
JavaScript Object Notation. A data-interchange text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages.
More information
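To show what the JSON text format looks like in practice, here is a minimal Python sketch using the standard library `json` module. The record and its key names (`name`, `iso639_3`, `glottolog`) are invented for illustration; the code values are the Warlpiri codes quoted in the ISO-639 and Glottolog entries of this glossary.

```python
import json

# A small record; the key names are hypothetical, the codes come from this glossary.
record = {"name": "Warlpiri", "iso639_3": "wbp", "glottolog": "warl1254"}

# Serialise the Python dictionary to JSON text, then parse the text back again.
text = json.dumps(record, indent=2)
restored = json.loads(text)

print(text)
print(restored == record)  # True
```

Because the format is plain text, the same record could be read by a program written in R, JavaScript or any other language with a JSON parser.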
Jupyter Notebook
An interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media.
More information
LADAL
Language Technology and Data Analysis Laboratory. A free, open-source, collaborative support infrastructure for digital and computational humanities assisting anyone interested in working with language data in matters relating to data processing, visualization and analysis, and offering guidance on matters relating to language technology and digital research tools.
More information
LDAC
Language Data Commons. LDAC can refer to the schema, the profile or the modes associated with it.
LDaCA
Language Data Commons of Australia. LDaCA is making nationally significant language data available for academic and non-academic use and providing a model for ensuring continued access with appropriate community control. Our preferred pronunciation of the name is el-dakka (and that is why you may find the odd alpaca on this website).
More information
Legacy (File) Format
An old, outdated or obsolete file format that is no longer supported by modern hardware and/or software systems.
Lexicon
A list of forms in a language with associated information, such as meanings, pronunciations or word class assignments.
Licensing
A process that allows the copyright owner of a work to share the right to access and use some material from the work without reassigning the ownership of the copyright. License terms establish the conditions for that access and use. A license for a data collection is the legal agreement between the creator of the data and the end-user specifying who can access, share and reuse the data, and other conditions as required.
Linked Data
Structured data that is interlinked with other data and published in a machine-readable way to maximise interoperability and improve the precision of metadata.
Metadata
The information that defines and describes data. It provides data users with information about the purpose, processes, and methods involved in the data collection. (Source: Australian Bureau of Statistics).
Mode
Also called a Mode file. An implementation of an RO-Crate Profile consisting of a set of lightweight syntactic rules for combining Schema.org Style Schema (SOSS) Classes, Properties and DefinedTerms in a JSON file. Modes can be loaded to an editor such as Crate-O, used for RO-Crate validation or used to summarise rules for RO-Crate Profiles.
MT
Machine Translation.
NCRIS
National Collaborative Research Infrastructure Strategy. It provides strategic funding for national-scale research infrastructure, driving collaboration to bring economic, environmental, health and social benefits for Australia.
More information
NER
Named-Entity Recognition. NER locates and classifies named entities in unstructured text into predefined categories such as person names, organisations and locations.
NFSA
National Film and Sound Archive. Australia’s national audiovisual cultural institution which collects, preserves and shares Australia’s audiovisual culture.
More information
Nyingarn
A 3-year Australian Research Council funded project that will provide digital access to early sources of Australia’s Indigenous languages, using various ways to turn images of manuscripts into text, including Optical Character Recognition (OCR), and crowdsourced transcription (using DigiVol).
More information
Object
A single resource or a group of tightly related resources; for example, a work (document) in a written corpus, or the files associated with a dialogue or session in a speech study (recordings, transcriptions etc.).
OCFL
Oxford Common File Layout. An application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.
More information
OCR
Optical Character Recognition. The electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text.
OLAC
Open Language Archives Community. An international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.
More information
Oni
A portal for discovery of RO-Crated data. It is a web application which provides indexing, searching and access to secure data repositories which follow the Arkisto model.
More information
Oral History
The gathering, recording and preserving of historical information, based on interviews about the experiences, memories and opinions of people who participated in or observed past events.
ORCID
Open Researcher and Contributor ID. A registry providing globally unique persistent identifiers (PIDs) for researchers, authors and contributors of scholarly works.
More information
Orthographic Transcription
A transcription method that employs the standard spelling system of each target language.
PARADISEC
Pacific and Regional Archive for Digital Sources in Endangered Cultures. A digital archive that works to digitise, preserve and make accessible recordings that are at risk of loss, particularly for languages in the Pacific region.
More information
Phonemic Transcription
A representation of speech in terms of the sound contrasts made in a language, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA.
Phonetic Transcription
A representation of speech in terms of the sounds actually produced in specific instances, using a phonetic alphabet, such as the International Phonetic Alphabet (IPA) or X-SAMPA.
PID
Persistent Identifier. A digital identifier that is permanently assigned and provides a long-lasting reference to an object or entity, for example a Digital Object Identifier (DOI).
Profile
Specifies a subset of a metadata standard for a particular use case, such as for describing language resources. LDaCA uses RO-Crate profiles, which are a set of conventions, types and properties that are required in RO-Crates. Specifically, the LDAC RO-Crate Metadata Profile provides the minimum structural metadata for describing language data resources.
Property
In linked data, a metadata category which is an attribute of an instance of a Class. Properties in the LDAC Metadata Schema include author, communicationMode, linguisticGenre, speaker, signer, etc.
Provenance
The documented history or chain of custody of materials from their creation to their current location within a collection. The full history and ownership of an item from the time of its discovery or creation to the present day, through which authenticity and ownership are determined.
Python
A high-level, general-purpose programming language with an emphasis on code readability.
More information
QUT
Queensland University of Technology.
More information
R
A programming language and environment for statistical computing and graphics.
More information
RDC
Research Data Commons.
See also: ARDC
Research Data Management
The handling of data during and after a research activity including generating, collecting, organising, accessing, using, analysing, storing, disclosing, documenting, preserving, disposing of, sharing and re-using data.
REMS
Resource Entitlement Management System. A tool to help researchers browse resources such as datasets relevant to their research and to manage the application process for access to those resources.
Research Infrastructure
The facilities, systems, tools, platforms, equipment, instruments and other resources and services that are needed for research communities to conduct research. This can include both tangible assets, like supercomputers, and intangible assets, like data collections.
RIIP
Research Infrastructure Investment Plan (NCRIS). It provides continued support for Australia’s National Research Infrastructure facilities, as well as investment in emerging research priorities.
RO-Crate
Research Object Crate. A way of packaging research data that stores the data together with its associated metadata and other component files, such as the data license.
More information
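The idea of keeping data, metadata and the data license together can be sketched in Python as the kind of JSON-LD metadata an RO-Crate carries. This is an illustrative sketch only: the file names `recording.wav` and `LICENSE.txt`, the placeholder names, and the helper `root_dataset` are all hypothetical, and the structure has not been validated against the RO-Crate specification, which requires further properties on the root dataset.

```python
import json

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # Metadata descriptor: says which entity describes the whole package.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: the package itself, linking to its parts and license.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example collection (placeholder)",
            "license": {"@id": "LICENSE.txt"},
            "hasPart": [{"@id": "recording.wav"}, {"@id": "LICENSE.txt"}],
        },
        {"@id": "recording.wav", "@type": "File", "name": "Recording (placeholder)"},
        {"@id": "LICENSE.txt", "@type": "File", "name": "Data license (placeholder)"},
    ],
}


def root_dataset(crate: dict) -> dict:
    """Follow the descriptor's 'about' link to find the root dataset entity."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    return by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]


print(json.dumps(root_dataset(crate), indent=2))
```

In a real crate this JSON would be saved as a file alongside the data files it describes, so the description travels with the data.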
Schema
Specifies a metadata vocabulary of Classes and Properties, based on the RO-Crate specification’s use of Schema.org classes.
Sensitive Data

Data that, as a result of research, contains confidential or other ‘sensitive information’ which is defined in the Privacy Act as information or opinion about an individual’s:

  • racial or ethnic origin
  • political opinions
  • membership of a political association
  • religious beliefs or affiliations
  • philosophical beliefs
  • membership of a professional or trade association
  • membership of a trade union
  • sexual preferences or practices
  • criminal record
  • health information
  • genetic information
  • culturally sensitive data or data deemed sensitive by the data provider

More information
Takedown Policy
The policy according to which data may be removed, or access may be adjusted in some way, and the steps by which this is implemented.
TK Labels
Traditional Knowledge Labels. An initiative for Indigenous communities and local organisations, allowing communities to express local and specific conditions for sharing and engaging in future research and relationships in ways that are consistent with already existing community rules, governance and protocols for using, sharing and circulating knowledge and data.
More information
Tools
Code or software developed in order to support or enhance (language) data accessibility and use.
Transcoding
The process of converting data from one digital encoding format to another, such as from a high-resolution image to a lower-resolution one.
TTS
Text-to-Speech. TTS generates an artificial spoken audio version of a written text and can be used to improve accessibility.
UoM
The University of Melbourne.
More information
UQ
The University of Queensland.
More information
USC
University of the Sunshine Coast.
More information
USyd
The University of Sydney.
More information
UWA
The University of Western Australia.
More information
VoIP
Voice over Internet Protocol. A technology allowing phone calls to be made through the Internet using a broadband connection, rather than through a landline or mobile network.
Wangka Maya
Wangka Maya Pilbara Aboriginal Language Centre. It aims to be recognised as a leading Aboriginal language and resource centre in Australia, using expertise, knowledge and sensitivity to record and foster Aboriginal languages, culture and history.
More information
Work Plan
An agreement between LDaCA and the Data Steward establishing the terms according to which the data will be onboarded to LDaCA, including the goals and responsibilities of each party, and the steps and timeline for carrying out the onboarding process.
WP
Work package within a funded project.
X-SAMPA
Extended Speech Assessment Methods Phonetic Alphabet. A phonetic script designed to extend SAMPA to cover the range of characters in the International Phonetic Alphabet (IPA).
XML
Extensible Markup Language. A markup language and file format for storing, transmitting, and reconstructing data.
More information
Zenodo
A multi-disciplinary open data repository maintained by CERN.
More information