Spreadsheet Upload


Template
Tab Breakdown
Column Breakdown
Upload Spreadsheet to an RO-Crate with Crate-O

Template

For collections with many interconnected objects and files, it may be easier to add the metadata by uploading a spreadsheet to an existing RO-Crate in Crate-O than by adding each item manually. An RO-Crate metadata spreadsheet template can be downloaded below and populated with metadata specific to your collection:


ro-crate-metadata-template.xlsx


Spreadsheet upload can currently only add new data; it cannot overwrite or edit existing data in your RO-Crate.

The template is based on an example data collection that contains three types of files within each object:

  • Audio files (WAV), the primary material
  • Text files (CSV), transcriptions of the audio files
  • ELAN files (EAF), linguistic annotations of the audio files

Tab Breakdown

The spreadsheet has the following tabs by default. Depending on your collection, you may need to add further tabs, and some of the default tabs may not apply.


Tab | Description
Root | Metadata about the root or top level of the collection.
Authors | Metadata about the person or organisation responsible for creating this collection.
Publishers | Metadata about the organisation responsible for releasing this collection.
Licenses | Metadata about the license(s) within the collection; both for the objects and files, and for the collection’s metadata.
People | Metadata about the people within the collection.
Places | Metadata about the places within the collection.
Localities | Metadata about the geometric location data within the collection.
Objects | Metadata about the entities within the collection that could encompass one or more files.
Files | Metadata about the files in your collection. If the collection has multiple file formats, duplicate this tab and add the formats to the tab names, e.g. csv_files, eaf_files, wav_files.

ELAN (.eaf) files can have relative or absolute paths to the data they relate to. The ELAN preferences file is generally not needed for the collection and relates to the particular ELAN user only.


Below the header row of each tab, an example row illustrates how the tab can be filled in. The example row is colour-coded according to whether the column:

  • requires the user to input data (blue)
  • is pre-filled with a formula or static value and doesn’t require editing (green).

HINT: Highlight the example row and drag it down to copy all the pre-filled cells. Don’t forget to remove the example rows before you upload your spreadsheet to Crate-O!


At a minimum, it’s best practice to include @id and @type columns in each of your spreadsheet tabs, as these appear in Crate-O for each of the entities. The tables in the next section provide further details on what constitutes a valid @id and @type in each tab. For more detailed lists of these, see Metadata for Language Data.
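A quick check for this before upload can be sketched in Python. This is a minimal sketch, assuming the header row of each tab has already been read into a plain dictionary; the function name and the data layout are illustrative and are not part of Crate-O:

```python
# Columns that are best practice to include in every tab.
REQUIRED_COLUMNS = ("@id", "@type")


def find_missing_columns(tab_headers):
    """Return {tab name: [missing columns]} for tabs lacking @id or @type.

    tab_headers maps each tab name to the list of its column headers
    (a hypothetical layout chosen for this sketch).
    """
    missing = {}
    for tab_name, headers in tab_headers.items():
        absent = [col for col in REQUIRED_COLUMNS if col not in headers]
        if absent:
            missing[tab_name] = absent
    return missing


example_headers = {
    "Root": ["@id", "@type", "name"],
    "People": ["@id", "name"],  # @type was left out here
}
print(find_missing_columns(example_headers))
```

Tabs that already contain both columns are omitted from the result, so an empty dictionary means every tab passed the check.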

HINT: To type a column name beginning with @ in Excel, put an apostrophe before it ('@). This forces Excel to treat it as a text value rather than a formula.


The columns provided in the template tabs are illustrative only and may not all apply to your collection; please edit these as needed. Where a column header begins with a full stop (.), this indicates that the column will be ignored when the data is loaded into Crate-O and will not appear in the RO-Crate. This can be helpful if you want to retain other information in your spreadsheet that may not be in a format applicable to the RO-Crate.
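The effect of the full-stop rule can be pictured with a short Python sketch. It only mirrors the behaviour described above; how Crate-O implements this internally is not covered here, and the function name is hypothetical:

```python
def columns_to_load(headers):
    """Keep only the headers that do not begin with a full stop (.)."""
    return [header for header in headers if not header.startswith(".")]


# Header row modelled on the Localities tab of the template.
localities_headers = ["@id", "@type", ".latitude", ".longitude", "asWKT"]
print(columns_to_load(localities_headers))
```

In this example the .latitude and .longitude helper columns stay in the spreadsheet, but only @id, @type and asWKT would be carried into the RO-Crate.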


Column Breakdown

The section below describes each of the columns included in the template, ordered by tab. Please note that the columns provided in the template tabs are illustrative only and should be edited according to the requirements of your collection.


Root

The Root tab provides information about the top level of the collection. Unlike the other tabs, the Root tab can only have one row, so if a column requires more than one value, duplicate that column.

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, a DOI for a collection.
@type | Pre-filled | The type of the collection. Both Dataset and RepositoryCollection are required.
name | Data entry | The name of this collection.
description | Data entry | An abstract of the collection. Include as much detail as possible about the motivation and use of the dataset, including things that we do not yet have properties for.
doi | Data entry | A Digital Object Identifier, e.g. https://doi.org/10.1000/182.
isRef_author | Pre-filled | Generated from the @id column in the Authors tab.
isRef_publisher | Pre-filled | Generated from the @id column in the Publishers tab.
isRef_license | Pre-filled | Generated from the @id column in the Licenses tab.
datePublished | Data entry | The date the object was published. The date should be in the ISO 8601 format YYYY-MM-DD.
inLanguage | Data entry | The language in which the resource is written. For example, a work about the Italian language as used in Australia (subjectLanguage) that is written in English (inLanguage).
subjectLanguage | Data entry | The languages that the materials in the collection are about (not the language that it is in). For example, a work about the Italian language as used in Australia (subjectLanguage) that is written in English (inLanguage).

The prefix isRef_ indicates that data in this column should be taken from another @id field in the spreadsheet. For example, isRef_author uses the @id from the Authors tab to link all the author details to the Root tab.
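As an illustration of what such a link can look like, the Python sketch below turns one spreadsheet row into an RO-Crate style entity. In RO-Crate JSON-LD, a reference to another entity is written as an object holding that entity's @id. The function name, the sample row and the example.org URL are hypothetical, and Crate-O's own conversion may differ in detail:

```python
REF_PREFIX = "isRef_"


def row_to_entity(row):
    """Convert a row (column name -> cell value) into a JSON-LD style dict.

    Columns starting with isRef_ become references of the form
    {"@id": value}; all other columns are copied unchanged.
    Empty cells are skipped.
    """
    entity = {}
    for column, value in row.items():
        if value in (None, ""):
            continue
        if column.startswith(REF_PREFIX):
            property_name = column[len(REF_PREFIX):]
            entity[property_name] = {"@id": value}
        else:
            entity[column] = value
    return entity


root_row = {
    "@id": "https://doi.org/10.1000/182",
    "name": "Example collection",
    # Placeholder @id that would also appear in the Authors tab.
    "isRef_author": "https://example.org/people/author-1",
}
print(row_to_entity(root_row))
```

The resulting author property holds only the identifier; the author's name and type live in the entity built from the matching row of the Authors tab.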


Authors

An author is a person or organisation responsible for creating the collection. It is possible for collections to have multiple authors.

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, an ROR for an organisation or an ORCID, personal home page URL or email address for a person.
@type | Data entry | The type of the author. Either Person or Organization can be selected.
name | Data entry | The name of the author. Don’t include titles such as Dr/Prof.

Publishers

A publisher is an organisation responsible for releasing the collection. It is possible for collections to have multiple publishers.

Column | Type | Description
@id | Data entry | Persistent, managed unique ID in URL format (if available), for example, an ROR for an organisation.
@type | Pre-filled | The type of the publisher. Only Organization is valid.
name | Data entry | The name of the organisation.

Licenses

A license for a collection establishes the conditions for who can access, share and reuse the data, and other conditions as required. It is a legal arrangement between the creator of the data and the end-user specifying what users can do with the data.

Column | Type | Description
@id | Data entry | A URL to a version of the license (if available), for example, a URL of a Creative Commons license. If there is no URL, a license.txt file containing the text of the license needs to be included in the repository, and license.txt should be added as the @id.
@type | Pre-filled | The type of the license. Only DataReuseLicense is valid.
name | Data entry | The name of the license.
description | Data entry | A description of the license.
metadataIsPublic | Data entry | Determines whether the collection metadata can be viewed publicly. Requires a Boolean value (TRUE or FALSE).
allowTextIndex | Data entry | Determines whether the collection text can be indexed for search purposes. Requires a Boolean value (TRUE or FALSE).

It is possible to leave the Licenses tab blank if these details are still being finalised for the collection. However, the license details will then need to be added later in Crate-O.

For custom licenses (i.e. those specific to a particular collection), it is recommended that a copy of the license be included in the repository to ensure that it remains accessible. Furthermore, if there are any additional usage restrictions or options for use outside of a given license, this information can be included in a usageInfo field, e.g. “For any use not permitted by the CC-BY-ND 4.0 License, please contact the Data Steward”.


People

This tab contains information about the people within the collection.

Column | Type | Description
@id | Pre-filled | A unique identifier for the person, generated from the name column. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Person is valid.
name | Data entry | The name of the person.
gender | Data entry | The gender of the person. An example of an optional metadata field from the source data.
birthDate | Data entry | The birth date (year) of the person. An example of an optional metadata field from the source data.
isRef_specializationOf | Data entry | A reference to another Person entity, used for collections where a person appears more than once with different demographic info (e.g. a different age). In these collections, there should be a ‘canonical’ person for each participant and another Person entity each time they participate, with different ages or other statuses.
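The # prefix on local identifiers, which applies to the People, Places, Localities and Objects tabs, can be enforced with a small helper. This is a minimal Python sketch: the function name and the sample identifier are illustrative, and it does not reproduce the formula used in the template's pre-filled cells:

```python
def with_hash_prefix(identifier):
    """Return the identifier with a leading #, adding one only if it is missing."""
    identifier = identifier.strip()
    if identifier.startswith("#"):
        return identifier
    return "#" + identifier


print(with_hash_prefix("Speaker01"))
print(with_hash_prefix("#Speaker01"))
```

Both calls give the same identifier, so the helper can be run over a column repeatedly without stacking extra # characters.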

Places

This tab contains information about the places within the collection.

Column | Type | Description
@id | Data entry | A unique identifier for the place. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Place is valid.
name | Data entry | The name of the place.
description | Data entry | A description of the place, including its alternative names.
isRef_geo | Data entry | The @id of the location to which this object relates from the Localities tab.

Localities

This tab contains information about the geometric locations within the collection.

Column | Type | Description
@id | Data entry | A unique identifier for the location. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only Geometry is valid.
.latitude | Data entry | The latitude of the location in decimal degree format.
.longitude | Data entry | The longitude of the location in decimal degree format.
asWKT | Pre-filled | The WKT serialisation of the geometry, generated from the .latitude and .longitude columns. Note that asWKT format lists longitude first followed by latitude.
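The coordinate order in asWKT is easy to get wrong, so it may help to see it spelled out. The Python sketch below builds a WKT point from decimal-degree values; the function name and sample coordinates are illustrative, and the exact spacing produced by the template's own formula is an assumption:

```python
def point_as_wkt(latitude, longitude):
    """Build a WKT point string. WKT lists longitude (x) before latitude (y)."""
    return f"POINT({longitude} {latitude})"


# Sample coordinates in decimal degrees (south and west are negative).
print(point_as_wkt(latitude=-27.4975, longitude=153.0137))
# prints: POINT(153.0137 -27.4975)
```

Taking latitude and longitude as named arguments keeps the familiar "lat, long" order at the call site while the swap happens in one place.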

Objects

An object is a single resource or a group of tightly related resources in a collection. For example, a work (document) in a written corpus, or the files associated with a dialogue or session in a speech study (recordings, transcriptions etc.). Some systems, such as PARADISEC, refer to Objects as Items or may use other terms.

Column | Type | Description
@id | Pre-filled | A unique identifier for the object, generated from the name column. Identifiers should be prefixed with #.
@type | Pre-filled | The type of the entity. Only RepositoryObject is valid.
name | Data entry | The name of the object.
description | Data entry | A description of the object.
isRef_speaker | Pre-filled | Generated from the .pseudonym column with # prefixed.
.pseudonym | Data entry | An example of a column from a data steward’s source data, so that speakers in the collection are anonymised.
datePublished | Data entry | The date the object was published. The date should be in ISO 8601 format YYYY-MM-DD.
isRef_pcdm:memberOf | Pre-filled | The collection this object is a member of, generated from the @id column in the Root tab. Or if the collection contains sub-collections, a reference to another RepositoryCollection @id.
isRef_license | Data entry | The @id of the license to which this object adheres from the Licenses tab.
isRef_indexableText | Data entry | Identifies which of the files in the given object has content that is indexed for search purposes. For example, in the template, the content of the CSV file would be searchable, whereas the EAF and WAV files would not. If isRef_indexableText is not included in a collection, search will only run on the metadata and not the transcript file content.
isRef_contentLocation | Data entry | The @id of the place to which this object relates from the Places tab.
inLanguage | Data entry | The language in which the resource is written. For example, a work about the Italian language as used in Australia (subjectLanguage) that is written in English (inLanguage).
subjectLanguage | Data entry | The languages that the materials in the collection are about (not the language that it is in). For example, a work about the Italian language as used in Australia (subjectLanguage) that is written in English (inLanguage).
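Values in the datePublished column (here and in the Root tab) can be checked against the YYYY-MM-DD format before upload. This is a minimal Python sketch with a hypothetical function name; it assumes dates are held as text, so cells that Excel stores as native dates would need converting first:

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        # Right shape but not a real date, e.g. 30 February.
        return False
    return True


for candidate in ["2024-03-15", "15/03/2024", "2024-02-30"]:
    print(candidate, is_iso_date(candidate))
```

The pattern check rejects other layouts such as DD/MM/YYYY, and the date.fromisoformat call rejects dates that have the right shape but do not exist.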

Files (CSV, EAF, WAV)

A file is a container for data and can store data in different formats. For example, a single object could have an audio file as well as a text file containing a transcription of the audio. Three examples of file tabs are included in the template, and their columns are combined in the table below.

Tab | Column | Type | Description
CSV, EAF, WAV | @id | Pre-filled | The filepath to the given file. Generated from the .folder, .filename and .postfix columns.
CSV, EAF, WAV | @type | Pre-filled | The type of the entity. Only File is valid.
CSV, EAF, WAV | .folder | Data entry | The folder name in which the given file appears.
CSV, EAF, WAV | .filename | Data entry | The name of the given file, without postfixes.
CSV, EAF, WAV | .postfix | Data entry | The file format of the given file, for example, .csv, .eaf, .wav.
CSV, EAF | isType_Annotation | Data entry | Indicates whether the given file is an annotation of another file. Requires a Boolean value (TRUE or FALSE).
WAV | isType_PrimaryMaterial | Data entry | Indicates whether the given file is the object of study, such as a literary work, film, or recording of natural discourse. Requires a Boolean value (TRUE or FALSE).
CSV, EAF, WAV | isRef_isPartOf | Pre-filled | Specifies the object that the file is a part of. Template example is generated from the .filename column. If entering manually, note that this field is case-sensitive.
CSV, EAF | isRef_annotationOf | Data entry | The full filename of the primary material that the given file is an annotation of.
CSV, EAF, WAV | .objectId | Pre-filled | Generated from the .filename column.
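The way a file @id is assembled from the three helper columns can be sketched in Python. The template's actual formula is not reproduced here: this sketch assumes a forward slash between folder and file name and a postfix that already includes its leading full stop (as in the .csv, .eaf and .wav examples). The function name and sample values are hypothetical:

```python
def file_id(folder, filename, postfix):
    """Join .folder, .filename and .postfix into a relative file path.

    Assumes "/" as the path separator and a postfix such as ".wav".
    """
    return f"{folder.rstrip('/')}/{filename}{postfix}"


# One object with a recording and its two annotation files.
for postfix in (".wav", ".csv", ".eaf"):
    print(file_id("session01", "interview_a", postfix))
```

Because the three files share the same .folder and .filename values and differ only in .postfix, related files are easy to spot across the file tabs.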

Upload Spreadsheet to an RO-Crate with Crate-O

For steps on adding your spreadsheet data to an existing RO-Crate, see Append Data from Spreadsheet.