Exporting data in RIMMF6
This page applies to RIMMF6 release 20230630 and later.
Export records options
In RIMMF, an option to export records is available:
- When viewing the EI
- When viewing an R-Tree
- When viewing a Manifestation 1)
This page describes
- how the program stores your data, and
- the data options available for an export.
Your RIMMF data
Data format
When an entity record is created and saved in RIMMF, the data is stored in a disk file using the N-Triples format: 'a line-based, plain-text serialisation format for RDF graphs' (Wikipedia). These N-Triples files use the Windows file extension '.nt'. Note: although the 'official' internet media type for an .nt file is “application/n-triples”, these are in essence “text/plain” files. The .nt files produced by RIMMF can be opened, viewed, and edited in any text editor.
As to the actual data within these files, we apply the following conventions.
Unicode characters
Unicode characters are stored in escaped format. This means that the characters which comprise the displayed string
John Le Carré
will be stored as
John Le Carr\u00E9
In the example above, “\u00E9” represents the hexadecimal Unicode code point for “é”. The hexadecimal number must be exactly four digits long 2). When converting between Unicode characters and their escaped representations, in either direction, we use Normalization Form C.
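The escaping convention above can be sketched in Python. This is a minimal illustration, not RIMMF's own code: the function name `escape_literal` is hypothetical, the eight-digit `\U` form for code points above U+FFFF follows the N-Triples grammar and is an assumption about what RIMMF does, and escaping of quotes and backslashes is left out.

```python
import unicodedata


def escape_literal(text: str) -> str:
    """Normalise to NFC, then escape every character outside US-ASCII."""
    normalised = unicodedata.normalize("NFC", text)
    out = []
    for ch in normalised:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # exactly four hex digits
        else:
            out.append(f"\\U{cp:08X}")    # assumption: N-Triples eight-digit form
    return "".join(out)


# "e" followed by a combining acute accent is composed to U+00E9 by NFC first.
print(escape_literal("John Le Carre\u0301"))  # prints John Le Carr\u00E9
```

Normalising before escaping means that the composed and decomposed spellings of “é” produce the same stored string.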
RDA Elements
RIMMF stores RDA Elements, or properties, using opaque URIs. For example, when storing a triple for the RDA element 'Title of work', the URI
http://rdaregistry.info/Elements/w/P10088
will be used.
In addition, RIMMF stores only canonical RDA elements. We do not store triples using the object or datatype subclasses typically defined for each RDA entity. (Data that is imported to RIMMF using these subclasses will be mapped to the corresponding canonical class).
Statements about statements
As you know, a triple is a statement composed of a subject, a predicate (or property), and an object (value). In order to support provenance, we need a way to uniquely identify each statement. In RIMMF, the N-Quads format is used to assign unique identifiers to statements. N-Quads are an extension of N-Triples; the fourth part, called a graphLabel, is appended after the object. The graph label (the 'quad') assigned by RIMMF is always a unique IRI. N-Quads should be compatible with all applications that support N-Triples.
The term used in RDF for saying 'something about' a statement is reification. More on this later.
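To make the four-part shape concrete, here is a rough Python sketch that splits one line into subject, predicate, object, and optional graph label. It is a simplified pattern, not a full N-Quads parser (no blank nodes, no comments), and the `example.org` IRIs are placeholders rather than identifiers RIMMF would actually produce.

```python
import re

# Simplified pattern: IRI subject, IRI predicate, IRI-or-literal object,
# an optional IRI graph label, then the terminating dot.
STATEMENT = re.compile(
    r'^\s*<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?P<o><[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'(?:\s+<(?P<g>[^>]*)>)?\s*\.\s*$'
)


def parse_statement(line: str):
    """Return (subject, predicate, object, graph_label); graph_label may be None."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not a recognised statement: {line!r}")
    return match.group("s"), match.group("p"), match.group("o"), match.group("g")


# Hypothetical quad: the subject and graph label IRIs are placeholders.
quad = ('<http://example.org/work1> '
        '<http://rdaregistry.info/Elements/w/P10088> '
        '"Example title" <http://example.org/stmt1> .')
subject, predicate, obj, graph_label = parse_statement(quad)
print(graph_label)  # prints http://example.org/stmt1
```

A plain triple goes through the same function; the graph label simply comes back as `None`.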
RIMMF-specific data and metadata
The subject of every RIMMF statement is assigned the namespace:
http://rimmfdata.com/
Sub-namespaces are used to categorise statements, as follows:
- http://rimmfdata.com/r – statements in the record
- http://rimmfdata.com/m – program metadata
The program metadata assigned to the '/m' namespace is data that RIMMF uses internally: the version of RIMMF used to create the record, the Windows filename, the entity template used to display it, and so on. An application on the receiving end of this data can safely ignore triples in the '/m' namespace.
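A receiving application might drop the program metadata along these lines. This is only a sketch: it assumes that each statement sits on one line, that the subject IRI comes first, and that metadata subjects are either the '/m' namespace itself or continue it with '/' or '#'. The exact shape of RIMMF's subject IRIs is not described here, so adjust the test to your data.

```python
METADATA_NS = "http://rimmfdata.com/m"


def is_program_metadata(line: str) -> bool:
    """True when the subject IRI of the line falls in the '/m' namespace."""
    line = line.lstrip()
    if not line.startswith("<"):
        return False
    subject = line[1:line.find(">")]
    # Assumption: the namespace is followed by '/' or '#', or used on its own.
    return subject == METADATA_NS or subject.startswith(
        (METADATA_NS + "/", METADATA_NS + "#")
    )


def drop_program_metadata(lines):
    """Keep only the statements that are not RIMMF-internal metadata."""
    return [line for line in lines if not is_program_metadata(line)]
```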
Export options
The next three options on the form refer to various combinations of two processing features:
- LexicalAlias properties, and
- RDF reification vocabulary.
Select 'LexicalAlias properties' to export the selected records using a human-readable string instead of an opaque URI for RDA elements. For example, without this option, the property used for “Title of work” would be:
http://rdaregistry.info/Elements/w/P10088
Whereas if this option is selected, the same property would be rendered as:
http://rdaregistry.info/Elements/w/titleOfWork.en
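The substitution can be pictured as a lookup on the predicate of each statement. The sketch below is illustrative only: the table holds just the one pair shown above, the function name is hypothetical, and a real mapping would have to come from the RDA Registry rather than be typed by hand.

```python
# Only the pair given in the text; any other element would need its own entry.
LEXICAL_ALIASES = {
    "http://rdaregistry.info/Elements/w/P10088":
        "http://rdaregistry.info/Elements/w/titleOfWork.en",
}


def to_lexical_alias(line: str) -> str:
    """Swap an opaque predicate URI for its lexical alias, if one is known."""
    parts = line.split(None, 2)  # subject, predicate, remainder of the line
    if len(parts) < 3:
        return line
    subject, predicate, rest = parts
    alias = LEXICAL_ALIASES.get(predicate.strip("<>"))
    if alias is None:
        return line              # unknown property: leave the statement alone
    return f"{subject} <{alias}> {rest}"
```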
The main reason to use this option is to facilitate data analysis–since an NTriples
When a '.txt' file is dragged and dropped on RIMMF's 'Import records' form, a new folder will be created, and the contents of the file will be added as individual records in that folder, and appear in a new EI.
If 'Format for RIMMF' is not checked, the selected records will be exported to a file with the '.nt' file extension, and a spacing line will not be output after each record. This exports the data as a single batch of N-Triples.
When an '.nt' file is dragged and dropped on RIMMF's 'Import records' form, a new folder will be created, as above, and the contents of the file will be added as a single record.
'.nt' might be the more useful format to use when sharing records with a non-RIMMF application.
Notes
Both the '.txt' and '.nt' formats use the N-Triples syntax. RIMMF exports N-Triples as 'text/plain' (any character outside US-ASCII will be escaped); thus, either format can be opened in any text editor.
The 'spacing' line mentioned above may contain an arbitrary string, such as '0000', used as a marker for the program to determine when one 'record' ends and the next begins. Other than this, the '.txt' file is exactly the same as the '.nt' file.
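A non-RIMMF program could recover the record boundaries from a '.txt' export roughly as follows. The sketch assumes that every statement line begins with '<' or '_:' and that any other line (empty, '0000', or some other marker) is a spacing line; this is an illustration of the idea, not RIMMF's own import logic.

```python
def split_records(lines):
    """Group statement lines into records, using spacing lines as separators."""
    records, current = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith(("<", "_:")):
            current.append(line)       # a statement belonging to the current record
        elif current:
            records.append(current)    # a spacing line closes the record
            current = []
    if current:
        records.append(current)        # the last record may lack a spacing line
    return records
```

Run over an '.nt' export, which has no spacing lines, the same function returns everything as one record, matching the import behaviour described above.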
About '.zip' files
In the past, RIMMF supported '.zip' versions of an export. We haven't entirely ruled this out as a future option; but at present, given the relatively small file sizes involved, and the blocking of '.zip' email attachments by many institutions, there's not a dire need to support this in R4.
If there is a need to export a large EI, the user can manually generate a '.zip' from either of the two export formats above.
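Any zip utility will do for this. As one possible route, the short Python sketch below compresses an existing export file using the standard library; the function name and the default of placing the archive next to the export are illustrative choices.

```python
import zipfile
from pathlib import Path


def zip_export(export_path, zip_path=None):
    """Compress a '.txt' or '.nt' export into a '.zip' archive."""
    export_path = Path(export_path)
    # Default: same folder and base name as the export, with a '.zip' suffix.
    zip_path = Path(zip_path) if zip_path else export_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full folder path.
        archive.write(export_path, arcname=export_path.name)
    return zip_path
```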
The RIMMF4 Import process still supports dragging and dropping a '.zip' file onto it.
+ When viewing a Manifestation (export all records in set)
If 'RIMMF' is checked (which is the default), the selected records will be exported as N-Triples; the output file will be assigned a '.txt' file extension, and a spacing line will be output after each record. This extra line maintains the RIMMF distinction between 'records'; otherwise the exported data would simply be a stream of N-Triples. '.txt' is a convenient format for sharing records with other RIMMF users (via email, etc.).
If 'N-Triples' is checked, the selected records will be exported as N-Triples and the output file will be assigned a '.nt' file extension. There is no other difference between this option and the default–except for the file extension and the empty line between records, the exported data is exactly the same in both cases.
, although there are as many as four other different conventions for reification 3).