Index Specification File

haschart edited this page Nov 20, 2016 · 5 revisions

The mapping of MARC fields and subfields to Solr index fields is defined in an index specification file, which is specified via the -config command line argument. Because the index specification file is a properties file, there are certain constraints on how it is structured. A properties file consists of a series of key/value pairs, with each key separated from its value by an equals sign. Each key is the name of a field that will be added to the Solr document. Every key must match either a field definition or a dynamicField definition in the schema.xml file for the Solr index.
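As a rough illustration of the key/value structure described above, the following sketch reads a few specification lines with the standard java.util.Properties class. The class name IndexSpecSketch and its parse helper are hypothetical, and whether SolrMarc itself loads the file this way is an assumption; the point is only to show how a properties parser splits each line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: not SolrMarc's own loader.
public class IndexSpecSketch {

    // Parse properties-format text into key/value pairs.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String spec = String.join("\n",
            "id = 001, first",
            "author_text = 100a:110a:111a:130a",
            "source_facet = \"Library Catalog\"");

        Properties props = parse(spec);
        // Keys are Solr field names; values are the raw specifications.
        // Note that Properties does not preserve the order of the file.
        for (String solrField : props.stringPropertyNames()) {
            System.out.println(solrField + " -> " + props.getProperty(solrField));
        }
    }
}
```

Whitespace around the equals sign is dropped, the colons inside a value are kept as part of the value, and the double quotes around a constant remain in the value for later interpretation.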

If any Solr field listed here matches neither a field definition nor a dynamicField definition in the schema.xml file, indexing will fail for those records. Additionally, if multiple values are provided for a field that is not marked as multiValued in schema.xml, indexing of that record will fail. Other indexing errors can occur that do not prevent the creation of a Solr index. For example, if a MARC record cannot be read because of a corrupted leader field, an error message is printed and the record is skipped.

The text below shows examples of index specifications from the previous version of SolrMarc. These are all still supported by the new version, but for simple, field-based extraction specifications (like most of those below), many new options are available that give richer and more powerful control over how the data is extracted.

id = 001, first
author_text = 100a:110a:111a:130a
author_display = 100a:110a
published_text = 260a
material_type_text = 300a
notes_text = 500a:505a
uniform_title_text = 240a:240b
uniform_title_display = 240a
uniform_subtitle_display = 240b
marc_display = FullRecordAsXML
marc_text = custom, getAllSearchableFields(100, 900)
source_facet = "Library Catalog"

Constant Field Specification

The simplest index field specification is when a quoted string appears after the equals sign. Everything that appears in the quotes is taken verbatim and added to the specified field on the Solr index. So in the above example, every record added to the Solr index by this program will have a value of “Library Catalog” stored for the index field named source_facet. This can be useful when data from several different sources is being added to the same Solr index, to allow searchers to narrow their search to data from one or another of the sources.

If you need to define several values of a single Solr field in this way, you can include several quoted strings, separated by the vertical bar character.

format_facet = "Online"|"EBook"|"Government Document"
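To make the vertical bar syntax concrete, here is a minimal sketch that pulls the individual values out of such a specification. The class ConstantSpecSketch and its constantValues method are hypothetical names for illustration, not SolrMarc's parser, and the sketch assumes the values themselves contain no double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not SolrMarc's own parser.
public class ConstantSpecSketch {

    // Matches one double-quoted string and captures the text inside the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collect every quoted string in the specification, in order.
    // The separators between the strings are not validated here.
    static List<String> constantValues(String spec) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(spec);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String spec = "\"Online\"|\"EBook\"|\"Government Document\"";
        for (String value : constantValues(spec)) {
            System.out.println(value);
        }
    }
}
```

Each extracted value would become one value of the Solr field, so the field must be marked as multiValued in schema.xml when more than one string is listed.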

Full Record Specification

To include the entire contents of the MARC record being indexed in the Solr index, use one of the following special-purpose index specifications:

marc_display = raw  
marc_display = FullRecordAsMARC

specifies that the entire MARC record should be added in the standard MARC binary form (ISO 2709).

marc_display = xml
marc_display = FullRecordAsXML

specifies that the entire MARC record should be added encoded using the MARCXML standard.

marc_display = text
marc_display = FullRecordAsText

specifies that the entire MARC record should be translated to a readable format and stored (with <br/> tags inserted in place of newline characters).

marc_display = json
marc_display = FullRecordAsJSON

specifies that the entire MARC record should be added encoded using the MARC-in-JSON format.

Date indexed

A similar special-purpose index specification can be used to record the date the record is indexed as a field in the Solr index. The date is computed once as a batch of records is being read, and that value (formatted as yyyymmddhhmm) is used for all of the records in the batch.

date_indexed_facet = index_date
date_indexed_facet = dateRecordIndexed
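The compute-once behaviour can be sketched as follows. This is only an illustration: the class IndexDateSketch is hypothetical, and reading yyyymmddhhmm as year, month, day, 24-hour hour, and minute in local time is an assumption, since the format is not spelled out further here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical sketch: not SolrMarc's own implementation.
public class IndexDateSketch {

    // Assumed reading of "yyyymmddhhmm": year, month, day, 24-hour hour, minute.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    static String stamp(LocalDateTime when) {
        return when.format(FORMAT);
    }

    public static void main(String[] args) {
        // Computed once for the batch, then reused for every record in it.
        String batchStamp = stamp(LocalDateTime.now());

        List<String> recordIds = List.of("u1", "u2", "u3"); // placeholder record ids
        for (String id : recordIds) {
            System.out.println(id + " date_indexed_facet=" + batchStamp);
        }
    }
}
```

Because the stamp is taken once per batch, every record in a batch carries the same value even if indexing the batch takes several minutes.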

Custom Indexing Methods

If the first entry after the equals sign is "custom" or "script", the specification references custom code (or a custom BeanShell script) that performs the work of extracting values for the specified Solr field. This custom code can be a method provided by SolrMarc for backwards compatibility, one defined in a jar file that you provide, or one for which you provide the Java source code, which will be compiled at run time.

marc_text = custom, getAllSearchableFields(100, 900)
video_director_facet = custom(org.solrmarc.mixin.DirectorMixin), getVideoDirector 
first_date_text = script(getdate.bsh) getFirstDate

More details on how to create a custom method and how to reference it in an index specification can be found on this Wiki page.

Field based extraction specifications

If the value after the equals sign is not one of these special-case entries, it is assumed to be a list of MARC fields from which to extract the data to use for the index field. Details of how to create these specifications can be found on this Wiki page.
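The fall-through rule described on this page (constants, full-record keywords, date keywords, custom or script methods, and otherwise a field list) can be summarized in a small classifier. This is a sketch under stated assumptions: the class SpecKindSketch, its Kind enum, and the exact, case-sensitive keyword matching are all hypothetical and do not reflect how SolrMarc actually parses specifications.

```java
import java.util.Set;

// Hypothetical sketch: not SolrMarc's own parser.
public class SpecKindSketch {

    enum Kind { CONSTANT, FULL_RECORD, DATE_INDEXED, CUSTOM, SCRIPT, FIELD_BASED }

    // Keywords taken from the examples on this page; exact matching is an assumption.
    private static final Set<String> FULL_RECORD = Set.of(
        "raw", "FullRecordAsMARC", "xml", "FullRecordAsXML",
        "text", "FullRecordAsText", "json", "FullRecordAsJSON");
    private static final Set<String> DATE_INDEXED = Set.of("index_date", "dateRecordIndexed");

    // Classify the text that appears after the equals sign.
    static Kind classify(String value) {
        String v = value.trim();
        if (v.startsWith("\"")) return Kind.CONSTANT;
        if (FULL_RECORD.contains(v)) return Kind.FULL_RECORD;
        if (DATE_INDEXED.contains(v)) return Kind.DATE_INDEXED;
        // "custom" or "script" must be the whole first word, e.g. "custom," or "script(".
        if (v.matches("(?s)custom\\b.*")) return Kind.CUSTOM;
        if (v.matches("(?s)script\\b.*")) return Kind.SCRIPT;
        // Anything else is treated as a list of MARC fields.
        return Kind.FIELD_BASED;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"Library Catalog\"",
            "FullRecordAsXML",
            "dateRecordIndexed",
            "custom, getAllSearchableFields(100, 900)",
            "script(getdate.bsh) getFirstDate",
            "100a:110a:111a:130a"
        };
        for (String sample : samples) {
            System.out.println(classify(sample) + " <- " + sample);
        }
    }
}
```

The order of the checks mirrors the order of the sections above, with the field-based case as the default when nothing else applies.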