Advanced Keyword Index Final Report
John Cocklin, Barbara DeFelice, Barbara Reed, Karen Sluzenski, Cecilia Tittemore, Jennifer Merrill (Chair)
The Advanced Keyword Index group is charged with determining which data from the INNOPAC bibliographic record should be included in the INNOPAC Advanced Keyword Index. To complete this work, the group will:
- become familiar with the current Author, Title and Subject indexes which form the base for the Advanced Keyword Index
- decide which additional fields to include to create the final keyword index.
Brief description of the Advanced Keyword Index product
The Advanced Keyword Index (AKI) builds on the existing Author, Title and Subject phrase indexes in INNOPAC. Whatever is included in those phrase indexes will be included in the Author, Title and Subject slices of the AKI. Any additional data from the bibliographic record that the site wants indexed can be included in the "Additional keywords" slice. Data from all-numeric fields are not included.
Data in attached records (item, checkin, order) cannot be included in the AKI and will not be indexed.
Searchers can search across the entire AKI (similar to our All Indexes search), or can restrict their search to one or more of the slices. For example: author=Smith and title=The Book of Rhymes. Boolean operators are supported, as are parentheses to group terms in complicated searches. (See Appendix A for a complete description of the product.)
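The slice-restricted search described above can be illustrated with a minimal sketch. This is not how INNOPAC implements the AKI; the sample records, the function names `build_index` and `search`, and the slice key `additional` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy records. The slice names mirror the report; the data is invented.
RECORDS = {
    1: {"author": "Smith, John", "title": "The Book of Rhymes",
        "subject": "Poetry", "additional": "Boston Press"},
    2: {"author": "Smith, Ann", "title": "Maps of New York",
        "subject": "Cartography", "additional": "ill., maps"},
    3: {"author": "Jones, Mary", "title": "Smith and the Rhymes",
        "subject": "Fiction", "additional": ""},
}

def build_index(records):
    """Map slice name -> keyword -> set of record ids."""
    index = defaultdict(lambda: defaultdict(set))
    for rec_id, slices in records.items():
        for slice_name, text in slices.items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[slice_name][word].add(rec_id)
    return index

def search(index, word, slice_name=None):
    """Look up one keyword in a single slice, or in every slice if none is given."""
    names = [slice_name] if slice_name else list(index)
    hits = set()
    for name in names:
        hits |= index[name].get(word.lower(), set())
    return hits

index = build_index(RECORDS)
# author=Smith and title=Rhymes: a Boolean "and" is a set intersection.
both = search(index, "smith", "author") & search(index, "rhymes", "title")
# Searching the whole index also matches "Smith" where it appears in a title.
anywhere = search(index, "smith")
```

In this sketch, restricting to the author slice excludes record 3, where "Smith" appears only in the title, while the unrestricted search retrieves it.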
The group worked through the list of data indexed in the BRS database and compared that to what is currently included in the Author, Title and Subject phrase indexes in INNOPAC. Fields containing all numeric data, and data from attached records, were discarded. What was left over was the data in these BRS indexes: Collation, Edition, Place of Publication, Publisher, Source, Notes, and MeSH and other minor subject headings (which are not included in the Subject phrase index).
For each of these BRS indexes the group looked at the data represented by that index and tried to determine whether it would be useful to patrons to include it in the Advanced Keyword Index. Since it is not possible to search for this data with any more precision than specifying that it appear in the Additional keywords slice, the group had to consider the effects of false drops on search results.
The group felt it was important to be able to retrieve records based on whether they had illustrations or maps within the text. This information is only available in the Collation field, and in the fixed fields. Because it is not yet known whether fixed field data can be used by the INNOPAC, the group decided to index the Collation field in AKI.
The group did not feel that it was important to include edition information in the AKI.
Place of Publication
There was a lot of interest in including Place in the keyword index because it can be useful when little is known about the piece other than where it was published, but the group concluded that the huge number of false drops for highly posted places would be detrimental to the typical user. (Consider books about New York vs. books published in New York.)
Acquisitions requested that Publisher be included in the keyword index and the group agreed.
Innovative supplies a list of MARC tags representing Notes data for sites to use as a starting point to determine what to index. We compared this list to what is included in Notes searches in the BRS database and added in the MARC fields that were missing. The Additional keywords slice of our AKI will include everything that was in the Notes index in BRS with the exception of Local Notes. Local Notes are in the item record and cannot be indexed.
Source can contain related author information such as illustrator or translator that may not be reflected anywhere else in the record. For this reason, the group chose to include Source in the AKI.
Medical Subject Headings (MeSH) are not included in our Subject phrase index because they conflict with Authority control. (MeSH headings will appear in the Subject phrase index as "redirects", or see-references, to equivalent LC subject headings after authority records are loaded into INNOPAC. This will potentially increase the MeSH vocabulary available in the Subject index.) Since MeSH are excluded from the Subject phrase index, they will also be excluded from the Subject slice of the AKI. The group decided to include them in the Additional keywords slice.
In the course of educating ourselves about the Advanced Keyword Search product, we made some observations that we would like to pass on to the Web OPAC Implementation Team. Some of these observations are a result of our decision on what to include in the index. Others come from searching other Innovative sites that have already implemented this index.
- Searching across all slices of the keyword index is not comparable to an All Indexes search in the BRS database because:
  - the keyword index does not index the entire record
  - numeric indexes (call numbers, ISSN, ISBN, etc.) are not included in the keyword index
- If the user does not specify a Boolean operator, the default is "adjacent". In our local catalog the default operator is usually either "and" or "same", depending on which index you are searching.
- Our users are accustomed to doing keyword searches now. How do we teach our users about phrase searching vs keyword searching? What does the user think will happen if they choose to do a keyword search?
- There is no Topic search.
- The keyword index does not reference authority records. Users need to search the phrase indexes to get cross references.
- When examining search results we noticed that search terms were highlighted in all fields, not just fields searched. Innovative confirmed that this is the current behavior, but that they are considering an enhancement to the system that will change this. (They were not specific as to the nature of the enhancement.)
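The difference between an "adjacent" default and an "and" default, noted in the observations above, can be shown with a small sketch. It assumes that "adjacent" means the terms must appear next to each other in the order typed; the report does not spell out the exact semantics, and the function names here are hypothetical.

```python
import re

def tokens(text):
    """Split text into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_and(text, terms):
    """Every term appears somewhere in the text, in any order."""
    words = set(tokens(text))
    return all(t.lower() in words for t in terms)

def match_adjacent(text, terms):
    """Assumed meaning of 'adjacent': the terms appear consecutively, in order."""
    words = tokens(text)
    wanted = [t.lower() for t in terms]
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))

title = "A New History of York"
and_hit = match_and(title, ["new", "york"])            # both words are present
adjacent_hit = match_adjacent(title, ["new", "york"])  # but they are not side by side
```

Under these assumptions, a user who types two words without an operator would retrieve fewer records with an "adjacent" default than with the "and" default they may be used to.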
List of tags to include in the "Additional keywords" slice.
This list of tags is intended to isolate Medical Subject Headings (MeSH), identified in the MARC format by a second indicator value of 2 in subject heading fields. The first indicator value "." above means any first indicator is acceptable for inclusion in this index.
541 all subfields except e
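The tag rules above (a "." indicator that accepts any value, a second indicator of 2 for MeSH, and subfield exclusions such as "all subfields except e") can be sketched as a small filter. The field layout, the `RULES` table and the function names are assumptions for illustration; tag 650 stands in as a hypothetical subject heading tag, and the indicator patterns for 541 are assumed, since the report states only its subfield rule.

```python
# Each rule: (tag, first indicator, second indicator, excluded subfield codes).
# "." means any indicator value is acceptable.
RULES = [
    ("650", ".", "2", set()),   # hypothetical example tag; second indicator 2 marks MeSH
    ("541", ".", ".", {"e"}),   # from the report: all subfields except e (indicators assumed)
]

def indicator_ok(pattern, value):
    """A '.' pattern accepts anything; otherwise the value must match exactly."""
    return pattern == "." or pattern == value

def keywords_for(field):
    """Return the subfield values of a field that the rules would index."""
    for tag, ind1, ind2, excluded in RULES:
        if (field["tag"] == tag
                and indicator_ok(ind1, field["ind1"])
                and indicator_ok(ind2, field["ind2"])):
            return [value for code, value in field["subfields"]
                    if code not in excluded]
    return []

# Invented sample fields for illustration.
mesh = {"tag": "650", "ind1": " ", "ind2": "2",
        "subfields": [("a", "Neoplasms")]}
lc = {"tag": "650", "ind1": " ", "ind2": "0",
      "subfields": [("a", "Cancer")]}
acquisition = {"tag": "541", "ind1": " ", "ind2": " ",
               "subfields": [("a", "Sample donor"), ("e", "12345")]}
```

Here `keywords_for(mesh)` keeps the MeSH heading, `keywords_for(lc)` returns nothing because the second indicator is not 2, and `keywords_for(acquisition)` drops subfield e.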