UNCLASSIFIED

Guidelines for Mandrake II Metadata
Version 1.2

Edited for GOL Metadata Presentation, September 18, 2000

Technical Working Group
April 27, 1999, Revised: September 15, 2000, January 5, 2000
CSIS 950-91
Please send comments to Cameron Price


Table of Contents

1.0 Introduction

2.0 Purpose and Use

3.0 Canadian and International Metadata Standards

     Canadian Government Information Locator Service (GILS)
     The Dublin Core
     Intelink: Guidelines for Intelink Metadata

4.0 CSIS Usage of Metadata

5.0 Comparison of Various Metadata Schema

6.0 Mandrake II Metadata Tag Definitions

7.0 Mandrake II Metadata Tag Thesaurus

     
     MANDATORY               OPTIONAL           AUTOMATIC      FUTURE
     Abstract                ApprovalAuthority  Description    SecurityCompartment
     Agency                  Classification     DocumentID
     Author                  DocumentID         Last-Modified
     EffectiveDate           Expires            PublishDate
     ExpiryDate              Language           URL
     Keywords                PID
     SecurityCaveat          PublishDate
     SecurityClassification  Robots
     Title

8.0 Conclusion and Recommendations

Appendix A - Mandrake II MetaSet Examples

Appendix B - Java Metadata Creation Tool
 
 

1.0 Introduction

Information is generally stored within a web site in a hierarchical structure.  A series of menus and submenus is used to locate specific information within each site, guided by hypertext and graphical links.  As the data in a web site grows, it becomes increasingly difficult to locate specific information.  In addition, when each organization's web site is unique in format and does not follow a standard site structure or navigational technique, locating individual or related information becomes harder still.  This is not uncommon and parallels the diversity found on the Internet.  Even an indexed search capability cannot reliably retrieve only the intended data records, despite the indexing of all information records.  The intelligence analyst is often searching for related information across the various contributing organizations, irrespective of the data structures, and does not have the time to filter through hundreds of documents to find specific information.  There is a strong desire to deliver items of interest accurately, quickly and easily to our user community.

Rather than searching an index of the entire text or some arbitrary portion of each document or information resource description, the precision and relevance of search results could be improved by restricting the search to an indexed subset that categorizes the information; in other words, data about the data.  This identifying information is referred to as metadata and is recorded using the HTML META tag.  Metadata identifies specific elements of an information resource such as the title, the author, the subject and the creation date.  This information, if consistently included in each web data record for each Mandrake II site, will immediately improve the quality of individual searches and may be important for future Mandrake II enhancements such as user profiles and restricted document access.


2.0 Purpose and Use

The metadata tags appear within the HTML HEAD tag and are not displayed when the HTML page is displayed in a browser.  Metadata can act as a comment, a document descriptor or, more importantly, a search qualifier.  Many Internet search engines, such as Netscape COMPASS Server, allow metadata tags to be used within an advanced search.  The values assigned within each metadata tag can be displayed and even concatenated when a web page is referenced on a screen or is printed.  If Netscape Composer is used as the content editor, the proper attribute (NAME or HTTP-EQUIV) is generated automatically.

All Mandrake II metadata tags will take one of the following formats:

<META NAME="xxx" CONTENT="yyy">   or
<META HTTP-EQUIV="xxx" CONTENT="yyy">

where xxx is the name of the metadata element and yyy specifies the content value of the element.  Multiple CONTENT values can be entered for a single NAME, delimited by a comma.
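
As an illustration of the two formats above, the following Python sketch assembles a tag string from an element name and one or more content values.  The function name build_meta_tag is hypothetical and is not part of any Mandrake II tool.  Escaping quotation marks in the content is an assumption made here so that the generated tag stays well-formed.

```python
from html import escape


def build_meta_tag(name, values, http_equiv=False):
    """Return one META tag; multiple values are joined with a comma."""
    if isinstance(values, str):
        values = [values]
    attribute = "HTTP-EQUIV" if http_equiv else "NAME"
    content = ", ".join(values)
    # Assumption: quotation marks and angle brackets are converted to
    # entities so that they cannot terminate the CONTENT attribute early.
    return '<META {}="{}" CONTENT="{}">'.format(
        attribute, escape(name, quote=True), escape(content, quote=True)
    )


print(build_meta_tag("Keywords", ["extremists", "terrorist", "fundraising"]))
print(build_meta_tag("Expires", "2003/08/04", http_equiv=True))
```

A single value can be passed as a plain string; a list of values is joined into one comma-delimited CONTENT attribute.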

There are four categories of Mandrake metadata tags:

MANDATORY - These tags are required to meet Mandrake minimum metadata standards.

OPTIONAL - Optional tags provide additional information that can aid in registration of the product or in its discovery during the search phase of access.  They can be used effectively to better describe or qualify the document.

AUTOMATIC - These metadata tags are automatically created by the web publishing application or server and are used to store data in a publishing database for subsequent retrieval and modification.  Some of these metadata tags are visible and can be used dynamically within a web document and others are used internally by the web publishing application.

FUTURE - These metadata tags are reserved for future use pending a decision within the Mandrake II community to implement these new features.


3.0 Canadian and International Standards

There is no universal list of metadata tags, nor is there any lexicon for the content values.  There are, however, several initiatives that suggest which mandatory and optional metadata tags should be incorporated into a site or Extranet document structure.  The three examples that follow show how this technique has been deployed.

The Canadian Government Information Locator Service - external reference http://gils.gc.ca

In August 1995, Treasury Board recognized a need to establish a primary Government of Canada Internet site and requested the Government Telecommunications and Informatics Services (GTIS) to develop and maintain gateway services to government information.

Based on a U.S. standard, adapted to meet Canadian government needs, The Government Information Locator Service (GILS) provides users with the means of finding government information located in local and remote systems.  GILS is a computer platform independent system for locating government information in a decentralized collection of databases. GILS systems or locators are made up of searchable databases of GILS records which indicate what information is available, where it is located and how it may be accessed or acquired. A GILS record is not the information itself, but a standards-compliant description and a pointer to an information resource. GILS records can describe a collection, a service, a system, a Web site, a publication or an individual electronic document. They can contain a direct link (Uniform Resource Locator or URL) to a networked information resource. They can also describe how to obtain information that is not available on an electronic network such as the Internet or a departmental Intranet.

GILS originated in the United States, and all U.S. federal government agencies were required by law to implement this government-wide standard beginning January 1996.  Because GILS is a method of addressing the perceived need for a government-wide metadata standard, the Treasury Board Internet Advisory Committee and the Electronic Document Standards Working Group (EDSWG) agreed that a GILS Subgroup (GSG) should be established in November 1995 within the Government Standards Program.

The Dublin Core: A Simple Content Description Model for Electronic Resources - external reference http://purl.org/DC/

The Dublin Core is a metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of Web resources, it has attracted the attention of formal resource description communities such as museums, libraries, government agencies, and commercial organizations.

The building of an interdisciplinary, international consensus around a core element set is the central feature of the Dublin Core. The characteristics of the Dublin Core that distinguish it as a prominent candidate for description of electronic resources fall into several categories:

Simplicity - The Dublin Core is intended to be usable by non-catalogers as well as resource description specialists.  Most of the elements have a commonly understood semantics of roughly the complexity of a library catalog card.

Semantic Interoperability - In the Internet Commons, disparate description models interfere with the ability to search across discipline boundaries. Promoting a commonly understood set of descriptors that helps to unify other data content standards increases the possibility of semantic interoperability across disciplines.

International Consensus - Recognition of the international scope of resource discovery on the Web is critical to the development of effective discovery infrastructure. The Dublin Core benefits from active participation and promotion in some 20 countries in North America, Europe, Australia, and Asia.

Extensibility - The Dublin Core provides an economical alternative to more elaborate description models such as the full MARC cataloging of the library world. Additionally, it includes sufficient flexibility and extensibility to encode the structure and more elaborate semantics inherent in richer description standards.

Metadata Modularity on the Web - The diversity of metadata needs on the Web requires an infrastructure that supports the coexistence of complementary, independently maintained metadata packages. The World Wide Web Consortium (W3C) has begun implementing an architecture for metadata for the Web. The Resource Description  Framework, or RDF, is designed to support the many different metadata needs of vendors and information providers. Representatives of the Dublin Core effort are actively involved in the development of this architecture, bringing the digital library perspective to bear on this important component of the Web infrastructure.


Intelink: Guidelines for Intelink Metadata Version 1.0  July 1997

The US Intelink web structure also recognized that metadata tags are a necessary component of their information structure.  Intelink has assigned "required", "required if applicable", "optional" and "future" metadata tags that are stored within each Intelink web document.  The requirement to use metadata tags within Intelink is summarized as:  "One of the major complaints being heard from the Intelink user community is the inability to quickly, and easily, find items of interest.  The Intelink Management Office (IMO) has taken this complaint to heart and has several efforts underway to improve Intelink in this regard.  ....  Metadata and document tags are intended to provide greater insight into the content and structure of information, and thereby, to facilitate discovery by query and retrieval tools (e.g., search engines).  In the future, as these query and retrieval tools are used in conjunction with sophisticated, "tailored" capabilities such as user profiles, metadata will become even more important and useful".


4.0 CSIS Usage of Metadata

The first iteration of the CSIS Mandrake II web site used a limited subset of metadata tags, based on the Dublin Core standard.  These were entered as the document was "submitted" to the web "publisher" and verified prior to posting the document onto the web site.  Many of the CSIS metadata tags were dynamically used for fields displayed on the document templates.  The security classification, entered in a similar fashion to the other metadata tags, became the HTML <TITLE> tag so that it would appear on each display screen and on printed output.  The initial metadata tags consisted of: Generator, DocumentID, Author, Expires, Title, Keywords, and Abstract.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<META NAME="Generator" CONTENT="Simba CSIS">
<TITLE>Secret CEO</TITLE>
<META NAME="DocumentID" CONTENT="260">
<META NAME="Author" CONTENT="jc.........">
<META NAME="Expires" CONTENT="2003/08/04">
<META HTTP-EQUIV="Expires" CONTENT="2003/08/04">
<META NAME="Title" CONTENT="980720- Science and Technology ........................">
<META NAME="Keywords" CONTENT="........espionage science technology ..........">
<META NAME="Abstract" CONTENT="(Information effective 20 July 1998)
The PRC is continuing ..............................
............
</HEAD>

<BODY BGCOLOR="#FFFFFF">
 .
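
The tags in a page such as the one above could be read back with a small script.  The following is a minimal sketch using Python's standard html.parser module.  The class and function names are illustrative assumptions, and the sample page is a shortened, made-up stand-in rather than an actual CSIS document.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect META NAME / HTTP-EQUIV values found inside <HEAD>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            key = attributes.get("name") or attributes.get("http-equiv")
            if key is not None:
                # A tag name may occur more than once, so keep a list.
                self.meta.setdefault(key, []).append(attributes.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_meta(html_text):
    """Return a dict mapping each tag name to its list of CONTENT values."""
    parser = MetaExtractor()
    parser.feed(html_text)
    return parser.meta


sample = """<HTML>
<HEAD>
<TITLE>Sample</TITLE>
<META NAME="DocumentID" CONTENT="260">
<META NAME="Expires" CONTENT="2003/08/04">
<META HTTP-EQUIV="Expires" CONTENT="2003/08/04">
</HEAD>
<BODY BGCOLOR="#FFFFFF"></BODY>
</HTML>"""

for name, values in extract_meta(sample).items():
    print(name, values)
```

Because Expires appears with both NAME and HTTP-EQUIV, the sketch stores a list of values per tag name rather than a single value.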

5.0 Comparison of Various Metadata Schema

The revised Mandrake II metadata tag summary is defined in the following table, compared to other metadata implementations.
 
MANDRAKE II                 INTELINK                  DUBLIN CORE   GILS
Agency (M)                  IL.agency (R)             -             OriginatorDepartment
Title (M)                   IL.title (R)              TITLE         Title (M)
Author (M)                  IL.poc (R)                CREATOR       Originator (M)
Keywords (M)                IL.keyword (R)            SUBJECT       -
Abstract (M)                IL.summary (O)            DESCRIPTION   Abstract
EffectiveDate (M)           IL.pubdate (A)            DATE          DateofPublication
ExpiryDate (M)              IL.cutdate (R)            -             -
SecurityClassification (M)  IL.secur.classif (R)      -             -
SecurityCaveat (M)          IL.secur.ctrl (R)         -             -
SecurityCompartment (F)     IL.secur.compartment (F)  -             AccessConstraints
Classification (O)          -                         TYPE          Subjects
ApprovalAuthority (O)       -                         PUBLISHER     -
PID (O)                     -                         SOURCE        RecordSource (M)
Language (O)                -                         LANGUAGE      LanguageofRecord (M)
Last-Modified (A)           -                         -             DateofLastModification (M)
Description (A)             -                         DESCRIPTION   Note
DocumentID (O or A)         IL.docid (A)              IDENTIFIER    OriginalControlIdentifier
PublishDate (O or A)        IL.postdate (R)           -             CreateDate
-                           IL.country (A)            -             -
-                           IL.format (A)             FORMAT        Medium
-                           IL.secur.declasson (A)    -             -
-                           IL.secur.dissem (A)       COVERAGE      -
-                           IL.secur.relto (A)        RELATION      -
-                           IL.subcode (A)            -             -
URL (A)                     IL.url (A)                -             Linkage
-                           IL.secur.other (O)        -             -
Expires (O)                 IL.validtil (O)           -             TimePeriodTextual
-                           IL.coordinates (F)        -             -
-                           IL.itype (F)              -             -
-                           IL.secur.warning (F)      -             -
-                           -                         RIGHTS        -
-                           -                         CONTRIBUTORS  Contributor
-                           -                         -             LanguageofResource (M)
-                           -                         -             PlaceofPublication
Robots (O)                  -                         -             -

Legend:
Mandrake II:  M - Mandatory, O - Optional, A - Automatic, F - Future
Intelink:     R - Required, A - Required if Applicable, O - Optional, F - Future
GILS:         M - Mandatory

Note:  Guidelines for Intelink Metadata, Version 1.0 July 1997 was used for content and format consistency.
Note:  GILS contains over 100 separate metadata tags, equivalent to the USMARC tagging standard.
Note:  The following Metadata tag may be generated with either an HTTP-EQUIV or NAME attribute: Expires.  If using Netscape Composer the Metadata tags: Title, Author, Description, Keywords and Classification can be entered within the Format, Properties, General folder or the META Tags folder.


6.0 Mandrake II Metadata Tag Definitions

The following table identifies and describes the specific metadata tags to be incorporated into all Mandrake II sites.
 
Custom Metadata Tag     Description
Agency                  agency acronym, used for identification and HTML title creation
Title                   actual document title, displayed on index page, in document template header and/or What's New
Author                  original document author, responsibility area, or the point of contact
Keywords                words or phrases that describe overall content, main topic or idea of document, comma delimited. If not specified, will be automatically generated by web server
Abstract                synopsis of document, usually taken from document summary paragraph
EffectiveDate           date that document was written or published, displayed on index, What's New and within document
ExpiryDate              date when the document information is no longer valid; also provides the cutoff date for compatibility with Intelink. Use the value "unknown" if the date is not known
SecurityClassification  security classification of document for display and HTML title creation
SecurityCaveat          security caveat if applicable; also provides control codes for compatibility with Intelink. Use the default value "none"
SecurityCompartment     for future implementation of document access restriction
Classification          allows the actual document to be placed into a specific category or sub-category with other similar report types. Used within COMPASS server to restrict a search to a specific category or to search for similar documents within the same category
ApprovalAuthority       used to signify document publisher who approved document publication onto the web site
PID                     Product IDentification code for compatibility with the other agencies on Mandrake II - origin or source of document
Language                specifies the actual language of the document
Last-Modified           automatically generated to indicate the date that the document was last revised
Description             if not explicitly specified, will be automatically generated by server, consisting of first 20-30 words after <BODY> tag
DocumentID              may be supplied by web publishing package to provide a unique database record number or identifier
PublishDate             the actual date that the document was published to the web site, can optionally include the time stamp as well
URL                     the full HTTP address for the document object, automatically captured by the server and searchable by COMPASS advanced
Expires                 specifies the date when the document object's link is removed from the search index.  This does not remove the document from the web server.
Robots                  specifies that this document, and optionally all that follow in the directory, will not be indexed


7.0 Mandrake II Metadata Tag Thesaurus

The following table identifies all mandatory, optional and automatic metadata tags as proposed for usage within Mandrake II.  These metadata tags are stored within the HTML document, between the HTML <HEAD>.....</HEAD> tags.  The example is a suggested representation and should be followed to ensure consistency within each site.  Where compatibility is required for external document exchange, the recommended format will be indicated as well as the recommended Mandrake II format.

Agency (M)

The Agency metadata tag is used to provide an acronym for the Department, organization or agency responsible for publishing the web document.  The organizational hierarchy can be provided following the approved acronym if required.  To maintain compatibility with Intelink, it is recommended that the full name be entered, optionally followed by an acronym (in parentheses).

Examples of usage:

(The following example illustrates a product originated by the Canadian Security Intelligence Service, Requirements and Analysis Production unit.)

<META NAME="Agency" CONTENT="CSIS/RAP">

Approved values:   CSE, CSIS, DFAIT, DND, PCO and the French equivalents CST, SCRS, MAECI, MDN, BCP

(The following example illustrates a product originated by the Department of National Defence and shared with Intelink)

<META NAME="Agency" CONTENT="Department of National Defence (DND)">

Title (M)

The Title metadata tag is used to provide the official or actual title of the web document.  Do not include additional information such as document numbers, series or date of publication unless it is considered to be an integral part of the title.  In the absence of a title the subject may be used.  The information in this metadata tag may contain the same information as the HTML <title> tag, which is still required.  The HTML <title> tag itself can be a concatenation of various metadata tags, such as Agency, SecurityClassification, SecurityCaveat and Title, so that this information will appear on each document screen as well as on each printed output page.  The Title can optionally contain a security marker abbreviation if the title is at a different classification level than the document itself.

Examples of usage:

(The following example illustrates a Title metadata tag for the IAC home page)

<META NAME="Title" CONTENT="Intelligence Assessment Committee Home Page">

(The following example illustrates a Title metadata tag where the title has a security classification that is different from that of the document)

<META NAME="Title" CONTENT="Exploitation of Non-Governmental Organizations by Terrorist Groups (C)">

Author (M)

The Author metadata tag is used to identify either the author, the responsibility centre or the point of contact for the information published.  The contents of this metadata tag should be an official point of contact within each agency, capable of responding to additional questions or be able to contact the actual publication author.  The Author metadata tag can be an actual name, a responsibility centre, a group of names (separated by commas), an identifier or a telephone number.

Examples of usage:

(The following example illustrates an Author metadata tag for an individual)

<META NAME="Author" CONTENT="John Doe">

(The following example illustrates an Author metadata tag for an individual and responsibility centre)

<META NAME="Author" CONTENT="John Doe, RAP">

(The following example illustrates an Author metadata tag for a responsibility centre, including the E-mail address and telephone number)

<META NAME="Author" CONTENT="RAP, JDoe@CSIS, 842-1047">

Keywords (M)

The Keywords metadata tag is used to identify a single keyword or a series of keywords that describe the overall content or main theme of the document.  The keywords may or may not be present in the actual document text.  Keywords can then be used to help retrieve the information and to link similar documents in the resulting document search lists.

Examples of usage:

(The following example illustrates a Keywords metadata tag consisting of several elements)

<META NAME="Keywords" CONTENT="extremists, terrorist, fundraising, covert affiliation">

(The following example illustrates a single Keywords metadata tag)

<META NAME="Keywords" CONTENT="Y2K">

Abstract (M)

The Abstract metadata tag is used to provide a short description or summary of the document it represents.  It is recommended that the contents be restricted to a paragraph in length and be less than 1024 characters in size.  The Abstract should not contain any HTML control characters, such as the quotation mark.  The Abstract contents, as with any other metadata tag, can then be used in the resulting document search lists, to help the analyst decide if this document satisfies their search criteria.  If the Abstract is considered classified and will appear in a search result list, then it is recommended that it be terminated with a security classification marking.

Examples of usage:

(The following example illustrates an Abstract metadata tag consisting of summary information and a security classification marking)

<META NAME="Abstract" CONTENT="(Information effective 31 July 1996) There is a growing cadre of . . . affiliation to such legitimate aid organizations. (S)">

(The following example illustrates a referential Abstract metadata tag)

<META NAME="Abstract" CONTENT="For further information, please refer to the Government Security Policy">

EffectiveDate (M)

The EffectiveDate metadata tag is used to describe the actual publication date of the source document, not the date that the document was published or posted to the web site.  For subsequent search purposes, it is recommended that there be no separation characters between the year, month and day.  The date is specified as: CCYYMMDD.

It is optional to include a time stamp immediately following the date element, specified as HH:MM:SS, where SS and the colon [:] separator are optional.

Examples of usage:

(The following example illustrates an EffectiveDate metadata tag of July 1997 with no day specified)

<META NAME="EffectiveDate" CONTENT="199707">

(The following example illustrates an EffectiveDate metadata tag for January 6, 2000, including a time stamp of 4:15 PM)

<META NAME="EffectiveDate" CONTENT="20000106 16:15">
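
The date format described above can be checked mechanically.  The sketch below is one possible parser and is not part of any Mandrake II tool.  The function name is hypothetical, a space is assumed between the date and the time stamp (as in the example above), and a missing day is assumed to mean the first of the month.

```python
import re
from datetime import datetime

# CCYYMM with optional DD, then an optional time stamp HH:MM[:SS]
# in which the colons may be left out.
_DATE_RE = re.compile(
    r"^(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})?"
    r"(?:\s+(?P<hour>\d{2}):?(?P<minute>\d{2})(?::?(?P<second>\d{2}))?)?$"
)


def parse_mandrake_date(value):
    """Parse a CCYYMMDD value with an optional time stamp into a datetime."""
    match = _DATE_RE.match(value.strip())
    if match is None:
        raise ValueError("not a CCYYMMDD date: {!r}".format(value))
    parts = {key: int(text) for key, text in match.groupdict().items()
             if text is not None}
    # Assumption: a value without a day refers to the first of the month.
    parts.setdefault("day", 1)
    # datetime() raises ValueError for impossible dates such as month 13.
    return datetime(**parts)


print(parse_mandrake_date("199707"))
print(parse_mandrake_date("20000106 16:15"))
```

The same function would apply to any other tag that uses the CCYYMMDD format, provided that non-date values such as "unknown" are handled before calling it.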

ExpiryDate (M)

The ExpiryDate metadata tag is used to describe the actual date when the information is no longer valid.  For compatibility with Intelink, it is recommended that there be no separation characters between the year, month and day.  If there is no known ExpiryDate, then the value of the metadata tag will contain the string unknown.  Although close in meaning, the ExpiryDate is not identical to the metadata tag Expires.  The date is specified as: CCYYMMDD.

It is optional to include a time stamp immediately following the date element, specified as HH:MM:SS, where SS and the colon [:] separator are optional.

Examples of usage:

(The following example illustrates an ExpiryDate metadata tag of September 2003 with no day specified)

<META NAME="ExpiryDate" CONTENT="200309">

(The following example illustrates an ExpiryDate metadata tag for December 15, 2000, including a time stamp of 1 minute before midnight)

<META NAME="ExpiryDate" CONTENT="20001215 23:59">

(The following example illustrates a situation where the ExpiryDate is not known at the time the document is published)

<META NAME="ExpiryDate" CONTENT="unknown">

SecurityClassification (M)

The SecurityClassification metadata tag is used to describe the highest security classification of the published document.  The following table outlines the approved values for SecurityClassification in both official languages.
 
 

SecurityClassification (English)  SecurityClassification (Français)
UNCLASSIFIED                      NON CLASSIFIÉ
CONFIDENTIAL                      CONFIDENTIEL
SECRET                            SECRET
TOP SECRET                        TRÈS SECRET

Examples of usage:

(The following example illustrates a SecurityClassification metadata tag for a document that is classified Top Secret)

<META NAME="SecurityClassification" CONTENT="TOP SECRET">

(The following example illustrates a SecurityClassification metadata tag for a French document that is classified Confidentiel)

<META NAME="SecurityClassification" CONTENT="CONFIDENTIEL">
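
A publishing tool could check a SecurityClassification value against the approved list before a document is posted.  The following Python sketch is illustrative only: the names are hypothetical, and treating the table order as an ascending ranking is an assumption, not something stated in these guidelines.

```python
# English value -> French equivalent, taken from the table above.
CLASSIFICATIONS = {
    "UNCLASSIFIED": "NON CLASSIFIÉ",
    "CONFIDENTIAL": "CONFIDENTIEL",
    "SECRET": "SECRET",
    "TOP SECRET": "TRÈS SECRET",
}

# Assumption: the table order reflects increasing sensitivity.
_LEVELS = list(CLASSIFICATIONS)
_FRENCH_TO_ENGLISH = {fr: en for en, fr in CLASSIFICATIONS.items()}


def classification_level(value):
    """Return 0 to 3 for an approved value in either language."""
    # Normalize case and collapse repeated spaces before the lookup.
    text = " ".join(value.split()).upper()
    english = text if text in CLASSIFICATIONS else _FRENCH_TO_ENGLISH.get(text)
    if english is None:
        raise ValueError("not an approved SecurityClassification: {!r}".format(value))
    return _LEVELS.index(english)


print(classification_level("TOP SECRET"))
print(classification_level("CONFIDENTIEL"))
```

A numeric level makes it straightforward to confirm, for example, that a title marking is not higher than the classification of the document that carries it.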

SecurityCaveat (M)

The SecurityCaveat metadata tag is used to describe the COMINT Channel security control information for the published document, if applicable.  This tag may contain several values, separated by a comma.  The following table outlines the approved values for SecurityCaveat in both official languages.  CEO (or CCS in French) is an acceptable abbreviation for Canadian Eyes Only.
 
 
 

SecurityClassification  SecurityCaveat            SecurityClassification  SecurityCaveat
(English)               (English)                 (Français)              (Français)
UNCLASSIFIED            none                      NON CLASSIFIÉ           none
CONFIDENTIAL            Canadian Eyes Only (CEO)  CONFIDENTIEL            Citoyens canadiens seulement (CCS)

Note:  For the GOL metadata document, several values were omitted from the preceding table to reduce the security classification of this document.

Examples of usage:

(The following example illustrates a SecurityCaveat metadata tag for a document that is classified Top Secret, with no security control)

<META NAME="SecurityClassification" CONTENT="TOP SECRET">
<META NAME="SecurityCaveat" CONTENT="none">

(The following example illustrates a SecurityCaveat metadata tag for a French document that is classified Confidentiel, with a security control of Citoyens canadiens seulement)

<META NAME="SecurityClassification" CONTENT="CONFIDENTIEL">
<META NAME="SecurityCaveat" CONTENT="Citoyens canadiens seulement">

SecurityCompartment (F)

The SecurityCompartment metadata tag is reserved for future use pending security policies allowing the storage of codeword information within Mandrake II and the ability to restrict access using a need-to-know mechanism.

Example of usage:

(The following example illustrates a SecurityCompartment metadata tag for a document that contains XXX codeword material)

<META NAME="SecurityCompartment" CONTENT="XXX">

Classification (O)

The Classification metadata tag is a means of specifying a category or sub-category in which documents with a similar relationship can be grouped, regardless of their physical location on the server.  Classification can also indicate a hierarchical relationship, so that a search can be restricted to documents from one category and the documents located can then be browsed sequentially.  The primary category is the value of the first referenced classification element, with related sub-categories separated by the colon [:].  A document may be indexed in more than one category or sub-category, provided that the Classification values are separated by the semi-colon [;] in the content field.  Category (Classification) is a function provided within the Advanced Search feature of the Netscape Compass server.

Examples of usage:

(The following hierarchical structure illustrates a Classification metadata tag that defines a search architecture category)

MANDRAKE II Web Site Structure

CSIS
    Reports
        National Security
        Public Safety
    Studies
    Current Intelligence Briefs
    Commentaries
    Other Publications

<META NAME="Classification" CONTENT="Reports:National Security">
<META NAME="Classification" CONTENT="Reports:Public Safety">
<META NAME="Classification" CONTENT="What's New">
 

DFAIT
    New Products
    Interview Reports
    . . .

<META NAME="Classification" CONTENT="New Products">

Note:  For GOL document, several examples were omitted to reduce the security classification of this document
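
The separator rules described for Classification (semi-colon between categories, colon between a category and its sub-categories) can be sketched in Python as follows.  The function name is hypothetical, and the combined value in the usage line is a made-up illustration built from the category names above.

```python
def parse_classification(content):
    """Split a Classification CONTENT value into category paths.

    Each path is a list: the primary category first, then sub-categories.
    """
    paths = []
    for entry in content.split(";"):
        # Drop empty levels so stray separators do not create blank names.
        levels = [level.strip() for level in entry.split(":") if level.strip()]
        if levels:
            paths.append(levels)
    return paths


print(parse_classification("Reports:National Security; Reports:Public Safety"))
print(parse_classification("New Products"))
```

Each returned path can be matched against the site structure one level at a time, which is one way a search could be limited to a single category.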

ApprovalAuthority (O)

The ApprovalAuthority metadata tag is used to indicate the name of the individual or responsibility area that approved the publication of this document.  The web publishing package may require that this metadata tag be authenticated prior to allowing the document to be published onto the web server.

Example of usage:

(The following example illustrates the use of the ApprovalAuthority metadata tag for a document that was authorized for publication by Jane Bain)

<META NAME="ApprovalAuthority" CONTENT="Jane Bain">

PID (O)

The PID (Product IDentification code) metadata tag is used to indicate the respective product series of an originating document, in a similar manner to a keyword in a search query.  The PID metadata tag may be used either to include or exclude a series of documents from a search operation.  In the example (Russia* AND Yeltsin) AND (*PCOcurrentevents) the search command will locate all documents that contain the words Russia* and Yeltsin within the PCO Current Events collection.  The format of the PID metadata tag is XXXyyyyyy, where XXX is one of the acronyms CSE, CSIS, DFAIT, DND, PCO or the French equivalents CST, SCRS, MAECI, MDN, BCP, followed by the product series, usually the product source in small letters with no embedded spaces.

Example of usage:

(The following example illustrates the use of the PID metadata tag for a document that originated from the Korean Herald on the PCO web server)

<META NAME="PID" CONTENT="PCOkoreanherald">
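
The XXXyyyyyy layout lends itself to a simple check.  The Python sketch below splits a PID into its agency acronym and product series; the function name is hypothetical, and rejecting any series that contains capital letters or spaces is a stricter reading than the "usually" in the description above.

```python
# Approved acronyms listed in the PID description (English and French).
AGENCY_ACRONYMS = ("CSE", "CSIS", "DFAIT", "DND", "PCO",
                   "CST", "SCRS", "MAECI", "MDN", "BCP")


def split_pid(pid):
    """Return (acronym, series) for a PID such as 'PCOkoreanherald'."""
    # Try longer acronyms first so a short one cannot shadow a long one.
    for acronym in sorted(AGENCY_ACRONYMS, key=len, reverse=True):
        if pid.startswith(acronym):
            series = pid[len(acronym):]
            # Assumption: the series is non-empty, has no capital letters
            # and contains no embedded spaces.
            has_space = any(char.isspace() for char in series)
            if series and series == series.lower() and not has_space:
                return acronym, series
    raise ValueError("not a valid PID: {!r}".format(pid))


print(split_pid("PCOkoreanherald"))
```

The acronym returned could also be compared with the Agency tag of the same document as a consistency check.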

Language (O)

The Language metadata tag is specified as a paired value to indicate the natural language and dialect of the published document.  The recommended values are English or EN, Français or FR, and Bilingual or BI, paired with a hyphenated two-letter country code, such as CA to represent the Canadian dialect.  This format also provides a mechanism to identify documents stored in another language, with an appropriate Language metadata tag.

Examples of usage:

(The following example illustrates the use of the Language metadata tag for an English document)

<META NAME="Language" CONTENT="EN-CA">

(The following example illustrates the use of the Language metadata tag for a bilingual document)

<META NAME="Language" CONTENT="Bilingual-CA">

Last-Modified (A)

The Last-Modified value is automatically generated by the Netscape web server whenever a document is added or changed on the web server.  The stored value for Last-Modified is not generated as a <META> tag but rather assumes the date format and value as defined by the web server's Network Operating System.  It is recommended that this field be defined on the server in a CCYY/MM/DD format.  The field Last-Modified is used as a search argument in an advanced search statement pull-down box.  For example, the expression "[the Last-Modified] [is before] 2001" will select all documents on the web server that were last modified in the year 2000 or before.  If a Last-Modified metadata tag (<META NAME="Last-Modified" CONTENT="unknown">) is specified within the document, it will override the automatic generation of this field; specifying it is therefore not recommended.

Description (A)

The Description value is automatically generated by the Netscape web server.  The stored value for Description is not generated as a <META> tag but rather assumes a value taken from the first 20-30 words encountered after the document's <BODY> tag.  The field Description is used as a search argument in an advanced search statement pull-down box.  For example, the expression "[the Description] [begins with] UNCLASSIFIED" will select all documents on the web server that start with the value UNCLASSIFIED.  If a Description metadata tag (<META NAME="Description" CONTENT="blah blah blah">) is specified within the document, it will override the automatic generation of this field; specifying one is therefore not recommended.
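To preview roughly what the automatically generated Description might contain, the "first 20-30 words after the <BODY> tag" behaviour can be approximated as below.  This is only a sketch: the server's actual extraction rules are not documented here, and the function name approximate_description and the default of 25 words are assumptions.

```python
import re


def approximate_description(html: str, max_words: int = 25) -> str:
    """Approximate an automatic Description from an HTML document.

    Assumption: the text after the opening <BODY> tag is taken, markup
    is stripped, and the first max_words words are kept. The real
    server may apply different rules.
    """
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    body = html[match.end():] if match else html
    text = re.sub(r"<[^>]+>", " ", body)   # replace tags with spaces
    words = text.split()                    # collapses runs of whitespace
    return " ".join(words[:max_words])


sample = ('<HTML><HEAD><TITLE>Report</TITLE></HEAD>'
          '<BODY BGCOLOR="white"><P>UNCLASSIFIED</P> Weekly summary of events'
          '</BODY></HTML>')
print(approximate_description(sample, max_words=3))
# prints: UNCLASSIFIED Weekly summary
```

A preview of this kind shows why placing the classification marking first in the document body makes an expression such as "[the Description] [begins with] UNCLASSIFIED" effective.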

DocumentID (O or A)

The DocumentID metadata tag may be automatically generated by the web publishing application program to maintain a reference with an interim document publishing database.  This field may be used to retrieve or delete the web document and generally is not modified in order to maintain integrity.

Example of usage:

(The following example illustrates the automatic generation of the DocumentID field on the CSIS web server, with an assigned value of 555)

<META NAME="DocumentID" CONTENT="555">

PublishDate (O or A)

The PublishDate metadata tag may be automatically generated by the web publishing application program or may be entered by the person who is submitting the document for publication on the web server.  This PublishDate metadata tag may be used to retrieve or delete the web document, or be displayed on the published web document.  It should assume the date format and value as defined by the web server's Network Operating System.  It is recommended that this field be defined on the server in a CCYY/MM/DD format.

Example of usage:

(The following example illustrates the automatic generation of the PublishDate metadata tag, with an assigned value of 2000/02/29)

<META NAME="PublishDate" CONTENT="2000/02/29">

URL (A)

The value generated for the URL element is created automatically by the Netscape web server whenever a document is added or manipulated on the web server.  This is also the value used to retrieve a web document in the Advanced Search feature of Compass and can be used to display the actual web URL value in the search results list.  For example, the Advanced Search expression "[the URL] [contains] /fr/menu" will select all documents on the web server whose URL contains the partial string "/fr/menu".  This feature may be a useful tool to inspect the contents of a web server from a client browser.  If a URL metadata tag (<META NAME="URL" CONTENT="unknown">) is specified within the document, it will override the automatic generation of this field; specifying one is therefore not recommended.

Expires (O)

The Expires HTTP equivalent metadata tag is used to indicate to the Netscape Compass server that the document's index entry should be removed from the search index on the web server.  This HTTP equivalent metadata tag by itself does not cause the actual web document to be deleted from where it resides.  When used in conjunction with a web publishing product, the Expires HTTP equivalent metadata tag will ensure that both the search index entry and the actual document are removed from the web server simultaneously.  The Expires HTTP equivalent metadata tag should assume the date format and value as defined by the web server's Network Operating System.  It is recommended that this field be defined on the server in a CCYY/MM/DD format.

Care should be exercised in using this HTTP equivalent metadata tag.  Once the expiration date that was specified has passed, the document will be removed from the index the next time the administrator purges expired documents. If a document is manually removed from a server before the document's expiration date, then the server will not automatically remove the document from its index, resulting in a 404 Not Found message if the browser tries to view the missing document.

Example of usage:

(The following example illustrates the use of the Expires HTTP-EQUIV metadata tag, with an assigned value of 2004/06/01)

<META HTTP-EQUIV="Expires" CONTENT="2004/06/01">  and/or
<META NAME="Expires" CONTENT="2004/06/01">
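The expiry check an administrator's purge might perform can be sketched as follows, assuming the recommended CCYY/MM/DD format.  The function names and the dictionary standing in for the search index are hypothetical; this is not the Compass server's own purge logic.

```python
from datetime import date, datetime


def parse_ccyymmdd(value: str) -> date:
    """Parse a date written in the recommended CCYY/MM/DD format."""
    return datetime.strptime(value, "%Y/%m/%d").date()


def is_expired(expires_value: str, today: date) -> bool:
    """True once the specified expiration date has passed."""
    return parse_ccyymmdd(expires_value) < today


def purge_candidates(index: dict, today: date) -> list:
    """Return the URLs whose Expires value has passed.

    index is a hypothetical stand-in for the search index: it maps each
    document URL to the CONTENT of its Expires tag.
    """
    return sorted(url for url, expires in index.items()
                  if is_expired(expires, today))


index = {"/en/report1.htm": "2004/06/01", "/en/report2.htm": "2005/01/15"}
print(purge_candidates(index, date(2004, 12, 31)))
# prints: ['/en/report1.htm']
```

Under this reading, a document is treated as expired only on the day after its Expires date, which matches the wording "once the expiration date that was specified has passed".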

Robots (O)

The Robots metadata tag may be used to specifically exclude this document, or a range of subordinate documents, from the search engine's index.  This is useful for excluding navigational, index, menu, frame and frameset HTML documents.  The Robots metadata tag may also be used to exclude duplicate HTML documents from a site, as well as MIME types that cannot be handled by either a plugin or helper application.

Examples of usage:

(The following example illustrates the use of Robots metadata tag to exclude a single HTML document from the search engine's index)

<META NAME="Robots" CONTENT="noindex">

(The following example illustrates the use of Robots metadata tag to exclude all HTML documents, subordinate to this location {folder / sub-directory} from being indexed by the search engine)

<META NAME="Robots" CONTENT="noindex, nofollow">
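How an indexer might read these directives can be sketched with the Python standard library's HTML parser.  The class and function names are hypothetical, and the sketch only illustrates reading the tag; it does not reproduce the search engine's own crawling behaviour.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in any Robots <META> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for item in attributes.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of Robots directives declared in the document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def should_index(html: str) -> bool:
    """A document is indexed unless it declares noindex."""
    return "noindex" not in robots_directives(html)


page = '<HTML><HEAD><META NAME="Robots" CONTENT="noindex, nofollow"></HEAD></HTML>'
print(should_index(page))
# prints: False
```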

8.0 Conclusion and Recommendations

Metadata tags continue to provide an effective and industry-recognized way to capture additional information in a web environment.  The subset of metadata tags will provide information that ensures consistency in displaying web documents within the Mandrake extranet.  In addition to satisfying the needs of the Canadian Intelligence community, all of the "required" Intelink metadata tags have an equivalent field, for possible future compatibility.  Adoption of a standard metadata implementation strategy will allow Departmental search index collections to be combined and allow the client to search one, many or all web sites simultaneously, using the common metadata field descriptions.

It is recommended that the other Agencies and Departments involved with Mandrake II review this paper and consider adopting the identified metadata tags whenever possible.  This will improve searching on the site, regardless of the technology used in conjunction with the search engine.

Cameron D. Price
Mandrake II Project Manager
CSIS   842-1047      PriceC@smtp.gc.ca

Appendix A - Mandrake II MetaSet Examples

The following examples illustrate the functionality of the MetaSet application, developed at CSIS and written in ColdFusion.  At CSIS, MetaSet has been integrated into the existing web document submission and publication process.  The MetaSet application has been adopted by several other Mandrake II web content management environments.

Figure 1:  MetaSet Application Login Screen


 

Figure 2:  MetaSet Application Screen


 

Figure 3:  View Sample Document submit.htm Screen


 

Figure 4:  View Document Source submit.htm Screen (note existing metadata)


 

Figure 5:  MetaSet Application Data Entry Screen


 

Figure 6:  View Document Source submit.htm Screen (after MetaSet)

 

Appendix B - Java Metadata Creation Tool

The following example is a simple mechanism by which the HTML metadata element tags can be generated, then cut and pasted into the Agency's web document.

Government of Canada

Web Metadata Tag Builder
This web tag builder provides a subset of the full GILS record.
More information can be found at the GILS web site.

Title:

Originator:

Author: (Separate multiple authors with a semi-colon)
Date of Publication: (yyyymmdd)
Language of Resource: (ISO 3 char. code.) (Repeat for multilingual resources.)
Description:

Keywords: (Separate multiple keywords with a semi-colon)
Language of Record: (Language of web page.)
Modification Date:
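
The tag-building step behind a form of this kind can be sketched as follows.  This is not the code of the tool itself, which runs as client-side JavaScript; the function name build_meta_tags, the field names and the sample values are all placeholders chosen for illustration.

```python
from html import escape


def build_meta_tags(fields: dict) -> str:
    """Turn form fields into <META> elements, one per line.

    Hypothetical helper: list values are joined with semi-colons, as
    the form asks for multiple authors and keywords, and empty fields
    are skipped.
    """
    tags = []
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        value = str(value).strip()
        if not value:
            continue
        # escape() protects quotes and angle brackets inside attributes.
        tags.append(f'<META NAME="{escape(name)}" CONTENT="{escape(value)}">')
    return "\n".join(tags)


print(build_meta_tags({
    "Title": "Sample Document",            # placeholder values
    "Author": ["A. Smith", "B. Jones"],
    "Keywords": "",
}))
# prints:
# <META NAME="Title" CONTENT="Sample Document">
# <META NAME="Author" CONTENT="A. Smith; B. Jones">
```

The generated lines can then be pasted into the HEAD section of the web document.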



This resource uses client-side JavaScript and is functional offline.

UNCLASSIFIED