MCP: The Metadata Collection Parser

The Metadata Collection Parser is a Perl script that takes a collection of GIS metadata, possibly spread across many directories and subdirectories, and creates a nifty hypertext catalog with several cross-referenced indexes. It was written by Paul Cote, GIS Specialist at the Harvard Graduate School of Design.

For an example of the metadata catalog MCP produces, click here.

MCP was created to make it easy to catalog and recatalog geographic data, even when the data are rearranged into different directory structures or copied onto CD. One big advantage of the resulting catalogs is that, although they contain many cross-referenced indexes, they are nothing but hypertext: they can be burned onto a CD and browsed with any web browser.

Most of what appears in the catalog comes from two sources: the mp-compliant metadata file associated with each dataset in the collection, and the index.htm files, which are HTML-format readme files explaining the contents of each subdirectory. MP does a lot of the work for MCP, checking the formatting of the metadata files and generating HTML reports.
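The walk through the collection is not described in detail here, so the following Python sketch only illustrates the idea of gathering, for each subdirectory, its index.htm readme and its metadata files. MCP itself is written in Perl; the function names and the `.met` suffix used to recognize metadata files are hypothetical assumptions, not MCP's actual rules.

```python
import os
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional


@dataclass
class DirectoryEntry:
    """What a cataloger might record about one subdirectory (hypothetical)."""
    readme: Optional[str] = None  # the directory's index.htm, if present
    metadata: List[str] = field(default_factory=list)  # mp-compliant files


def list_collection(root: str) -> List[str]:
    """Return every file under root as a '/'-separated path relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return found


def group_by_directory(paths: Iterable[str],
                       md_suffix: str = ".met",  # ASSUMED metadata suffix
                       readme_name: str = "index.htm") -> Dict[str, DirectoryEntry]:
    """Group relative file paths by directory, picking out readmes and metadata."""
    catalog: Dict[str, DirectoryEntry] = {}
    for p in sorted(paths):
        path = PurePosixPath(p)
        entry = catalog.setdefault(str(path.parent), DirectoryEntry())
        name = path.name.lower()
        if name == readme_name:
            entry.readme = p
        elif name.endswith(md_suffix):
            entry.metadata.append(p)
        # Other files (shapefiles, grids, ...) are data; their directory
        # still gets an entry, but they are not catalog input themselves.
    return catalog
```

For example, `group_by_directory(list_collection("/full/path/to/sample_md_tree"))` would map each subdirectory to its readme and metadata files. Separating the filesystem walk from the grouping keeps the grouping logic easy to test on a plain list of paths.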

The cataloger parses each mp-compliant file and then creates the following:

While doing this, the cataloger calls on MP to create comprehensive HTML-format metadata for each file, along with a listing of any errors MP found in the metadata.
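A cataloger can drive MP as an external program, one run per metadata file. The Python sketch below (MCP itself is Perl) builds and runs such a command. The `-h` (HTML output) and `-e` (error listing) options reflect common mp usage, but treat them as an assumption and check the documentation for your version of mp; the `.html` and `.err` output file names are hypothetical.

```python
import os
import subprocess
from typing import List


def mp_command(mp_path: str, md_file: str) -> List[str]:
    """Build an mp command line that writes an HTML report and an error
    listing next to md_file. Flags and output names are ASSUMPTIONS."""
    base, _ext = os.path.splitext(md_file)
    return [mp_path, md_file, "-h", base + ".html", "-e", base + ".err"]


def run_mp(mp_path: str, md_file: str) -> str:
    """Run mp on one metadata file and return the text of its error listing."""
    cmd = mp_command(mp_path, md_file)
    # The exit status is ignored here; the error listing is what gets reported.
    subprocess.run(cmd, check=False)
    err_file = cmd[-1]
    if not os.path.exists(err_file):
        return ""
    with open(err_file, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Keeping the command construction in its own function makes it easy to adjust the flags if your mp expects something different.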

Architecture and Dependencies

There are a few things you need to know, and a few resources you need to have, before you can make use of MCP:

Downloading and learning MCP

You can download a tar-gzipped distribution of MCP by clicking here. This distribution was prototyped on Unix and may need some alterations to run on other platforms. It unpacks as a directory named mcp, which contains a file list that you will probably want to read.

It is easy to start learning MCP by running it against the sample metadata collection in the sample_md_tree directory. Before running catalog.pl on the sample data, there are three things you will undoubtedly need to change to make it work on your system:

  1. Change the Perl path on the first line of catalog.pl.
  2. In sample.conf, change $refdir to the full system path of the sample_md_tree directory.
  3. In sample.conf, change $mp_path to the system path of your version of mp.
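Because these three edits are easy to get slightly wrong, a small pre-flight check can catch a bad path before catalog.pl runs. The Python sketch below is hypothetical and not part of the MCP distribution; it assumes sample.conf sets $refdir and $mp_path with simple Perl-style assignments such as `$refdir = "/full/path/to/sample_md_tree";`, which may not match the real file exactly.

```python
import os
import re
from typing import Dict, List

# ASSUMED format: Perl-style quoted scalar assignments, e.g.
#   $refdir = "/full/path/to/sample_md_tree";
ASSIGN = re.compile(r"""^\s*(?:my\s+)?\$(\w+)\s*=\s*(["'])(.*?)\2\s*;""")


def read_conf(text: str) -> Dict[str, str]:
    """Collect quoted scalar assignments from a Perl-style config file."""
    settings = {}
    for line in text.splitlines():
        m = ASSIGN.match(line)
        if m:
            settings[m.group(1)] = m.group(3)
    return settings


def preflight(first_line: str, conf_text: str) -> List[str]:
    """Check the three edits: Perl path, $refdir, and $mp_path."""
    problems = []
    # 1. The first line of catalog.pl should be "#!" plus an existing interpreter.
    parts = first_line[2:].split() if first_line.startswith("#!") else []
    if not parts or not os.path.isfile(parts[0]):
        problems.append("first line of catalog.pl does not name an existing Perl")
    settings = read_conf(conf_text)
    # 2. $refdir must be the full (absolute) path of an existing directory.
    refdir = settings.get("refdir", "")
    if not os.path.isabs(refdir) or not os.path.isdir(refdir):
        problems.append("$refdir is missing, not absolute, or not a directory")
    # 3. $mp_path must point at an existing file (your copy of mp).
    if not os.path.isfile(settings.get("mp_path", "")):
        problems.append("$mp_path is missing or not a file")
    return problems
```

An empty list from `preflight` means all three settings at least point at something that exists; it does not prove the Perl or mp found there will work.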

Before you run catalog.pl for the first time, look through the sample_md_tree directory to see what it contains. catalog.pl will create a number of other files, and you will understand the process better if you have seen the 'before' picture. If you are reading this too late, you can always unpack a fresh sample_md_tree from your tgz archive.
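One way to compare the 'before' and 'after' pictures directly is to snapshot the file list, run catalog.pl, and take the difference. This Python sketch is an illustration, not part of MCP; it reports only files that did not exist before the run, so any files catalog.pl rewrites in place would need a modification-time comparison instead.

```python
import os
import subprocess
from typing import List, Sequence, Set


def snapshot(root: str) -> Set[str]:
    """Return the set of file paths under root, relative to root."""
    files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.add(os.path.relpath(os.path.join(dirpath, name), root))
    return files


def added_files(before: Set[str], after: Set[str]) -> List[str]:
    """Files present after the run but not before, in sorted order."""
    return sorted(after - before)


def files_created_by(cmd: Sequence[str], root: str) -> List[str]:
    """Run cmd (e.g. catalog.pl with its config file) and list the new files."""
    before = snapshot(root)
    subprocess.run(list(cmd), check=True)
    return added_files(before, snapshot(root))
```

For example, `files_created_by(["./catalog.pl", "sample.conf"], "/full/path/to/sample_md_tree")` would list everything the cataloger added to the sample tree.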

You should now be able to run catalog.pl, passing the name of the configuration file as its only argument:

catalog.pl sample.conf

Now that you have seen what it does, I leave it as an exercise for you to read sample.conf and the various readme.htm and index.htm files in the sample directory to work out their roles. You should then be able to run catalog.pl on your own directories.

A couple of related, possibly useful, but not well-documented ArcView extensions: