Grace Agnew, Rutgers University Libraries (and NJ Digital Highway)
She has been in the metadata game since it started, and wrote the first book on the subject. She's been involved in OAI, etc.
Metadata: describes data
Where does it come from? It can be added by hand, generated automatically, or harvested automatically.
Audience: Almost everyone – end user, metadata creator/manager, computer applications or programs.
For example, the NJDH can ask contributing organizations whether they want an NJDH watermark on material that is printed out. The watermark is not part of the object itself, and the organization hosting the document chooses whether to use it.
Metadata can be embedded in the document, linked to the document, or handled as some kind of hybrid (not unlike CSS). Putting the information in the header can be bulky and can become inaccurate; if you link it instead, you can point to a database where it is kept up to date.
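To make the embedded-vs-linked trade-off concrete, here is a minimal Python sketch. Everything in it (the `METADATA_STORE` dict, the record ID, the field names, the sample title) is a hypothetical illustration, not how the NJDH actually stores records. It shows that a change made once in a central store is seen through a link, while an embedded copy goes stale.

```python
# Hypothetical sketch: embedded vs. linked metadata.
# METADATA_STORE, "rec-001", and all field values are illustrative assumptions.

# Central store that linked documents point to.
METADATA_STORE = {
    "rec-001": {"title": "Harbor at Dawn", "rights": "Public domain"},
}

# Embedded: a copy of the metadata travels inside the document itself.
embedded_doc = {
    "content": b"...image bytes...",
    "metadata": {"title": "Harbor at Dawn", "rights": "Public domain"},
}

# Linked: the document carries only a pointer to the central record.
linked_doc = {"content": b"...image bytes...", "metadata_ref": "rec-001"}


def get_metadata(doc: dict) -> dict:
    """Return embedded metadata, or resolve the link against the central store."""
    if "metadata" in doc:
        return doc["metadata"]
    return METADATA_STORE[doc["metadata_ref"]]


# A rights change made once in the store is visible through the link,
# while the embedded copy keeps the old value.
METADATA_STORE["rec-001"]["rights"] = "CC BY 4.0"
print(get_metadata(linked_doc)["rights"])    # CC BY 4.0
print(get_metadata(embedded_doc)["rights"])  # Public domain
```

A hybrid would embed a few stable fields and link the rest.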
There are commercial and open source models. Commercial software costs money, obviously, while open source software is not really monitored or standardized; because it is open source, though, it is flexible.
DSpace is an open source turnkey model. Because it follows a business model, it is not as flexible as one might expect from open source.
Fedora – developed by Cornell and UVA – powerful, flexible, combines objects, metadata and behaviors in a modular fashion.
-- Supports the information context of data users (current and future) – that is, context independence.
Agnew is working with the Network for Earthquake Engineering Simulation (NEES). They didn't want to start with a data model because they assumed they already had one. She demurred, demonstrating that their data model was not flexible. The contextualized model NEES started with was systems oriented: it started from the systems. Her contextualized model puts the data at the center, surrounded by the various user communities.
Moves into IFLA models – structure of information
|Novel||Paper||Copy at Blockbuster|
|Script||Reel of film|
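The IFLA FRBR model arranges information as Work → Expression → Manifestation → Item. The table above compresses a novel/film example whose exact mapping onto those levels isn't recoverable from the notes. The Python sketch below therefore uses its own hypothetical example: one work, two expressions, and the copies under them. Note that the FRBR report generally treats an adaptation into another medium, such as a film, as a new related work, so the sketch keeps the film out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:            # one exemplar: a specific copy on a shelf or server
    label: str


@dataclass
class Manifestation:   # one physical embodiment, e.g. a particular edition
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:      # one realization of the work, e.g. a text or a translation
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:            # the abstract intellectual creation
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def items(self) -> List[str]:
        """Walk Work -> Expression -> Manifestation -> Item and list every copy."""
        return [
            item.label
            for expr in self.expressions
            for man in expr.manifestations
            for item in man.items
        ]


# Hypothetical example data.
novel = Work("Example Novel", [
    Expression("Original English text", [
        Manifestation("Paperback edition", [Item("Library copy 1"), Item("Library copy 2")]),
    ]),
    Expression("French translation", [
        Manifestation("Hardcover edition", [Item("Library copy 3")]),
    ]),
])

print(novel.items())  # ['Library copy 1', 'Library copy 2', 'Library copy 3']
```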
Shareable across repositories
Can be mapped to other schemas (re-purposed)
Maintained by standards body for durability
“Namespace” used to document the XML schema that defines and validates the metadata schema.
Versions distinguished by number and/or date
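The namespace and version points show up as soon as a record is parsed. This minimal Python sketch uses the standard library's `xml.etree.ElementTree` and the real MODS namespace URI (`http://www.loc.gov/mods/v3`). The record content and the `version` value are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The namespace URI ties each element to the schema that defines it;
# the version attribute records which release of that schema was used.
MODS_NS = "http://www.loc.gov/mods/v3"

record = f"""
<mods xmlns="{MODS_NS}" version="3.3">
  <titleInfo><title>Harbor at Dawn</title></titleInfo>
</mods>
"""

root = ET.fromstring(record.strip())
ns = {"mods": MODS_NS}  # prefix -> URI map used for namespaced lookups

# ElementTree expands namespaced names to the {uri}localname form.
print(root.tag)             # {http://www.loc.gov/mods/v3}mods
print(root.get("version"))  # 3.3
print(root.find("mods:titleInfo/mods:title", ns).text)  # Harbor at Dawn
```

A title element from a different namespace would not match the `mods:` lookup, even with the same local name. That is how the namespace keeps schemas apart.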
She likes to start with the data model and then use whatever schema fits it.
Metadata schema components
Data element – community defined
Attribute – refines, extends, interprets data element
Value – information unique to each data instance
Constraint – order imposed on elements
Label – contextual instance of data element name
XML – describes data (not unlike metadata itself!)
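The five schema components can be seen together in one small XML record. In this hypothetical Python sketch, the element names, the labels, the attribute values, and the ordering rule are illustrative assumptions, not taken from any particular schema.

```python
import xml.etree.ElementTree as ET

# Constraint: the order the (hypothetical) schema imposes on elements.
REQUIRED_ORDER = ["title", "creator", "date"]

# Label: the display name one community uses for each data element.
LABELS = {"title": "Title", "creator": "Photographer", "date": "Date taken"}

record = ET.fromstring(
    "<record>"
    "<title>Harbor at Dawn</title>"                    # data element + value
    "<creator role='photographer'>J. Smith</creator>"  # attribute refines the element
    "<date encoding='iso8601'>1948-06-01</date>"       # attribute interprets the value
    "</record>"
)


def follows_order(rec: ET.Element, order: list) -> bool:
    """Check that child elements appear exactly in the order the schema requires."""
    return [child.tag for child in rec] == order


def display(rec: ET.Element) -> list:
    """Render each element's value under its community-specific label."""
    return [f"{LABELS[child.tag]}: {child.text}" for child in rec]


print(follows_order(record, REQUIRED_ORDER))  # True
print(display(record))
```

A different community could reuse the same record with its own `LABELS` map (e.g. "Creator" instead of "Photographer") without touching the data.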
As I said, she uses any schema that seems interesting, but beginners should start with one and stick with it.
Types of metadata
Structural metadata – structured relationships between components; allows a user to browse by chapter, etc.
Meta metadata – describes and manages the metadata record – who created it, when, for what reason
Administrative metadata – official records, rights of access, etc.
Digital provenance -- change in version, audit trail, etc.
Technical metadata – file size, duration, encoding, etc.
Descriptive – supports the user tasks find, identify, select, and obtain; all four are necessary.
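One way to keep these types straight is to give each its own slot in a record. This is a hypothetical Python sketch, not a real schema. The field names and sample values are assumptions, and real systems usually spread these types across separate schemas (for example, MODS for descriptive metadata and PREMIS for provenance).

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    descriptive: dict = field(default_factory=dict)     # find, identify, select, obtain
    structural: dict = field(default_factory=dict)      # how the components relate
    administrative: dict = field(default_factory=dict)  # rights, access
    technical: dict = field(default_factory=dict)       # file size, duration, encoding
    provenance: list = field(default_factory=list)      # audit trail of changes
    meta: dict = field(default_factory=dict)            # who made this record, when, why

    def log_change(self, who: str, what: str) -> None:
        """Append an entry to the digital provenance audit trail."""
        self.provenance.append({"who": who, "what": what})


# Hypothetical example values.
rec = MetadataRecord(
    descriptive={"title": "Oral history interview", "subject": "Shipbuilding"},
    structural={"parts": ["Side A", "Side B"]},
    administrative={"rights": "Educational use only"},
    technical={"format": "audio/wav", "duration_s": 1820},
    meta={"created_by": "cataloger", "created": "2004-03-15"},
)
rec.log_change("cataloger", "migrated from MARC")
print(rec.provenance)  # [{'who': 'cataloger', 'what': 'migrated from MARC'}]
```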
File encoding and Transport
METS -- Metadata Encoding and Transmission Standard -- maintained by LOC -- used by NJDH
File section, structural map, structural links, behaviors
METS is fairly new and the NJDH is one of the first full implementations.
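A stripped-down METS document shows how the file section and structural map connect. This Python sketch uses real METS element names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) and the METS and XLink namespace URIs. The IDs, the file URL, and the labels are hypothetical, and a real NJDH record would also carry descriptive and administrative sections.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified tag name in ElementTree's {uri}local form."""
    return f"{{{ns}}}{tag}"


mets = ET.Element(q(METS, "mets"))

# File section: the files that make up the digital object.
file_sec = ET.SubElement(mets, q(METS, "fileSec"))
grp = ET.SubElement(file_sec, q(METS, "fileGrp"), USE="master")
file_el = ET.SubElement(grp, q(METS, "file"), ID="FILE1", MIMETYPE="image/tiff")
ET.SubElement(file_el, q(METS, "FLocat"),
              {"LOCTYPE": "URL", q(XLINK, "href"): "http://example.org/page1.tif"})

# Structural map: the object's structure, pointing back to files by ID.
struct_map = ET.SubElement(mets, q(METS, "structMap"))
book = ET.SubElement(struct_map, q(METS, "div"), TYPE="book", LABEL="Example volume")
page = ET.SubElement(book, q(METS, "div"), TYPE="page", ORDER="1")
ET.SubElement(page, q(METS, "fptr"), FILEID="FILE1")

print(ET.tostring(mets, encoding="unicode"))
```

The `fptr`/`FILEID` pointer is what lets a viewer turn "page 1 of this book" into an actual file to display.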
See the diagram on p. 10 of the first handout for the NJDH data model.
Initial goals for Metadata --
Enable discovery of and access to information
Preserve information for discovery and access by future users
End of first part of presentation
She describes learning about metadata as an iterative process: it might not make a lot of sense now, but sooner or later the penny will drop. This is much like what we discussed early in the HIB class.
Descriptive metadata Schema examples:
IEEE Learning Object Metadata
MODS (LOC) – NJDH uses it. She says there are not any great schema out there, but this is as good as it comes.
MPEG-7 Multimedia Content Description Interface
She is pretty down on Dublin Core (DC). She said you'd never want to submit a proposal saying you'd be using DC; instead, say that the schema you'd be using is expressible in DC. DC offers good portability but lacks flexibility.
MODS is pretty flexible – it's really built from a base of MARC (so a librarian likes it – duuuuhhh!!).
MPEG-7 is new and not many people really use it; she claims to be one of its only real proponents and suggests that it is finally gaining traction. It describes textual and color attributes. She says the XML is horrible: it took her a year to learn, four months of actually getting down to brass tacks preceded by eight months of trying to find anything else to do with her time!
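"Expressible in DC" means a richer record can be crosswalked down to Dublin Core. Here is a simplified Python sketch of a MODS-to-DC crosswalk, using the real MODS and DC element-set namespace URIs. Only four mappings are shown and the sample record is hypothetical; the Library of Congress's published MODS-to-DC mapping covers far more and treats names by role, so this is not that mapping. The one-way flattening is the portability-vs-flexibility trade-off in miniature.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}
DC_NS = "http://purl.org/dc/elements/1.1/"

# Simplified mapping: MODS path -> DC element name.
CROSSWALK = {
    "mods:titleInfo/mods:title": "title",
    "mods:name/mods:namePart": "creator",
    "mods:originInfo/mods:dateIssued": "date",
    "mods:subject/mods:topic": "subject",
}

# Hypothetical sample record.
mods_record = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Harbor at Dawn</title></titleInfo>"
    "<name><namePart>Smith, J.</namePart></name>"
    "<originInfo><dateIssued>1948</dateIssued></originInfo>"
    "<subject><topic>Harbors</topic><topic>Ships</topic></subject>"
    "</mods>"
)


def mods_to_dc(record: ET.Element) -> ET.Element:
    """Copy each mapped MODS value into a flat DC element; nesting detail is lost."""
    ET.register_namespace("dc", DC_NS)
    dc = ET.Element("record")
    for path, dc_name in CROSSWALK.items():
        for src in record.findall(path, MODS):
            ET.SubElement(dc, f"{{{DC_NS}}}{dc_name}").text = src.text
    return dc


dc_record = mods_to_dc(mods_record)
pairs = [(el.tag.split("}")[1], el.text) for el in dc_record]
print(pairs)
# [('title', 'Harbor at Dawn'), ('creator', 'Smith, J.'), ('date', '1948'),
#  ('subject', 'Harbors'), ('subject', 'Ships')]
```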
Vocabularies must be controlled: don't let students come up with tags of their own, because it won't come out well.
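Here is one way to enforce a controlled vocabulary when tags arrive as free text. In this hypothetical Python sketch, the preferred terms and variant spellings are illustrative assumptions, not drawn from any real thesaurus. It maps known variants to the preferred term, removes duplicates, and rejects anything not in the vocabulary.

```python
# Hypothetical controlled vocabulary: preferred terms plus known variants.
PREFERRED = {"Harbors", "Ships", "Shipbuilding"}
VARIANTS = {"harbours": "Harbors", "boats": "Ships", "ship building": "Shipbuilding"}

# Case-insensitive lookup: preferred terms map to themselves, variants to their preferred term.
LOOKUP = {term.lower(): term for term in PREFERRED}
LOOKUP.update(VARIANTS)


def normalize_tags(tags):
    """Split user tags into accepted preferred terms and rejected free-text tags."""
    accepted, rejected = [], []
    for tag in tags:
        term = LOOKUP.get(tag.strip().lower())
        if term is None:
            rejected.append(tag)       # not in the vocabulary
        elif term not in accepted:
            accepted.append(term)      # keep first occurrence only
    return accepted, rejected


print(normalize_tags(["boats", "Ships", "cool old pic", "harbours"]))
# (['Ships', 'Harbors'], ['cool old pic'])
```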
Remember, the other goal of metadata is preservation for future generations. Brings us to the technical aspects of metadata.
Key issues of preservation
See Gladney and Bennett, "What do we mean by authentic?" Http://www.dlib.org/dlib/july03/gladney/0
What is a digital master file? Not Word or WordPerfect. Maybe a canonical master in PDF form?
NISO Technical Metadata for Digital Still Images (Z39.87-2002)
PREMIS Preservation Metadata
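Technical metadata for a master file can be generated automatically, and a stored checksum offers one simple integrity check. This Python sketch uses the standard library's `hashlib`. Its field names loosely echo PREMIS fixity terms (`messageDigestAlgorithm`, `messageDigest`), but it is not a PREMIS serializer and not a full authenticity test in Gladney and Bennett's sense. A matching digest only shows the bits have not changed.

```python
import hashlib


def fixity_record(data: bytes) -> dict:
    """Compute file size and a SHA-256 digest for a master file's bytes."""
    return {
        "size": len(data),
        "messageDigestAlgorithm": "SHA-256",
        "messageDigest": hashlib.sha256(data).hexdigest(),
    }


def is_unchanged(data: bytes, stored: dict) -> bool:
    """Re-hash the bytes and compare against the stored digest."""
    return hashlib.sha256(data).hexdigest() == stored["messageDigest"]


# Hypothetical master-file bytes; in practice, read the file in chunks.
master = b"%PDF-1.4 example master file bytes"
record = fixity_record(master)
print(record["size"])
print(is_unchanged(master, record))         # True
print(is_unchanged(master + b"!", record))  # False
```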
Fair use, rights:
See the Mary Minow website about rights. Of course there is always the Creative Commons solution.
For examples of metadata in use, see her Moving Image Collections (MIC) site at UGA (moving to LOC soon). It has links to MIC XML, MARC XML, MPEG-7 XML, DC XML, and the original record.