Last week I attended the NISO "Managing Electronic Collections" workshop. I spoke about our Digital Library Repository implementation, and was gratified to have a number of people ask me questions over the course of the two days that I was there. One question really struck me -- "What is the most important thing that you learned in your process that we should take into account in our project?"
It could almost be a one-word answer: metadata.
Of course it's a more complex answer than that. What metadata do you need to capture? Technical, preservation, administrative, descriptive? In what format? What's the minimum? We have experimented a lot in this area, and there has been a certain amount of "lather, rinse, repeat" as we've refined our metadata. In some cases, encoding standards have changed, so mappings had to change. Or workflow tools have changed, requiring review of what metadata we can automatically capture, and in what form. Or standards have developed, such as those for preservation or rights metadata, so we need to review what we're capturing.
One of the most significant change agents has been evolving end-user services. Why? Because you can't support functionality and services (and often usability) if the needed metadata isn't there, or is in the wrong form. Having an extensible architecture is vital. Identifying standards to be used, and having production workflows that can process appropriate content in a timely fashion, are key. But really, it's all about the metadata.
Ex: We want to be able to support sorting and grouping of search results by creator or title, which is easier if there are pre-generated sort names and sort titles (doing it on the fly takes a lot of processor overhead).
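To make that concrete, here's a minimal sketch in Python of what pre-generating sort keys at ingest time might look like; the function names and normalization rules are illustrative, not our actual code.

    import re

    # Leading articles to drop when building a sort title. A real list would be
    # longer and language-aware; this one is just for illustration.
    LEADING_ARTICLES = ("the ", "a ", "an ")

    def make_sort_title(title):
        """Build a normalized sort key once, at ingest time, and store it."""
        key = title.strip().lower()
        for article in LEADING_ARTICLES:
            if key.startswith(article):
                key = key[len(article):]
                break
        # Strip punctuation so headings collate consistently.
        return re.sub(r"[^\w\s]", "", key)

    def make_sort_name(name):
        """Normalize an inverted name heading (e.g., 'Jefferson, Thomas, 1743-1826')."""
        return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())

With values like these stored in the record, the search system only has to order on a ready-made field instead of re-normalizing every title and name at query time.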
Ex: We want to create aggregation objects that bring together multi-volume series or issues in a serial title, which is easier if you have the most complete enumeration possible and identify scope at as granular a level as possible (e.g., volume, issue, article).
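As a rough illustration of why that matters, here is a sketch (the record fields and identifiers are made up) of how explicit enumeration turns building an aggregation object into a simple grouping step:

    from collections import defaultdict

    # Hypothetical item records; in practice these would come from the
    # descriptive metadata, not hard-coded dicts.
    items = [
        {"series": "Example Serial", "volume": "1", "issue": "2", "id": "item-1002"},
        {"series": "Example Serial", "volume": "1", "issue": "1", "id": "item-1001"},
        {"series": "Example Serial", "volume": "2", "issue": "1", "id": "item-2001"},
    ]

    def build_aggregations(items):
        """Group item records into series -> volume -> ordered issues."""
        agg = defaultdict(lambda: defaultdict(list))
        for item in items:
            agg[item["series"]][item["volume"]].append(item)
        # Ordering issues within a volume is only possible because the issue
        # number was captured explicitly rather than buried in free text.
        for volumes in agg.values():
            for vol in volumes:
                volumes[vol].sort(key=lambda m: int(m["issue"]))
        return agg

Without that granularity, the keys you need to group on simply aren't there.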
Ex: We want to support faceted subject navigation, which is easier if the subject terms are broken out in a granular way from their post-coordinated forms, such as identifying the geographic vs. topical vs. temporal parts of a subject heading.
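Here's a sketch of the kind of split I mean, in Python. The lookup tables are toys; in a real record you'd want the geographic/topical/temporal typing captured in the metadata itself rather than guessed from strings.

    # A post-coordinated LCSH-style heading, with subdivisions joined by "--".
    heading = "Slavery--Virginia--History--19th century"

    # Toy lookups; real facet typing would come from the encoded metadata.
    GEOGRAPHIC = {"Virginia"}
    TEMPORAL = {"19th century"}

    def facet_parts(heading):
        """Break a post-coordinated heading into typed facet values."""
        facets = {"topical": [], "geographic": [], "temporal": []}
        for part in heading.split("--"):
            part = part.strip()
            if part in GEOGRAPHIC:
                facets["geographic"].append(part)
            elif part in TEMPORAL:
                facets["temporal"].append(part)
            else:
                facets["topical"].append(part)
        return facets

    # facet_parts(heading) ->
    # {'topical': ['Slavery', 'History'],
    #  'geographic': ['Virginia'],
    #  'temporal': ['19th century']}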
Each of these requires a change to our DTD and/or the patterns of our encoding, and sometimes requires us to regenerate the metadata from the original sources. But each time we both better document the objects and improve the services and interface we provide, so it's worth it.
If you're interested in what we've delved into so far:
http://www.lib.virginia.edu/digital/metadata/