Wednesday, October 08, 2008

DCC Curation Lifecycle Model

Via the Digital Curation Blog, I came across the DCC Curation Lifecycle Model. This is a very interesting high-level overview of the life cycle stages in digital curation efforts. There's an introductory article available.

The model proposes a generic set of sequential activities: creating or receiving content, appraisal, ingest, preservation events, storage, etc. There are decision points at the appraisal and preservation event stages about next steps, such as refusal, reappraisal, or migration. A colleague and I sat down and looked it over this afternoon. We were both looking at it from the perspective of a digital collections repository rather than an IR (institutional repository). Since the model was designed primarily with IRs in mind, our thoughts come from a different place in terms of what we wanted to see additionally taken into account in the visualization.

There's a "transform" activity, which can certainly take place multiple times in a data lifecycle. In the visualization it appears sequentially after "store" and "access, use and reuse." Because it can happen at so many points, it's hard to place in a sequential visualization, but it feels like it belongs earlier in the sequence, perhaps before those two steps.

The next ring is labeled with the activities "curate" and "preserve" with arrows. Does the placement of the terms and arrows mean anything in relation to the outermost ring? Are "ingest," "preservation activity" and "store" part of "preserve" and the rest part of "curate?" Or does this more simply represent ongoing activities?

The center of the model is the data. It's surrounded by a ring for description and representation information. Description is an activity of central importance and, as shown, is directly tied to the data, but we weren't sure how its placement related to the sequence of tasks in the visualization.

"Preservation planning" is the next ring out. Planning and its implementation are central, ongoing activities, and again we weren't sure where they meshed with the sequence.

"Community watch and participation" is the last remaining inner ring. It's also an ongoing activity. What actions might its outcomes affect?

Overall, this is a good model for planning. It's challenging to visualize complex processes and dependencies, and this model covers a lot of ground. And of course it's meant to be generic and high-level, to be made more concrete by any institution that makes use of it. It certainly stimulated our thinking about how we might model our data lifecycle and the dependencies between the various tasks.
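As a rough illustration of what modeling the sequence might look like, here is a minimal Python sketch that traces one data object through the sequential activities. The step names are paraphrased from the model's outer ring, and the two flags (`selected`, `needs_migration`) are hypothetical stand-ins for the appraisal and preservation-event decisions. Reappraisal and the ongoing inner-ring activities are deliberately left out, so this is a simplification, not the DCC's own formalization.

```python
from enum import Enum
from typing import List


class Step(Enum):
    # Sequential actions (names paraphrased from the model's outer ring)
    CREATE_OR_RECEIVE = "create or receive"
    APPRAISE_AND_SELECT = "appraise and select"
    INGEST = "ingest"
    PRESERVATION_ACTION = "preservation action"
    STORE = "store"
    ACCESS_USE_REUSE = "access, use and reuse"
    TRANSFORM = "transform"
    # Occasional actions triggered at decision points
    DISPOSE = "dispose"
    MIGRATE = "migrate"


# The ordered sequence an accepted object passes through.
SEQUENTIAL = [
    Step.CREATE_OR_RECEIVE,
    Step.APPRAISE_AND_SELECT,
    Step.INGEST,
    Step.PRESERVATION_ACTION,
    Step.STORE,
    Step.ACCESS_USE_REUSE,
    Step.TRANSFORM,
]


def run_lifecycle(selected: bool, needs_migration: bool) -> List[Step]:
    """Trace one pass through the sequential actions for a single data object.

    `selected` and `needs_migration` are hypothetical decision inputs.
    """
    trace: List[Step] = []
    for step in SEQUENTIAL:
        trace.append(step)
        if step is Step.APPRAISE_AND_SELECT and not selected:
            # Refusal at appraisal: the object leaves the lifecycle here.
            trace.append(Step.DISPOSE)
            return trace
        if step is Step.PRESERVATION_ACTION and needs_migration:
            # Migration is an occasional action, distinct from "transform".
            trace.append(Step.MIGRATE)
    return trace


if __name__ == "__main__":
    print([s.value for s in run_lifecycle(selected=True, needs_migration=True)])
```

Keeping the decisions as explicit branches makes the dependencies visible: nothing after appraisal runs for refused content. New data produced by a "transform" would simply start its own trace from "create or receive."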

NOTE: Sarah Higgins, who created the model, has provided excellent responses to my thoughts and questions in the comments to this post. Please read them!

2 comments:

Sarah Higgins said...

It’s great to hear that the DCC Curation Lifecycle Model is stimulating discussion about data lifecycle modelling, and identification of processes and dependencies for digital curation. This was one of the DCC’s primary aims when we developed it, so we’re pleased to hear your thoughts as you apply it to your own situation.

You rightly identify that it is both high level and generic so that organisations can overlay their own requirements. This can be done at different levels from high level overviews of the processes required, to the nitty gritty of which technologies will be used for each action, which standards will be used, which personnel to employ etc. The actions detailed are not prescriptive – individual organisations will find that some actions need to be added, deleted or moved to another point in the lifecycle.

I led the model’s development, so will try to answer some of the questions which arose from your discussions. Firstly I should say that the model was designed to be generic with no particular data curation discipline in mind – it should be equally applicable to IRs, digital collections, digital archives, electronic records management, eScience applications etc.

The model is best read from the inside ring to the outside ring: Data is the central subject of curation activity, with the Full Lifecycle Activities of “Description and Representation Information”, “Preservation Planning”, “Community Watch and Participation” and “Curate and Preserve” ongoing at every point in the lifecycle, while the Sequential Actions in the outside ring depend for their success on the previous actions having been undertaken.

“Description and Representation Information” is included as a full lifecycle action because metadata may need to be added or amended when any of the Sequential Actions are undertaken, so for instance: the “Appraise and Select” action may include addition of metadata to explain the process and outcome of the appraisal; “Preservation Action” may include assigning specific preservation metadata. Similarly Representation Information may need to be collected and assigned at any point in the Lifecycle. When, what (and how) metadata or representation information is collected and assigned would be dependent on the low level lifecycle modelling of an individual organisation.

In the same way the actions of “Preservation Planning” and “Community Watch and Participation” have to be ongoing throughout the data lifecycle. The details of how these impact on the Sequential Actions would be identified for individual circumstances, with low level modelling.

“Curate and Preserve” are Full Lifecycle Actions: ongoing activities which are also interdependent. You need to curate data to be able to preserve it, and while it is being preserved it also needs to be curated. They are not intended to relate only to the Sequential Actions they are adjacent to in the diagram.

It is true that data transformations may take place at a number of stages in the data lifecycle, but the model assumes that the point of ingest into a managed environment and the point where data is accessed, used and reused are the most significant. The “Transform” action’s location in the sequential actions is very specific. It is not intended to refer to format migration for the purposes of preservation; this is covered by the “Migrate” occasional action. “Transform” refers to the reuse of data, sometimes for a completely different purpose and in a completely different context from that originally intended, and the subsequent creation of new data which itself has to be curated. Thus the sequential actions start again with the new data created by the transformation.

Leslie Johnston said...

Sarah -- Thanks very much for taking the time to provide a clarification for me! I'm going to add a link to your comment in the posting to make sure that everyone reads it.