A Primer to Preserving and Accessing Digital Media at Dartmouth
A white paper by Jay Collier, Dartmouth Web Publishing Services
Summary: A growing portion of our educational
heritage is being lost every day as digital assets disappear in disorganized
storage or cannot be opened due to obsolete file formats. This is especially
true for the storage and distribution of integrated media assets — comprising
video, audio, and images created by faculty, staff, and students — which are
significantly larger and more complex than digital documents and records. In
addition, with the radical simplification of desktop media software, the volume
of produced media assets will increase exponentially in coming years. The best
way to protect these assets is to mandate campuswide storage and classification
standards for all producers — both casual and professional — and to commit
resources for a robust media asset management architecture that can integrate a
variety of preservation, classification, access, and distribution systems. By
assuring a sustainable workflow throughout the digital media life cycle, we can
prevent the loss of these valuable assets for future generations.
1. How We Got Here
Fifty years from now, historians and archivists may see the turn of this
millennium as a dark age for recorded knowledge. The explosion of digital
information technologies for the masses — easing the creation and the
distribution of words, pictures, and videos — has not been matched by a
parallel recognition of the need for preservation and access. As a result, our
online knowledge is being lost at an unprecedented rate.
The Library of Congress Digital Preservation initiative
(formally called the National Digital Information Infrastructure and
Preservation Program), funded by $98 million from the U.S. Congress, has begun
to support the preservation of digital assets, focusing first on the major
producers in education. With a first round of grants totaling $14 million,
recipients are attempting to identify universal processes that can be used to
model more sustainable preservation and retrieval processes.
More recently, the Library of Congress announced that it is creating a World
Digital Library with funding support from Google.
One impediment to managing this content has been the variety of creation,
storage, and distribution methods, both proprietary and standard, as well as
the challenge of integrating processes in various fields. Systems just don't
always play nicely together. Texts are stored as .doc files, spreadsheets as
.xls files, and movies as .avi files. Files are shared via e-mail, on the Web, and
on file sharing servers. Some are printed, some are read on-screen, and some
are viewed on video monitors.
Even if the asset is stored in a format that allows for sustainable
preservation, another impediment has been the lack of standards for storing
information about the asset, called metadata. Many files are put away, on a
hard drive or on a CD, with no information about who created them, or
classification of subject matter. A stored asset that cannot be found and
identified is still a lost asset. This would be as if Dartmouth had no library
and all books and journals were stored on office bookshelves.
2. The Digital Asset Life Cycle
In its simplest form, the complete life cycle of a digital asset involves
creating, annotating, and storing a digital representation of an intended
message. This, for example, is what happens when you create a written report
and save it on your computer. You may or may not add information to the asset —
such as author and audience — but some of this metadata is saved by your
computer, anyway, such as creation date and format.

Figure 1.
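As a rough illustration of the metadata a computer records without being asked, the following Python sketch reads what the file system already knows about a saved file. The function name and the choice of fields are assumptions made for this example. It reports the last-modified time rather than the creation date, because a true creation date is not available in a portable way on every operating system.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def automatic_metadata(path):
    """Return metadata the file system already holds for a file (hypothetical helper)."""
    info = os.stat(path)
    # Guess the format from the file extension; None means the type is unknown.
    media_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "format": media_type or "unknown",
    }


if __name__ == "__main__":
    # Save a small report, then inspect what was recorded automatically.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
        handle.write(b"Quarterly report draft")
    print(automatic_metadata(handle.name))
    os.remove(handle.name)
```

Nothing in this record says who the author or the intended audience is. That information has to be added by a person, which is why annotation is a separate step in the life cycle.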
When you want to revise the document, you first need to find the original
(sometimes, quite a challenge in itself!), open it, revise it, annotate and
save it again.

Figure 2.
It is the information about your asset, both the original and the revision,
that improves preservation, management, and future access. Using standard
methods for recording that metadata, no matter how simple, can greatly increase
success.
So, the loop of creating, storing, revising, and storing assets is common to
all digital assets. Tracking the people and processes in that loop, once the
asset is created, is called Digital Asset Management, or DAM.

Figure 3.
The workflow for each of those steps, however, is affected greatly by the
kind of asset. Is it enterprise data? Is it a business document? Is it a media
asset, like a photo or a movie? Such considerations factor into each phase of
the production process. In this paper, we will be focusing on media assets.
3. The Distinct Nature of Media Assets
Unlike enterprise data and business documents, media assets (audio, images,
and films and videos, which are also called moving images) have a distinct
flavor. For instance, they are often time based; they require large amounts of
storage and, therefore, special distribution methods are needed to provide
access. In addition, they often require new versions, called "derivatives"
(such as thumbnails or low-bandwidth video), simply in order to be distributed
to people on different kinds of Internet connections.
For these reasons, the media producer's workflow needs to be different from
an end user's workflow. Since the producer also needs to have a far deeper
access into the layers of a media asset (such as numerous audio clips and
graphics edited into a final video program), the access requirements are higher
than for the end user, who may simply be viewing or listening to the asset and,
perhaps, adding comments about it.

Figure 4.
The main distinction between these two workflows is that the version of the
asset accessed by the end user and delivered to their desktop computer is often
compressed, and therefore smaller than the original, so that it can be easily
transferred over a narrower Internet connection and over a longer distance. For example,
the original master studio recording of a musical performance will be much
larger than the compressed MP3 that is distributed to a listener who has
purchased the song through an Internet store. The common element is a central
repository that contains originals, revisions (or derivatives), and all the
information about them: what they contain, how they relate to each other, and
who can access them.

Figure 5.
We should say, here, that many different producers may contribute assets to
this storage repository, and many different end-users may receive derivatives
from the storage. Each of those relationships must also be tracked.

Figure 6.
For these reasons, owing to the distinct nature of its creation, storage,
and delivery, the management of media is considered a subfield of DAM, called
media asset management (MAM).
4. What's Not Covered Here
The focus of media asset management — and related areas such as digital
rights management and video-on-demand — is on the discrete binary file asset
itself, after it is created. Often, those files are subsequently combined in
various ways, through Web browsers, news readers, and media players. Those
presentation methods that integrate assets are in the realm of content
management, and are not covered here.
For example, podcasting is the simple process of placing a compressed audio
file on a Web server, then creating a text file that notifies subscribers that
it is available. After the asset is created, the compression and notification
steps are trivial — less than $30 for software and a small number of
keystrokes. Podcasting does not require formal asset management; it is
a publishing tool for the masses. Keeping track of that asset for future
findability and access, however, falls squarely into the metadata issues we're
considering.
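To make the "text file that notifies subscribers" concrete, here is a minimal Python sketch that builds an RSS 2.0 feed with one enclosure per audio file. The function name, feed titles, and example.edu URLs are placeholders invented for this illustration; a production feed would usually carry more fields, such as publication dates.

```python
import xml.etree.ElementTree as ET


def podcast_feed(title, link, description, episodes):
    """Build a minimal RSS 2.0 feed; each episode becomes an item with an enclosure."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        # The enclosure tells a subscriber's software where the audio file lives.
        ET.SubElement(
            item,
            "enclosure",
            url=episode["url"],
            length=str(episode["bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(
        podcast_feed(
            "Campus Talks",                      # hypothetical feed title
            "https://media.example.edu/talks",   # hypothetical URL
            "Recorded lectures",
            [{"title": "Lecture 1",
              "url": "https://media.example.edu/talks/lecture1.mp3",
              "bytes": 12345678}],
        )
    )
```

The feed describes where the file is today. It does not record who holds the original recording or how it relates to other assets, which is the gap that asset management metadata is meant to fill.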
Finally, media asset managers are interested, primarily, in the
preservation, protection, discovery, and distribution of pre-produced or
pre-recorded assets. Video or audio Webcasts of live events require very
different production and distribution methods and so are not included here.
Once they are recorded, however, those events do fall into the realm of asset
management.
5. "Playing Nice"
First-generation asset management systems were monolithic; they
assumed that one vendor's hardware and software were sufficient to manage the
entire life cycle, from creation to delivery to revision. Customers soon
learned that there are too many different needs and skill sets to be
sustainably supported by any one company or organization.
Second-generation asset management architectures now seek to define goals for interoperability,
so that the highest level meta knowledge about a particular digital asset — who
made it, who needs it, who is allowed to use it, what makes it valuable — can
be shared amongst people and systems.
Applying open standards — to workflows, systems, and file formats — is an
investment in future compatibility. Keeping the simplicity of the user
experience foremost in mind at educational institutions helps faculty, staff,
and students participate actively in the preservation and dissemination of
their work.
6. Open Standards
So, how do we make sure that our assets are created and archived in
sustainable formats, stored easily and robustly, accessed quickly and
intuitively, and distributed and revised without losing track of their history,
among many different people, systems, and workflows? We need to identify open
standards for formatting and exchange of the actual material and its
metadata.
First, a review. At the heart of a universal media asset workflow model are
essence, derivatives (also called proxies), and metadata. Essence is the
original, uncompressed digital material. Derivatives are subsequent versions
that are compressed for delivery or changed to reflect new content. Metadata is
all the textual information about the asset: editorial, licensing,
classification, and technical.
Layers of a Media Asset
- Metadata
- Essence
- Derivatives (including proxies and revisions)
Figure 7.
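The three layers can be pictured as a small data structure. The Python sketch below is only one possible way to model them; the class names, field names, and file names are assumptions made for this example and do not come from any particular asset management product.

```python
from dataclasses import dataclass, field


@dataclass
class Derivative:
    """A compressed or revised version made from the essence."""
    filename: str
    purpose: str  # for example "thumbnail", "low-bandwidth proxy", or "revision"


@dataclass
class MediaAsset:
    """One asset: the essence plus everything permanently linked to it."""
    essence_filename: str
    metadata: dict = field(default_factory=dict)
    derivatives: list = field(default_factory=list)

    def add_derivative(self, filename, purpose):
        """Record a new derivative so it stays linked to its original."""
        derivative = Derivative(filename, purpose)
        self.derivatives.append(derivative)
        return derivative


if __name__ == "__main__":
    asset = MediaAsset(
        essence_filename="lecture_master.mov",  # hypothetical file names
        metadata={"creator": "Media Production Group", "rights": "campus only"},
    )
    asset.add_derivative("lecture_small.mp4", "low-bandwidth proxy")
    asset.add_derivative("lecture_thumb.jpg", "thumbnail")
    for derivative in asset.derivatives:
        print(asset.essence_filename, "->", derivative.filename, f"({derivative.purpose})")
```

Keeping the derivatives inside the asset record is a design choice in this sketch: it means a thumbnail or proxy can always be traced back to the essence it came from.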
In a universal model, all these "manifestations" of an asset are permanently
linked, and the relationship between asset layers at various times throughout
creation and distribution is modeled through standardized hierarchical
definitions called XML schemas. These are text-based descriptors that are
completely open, human readable, and collaboratively defined. By describing
workflows and relationships in this way, the full spectrum of common attributes
can be shared between systems.
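As an illustration of how open and human readable such descriptions are, the Python sketch below writes a small metadata record using element names from the Dublin Core element set. This is a record of the kind a schema would describe, not a schema itself. The wrapping "record" element, the function name, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML text."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(
        dublin_core_record(
            {
                "title": "Commencement Address",      # sample values only
                "creator": "Media Production Group",
                "format": "video/mp4",
                "rights": "Campus use only",
            }
        )
    )
```

Because the result is plain text with agreed element names, a cataloguing system and a delivery system can both read it without knowing anything about each other.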
Once standards are defined and accepted, individual systems can focus on
what they do best and hand off assets and metadata to other tools and systems,
as needed. According to a Forrester Research report recommending this approach,
second-generation subsystems began appearing in 2003 as a counter to the
original model of a one-size-fits-all enterprise asset management system. The
WGBH Educational Foundation, working with Sun, is currently demonstrating this
type of interoperable architecture for its intellectual property, including
video, audio, imagery, and transcripts.
Samples of Open Standards
- Metadata structure: Dublin Core, METS, IPTC, SCORM
- Metadata harvesting: OAI
- Data exchange: XML-RPC, SOAP, REST
- Image essence: TIFF, JPEG2000
- Moving image essence: MPEG4, MOTION-JPEG2000
- Audio essence: MP3, MP4-AAC
- Integrated multimedia instructions: SMIL
Figure 8.
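To show how lightweight one of these standards is in practice, the Python sketch below builds a metadata harvesting request in the style of the OAI protocol (OAI-PMH), asking a repository for its records in Dublin Core form. The repository address and the set name are placeholders, and the sketch only constructs the request URL; it does not contact any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optionally restrict the harvest to one named set of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository address, used only for illustration.
    print(list_records_url("https://repository.example.edu/oai"))
    print(list_records_url("https://repository.example.edu/oai", set_spec="video"))
```

A harvester that issues requests like these can collect metadata from many repositories, which is what allows a single search tool to cover assets stored in separate systems.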
Some processes in the workflow puzzle are not yet supported by open
standards-based systems. However, by adhering to standards whenever possible,
current needs can be met by current systems that will be more likely to "play
nice" with future systems.
7. User Scenarios
Students, faculty, and staff create and distribute huge numbers of media
assets — even if they are simply sharing a photo, movie, or sound file via
e-mail — and the volume is growing, as highlighted by the explosion of interest
in simple-to-implement photo sharing and podcasting services. Nevertheless, all
of these processes can be broken down into three primary workflow
stages: asset creation, storage, and distribution.
Primary Stages and Examples
Creation processes
- Capture the real event (image, audio, video).
- Create an abstract asset (illustration, synthesized audio, animation).
- Edit existing assets into new, integrated presentations.
Storage processes
- Store the original asset or derivative version on the desktop (essence or
proxy).
- Store the original asset or derivative version on an external server
(essence or proxy).
- Store information about the asset (metadata).
Distribution processes
- Deliver metadata about the asset to assist in the search and
selection.
- Deliver a low-resolution proxy (thumbnail or screen shot) of an asset to
assist in the selection.
- Deliver a derivative asset (.mpg or .mov for video, .jpg or .gif for
images).
- Deliver a copy of the original asset (essence).
- Use a copy of the original asset in the creation of a new asset.
Figure 9.
Let's look in more detail at three scenarios that could benefit from a
systemic media asset management perspective:
A. Scenario One: Image Galleries
Although solutions for image management have been around for many years,
they were generally large or custom-built — such as the Hood artifact photo
collection created for DCIS. Due to the distributed nature of desktop storage
and the typically small audience for each asset, creators often relied on
manual management of their own images.
As the costs of large workgroup and enterprise solutions have come down,
however, extensive feature sets — including robust metadata and digital rights
management — provide compelling reasons to explore such systems.
Little needs to be said about compression standards: The Library of Congress
defined standards for capture, essence, and derivatives more than five years
ago, and the Dartmouth Libraries have also released standards.
Maintaining standard metadata about those images, however, and retaining
linkages between essence and derivatives, is now starting to enter the general
lexicon, due to easy-to-use desktop image management tools like iPhoto and
Picasa. Moving to a larger system would be less of a disruption in workflow
than with other media assets.
The challenge is finding a tool that supports robust workflow (to allow
multiple users to access and annotate assets), data exchange support (so other
systems can query metadata and acquire assets), and digital rights management
(to restrict access for searches and delivery to specific individuals and
groups).
B. Scenario Two: Video-on-Demand
The roots of video-on-demand (VoD) can be traced to early experiments in interactive
cable television in the late 1970s, and the CATV industry has developed
robust VoD systems for coaxial cable networks over the past decade.
Delivering video over the Internet, however, is a greater challenge than
propagating signals through the controlled environment of an analog coaxial system.
Nevertheless, improved compression/decompression models, wider bandwidth, and
improved hardware processing are finally bringing IP VoD closer
to prime time.
The delivery of broadcast-quality video (60 artifact-free fields per second at
720 x 480) is generally possible only in controlled network environments. Even
digital video-on-demand delivered by cable TV providers is still often of
significantly lower resolution than traditional analog signals, usually by as
much as half.
As IP video emerges, standards can be applied throughout the workflow.
Capture (of still and moving images and of audio) can happen more accurately
and then be stored without compression. Derivatives such as program masters can
be stored in systems that allow automated transcoding to lower-bandwidth
proxies for on-the-fly delivery on controlled and open networks. Transcripts
and other metadata can be searched so that portions of the assets gain value in
and of themselves.
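The idea of delivering a lower-bandwidth proxy on the fly can be sketched as a simple selection rule. In the Python example below, the proxy names and bitrates are invented for illustration, and the transcoding itself is assumed to have already happened elsewhere; the code only picks which existing proxy to send.

```python
# Hypothetical set of proxies transcoded from one program master.
PROXY_LADDER = [
    {"name": "broadband", "kbps": 1500},
    {"name": "dsl", "kbps": 500},
    {"name": "dialup", "kbps": 40},
]


def choose_proxy(available_kbps, ladder=PROXY_LADDER):
    """Return the highest-bitrate proxy that fits the viewer's connection.

    If nothing fits, fall back to the smallest proxy rather than refusing delivery.
    """
    fitting = [proxy for proxy in ladder if proxy["kbps"] <= available_kbps]
    if fitting:
        return max(fitting, key=lambda proxy: proxy["kbps"])
    return min(ladder, key=lambda proxy: proxy["kbps"])


if __name__ == "__main__":
    for speed in (5000, 800, 10):
        print(speed, "kbps connection ->", choose_proxy(speed)["name"])
```

The program master is never sent by this rule. It stays in storage as the source from which new proxies can be made when connection speeds or formats change.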
Again, the challenge is to find systems that play nicely, through exchange
of metadata and essence, with other cataloguing, search, and delivery tools.
Supporting open standards is not a luxury; it is a requirement for success.
C. Scenario Three: Multimedia Lectures
Capturing and delivering all of the synchronized sights and sounds presented
during a multimedia lecture is one of the more challenging scenarios to
manage: An instructor presents a lecture, with accompanying slides (analog or
digital) and/or movies, and a future student wishes to watch and listen to as
much of the experience as possible.
There are two methods for capturing and distributing such an event. The
first is to mix together multiple cameras, microphones, and audio inputs into a
single asset that then becomes available via video-on-demand. If the assets can
be combined into one file package, then the single file can be managed through
a media asset management system.
The other alternative is to record each element separately — video and audio
of the presenter, images and text from the presentation, and text outlines or
transcripts — and synchronize them on the fly, as they arrive at the desktop,
with a package of textual instructions. The capture costs for this process are
lower than for the former, but synchronization and playback standards (such as
SMIL and M4B) are not yet universal.
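A "package of textual instructions" of this kind might look like the output of the Python sketch below, which writes a very small SMIL document: the presenter's video plays in parallel with a sequence of slide images, each shown for a set number of seconds. The file names and timings are invented, and the document omits the layout section that a real player may need, so treat it as an outline of the idea rather than a tested presentation.

```python
import xml.etree.ElementTree as ET


def lecture_smil(video_src, slides):
    """Build a minimal SMIL document pairing one video with timed slides."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # Children of <par> play at the same time.
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "video", src=video_src)
    # Children of <seq> play one after another.
    seq = ET.SubElement(par, "seq")
    for slide in slides:
        ET.SubElement(seq, "img", src=slide["src"], dur=f"{slide['seconds']}s")
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    print(
        lecture_smil(
            "presenter.mp4",  # hypothetical file names and durations
            [{"src": "slide1.jpg", "seconds": 90},
             {"src": "slide2.jpg", "seconds": 120}],
        )
    )
```

Each element remains a separate file that can be stored, described, and replaced on its own, which is the appeal of this approach for asset management.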
Enhanced podcasts or video podcasts are a promising possibility, but any proprietary
system (such as LiveStage and iShell, which deliver integrated QuickTime
assets) presents a steep learning curve.
The primary focus of media asset management is maintaining standards-based
knowledge about the assets and derivatives created and distributed. In this
area, standards are evolving very quickly, so we would expect a shake-out in
the near future.
D. Other Scenarios
These are only three of the many media asset scenarios that professional
staff encounter on a regular basis. Others include:
- Sharing raw video with clients for comments, annotation, and/or
transcription.
- Editing a revised version of a promotional video and putting it on the
Web.
- Shooting and captioning images for a brochure and a Web site.
- Scanning and annotating slides and negatives.
- Creating and printing illustrations.
- Sharing restricted sets of media assets only with authorized users.
These toolsets, workflows, and scenarios change often, and
solutions that are customized for any single combination risk becoming
irrelevant. So how do we decide which to support, and how do we make sure the
resulting assets can be archived, preserved, and accessed in the future?
8. The Realms of Media Asset Management
An integrated media asset management initiative consists of five realms:
vision, standards, support, services, and systems.

Figure 10.
A. Vision and Architecture
The digital media life cycle is evolving so quickly that a clear vision —
one which considers a wide range of issues, from communications tools and
techniques, to psychology and sociology — is required to avoid as many
cul-de-sacs as possible. A high-level architecture must also integrate the
local reality, as well as emerging developments outside the institution. A
sustained, supported initiative to develop and document such a vision and
perspective requires specialties across the enterprise.
B. Standards and Best Practices
Defining open standards that consider the future integration of workflow
solutions is the foundation for sustainable success. Much work is being done in
this area, with major theoretical contributions by information science
professionals, and Dartmouth could benefit from carefully-selected
partnerships.
The first level of partnership is amongst Dartmouth media professionals,
from strategy to production to delivery to cataloguing to archiving. Existing
standards (and knowledge of viable emerging standards) may be within the
portfolio of one group and not the others. Establishing a collaborative to pool
this knowledge will go a long way toward uncovering blind spots that could
affect future developments.
C. User Support Models
Long before larger systems are selected and deployed, asset producers
benefit from using standards and sharing knowledge within their current
workflow. By starting out on a sustainable path and sharing knowledge amongst
practitioners, early work could provide insight leading to success down the
road.

Figure 11. Sample Support Workflow 2002
Centralized control at early stages is not desired; it would fare no better
than the centralized stenographer pools that did not survive the emergence of
word processing.
Simple digital asset creation is becoming as easy as typing an e-mail. It is
the higher-level perspective — one that links strategy and tactics — that
benefits most from the experience of media production and archiving
professionals.
People in several departments, including faculty, staff, and students, are already
engaged in this type of high-level work, and sharing knowledge about best
practices and common interests can benefit the entire institution. A digital
media collaborative that brings together producers and metadata experts,
storytellers, and researchers could make this possible. We need to create a
flourishing community that provides support amongst its members.
With an increased level of resource, Dartmouth could support more producers
and help them preserve and disseminate their digital work.
D. Custom Production Services
Once the explosion of desktop media production stabilizes, the demands on
media professionals at Dartmouth will grow, not only for production, but also
for coaching the high-level principles of telling a story with words, sounds,
pictures, and interaction.
The best analogy is the emergence of desktop publishing. At first, anyone
with a laser printer decided they were designers. Then, as the level of
awareness of design issues grew amongst desktop publishers, the respect for
sophisticated design also grew. Today, design is one of the factors that
distinguish otherwise similar products and services.
A parallel process may emerge with desktop media. At first, anyone will be
able to record a conversation and publish a podcast feed. As time passes,
however, and tastes become more sophisticated, the demand for advanced media
production skills will grow.
The potential scope of anticipated services should be clarified, so that the
staff will be ready to triage the competing requests. Then, a strategic inquiry
into the value of intellectual property translated into media assets by
professional staff throughout the institution should be conducted.
E. Technical Systems Infrastructure
The role of enterprise systems in media asset management should follow a
deep understanding of current workflows, industry standards, future needs, and
the people who are doing this work. Otherwise, the danger looms of building
systems that will have to be torn down and rebuilt in the future.
Smaller workgroup solutions, deployed amongst the heaviest users, would
allow workflows to be tested and verified before rolling them out to a user group
that may be averse to change.
9. Getting There From Here
Each realm, workflow, and scenario requires staff and funding to grow. So
how do we identify the levels of resource that are needed?
One option is to consider a two-phase initial approach. The first would be
to commit the time and resource to define the big picture: the standards-based
architecture within which future implementations would evolve. The second would
be to develop a strategic plan that connects the institutional mission with the
benefits of enhancing specific scenarios and funding them.

Figure 12. Courtesy David McCarn, WGBH Chief Technologist.
Here's an example. If standards for video production, storage, and
distribution were defined, then video-on-demand options could be evaluated for
sustainability and cost, in terms of funds and staff, over a longer period of
time. Without defined standards, there is no assurance that a process
costing tens of thousands of dollars and hundreds of hours of staff time over
the first year of deployment would still exist three years later.
Also, a collaboration of communications professionals, media producers,
archivists, and metadata specialists could define those standards, but this is
a time-consuming process. Until there is the willpower to commit those
resources and to require adherence to a single set of standards, success can't
be assured.
With smaller content management solutions — such as Web log systems and RSS
feeds — essence and metadata can be extricated relatively easily should the
solution not be sustainable over time. Media asset management is different for
the reasons we've discussed. Decisions made now will ripple over the years. Now
is the time to build a sturdy foundation.
10. What's Next
It will be through a collaborative initiative, consisting of faculty,
staff, and students across the institution and supported at the highest levels,
that one of Dartmouth's primary assets, the knowledge captured in
digital media, will be preserved for future generations.
Until we can work together, across boundaries, to preserve our educational
heritage, we will lose more of that heritage every day. Mike Murray, leader of
Dartmouth's Media Production Group, put it this way: "The need for
institutional digital asset management is inevitable. It's just a question of
when and how much water is behind the dam when the need arises."