Dartmouth College Library Bulletin
THE BULLETIN MEETS THE FUTURE
About the Internet and the World Wide Web
The media are buzzing with terms like Information Superhighway, Internet, and World Wide Web (WWW). Electronic villages are springing up everywhere, complete with electronic malls. Mary Alice Williams, speaking for Nynex, is telling you how easy it will be for you to be connected. Magazine and television advertisements are including electronic mail (e-mail) and WWW addresses. Even comic strip creators include their electronic address. Unless you work with computers all day, you might wonder what all this means and why you should be interested. I won't go into details about all of these things, but what I can tell you is that libraries are participating in this electronic information explosion, because our mission is to make information accessible. An electronic version of the Dartmouth College Library Online Catalog has been available to anyone who has access to the Internet (or who has a telephone, computer, and modem) since 1985. Over the years we have added electronic versions of Bowker's Books in Print and abstracts of journals in the fields of medicine, science, and literature. We have available the complete text of Shakespeare's plays and sonnets, the King James Bible, and the Central Intelligence Agency's World Factbook. Dartmouth students, faculty, and staff have access to all of these resources from their dorm rooms or offices. Public computer clusters are available in the libraries and Kiewit Computation Center.
The World Wide Web was born in 1990 at CERN, the European Particle Physics Laboratory in Switzerland. The World Wide Web Initiative, based at CERN, coordinates development of protocols and languages that make it possible for anyone to access the WWW. On the World Wide Web information is available in a variety of formats, including text, graphical images, sound, and movies. Nobody owns the WWW; it is a collection of hypertext and multimedia residing on computers all over the world. To get 'onto the Web' you need access to the Internet, a computer, and some software called a browser. There are a number of Web browsers available that can be run on a personal computer (such as IBM PC or Macintosh) as well as on a mainframe (generally a computer that supports multiple simultaneous users) such as one might find at a university computer center. Two of the most popular Web browsers are Netscape(TM) and Mosaic(TM). Web browsers understand a communications protocol called HTTP (HyperText Transfer Protocol), which describes how to transfer hypertext across the information network.
Hypertext is a way of providing information so that as you are reading text, you can follow references to other text, images, or resources. In the encyclopedia entry for MINERAL, you might find notes suggesting that you 'see CRYSTAL' or 'see INDEX OF REFRACTION.' To read more about those topics in a printed volume, you would have to turn pages to find the other entries, or even open a different volume of the encyclopedia. In a hypertext encyclopedia the words 'CRYSTAL' and 'INDEX OF REFRACTION' would be marked in a special way so that you would know that they were links to other articles. Following the CRYSTAL link would allow you to jump directly to that entry. When you were done reading that entry, you could jump back to where you were in the MINERAL entry. Or, as you read about crystals, you might discover a reference to ELECTRON MICROSCOPES, and you might decide to follow that link instead of returning to the MINERAL article.
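The encyclopedia example can be sketched in a few lines of Python. This is only an illustration: the entry table, the Reader class, and its method names are invented here, and a real browser keeps a much richer history than a simple list.

```python
# Hypothetical four-entry encyclopedia: each entry lists the entries it links to.
ENTRIES = {
    "MINERAL": ["CRYSTAL", "INDEX OF REFRACTION"],
    "CRYSTAL": ["ELECTRON MICROSCOPES"],
    "INDEX OF REFRACTION": [],
    "ELECTRON MICROSCOPES": [],
}


class Reader:
    """Tracks the current entry and the trail of entries left behind."""

    def __init__(self, start):
        self.current = start
        self.trail = []  # entries we can jump back to, most recent last

    def follow(self, link):
        # Only links that actually appear in the current entry can be followed.
        if link not in ENTRIES[self.current]:
            raise ValueError(f"{self.current} has no link to {link}")
        self.trail.append(self.current)
        self.current = link

    def back(self):
        # Return to the entry we were reading before the last jump, if any.
        if self.trail:
            self.current = self.trail.pop()


reader = Reader("MINERAL")
reader.follow("CRYSTAL")               # jump from MINERAL to CRYSTAL
reader.follow("ELECTRON MICROSCOPES")  # follow a link found in CRYSTAL
reader.back()                          # return to CRYSTAL
print(reader.current)
```

After these steps the reader is back in the CRYSTAL entry, with MINERAL still on the trail for a further jump back.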
What is HTML?
Hypertext documents created for use on the WWW are coded in a format called HyperText Markup Language (HTML). HTML is derived from SGML (Standard Generalized Markup Language); both are languages used to describe the general structure or content of a document. SGML is used by printers to produce printed text. In a book, for instance, the title page, each chapter, and even each paragraph would be 'marked up.' The title page would be marked up further to indicate which part was the title, which the author, and so forth. The printer has a set of instructions (likely a computer program) that specifies for this particular book that the title should be printed in a certain size and the author a little smaller and that both are centered on the page. Because we have indicated using SGML coding which part is the title and which the author, the printing program is able to print the book as desired.
HTML is a set of tags (or codes) that are used to mark up hypertext documents. Some tags you might find in an HTML document include 'IMG,' which is a link to a graphical image, 'CITE,' which indicates that the enclosed text is a citation to a published work, or 'STRONG,' which indicates that the enclosed text should receive strong emphasis when displayed. In most Web browsers this will be presented as bold text. HTML is a standard that is still under development. The first versions provided only a small set of tags to mark data. The current version will allow greater flexibility by allowing you to indicate the presence of tables, centered text, and mathematical equations.
What HTML does not do is describe how the hypertext document should be displayed on your computer. The Web browser programs read the HTML codes and use a set of rules that describe how to display a particular tag, just as the printer does with the SGML-tagged book. While there are general conventions for processing tagged data, software developers are free to do what they want when they encounter a particular tag. In some cases, the browser program lags behind the latest version of the HTML standard. A good example of this is the support of tables in HTML. This support will be in the next version of HTML, but even before it is approved, HTML document creators are using the tags to mark tables. Some browser programs added support for tables in anticipation of its being approved in the standard. Others have not. So, if you are using a browser that can handle tabular data correctly, you will see a table; if you are not, then you will see just a string of text.
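To make the point concrete, here is a small Python sketch in which two imaginary browsers display the same tagged text with different rules. The rule tables, the TinyBrowser class, and the sample markup are all invented for illustration; real browsers are far more elaborate. Python's standard html.parser module reports tag names in lower case.

```python
from html.parser import HTMLParser


class TinyBrowser(HTMLParser):
    """Renders tagged text using this browser's own display rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules  # tag name -> (text shown before, text shown after)
        self.output = []

    def handle_starttag(self, tag, attrs):
        # Tags this browser has no rule for contribute nothing to the display.
        self.output.append(self.rules.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.output.append(self.rules.get(tag, ("", ""))[1])

    def handle_data(self, data):
        self.output.append(data)


def render(rules, html):
    browser = TinyBrowser(rules)
    browser.feed(html)
    browser.close()
    return "".join(browser.output)


SOURCE = "<STRONG>Note:</STRONG> <TABLE><TR><TD>1994</TD><TD>35:1</TD></TR></TABLE>"

# Two made-up browsers: only the second has rules for table tags.
BROWSER_A = {"strong": ("*", "*")}
BROWSER_B = {"strong": ("**", "**"), "td": ("| ", " "), "tr": ("", "|")}

print(render(BROWSER_A, SOURCE))
print(render(BROWSER_B, SOURCE))
```

The first browser runs the table cells together into a plain string of text, while the second separates them into columns, which mirrors the table situation described above.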
Once you have access to the Internet and a Web browser, the last piece of information you need is the address of a WWW document to 'browse.' The network address of a document is specified in a standard format by a Uniform Resource Locator (URL). For example, the URL to access Dartmouth College's WWW page (or document) is https://www.dartmouth.edu. The 'http' part indicates that this document will be transferred using HTTP, that is, it is a hypertext document. The 'www.dartmouth.edu' part is the Internet address of Dartmouth's World Wide Web server. A server is a computer program that accepts requests from a 'client' program such as a Web browser and 'serves' out the data requested.
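As a rough illustration of how a URL breaks into parts, the following Python sketch splits the address exactly as it is printed above, using the standard urllib.parse module. The variable name is arbitrary, and nothing here contacts the server.

```python
from urllib.parse import urlsplit

# Split the URL as printed in the text into its named parts.
parts = urlsplit("https://www.dartmouth.edu")

print(parts.scheme)    # the protocol named before '://'
print(parts.hostname)  # the Internet address of the Web server
```

A browser performs the same kind of split before it opens a connection to the named server and requests the document.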
Preparing the data
Articles for the Bulletin are received by the editors as word-processing documents. These documents, along with any illustrations, are given to the printer. To create electronic Internet-accessible articles, I start with the same word-processing documents. Word processors insert special invisible codes into your document when you change font or font size or make a word appear in bold text. The first step in converting the document to HTML is to convert the invisible formatting codes stored by the word-processing program so they can be manipulated. I instruct the word-processing program to save the file in Rich Text Format (RTF), which converts the formatting to text instructions that other applications (computer programs) can read.
Once the article is saved in RTF, a different program converts the RTF to HTML. Both of these conversion programs process their data in a matter of seconds. The most time-consuming part comes next: the program that converts RTF to HTML does not always choose the right HTML tag, so the HTML document must be checked to see that all of the correct tags are used. I said above that HTML describes the elements of a document, and the browser program decides how to display each element. In a word-processing document, italics could be used to emphasize some text, or to show a citation for a published work. The program that converts RTF to HTML cannot tell whether the italics should be translated to the CITE tag, indicating a citation, or the EM tag, which indicates emphasis. The program plays it safe and doesn't guess at all, converting them all to the 'I' tag. The HTML standard includes tags to specify 'bold' and 'italics' even though that decision is supposed to be left up to the browser program. With the printed text in hand, the HTML document is checked, and wrong guesses about tagging are corrected manually.
There are some elements standard to an HTML document that must also be included manually. These include tags that specify that the document is in HTML format, and a title tag that the browser can use to provide a title for the window in which the text is shown. Each HTML document has two sections, the header section, HEAD, which includes the TITLE tag, and the body section, BODY, which includes the remainder of the text. (fig. 1)
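The manual clean-up can be pictured with a short Python sketch. It is not the conversion program used for the Bulletin: the function names and the sample sentence are invented, and the list of decisions stands in for an editor reading along in the printed text.

```python
import re


def retag_italics(html, decisions):
    """Replace each <I>...</I> span, in order, with the tag the editor chose."""
    choices = iter(decisions)

    def swap(match):
        tag = next(choices)  # "CITE" or "EM", decided by a human reader
        return f"<{tag}>{match.group(1)}</{tag}>"

    return re.sub(r"<I>(.*?)</I>", swap, html, flags=re.IGNORECASE | re.DOTALL)


def wrap_document(title, body):
    """Add the HEAD section (with its TITLE) and the BODY section by hand."""
    return (
        "<html><head><title>" + title + "</title></head>\n"
        "<body>\n" + body + "\n</body></html>"
    )


# Invented sample: the converter has turned every italic span into an 'I' tag.
converted = "a copy of his <I>Columbiad</I> was <I>not</I> a gift from a collector."
fixed = retag_italics(converted, ["CITE", "EM"])
print(wrap_document("Notes from the Special Collections", fixed))
```

The program cannot make the choice between CITE and EM itself; it only applies the decisions a person supplies, one per italic span.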
Here is an excerpt from an article that appeared in the November 1994 edition of the Dartmouth College Library Bulletin:
At the end of the eighteenth century and the beginning of the nineteenth, there were numerous donations of books and funds to purchase books. The Reverend John Murray of Newburyport, for example, presented a great polyglot Bible in 1783; Moses Fiske, sometime tutor in the College, gave a large collection from his personal library in 1799; Noah Webster provided a subscription to the New York Spectator in that same year. Elisha Ticknor and Caleb Brigham gave £100 each for the acquisition of books in 1805 and Joel Barlow presented a copy of his newly published Columbiad in 1807. But these men, generous as they were, were not collectors as we know them.
Converting this paragraph to an HTML-tagged document produced the following:
<html><head><!-- This document was created from RTF source by rtftohtml version2.5 -->
<title>Notes from the Special Collections</title>
</head>
<body>
<CENTER>
<h2><EM>Notes from the Special Collections</EM> <p>
COLLECTORS AND DONORS--<p>
THEIR IMPORTANCE TO THE DARTMOUTH LIBRARY<a class="dclAnchorLink" href="#fn0">[*]</a></h2>
<p>
<h3>PHILIP N. CRONENWETT</h3>
</CENTER>
<p>
[Some text removed for illustrative purposes.]
<P>
At the end of the eighteenth century and the beginning of the nineteenth, there were numerous donations of books and funds to purchase books. The Reverend John Murray of Newburyport, for example, presented a great polyglot Bible in 1783; Moses Fiske, sometime tutor in the College, gave a large collection from his personal library in 1799; Noah Webster provided a subscription to the New York <CITE>Spectator </CITE> in that same year. Elisha Ticknor and Caleb Brigham gave [[sterling]]100 each for the acquisition of books in 1805 and Joel Barlow presented a copy of his newly published <CITE>Columbiad </CITE> in 1807.<a class="dclAnchorLink" href="#fn3"></a> But these men, generous as they were, were not collectors as we know them.<p>
<hr>
<a name="fn0">[*]</a> This paper was read at the Book Collectors' Workshop, sponsored by the Friends of the Dartmouth Library, on 7 May 1994.<p>
<a name="fn3"></a> See Chase, <CITE>History of Dartmouth</CITE>, 2: 509, for a list of early benefactions to the Library.<p>
</body>
</html>
Viewing this document using the popular Web browser Netscape(TM) gives us:
In this display you can see footnotes indicated by markers such as [*]. The underline in this browser program indicates that these markers are links to the footnote text at the end of the article. Following a link would take you directly to that footnote. In an example of this size the advantage is not apparent, but in an article that goes on for many screens, it is useful to be able to jump to the footnote and then jump back to the text.
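How a footnote link finds its target can be sketched in Python with the standard html.parser module. The sample below is abridged from the HTML excerpt above (the class attributes and most of the text are left out), and the AnchorCollector class is an invented helper, not part of any browser.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects footnote links (href="#...") and their targets (name="...")."""

    def __init__(self):
        super().__init__()
        self.links = []    # fragment names referenced by href="#..."
        self.targets = []  # names declared with <a name="...">

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href and href.startswith("#"):
            self.links.append(href[1:])
        if attrs.get("name"):
            self.targets.append(attrs["name"])


# Abridged from the excerpt above.
SAMPLE = (
    'THE DARTMOUTH LIBRARY<a href="#fn0">[*]</a> ... '
    'published <CITE>Columbiad</CITE> in 1807.<a href="#fn3"></a> ... '
    '<a name="fn0">[*]</a> This paper was read ... '
    '<a name="fn3"></a> See Chase ...'
)

collector = AnchorCollector()
collector.feed(SAMPLE)
collector.close()

# A link is dangling if no anchor in the document carries the name it points to.
dangling = [name for name in collector.links if name not in collector.targets]
print(collector.links, dangling)
```

Each href beginning with '#' names an anchor elsewhere in the same document, so a check like this can confirm that every footnote marker has a footnote to jump to.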
Accessing recent Library Bulletins on the Internet
You can read recent editions of the Dartmouth College Library Bulletin on the Internet by pointing your Web browser to this URL:
Philip N. Cronenwett, 'Collectors and Donors--Their Importance to the Dartmouth Library,' Dartmouth College Library Bulletin, n.s., 35:1 (November, 1994), 21.