Week One.

Notes.

Content in Web design

As we went over last week, Web design can be split into two parts. The one most people think of when they hear the word design is presentation — the visual appearance of a Web page: colors, fonts, layout, and so forth. The other is content, or the actual text and information contained on a page. Content is, in fact, more important to Web design than appearance; without content, there would be nothing of substance for us to style!

On the Web, page content is generally written in a language called Hypertext Markup Language, or HTML for short. The word hypertext is a silly-sounding 1960s term for text that lets you jump between documents by following links, essentially a more advanced version of the references or bibliographies found in a regular book. Language is a word I’m fairly certain you all had down by the time you were in kindergarten. The important term here is therefore, of course, markup. To mark up a document means to impart upon it additional information beyond what is in the text. When you italicize something in your favorite word processor, for instance, the extra formatting is a form of markup.

HTML is not intended to be a formatting markup language — the job of formatting should be left to CSS, the language we will use later to determine a Web page’s presentation, so using heading tags just to get large bold text is highly frowned upon. Instead, HTML is used to indicate the meaning, or semantics, of parts of documents — semantic contexts: paragraphs, quotes, emphasized text, or even just acronyms. Sometimes, the differences between these contexts are a bit unclear, especially when they are presentationally indistinguishable. Emphasized text and the titles of books like Nineteen Eighty-Four are both, by default, italicized, but are indicated by two very different contexts.
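To make that distinction concrete, here is a small sketch (the sentence itself is made up for illustration). Both marked-up phrases would typically show up italicized, but <em> says "this word is stressed" while <cite> says "this is the title of a work."

```html
<!-- Illustrative sentence; both phrases usually render in italics by default. -->
<p>You <em>really</em> should read <cite>Nineteen Eighty-Four</cite> this summer.</p>
```

A browser draws the two the same way, but anything that reads the markup rather than the pixels can tell them apart.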

General HTML syntax

The fundamental unit of an HTML page is the element, and a typical element looks something like this:

<p>Hi, I’m a paragraph.</p>

The magic of HTML lies inside the angle brackets, those pointy things that look like they could stab you if you aren’t careful. Each bracketed name is a tag, and tags indicate the semantic context of the text they surround. Tags almost always come in pairs, one opening tag buddied up with one closing tag (the one special case, which we will learn about later, is the self-closing tag), and their names show the meaning of the content they enclose. Here, the semantic context is p, for paragraph. (Computer people are terse.) Closing tags are indicated by a slash placed before the tag name.

But sometimes tags by themselves just aren’t enough; they might need modifiers to indicate specifics about the tag’s meaning. For example, take the <a> tag, which is used to make links to other locations. (Like I said, terse.) We could write a link like we wrote the paragraph above:

<a>Google</a>

This is kind of useless, though, because we can’t say where the link should point. Enter the attribute. Attributes are written as key-value pairs inside the opening tag, and are used to specify that sometimes all-important extra information. The <a> tag just happens to have an href attribute that can be used to show where a link points, so let’s try that:

<a href="http://www.google.com/">Google</a>

… et voilà! A working link.
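An element can carry more than one attribute. As a small sketch, the link below adds a title attribute, which browsers commonly show as a tooltip; the tooltip text is made up for illustration.

```html
<!-- Two attributes, separated by a space, both inside the opening tag. -->
<a href="http://www.google.com/" title="A popular search engine">Google</a>
```

Note that the closing tag never takes attributes; all of the extra information lives in the opening tag.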

The structure of an HTML page

HTML treats content as a series of nested containers, kind of like boxes within boxes. Imagine a toy chest belonging to an obsessively organized five-year-old, for example. Inside this large outer container are several smaller boxes containing building blocks, toy vehicles, marbles, and so on. Each of these second-level boxes in turn contains some third-level boxes belonging to the group defined by the second-level box: the vehicles might be sorted into boxes for regular cars, pick-up trucks, tractor trailers, motorcycles, etc.

In much the same way, HTML is composed of elements nested within elements. The following fragment of a sample HTML document illustrates this model:

<body>
    <h1>Page Heading</h1>
 
    <div class="section">
        <p>This is a simple page.</p>
        <p>Here’s another paragraph.</p>
    </div>
 
    <div class="section">
        <p>This paragraph doesn’t belong to the subsections.</p>
        <div class="section">
            <img src="image.gif" alt="An image inside a subsection." />
        </div>
        <div class="section">
            <p>Hey, a paragraph inside a subsection.</p>
            <p>Hey, look, another one.</p>
        </div>
    </div>
</body>

Everything in this page is part of the body of the page — the stuff that shows up in the main content window of a typical browser — as indicated by the outermost <body> element. Inside of the body, in order, are a heading (<h1>) and two sections (<div class="section">). The first of these sections simply contains two paragraphs, but the second contains two subsections as well as a paragraph. Note how subsections are still marked up with a class of section, not subsection — subsections are still sections, just ones that fall within other sections.

Now that we have an idea of what the general structure of a page looks like, let’s examine a complete simple one.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <h1>Hello World</h1>
 
        <p>This is our first XHTML page. Some fancy <em>inline</em> stuff.
        <a href="http://evil.google.com/">A link.</a></p>
    </body>
</html>
 
<!--Last updated: Dec 3 2007 -->

The first three lines describe the versions of XML and HTML being used, and consist mostly of technical gobbledygook that you’ll probably just copy and paste into new documents. The important stuff opens with the <html> tag that starts off the structure of all HTML documents and contains every other element in the page. The attributes are unimportant unless you’re writing content in a language other than English, in which case you should change the en to the appropriate language code.

Up next is the <head> element, whose content contains information about the page. With a few exceptions, the elements here don’t show up anywhere on the page, but determine how certain parts are interpreted. Later, when we begin using CSS, its rules will go here. For now, the only element of importance is <title>, which contains the title of the page displayed in title and tab bars.

Now that all of that’s out of the way, we get to where most of HTML’s business is done: the <body> element, which contains the stuff that shows up in your browser window. Inside of the body, we have first a heading, denoted by the <h1> tag. There are six levels of heading tags, running from <h1> (the highest-level) to <h6> (the lowest-level). <h1> is generally used for page titles, with section headings (absent in this example) being denoted by <h2> elements. Following the heading is a short paragraph, which is enclosed in <p> tags as seen earlier.
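As a rough sketch of how heading levels stack up, here is how a page like these notes might be outlined. The headings are borrowed from this week's notes and the paragraph text is placeholder content, so treat it as an illustration rather than the actual markup of this page.

```html
<!-- One <h1> for the page title, then an <h2> for each section. -->
<h1>Web Design Seminar: Week One</h1>

<h2>Content in Web design</h2>
<p>Placeholder paragraph for the first section.</p>

<h2>General HTML syntax</h2>
<p>Placeholder paragraph for the second section.</p>
```

The level of a heading reflects where it sits in the outline, not how large you want the text to look.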

Both the <h1> and <p> elements are what are called block-level elements, because they define large blocks of content within a page. Some block-level elements can contain others; almost all of them can contain inline elements, which define semantic contexts for text falling within the flow of a block-level element. The <em> (emphasis) and <a> (anchor, or link, as previously discussed) elements are examples of inline elements. Inline elements cannot be placed directly within the body of a page; they must be inside block-level elements. In this way, they’re a bit like orange juice, which must be placed within some sort of block-level container, whether a jug, a bottle, or a cup, before being stored in a refrigerator. (Well, okay, you can put orange juice directly in your fridge, but have fun cleaning up the mess.)
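Here is a minimal sketch of the orange juice rule in markup. The file name notes.html is a made-up link target, used only for illustration.

```html
<body>
    <!-- Fine: the inline <a> and <em> elements sit inside a block-level <p>. -->
    <p>Please read the <a href="notes.html">notes</a> <em>before</em> class.</p>

    <!-- Not fine in the strict XHTML used here: the same link placed directly
         in the body, with no block-level element around it. -->
</body>
```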

At the end of the document, we make sure to properly close our <body> and <html> elements by using the </body> and </html> tags, taking care to maintain proper nesting order. Since we opened the body most recently, it must be closed before we can end the document with </html>.
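The same last-opened, first-closed rule applies to every element, not just <body> and <html>. A small sketch with made-up sentence text:

```html
<!-- Properly nested: <em> was opened last, so it is closed first. -->
<p>This is <em>properly nested</em>.</p>

<!-- Improperly nested (do not do this): <p>This is <em>badly nested.</p></em> -->
```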

But wait! What’s this at the end? Isn’t the document over? Yes, technically. The last line contains an HTML comment, which is a notation that is completely ignored by browsers parsing the document. Comments start with <!--, end with -->, and can go almost anywhere inside the content portion of an element, or even outside the <html> element. They’re handy for making notes to yourself: perhaps the time the page was last updated, as here, or reminders like run spellcheck or expand list. Two things to note: comments can’t contain --, for special syntactical reasons, and are visible when users view the source of your pages (so don’t write secret love notes in comments!).
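A few comment shapes you might find yourself writing, as a sketch; the note text is invented for illustration.

```html
<!-- A one-line note to yourself: run spellcheck -->

<!--
    Comments can span several lines,
    which is handy for longer reminders.
-->

<p>Visible text. <!-- Comments can sit inside an element's content too. --></p>
```

Only the words "Visible text." would appear in the browser window; everything else stays tucked away in the source.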

That wraps up week one of Web Design Seminar. Next week: a whirlwind tour of specific HTML elements, and a taste of CSS. Stay tuned.