As we went over last week, Web design can be split into two parts.
The one most people think of when they hear the word “design” is
presentation, the visual appearance of a Web page: colors,
fonts, layout, and so forth. The other is content, or the
actual text and information contained on a page. Content is, in fact,
more important to Web design than appearance; without content, there
would be nothing of substance for us to style!
On the Web, page content is generally written in a language called
Hypertext Markup Language, or HTML for short. The word “hypertext”
is a silly-sounding 1960s term for the ability to jump between
documents by following links, essentially a more
advanced version of the references or bibliographies found in a regular
book; “language” is a word I’m fairly certain you all had down by
the time you were in kindergarten. The important term here is
therefore, of course, “markup.” To mark up a document means to
impart upon it additional information beyond what is in the text. When
you italicize something in your favorite word processor, for instance,
the extra formatting is a form of markup.
HTML is not intended to be a formatting markup language — the job of formatting should be left to CSS, the language we will use later to determine a Web page’s presentation, so using heading tags just to get large bold text is highly frowned upon. Instead, HTML is used to indicate the meaning, or semantics, of parts of documents — semantic contexts: paragraphs, quotes, emphasized text, or even just acronyms. Sometimes, the differences between these contexts are a bit unclear, especially when they are presentationally indistinguishable. Emphasized text and the titles of books like Nineteen Eighty-Four are both, by default, italicized, but are indicated by two very different contexts.
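As a small illustration of that last point, the fragment below marks up emphasis and a book title differently, even though most browsers render both in italics by default. It assumes the standard em and cite elements, which have not been introduced yet at this point, so treat it as a sketch rather than something to memorize.

```html
<p>You <em>really</em> should read <cite>Nineteen Eighty-Four</cite>.</p>
```

The two italic runs look the same on screen, but a program reading the markup can tell that one is stressed speech and the other is the title of a work.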
The fundamental unit of an HTML page is the element, and a typical element looks something like this:
<p>Hi, I’m a paragraph.</p>
The magic of HTML lies inside the angle brackets,
those pointy things that look like they could stab you if you aren’t
careful — these are tags that indicate the semantic context
of the text they surround. Tags always come in pairs — one opening tag
buddied up with one closing tag (there is one special case we will
learn about later, the self-closing tag) — and their names
show the meaning of the content they enclose. Here, the semantic
context is p, for paragraph. (Computer people are
terse.) Closing tags are indicated by the placement of a slash before
the normal tag name.
But sometimes tags by themselves just aren’t enough; they might need
modifiers to indicate specifics about the tag’s meaning. For example,
take the <a> tag, which is used to make
links to other locations. (Like I said, terse.) We could write a link
like we wrote the paragraph above:
<a>Google</a>
This is kind of useless, though, because we can’t say where the link
should point. Enter the attribute. Attributes are defined
as key-value pairs, and are used to specify that sometimes
all-important extra information. The <a>
tag just happens to have an href attribute that can be
used to show where a link points, so let’s try that:
<a href="http://www.google.com/">Google</a>
… et voilà! A working link.
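An element is not limited to a single attribute. As a hedged sketch, the link below carries a second key-value pair, the standard title attribute, which has not been covered above; its value here is made up for illustration, and most browsers show it as a tooltip.

```html
<a href="http://www.google.com/" title="A popular search engine">Google</a>
```

Each attribute is written as name="value" inside the opening tag, with whitespace separating one pair from the next.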
HTML treats content as a series of nested containers, kind of like boxes within boxes. Imagine a toy chest belonging to an obsessively organized five-year-old, for example. Inside this large outer container are several smaller boxes containing building blocks, toy vehicles, marbles, and so on. Each of these second-level boxes in turn contains some third-level boxes belonging to the group defined by the second-level box: the vehicles might be sorted into boxes for regular cars, pick-up trucks, tractor trailers, motorcycles, etc.
In much the same way, HTML is composed of elements nested within elements. The following fragment of a sample HTML document illustrates this model:
<body>
  <h1>Page Heading</h1>
  <div class="section">
    <p>This is a simple page.</p>
    <p>Here’s another paragraph.</p>
  </div>
  <div class="section">
    <p>This paragraph doesn’t belong to the subsections.</p>
    <div class="section">
      <img src="image.gif" alt="An image inside a subsection." />
    </div>
    <div class="section">
      <p>Hey, a paragraph inside a subsection.</p>
      <p>Hey, look, another one.</p>
    </div>
  </div>
</body>
Everything in this page is part of the body of the page — the stuff
that shows up in the main content window of a typical browser — as
indicated by the outermost <body> element.
Inside of the body, in order, are a heading (<h1>) and two sections (<div class="section">). The first of these sections simply contains two
paragraphs, but the second contains two subsections as well as a
paragraph. Note how subsections are still marked up with a
class of section, not subsection: subsections are still sections, just ones that fall within other
sections.
Now that we have an idea of what the general structure of a page looks like, let’s examine a complete simple one.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World</h1>
    <p>This is our first XHTML page. Some fancy <em>inline</em> stuff.
    <a href="http://evil.google.com/">A link.</a></p>
  </body>
</html>
<!--Last updated: Dec 3 2007 -->
The first three lines describe the versions of XML
and HTML being used, and consist mostly of technical
gobbledygook that you’ll probably just copy and paste into new
documents. The important stuff opens with the
<html> tag that starts off the structure of
all HTML documents and contains every other element in
the page. The attributes are unimportant unless you’re writing content
in a language other than English, in which case you should change the
en to the appropriate language code.
Up next is the <head> element, whose
content contains information about the page. With a few exceptions,
the elements here don’t show up anywhere on the page, but determine how
certain parts are interpreted. Later, when we begin using CSS, its rules will go here. For now, the only element of
importance is <title>, which contains the
title of the page displayed in title and tab bars.
Now that all of that’s out of the way, we get to where most of HTML’s business is done: the <body> element, which contains the stuff that shows up in your browser
window. Inside of the body, we have first a heading, denoted by the
<h1> tag. There are six levels of heading
tags, running from <h1> (the highest-level)
to <h6> (the lowest-level). <h1> is generally used for page titles, with
section headings (absent in this example) being denoted by <h2> elements. Following the heading is a short
paragraph, which is enclosed in <p> tags as
seen earlier.
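To make the heading levels concrete, here is a rough sketch of how a longer page might be outlined. The heading text is invented for illustration; only the h1, h2, and p elements come from the discussion above.

```html
<h1>Web Design Seminar</h1>
<h2>Content</h2>
<p>Notes about HTML go here.</p>
<h2>Presentation</h2>
<p>Notes about CSS go here.</p>
```

The single h1 names the page as a whole, and each h2 opens a section beneath it. A subsection inside one of those sections would use an h3, and so on down to h6.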
Both the <h1> and <p> elements are what are called block-level elements,
because they define large blocks of content within a page. Some
block-level elements can contain others; almost all of them can contain
inline elements, which define semantic contexts for text
falling within the flow of a block-level element. The <em> (emphasis) and <a>
(anchor, or link, as previously discussed) elements are examples of inline
elements. Inline elements cannot be placed directly within the body of
a page; they must be inside block-level elements. In this way, they’re
a bit like orange juice, which must be placed within some sort of
block-level container, whether a jug, a bottle, or a cup, before being
stored in a refrigerator. (Well, okay, you can put orange
juice directly in your fridge, but have fun cleaning up the mess.)
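The fragment below sketches the rule under the XHTML 1.1 document type used in this section. The lines wrapped in <!-- and --> are comments, which are explained at the end of this section; the text content is made up for illustration.

```html
<body>
  <!-- Fine: the inline elements sit inside a block-level paragraph. -->
  <p>Some <em>fancy</em> text and <a href="http://www.google.com/">a link</a>.</p>

  <!-- Not valid XHTML 1.1: an inline element placed directly in the body. -->
  <em>Spilled orange juice.</em>
</body>
```

Browsers will usually display the second case anyway, but a validator checking the page against the XHTML 1.1 rules should flag it.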
At the end of the document, we make sure to properly close our <body> and <html>
elements by using the </body> and </html> tags, taking care to maintain proper nesting
order. Since we opened the body most recently, it must be closed first
before we can end the document with </html>.
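The same last-opened, first-closed rule applies to every pair of tags, not just these two. A minimal sketch, reusing the paragraph and emphasis elements from the example page (the comment lines are only labels):

```html
<!-- Properly nested: em was opened last, so it is closed first. -->
<p>Some fancy <em>inline</em> stuff.</p>

<!-- Improperly nested: the paragraph is closed while em is still open. -->
<p>Some fancy <em>inline stuff.</p></em>
```

In terms of the boxes analogy, the second version tries to shut the outer box while the inner box is still sticking out of it.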
But wait! What’s this at the end? Isn’t the document over? Yes,
technically. The last line contains an HTML
comment, which is a notation that is completely ignored by
browsers parsing the document. Comments start with <!--, end with -->, and can go almost anywhere
inside the content portion of an element, or even outside the <html> element. They’re handy for making notes to
yourself: perhaps the time the page was last updated, as here, or
reminders like “run spellcheck” or “expand list.” Two things to note:
comments can’t contain --, for special syntactical
reasons, and are visible when users view the source of your pages (so
don’t write secret love notes in comments!).
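A few comments in the forms described above, as a sketch; the notes themselves are just examples.

```html
<!-- To do: run spellcheck -->
<p>Visible text.</p> <!-- A comment can follow other markup. -->
<!--
  A comment can also span
  several lines.
-->
```

None of these contains a double hyphen between the opening <!-- and the closing -->, which keeps them well-formed.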
That wraps up week one of Web Design Seminar. Next week: a whirlwind tour of specific HTML elements, and a taste of CSS. Stay tuned.