HTML is the name of the set of codes used in writing the vast majority of documents found on the World-Wide Web. There are many HTML tutorials on the Web; one is at W3C. Mine, you might say, is unnecessary. But each tutorial is a little different from the others, and I hope mine fills a void left by all the others and is helpful to some people.
Note that I will not discuss every aspect of HTML here. For many things, I will simply refer you to the W3C site, where you can read up on the information yourself; there's no reason for me to repeat it. There are aspects of HTML I will not even mention here, and these are discussed at that site; you're encouraged, after you complete this guide, to read through parts 7-18 and appendix B there.
Note also that this page is meant to be read in order, from top to bottom. I advise against your reading one of the later sections before the beginning.
Finally, realize that learning HTML is, in a way, like learning any other language: You practice it best by using it. I strongly advise you to make up HTML documents as you go through this tutorial; these will get more and more complex as you go along. At each stage, you might find it advisable to test the document in W3C's HTML validator to make sure that it is valid HTML.
This page will help you learn HTML version 4.01 (HTML4). Please realize that most of HTML, especially most of HTML4, is meant to be used to change the function of the text on a page -- what is part of a paragraph, what is part of a quotation, what is part of a list, et cetera. It is not meant to be used to change the way a page looks. In order to change the way a page looks, something called style sheets must be used. I will therefore also introduce you to Cascading Style Sheets, level 1 (CSS1).
An HTML file is essentially a text file with extra things in it, as you'll see. So you can write it using any text editor. Notepad is a text editor that comes with Windows, SimpleText comes with the Macintosh, and vi or Pico is often installed on UNIX. Any of these will do. Just write your HTML file in this program, and save the file with a filename that ends in .html rather than .txt.
Every HTML document consists of just two things: text and tags. A tag always starts with a left-angle-bracket (<) and ends with a right-angle-bracket (>). Each tag has stuff in it, between those two brackets. The first word in each tag is the most important; the rest is called the attributes of the tag. Thus, for example, <a href="http://www.example.com/~msh210/" title="whatever"> is a tag; a is the name of the tag, href is one attribute, which has a value of "http://www.example.com/~msh210/", and title another attribute, which has a value of "whatever". You should put quotation marks around the value of any attribute.
Some tags come in pairs; others appear singly. Those that appear in pairs always follow the same pattern: the second tag is the same as the first, but with no attributes, and with a slash before the first word. Thus, the tag that would go with <a href="http://www.example.com/~msh210/" title="whatever"> is </a>. Those tags that appear in pairs must appear in pairs; those that appear singly must appear singly.
The name of the tag can appear in capital, lower, or mixed case, so that <html> is the same as <HTML>, and both are the same as <hTmL>. The same is true for the names of the attributes. The values of the attributes, on the other hand, are often case-sensitive (URLs, for instance), so treat an upper-case value as different from the same value in lowercase.
Within each tag, at least one space (or tab, or line break; but a space
will suffice) must be between the name of the tag and the first attribute,
and at least one space between each attribute and the next. After the last
attribute, before the right-angle-bracket (>), there
may be space, but need not be. Before the name of the tag, after the left-angle-bracket
(<), there must not be any space. Also, within each
attribute there should be no space, and within the name of the tag there should
be no space. Thus, each of the following tags is okay: <foo a="b" c="d" ></foo
><foo a="e" b="f">. But the following
are wrong: < foo a="b" c="d"><foo
a="b"c="e"><fo
o a="b" c="d">< /foo>. (By the way, there is no
real tag called <foo>; I was just using it by way
of example.)
Note that if you are using two pairs of tags you must close the second before you close the first. That is, say you use the <p> tag and then the <a> tag. Each of these has a partner, but you must use </a> before </p>. It's like rooms: you must leave an inner room before you leave an outer room, exiting first the one you entered last.
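Here is a minimal sketch of the nesting rule, using the two tags just mentioned. The URL and the wording are placeholders made up for illustration, and the lines that begin with <!-- are comments, which are explained near the end of this page.

```html
<!-- Correct: the inner <a> is closed before the outer <p>. -->
<p>Visit <a href="http://www.example.com/">an example site</a> for details.</p>

<!-- Wrong: </p> appears while <a> is still open. -->
<p>Visit <a href="http://www.example.com/">an example site for details.</p></a>
```

The second paragraph is shown only as a counterexample; a validator would be expected to reject it.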
Often, I will refer to a tag by its name, writing, say, "the <a> tag" instead of "the <a href="http://www.yahoo.com/"> tag" or the like.
A Web page written in HTML has to follow a certain structure, a certain ordering of tags. The first tag in it is the <!doctype> tag. Now, technically, for HTML4, this tag looks like <!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">, which states that the document it's part of is an HTML4 document. But until you get more familiar with HTML4, you will, I think, want to use <!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">, which states that it's part of a "transitional" HTML4 document; this sub-version of HTML4 is less exacting in how carefully you stick to the rules of HTML4.
After the <!doctype> tag come the following tags, in order: <html>, <head>, some other tags which are about the document, </head>, <body>, text which is to appear on the page and tags which change the page, </body>, and </html>. None of these six tags needs any attributes. Thus, the page would look like
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
. . .
</head>
<body>
. . .
</body>
</html>
I'll first discuss the tags that go between <head> and </head>, and then the things that go between <body> and </body>.
The head of the page -- the part from <head> through </head> -- supplies information about the page to Web browsers, to search engines, and to any other program looking at the page. This information is not seen by the person looking at the page itself. I will discuss only three of the tags that go in this section, between <head> and </head>: <title> ... </title>, <meta>, and <link>.
<Title> is a mandatory tag: it must be included. Between <title> and </title> goes the title of the page. In most browsers in Windows9x and Macintosh, the title of the page appears in the title bar of the browser window that's displaying that page; for example, the title of this page is HTML Tutorial. Choose your title wisely, as search engines will also display it.
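As a rough sketch, a complete page with nothing in its head but a title might look like the following. The title and the sentence in the body are invented for the example; only the overall structure comes from the text above.

```html
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>My First Page</title>
</head>
<body>
<p>This sentence appears on the page itself; the title above does not.</p>
</body>
</html>
```

Saving this as a file whose name ends in .html and opening it in a browser is one way to check where the title shows up.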
<Meta> and <link> are optional: you need not include them, and many people don't. <Meta> is used to tell search engines about the document, and <link> is used to point to documents related to the one in question. Thus, for instance, a <meta> tag can indicate the name of the author of the document, and another can include a copyright notice, whereas a <link> tag will, say, point to a page whereat you can contact the author or read a copyright notice. It's a fine distinction, but should be noted. The <link> tag usually has two attributes: href and either rel or rev. (<Meta> doesn't use either of those; it uses different attributes.) Href has as its value a URL (Internet address), and rel or rev is used to describe the relation between that URL and the page that the <link> tag is on: the URL is the rel-value of the page the link tag is on, and the page the link tag is on is the rev-value of the URL. Thus, for example, this page once had a <link rel="author" rev="made" href="mailto:msh210@nyu.edu" title="html.html"> tag. (I'll explain the title attribute more later.) I'm not about to give you a complete list of possible <meta> and <link> tags and how to use them, but I will tell you about one of them later, when I discuss CSS. For the meantime, I will point you to a couple of sites that have more information:
The body of the page -- the stuff between <body> and </body> -- will comprise text, mostly. But among the text will be some tags. Some of these relate to the text itself -- state that certain text constitutes a quotation, say, or a paragraph -- and others do not. The tags that affect the text all come in pairs, with the second tag beginning with a slash, as I explained above. Among the tags that affect the text, there are two types: logical and stylistic.
Logical tags indicate what the text is -- a paragraph, a quotation, emphasized, a heading, etc. These tags all take the attributes title, lang, class, and style, all four of which I will discuss later. The tags are:
<h1>, <h2>, <h3>, <h4>, <h5>, and <h6>: These make headings on the page. Thus, the phrase HTML Tutorial -- Writing Web Pages at the top of this page is coded <h1>HTML Tutorial -- Writing Web Pages</h1>, and the word Introduction immediately below it is coded <h2>Introduction</h2>. The difference between these six tags is one of degree: <h1> is used for the highest-level heading on each page, <h2> for a subheading, and so on.
<p>: This makes a paragraph. Each paragraph starts with <p> and ends with </p>.
<address>: This tag takes the place of <p> if the paragraph is, specifically, information about the authorship of the Web page. But even in that case, you can still use <p>, and many people do, so the <address> tag is not very useful.
<blockquote> and <q>: These two tags surround quotations. The difference is that <blockquote> is used when the quotation comprises a paragraph or more -- in that case, you put the paragraph(s) inside the quotation, like so:
<blockquote><p>This is a paragraph.</p>
<p>And this is another.</p></blockquote>
Even if there's only one paragraph in the quotation, you still need the <p> tag.
<Q>, on the other hand, is used for shorter quotations, quotations within paragraphs. Thus, <p>Descartes said, <q>I think, therefore I am.</q></p>. Note, though, that browsers are supposed to put in a quotation mark when they encounter a <q> or </q> tag, but many don't. Therefore, you are advised against using the <q> tag at all; use quotation marks instead.
There is a special attribute, cite, for these two tags. The value of the attribute is the URL from which the quotation was taken (if, of course, it was taken from a source that has a URL). Thus, for example, <q cite="http://www.example.com/~msh210/aslfl_proposal.html">.
<em> and <strong>: These are used for emphasis. <Strong> is used for stronger emphasis than <em> is. These must be used within a paragraph (or heading, or list), as in <p><em>Yes!</em> We have no bananas.</p>; you cannot do anything like
<em>
<p> ... </p>
<p> ... </p>
</em>
or even
<em>
<p> ... </p>
</em>
<dfn>, <code>, <samp>, <kbd>, and <var>: These five tags are used, respectively, for terms being defined, computer program code, output from a computer program, input into a computer program, and variables of a computer program. The latter four serve little purpose except in Web pages devoted to computer programming or the like. These tags must be used within a paragraph (or heading, or list); you cannot contain a paragraph within any of these tags.
<cite>: This tag is used for a citation. For example: See <cite>312 U.S. 216</cite> for details.
<abbr> and <acronym>: These two tags are used for abbreviations and acronyms. Unfortunately, W3C is not very clear on the difference between <abbr> and <acronym>. See the specification for details, if you wish; otherwise, just use whichever you like.
<ins> and <del>: These two tags are meant to mark insertions and deletions. That is, say an earlier version of a certain document differed from the current version, and you wish to represent this change in your quotation of the document. Then you'd use something like this:
<blockquote>
<p><del>Eighty-seven</del> <ins>Four score and seven</ins> years ago, ...</p>
</blockquote>
There is a special attribute, cite, for these two tags. The value of the attribute is the URL of an explanation why the insertion or deletion was made (if, of course, the explanation is at a location that has a URL).
You can use <ins> and <del> within a paragraph (or heading, or list), as I did above, or, alternatively, you can mark a whole paragraph (or heading, or list) as being an insertion or a deletion, as follows:
...</p>
<ins>
<p> ... </p>
<p> ... </p>
</ins>
<p> ...
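To see how several of the logical tags can work together, here is a small sketch of what might go in the body of a page. All of the wording, the example.com URL, and the expansion given in the title attribute are invented for illustration; the title attribute itself is discussed later on this page.

```html
<h1>Notes on Descartes</h1>
<h2>A famous sentence</h2>
<p>Descartes wrote what is <em>perhaps</em> the <strong>best-known</strong>
sentence in philosophy.</p>
<blockquote cite="http://www.example.com/descartes.html">
<p>I think, therefore I am.</p>
</blockquote>
<p>A <dfn>hyperlink</dfn> is something you follow to get to another page.
Such links are written in
<abbr title="HyperText Markup Language">HTML</abbr>.</p>
<address>Written by the author of these notes.</address>
```

Note that <em>, <strong>, <dfn>, and <abbr> all sit inside paragraphs, while <blockquote> wraps a whole paragraph, as described above.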
Stylistic tags, called font style tags, indicate what the text should look like. They are <tt>, <i>, <b>, <big>, <small>, <sup>, and <sub>, which make text monospaced, italicized, boldfaced, bigger, smaller, superscripted, and subscripted, respectively. These tags all come within paragraphs (or headings, or lists); you can't contain a paragraph within a <b> tag, for instance. Note that whenever you mean that certain text has a property for which there exists a logical tag, you should use the logical tag, not a stylistic one. (None of the tags mentioned in this paragraph takes any attributes of importance except title, lang, class, and style, all four of which I will discuss later.)
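A short sketch of a few font style tags in use follows. The sentences are made up for the example, and the last paragraph is there only to illustrate the advice about preferring a logical tag where one exists.

```html
<p>Water is H<sub>2</sub>O, and the area of a square of side x is x<sup>2</sup>.</p>
<p>The sign read <b>KEEP OUT</b> in <big>large</big> letters, with
<small>a few words of fine print</small> underneath.</p>
<!-- Prefer the logical tag when one fits: <em> rather than <i> for emphasis. -->
<p>I <em>really</em> mean it.</p>
```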
There are other things that can be found between the <body> and </body> tags, and I will deal with these singly, in the sections to come.
A hyperlink is what the World-Wide Web is all about: hyperlinks are those things you follow to get to other pages, usually by clicking. To make a hyperlink in your page, you need the <a> tag. You put the <a> tag before the text (or image; see below) that should serve as a link, and </a> after. This tag needs one attribute, namely href; for this attribute's value, use the URL of the place you want to link to. If you wanted to link to NYU, say, your tag would look like <a href="http://www.nyu.edu">. Other attributes that can be used in the <a> tag are hreflang, rel, rev, and accesskey; some of these I discussed above, when discussing the <link> tag. If you really want to know more about these attributes you can check the W3C discussion of <a>.
<A> can also be used to send someone mail. Specifically, the link <a href="mailto:foo@example.com">, when followed, sends mail to foo@example.com. In this case, the title attribute is more useful than usual: in some browsers, it specifies the Subject line of the e-mail to be sent.
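Put together, the two kinds of link might appear in a paragraph like the one sketched here. The link text and the title value are invented for illustration, and whether the title is used as a Subject line depends, as noted above, on the browser.

```html
<p>Read more at <a href="http://www.nyu.edu">the NYU site</a>, or
<a href="mailto:foo@example.com" title="Question about your page">send me mail</a>.</p>
```

The text between <a> and </a> is what the reader sees and follows; the href value itself is not displayed.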
You may well want images (pictures) on your page. This is done using the <img> tag. This tag, unlike most, has no partner: there is no </img> tag, and you should not use one. This tag has two required attributes: src and alt. Src takes as its value the URL of the picture. (Thus, the HTML document you're writing does not actually include the image in any real sense; it just refers to the image.) Alt takes as its value the text that those who cannot see the picture can read in its stead. (Those who cannot see the picture include the blind, those using text-only World-Wide Web browsers, and those using graphic browsers with the graphics capability turned off.)
For example, if you wanted to include in your HTML document the picture located at http://us.yimg.com/images/new2.gif (which is a graphic representation of the word "new!") and put it among text instead of the word "new", then you'd want those who can't see the picture to see the word "new!" instead. Thus, "new!" would be the value of the alt attribute, and your tag would look like this: <img src="http://us.yimg.com/images/new2.gif" alt="new!">. (This, in fact, is almost precisely what Yahoo! does on many of its pages; e.g., http://dir.yahoo.com/Computers_and_Internet/.) A similar example is what I did with images at the very bottom of this page. For more info on how to choose an alt value, see Jukka Korpela's advice.
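The following sketch shows an image among text and an image serving as a link. The file name home.gif, the example.com URL, and the surrounding sentences are assumptions made up for the example; only the new2.gif tag comes from the text above.

```html
<p>Our catalog <img src="http://us.yimg.com/images/new2.gif" alt="new!">
now lists air conditioners.</p>
<p><a href="http://www.example.com/"><img src="home.gif" alt="Home page"></a></p>
```

A reader who cannot see the pictures would get "Our catalog new! now lists air conditioners." and a link reading "Home page", which is a reasonable test of whether the alt values were well chosen.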
There are three types of lists: ordered, unordered, and definition. One of each follows.
Obviously, an ordered list has a specific order (you can't rise before you awaken, and you're unlikely to put your feet on the floor after you're already standing). An unordered list has no order (in fact, the one in my example has little sense, but others can have more sense but still no specific order). And a definition list, which may or may not have a specific order, is meant to have a word or short phrase explained by a longer phrase or paragraph.
Ordered and unordered lists (not definition lists) are rather similar: an ordered list starts with <ol> and ends with </ol>, and an unordered list starts with <ul> and ends with </ul>. In between those two tags comes a series of list items, each of which is surrounded by <li> ... </li> tags. Thus, the examples above were made as follows:
<ol><li>Wake up.</li>
<li>Put your feet on the floor.</li>
<li>Stand.</li><li>Walk.</li></ol><ul>
<li>A cat.</li><li>An idea.
</li><li>An air conditioner.</li><li>This Web site.</li>
</ul>
However, since the only things in an ordered or unordered list can be list items, and since list items can only be in lists, when the browser sees a <li> or </ol> or </ul> tag, it knows that that's the end of the previous list item. So you can leave off the </li> tags: the browser will figure out where the list item ends. Thus:
<ol><li>Wake up.
<li>Put your feet on the floor.
<li>Stand.<li>Walk.</ol><ul>
<li>A cat.<li>An idea.
<li>An air conditioner.<li>This Web site.
</ul>
A definition list starts with a <dl> tag and ends with a </dl> tag. Between them come the list entries. For each entry, first comes a <dt> tag, then the term to be defined, then a <dd> tag, then the definition. If more than one definition exists for a certain term, you can add another <dd> and another definition; in fact, as many as you'd like. Similarly, if more than one term has one definition (e.g., if WWW and Web have the same definition), you can have two <dt> tags. For example:
<dl>
<dt>Web
<dt>World-Wide Web
<dt>WWW
<dd>The set of all servers using HTTP
<dd><em>See also: Internet, HTTP, server.</em>
</dl>
None of the list-related tags -- <ul>, <ol>, <li>, <dl>, <dt>, or <dd> -- takes any attributes of importance, except title, lang, class, and style, all four of which I shall discuss later.
Whitespace is any spacing you put into your document: any spaces (made by hitting the space bar), tabs (made by hitting the tab key), or linebreaks (made by hitting Enter). When used in an HTML document, any run of these is treated as a single space, so the layout you typed is ignored. Thus, the following
<p>******This
********is
*********a
staircase.</p>
will look, in a browser, like the following (without the border, if you see one)
******This ********is *********a staircase.
which is not what you wanted. There are a few tags that can help you. The <br> tag adds a linebreak to the page; so
<p>******This<br>
********is<br>
*********a<br>
staircase.</p>
comes out as
******This
********is
*********a
staircase.
which is still not what you wanted, but, hey, it's better than before.
Another tag that can help you is <pre>, which stands for preformatted and means that any whitespace you put in will show up in the document as it's shown by the browser. Thus,
<pre>******This
********is
*********a
staircase.</pre>
will come out as
******This
********is
*********a
staircase.
which is precisely what you want.
I will now discuss a few oddball things that I didn't get around to discussing elsewhere on this page.
The <hr> tag makes a line across the screen; it has no corresponding </hr>.
Since one left-angle-bracket (<) and one right-angle-bracket (>) are used in each tag, if you really want to include one of them in your document so that people can see it (for example, if you're writing about HTML or math), you can't just type it in as is. (If you do, the program reading your page will think it's part of a tag.) Instead, for a left-angle-bracket ("less-than" sign), use &lt;, and for a right-angle-bracket ("greater-than" sign), use &gt;. The same is true if you want to include these angle-brackets in, for example, an alt value. Likewise, if you want to include an ampersand (&) anywhere, you can't just type it in as is. (If you do, the program reading it will think it's the beginning of something like &lt;.) Instead, you have to use &amp;. The same is true when the ampersand appears in the value of, say, an alt attribute or an href attribute. Also, if you want to include a double-quotation-mark (") in the value of an attribute, you can't just type it in as is. (If you do, the program reading it will think that that's the end of the attribute's value. Recall that attributes are surrounded by double-quotation-marks.) Rather, use &quot;. Note that although HTML tags can be written using capital or lowercase, these four things -- &lt;, &gt;, &amp;, and &quot; -- must be written in lowercase letters only.
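Here is a sketch of these four escapes in use: &lt; for a left-angle-bracket, &gt; for a right-angle-bracket, &amp; for an ampersand, and &quot; for a double-quotation-mark inside an attribute value. The sentences and the example.com URL are made up for illustration.

```html
<p>In math, 3 &lt; 5 and 5 &gt; 3.</p>
<p>A paragraph starts with the &lt;p&gt; tag.</p>
<p>Tom &amp; Jerry</p>
<p><a href="http://www.example.com/search?a=1&amp;b=2"
title="The &quot;search&quot; page">Search</a></p>
```

In a browser, the second paragraph should read "A paragraph starts with the <p> tag.", with the angle-brackets visible rather than treated as a tag.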
There are a few attributes that can be stuck into almost any tag. Two of these are lang and title. Lang indicates the language of the tag's contents, and can be used in any tag that has a partner closing tag (e.g., <p> ... </p>), but not (in general) in a tag that stands alone (e.g., <hr>). Lang can be used in the <html> tag, but need not be used anywhere else unless the page is partially in one language and partially in another. Title can be put into any tag, but is most useful for the <a>, <link>, and <img> tags. In an <a> (or <link>) tag, title indicates information about what the link does or what it links to (except where the tag is being used for sending e-mail; see above). In an <img> tag, title can indicate the title of the picture. In an <abbr> or <acronym> tag, title can indicate what the acronym or abbreviation stands for. In general, title provides information about the tag in question.
These two attributes -- lang and title -- as well as style and class, which I'll discuss later, can also be put into the <div> and <span> tags. These two tags serve no other purpose than to hold such attributes. The difference between <div> and <span> is that <div> is used around a large body of text -- one comprising at least one paragraph or list, say -- whereas <span> is used within a paragraph or list.
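As a sketch of how <div> and <span> carry these attributes, suppose a page in English quotes a passage in French and a phrase in Latin. The text and the expansion in the title attribute are invented for the example; fr and la are the language codes for French and Latin.

```html
<div lang="fr">
<p>Je pense, donc je suis.</p>
<p>C'est une phrase de Descartes.</p>
</div>
<p>In Latin the same thought is <span lang="la">cogito, ergo sum</span>, a phrase
known even to readers of <abbr title="HyperText Markup Language">HTML</abbr>
tutorials.</p>
```

The <div> surrounds whole paragraphs, while the <span> marks a few words within one, matching the distinction drawn above.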
Realize that there are several aspects of HTML that I did not even touch upon. Forms, frames, image maps, and tables, among other things, are discussed by the W3C. But the contents of this tutorial should be enough to start you on a decent Web site.
After you set up the logical layout of a document with HTML, you can style it; for this you use CSS, which I will now introduce you to. Here, too, I will not give you all the information that exists. For whatever I've left out, you can refer to the W3C page on CSS.
CSS is made up of rules. Each rule indicates what it (the rule) affects, and then what it does. One rule might be, "I will affect every <h1> tag. I will make them all blue." (A rule is not actually written like that; I just used it as an example. See the next paragraph for how to write rules.) It first indicates what it affects -- <h1> tags -- and then states what it does. These rules are meant to change the way things in an HTML document look; they do not change the logic of the document.
Rules are written as follows: First write what the rule affects, then a left-curly-brace ({), what the rule does, and a right-curly-brace (}). Between the two curly braces are one or more declarations -- things that the rule does. For example, "make it blue", "make it larger", "make it come out on the right side of the page" are all declarations. (They're not actually written like that; I just use those as examples.) If a rule has more than one declaration, the declarations must be separated by semicolons (;).
A rule can affect a certain tag; in that case, you can simply write the name of the tag before the left-curly-brace. An example is p { color: black }. But let's say you have two different types of, say, paragraphs on your page; some are important, and you want those in red, and others are not as important, so they should be in black. How do you do that?
Well, HTML4 has an attribute called class. The value for this attribute can be anything you want (but only use letters in the value). An important paragraph, then, can start with <p class="impor">, and an unimportant one with <p class="unimp"> or simply with <p>. How does this help? Well, you can use CSS to indicate that a paragraph of a certain class should be red and of another class should be black. In our example, you'd write
p {color:black}
p.impor { color:red}
If you want the contents of every tag with class="impor" to appear in red, and not just a paragraph, you can write a rule like .impor { color:red }. (W3C discusses how to represent individual colors. Most colors require coding [that is, you must use a special code rather than a name of the color]. If you have a browser that shows colors, you can see the code for any color you may want to use.)
If you want the rule to affect one tag only when it's within a certain other tag, write the outside tag, then the inside tag, and finally the declarations in their curly-braces. For example, say you want a link to be red if it's in an unimportant paragraph, but black in an important paragraph. Then you can use the following:
p {color:black}
p.impor { color:red }
p a{ color:red}
p.impor a { color:black}
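To make the effect of those four rules concrete, here is a sketch of HTML they might be applied to. The sentences and the example.com URL are invented, and the colors described assume that the four rules above are the only style rules in effect.

```html
<p class="impor">This important paragraph is red, but
<a href="http://www.example.com/">the link inside it</a> is black.</p>
<p>This ordinary paragraph is black, but
<a href="http://www.example.com/">the link inside it</a> is red.</p>
<p class="unimp">No rule mentions the class unimp, so this paragraph
is styled like any other paragraph.</p>
```

The link in the first paragraph matches both p a and p.impor a; the more specific rule, p.impor a, is the one that applies.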
If you want precisely the same rule to affect two different tags (or classes), simply put a comma between the tag (or class) names. For example: h1 ,h2 {color: blue }.
All the names of the declarations that you will need are available at the W3C page on CSS. However, I will give you a few of them: Font and its related rules change the font face, font size, italicization, small-caps status, and font thickness of text; color changes the text color; these rules do not change the size of images, or anything else but text. Background and its related rules change the background on the page. Text-align changes the alignment of text within a <p>, say, or a <pre>; note, though, that it can only be used on tags (such as <p>, <pre>, and <h1>) which automatically add a line-break before and after themselves; it cannot be used on, say, the <big> tag. Text-decoration indicates whether text (this rule only affects text) should have a line under, over, through it, or none of the above, or whether it should blink.
Another interesting thing you can do with CSS is make the first letter or first line of something (say, a paragraph) have a certain style (say, be bigger or in a different color). Or, you can make links have different styles depending on whether they have been visited already or not, or depending on whether the mouse is currently over the text that is a link. These are discussed in the W3C's treatment of CSS; see there.
Note that whether you include spaces (or line breaks) before or after any of the colons, semicolons, or curly-braces in a rule -- or between rules -- is of no consequence.
The one thing I have not told you is how to include a style sheet in an HTML document. There are four ways, of which I will tell you two (the others are not very useful, I find). One way is to write up the style sheet (e.g.,
body { color:#000000; background-color: #ffffff}
h1 { text-align: left;font-size: larger; font-weight:bolder }
h2 {text-align: left; font-size: larger;font-weight: normal }
ol { list-style-type: decimal; list-style-position: outside }
tt, pre { background-color: #ffffcc }
a:visited, a:link, a:active { text-decoration: none }
a:hover { text-decoration: underline }
img { border-style: none }
a img { border-style: none }
.banner { display: block; font-size: smaller }
) and save it as a .css file (e.g., whatever.css). Then refer to it in a <link> tag as follows: <link rel="stylesheet" type="text/css" href="whatever.css">, where whatever.css is replaced by the URL of your .css file, and the rest remains exactly as it is.
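In context, a page that uses such a style sheet might look like the following sketch. It assumes that whatever.css is saved in the same directory as the page and contains the rules shown above; the title and body text are made up for the example.

```html
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>A Styled Page</title>
<link rel="stylesheet" type="text/css" href="whatever.css">
</head>
<body>
<h1>A Styled Page</h1>
<p class="banner">This paragraph has the class banner.</p>
<p>Some <tt>monospaced text</tt> with a yellowish background.</p>
</body>
</html>
```

Because the <link> tag belongs to the head, it goes between <head> and </head>, not in the body.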
After doing that, there may yet be some occasional spots in your document where you want the general style sheet not to affect the tag in question. For example, say you wrote tt { background-color: #ffffcc}, so that every <tt> tag has a yellowish background color, and you want a specific <tt> tag to have a white background. Then you can add a style attribute directly into the <tt> tag in the HTML document, writing <tt style="background-color:white">; note the absence of the curly-braces.
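A sketch of that override in a paragraph, assuming the tt rule above is in the page's style sheet (the sentence itself is invented):

```html
<p>This <tt>text</tt> gets the yellowish background from the style sheet, but
<tt style="background-color:white">this text</tt> has a white background.</p>
```

Only the second <tt> tag is affected; every other <tt> on the page still follows the style sheet.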
Both HTML and CSS allow for comments. These are parts of the document which in no way affect the document. That is, if there's a comment in the HTML coding right in the middle of this sentence, you can't tell by looking at the sentence in your browser. What, then, you ask, is the point of a comment? One purpose may be to remind yourself of changes to the document you'll want to make later. But whatever their purpose, they are allowed in both HTML and CSS.
In HTML, a comment is included in a <!> tag, which has as its attributes individual comments. Each comment must begin with a double-hyphen (--) and end with a double hyphen (--), but there can be as many comments as you like in each comment (<!>) tag. An example of a comment tag would thus be <!-- This is a comment. -- --This is another-- >. Note that whereas the name of every tag and its first attribute must always be separated by whitespace (e.g., a space), the name of the <!> tag and its first comment may not be separated by whitespace. I know this makes no sense, but that's the way it is.
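A cautious sketch of comments in practice follows. It uses only the simplest form, one comment per tag with no double-hyphen inside the comment's text, since the HTML 4.01 specification advises authors to avoid adjacent hyphens inside comments. The wording of the comments is made up.

```html
<p>This sentence has a comment <!-- remember to rewrite this sentence --> right
in the middle of it.</p>
<!-- A comment can also stand on its own,
     and it can run over more than one line. -->
```

In a browser, the paragraph should read as an ordinary sentence, with no sign of either comment.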
In CSS, each comment begins with a slash and an asterisk (/*) and ends with an asterisk and a slash (*/). But do not separate the slash from the asterisk with even so much as a space. Thus, a style sheet can look like:
body { color:#000000/*black*/; background-color: #ffffff/*white*/}
h1 { text-align: left;font-size: larger; font-weight:bolder }
h2 {text-align: left; font-size: larger;font-weight: normal }
ol { list-style-type: decimal /*1,2,3,4,...*/ ; list-style-position: outside }
tt, pre { background-color: #ffffcc /* yellowish */}
a:visited, a:link, a:active { text-decoration: none }
a:hover { text-decoration: underline }
img { border-style: none }
a img { border-style: none }
.banner { display: block; font-size: smaller }
There is much I haven't written, but it's available from the sites I've linked to.