A gentle, rambling HTML tutorial
Introduction
It's not always appreciated that when you use a Browser, such as Internet Explorer, what
you are actually doing is running a computer program - called, not surprisingly in that
case, Internet Explorer! (There are many other Browsers; some better, some better
still.)
As everybody knows, most computer programs require data in order to work.
The basic set of data a Browser needs is a Web Page. (Key concepts, such as Browser and
Web Page, will, for emphasis, receive capital letters.)
Computer programs are usually pretty finicky about their data: they like the data to be
just so or, in the jargon, in the correct format.
A Browser is no exception: if you present it with garbage for data, the results will be
unpredictable - or, actually, only too predictable! Remember GIGO: Garbage In Garbage
Out.
So, if for some crazy reason you want to produce Web Pages, you have to try to humour the
Browser by getting the format right.
That's where HTML comes in.
Some preliminaries about HTML
The idea of HTML is to provide you with a set of rules for writing a Web Page so that a
Browser will know what to do in order to produce the display that you intend.
Why is it called HTML? The letters stand for HyperText Markup Language. Three little
words, so let's take them one at a time - in reverse order, if you don't mind.
My dictionary says that Language is 'the method of human communication, either spoken or
written, consisting of the use of words in an agreed way'. So that's alright, then! HTML
is the method of communicating with a Browser in an agreed way, so that the poor Browser
can try to do what you intend. The good news is that, as with many languages, it's not
difficult to get started with HTML - trust me!
Inexplicably my dictionary has never heard of Markup. In prehistoric times - that is,
before the Internet - Markup was the standard way of communicating Text instructions to
typists or printers. (Nothing to do with increasing prices, by the way!) Markup consisted
of an agreed set of symbols - it was hardly a Language, really - and there were rules for
using the symbols. Similarly HTML has an agreed set of symbols, called Tags, and a set of
rules for assembling these Tags into a Web Page. A Tag consists of one or more symbols -
typically a word, but not always - enclosed in a pair of angle brackets, such as
<a>, <br />, <table> and so on.
Finally HyperText: fancy word; simple idea. HyperText allows you to tell the Browser to
jump about, either within a Web Page or between one Web Page and another. It is no
exaggeration to claim that HyperText is by far the most important concept behind the
World Wide Web; there would be no Web to speak of but for HyperText. Imagine only being
able to look at a single Page of Text, without the ability to Click and thus jump to
somewhere else. For some reason, the Tag in HTML which effects the jump is called an
Anchor and that's what the aforementioned <a> Tag is about. All will be made
clear!
The ideas encapsulated in HTML have been around ever since the World Wide Web first began. We
needn't go into too much history but, as with any Language, HTML has developed over the
years. The dialect many people use is HTML4.01, which is certainly fine for producing
perfectly adequate results but, from the point of view of software purists (such as your
humble servant), it is a shade unhygienic - and where would we be without hygiene? The
version I normally use these days is called XHTML, standing for eXtensible HTML; to be
precise, XHTML1.1.
As with any other speciality, life in computerland is replete with grotesque acronyms.
It's worth being aware that there is a sort of governing body of recognised standards on
the Internet - nothing trivial such as moral standards, but relating, for example, to
what should or should not be recommended components of HTML in its various incarnations.
This body is the W3C, which stands for the World Wide Web Consortium. They try to
persuade the writers of Web Browsers and Web Pages to adhere to a common set of
standards. Needless to say, they have a Web site which, among other things, offers an
extremely useful free on-line HTML validation service, enabling HTML authors to quickly
check for any error or lack of conformity with whatever standard is being claimed.
Incidentally, the W3C tries to set and maintain software standards in relation to a
variety of trinkets, such as mobile 'phones and so on.
The great pioneer of the Web is the British scientist Tim Berners-Lee, who, while at the
particle physics laboratory CERN near Geneva, came up with, amongst other things, the idea
of combining HyperText with the Internet: the World Wide Web. He founded, and for many
years directed, the W3C.
OK, enough waffle. Let's see some HTML.
A skeletal Web Page
The following is the general shape of a properly written Web Page document:
[One or two lines telling the Browser what HTML or XHTML standard we are
adopting so that it knows how to interpret our instructions in our document.
We'll come to these lines shortly.]
<html>
<head>
<title>A skeletal Web Page</title>
</head>
<body>
[A sequence of lines of HTML Tags telling the Browser what to display.]
</body>
</html>
and that's all there is to it!
Well, OK, maybe a few bits are missing - such as any content within the <body>
whatsoever - but you get the overall structure. One or two comments are in order.
First, notice that the encompassing <html> Tag is opened at the top and closed
with </html> at the bottom. The <html> and </html> Tags form an outer pair enclosing
everything else, marking out the part of the document that contains the HTML Tags.
Next, observe that, within the <html>...</html> pair there are two
other main Tag pairs: the <head>...</head> pair and the
<body>...</body> pair.
The <head>...</head> pair contains descriptive information about the
document, such as its <title>...</title>, which most Browsers display in the title bar
(or tab) of the window rather than on the Page itself.
The <body>...</body> pair obviously contains the real meat, the
document's displayable content.
Although there are about a hundred different Tags (in XHTML1.1), there's a core of about
twenty of them that are worth being familiar with. That's not too many, is it? You have
already met about half a dozen of them.
Provided you obey the appropriate set of HTML rules, most Browsers are not concerned with
the layout of your source document: a rat's nest that toes the HTML line is perfectly
acceptable to a Browser. For the sake of the human reader, though, you would do well to
write prettily, in a clearly structured manner, for example by indenting each Tag that
occurs within another, rather like the layers of an onion. Programming - for that is what
we are doing - is best regarded as an artistic endeavour.
You can always see how the author of any Web Page on the Internet has done it: just choose
Source (or Page Source, in some Browsers) from the View menu at the top of the window. It's a wonderfully
instructive revelation, sometimes - and, sometimes, a huge disappointment! It's also a
great way of pinching (sorry, sharing) other people's ideas; don't worry, everybody does
it on the Internet!
Finally, note that it is now standard to write Tag names in lowercase - not in UPPERCASE,
although many Browsers still allow this; XHTML, in fact, insists on lowercase. (By the way, people who habitually communicate
over the Internet in UPPERCASE - the equivalent of SHOUTING - are regarded within the
Internet community as somewhat lacking in the social graces!)
A few practicalities
Lest we forget: we've seen a bit of what you do, but how do you do it, using what, and
with what result?
Perhaps the simplest tool for producing a Web Page is a Text editor. Windows PCs come with
two of these free: WordPad, which is more or less OK for Web Page authoring (though pretty
grim for word processing), and NotePad, which is pretty basic. I recommend the former, at
least to start with. Both can be found in the Accessories folder of the Programs menu,
under the Start button.
I suppose I ought to mention that there are proprietary tools for designing and writing
Web Pages. They have two big disadvantages: you have to pay for them; and they usually
produce execrable HTML. (There's no law which states that someone writing a tutorial may
not have a dyspeptic opinion of something, is there?)
Having produced your wondrous work, say using WordPad, you need to save it as plain Text
(WordPad's 'Text Document' option, rather than its default Rich Text Format) in an
appropriately named file. Again, for PC users, this means giving it the correct
Extension, namely .htm or .html. (These extensions are a reflection of the fact that
Windows sits on top of DOS - which requires extensions - which sits on top of ... and so
on.) So, you might end up with a file called Masterpiece.htm or Masterpiece.html.
Finally - and very importantly - one Browser does not behave like another, believe it or
not. You would be amazed how differently they display a given Web Page. It is therefore
necessary to inflict your masterpiece on as many Browsers as you can lay your hands
on.
Document Types and other highly important rubbish
Let's quickly dispose of the one or two strange lines that come before the
<html> Tag.
We are talking about letting the Browser know what HTML dialect we are supposed to be
using: for example, if we are using strict HTML4.01, the first line of our document
should be
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Don't ask what it all means, just do it!
If we are using the strict XHTML1.1 dialect, then start the document with the two
lines
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
The first of these two tells the Browser that the document is written in XML and that its
characters are encoded using the ISO-8859-1 (Latin-1) set, which covers English and most
Western European languages: no funny Japanese or Russian characters, for example.
Unfortunately, if you use XHTML1.1, the <html> Tag needs to be a bit more
complicated:
<html xmlns="http://www.w3.org/1999/xhtml">
Again don't ask; just copy.
If you really must know more, see the subject of Document Type Definitions, DTDs, on the
W3C Web site. There you will also learn that an even bigger and better HTML dialect,
called XHTML2.0, was on the stocks for a while, until work on it was abandoned in 2009 in
favour of HTML5.
Another Page turned in life
So a Web Page in strict HTML4.01 looks a bit like
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>A skeletal Web Page</title>
</head>
<body>
[A sequence of lines of HTML Tags telling the Browser what to display.]
</body>
</html>
and a Web Page in XHTML1.1 looks a bit like
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A skeletal Web Page</title>
</head>
<body>
[A sequence of lines of HTML Tags telling the Browser what to display.]
</body>
</html>
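To try this out, here is a sketch of a minimal but complete XHTML1.1 Page, with some actual content in place of the bracketed placeholder. The title and the single Paragraph (using the <p> Tag, which we meet properly below) are made up for illustration. Save it as, say, Masterpiece.html, open it in as many Browsers as you can, and feed it to the W3C validation service mentioned earlier.

```html
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- The title usually shows in the Browser's title bar or tab -->
<title>My Masterpiece</title>
</head>
<body>
<!-- In XHTML1.1, Text in the body must sit inside a block Element such as a Paragraph -->
<p>Hello, World Wide Web!</p>
</body>
</html>
```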
So: already we have a dialectical dichotomy and, if you're confused by that, so you
should be and, what is more, prepare to be confounded; there are several more dialects,
none of which you need take any notice of whatsoever.
Notice, by the way, that we have sneaked in here our first example of a Tag which
includes an Attribute, so that, instead of a mere
<html>
we have the far more impressive
<html xmlns="http://www.w3.org/1999/xhtml">
where the xmlns is an Attribute and the equality sign announces the Attribute Value. An
Attribute modifies the meaning of a Tag. Attributes require Attribute Values that are set
with an equals sign and are enclosed within single or double quotes.
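A Tag may carry more than one Attribute. As an illustration, here is the XHTML1.1 <html> Tag with a second Attribute, xml:lang, which declares the natural language of the document (English, in this made-up case). One Value uses double quotes and the other single quotes; either kind is acceptable, provided each Value opens and closes with the same kind.

```html
<!-- Two Attributes, each with its Value in quotes; their order does not matter -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang='en'>
```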
Gosh, we are making progress!
Get a head
Whilst the <head>...</head> pair must contain a <title>...</title> pair, several other
Tags might also appear, but mostly don't.
While you are not looking, maybe this is a good moment to sneak in another bit of
confusing terminology: Element. Up to this point I have talked about pairs of Tags, such
as the <head>...</head> pair. The strictly correct term for a pair of Tags, together with
whatever lies between them, is an Element. So we should really talk about the
<head>...</head> Element.
A fairly simple <head>...</head> Element might look like
<head>
<title>A fairly simple head</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<link rel="stylesheet" href="stylesheet.css" type="text/css" />
</head>
which gives us the excuse to talk, briefly, about the two new Elements: <meta /> and <link />.
Whatever they mean, the first thing to notice is that neither of them comes with a
closing friend - there is no such thing as </meta> or </link>. This
is because both these Tags are so-called Empty Elements. Empty Elements do not enclose
content and thus need no close Tags. In fact, as you can see, they contain their own
closure. The space before the / character is not a misprint: XHTML itself does not insist
on it, but the W3C recommends it so that older Browsers are not confused by the closure,
and it is a habit worth getting into.
Perhaps the simplest example of an Empty Element is <br /> which, as you might
guess, is used to insert a line Break on the Web Page.
Before we sketch their meanings, note that both <meta> and <link>
have Attributes, just as we saw with the XHTML version of <html>. Note also
that, whereas the title Element caused something visible to happen - namely, a title to
appear in the Browser's title bar - the effects of these two Elements are not
directly visible: they work behind the scenes, so to speak.
The <meta> (short for meta-information) Element has a number of functions. A
common use is to enable so-called indexing tools, such as the Google search engine, to
quickly identify information about the Web Page and, indeed, about the Web site
containing that Page - assuming that you want such publicity. Another common use is for
so-called Client-Pull Page loading, enabling a document to automatically load another
document after a specified delay (say 10 seconds). The above example illustrates its use
in telling your Browser (at the bits and bytes level) that you are using a particular set
of characters, namely those set out in the ISO-8859-1 definition (basically, the usual
set for writing in English and most other Western European languages).
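To make those uses a little more concrete, here is a sketch of a <head> carrying several <meta> Elements. The description and keywords Values are invented, and next.html is a hypothetical file name; how much notice any particular search engine takes of such information is up to the search engine.

```html
<head>
<title>Meta examples</title>
<!-- Character set, exactly as in the example above -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<!-- Information intended for indexing tools such as search engines -->
<meta name="description" content="A gentle, rambling HTML tutorial" />
<meta name="keywords" content="HTML, XHTML, tutorial" />
<!-- Client-Pull: after 10 seconds the Browser loads next.html (a hypothetical Page) -->
<meta http-equiv="refresh" content="10; url=next.html" />
</head>
```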
The <link> Element specifies relationships between the current document and
other documents. In modern HTML authoring this frequently means linking the document to a
Style Sheet. This is a whole topic in itself, rather exotically called Cascading Style
Sheets (CSS), which we can only touch upon in a skimpy account of the subject. The idea
of CSS is to separate matters of style (such as which fonts to use, or whether some of
the Text should be in red or blue) from those of document structure. A Style
Sheet, linked to by means of the <link> Element, will contain the appropriate
instructions to effect this.
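As a rough illustration of the idea, the sketch below asks for an Arial top-level Heading and for some red and some blue Text. The style rules are shown inside a <style> Element in the head purely so that the example fits in one place; in practice the same rules would more usually live in the linked file (stylesheet.css here, a hypothetical name). The class names red and blue are likewise made up.

```html
<head>
<title>Style kept apart from structure</title>
<!-- External Style Sheet: stylesheet.css is a hypothetical file -->
<link rel="stylesheet" href="stylesheet.css" type="text/css" />
<!-- The same kind of rules, written inside the document -->
<style type="text/css">
  h1    { font-family: Arial, sans-serif; }
  .red  { color: red; }
  .blue { color: blue; }
</style>
</head>
<body>
<h1>A Heading in Arial</h1>
<!-- The class Attribute ties each piece of Text to one of the rules above -->
<p>Some <span class="red">red</span> Text and some <span class="blue">blue</span> Text.</p>
</body>
```

The structure (a Heading, a Paragraph) stays the same whatever the rules say; change the Style Sheet and the whole Page changes its clothes.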
That's enough on the head of the Web Page beast; now for its Body.
Vile body
Arguably the basic ingredient of a Web Page is Text; after all, the T of HTML does stand
for Text and, not surprisingly, there are more HTML Tags dealing with the processing of
Text than anything else. For example, Text can appear in a variety of sizes, of use in
the creation of Headings; it can be laid out in Paragraphs; it can be subject to Breaks;
it can be italicised or emboldened; it can even come in a variety of colours.
To illustrate, the Heading Tag comes in six levels, <h1> to <h6>, with the following effects:
<h1>Heading level 1</h1>
displays as Heading level 1
namely, quite large, whereas
<h6>Heading level 6</h6>
shows up as Heading level 6
quite a bit smaller.
You might have thought the original designers of HTML would have chosen the smaller
number for the smaller size - but no: the number is a rank, as with chapters and
sub-sections, so level 1 is the most important Heading and, accordingly, the largest.
The effect of a Break in a line is to cause the line to <br />
drop to the line below.
<pre>
It's a slight oddity that Browsers treat multiple spaces,
returns,
tabs and other formatting characters as a single space.
So, if you want the Text to appear, Preformatted, more or less
as you typed it,
then the <pre>...</pre> Element is what you need.
</pre>
<p>
It's sometimes nice to group one or more parts of your Web Page
by use of the Paragraph Tag <p> and this has the effect of causing the Browser
to insert a blank line before and after each group.
</p>
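The italicised and emboldened Text mentioned earlier, together with a Break, might look something like this inside a pair of Paragraphs. The <em> (emphasis) and <strong> (strong emphasis) Elements are usually, though not necessarily, shown as italic and bold respectively; XHTML1.1 also offers <i> and <b> for plain italic and bold.

```html
<p>
This Paragraph contains <em>emphasised</em> Text and
<strong>strongly emphasised</strong> Text.<br />
The Break above starts a new line without starting a new Paragraph.
</p>
<p>
A second Paragraph, which the Browser separates from the first by a blank line.
</p>
```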
There are several more Text Markup Elements (or, in XHTML1.1-speak, the Text Module), but
that's about enough for a lazy introduction, such as this, so we'll just mention one last
fun Element: acronym. In a Browser, holding the cursor over an acronym marked up this way
for a few moments pops up its meaning; XML, for instance, pops up eXtensible Markup Language.
How is that done? Like this:
<acronym title="eXtensible Markup Language">XML</acronym>
in which you will also notice another example of an Attribute (namely, title), together
with its Attribute Value (in quotes).
Listmania
People love making lists; HTML is list heaven for such people because there are so many
tricks one can play. Here we shall content ourselves with just two examples. HTML lists
come in two basic forms: Unordered, using the <ul> Tag, and (you've guessed it)
Ordered, using <ol>. Each comprises a set of List Items marked by the <li> Tag.
An example of an Unordered List: the HTML sequence
<ul>
<li>This Unordered List Item</li>
<li>That Unordered List Item</li>
<li>The other Unordered List Item</li>
</ul>
displays as
- This Unordered List Item
- That Unordered List Item
- The other Unordered List Item
And an example of an Ordered List: the HTML sequence
<ol>
<li>The first Ordered List Item</li>
<li>The second Ordered List Item</li>
<li>The third Ordered List Item</li>
</ol>
gives
1. The first Ordered List Item
2. The second Ordered List Item
3. The third Ordered List Item
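One of those tricks is nesting: a List Item may itself contain a whole List. As a sketch, this Ordered List has an Unordered List inside its second Item. Note that the inner <ul>...</ul> sits inside the <li>...</li> Element, not between two List Items.

```html
<ol>
  <li>The first Ordered List Item</li>
  <li>The second Ordered List Item, which contains its own List:
    <ul>
      <li>A nested Unordered List Item</li>
      <li>Another nested Unordered List Item</li>
    </ul>
  </li>
  <li>The third Ordered List Item</li>
</ol>
```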
Turning the tables
It would be difficult to exaggerate the importance of the <table> Tag, together
with its associated Tags, three of which we touch upon here: not only does it enable the
obvious function of laying out a simple set of data; it actually provides a convenient
structure within which to design entire Web Pages. In this sketch of the subject we can
only illustrate the former and merely mention the latter.
A <table>...</table> Element contains a set of <tr>...</tr> (Table Row) Elements.
Within a <tr>...</tr> Element there is a set of <td>...</td> (Table Data) Elements,
one for each item of Data.
It's sometimes nice to grace each column with a Table Heading Element:
<th>...</th>.
A simple table:
<table border="0">
<tr>
<th> &nbsp; </th>
<th> Column1 </th>
<th> Column2 </th>
</tr>
<tr>
<td> Row1 </td>
<td> Datum11 </td>
<td> Datum12 </td>
</tr>
<tr>
<td> Row2 </td>
<td> Datum21 </td>
<td> Datum22 </td>
</tr>
</table>
displays as
|      | Column1 | Column2 |
| Row1 | Datum11 | Datum12 |
| Row2 | Datum21 | Datum22 |
Only moderately exciting - but it's progress. It's probably best to leave the reader to
figure out what is happening by comparing input with displayed output. The only
non-obvious item is the rather strange &nbsp; in the third line, standing for
Non-Breaking SPace, which is commonly used to fill in a 'vacant' cell in a table, here
the one in the top-left corner.
A slightly different simple table:
<table border="1">
<tr>
<td rowspan="3"> TallDatum </td>
<td> Datum11 </td>
<td> Datum12 </td>
<td> Datum13 </td>
</tr>
<tr>
<td> Datum21 </td>
<td> Datum22 </td>
<td> Datum23 </td>
</tr>
<tr>
<td> Datum31 </td>
<td> Datum32 </td>
<td> Datum33 </td>
</tr>
</table>
displays as
| TallDatum | Datum11 | Datum12 | Datum13 |
|           | Datum21 | Datum22 | Datum23 |
|           | Datum31 | Datum32 | Datum33 |
(TallDatum occupies a single cell stretching down all three rows.)
You probably get the general idea of the <table>...</table> Element.
Try to imagine this greatly expanded to lay out a whole Web Page.
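The horizontal counterpart of rowspan is colspan, which stretches a cell across several columns. Here is a sketch, with made-up cell contents, of a Heading spanning all three columns of a small table.

```html
<table border="1">
  <tr>
    <!-- colspan="3": this one Heading cell occupies three columns -->
    <th colspan="3"> WideHeading </th>
  </tr>
  <tr>
    <td> Datum11 </td>
    <td> Datum12 </td>
    <td> Datum13 </td>
  </tr>
</table>
```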
There is another popular way of laying out Web Pages which you should, at least, have
heard of: it's called a Frame. In this method, several separate partial Pages are
combined to produce one complete Web Page.
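Frames are not part of strict HTML4.01 or of XHTML1.1, so a framed Page needs its own Document Type, HTML4.01 Frameset. The sketch below splits the window into two columns; menu.html and main.html are hypothetical partial Pages, and the <noframes> part is what a Browser without Frames support would show instead.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>A framed Web Page</title>
</head>
<!-- The frameset replaces the body: 25% of the width for the menu, 75% for the main Page -->
<frameset cols="25%,75%">
  <frame src="menu.html" name="menu">
  <frame src="main.html" name="main">
  <noframes>
    <body>
      <p>This Page uses Frames; please go straight to the <a href="main.html">main Page</a> instead.</p>
    </body>
  </noframes>
</frameset>
</html>
```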
Pulling up the Anchor and jumping ship
Now for the big one: this is what makes the Web the Web. No doubt much to the
relief of the reader, it's the last bit of HTML we shall expand upon: the famous Anchor
Tag <a>. This Tag, when used in the form of the <a>...</a>
Element, indicates the portion of the Web Page that is a HyperLink and, through its href
Attribute, names the target destination for that HyperLink. This is what it looks like:
<a href="http://www.w3.org/TR/xhtml11/doctype.html">W3C XHTML1.1 Tags list</a>
and this is what you see on your Web Page:
W3C XHTML1.1 Tags list
and, assuming that the target destination actually exists, you will be transported there
in a trice or two.
You can also use the Anchor Tag to leap about in the current Web Page:
Back to Top
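How does the Back to Top trick work? A sketch, assuming the target is the Page's main Heading: give the target Element an id Attribute, then point the Anchor's href at that id preceded by a # character. The value top is simply a name chosen for illustration. In XHTML1.1 the id Attribute is the way to mark such a target, the older name Attribute of <a> having been dropped.

```html
<body>
<!-- The target: any Element with an id Attribute will do -->
<h1 id="top">A gentle, rambling HTML tutorial</h1>
<!-- ... the rest of the Page ... -->
<!-- href="#top" jumps to the Element whose id is "top" on this same Page -->
<p><a href="#top">Back to Top</a></p>
</body>
```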
Java jive
As we have seen, HTML (or XHTML) is a fairly crude Language which ultimately grew out of
the primitive Markup idea, and it is still rather limited. It is possible to beef it up
considerably by bringing in other technologies. It would not be appropriate to detail
these here, but mention might be made of JavaScript, Java, Flash, Perl, VisualBasic, etc.
For example, it's fairly easy to implement a simple graph-drawing capability within a Web
Page using JavaScript. This would be an example of so-called Client-Side computing:
a program embedded in the Page and running in the Browser on your own computer.
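Drawing graphs would take more space than we have, so here, as a much simpler stand-in, is a sketch of Client-Side computing: clicking the button runs a small JavaScript function, in your own Browser, which changes the Text of a Paragraph. The function name changeGreeting and the id greeting are invented for the example; the //<![CDATA[ ... //]]> wrapping is a common way of stopping XHTML's XML rules from tripping over characters in the script.

```html
<p id="greeting">Hello from plain HTML.</p>
<p><button type="button" onclick="changeGreeting()">Click me</button></p>
<script type="text/javascript">
//<![CDATA[
// This function runs on the reader's own computer, inside the Browser
function changeGreeting() {
  var p = document.getElementById("greeting");
  // The Paragraph's first child is its Text; replace that Text
  p.firstChild.nodeValue = "Hello from JavaScript, running on your computer!";
}
//]]>
</script>
```

No remote computer is consulted when the button is clicked, which is exactly what distinguishes this from the Server-Side approach.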
By contrast there is Server-Side computing: a program running on the remote computer on
which your Web Site resides. For example, a Perl program, running on the remote computer,
might count how many hits (visitors) each Page of your Web Site experiences and compile
appropriate statistics.
Infamous last words
To describe - as we have done - the foregoing account as an HTML tutorial is a gross
abuse of language. Basically, we have just walked by on the other side of the road. The
aim has really just been to familiarise the reader with a few relevant words from the
subject and to provide a bit of background to those words. The rest is up to the
reader.
One suggestion: there are loads of free (proper) tutorials out there on the Internet; try
Googling.
It's quite hard to quote any sort of appropriate reference: the books on the subject are
either too fat and expensive or too thin and superficial. The fat, expensive one many
professionals use is called:
HTML and XHTML: The Complete Reference, by Thomas A. Powell, published by
Osborne/McGraw-Hill
I suggest you persuade your local library to buy the latest edition of it!