HyperText Markup Language, HTML, is used to describe a document. While JavaScript has keywords and operators for declaring variables, initializing loops, and computing values, HTML only has keywords used to describe things like paragraphs, section headings, tables, and lists. You can’t compute with HTML, but you can describe web pages. Later, you can mix in JavaScript to make those pages more active and interactive.
If you write a document in Word, you can describe the typesetting style of a document directly. For example, you can select some words, set the font size to 134, make the font bold, and change the font color to purple. You can do that for the heading text for each section of the document, if you want.
The problem with setting styles in this way is that it is hard to maintain a uniform style across documents, or to change styles quickly as a group. In Word, if you later decide that the headings of your document should be orange and flashing in a gothic font, you’ll need to go through the entire document, select each piece of heading text, and then change the font size and style. And what if you miss one? That would be ugly.
HTML only lets you specify the intended use, not the style, of a section of text. You can specify that some text should be a heading. When the browser displays that text, it will use the style for headings. You can specify that style using a different language, CSS. By separating meaning (or semantics) and style, it is possible to display documents using consistent styles, and to change those styles in nice simple ways.
Tags, written using angle brackets, specify the intended use of text.
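A minimal sketch of such an example follows. The heading and paragraph text are placeholders chosen for illustration, not the original interactive example.

```html
<!-- Placeholder text: one top-level heading followed by two paragraphs. -->
<h1>A Sample Heading</h1>
<p>This is the first paragraph.</p>
<p>This is the second paragraph.</p>
```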
The example HTML uses <h1> to mark a section heading and <p> to mark a paragraph. Tags almost always come in pairs: a start tag like <h1>, and an ending tag with the same name following a forward slash, like </h1>.
It might seem annoying to have to start and end each paragraph using a paragraph tag, but the advantage is that the HTML parser (the program that reads and interprets the HTML) knows with perfect certainty that you intended the text within the paragraph markers to be a paragraph. Perhaps you prefer to typeset paragraphs with some extra vertical space between them (as is the current setting on this page and in the display of the Constitution), or perhaps you are publishing a book and paragraphs should not be separated. By marking paragraphs with tags, you can make the styling decision later.
You may have noticed the 1 in the <h1> tag. Section headings may be of different levels of importance. The top-level h1 tag specifies that this heading is of primary importance: perhaps it is the title of the document, or the title of a chapter. An h2 tag would specify a second-level heading. For example, a document might have sections with h2 headings, and those sections might have subsections with h3 headings. In general, it is considered good practice not to skip heading levels on the way down: if you used an h1 for a previous heading, you would typically not skip down to h3 directly for a subsection, but would use an h2 next.
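The heading levels described above might be laid out as in the following sketch. All of the titles are hypothetical placeholders; the point is only that each level steps down by one.

```html
<!-- Hypothetical document outline; every title here is a placeholder. -->
<h1>Document Title</h1>
<h2>First Section</h2>
<h3>A Subsection of the First Section</h3>
<h2>Second Section</h2>
```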
Objective: use heading and paragraph tags in an .html document.
Here’s a transcription of the Constitution, in case you have not yet memorized it. Write HTML code including the title (The Constitution…), up through the first two paragraphs of Section 2.
Here is a solution.
In the Constitution example above, each starting tag was followed by some text and then directly by its closing tag.
<p> We the People of the United States, in Order...</p>
What if we wanted to make We the People of the United States stand out, since in the original document, it’s written larger? We could use the <strong> tag to indicate that this text should be strongly emphasized when typeset, perhaps by making the text bold, or purple, or flashing orange.
Notice that We the People... is still part of the first paragraph, so it should be inside the paragraph tags, but since it is strong, it should be inside strong tags. The pattern is <p> <strong> some text </strong> </p>.
We say that the strong tags are nested inside the paragraph tags. Tags can be nested as deeply as needed, but some nestings don’t make sense; for example, you cannot nest a paragraph inside a paragraph.
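Applied to the opening of the Constitution, the nesting might look like the sketch below. The rest of the paragraph is left elided, as in the fragment shown earlier.

```html
<!-- The strong tags open and close inside the paragraph tags. -->
<p><strong>We the People of the United States</strong>, in Order...</p>
```

Note that the inner tag is closed before the outer one, so the two pairs never overlap.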
The pieces of HTML code we have seen so far are really just fragments of web pages. There is other information that the browser needs in order to present a web page correctly, and that search engines need to correctly index the page. Here is a complete example of a simple web page; when writing a new web page, you are welcome to copy-and-paste this text as your starting point. (You do not need to attribute the author, since this is a generic, standard structure, containing no intellectual contribution.)
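The following is a minimal sketch of what such a complete page could look like; it is not the original example. The title and body text are placeholders, and the lang attribute and the meta charset line are additions assumed here because validators commonly recommend them.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information for the browser; not shown in the page itself. -->
    <meta charset="utf-8">
    <title>My First Web Page</title>
  </head>
  <body>
    <!-- Everything to be shown on screen goes in the body. -->
    <h1>A Heading</h1>
    <p>A paragraph of text.</p>
  </body>
</html>
```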
Let’s look at the structure of the document. There are many types of documents on the web; it’s easiest for a browser to identify a document as an HTML page if the first line of the HTML document declares the document type. So, HTML documents should start with the line <!DOCTYPE html>. This is not really a tag, and so it appears without a paired closing tag. <!DOCTYPE html> is a declaration that precedes the markup itself; the rest of the document is HTML. Traditionally, the HTML part of the document is enclosed in an <html> </html> tag pair. Scroll down in the example to find the closing </html> tag.
Indentation is used to show nested tags. In the example, everything inside the <html> </html> tag pair is indented.
Complete HTML documents are separated into two sections, the header (marked by a <head></head> tag pair) and the body (marked by a <body></body> tag pair), both of which are nested inside the <html> </html> tag pair. Information for the browser is placed into the header and not displayed; the body contains the material to be shown on screen.
The document should have a title that will be displayed in the browser tab, or used as the name when you bookmark the page. You can set the title of the document using the <title></title> tag pair.
The title is not displayed anywhere in the page content itself; the browser uses it only for things like the tab label and bookmarks. Therefore, it occurs in the header section of the HTML document.
Do not confuse the words heading and header. The header is the first section of the HTML document where information for the browser is placed; headings, like those marked by the <h1></h1> tag pair, are used to put titles on parts of the content text.
Even if you make a mistake in your HTML, for example, by forgetting to add <!DOCTYPE html>, most browsers will take their best guess and display your page anyway. This is problematic, since different browsers might do different things, giving different users different experiences. It’s good practice to use an HTML validator to verify that your HTML code is valid. Try the validator now on a few favorite web pages, including this one. As of this writing, www.bloomberg.com has several warnings and errors. Oops.
The <ul> tag creates an unordered list of items, which might be displayed as a bulleted list. Each new item is described by a <li></li> tag pair. Here’s an example.
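A minimal sketch of an unordered list, with placeholder items chosen purely for illustration:

```html
<!-- Placeholder items; each one sits in its own li tag pair. -->
<ul>
  <li>Apples</li>
  <li>Bread</li>
  <li>Milk</li>
</ul>
```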
Because the list items are contained within the list, the items are indented. This is not a requirement, but is good stylistic practice – it makes the document easier for the original author or later maintainers of the web page to read.
An ordered list is typically used to display a list with numbers or letters preceding the items. Ordered lists are indicated using <ol></ol> tags; items are marked using <li></li> tags. Change the list in the above example into an ordered list; don’t forget to change the closing tag.
To write an outline, you can nest ordered lists inside of other lists:
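A minimal sketch of a two-level outline, assuming placeholder entries. Note that the inner list is placed inside the list item it belongs to, before that item's closing tag.

```html
<!-- Placeholder outline: the inner ol is nested inside the first li. -->
<ol>
  <li>Introduction
    <ol>
      <li>Background</li>
      <li>Motivation</li>
    </ol>
  </li>
  <li>Conclusion</li>
</ol>
```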
This outline includes only Arabic numerals, rather than letters or Roman numerals; a quick web search will tell you how to make this outline look more like a traditional outline, but we will not go into the details.
Objective: Modify HTML code.
Tables, denoted by <table></table> tag pairs, contain rows, marked by <tr></tr> tags. Each row contains table data entries, each marked by <td></td> tag pairs. Here is an example:
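A minimal sketch of such a table follows. The stock symbols, dates, and prices are all invented placeholders, not real data, and every cell is deliberately marked as plain table data.

```html
<!-- Placeholder stock table: symbols, dates, and prices are invented. -->
<table>
  <tr>
    <td></td>
    <td>Jan 2</td>
    <td>Jan 3</td>
  </tr>
  <tr>
    <td>AAA</td>
    <td>10.00</td>
    <td>10.50</td>
  </tr>
  <tr>
    <td>BBB</td>
    <td>20.00</td>
    <td>19.75</td>
  </tr>
</table>
```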
When you display the table, you’ll notice that the headers for the columns (dates) and rows (stock symbols) are not displayed any differently than any other table data items. Row and column headers should be marked with the <th> tag rather than the <td> tag. Fix the table now.
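One way the fix might look is sketched below, using invented placeholder symbols, dates, and prices. Only the cells that label a row or a column change to th; the empty corner cell and the values stay as td.

```html
<!-- Placeholder data: symbols, dates, and prices are invented. -->
<table>
  <tr>
    <td></td>
    <th>Jan 2</th>
    <th>Jan 3</th>
  </tr>
  <tr>
    <th>AAA</th>
    <td>10.00</td>
    <td>10.50</td>
  </tr>
</table>
```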
The first line of the last exercise starts with <!--. This is not really a tag, but rather, indicates the start of a comment that will not ever be displayed. Comments are notes for the human reader. Comments begin with <!-- and end with -->. Be careful: anyone can view the HTML source of a webpage that you put on the web, so these comments will be viewable to anyone who wants to see them, even though they do not show up on the browser’s standard display of the web page.
Tags in HTML surround text, whether that text is a paragraph, a heading, or an item in a list. But HTML documents also include links to other documents, or to images for display. The <img> tag is used to indicate that an image should be displayed. The <img> tag is not followed by a closing </img> tag.
To specify where the image that is to be displayed should be found, the <img> tag includes an attribute, src. The attribute is assigned a string value (using quotation marks) that indicates the URL where the image is located. The URL may be the complete web address of the image, as in the example, or it might be simply the name of the file, if the file is located in the same directory as the current web page.
The example <img> tag includes an optional attribute, style, which can be used to set the width and height to which the image should be scaled. (The original image of the Earth is 2048x2048 pixels, which is too large to display in this little window.)
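A minimal sketch of an image tag with both attributes follows. The URL is a placeholder on example.com rather than the address of the original Earth image, the 200-pixel dimensions are arbitrary, and the alt attribute (a text description of the image) is an addition assumed here because validators typically expect it.

```html
<!-- The URL is a placeholder; substitute the address of a real image. -->
<img src="https://example.com/earth.jpg"
     alt="The Earth seen from space"
     style="width: 200px; height: 200px;">
```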
The image file is not included in the .html file in any way. If you reference an image in the current directory, and then move the .html file somewhere else, the image will not follow the .html file, and no image will be displayed.
The web is always changing. If you use a complete image URL, and the image at that URL is no longer accessible at some point, no image will be displayed. To avoid this potential problem, it is often better to copy the image file into your directory and link to the new, local version.
There are tradeoffs when considering copying resources like images into your own directory, vs. linking to a remote resource from some other location. You may be charged for bandwidth (network transfer costs) for resources that you keep locally.
There are also copyright considerations. Do not, under any circumstances, assume that any particular image found on the web can be used by your web site, with or without attribution.
By default, the creator of an image retains copyright, and you cannot use that image on your site, whether included locally or linked remotely with an img tag. However, some images might have licenses that permit you specific rights. For example, the Creative Commons Attribution 4.0 license that many images on Wikipedia use may allow others to use images, as long as certain rules for attribution (citation) are followed carefully.
Links to other documents have two primary components: some link text (the words displayed in your browser), specified as the text between <a> and </a> tags, and the URL of the page you want to link to, specified using the href attribute of the <a> tag:
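A minimal sketch of a link nested inside a paragraph follows. The surrounding sentence is placeholder text, and the destination URL is given in the href attribute.

```html
<!-- The text between the a tags is what the reader sees and clicks. -->
<p>
  Read more about the
  <a href="https://en.wikipedia.org/wiki/Solar_System">Solar System</a>
  on Wikipedia.
</p>
```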
The link in this example is nested inside a paragraph, but you might imagine nesting links inside of list items, to create an unordered list of links, or inside of table data items, to create a table of links.
You can nest an image inside a link to use the image as a link.
Objective: Combine image and link tags to make an image link.
Use the image of the Earth from the previous example as a link to the Wikipedia article on the Solar System (https://en.wikipedia.org/wiki/Solar_System):
Here is a solution.
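One possible solution is sketched below. The image URL is a placeholder on example.com, not the address of the original Earth image, and the alt text and dimensions are assumptions made for illustration.

```html
<!-- The img tag sits where link text would normally go. -->
<a href="https://en.wikipedia.org/wiki/Solar_System">
  <img src="https://example.com/earth.jpg"
       alt="The Earth seen from space"
       style="width: 200px; height: 200px;">
</a>
```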
You can embed a web page within a web page, using the <iframe> tag. Why would you ever want to do this? The small HTML coding windows on this very page are one example: the HTML text is written in the editor on the left (a fancy textarea), but an iframe is used to display the rendered output on the right. Another reason to use an iframe is that a particular website provides some complex service, and you want to make use of that service in your web page. Embedding YouTube videos in your web page is an example of this: you’d like to display a fancy video player, and YouTube provides some HTML code to embed in your web page to make it happen.
I found the following HTML code by finding a video on YouTube, clicking on share and then embed.
<iframe width="560" height="315" src="https://www.youtube.com/embed/ZCBE8ocOkAQ?rel=0" frameborder="0" allowfullscreen></iframe>
In the HTML code above, the text ?rel=0 is my own addition, and requests that YouTube not play any related movies after the Falcon lands. This is an example of a parameter passed in a GET request, a way of passing information to the server (the remote computer that sends the webpage across the network to your computer) as part of the request for the webpage. GET requests will be discussed in more detail in a later chapter.
iframes have a benefit for the website providing the resource: the content of that website is displayed in the manner the provider prefers. We displayed a YouTube video through an iframe view of the YouTube website; if youtube.com suddenly decides to play an ad before or after that video, it can do so. Or YouTube could decide to ignore the ?rel=0 hint, and play related videos. youtube.com also gets information about visitors to the site, even if those visits happen through iframes. You’ve just visited the youtube.com site!
There are some restrictions on iframes. For example, a website provided over a secure encrypted connection (such as this one) cannot include an iframe reference to an insecure website, since a website with insecure content cannot itself be secure. You can tell that a website is provided through an encrypted connection if the address begins with https:// rather than http://.
The decision of who owns and controls the content of your website is an important one. For example, if you use an <img> tag to embed a remotely-provided image, the owner of that image can change what image appears at that web address, or delete the image entirely, breaking your site. It might be better to download the image and store it locally, and link to the local copy. Similarly, iframes should be used very sparingly, and link only to trusted providers.
When you type an address into a web browser, or follow a link from a web page, your computer requests that some remote computer send you the content of a document. Let’s call your computer the client and the remote computer the server. Your request may pass through several intermediate computers on its way to the server, and the response from the server may pass through several intermediate computers on its way back to the client.
Depending on the type of information sent, this could be a problem. If you send your credit card information to Amazon, but on the way, it passes through some other computer controlled by evil people, you might find yourself with a large credit card bill for things you did not order. If you have permission to download some secret designs for a new invention, and do so, those designs might pass through the hands of anybody who has control of any computer between the server and your machine, the client.
Or maybe I build a web page that looks like Amazon’s, place myself in between some people and Amazon (not hard to do; most computers on the internet are on some route to Amazon), and collect personal information by impersonating Amazon: a spoofing attack.
Encryption is used to encode information at the source before sending it to the destination, so that only the intended computer can decode and understand the information.
Creating an encrypted connection requires some additional computational work, and so not all websites are encrypted. You can tell the difference based on the address: sites that start with the prefix https:// are provided through an encrypted connection, and sites that start with the prefix http:// are not. As a rule of thumb, it’s wise to enable encryption for commercial websites. For example, amazon.com, and the page where I enter my credit card information, are provided over an encrypted channel. In addition to the web address, you can verify encryption by looking for a lock icon in your browser’s address bar.