Web pages: HTML

HyperText Markup Language, HTML, is used to describe a document. While JavaScript has keywords and operators for declaring variables, initializing loops, and computing values, HTML only has keywords used to describe things like paragraphs, section headings, tables, and lists. You can’t compute with HTML, but you can describe web pages. Later, you can mix in JavaScript to make those pages more active and interactive.

Semantics vs. style

If you write a document in Word, you can describe the typesetting style of a document directly. For example, you can select some words, set the font size to 134, make the font bold, and change the font color to purple. You can do that for the heading text for each section of the document, if you want.

The problem with setting styles in this way is that it is hard to maintain a uniform style across documents, or to change styles quickly as a group. In Word, if you later decide that the headings of your document should be orange and flashing in a gothic font, you’ll need to go through the entire document and select each piece of heading text, and then change the font size and style. And what if you miss one? That would be ugly.

HTML only lets you specify the intended use, not the style, of a section of text. You can specify that some text should be a heading. When the browser displays that text, it will use the style for headings. You can specify that style using a different language, CSS. By separating meaning (or semantics) and style, it is possible to display documents using consistent styles, and to change those styles in nice simple ways.

Heading and paragraph tags

Tags, written using angle brackets, specify the intended use of text.
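A minimal sketch of a heading followed by a paragraph. The text inside the tags is placeholder content chosen for illustration:

```html
<!-- A top-level heading, followed by one paragraph -->
<h1>A placeholder heading</h1>
<p>A placeholder paragraph of text that belongs under the heading.</p>
```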

The example HTML uses <h1> to mark a section heading and <p> to mark a paragraph. Tags almost always come in pairs: a start tag like <h1>, and an ending tag with the same name following a forward slash, like </h1>.

It might seem annoying to have to start and end each paragraph using a paragraph tag, but the advantage is that the HTML parser (the program that reads and interprets the HTML) knows with perfect certainty that you intended the text within the paragraph markers to be a paragraph. Perhaps you prefer to typeset paragraphs with some extra vertical space between them (as is the current setting on this page and in the display of the Constitution), or perhaps you are publishing a book and paragraphs should not be separated. By marking paragraphs with tags, you can make the styling decision later.

You may have noticed the 1 in the <h1> tag. Section headings may be of different levels of importance. The top level h1 tag specifies that this heading is of primary importance: perhaps it is the title of the document, or the title of a chapter. An h2 tag would specify a second-level heading. For example, a document might have sections with h2 headings. And those sections might have subsections with h3 headings. In general, it is considered good practice to not skip headings on the way down: if you used an h1 for a previous heading, you would typically not skip down to h3 directly for a subsection, but would use an h2 next.
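As a rough illustration of heading levels that step down without skipping, here is a sketch of a document outline. The heading text is hypothetical:

```html
<!-- h1 for the document title, h2 for sections, h3 for subsections -->
<h1>Document title</h1>
<h2>First section</h2>
<h3>A subsection of the first section</h3>
<h2>Second section</h2>
```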

Exercise: headings

Objective: use heading and paragraph tags in an .html document.

Here’s a transcription of the Constitution, in case you have not yet memorized it. Write HTML code including the title (The Constitution…), up through the first two paragraphs of Section 2.

Here is a solution.

Nested HTML elements

In the Constitution example above, each starting tag was followed by some text and then directly by its closing tag.

<p> We the People of the United States, in Order...</p>

What if we wanted to make We the People of the United States stand out, since in the original document, it’s written larger? We could use the <strong> tag to indicate that this text should be strongly emphasized when typeset, perhaps by making the text bold, or purple, or flashing orange.

Notice that We the People... is still part of the first paragraph, so it should be inside the paragraph tags, but since it is strong, it should be inside strong tags. The pattern is <p> <strong> some text </strong> </p>. We say that the strong tags are nested inside the paragraph tags. Tags can be nested as deeply as needed, but some nestings don’t make sense; for example, you cannot nest a paragraph inside a paragraph.
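Applied to the opening of the Constitution, the nesting might be sketched like this. The elided text is left elided, as in the fragment above:

```html
<!-- The strong tags sit entirely inside the paragraph tags -->
<p>
  <strong>We the People of the United States</strong>, in Order...
</p>
```

The inner pair closes before the outer pair does. Closing the paragraph before closing the strong tag would not be a proper nesting.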

Complete HTML documents

The pieces of HTML code we have seen so far are really just fragments of web pages. There is other information that the browser needs in order to present a web page correctly, and that search engines need to correctly index the page. Here is a complete example of a simple web page; when writing a new web page, you are welcome to copy-and-paste this text as your starting point. (You do not need to attribute the author, since this is a generic, standard structure, containing no intellectual contribution.)
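A minimal sketch of such a page, with a placeholder title and body text. The lang attribute and the meta charset line are common additions that validators tend to ask for; they are assumptions here, not something this chapter discusses:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information for the browser; not shown on the page itself -->
    <meta charset="utf-8">
    <title>Placeholder page title</title>
  </head>
  <body>
    <!-- Material to be shown on screen -->
    <h1>Placeholder heading</h1>
    <p>Placeholder paragraph text.</p>
  </body>
</html>
```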

Let’s look at the structure of the document. There are many types of documents on the web; it’s easiest for a browser to identify a document as an HTML page if the first line of the HTML document declares the document type. So, HTML documents should start with the line <!DOCTYPE html>. This is not really a tag, and so it appears without a paired closing tag.

<!DOCTYPE html> is not really part of the HTML language, but the rest of the document is HTML. Traditionally, the HTML part of the document is enclosed in an <html> </html> tag pair. Scroll down in the example to find the closing </html> tag.

Indentation is used to show nested tags. In the example, everything inside the <html> </html> tag pair is indented.

Complete HTML documents are separated into two sections, the header (marked by the <head></head> tag pair) and the body (marked by the <body></body> tag pair), both of which are nested inside the <html> </html> tag pair. Information for the browser is placed into the header and not displayed; the body contains the material to be shown on screen.

The document should have a title that will be displayed in the browser tab, or used as the name when you bookmark the page. You can set the title of the document using the <title></title> tag pair.

The title is not displayed anywhere on the page itself; it is used only by the browser, for example to label the tab. Therefore, it occurs in the header section of the HTML document.

Do not confuse the words heading and header. The header is the first section of the HTML document where information for the browser is placed; headings, like those marked by the <h1></h1> tag pair, are used to put titles on parts of the content text.

HTML validation

Even if you make a mistake in your HTML, for example, by forgetting to add <!DOCTYPE html>, most browsers will take their best guess and display your page anyway. This is problematic, since different browsers might do different things, giving different users different experiences. It’s good practice to use an HTML validator to verify that your HTML code is legitimate. Try the validator now on a few favorite web pages, including this one. As of this writing, www.bloomberg.com has several warnings and errors. Oops.

Lists

The <ul> tag creates an unordered list of items, which might be displayed as a bulleted list. Each new item is described by a <li></li> tag pair. Here’s an example.
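A minimal sketch of an unordered list. The list items are placeholder content chosen for illustration:

```html
<!-- Each item sits in its own li pair, nested inside the ul pair -->
<ul>
  <li>Mercury</li>
  <li>Venus</li>
  <li>Earth</li>
</ul>
```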

Because the list items are contained within the list, the items are indented. This is not a requirement, but is good stylistic practice – it makes the document easier for the original author or later maintainers of the web page to read.

An ordered list is typically used to display a list with numbers or letters preceding the items. Ordered lists are indicated using <ol></ol> tags; items are marked using <li></li> tags. Change the list in the above example into an ordered list; don’t forget to change the closing tag.

To write an outline, you can nest ordered lists inside of other lists:
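A sketch of a two-level outline, with hypothetical entries. Each inner list is placed inside the list item it belongs to, before that item's closing tag:

```html
<ol>
  <li>Inner planets
    <!-- The nested list is part of the first item -->
    <ol>
      <li>Mercury</li>
      <li>Venus</li>
    </ol>
  </li>
  <li>Outer planets
    <ol>
      <li>Jupiter</li>
      <li>Saturn</li>
    </ol>
  </li>
</ol>
```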

This outline includes only Arabic numerals, rather than letters or Roman numerals; a quick web search will tell you how to make this outline look more like a traditional outline, but we will not go into the details.

Exercise: tables

Objective: Modify HTML code.

Tables, denoted by <table></table> tag pairs, contain rows, marked by <tr></tr> tags. Each row contains table data entries, each marked by <td></td> tag pairs. Here is an example:
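A sketch of a small table built only from <td> entries, so that the row and column headers are not yet marked as such. The stock symbols, dates, and prices are made-up placeholders, not real market data:

```html
<!-- Closing prices; all symbols, dates, and values are placeholders -->
<table>
  <tr>
    <td></td>
    <td>Jan 2</td>
    <td>Jan 3</td>
  </tr>
  <tr>
    <td>AAA</td>
    <td>10.00</td>
    <td>10.50</td>
  </tr>
  <tr>
    <td>BBB</td>
    <td>20.00</td>
    <td>19.75</td>
  </tr>
</table>
```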

When you display the table, you’ll notice that the headers for the columns (dates) and rows (stock symbols) are not displayed any differently than any other table data items. Row and column headers should be marked with the <th> tag rather than the <td> tag. Fix the table now.

Comments

The first line of the last exercise starts with <!--. Text between <!-- and --> is a comment: a note for people reading the HTML source, which the browser does not display. Be careful: anyone can view the HTML source of a webpage that you put on the web, so these comments will be viewable to anyone who wants to see them, even though they do not show up on the browser’s standard display of the web page.
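A minimal sketch of a comment next to some displayed text. The wording of both is hypothetical:

```html
<!-- This note is visible in the page source, but not on the rendered page -->
<p>This paragraph is displayed by the browser.</p>
```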

Images

Tags in HTML surround text, whether that text is a paragraph, a heading, or an item in a list. But HTML documents also include links to other documents, or to images for display. The <img> tag is used to indicate that an image should be displayed. The <img> tag is not followed by a closing </img> tag.

To specify where the image that is to be displayed should be found, the <img> tag includes an attribute, src. The attribute is assigned a string value (using quotation marks) that indicates the URL where the image is located. The URL may be the complete web address of the image, as in the example, or it might be simply the name of the file, if the file is located in the same directory as the current web page.

The example <img> tag includes an optional attribute, style, which can be used to set the width and height to which the image should be scaled. (The original image of the Earth is 2048x2048 pixels, which is too large to display in this little window.)
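A sketch of an <img> tag with src and style attributes. The URL is a placeholder, not the address of the original Earth image, and the 200-pixel size is an arbitrary choice. The alt attribute is an addition not covered in the text: it supplies a text description of the image, which validators generally expect.

```html
<!-- Placeholder URL; substitute the address of a real image -->
<img src="https://example.com/images/earth.jpg"
     alt="The Earth seen from space"
     style="width: 200px; height: 200px;">
```

If the image file were stored in the same directory as the web page, the src value could be just the file name, such as earth.jpg.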

The image file is not included in the .html file in any way. If you reference an image in the current directory, and then move the html file somewhere else, the image will not follow the .html file, and no image will be displayed.

The web is always changing. If you use a complete image URL, and the image at that URL is no longer accessible at some point, no image will be displayed. To avoid this potential problem, it is often better to copy the image file into your directory and link to the new, local version.

There are tradeoffs when considering copying resources like images into your own directory, vs. linking to a remote resource from some other location. You may be charged for bandwidth (network transfer costs) for resources that you keep locally.

There are also copyright considerations. Do not, under any circumstances, assume that any particular image found on the web can be used by your web site, with or without attribution.

By default, the creator of an image retains copyright, and you cannot use that image on your site, whether included locally or linked remotely with an img tag. However, some images might have licenses that permit you specific rights. For example, the Creative Commons Attribution 4.0 license that many images on Wikipedia use may allow others to use images, as long as certain rules for attribution (citation) are followed carefully.

Links

Links to other documents have two primary components: some link text (the words displayed in your browser), specified as the text between <a> and </a> tags, and the URL of the page you want to link to, specified using the href attribute of the <a> tag:
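A minimal sketch of a link nested inside a paragraph, with the URL held in the href attribute of the <a> tag. The sentence around the link is placeholder text:

```html
<p>
  Read more about the
  <!-- The words between the a tags become the clickable link text -->
  <a href="https://en.wikipedia.org/wiki/Solar_System">Solar System</a>
  on Wikipedia.
</p>
```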

The link in this example is nested inside a paragraph, but you might imagine nesting them inside of list items, to create an unordered list of links, or inside of table data items, to create a table of links.

You can nest an image inside a link to use the image as a link.

Exercise: Space

Objective: Combine image and link tags to make an image link.

Use the image of the Earth from the previous example as a link to the Wikipedia article on the Solar System (https://en.wikipedia.org/wiki/Solar_System):

Here is a solution.
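One possible shape for a solution, sketched with a placeholder image URL standing in for the Earth image from the previous example:

```html
<!-- The img tag is nested inside the a tags, so the image becomes the link -->
<a href="https://en.wikipedia.org/wiki/Solar_System">
  <img src="https://example.com/images/earth.jpg"
       alt="The Earth seen from space"
       style="width: 200px; height: 200px;">
</a>
```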

iframes

You can embed a web page within a web page, using the <iframe> tag. Why would you ever want to do this? The small HTML coding windows on this very page are one example: the HTML text is written in the editor on the left (a fancy textarea), but an iframe is used to display the rendered output on the right.

Another reason to use an iframe is that a particular website provides some complex service, and you want to make use of that service in your web page. Embedding YouTube videos in your web page is an example of this: you’d like to display a fancy video player, and YouTube provides some HTML code to embed in your web page to make it happen.

I found the following HTML code by finding a video on YouTube, clicking on share and then embed.

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZCBE8ocOkAQ?rel=0" frameborder="0" allowfullscreen></iframe>

In the HTML code above, the text ?rel=0 is my own addition, and requests that YouTube not play any related movies after the Falcon lands. This is an example of a query string sent as part of a GET request, a way of passing information to the server (the remote computer that sends the webpage across the network to your computer) as part of the request for the webpage. GET requests will be discussed in more detail in a later chapter.

iframes have a benefit for the website providing the resource: the content of that website is displayed in the manner the provider prefers. We displayed a YouTube video through an iframe view of the YouTube website; if youtube.com suddenly decides to play an ad before or after that video, it can do so. Or YouTube could decide to ignore the ?rel=0 hint, and play related videos. youtube.com also gets information about visitors to the site, even if those visits happen through iframes. You’ve just visited the youtube.com site!

There are some restrictions on iframes. For example, a website provided over a secure encrypted connection (such as this one) cannot include an iframe that refers to an insecure website, since a website with insecure content cannot itself be secure. You can tell if a website is provided through an encrypted connection if the address begins with https:// rather than http://.

The decision of who owns and controls the content of your website is an important one. For example, if you use an <img> tag to embed a remotely-provided image, the owner of that image can change what image appears at that web address, or delete the image entirely, breaking your site. It might be better to download the image and store it locally, and link to the local copy. Similarly, iframes should be used very sparingly, and link only to trusted providers.

Security and encryption

When you type an address into a web browser, or follow a link from a web page, your computer requests that some remote computer send you the content of a document. Let’s call your computer the client and the remote computer the server. Your request may pass through several intermediate computers on its way to the server, and the response from the server may pass through several intermediate computers on its way back to the client.

Depending on the type of information sent, this could be a problem. If you send your credit card information to Amazon, but on the way, it passes through some other computer controlled by evil people, you might find yourself with a large credit card bill for things you did not order. If you have permission to download some secret designs for a new invention, and do so, those designs might pass through the hands of anybody who has control of any computer between the server and your machine, the client.

Or maybe I build a web page that looks like Amazon’s, place myself in between some people and Amazon (not hard to do; most computers on the internet are on some route to Amazon), and collect personal information by impersonating Amazon: a spoofing attack.

Encryption is used to encode information at the source before sending it to the destination, so that only the intended computer can decode and understand the information.

Creating an encrypted connection requires some additional computational work, and so not all websites are encrypted. You can tell the difference based on the address: sites that start with the prefix https:// are provided through an encrypted connection, and sites that start with the prefix http:// are not. As a rule of thumb, it’s wise to enable encryption for commercial websites. For example, amazon.com, and the page where I enter my credit card information, are provided over an encrypted channel. In addition to the web address, you can verify encryption by looking for a lock icon in your browser’s address bar.