Dave's Writing Guidelines
Over the years I have collected a few thoughts on the preparation of research
papers, and I list them here for easy reference.
--David Kotz
I usually mark paper drafts by hand, using red and blue ink; see this
pdf for an example of those
handwritten marks. Here are some things that I commonly mark on
student papers. I mark them by writing the hashtag in the
margin. I tend to mark only the first one or two occurrences of
any given mistake in any given paper. I don't try to explain all
of them fully here; see some of the books
below for specific advice about English usage. Day's books
are particularly good for learning the conventions used in scientific
writing.
- #AC (define acronyms)
- Always define acronyms at their first use.
- #BW (B&W - black and white)
- Ensure diagrams and graphs are visible when printed on B&W
printers — many readers (and reviewers!) still
print the papers they read, and many do not routinely use a
color printer. Check out this
interactive color exploration tool and note the
checkboxes for "colorblind friendly" and "print friendly"
modes.
- #CA (compound adjectives)
- Hyphenate compound words that are used as an
adjective. For example, in "open-book exam", the
phrase "open book" is used as an adjective for the noun
"exam." These hyphens are used much like parentheses:
"open-book exam" is read like "((open book) exam)", whereas
"open book exam" is read "(open (book exam))", which is not the
likely meaning. Exception: do not hyphenate after an adverb
ending in "-ly", as in "highly accurate model".
- #CN (citations are not nouns)
- Citation references are not nouns; they are parenthetical
remarks. Thus, it is not correct to write
"In [13] they show that P=NP." You should write "A
seminal paper proved P=NP [13]." Or, "Jones and
Smith show that P=NP [13]." The reference can go
at the end of the sentence (my preference) or at the end of
the relevant phrase (sometimes better when multiple refs
are cited in the same sentence). See also #ET.
- #CO (commas)
- Commas can be tricky. Here is an interesting
article
about some subtle comma issues. See also #OX.
- #CQ (curly quotes)
- In LaTeX, quotation marks are written in pairs using
backticks (to open a quote) and apostrophes, also called
forward ticks (to close it), `like this' for single quotes and ``like
this'' for double quotes. The double-quote character
(") should not be used in LaTeX source, as it never
formats as an opening quote; in LaTeX's default font encoding it
always prints as a right-curly double quote.
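A minimal compilable file showing both kinds of quotes; the sentences are placeholders.

```latex
\documentclass{article}
\begin{document}
% Double quotes: two backticks to open, two apostrophes to close.
The reviewer wrote ``please clarify'' in the margin.

% Single quotes: one backtick to open, one apostrophe to close.
Every heading used `sentence case' consistently.

% Avoid the " character here: it never prints as an opening quote.
\end{document}
```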
- #CX (capitalize cross-references)
- Capitalize the word "Figure" when referring to a
specific figure, such as "see Figure 4." Same with
Section, Table, Equation, and so forth. Do not
abbreviate these words.
- #D - dashes, em-dash and en-dash
- The em-dash is a long horizontal line — like
this — used to set off a remark from the rest of
the sentence. There are two common approaches to
formatting an em-dash — with and without
spaces. In LaTeX, I prefer to use an en-dash,
which is slightly shorter than an em-dash, and also to
place a non-breaking space prior to the en-dash, and a
breaking space after the en-dash, like this~--
note the tilde, the double dash, and the space. (The
other method would be like this---note the triple
dash, and no space before or after the triple dash.)
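A small sketch of the two dash styles described above, with placeholder sentences, plus a plain en-dash in a number range (the same double dash used for page ranges in BibTeX).

```latex
\documentclass{article}
\begin{document}
% Preferred here: a tie, an en-dash (two hyphens), then an ordinary space.
The prototype worked~-- at least on our test bed.

% Alternative: an em-dash (three hyphens) with no space on either side.
The prototype worked---at least on our test bed.

% Number and page ranges use a plain en-dash with no spaces.
See pages 14--19.
\end{document}
```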
- #EG (e.g. and etc.)
- The abbreviation "e.g." replaces "for example," and belongs
at the beginning of a list of examples. The abbreviation "etc."
replaces "and so forth", and belongs at the end of a list of
examples. They are never used together, because each implies
that the list is just a set of examples, not a complete list of
all possibilities. To use both on the same list would thus be
redundant. The meaning of "e.g." differs from that of "i.e."; see
#IE. The abbreviation "e.g." is always followed by a comma; see also #LA.
- #ET (et al.)
- The Latin phrase "et al." is often used to
abbreviate a long author list. Note that the first
word, "et", is a complete word, and the second, "al.", is an
abbreviation, so only the latter has a period. Format
this phrase in LaTeX as "et~al." The tie
(tilde) prevents a line break between the words, which I
feel looks awkward. See also #LA, #TI.
- #FM (footnote mark)
- Place a footnote mark after punctuation, not before.
Like this.² Not like this³.
Do not leave any white space (not even a newline)
between punctuation and the footnote command; otherwise
the footnote mark may be rendered on a separate line.
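A minimal sketch of correct and incorrect placement; the sentence and footnote text are placeholders. The trailing % is the usual LaTeX way to break a source line without leaving white space behind.

```latex
\documentclass{article}
\begin{document}
% Correct: the footnote mark follows the period, with no space.
Our traces cover two years.\footnote{Details appear in the appendix.}

% Incorrect (mark before the period), shown commented out:
% Our traces cover two years\footnote{...}.

% If the source line must break, end it with % so the newline
% does not turn into a space before the mark.
Our traces cover two years.%
\footnote{Details appear in the appendix.}
\end{document}
```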
- #FR (floating references)
- A floating figure or table should always appear on the
same page as, or a later page than, its first reference in the
text. LaTeX will arrange this placement properly as
long as you put the {figure} or {table} environment
after the first \ref to that float.
My practice is to put the environment immediately after the
end of the paragraph containing the first
\ref. The forward reference is fine; LaTeX
resolves it on the second pass.
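A minimal sketch of that practice. The label f:latency is hypothetical, and the boxed text stands in for a real graphic. Compile twice so the forward reference resolves.

```latex
\documentclass{article}
\begin{document}
\section{Results}
Figure~\ref{f:latency} shows the latency distribution for each
configuration.
% The float comes right after the paragraph holding its first \ref,
% so it cannot land on an earlier page than that reference.
\begin{figure}
  \centering
  \fbox{Placeholder for the latency plot}
  \caption{Latency distribution for each configuration.}
  \label{f:latency}
\end{figure}

The next paragraph continues the discussion.
\end{document}
```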
- #H (however)
- "However" should (usually) not begin a sentence: rewrite
"However, I found that the red ball had been missing for
weeks" as "I found, however, that the red ball had been
missing for weeks". In that usage, note "however" is
surrounded by commas. It is ok for it to be at the
end of a sentence, however. Above, I say "usually"
because this is not a strict rule.
- #HH (text between headings)
- Always include some text between headings, even if only one
or two sentences; the goal is to provide an introduction to the
following subsection, or a transition from the prior section to
this section and the following subsection.
- #IE (i.e.)
- The abbreviation "i.e." replaces "that is," indicating an
explanation of the phrase before it. It is always followed by a
comma; see also #LA. Its meaning differs from that of "e.g."; see
#EG.
- #IN (Internet vs. internet)
- When writing about general interconnected computer networks,
call them 'internets' (not capitalized). When writing
about the specific public internet that is based on the IP
protocol, call it the 'Internet'. The Internet is one
instance of an internet. Unfortunately, it is
common, though technically incorrect, to equate the
Internet and the World-Wide Web (which is also capitalized,
please note). The Web (or WWW) is a subset of the
Internet.
- #IT (its vs. it's)
- The word "its" is often confused with the contraction
"it's". The word "its" is the possessive form of "it", much
like "his", "hers", and "theirs" are possessive forms of "he",
"she", and "they", respectively; notice that all end in 's' but
none include an apostrophe. On the other hand, "it's" is a
contraction for "it is", and thus (see #NC) should never appear
in scientific writing.
- #LA (Latin abbreviations "i.e.", "e.g.", "etc.", "vs.")
- The abbreviations "i.e.", "e.g.", "etc.", and
"vs." are indeed abbreviations and thus should have
periods as shown. Of those, "i.e." and "e.g." should
always be followed by a comma, as should "etc." when in the
middle of the sentence. (Why? Because they replace
"that is", "for example", and "and so forth", which are
always delimited by commas.) At the end of a sentence
I prefer to use "and so forth" rather than "etc."
See also #ET.
- #NC (no contractions)
- Do not use contractions in formal writing.
- #OC (Optional caption)
- In LaTeX, the \caption command is used within a table or
figure to provide the text for the caption.
That same text is used in the List of Tables and List of
Figures, in multi-chapter documents (like a thesis).
If the caption text is long, it will not display well in a
List of Tables/Figures, so add the optional argument,
e.g., "\caption[This is a short caption]{This is the full
caption that might go on for several sentences.}".
LaTeX uses the optional argument in the List of Tables/Figures.
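A minimal sketch in a multi-chapter setting, assuming the standard report class; the caption text, label, and boxed placeholder are hypothetical. Compile twice so the List of Figures fills in.

```latex
\documentclass{report}
\begin{document}
\listoffigures

\chapter{Evaluation}
\begin{figure}
  \centering
  \fbox{Placeholder for the throughput plot}
  % [short] goes into the List of Figures; {long} goes under the figure.
  \caption[Throughput versus offered load]{Throughput versus offered
    load. This is the full caption that might go on for several
    sentences.}
  \label{f:throughput}
\end{figure}
\end{document}
```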
- #OX (Oxford comma)
- In a comma-separated list, use a comma after every item
except the last. For example: "Alice, Bob, and
Charlie are frequent collaborators in security
research." (You may often see lists skip the final
comma, i.e., "Alice, Bob and Charlie", but please avoid
that approach.) This final comma is the so-called serial
comma (or "Oxford comma"); although controversial, it
avoids ambiguity in many lists,
so I advocate its use in every list. Clarity is
critical in scientific writing. See also #CO.
- #P (passive)
- Avoid the passive sentence structure. It obscures
who performed the action and can lead to ambiguity.
For example, "The prototype was built" is a passive phrase,
whereas "We built a prototype" is an active phrase.
The active structure is almost always shorter and
clearer.
- #PN (page numbers)
- Please include page numbers in your document.
- #PR (preposition)
- Do not end a clause or sentence with a preposition
(with, for, to, from, under, on, in, and so forth).
- #RI (reduce ink)
- Reduce "ink" in tables and graphs. I often see tables
with a line all the way around the table, between every
column, and between every row... ugh, it looks like a
spreadsheet. All that "ink" is unnecessary and
distracting. Read Tufte's book (below); it will
transform the way you think about presenting data.
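One common way to reduce ink in a LaTeX table is sketched below. It assumes the widely used booktabs package, which the guidelines above do not mention, and the numbers are placeholders, not measurements.

```latex
\documentclass{article}
\usepackage{booktabs} % provides \toprule, \midrule, \bottomrule
\begin{document}
\begin{table}
  \centering
  \caption{Mean execution time per configuration (placeholder data).}
  % No vertical rules and no line between every row: three horizontal
  % rules separate the header from the body and frame the table.
  \begin{tabular}{lrr}
    \toprule
    Configuration & Trials & Time (s) \\
    \midrule
    Baseline      & 10     & 12.1     \\
    Cached        & 10     & 10.4     \\
    \bottomrule
  \end{tabular}
\end{table}
\end{document}
```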
- #SC (Sentence case)
- Use "Sentence case" for titles and headings, rather than "Title
Case"; that is, capitalize only the first word of a title
or heading. This is my personal preference, but I can
live with Title Case if a publisher's style (or co-authors)
insist on it. Regardless of the choice, use a
consistent approach for all headings.
- #SP (spelling)
- Check your spelling! The spell checker can also
catch many typos.
- #T (tense)
- Scientific writing uses tense in specific ways.
- It is common in scientific research papers to use the
first-person plural, that is, to write "We
developed a method..." rather than "I developed a
method...", even on a single-authored paper. The
rationale I've heard is that the plural recognizes that
science is normally a collaborative process.
- When referring to your own experimental results, use the
past tense. Thus, "We ran ten trials,
and the average execution time was 10.4 seconds."
Past tense to say you did the work, and past tense to
describe the result.
- When writing about related work, describe other authors' results in the
present tense. Thus, "Jones
et al. conducted a survey and found
that most Dartmouth students wear green
clothing." Past tense to say they did the work, but
present tense to describe their result.
- When referring to other parts of your paper, use present
tense. That is, do not say "This paper will
discuss..."; say "This paper discusses...".
Similarly, say "The argument above proves that..."
rather than "The argument above proved that...".
Why? Because, although your reader reads through the
paper over time, the paper itself stands complete, in the
present. The argument not only
proved, but still proves....
- #TH (this)
- The word "this" should almost always be followed by a
noun: instead of saying "this is red", you should be more
specific with "this ball is red." If you leave it
out, your reader may mentally insert a different noun than
you had in mind... things that are not ambiguous to you can
be ambiguous to your reader.
- #TI (tie)
- Use a 'tie' (a non-breaking space) right
before any number or citation, so that a citation such as
"[13]", or the "4" in "Figure 4", never lands at the start of
a new line, separated from the word before it.
In LaTeX, use a tilde for a non-breaking
space: "blah~\cite{jones:PNP}." or
"Figure~\ref{f:pretty}."
Also use a tie after an inline-item number; in LaTeX:
"(a)~first item, (b)~second item, (c)~third item."
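The same ties in a small compilable file. The keys jones:PNP and f:pretty come from the examples above; the figure box and the one-line bibliography entry are placeholders, not a real reference.

```latex
\documentclass{article}
\begin{document}
% A tie (~) keeps each number or citation on the same line as the
% word before it.
Jones and Smith show that P=NP~\cite{jones:PNP}.
As Figure~\ref{f:pretty} shows, the curve is smooth.
We tried three variants: (a)~first item, (b)~second item, and
(c)~third item.

\begin{figure}
  \centering
  \fbox{Placeholder}
  \caption{A pretty figure.}
  \label{f:pretty}
\end{figure}

\begin{thebibliography}{9}
\bibitem{jones:PNP} A. Jones and B. Smith. Placeholder entry.
\end{thebibliography}
\end{document}
```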
- #UN (units)
- Units: The convention in computer science seems to be
the following: when measuring storage, mega and kilo refer
to powers of two; thus a megabyte is 2²⁰ bytes
and a kilobyte is 2¹⁰ bytes. When
measuring network bandwidth, mega and kilo refer to powers
of ten; thus one megabit-per-second is 10⁶ bits
per second, and one kilobit-per-second is 10³
bits per second. When abbreviating, k=10³
but K=2¹⁰, and m=10⁶ but
M=2²⁰. Furthermore, b=bits and
B=bytes. Thus, 10 MB is 10 times 2²⁰
bytes, but 10 mbps is ten million bits per
second.
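A worked example under the convention above (M = 2²⁰ for storage, m = 10⁶ for bandwidth): the size of a 10 MB file, the rate of a 10 mbps link, and the resulting transfer time, assuming no protocol overhead. Mixing up the two conventions would suggest exactly 8 seconds.

```latex
\begin{align*}
10\,\mathrm{MB} &= 10 \times 2^{20}\ \text{bytes}
                 = 10{,}485{,}760\ \text{bytes}
                 = 83{,}886{,}080\ \text{bits}\\
10\,\mathrm{mbps} &= 10 \times 10^{6}\ \text{bits/s} = 10^{7}\ \text{bits/s}\\
t &= \frac{83{,}886{,}080\ \text{bits}}{10^{7}\ \text{bits/s}} \approx 8.39\ \text{s}
\end{align*}
```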
- #UND (underline)
- Do not underline words and phrases. For emphasis,
foreign words, etc., use italics.
- #URL (URL)
- If you want to mention a URL, do not place it inline,
in the text. Put it in a footnote, or a reference at
the end of the paper. In-line URLs can produce
awkwardly long lines (or broken URLs), and anyway, few
people actually want to read a URL. With most
conferences restricting the number of pages for a paper's
body, but not the references, citing a URL (rather than
placing it in a footnote) saves critical space in the paper
body. It also allows you to provide further details, such
as the title of the web page, and the date you visited
it. Also: avoid URLs that refer to CGI scripts or
include search parameters, as these tend to have a short
lifetime.
- #V (verbosity)
- Avoid verbosity. For example:
- "in order to..." becomes "to..."
- "at this point in time" becomes "at this time"
- "more and more common..." becomes "more common..."
- "a number of" becomes "several"
- "utilizes" becomes "uses"
- #VY (very)
- It is rarely useful to use the word "very". How
much hotter than "hot" is "very hot"? This story may
be apocryphal, but Mark Twain once said that he would just
replace "very" with "damn" everywhere, and then the editor
would surely take them all out.
- #WF (Wi-Fi)
- The term "Wi-Fi" is always hyphenated and capitalized. It
is a trademark of the Wi-Fi Alliance and they chose that
specific spelling. "WiFi" is incorrect.
- #WN (whether or not)
- It is rarely appropriate to say "whether or not";
usually you should just say "whether". If you do use
"whether or not", don't spread the words across the
sentence.
- #WT (which vs. that)
- Be careful how you use "which" and "that".
"Which" nearly always follows a comma, because it is used
to add information, whereas "that" is used to qualify:
- The ball, which is red, fell down the hole.
- The ball that is red fell down the hole.
In the first sentence, there is only one ball involved, and
we mention almost as an aside that it is a red ball.
In the second sentence, there are presumably many balls
involved, but it is the red ball that fell down the
hole. By this rule, the following sentence is incorrect:
- The ball which is red fell down the hole.
- #/ (slash)
- Avoid using a slash when you mean "and" or "or". If you
write a slash, the reader may have a different
interpretation than you.
- #@ (\@ to end a sentence)
- In LaTeX, when ending a sentence with a capital letter and a
period, you need to tell LaTeX that it is the end of a
sentence and not an abbreviation in mid-sentence. For
example, my name is David F. Kotz. The "F" in that
sentence is a capital letter followed by a period, and
LaTeX rightly supposes that it is not the end of a
sentence. LaTeX puts a little more space between
sentences than between words, so it is good to get it
right. As another example, I started a project called
CRAWDAD. That sentence ends with a capital letter and
a period; in LaTeX, it should be coded like this:
CRAWDAD\@.
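Both cases in one minimal file; the final sentence is a placeholder.

```latex
\documentclass{article}
\begin{document}
% Capital letter + period inside a sentence: LaTeX treats it as an
% abbreviation and uses a normal interword space, which is right here.
My name is David F. Kotz.

% Capital letter + period at the end of a sentence: \@ before the
% period tells LaTeX to use the wider intersentence space.
I started a project called CRAWDAD\@. The next sentence starts here.
\end{document}
```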
Although you must follow the publisher's requested style,
I recommend using my corresponding
customized BibTeX
style, and proofreading your paper's printed "References" section
for alignment with the following principles:
- citation key: if you are using an 'alpha' bib style,
which generates citation labels like [AK12] instead of numbers,
ensure the key generated by BibTeX looks reasonable; if not, add a
"key" field to override what is automatically produced.
- Title: make sure the words appear in the correct case
(e.g., mHealth, Wi-Fi). Put {braces} around the entire title to
prevent BibTeX from changing capitalization. For example, title =
{{Use of TLS and Wi-Fi in mHealth}} will format correctly, but
without the double braces a style that lowercases titles (such as
plain) will print it as "Use of tls and wi-fi in mhealth".
When using Mendeley, add no braces to the title
field (or any other field); Mendeley automatically exports the
title field with double braces.
- Authors: Ensure the author names are listed exactly as
they appear on the paper. Check carefully for any accented
characters, initials without dots, or multi-word last names.
- Journal (for journal papers): rewrite as needed to spell
out the journal name properly. IEEE's online library is notorious
for exporting publication names backwards, like this: "Computer
Systems, IEEE Transactions on"; you need to fix that to say "IEEE
Transactions on Computer Systems". ACM sometimes abbreviates the
publication name. My principle is to spell out the full journal
name with no abbreviations.
- Booktitle (for conference papers):
rewrite to spell out the conference name properly;
it should start with "Proceedings of" or "Proceedings of the",
it should not include the year or the anniversary, and
it should conclude with the acronym in parentheses.
Thus, "Twelfth ACM Symposium on Mobile Computing Systems and Applications"
should read
"Proceedings of the ACM Symposium on Mobile Computing Systems
and Applications (MobiSys)".
NOTE: I strip the year and anniversary because they are redundant
with other metadata (notably the year field), and I like to include the
short name of the conference because it is helpful shorthand to
those who know the conferences. Note that it should not be as short as
"Proceedings of MobiSys" or just "MobiSys".
- Volume, Issue (number): important for journal papers, but should
usually not be present for conferences.
- Page or page range: not essential, but if you have the
information, it can't hurt to include it. Ensure use of an en-dash in a
page range; in BibTeX that's a double dash:
pages={14--19}.
- Chapter number: important for book chapters.
- Edition: important for books, especially textbooks.
- Year: essential for all references.
- Month: good for journals, especially if there is no issue
number within a year or volume. Not critical for once-a-year
conferences. When citing a web page, the month and year should be
the date you visited the page.
- Day: essential for newspaper articles, blog postings, and
some journals that publish more than once a month (which happens in
some medical journals).
- Publisher: important for journals and conferences and
books; a short form is good ("ACM Press", "IEEE Press", "Springer",
etc.).
- DOI: very important; my custom styles print the DOI.
- URL: important if no DOI is available; use the shortest,
cleanest URL you can find.
- organization: although never seen for references scraped
off the web, it can be useful for white papers or web pages; this
field is used by my custom styles for @Electronic publications.
- address: delete this field, except for Books; there is no
value in listing "New York, NY" as the address of ACM as a
publisher. Back in the day when one needed to send a letter to a
publisher of a book, the address was important; now, geographic
addresses are irrelevant.
- location: delete this field; there is little value in
listing the location of a conference.
- school: important for MastersThesis or PhDThesis.
- institution: important for TechReport.
- series: usually best to delete this field, though I do
include it for Springer LNCS, in which case
series={LNCS}.
- ISSN or ISBN: useful for books, but not essential.
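A minimal sketch of several of these principles in one compilable file. It assumes the standard plain style rather than the custom style mentioned above, writes a hypothetical refs.bib with the filecontents* environment, and uses placeholder values throughout: the authors, year, and DOI do not describe a real paper. The title, booktitle, publisher, and page range follow the examples given above. Note that plain ignores the doi field; a style that prints DOIs would use it. Run LaTeX, then BibTeX, then LaTeX twice.

```latex
% Writes a hypothetical refs.bib before the document starts.
\begin{filecontents*}{refs.bib}
Every value in this entry is a hypothetical placeholder.

@InProceedings{jones:mhealth,
  author    = {Jones, Alice and Smith, Bob},
  title     = {{Use of TLS and Wi-Fi in mHealth}},
  booktitle = {Proceedings of the ACM Symposium on Mobile Computing
               Systems and Applications (MobiSys)},
  year      = {2014},
  pages     = {14--19},
  publisher = {ACM Press},
  doi       = {10.1145/nnnnnnn.nnnnnnn},
}
\end{filecontents*}
\documentclass{article}
\begin{document}
Jones and Smith examine TLS and Wi-Fi in mHealth~\cite{jones:mhealth}.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```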
Your paper draft should have a title, authors, date, revision number, and
abstract.
A typical structure is
- Introduction
- Background
- (description of your system)
- Experiments
- Results
- Related Work
- Conclusion (or Summary, but note those are different things)
- Future Work
- Acknowledgements
- References
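A minimal LaTeX skeleton of such a draft, assuming the standard article class (which prints page numbers by default). The title, authors, and revision number are placeholders, and refs.bib is a hypothetical file you would supply before running BibTeX.

```latex
\documentclass{article}
\title{A descriptive title in sentence case}
\author{First Author \and Second Author}
\date{Draft of \today, revision 3}

\begin{document}
\maketitle

\begin{abstract}
One paragraph stating the problem, the approach, and the main result.
\end{abstract}

% Placeholder headings only; in a real draft, text follows every
% heading (#HH).
\section{Introduction}
\section{Background}
\section{System design}  % (description of your system)
\section{Experiments}
\section{Results}
\section{Related work}
\section{Conclusion}
\section{Future work}
\section*{Acknowledgements}

\bibliographystyle{plain}
\bibliography{refs}  % hypothetical refs.bib
\end{document}
```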
See also my list
of favorite papers/books about writing and research.
New, and still-relevant references:
Older references, which may be harder to find:
- Robert A. Day.
How to write a scientific paper.
IEEE Transactions on Professional Communication,
PC-20(1):32-37, June 1977.
A fun read; although old and not specific to computer science, it
still has a lot to say.
- Robert A. Day.
Scientific English: a guide for scientists and other professionals.
Oryx Press, 1992.
- Robert A. Day.
How to Write and Publish a Scientific Paper.
Oryx Press, Phoenix, AZ, 5th edition, 1998.
- Lyn Dupré.
Bugs in Writing: A Guide to Debugging Your Prose.
Addison Wesley, 1995.
- William Strunk Jr. and E. B. White.
The Elements of Style.
Avoid this book. It has many good features, and is a classic, but
many experts now recommend against it.