The MS Word to Web Page Tutorial.
An academic's guide to quick web-page construction. How to take a Word document containing footnotes, tables, and academic apparatus and produce a clean HTML version without breaking the citation format.
Why this tutorial exists
Most academic prose is written in Microsoft Word. Most academic prose that ends up on the web gets there through Word's "Save as HTML" or "Save as Filtered HTML" export. The output of either is, to put it kindly, not what you want. The HTML is full of Office-specific markup, the footnote anchors are mangled, the table widths are hard-coded in pixels, and the stylesheet repeats the entire Office font stack for every paragraph.
This tutorial gives a practical workflow for producing a clean HTML version of a Word document, by hand, in roughly the time it would take to fight with the export. It assumes basic HTML literacy and a text editor: anything from Notepad through TextWrangler to VS Code will do.
The workflow at a glance
- Save the Word file as filtered HTML as a starting point, not as the deliverable.
- Open the resulting HTML in a text editor and strip the Office-specific markup. The boilerplate to remove is described below.
- Replace the inline styles with a small external stylesheet containing only the rules you actually need.
- Reformat the footnote apparatus using the approaches from the Footnote series.
- Test print fidelity by printing the page from the browser. The print stylesheet should produce something that looks like an academic article.
The Office boilerplate to remove
The exported HTML file will contain a large <head> block of Office-specific declarations and a long inline stylesheet. The following can be deleted without affecting the visible page:
- All <xml> blocks beginning with <w: namespace tags.
- All mso- prefixed style declarations.
- All <o:p> empty paragraph markers.
- The class="MsoNormal" attribute on every paragraph, after you have added the corresponding rule to your external stylesheet.
- Any style="tab-stops:" attributes, which are meaningless on the web.
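As a rough illustration of the external stylesheet that takes over from the inline styles and from class="MsoNormal", here is a minimal sketch. The file name base.css, the font stack, and every measurement are assumptions chosen for the example, not values taken from Word's export; adjust them to match the document.

```css
/* base.css (hypothetical file name): a few rules that stand in for
   the per-paragraph inline styles in the Word export. */
body {
  font-family: Georgia, "Times New Roman", serif; /* assumed font stack */
  font-size: 100%;
  line-height: 1.5;
  max-width: 40em;   /* keeps the line length readable */
  margin: 2em auto;
  padding: 0 1em;
}

/* One rule for every paragraph; with this in place the
   class="MsoNormal" attributes can be deleted. */
p {
  margin: 0 0 1em;
}

h1, h2, h3 {
  line-height: 1.2;
  margin: 1.5em 0 0.5em;
}

blockquote {
  margin: 1em 2em;
}
```

Link the file from the <head> with a single <link rel="stylesheet" href="base.css"> element once the inline <style> block has been removed.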
Footnote conversion
Word exports footnotes as a sequence of paragraphs at the bottom of the document, with anchor links from the body markers to those paragraphs. The anchor IDs are usually _ftn1, _ftn2, etc., and the back-references are _ftnref1, _ftnref2. Both work, but they are ugly, and because every Word export uses the same names they collide with the IDs of any other Word-exported document placed on the same page. Rename them to something readable (fn1, ref1) and update the corresponding href attributes so that each link still points at its renamed target.
Wrap the resulting list of notes in a single ordered list (<ol class="footnotes">) and apply the citation styling from the superscripted-CSS-marker worked example.
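The following is a minimal sketch of footnote styling for that list. It is not the worked example referred to above; the markup shown in the opening comment, the colors, and the sizes are all assumptions made for illustration.

```css
/* Assumed markup, using the fn1 / ref1 naming:
     body:  <sup id="ref1"><a href="#fn1">1</a></sup>
     notes: <ol class="footnotes">
              <li id="fn1">Note text. <a href="#ref1">back</a></li>
            </ol> */

/* Body markers: small and raised, without pushing the lines apart. */
sup {
  font-size: 0.75em;
  line-height: 0;
  vertical-align: super;
}
sup a {
  text-decoration: none;
}

/* The note list: set off from the body by a rule and slightly smaller. */
ol.footnotes {
  margin-top: 2em;
  padding-top: 1em;
  padding-left: 1.5em;
  border-top: 1px solid #999;
  font-size: 0.9em;
}
ol.footnotes li {
  margin-bottom: 0.5em;
}

/* Highlight the note the reader has just jumped to. */
ol.footnotes li:target {
  background: #ffffdd;
}
```

Because the notes sit in an ordered list, the browser supplies the numbers, so the numerals Word typed into each note paragraph can be deleted.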
Tables
Word exports tables with explicit pixel widths on every column and explicit border colors on every cell. Strip these and rely on a small table.data stylesheet that handles the borders, the header-row treatment, and the alternating-row shading. The result is responsive and considerably smaller.
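A table.data stylesheet along those lines might look like the sketch below. The colors and padding are assumptions, and the header rule assumes the header row has been moved into a <thead> element by hand, which Word's export does not necessarily do.

```css
/* Applies to tables marked up as <table class="data">. */
table.data {
  border-collapse: collapse;
  width: 100%;        /* relative width in place of Word's pixel widths */
  margin: 1em 0;
}

/* Borders for every cell, replacing the per-cell border attributes. */
table.data th,
table.data td {
  border: 1px solid #999;
  padding: 0.3em 0.6em;
  text-align: left;
  vertical-align: top;
}

/* Header-row treatment (assumes a <thead> wrapper). */
table.data thead th {
  background: #e8e8e8;
  border-bottom: 2px solid #666;
}

/* Alternating-row shading for the body rows. */
table.data tbody tr:nth-child(even) {
  background: #f6f6f6;
}
```

With these rules in place, the width, border, and background attributes on the individual cells can all be removed from the HTML.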
Print stylesheet
The simplest useful print stylesheet hides navigation chrome, collapses any sidenotes back into ordinary in-flow notes, sets the body type to a print-appropriate size, and adjusts the page margins:
@media print {
  header, footer, nav, .no-print { display: none; }
  body { font-size: 11pt; line-height: 1.4; }
  .sidenote {
    float: none;
    width: auto;
    margin: 0.4em 0;
    padding: 0;
    border: none;
    font-size: 9pt;
  }
  @page { margin: 1in; }
}
What this saves you
A Word document of about thirty pages with seventy-some footnotes will go from an unfiltered HTML export of around 400 KB (mostly Office boilerplate) down to roughly 50 KB of clean HTML plus a few KB of stylesheet. The page will be readable in any browser, will print as a normal academic article, and will be straightforward to maintain.