Document Formatting When Joining Texts From Various Sources

I have mounted, on a volunteer basis and in a lay capacity, the annual reports for a community group to which I belong, since about 2008.

Up to that point, the group’s annual reports were individual committee reports delivered to the secretary, individually printed out as and when received, and then stapled together with handwritten pages numbers when it had to be distributed, with an added cover page, and an extra page listing the reports and their page numbers. This did have the charm of not requiring a herculean effort and time requirement in both mounting the report, and on “printing day”, to print literally a thousand pages or more, depending on the number of pages to the report and the number of copies to be drawn. Admittedly, it does not take into account possible collating, as per how one might print out the reports (ie. pages with colour drawings and photos vs black and white, etc.).

The year I took on mounting the annual report, I believed that the annual reports should have been in an electronic format such as PDF so that it could be placed on the group’s website. But that was barely the beginning of why I took on the job.

To fulfill the technical goal of making a PDF for download from the website was not too difficult. Two easy options would have been to either scan the report once produced the “old fashioned way” and produce a PDF from all the images, or, at least for those received in electronic format, create individual PDF documents plus scan for those received on paper, then use a PDF joiner to string the PDF files together into a single document. In fact, at the time, I gathered as many previous annual reports as I could and scanned them, making them available on the website.

However, going forward, I did not consider either option to be satisfactory.

The aesthetic appearance of the annual report irked me. It wasn’t the old school printing on paper — to this day, I still print lots of paper copies for distribution. Rather, I saw an opportunity to put to the test some angst stemming from a bit over a decade earlier when the community group’s recipe book to which I’d contributed led to my having had a few ideas on improvements to the text’s basic formatting and overall layout. (The actual recipes, variety, organization, editing, and recipe testing that I learned went on behind the scenes, and the like, were beyond the scope of my interest, although one common error, separate from my angst, was a mild nuisance.) I of course wisely kept my opinions to myself, both at the time of the recipe book in the mid 1990’s, as well as at the time of initially volunteering to mount the annual report.

As can be surmised from the above, each report came from almost as many different people as there were reports, depending on how many committee reports given individuals would take on. Each person would typically type their report on their computer and email it to the office, or perhaps print it out at home and drop it off at the group’s office personally. They used whichever word processor they had: Sometimes simple text editors, or MS Write, or MS Word, presumably ranging through Word 98, Word 2003, and Word 2006. Presumably some people had Macs with whichever word processor they might have had. I believe that the secretary, who was sometimes typing up the reports which were submitted handwritten, was using a version of Wordperfect. Finally, I was submitting my reports at that point using OpenOffice.org. Presumably, there may have been other text editors or word processors used. Each instance presented a random opportunity for default settings to be different, as well as for the user to change the settings to those that suited their own personal taste.

As a result, each report predictably had formatting unique to each author, sometimes unique to each individual report, if two or more reports were submitted by the same person.

The various differences in formatting in the reports received included the following, without being an exhaustive list:

– varying text fonts and font sizes, and occasionally, more than one of either or both in a given report;
– varying line spacing;
– varying paragraph indentation, including lack thereof;
– line jumps or lack thereof between paragraphs;
– varying page margin widths;
– varying text alignment, typically either left justified, or left and right justified;
– the occasional use of italics over the whole document, beyond that which would normally be used;
– the inclusion or lack of section titles, sometimes (or not) rendered bold and/or italicised and/or underlined and/or capitalized;
– tables listing figures in formats unique to each table and report, or simple lists with varying bullet styles;
– varying spelling conventions, ie. American vs. British vs. Canadian spellings (ie. neighbor vs. neighbour, or center vs. centre);
– varying naming conventions: Sometimes full names, sometimes initialized first names with full last names, sometimes full first names with initialized last names, or sometimes very informally with only first names;
– varying honourific format conventions: sometimes honourifics, titles, and/or ranks would not be used, with persons simply named, and sometimes referred to with variations of their title such as Reverend, Rev., The Reverend, The Rev., etc.
– varying naming conventions for committee names, multi-word names, places, and the like, which were sometimes fully spelled out, and sometimes initialized, abbreviated, and / or contracted;
– etc.

As such, as alluded to in a previous post, minor changes and differences in formatting between the individual reports created subtle (or, depending on the changes, more obvious) visual changes in how each report appeared compared to each other, when joined and printed on paper or read on a computer screen. Multiple permutations and combinations of the above formatting issues often led to creating wildly varying end results which go beyond the subtle, creating a patchwork of formatting over the multiple reports joined together into a single document. This may be jarring to the eye of some readers, particularly when it is not a subtle, unified, overarching design choice, but rather the result of a decided lack of unified design choice.

This link shows a hypothetical example of how such a report could look (you’ll need a PDF reader) — with various individual reports each having unique blends of formatting as compared to each other. Note that I intentionally use the “Lorem Ipsum” text so as to highlight the formatting.

The obligatory let’s tie it all together part at the end:

When I collect the individual reports and create one document, I cut and paste all the electronic reports (and rarely, type up handwritten reports) into a single document, imposing a uniform text formatting throughout in the form of a standard font, font size, line spacing, (lack of) paragraph indentation, page margins, and standardized and / or uniform versions of the other items above. Pages are automatically numbered, and standard page headers and footers are automatically added throughout, with date codes to distinguish between earlier and later versions. Basic spelling and typing conventions are applied and made uniform. Note that I don’t dictate or edit writing style, so one report might have section headers, while another may not, nor do I edit for turns of phrase and the like.

This link shows the above hypothetical report changed (you’ll need a PDF reader) to show the same reports with some basic text formatting across the whole document made uniform, while allowing each author’s text flow (and implicitly, were each text to be unique, writing style as well) to remain relatively untouched.

Have I addressed my angst from the mid 1990’s? Yes.

Is the document formatting on the annual reports I produce every year a work in progress, with subtle improvements, changes, and the like every year? Of course.

A text formatting riddle

I’d like to propose my version of a little visual puzzle I saw years ago. In the following table, the same text is repeated in each cell. In eight of the cells, an element of formatting has been changed from the appearance of the text using a basic set of formatting, while the ninth contains, in this case, the default settings on my wordprocessor on my system. The riddle is to find which cell has not been modified as compared to the other eight. (View a slightly larger version of the table here.)

A hint of sorts: What the basic formatting settings are, or which word processor I used on which system or OS, all represent red herrings to solving which is “the original”, or “vanilla”, version.

a text formatting puzzle

Scroll down for the solution.
Scroll down to see the solution
The solution is B2, the cell / square in the centre of the table.

All the other cells have one thing changed from the B2’s qualities.

A1) The font was changed (from a Serif font to a Sans Serif font);
A2) The font size was changed to a slightly larger point size;
A3) The cell’s background colour was changed to a light grey;
B1) The text was italicized;
B2) Standard, unchanged text using my word processor’s standard settings;
B3) The text colour was lightened from a standard black to a grey;
C1) The text was capitalized;
C2) The text was made bold;
C3) The text’s line spacing was increased.

Besides at the core being what I perceive to be a fun riddle, it demonstrates how subtle differences can be made to standard document formatting in a variety of ways. It also alludes to the challenges presented by receiving documents from multiple sources for integration into a single document, such as a community group’s newsletter, or a community group’s annual report, presenting content and / or reports from its various members, leaders, subgroups, committees, and the like. In a forthcoming post, I will further discuss basic issues of varying formatting, and the need for standard formatting in a text document from the perspective of a layman editor of a community group’s annual report.

FUDCon 2011 — lightning talks

Today at the lightning talks at FUDCon 2011, the one that caught my attention was called “The Dreyfus Model: how do novices think differently from experts?” The subtitle was along the lines of “Why won’t anyone help me, I have documentation!”  Here is a pdf archive of her talk I made at the time since as of at least 2020 or earlier, it disappeared.  20210425 update:  I have found a new link to Mel’s lightning talk at https://melchua.com/blog/2011/02/02/ive-followed-your-instructions-and-i-still-cant-bake-croissants/

The gist of how Mel presented the subject was that someone is looking for a bread recipe on the internet and comes up with:

Croissants

flour
butter
other stuff
bake

She explained the various cryptic parts of this “recipe” and how obvious it may seem to an experienced baker, but to a newbie, even figuring out that Croissants is a type of bread, let alone what the “other stuff” is can be difficult to grasp, or the concepts of “oh you have to buy those ingredients first — how much? And what’s this? You need an oven? Now, when they say bake, how long? And how will I know it’s ready? Oh yeah, you need to let the bread rise first …

She went on to say how installing certain bits of software and using them may seem trivial to an experienced user, but knowing how to draw in a tarball, extract it, get all the dependencies, compile it, and all the various steps required was not easy for a newbie, especially in a culture that takes several things for granted and literally may skip steps between major milestones.

Ultimately her message lay in the importance of clear, concise, complete documentation.

When I started learning linux, I had to relearn things too, and found things challenging. I quickly learned that things were not as obvious to myself and that when someone said “oh just do this” what they were really saying was “do this 10-15 item list as root under the following circumstances using the proper switches” — not always an obvious task when you say “install package X” while omitting all the necessary parts before and after.