Information Filing


Distribution, Filing & Retrieval

Paper serves a lot of roles: it is used to view, share and think about information, and as a way to distribute, file and retrieve it. Paper can be used for sheer decoration as well. Most people would expect paper's importance in some of these roles to diminish as computer screens take over.

Paper is good at "display". By comparison with screens, paper can be big and cheap, high resolution and lightweight. White paper gives a high contrast with black ink. Paper isn't usually self-illuminating, which some screen technologies can be - but it needs no power. An obvious limit on most paper is that messages are fixed. Screens overcome this - but at the cost of resolution, contrast, portability, power - and a lot of money.

Disks are good at storage. Storing bulk information on computer disk is almost always much less costly than paper - possibly several thousand times less costly. Basically it's a matter of size and material. A recent terabyte hard disk holds data in dots under 50 nanometers across; this allows a disk to hold the equivalent of millions of paper pages - the contents of a large library. Not even microfilm or DVD can pack information so tightly. Currently no storage mechanism beats hard disk for density, but that doesn't mean the scientists aren't trying to find one.
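The "millions of pages" figure can be checked with some rough arithmetic. The sketch below assumes about 3,000 characters of plain text on a page at one byte per character, and a decimal terabyte; those numbers are illustrative assumptions rather than anything measured, and scanned images of pages would take far more room.

```python
# Rough estimate of how many plain-text pages fit on a 1 TB disk.
# ASSUMPTIONS (not from the article): 3,000 characters per page,
# one byte per character, decimal terabyte (10**12 bytes).
DISK_BYTES = 10**12
BYTES_PER_PAGE = 3_000


def pages_on_disk(disk_bytes: int = DISK_BYTES,
                  bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Return the number of whole pages that fit in disk_bytes."""
    return disk_bytes // bytes_per_page


if __name__ == "__main__":
    print(f"{pages_on_disk():,} pages")
```

Even if the per-page assumption is wrong by a factor of ten either way, the answer stays in the tens of millions of pages or more.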

Distribution, filing and retrieval of information are inter-related activities for a computer. Once information has been transformed into computer data it can be stored, moved and retrieved at speeds quicker than human perception. Computer systems can really score over paper - or fall down - depending on how well they can handle this.

Given something very systematic to store, like stock records or insurance payments, computers are fast and efficient. Small records like this were once all computers could handle, but given today's huge hard disk capacities size is no obstacle - a computer record might be a photograph, a music track or a movie.

Computers have traditionally had more of a problem with unstructured information. The problem isn't holding the information; there are ways to store anything that can be expressed on any other medium. The problem is retrieval. Computers can search for the text "horse grazing" very easily (Google reported: Results ... about 250,000 for "horse grazing" (0.40 seconds), on 25th January 2010). There is more of a problem if you want a specific picture - Google Images returns 13,700 results, but the search is text-based and gives some near-relevant pictures of coins and tackle. There are lots of attempts to get computers to "understand" pictures.

Relevance is a problem as well. If I want to learn how to use the Open Office query-by-example grid, Google gives just a few examples, but the same query for Microsoft Access gives 15,100 results. It's quicker to have a couple of textbooks on a shelf and go straight to the chapter than to search the web. Knowledge workers often tend to make heaps of documents: the physical object can be retrieved more quickly, has more associations and is easier to annotate than something on a screen.

Screens are one of the problems with information held in a computer. Most modern screens are much too small, particularly if they are portable. Even typical desktop computer screens with 1280x1024 or similar resolution are not quite good enough to show two A4 documents side by side. A screen like this is fine for entry and lookup in a database but not very good for knowledge work. Creating a new document typically involves having two or three sources you are reading and referring to and another that is in progress. Handling the documents on the screen involves a lot of flicking between windows.

Limitations imposed by screens should be disappearing.

The transition from CRT to LCD flat screens is nearly complete, so manufacturers will be competing to offer larger screens and higher resolution.

Most recent graphics adapters can handle two screens at the same time. The operating system can use these to present one desktop. With a little practice the border between the two screens no longer seems weird.

Screen and disk technology both improve in performance and fall in price over time. Computer retrieval of unstructured information has improved as well, because recent systems can throw huge amounts of search power at the problem.

Information Organisation

People often expect that paper and computers should do things in similar ways. Things are confused because:
 

Word processors are the most popular application programs and they are basically intended to put things on paper.

Many other popular programs use paper as a metaphor for what they are doing - it helps people to relate to them because several generations of people are likely to be more used to paper than computers.

The way computers hold information generally has nothing to do with how it looks on a printed page. In fact the only time information in a computer gets to look like printing is when it is assembled in the printer's memory. A print preview on a screen only looks a bit like a page - screens just don't have enough resolution to show printed pages accurately.

Computers are rather different from anything else in human experience. Everything in the computer is binary, so text turns to ASCII or UTF-8 code and pictures into bitmaps.

The sentences and paragraphs of a human readable page can be produced from what the computer stores, but may not reflect how it actually stores them. Word processors often hold information as a linked list. If you were to type a document correctly from beginning to end then the text in memory might be sequential. If you insert text into a line the machine doesn't shuffle the subsequent text around; it puts a pointer to where it will store the information and actually adds the new phrase to the end of the document. Hence if you open an older Microsoft Word document in a text editor there are recognisable bits of the text, but not in the order you might expect, mixed with a morass of odd markers and keys specifying the fonts and layout. More recent documents are in XML to an "open" specification - but still nothing like the paper document.
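The "pointer plus text added at the end" behaviour can be sketched with a piece table, which is one well-known way of getting that effect. This is a minimal illustration, not a description of how Microsoft Word or any other particular word processor works; the class and method names are invented for the example.

```python
# Minimal piece-table sketch: inserted text is appended to an "add" buffer
# and a list of pieces (pointers) records the reading order.
# Illustrative only; PieceTable, Piece and their methods are hypothetical names.
from dataclasses import dataclass


@dataclass
class Piece:
    source: str   # "original" or "add"
    start: int    # offset into that buffer
    length: int   # number of characters


class PieceTable:
    def __init__(self, text: str) -> None:
        self.original = text
        self.add = ""
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def insert(self, pos: int, text: str) -> None:
        """Insert text at character position pos without moving stored text."""
        if not text:
            return
        new_piece = Piece("add", len(self.add), len(text))
        self.add += text                  # the new phrase goes at the end of storage
        offset = 0
        for i, piece in enumerate(self.pieces):
            if pos <= offset + piece.length:
                split = pos - offset
                left = Piece(piece.source, piece.start, split)
                right = Piece(piece.source, piece.start + split,
                              piece.length - split)
                # Replace the old piece with left + new + right, dropping empties.
                self.pieces[i:i + 1] = [p for p in (left, new_piece, right)
                                        if p.length > 0]
                return
            offset += piece.length
        self.pieces.append(new_piece)     # pos is at or beyond the end

    def text(self) -> str:
        """Rebuild the readable text by following the pieces in order."""
        buffers = {"original": self.original, "add": self.add}
        return "".join(buffers[p.source][p.start:p.start + p.length]
                       for p in self.pieces)

    def stored(self) -> str:
        """Text in storage order, roughly what a raw dump would show."""
        return self.original + self.add
```

With `doc = PieceTable("The horse is grazing.")` followed by `doc.insert(4, "grey ")`, `doc.text()` reads normally while `doc.stored()` has the inserted word sitting after the full stop, which is the kind of out-of-order text a raw dump of a file reveals.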

Bits & Bytes

Within a computer there is a hierarchy of memory devices holding the program and the user data.
 

Processors do billions of operations per second on information held in random access memory (RAM) - but all the operations are simple things like add, compare and move.

Disks store trillions of bits in cylinder, head and sector patterns. As pointed out elsewhere, one hard disk can hold information equivalent to a library.

Networks transfer millions of bits per second in packets.

Individual bits have no meaning. In fact nothing in the computer has "meaning" in the conventional human sense of the word. Everything - programs, information, databases, whatever - is just a bit-pattern. The processor uses the bit-pattern programs to do various transformations on bit-pattern data and produce bit-pattern outputs. People give "meaning" to the inputs and outputs.
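A small sketch may make the point about bit-patterns concrete. It takes the four bytes that UTF-8 uses for the word "Page" and reads the same pattern as text, as a whole number and as a floating point number. The function name and the choice of word are just for illustration.

```python
# The same four bytes read in different ways. The bytes carry no "meaning"
# of their own; the interpretation is chosen by the program and the person.
import struct


def interpretations(data: bytes) -> dict:
    """Interpret exactly four bytes as bits, text, an integer and a float."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    return {
        "bits": " ".join(f"{b:08b}" for b in data),
        "ascii_text": data.decode("ascii", errors="replace"),
        "uint32_big_endian": int.from_bytes(data, "big"),
        "float32_big_endian": struct.unpack(">f", data)[0],
    }


if __name__ == "__main__":
    pattern = "Page".encode("utf-8")   # text becomes a UTF-8 byte sequence
    for name, value in interpretations(pattern).items():
        print(f"{name}: {value}")
```

None of the readings is more "correct" than the others as far as the machine is concerned; the program that picks one is supplying the meaning.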

Originally computers were rather rudimentary things - impressively fast calculators usually programmed to run through streams of sums for scientific analysis or commercial transactions.

Computers only "understand" things to the extent that given enough programs searching through enough information they can behave in an interesting and informative way.

At the moment Internet methods work sufficiently well that about half the population of the US and UK voluntarily spend more time "surfing" than they do watching TV.

Other computer systems for handling information are often less impressive but still progressing. Over half of IT projects are judged a failure. The general feeling seems to be that almost anything is better than paperwork.

--

Information in Electronic Form

Textual information tends to start its life in electronic systems. Using computers as "glass typewriters" became common in the 1970s with the roll-out of green-screen terminals and began to dominate in the late 1980s as PCs and word-processing software dropped in price. Most documents were filed in a computer - somewhere. Computer files weren't a great benefit because paper remained the accepted standard way to hold or distribute documents.

The invention of the World Wide Web in 1990 and its rapid adoption over the next few years allowed things to change. The Web provides a set of standards:
 

HyperText Markup Language (HTML) as a way to lay out and interlink documents.

HyperText Transfer Protocol (HTTP) as a way to request and transfer documents over the Internet.

Uniform Resource Locators (URLs) as a way to specify what document is wanted. URLs can be embedded in documents as links - click on a link to get the next page.
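How a URL and HTTP fit together can be sketched in a few lines. The example splits a URL into its parts with Python's standard `urllib.parse.urlsplit` and builds the text of a minimal HTTP/1.1 GET request from them. The URL is made up, and a real browser sends many more headers than this; nothing is actually sent over the network here.

```python
# Split a URL into the parts a browser needs, then build the text of a
# minimal HTTP/1.1 GET request. The example URL below is hypothetical.
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the request text a client could send to fetch the URL."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"   # which site on the server is wanted
        "Connection: close\r\n"
        "\r\n"                        # a blank line ends the request headers
    )


if __name__ == "__main__":
    print(build_get_request("http://www.example.com/printers/index.html"))
```

The server's reply would normally be an HTML document, whose embedded links are further URLs to be fetched the same way.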

The Web was quickly adopted as a good way to organise text and heterogeneous information online. At first it suited scientific and technical information, then it was adopted more widely. Motives for creating web pages can range from pure altruism to direct sales and are often a mixture of the two.

Web pages can take information in as "forms" and give it out as pages. Forms allow web servers to take information and collect it in databases or create new pages from it. Chatrooms and mail can use other protocols - but often it is just as convenient to generate web pages on a server.

Web pages can be like their paper counterparts but they need not be. There is no inherent pagination - a web page can be as short or long as its creator wants. A page can be any shape the viewer cares to make their browser. In theory a web page can fit a mobile phone screen, although that might be too small to view it successfully, or it can be a window or fullscreen on anything from a TV upwards.

The web has grown rapidly - the total content has doubled every few months since its invention and now exceeds 20 billion public pages. Internet technology can be used on private networks as well as the Internet, giving an "Intranet". It is thought that there might be ten times more information on Intranets than there is on the Internet.

Search engines are probably the biggest innovation the web has brought about. Given a lot of material the obvious question is how anyone finds what they are looking for. Databases do this by maintaining indexes often built as records are added but there is no central database for the web. DNS servers can list most possible web sites but there is no way to indicate what they hold.
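The idea of an index that is built up as records are added can be shown with a tiny inverted index: for each word, keep the set of records that contain it. This is a bare sketch with invented names and very crude word splitting; real database and search-engine indexes are far more elaborate.

```python
# A tiny inverted index: word -> ids of the records that contain it.
# Built incrementally as records are added. Names are hypothetical.
from collections import defaultdict

_PUNCTUATION = '.,;:!?"()'


class InvertedIndex:
    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, record_id: int, text: str) -> None:
        """Index every word of the record as it is stored."""
        for token in text.lower().split():
            word = token.strip(_PUNCTUATION)
            if word:
                self._postings[word].add(record_id)

    def search(self, query: str) -> list:
        """Return sorted ids of records containing every word in the query."""
        words = [w.strip(_PUNCTUATION) for w in query.lower().split()]
        words = [w for w in words if w]
        if not words:
            return []
        # .get avoids creating empty entries for words never seen
        sets = [set(self._postings.get(w, ())) for w in words]
        return sorted(set.intersection(*sets))
```

Looking up a word is then a matter of going straight to its entry rather than reading every record, which is why indexed retrieval stays fast as the store grows.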

Search engines like Google, Yahoo and MSN send "robots" or "spiders" to visit pages and index what they find. The spiders tend to follow links to find pages - although they can use other strategies to find pages that have no links in. For a while the competition amongst search engine operators was to find the most pages. Whilst it is interesting to know that about 6,920,000 pages refer to aardvarks in some way (as Google showed when I asked), this isn't necessarily helpful. Google's innovation was to realise that links can be treated as votes for a page. If someone links to your page that is because they liked it or found it helpful and they are encouraging others to visit it. Google's PageRank lists lots of pages but tends to put those with lots of inbound links at the beginning.

Google is reckoned to use PageRank and about 150 other algorithms to put its lists in order.
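The "links as votes" idea can be sketched with the simplified textbook form of PageRank. This is only an illustration of the principle, not Google's actual ranking; the damping factor of 0.85 is the value usually quoted in descriptions of the algorithm, and the page names in the usage note are invented.

```python
# Simplified textbook PageRank by repeated passes over the link graph.
# links maps each page to the list of pages it links to.
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}      # start with equal shares
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Given `{"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, page "c" collects three inbound links and comes out on top, and "a" ranks above "b" and "d" because its single inbound link comes from a highly ranked page.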

Web standards aren't perfect. There isn't any inherent way to protect copyright or ask for payment per document. Pages that a link points to can't inherently point back. Web forms - documents you can fill in online - are rather poor. Web forms can be augmented by "JavaScript" - which might be described as flaky. The Web relies on fixed servers that can be found by DNS - which creates central bureaucracies. Creating new pages was originally something anyone could do, but as all sorts of design-oriented extensions were added authoring became a specialist job.

Web-2 ideas like wikis, blogs and Facebook have reversed this to some extent. Now anyone can put something onto a web page and have it appear worldwide. This is not without problems of inaccuracy, untruth, malice and deception - or people revealing some youthful indiscretion and finding it raised in a job interview ten years later.

Some people think paper should be obsolete because it is clumsy. However, paper works as a store, a display and an aid to memory all at once. The colour of the bindings on books is an instant indicator to their owner of what is inside - something computer desktops now imitate by making up little previews of the files.

--

Information on Paper

Paper clearly does some information handling tasks very well: an appointments book for doctors or solicitors, for instance. In a small practice with one or two practitioners it is no trouble to have a book open at today's date and tick off clients as they arrive or pencil in new appointments made by phone. In a surgery with ten busy doctors it might be rather more difficult to find the earliest appointment using paper records - although five columns per page in a 2-page spread should do it. The book files and retrieves information in one simple action that is almost unbeatable - at least for a small system. Larger systems get more benefits from computers - for instance they let doctors and colleagues book and reject appointments on-line.

Paper is often portrayed as old-fashioned. Yet, quite famously, air traffic control is still usually based on a radar and paper flight progress slips. Annotating and reading the paper slips is quick and simple and no electronic method has yet proven better.

Diaries, organisers and notebooks are all widespread paper tools. So are wall displays and collages which help people collaborate in groups. Recently, researchers have found document piling to be important behaviour for "knowledge workers". Heaps of documents on real desks act as an associative fast acting store. Folders on the virtual desktop of a computer don't do this job nearly as well. (System designers and psychologists are looking at ideas like "haystacks")

Of course one of the main uses for paper is "read once" material - newspapers, leaflets, pamphlets, novels, encyclopaedias and dictionaries, in rising order of the chance that the thing may be retained and read again. Newspapers, magazines and pamphlets are often disposed of immediately.
 

Paper Information:

Computer systems have intruded into a world that has been using paper as the preferred medium for information for more than 500 years.

Reading and writing were once restricted knowledge. There wasn't an expectation of universal literacy in the US and Europe until the second half of the 19th century. In the UK the 1870 Education Act finally suggested schooling for all (although a significant percentage were schooled but remained illiterate).

During the 19th century the role of paperwork grew from a matter of writing a few things to a major industry.

Filing Paper - Diaries, Daybooks, Shelves, Cabinets and Stacks.

A lot of human ingenuity has been applied to "paperwork" - not surprisingly since about half the population are almost continually involved in it. There are all sorts of systems:

Libraries are one pinnacle of filing. An academic library will not only file thousands of books and several series of journals but will try to maintain author and topic indexes across much of the material. How much detail a library index can sustain depends on the human resources that can be applied.

Librarians make a career out of indexing and filing, so they might be expected to like it. For most people, filing is a burden and a nuisance.

The other pinnacle of making information accessible was once commercial transactions: sales and purchase records, shipment notes, customer records and payroll were kept in ledgers, Pendaflex files and index cards. Office systems were often quite idiosyncratic because there was a lot of room for innovation.

Commercial accounts are now usually computer based. The computer will keep running totals of purchases, sales, taxes, and often of stock. There is often a problem: it can be difficult to inspect records in a computer. Accounting uses double-entry book-keeping so the figures should at least balance. The real problem is often stock - many computer stock systems bear little resemblance to the real position.
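Why double-entry figures "should at least balance" can be shown in a few lines. Every transaction is posted twice, as a debit to one account and an equal credit to another, so the total across all accounts stays at zero. The account names and amounts below are invented, and amounts are held as whole pence to avoid rounding trouble.

```python
# Double-entry sketch: each posting debits one account and credits another
# by the same amount, so the ledger total is always zero if nothing is
# entered one-sidedly. Account names are hypothetical.
from collections import defaultdict


def post(ledger: dict, debit_account: str, credit_account: str,
         amount_pence: int) -> None:
    """Record one transaction as a matching debit and credit."""
    ledger[debit_account] += amount_pence
    ledger[credit_account] -= amount_pence


def is_balanced(ledger: dict) -> bool:
    """True when total debits equal total credits."""
    return sum(ledger.values()) == 0


if __name__ == "__main__":
    ledger = defaultdict(int)
    post(ledger, "stock", "bank", 5000)   # buy stock for 50.00
    post(ledger, "bank", "sales", 8000)   # sell goods for 80.00
    print(dict(ledger), is_balanced(ledger))
```

A balanced ledger only shows that the entries agree with each other. It says nothing about whether the stock figure matches what is actually on the shelves, which is the weakness described above.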

Most people with any experience of offices have found filing paper to be the bane of their lives.

Different offices will have their own experiences but on average:

Even the most straight-forward stacking of paper records needs shelf-space and filing boxes. Any volume of material starts to impose a cost.

A thousand sheets of A4 pack into 100 millimetres of space in theory, but most practical filing has bindings and gaps and can't be under pressure, so several hundred thousand pages - the sort of records a doctor, dentist or primary school might keep - become several metres of shelving to head-height.
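The shelving estimate can be worked through as a short calculation. The 100 millimetres per thousand sheets comes from the text; the packing allowance of 1.5 for bindings and gaps, the five shelves to head-height and the 300,000 pages used in the usage note are all assumptions picked for illustration, with each page counted as one sheet.

```python
# Rough shelf-space estimate for paper records.
# From the text: 1,000 sheets of A4 occupy about 100 mm when packed tight.
# ASSUMPTIONS: packing_factor and shelves_high are illustrative guesses.
MM_PER_1000_SHEETS = 100


def shelving_width_m(pages: int, packing_factor: float = 1.5,
                     shelves_high: int = 5) -> float:
    """Width in metres of a shelving unit, shelves_high tall, holding the pages."""
    linear_mm = pages * MM_PER_1000_SHEETS / 1000 * packing_factor
    return linear_mm / 1000 / shelves_high


if __name__ == "__main__":
    print(f"{shelving_width_m(300_000):.1f} m of shelving")
```

On those assumptions 300,000 pages need about 45 metres of shelf run, or roughly 9 metres of wall with five shelves one above another, which fits "several metres of shelving to head-height".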

Costs of holding any volume of paper can mount up. Some trades - taxi services, jobbing builders, and many smaller shops - can probably avoid keeping much paperwork. Professions - doctors, teachers and lawyers - are expected to maintain records. Historically the volume of information seems to have been part of the distinction between a trade and a profession. The historic distinction between trades and professions is probably disappearing. Most work in manufacturing and services now involves vast quantities of administration, some of which is semi-automated. A famous example from the 1980s was that US navy cruisers carried 26 tonnes of manuals for their weapons systems.

Filing paper records typically takes several minutes per document - find the file, insert the record. If someone on minimum wage files 60 items an hour (1 per minute) each item is costing about 10p to file. Retrieval probably costs about the same. Most people competent to do filing aren't on a minimum wage - and are nothing like this quick. Somewhat confirming this, a Lawrence Berkeley Labs study estimated the cost of handling paper at 20 times the cost of purchase (which might suggest 1p for the page and 20p to file it).
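The filing cost arithmetic is simple enough to write down. The hourly wage of 6.00 pounds is an assumption: it is the figure implied by 10p per item at 60 items an hour, not a quoted rate. The slower rate of 20 items an hour in the usage note is likewise an illustrative stand-in for "several minutes per document".

```python
# Labour cost of filing one document.
# ASSUMPTION: the hourly wage is whatever the caller supplies; 6.00 pounds
# is the rate implied by the text's "about 10p" at one item per minute.
def cost_per_item_pence(hourly_wage_pounds: float, items_per_hour: float) -> float:
    """Return the labour cost in pence of filing a single item."""
    if items_per_hour <= 0:
        raise ValueError("items_per_hour must be positive")
    return hourly_wage_pounds * 100 / items_per_hour


if __name__ == "__main__":
    print(cost_per_item_pence(6.00, 60))   # one item per minute
    print(cost_per_item_pence(6.00, 20))   # three minutes per item
```

At three minutes per document the same wage gives 30p an item, before retrieval, so the estimate of 20p to file a 1p page does not look far-fetched.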
 

Filing paper may be obvious - alphabetic order by last name for instance. Topical filing is usually more difficult - would you organise information on libraries under "leisure" (most councils do), education or business?

Paper files can get rather non-obvious. Politicians have electorates so it may seem logical to file correspondence by last name. What if a fifth of letters are about housing and a meeting is fixed with the city housing manager - is there now a special file for housing - do the letters move or are they copied or summarised? If all the members of a congregation send a form-letter about abortion do they each get a file or is there one file for the issue? Pretty quickly, paper files develop idiosyncrasies only their owner can explain.

Filing is sufficiently difficult that even in a formalised corporate setting about 3% of paper documents are filed incorrectly and 8% are eventually lost. US managers spend 3 hours per week looking for these documents and the overall cost of misfiling is upward of $120 per document. (Sarantis quoting Sellen).



Costs of filing paper can be so high that it can be suggested that the best thing to do with most of it is simply throw it away the moment it stops being useful. To generations steeped in the importance of paper this is shocking. Given that the material was created using a computer and the computer file can readily be available, however, immediate recycling may be simplest and cheapest.

Bridging Page and Screen

At the moment many people live in the best and worst of worlds. Part of their record system is in the computer and part on paper. Often the computer is used to generate neat paper records that they then file away in ledgers and filing cabinets.

One answer to this is the multifunction printer. The print engine has a scanner on top and possibly a fax board inside. Better models can scan to email and to a document management system.

The main limitation at the moment is that the control panel screens aren't good enough to easily navigate through the document system to find and print pages.

<Unfinished>

 
Last reviewed 25th January 2010. Copyright Graham Huskinson 2010.