Storing Information


Traditional paper remains one of the most popular ways to distribute and store information. In some areas, such as technical and academic information, it has been undercut a little by digital media and the Internet, but it is still vibrant, with record numbers of books being published.

One reason there are so many books is that computers make authoring much easier: typing on a computer is far less laborious than on a traditional typewriter. Computer-typed pages are actually readable, whereas handwritten manuscripts often need interpretation. Digital presses can produce as few as a hundred copies of a book, something which only became feasible recently.

Paper has been very popular, with worldwide consumption rising rapidly over the last 30 years. Paper use has risen quickly in advanced economies with high existing consumption levels, apparently as a result of computerised systems. One study found that the "introduction of email into an organization resulted on average in a 40 percent increase in paper consumption" (Greengard, cited by Sarantis 2002).
Paper acts as a display, as an aid to comprehension, as a way to retrieve information and as a long term store. Paper is familiar, convenient, immediate and flexible, and it gives an easy way to handle copyright.

Doomed by Disk?

The rise of computer storage and networking increasingly raises questions about the rationale for paper.

As a way to store and find information, paper is now hideously ineffective, with costs hundreds of times higher than for storage on disk. Almost all the arguments in this section favour moving information onto disk and the Internet as quickly as possible. However, other arguments, outlined nearby, suggest that trying to view information on screens can reduce comprehension by 25% or more. Contradictions like this add spice to life: we can move all our files onto disk and then stare at a screen without really understanding what is there!

In practice society is moving from paper-based to computer-based systems. The momentum seems unstoppable whether computerisation works well or not. When a new system is going in, all the talk is of new services, productivity boosts and gains for customers. Then there may be "teething troubles". Later on, most systems are seen to have weaknesses that lead to their replacement. More than half of IT projects are ultimately judged to be failures. Even so, most organisations are oddly reluctant to pull the computers out and go back to paper.

One of the impressive things about computers is their ability to store, find and share enormous amounts of information. Computers do this much better than paper can.

--

Paper & Disk as Storage

Paper's historic role has many facets. A Victorian gentleman and scholar's standing was in large part a matter of having a good library. Paper does many things: it is a store, a display and a way of processing information (underlining, crossing out, adding up).

The core role for a lot of paper is to store information. Legal, financial, medical and technical data are rarely read but act as stores. Most pages of manuals and journals are never actually read; they are there in case they are needed. Paperback novel pages are read by one or two people and then stored in case they are ever wanted again. (Few people throw books out.)

A nice example is the typical US Navy cruiser of the 1980s. Such vessels went to sea with 26 tonnes of manuals for their weapons systems, a weight of paper sufficient to change the performance of the ship (Seghers 1989, quoted in Tom Forester's paper "Megatrends or Megamistakes").

Some sort of comparison between paper and disk purely as information stores might be helpful.


Disk Technology

Disk is just a store. To a human most modern disk drives look the same, there is nothing much to see, just a box the size of a sardine tin with a row of connector pins. Old 400 megabyte drives look just like new 400 gigabyte drives with a thousand times more capacity.

No technology in history has changed as rapidly as computer storage. Storing information in a computer means two broad things - memory chips and hard disk. Memory chips are mainly used as processor working space. Disk is the workhorse for long term storage. Memory chips and disk have both advanced, with cost per bit halving every year or so for 50 years and forecast to continue doing so for at least another decade.

Hard disks are the definitive data store. In spring 2006 a hard disk with 200 gigabyte capacity retails in the UK for about £50. Capacity at that price will probably double next year to 400 gigabytes. What has driven the market has changed over time. On the supply side, manufacturers have benefited from the discovery of the giant magnetoresistive effect, from economies of scale and from new formats such as those for notebooks and iPods. The market has proved inexhaustible: originally it was just business and scientific data, then text, music and now video.
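The effect of capacity doubling every year at a fixed price compounds quickly. A minimal sketch, assuming exactly one doubling per year from the spring 2006 figure of 200 GB for about £50 (real prices move less smoothly, so this is an illustration rather than a forecast):

```python
def capacity_for_fixed_price(start_gb: float, years: float, doubling_period_years: float = 1.0) -> float:
    """Capacity a fixed budget buys after `years`, if capacity per pound doubles every period."""
    return start_gb * 2 ** (years / doubling_period_years)

# Assumption: 200 GB for about £50 in 2006, doubling every year.
for year in range(2006, 2011):
    gb = capacity_for_fixed_price(200, year - 2006)
    pence_per_gb = 50 / gb * 100
    print(f"{year}: about {gb:,.0f} GB for £50 ({pence_per_gb:.2f}p per GB)")
```

Ten annual doublings multiply capacity by 1,024, roughly the thousand-fold gap between old 400 megabyte drives and new 400 gigabyte ones.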

200 gigabytes of disk space treated as video is several hundred hours of viewing. Treated as text it is enough space to store all the world's telephone directories, or every book published in the US and UK last year. The capacity of a palm-sized box is comparable with that of a large library. If trends to higher capacity continue for a few more years, hard disks large enough to store the US Library of Congress book collection will be freely available. It is just about credible that people will be able to hold the whole of human knowledge in the palm of their hand.
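The "several hundred hours" figure depends on how heavily the video is compressed. A rough sketch, assuming constant bitrates of 1 to 2 megabits per second (the text gives no bitrate, so these values are assumptions) and decimal gigabytes:

```python
def viewing_hours(capacity_bytes: float, bitrate_bits_per_s: float) -> float:
    """Hours of video that fit in `capacity_bytes` at a constant bitrate."""
    seconds = capacity_bytes * 8 / bitrate_bits_per_s
    return seconds / 3600

DISK_BYTES = 200e9  # 200 GB, decimal units

# Assumed constant bitrates in megabits per second; real video varies widely.
for mbps in (1.0, 1.5, 2.0):
    print(f"{mbps} Mbit/s: about {viewing_hours(DISK_BYTES, mbps * 1e6):,.0f} hours")
```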

All knowledge in a palm-sized pack is an interesting idea but raises all sorts of questions, ranging from comprehension to copyright. The idea may never be tested because of equally rapid developments in networking - broadband Internet and wireless connections. Video and text on demand might come from a local disk or from a network.

Paper technology.

Technologically things might seem to be a bit one-sided in favour of disk, with paper rather unchanging. That isn't quite true. In fact paper use has shot up: EU paper consumption nearly doubled between 1983 and 2001 (CEPI 2001, quoted by ACRR.org), and US paper use trebled over 30 years (again quoted from Tom Forester; many Internet references concur). According to the FAO, global paper consumption tripled from 1970 to 2000 and is expected to grow by half again before 2010. Part of the change reflects economic growth, but a lot is a consequence of the way computer printers have made it easier to write and print documents.
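These end-point figures are easier to compare as steady annual growth rates. A small sketch, assuming smooth compound growth within each period and treating "nearly doubled" as exactly double:

```python
def annual_growth_rate(multiple: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a quantity by `multiple` over `years`."""
    return multiple ** (1 / years) - 1

# (label, growth multiple, years), taken from the figures quoted above.
PERIODS = [
    ("EU 1983-2001, nearly doubled (taken as 2x)", 2.0, 2001 - 1983),
    ("Global 1970-2000, tripled", 3.0, 2000 - 1970),
    ("Global 2000-2010, up by half again", 1.5, 2010 - 2000),
]

for label, multiple, years in PERIODS:
    print(f"{label}: {annual_growth_rate(multiple, years):.1%} a year")
```

All three work out at roughly 4% a year, which shows how a modest-sounding rate sustained for decades produces a tripling.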

Some people might see a contradiction between growing use of computer storage and growing use of paper. The paradox disappears, however, if people tend to glance at information on screen and then print it out, which is what often happens. Paper is sometimes used in its traditional role to store information, but increasingly it is used to view it.

Printing is undergoing its own technological revolution, with production moving from big, centralised fixed-plate printers to:

Digital printers - which may still be expensive centralised resources but can produce anything from one to a thousand copies of documents, from leaflets to books, quite economically.
Personal printers - which are low cost to buy, so every user can have one, although they may not be cheap to run for large numbers of pages.

Digital print is capable of further improvement, although nothing print technology can do is likely to match the continual growth in disk drive capacities.

The cost of printing could be reduced quite markedly. There is no obvious technical reason why inkjet ink costs so much more than conventional printing ink.

Environmental issues are being addressed. Conventional paper is often seen as environmentally destructive, which is probably a fair assessment, particularly in Asia and South America. Recycling has been expanding rapidly, however, so although virgin forest is still winding up as pulp and lots of paper goes to landfill, it is conceivable that paper technology could achieve zero net emissions. Recycling can also cut costs because de-inking and recycling require less energy input. (It is probably easier for disk drive production to achieve environmental neutrality because so much less material is involved: if a 200 gram disk holds the equivalent of 20 million pages, it stands in for 100 tonnes of paper.)

New papers can be developed. Several companies are developing "e-paper": electronic papers which aim to give some or all of the properties of screens in a more paper-like form. Some of these might be written and then erased by passing through a printer of sorts; others would hold a wireless microchip and battery so they could be changed as needed. Presumably all these technologies would cost more than conventional paper, so they could not rival it as a data store.

Paper is the conventional way of storing information. The storage role is rapidly being taken by disk. Disk simply costs much less as a way to hold information.


Paper & Disk Costs.

Paper costs about £2 per 500-sheet ream, or 0.4p per page. To be useful, paper has to be printed; this could cost anywhere from 0.1p per side for band-printer text up to 4p for text from an inkjet. The average page is printed by a laser printer, and a low-cost printer should manage just under 1p per side. Offset litho is cheaper than laser printing, of course, but only suits bulk print. All sorts of detailed arguments can be advanced, but we only need rough aggregate figures here, so let's suggest 1p per page.

Really dense A4 printed paper holds just over 5,000 characters per side, and there are two sides, so 10,000 characters per page is a high data density for paper. Being generous to paper, let's suggest that costs might be as low as a penny per page of 10,000 characters.

Disks hold text characters as bytes, usually one character per byte (there are several ways text can be coded). On this basis, a million densely printed pages of text (several thousand large books) equates to 10 gigabytes of computer data (ten gigabytes being ten thousand million characters). The 200 gigabyte capacity of a typical disk is 20 times larger: 20 million printed pages.
 
At a penny per page, 20 million pages of print costs £200,000, whilst the disk costs just £50: one four-thousandth of the price.
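The whole comparison fits in a few lines of arithmetic. A sketch using the round figures above (10,000 characters per page, one byte per character, a £50 disk of 200 GB and a penny per printed page), with decimal units throughout:

```python
CHARS_PER_PAGE = 10_000         # dense A4, both sides
BYTES_PER_CHAR = 1              # assumes a single-byte text encoding
DISK_BYTES = 200e9              # 200 GB, decimal
DISK_PRICE_GBP = 50.0
PRINT_COST_PER_PAGE_GBP = 0.01  # "a penny per page"

bytes_per_page = CHARS_PER_PAGE * BYTES_PER_CHAR
gb_per_million_pages = 1_000_000 * bytes_per_page / 1e9
pages_per_disk = int(DISK_BYTES // bytes_per_page)
print_cost_gbp = pages_per_disk * PRINT_COST_PER_PAGE_GBP
ratio = print_cost_gbp / DISK_PRICE_GBP

print(f"1 million pages = {gb_per_million_pages:.0f} GB")
print(f"{pages_per_disk:,} pages per disk: print £{print_cost_gbp:,.0f} vs disk £{DISK_PRICE_GBP:.0f}")
print(f"paper costs about {ratio:,.0f} times as much")
```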

Disk costs will fall in real terms as performance improves over the next decade. It is unlikely that the real cost of printing will change dramatically, although improvements in recycling and ink chemistry might have marginal effects.

A crude comparison of the costs of holding a bulk of information in print or on disk suggests that the age of paper has gone. Paper costs at least 4,000 times as much as disk, and we haven't even looked at the cost of shelving and a filing clerk yet! Of course:
Not everyone wants bulk information; some people go so far as to suggest one "good book". The interest here is the sort of information that scientists, engineers, lawyers, factory managers and people writing or reading a novel or a newspaper are dealing with.
Disks aren't usually filled; many are less than half full because people leave room for growth. This used to be important because systems had one disk and upgrading was expensive. It is now easy to plug another disk in, so they might as well be filled.
Text isn't information. "E=mc²" and "Mary had a little lamb" are both texts but have markedly different informational value. Tasks differ greatly in how much information they need.
Lots of people don't want textual information, but disk can hold pictures as well, and audio, video and anything else that can be turned into a digital data stream.

Confounding Factors.

Making any comparison between computer and paper storage involves lots of assumptions and presumptions. Storing information is rarely as simple as implied by the comparison between characters on paper and bytes on disk. Some arguments may sway things back in paper's favour, others not.

Factors that might change the calculation include: computer costs, whether information is text or pictures, storage costs for paper and disk, backup, and filing and retrieval costs.

There are arguments both ways for each point but it is difficult not to favour the low cost and flexibility of disks and networks.

Computer costs might seem to apply to disk and not to paper, but actually they apply to both. Paper is instantly readable, if you can find the right page. Computer disks are no use without a computer, and most users rely on several, together with the screens, networks and printers needed to use them. Perhaps the cost of these things should be added to the cost of the disk. There is typically several thousand pounds' worth of equipment, and that might swing the balance back towards paper. Things aren't so simple, though, because most offices introduced computers primarily to replace typewriters and networks to share printers, and at least initially pressed on with storing most things on paper. Desktop computers were often introduced to prepare neat paperwork, not because of their big disks.

Pictures take more resources on a computer than text. A sheet of A4 printed at 300 dpi contains about 8.7 million pixels (2480 × 3508), and as an uncompressed 24-bit colour image this is about 26 megabytes of data, or 52 megabytes for the two sides. A 200 gigabyte disk would only store about 3,800 page images like that, but it would make no sense to try. Images are typically compressed, as JPEGs or in vector formats, to a megabyte or less per page, so the disk capacity is 200,000 pages: a hundredth of the text capacity.

Economics still favour disk, but not so dramatically. On this formula disk costs 1/40th of a penny per page. Printing costs depend very much on how it is done. A home inkjet colour page might cost £1, so the economics are unchanged at £200,000 (costly ink!). Web offset printing of bulk colour magazines can get the price down below 1p per page. However, this might not be a fair comparison: bulk magazines are probably more comparable with web sites or mass-produced DVDs than with individual storage on hard disk.
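The page-image figures can be checked the same way. A sketch assuming A4 at 210 × 297 mm rendered at 300 dpi, 3 bytes per pixel for uncompressed 24-bit colour, and about 1 MB per compressed page as suggested above:

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)
DPI = 300
BYTES_PER_PIXEL = 3            # uncompressed 24-bit colour
DISK_BYTES = 200e9             # 200 GB, decimal
DISK_PRICE_GBP = 50.0
COMPRESSED_PAGE_BYTES = 1e6    # assumption: about 1 MB per page after compression

width_px = round(A4_MM[0] / MM_PER_INCH * DPI)   # 2480
height_px = round(A4_MM[1] / MM_PER_INCH * DPI)  # 3508
page_bytes = 2 * width_px * height_px * BYTES_PER_PIXEL  # both sides, uncompressed

raw_pages = int(DISK_BYTES // page_bytes)
compressed_pages = int(DISK_BYTES // COMPRESSED_PAGE_BYTES)
pence_per_page = DISK_PRICE_GBP * 100 / compressed_pages

print(f"{width_px * height_px / 1e6:.1f} million pixels per side, "
      f"{page_bytes / 1e6:.0f} MB per two-sided page")
print(f"{raw_pages:,} uncompressed or {compressed_pages:,} compressed pages per disk")
print(f"disk cost per compressed page: {pence_per_page}p")
```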

Documents are typically scanned and held as "TIFF" files using CCITT Group IV compression. An A4 monochrome page scanned at 300 dpi typically produces about 30-50 kilobytes of data. This is up to ten times the amount produced by text but a twentieth of what might be produced by colour pictures. Document management systems commonly store the scan along with an optical character recognition (OCR) attempt to translate it into text. Although OCR commonly mistakes a few characters, enough are generally recognised accurately to allow the computer to index the material. Storing 40 KB TIFFs alongside 10 KB texts, a 200 GB disk would hold 4 million pages.
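Putting the three encodings side by side shows how much the choice matters. A sketch using the approximate per-page sizes discussed in this section; the 40 KB TIFF size is an assumed mid-point of the 30-50 KB range:

```python
DISK_BYTES = 200e9  # 200 GB, decimal units

# Approximate stored bytes per page for each approach.
BYTES_PER_PAGE = {
    "plain text (10,000 characters)": 10e3,
    "Group 4 TIFF scan + OCR text": 40e3 + 10e3,
    "compressed colour image": 1e6,
}

def pages_that_fit(disk_bytes: float, bytes_per_page: float) -> int:
    """Whole pages of a given size that fit on the disk."""
    return int(disk_bytes // bytes_per_page)

for label, size in BYTES_PER_PAGE.items():
    print(f"{label:32} {pages_that_fit(DISK_BYTES, size):>12,} pages")
```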

Whichever way pages are equated to disk the results favour disk - usually impressively so with a huge volume of documentation stored in a small, low cost disk. Disk prices are likely to go on falling.

Data on Paper

The price of printing is just a small part of the cost of holding information. The price of shelving and cabinets to store paper in can easily be double or more the price of printing.

Paper storage costs can vary hugely. If the paper takes book form, like phone directories or academic journals, it can be stacked on shelves. A 500-sheet ream of A4 paper with no binding is 50mm thick, so one sheet is 0.1mm thick. One metre of shelf might conceivably hold 10,000 pages. A million sheets of paper takes up 100 metres (rather over 100 yards) of shelf space, even if all the paper is packed perfectly flat.

A 100 metre run of shelving made of unvarnished floorboard, stacked 5 high (the sort of thing archives store old files in), will be 20 metres long and contain nearly 300 metres of tongue & groove floorboard allowing for uprights. A joiner is unlikely to charge less than £2000 for such a thing, and buying something more elegant will probably cost significantly more. Of course it is rather unlikely that paper will pack perfectly flat, so significantly more shelving may be needed.

This amount of shelving will occupy 20 metres of wall space. The shelves plus narrow aisles for human access will probably take about a metre of floor depth along that wall. The shelves will reach above head height, so nothing else can occupy the space. Using densely packed shelving, a million pieces of paper will occupy about 20 square metres of floor, which in turn has an annual rental cost.

Weight is a problem to watch for. Each sheet of paper weighs a feather-like 5 grams but a thousand weigh 5 kilos. A metre of paper is 10,000 sheets and 50 kilogrammes. Stacked 5 high that is 250 kilos - quarter of a tonne and quite a load on the floor. A million pages weighs 5 tonnes. Just for completeness the 20 million page equivalent of a 200 gigabyte hard disk would need 2000 metres of shelf and weigh 100 tonnes.
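The shelving and weight figures follow from two numbers per sheet. A sketch assuming 0.1 mm and about 5 g per sheet (roughly an 80 g/m² A4 sheet), perfectly flat packing, shelves five high and about a metre of floor depth for shelf plus aisle:

```python
SHEETS_PER_METRE = 10_000   # 0.1 mm per sheet, packed perfectly flat
SHEET_MASS_G = 5            # roughly one 80 g/m² A4 sheet
SHELVES_HIGH = 5
FLOOR_DEPTH_M = 1.0         # assumed depth of shelf plus narrow access aisle

def paper_store(sheets: int) -> dict[str, float]:
    """Shelf length, wall length, floor area and weight for `sheets` of A4."""
    shelf_m = sheets / SHEETS_PER_METRE
    wall_m = shelf_m / SHELVES_HIGH
    return {
        "shelf_m": shelf_m,
        "wall_m": wall_m,
        "floor_m2": wall_m * FLOOR_DEPTH_M,
        "tonnes": sheets * SHEET_MASS_G / 1_000_000,
    }

for sheets in (1_000_000, 20_000_000):
    s = paper_store(sheets)
    print(f"{sheets:>10,} sheets: {s['shelf_m']:,.0f} m of shelf, "
          f"{s['wall_m']:,.0f} m of wall, {s['floor_m2']:,.0f} m² of floor, "
          f"{s['tonnes']:,.0f} tonnes")
```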

Using 4-drawer filing cabinets the case for paper is even worse. Each drawer is about 500mm deep, holding about 5,000 sheets, so 50 cabinets are needed. Bargain-basement prices might get the cabinets for £50 each (£2,500) before buying the Pendaflex binders, which may double the price. This is probably an underestimate of the cost of paper storage. Heather Sarantis reckons "To store 2 million paper documents, an organization can expect to spend between $40,000 and $60,000 on filing cabinets alone", so 1 million would be $20k-$30k, which in the UK we might equate to something like £15-25k.
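The filing-cabinet estimate works the same way. A sketch assuming a 500 mm drawer holds 5,000 sheets at 0.1 mm each, four drawers per cabinet and the £50 bargain price, with Sarantis's dollar range scaled linearly per document:

```python
import math

SHEETS_PER_DRAWER = 5_000     # 500 mm drawer / 0.1 mm per sheet
DRAWERS_PER_CABINET = 4
CABINET_PRICE_GBP = 50        # bargain-basement price from the text

def cabinets_needed(sheets: int) -> int:
    """Whole four-drawer cabinets needed, with no allowance for binders or slack."""
    return math.ceil(sheets / (SHEETS_PER_DRAWER * DRAWERS_PER_CABINET))

def scaled_range(low: float, high: float, base_docs: int, docs: int) -> tuple[float, float]:
    """Scale a quoted cost range linearly from `base_docs` to `docs` documents."""
    factor = docs / base_docs
    return low * factor, high * factor

n = cabinets_needed(1_000_000)
print(f"{n} cabinets, about £{n * CABINET_PRICE_GBP:,} before binders")
low, high = scaled_range(40_000, 60_000, 2_000_000, 1_000_000)
print(f"Sarantis's range scaled to 1 million documents: ${low:,.0f}-${high:,.0f}")
```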

Given the cost of actually printing any significant volume of material (£10,000 for a million pages) it might as well be nicely presented.

These costs are for creating a moderate-sized store for papers - the sort of thing a doctor, accountant or politician might accumulate fairly easily - not for filing and retrieving the material which is a job considered elsewhere. The costs of filing involve labour to do the job - and can easily be much higher than the physical cost of the paper itself.

Cheap disks actually hold far more than 1 million sheets' worth of information, something between 4 and 20 million sheets, so the cost advantages may be greater still.

Data on Disk

Disk storage costs are less straightforward. A disk will fit inside a computer, so there is no extra space requirement. In practice a lot of systems have a special dedicated storage computer called a "server", and that does occupy some space. Servers are sometimes quite large, but that seems to have more to do with the expectations of managers buying computer gear than with any real need.

Unfortunately disks aren't entirely trustworthy; they are mechanical and will eventually fail. Disks are also vulnerable to viruses and operator error so duplicates usually have to be kept. There are lots of strategies -

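RAID (Redundant Array of Inexpensive Disks) spreads data across several disks so that if one fails the array can carry on and rebuild the missing data. A common small arrangement uses three disks, with the equivalent of two holding data and one holding parity (in RAID 5 the parity is rotated across all three), so that if any one disk fails the controller can reconstruct its contents from the other two. Mirroring (RAID 1) instead keeps complete duplicate copies, doubling the number of disks needed. Either way RAID adds substantially to the cost of disks, especially if a special controller is needed.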
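Parity is what lets a three-disk array survive the loss of one disk. A minimal sketch of the idea, assuming a simplified layout with two data blocks and one XOR parity block per stripe (real RAID 5 rotates parity across the disks and works on much larger blocks); the block contents are hypothetical:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical three-disk stripe: two data blocks and one parity block.
disk0 = b"Mary had a "
disk1 = b"little lamb"
parity = xor_blocks(disk0, disk1)   # written to the third disk

# If disk1 fails (and the controller knows which disk it was),
# XOR-ing the two survivors reconstructs the missing block.
rebuilt = xor_blocks(disk0, parity)
print(rebuilt == disk1, rebuilt)
```

Single parity rebuilds a block once the failed disk has been identified; on its own it cannot tell which of two disagreeing disks holds the wrong data.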

RAID isn't good enough on its own because the disks still do what they are told, including operator or virus instructions to delete everything. Offline archives aim to avoid this kind of problem using duplicate disks, CDs, DVDs or tape. Big operations use DLT (Digital Linear Tape) because, although the drives are expensive, the tapes have large capacities, which the other media don't.

Backup strategies vary. Original work needs a backup because any one "bit" of information is vulnerable to accident. Endless replication is often the basis of things (as with RAID), but for any large quantity of information it is expensive and unnecessary. It is more usual to make a complete backup once a year, month or week and then incremental backups of whatever has changed. There are schemes for rotating the backup media, such as the "Tower of Hanoi" rotation.
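The Tower of Hanoi scheme decides which set of backup media to overwrite at each session, so that a few recent copies and some progressively older ones are always kept. A sketch of the scheduling rule only, assuming sessions numbered from 1 and sets labelled A, B, C and so on:

```python
def hanoi_media_set(session: int, num_sets: int) -> int:
    """Index of the media set (0 = A, 1 = B, ...) to use for a backup session.

    Set A is used every 2nd session, B every 4th, C every 8th and so on;
    the last set also takes every higher level, so it shares the longest interval.
    """
    if session < 1 or num_sets < 1:
        raise ValueError("session and num_sets must be positive")
    trailing_zeros = (session & -session).bit_length() - 1
    return min(trailing_zeros, num_sets - 1)

schedule = "".join("ABCD"[hanoi_media_set(s, 4)] for s in range(1, 17))
print(" ".join(schedule))
```

With four sets the sequence runs A B A C A B A D and repeats, so set D is only overwritten every eighth session.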

Backup does multiply the cost of disk store. However:

Backup is only really necessary for original and changing data - and in many computers the operating system, programs, reference information, downloads and copies of material like CDs and DVDs occupy more space. If duplicate sources are readily available there is usually no reason to back this material up.
 
With fast changing material lots of backups are needed - things like share prices and stock levels. However only the originator of unique information really needs to back it up.
With unchanging material copied to lots of other people few or no backups are needed - references like dictionaries, encyclopedias, manuals and periodicals can often just be downloaded again if there is a problem.

Compression can ease disk storage problems. Many file systems can compress and decompress data on the fly, which for text can achieve something like a two-fold increase in storage space. The apparent increase in space might be traded for the ability to roll changes back (a form of backup). Further compression can be achieved as data goes to tape.
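How much a given piece of data shrinks is easy to measure. Lossless compressors of the LZ77 family include DEFLATE, which Python's standard zlib module implements; this sketch uses it as an illustration only, not as what any particular file system does internally:

```python
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by DEFLATE-compressed size (higher is better)."""
    compressed = zlib.compress(data, level)
    if zlib.decompress(compressed) != data:  # lossless: must round-trip exactly
        raise RuntimeError("round trip failed")
    return len(data) / len(compressed)

sample = (
    "Paper acts as a display, as an aid to comprehension, as a way to retrieve "
    "information and as a long term store. Paper is familiar, convenient, "
    "immediate and flexible. Disk is just a store, but it is a very cheap one."
).encode("ascii")

print(f"{len(sample)} bytes, compression ratio {compression_ratio(sample):.2f}")
```

Very short snippets like this one compress poorly because of fixed overheads; longer documents usually compress considerably better.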

Economies of scale apply to disk, although it hasn't always been easy to exploit them. RAID systems are inherently bigger and faster than individual disks and are usually shared from a server over a network. Tradition and poor network technology have limited disk sharing to people within an organisation. Renting space on shared systems was popular in the 1970s, rather died out in the 1980s and has risen again as people have learned to trust web servers. Peer-to-peer sharing within organisations, between people on the web, and shared space on web servers all suggest new patterns.

Virus and trojan infection of data may create an extra problem. At one time data couldn't contain instructions. Then people wanted tricks like embedding programs in data, so a virus might hide as a macro or an ActiveX component in a "Word" or "Excel" file. A virus payload might take a long time to activate, so reloading from the backup might reload the virus. Plain text, numbers, sound and pictures cannot carry macros, so keeping data in such formats largely eliminates this threat.

What Disk Can't Do

Disk has some less evident limitations that don't apply to paper. The magnetic fields on disk tend to weaken and merge over time, so a floppy, hard disk or tape put into an archive can be effectively blank after ten years or less. It might theoretically be possible to recover the data by adjusting thresholds or reading equipment, but it would be an expensive technical exercise. A further problem with old data is that the machinery to read it has often ceased to exist: there are few if any workable 5¼ inch floppy drives or 9 track tape drives left. Paper archives left in a basement fox and decay a bit; disks become completely unusable.

Things aren't quite as bad as they may sound, provided each new disk is loaded with its predecessor's contents. As long as the data is re-written from time to time it is likely to last.
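Re-writing data onto new media is only safe if each copy is checked. A minimal sketch of copy-then-verify using SHA-256 checksums from Python's standard library; the directory and file names are hypothetical:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def migrate_file(src: Path, dest_dir: Path) -> Path:
    """Copy `src` into `dest_dir` and raise if the copy is not bit-for-bit identical."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copies contents plus timestamps where supported
    if sha256_of(src) != sha256_of(dest):
        raise OSError(f"verification failed for {src}")
    return dest

# Demonstration in a throwaway directory; the names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    old_copy = Path(tmp) / "old_disk" / "ledger.txt"
    old_copy.parent.mkdir()
    old_copy.write_bytes(b"1986 accounts")
    new_copy = migrate_file(old_copy, Path(tmp) / "new_disk")
    migrated_ok = new_copy.read_bytes() == old_copy.read_bytes()

print("migrated and verified:", migrated_ok)
```

Keeping the checksums alongside the data also lets a later migration confirm that nothing has decayed in the meantime.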

Some companies put old data on microfilm to preserve it.
 

Summing Up.

Storing information on disk costs very much less than storing it on paper. With any large quantity of information, such as that held by most organisations and professions, the cost difference is massive. The cost of storing paper may be many times that of holding the same information on disk, and this is without considering the cost of filing.

Some disk information does need a lot of backup precautions but as suggested above an increasing part of today's systems don't - they could be replicated from elsewhere if a copy were lost. With most organisations having several sites and many computers all with big disks it makes sense for them to use a form of peer to peer filesharing.

As computer systems get bigger the proportion of original and fast-changing information tends to fall, so it gets easier to make the system resilient. Most people could back up all their original work onto a couple of flash pen drives.

Some of the points about disk backup should apply to paper as well. Paper archives tend to lose material that gets binned rather than re-filed. Fire and flood would destroy paper and disk equally. Paper files can suffer operator error with a shredder and a bin-bag; it's just slower than logging on as root and typing "rm *". At least disks and tapes can easily have offsite backup. Duplicating paper means a massive photocopying exercise.

The confounding factors basically come down to four things:

Coding of text on disk is efficient; visual coding of documents and pictures is less cost effective, but still a lot more cost effective than printed paper.
Digital data needs a backup. Because holding information on digital systems is low cost it is often duplicated anyway, so only part of the data in an organisation actually needs backup.
Paper ought to be backed up, but usually isn't.
The cost of shelving, cabinets and floorspace for paper in any large quantity is usually massively greater than for disk.

Because each organisation will have its own mix of pictures and data, and its own clerical procedures bound up with the materials used, it isn't really fair to say "disk costs something between a hundredth and a thousandth the cost of paper", but the advantage is of that order.

It seems fair to suggest that storage costs on computer are massively smaller than those of paper, and that is before even considering the filing and handling costs of paper.