Organizing our digital lives, or where did I save that file?
In the early 1970s, Citibank undertook a large project to create unified customer account statements. Until then, there was no way to print a monthly statement that covered all of a customer’s accounts: checking, savings and loans. Each account’s transaction data and related information resided in its own system, in its own database, and nothing connected those systems so that a single customer’s data could be extracted from each one and printed together on a single statement. This seems like it should have been simple. It was not. The best solution, a relational database system for customer accounts, did not exist and could not have existed back then. At the time, the relational data model was unknown outside computer science circles, and it had not been implemented anywhere. Instead, Citi made do with hierarchical and flat-file databases, the latter organized much like a spreadsheet.
Several decades later, we still suffer from the same kind of information retrieval problem. We are a chaotic species. Personal directory structures are haphazard, fractal-like constructions that contain odd, inconsistently named files.
There are two extremes of file organization. At one end are those who save everything to the Documents folder. At the other are the people who create hundreds of subdirectories with a few files in each one. Most of us exist somewhere in between, vacillating from one side to the other. Apart from geeks like me, people spend little time thinking about directory structures and file names.
A four-drawer file cabinet is an example of a real-world hierarchical file system. It has several drawers. Each drawer has hanging file folders. Each folder may hold papers, bills, letters, or other documents, or further folders that hold documents or still more folders. A document resides in a specific drawer and in a specific folder or subfolder. It cannot be in two places unless a duplicate is made, using twice the storage space. A misfiled document is a lost document. This is the physical world of filing data. It does not have to be true in the digital one.
Dr. Codd described the relational data model in 1970. Thirty-eight years later, it has yet to be deployed in the most logical place, a computer file system. A piece of data can be related to many other pieces of data, to one, or to none. We can relate a document to many things, for example, a bill to both a credit card and a client project. I had hoped that Vista and Leopard would do away with hierarchical digital file systems. Why do we have to remember where we saved a file? Why can Google search billions of pages, spread across computers worldwide, and return an answer in a fraction of a second, while it takes us minutes or hours to locate files on our own computers? The answer is that Google does not hunt through a hierarchy of folders; it looks things up in enormous, proprietary indexes.
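To make the bill example concrete, here is a minimal sketch of what a relational catalog of documents could look like, using Python's built-in `sqlite3` module. The table names (`documents`, `topics`, `document_topics`) and the helper functions are my own illustrative assumptions, not part of any real file system. The point is only that one document can be related to many things without being copied.

```python
import sqlite3

def build_catalog():
    """Create an in-memory catalog (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE topics    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Junction table: one row per (document, topic) relationship.
        CREATE TABLE document_topics (
            document_id INTEGER NOT NULL REFERENCES documents(id),
            topic_id    INTEGER NOT NULL REFERENCES topics(id),
            PRIMARY KEY (document_id, topic_id)
        );
    """)
    return conn

def relate(conn, document, topic):
    """Link a document to a topic, creating either row if it is missing."""
    conn.execute("INSERT OR IGNORE INTO documents (name) VALUES (?)", (document,))
    conn.execute("INSERT OR IGNORE INTO topics (name) VALUES (?)", (topic,))
    conn.execute(
        """INSERT OR IGNORE INTO document_topics (document_id, topic_id)
           SELECT d.id, t.id FROM documents d, topics t
           WHERE d.name = ? AND t.name = ?""",
        (document, topic),
    )

def documents_for(conn, topic):
    """Return the names of all documents related to a topic, sorted by name."""
    rows = conn.execute(
        """SELECT d.name FROM documents d
           JOIN document_topics dt ON dt.document_id = d.id
           JOIN topics t ON t.id = dt.topic_id
           WHERE t.name = ? ORDER BY d.name""",
        (topic,),
    )
    return [name for (name,) in rows]

conn = build_catalog()
relate(conn, "march_bill.pdf", "credit card")     # the bill relates to the card...
relate(conn, "march_bill.pdf", "client project")  # ...and to the project, stored once
relate(conn, "proposal.doc", "client project")
print(documents_for(conn, "credit card"))     # ['march_bill.pdf']
print(documents_for(conn, "client project"))  # ['march_bill.pdf', 'proposal.doc']
```

The bill exists as a single row in `documents`; only the relationships multiply. That is the property the file cabinet lacks.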
Search tools like Google Desktop and Windows Search can be installed on a Windows machine to make finding files easier. However, they add overhead to the operating system, slowing it down as they continually update their indexes so that they are ready when we call on them. Apple’s Leopard has indexing built in. Searching for files or applications can be almost instantaneous on a Mac. However, the underlying Mac file system is still hierarchical. Finder, the Mac’s counterpart to Windows Explorer, still shows us the typical folder/subfolder structure.
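Why is an indexed search so fast, and why does keeping the index current cost something? The toy inverted index below is a rough sketch of the general idea. It is not how Google Desktop, Windows Search, or Leopard actually work internally, and the `FileIndex` class and the sample paths are made up for illustration. Every word points straight at the files that contain it, so a search is a lookup rather than a walk through folders. The price is that every new or changed file has to be indexed again.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into a set of lowercase words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class FileIndex:
    """Toy inverted index: maps each word to the set of paths containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, path, text):
        # Indexing work happens up front, each time a file is added or changed.
        for word in tokenize(path) | tokenize(text):
            self._postings[word].add(path)

    def search(self, query):
        """Return sorted paths that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        # .get avoids creating empty entries for words never indexed.
        matches = [self._postings.get(word, set()) for word in words]
        return sorted(set.intersection(*matches))

index = FileIndex()
index.add("Documents/2008/visa_march.pdf", "Visa statement for the Acme project")
index.add("Desktop/notes.txt", "Acme kickoff meeting notes")
print(index.search("acme"))       # ['Desktop/notes.txt', 'Documents/2008/visa_march.pdf']
print(index.search("acme visa"))  # ['Documents/2008/visa_march.pdf']
```

Notice that the search never asks which folder a file lives in. The path is just one more piece of information about the file.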
One of the things I love about Gmail is that the folks at Google threw away the hierarchical model and adopted a relational approach to organizing email messages. The online Gmail client uses labels, a form of metadata, to let us organize our messages. A message can therefore be filed in many places without being duplicated. This is a wonderful step forward. Yet many users are upset that Gmail does not have folders. What fools we users be!