Home Directory Organization Policy

Top level directories:

The home directory contains several top-level directories that help organize documents. The main goals of the directory policy are:

  1. Make files easy to file and easy to find.
  2. Make backing up easy, even without specialized software.
  3. Make merging files from different home directories on different computers easy.

The goal is not to have everything perfectly organized (which usually means learning an overwhelmingly complex filing policy) but to significantly reduce the search space and greatly increase the likelihood of finding files by hand or with search tools (such as find). There are several broad methods of classification. In the absence of a database filesystem, we usually cannot easily use more than one of them at once, so the following gives a policy for how to sort various files.

Names

The names of files should in general make it clear what their contents are. Directories should be named with a noun or a numerical date, as appropriate.

Non-dated Directories:

In general (but not strictly), items placed in top-level non-dated directories are files that we did not author, or that are tied to some existing piece of software (e.g. mail readers or web servers).

Projects

A key concept in the organization of files is the notion of a project. Projects are dynamic objects which grow and change over time. A single project may have many works associated with it (proposals, talks, papers, code), or it may have only a single work (one paper, for instance). For the purposes of filing, we assign a name string to each project. As with all names in this filing standard, the project name should not contain any spaces. In fact, it is preferred that names use only characters which are allowed in URL strings. In general, only alphanumerics [0-9a-zA-Z] and the special characters [$-_.+!*'(),] should be used in project names. Once established, the project name will be used in many places. At some point, using a vocabulary such as DOAP to describe the projects might be useful (particularly when coupled with a search engine).
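The naming rule above can be checked mechanically. The following is a minimal sketch in Python; the function name is_valid_project_name is hypothetical, and rejecting "." and ".." is an extra assumption that the policy does not state (both pass the character rule but are unusable as directory names).

```python
import string

# Characters the policy allows: alphanumerics plus the listed specials.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "$-_.+!*'(),")


def is_valid_project_name(name: str) -> bool:
    """Return True if name is non-empty and uses only allowed characters."""
    if name in ("", ".", ".."):
        # Assumption beyond the policy: "." and ".." are reserved path names.
        return False
    return all(ch in ALLOWED_CHARS for ch in name)
```

For example, is_valid_project_name("thesis-2004_draft") returns True, while is_valid_project_name("my project") returns False because of the space.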

Dated Directories:

There is a top-level directory named years. In the years directory one keeps a directory for each year (e.g. 2004/ ). All of the following directories fall in these year subdirectories. At the start of each year, all new work goes into the new year's directory, but the old work stays in the old directory! If a file is to be edited in the new year, COPY the latest version from the old year to the new year and start work from there. It may be tempting to put that one file from the new year in with the previous year's files: DON'T DO IT! If you do, you will break the rule that once a year has passed, no new files enter that directory. A dated directory should become read-only after its year has passed.
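The copy-forward rule can be sketched as a small Python helper. This is only an illustration under assumptions: the function name carry_forward and the example subdirectory "papers" are hypothetical and not part of the policy.

```python
import shutil
from pathlib import Path


def carry_forward(years_root, relative_path, old_year, new_year):
    """Copy one file from a past year into the new year; return the new path.

    The old file is left untouched, so the past year's directory stays frozen.
    """
    source = Path(years_root) / str(old_year) / relative_path
    target = Path(years_root) / str(new_year) / relative_path
    if target.exists():
        # Never overwrite work that already exists in the new year.
        raise FileExistsError(f"{target} already exists; edit it there")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

A call such as carry_forward("years", "papers/draft.txt", 2003, 2004) would create years/2004/papers/draft.txt and leave the 2003 copy as it was.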

This policy makes backups more straightforward (you only need to back up the current year's material). The dated directories are "creative" directories: places where you create content on a regular basis. They are not for content that you primarily obtain from other sources. A main motivation for a year-based structure is that it can only get so messy. At the end of the year, you turn over a new leaf. You know you have backed up all the files from the previous year, and there won't be any more for that year. It keeps things manageable.
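Because a year's directory is self-contained, backing it up needs no specialized software. The following Python sketch packs one year into a compressed tar archive; the function name archive_year and the archive naming scheme (2004.tar.gz) are assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def archive_year(years_root, year, dest_dir):
    """Write <dest_dir>/<year>.tar.gz containing everything under that year."""
    year_dir = Path(years_root) / str(year)
    if not year_dir.is_dir():
        raise NotADirectoryError(str(year_dir))
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{year}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths in the archive relative, e.g. "2004/notes.txt".
        archive.add(str(year_dir), arcname=str(year))
    return archive_path
```

Keeping the paths inside the archive relative to the year means it can be unpacked into any years directory on any machine.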

As a convenience, we recommend creating a link in the top level from "current" to the year you are working in. On UNIX systems this can be done with the ln command (e.g. "ln -s years/2004 current"). That way, you can always use the same paths for your latest material, even as the years go by.
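At each new year the link has to be moved to the new directory. A minimal Python sketch of that step follows, assuming a POSIX-style system with symbolic links; the function name point_current_at is hypothetical.

```python
import os
from pathlib import Path


def point_current_at(home, year, link_name="current"):
    """Make <home>/<link_name> a symlink to years/<year>, replacing an old link."""
    home = Path(home)
    # Relative target, equivalent to running "ln -s years/2004 current" in home.
    target = Path("years") / str(year)
    if not (home / target).is_dir():
        raise NotADirectoryError(str(home / target))
    link = home / link_name
    if link.is_symlink():
        link.unlink()  # drop last year's link
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(target, link, target_is_directory=True)
    return link
```

Using a relative target means the link keeps working if the whole home directory is moved or merged onto another computer.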

Mail

So far, the filing system we have described need not be particular to any information system (such as a computer file system). It might also be applied to email messages, for instance. For filing email, we use the same dated classifications as above. However, we do add one directory: sent. The sent folder contains all the messages sent during a particular year.

Ideally, email is stored on a mail server, and only locally cached copies are kept by any client. Thus, we do not need to consider where these files are kept locally. At the end of each year, it might be prudent to back up all the messages from the server onto the local filesystem that holds the files for that year.

Web publishing

The public_html directory should not keep copies of the material filed above. Instead, when an item is to be made public, a link is made from the public web directory into the directory structure above. As such, backing up only requires that we back up the main directory structure, and nothing from the public_html directory. Note that the public_html directory should mirror (for public files) the hierarchy of the usual directory structure.

To make publishing easier, we recommend the development of a simple script that can take a list of files in the directory structure, and make links to them in the public_html directory.
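Such a script might look like the following Python sketch. It is one possible approach, not a prescribed implementation: the function name publish is hypothetical, symbolic links are assumed, and the web server must be configured to follow them.

```python
import os
from pathlib import Path


def publish(home, relative_paths, public_dir="public_html"):
    """Link each listed file into public_html, mirroring its directory path."""
    home = Path(home)
    public_root = home / public_dir
    created = []
    for rel in relative_paths:
        source = home / rel
        if not source.exists():
            raise FileNotFoundError(str(source))
        link = public_root / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # refresh a link left by an earlier run
        # Relative target, so links survive moving the home directory.
        target = os.path.relpath(source, start=link.parent)
        os.symlink(target, link)
        created.append(link)
    return created
```

For example, publish(home, ["years/2004/papers/talk.pdf"]) would create public_html/years/2004/papers/talk.pdf as a link to the filed copy, so nothing under public_html needs to be backed up.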

Revising This Document

This document specifies a "best practice" for filing. These practices may be a bit biased towards academic usage or software development. As usage patterns change, this document may need modification, but the basic goals (finding files easily and backing up easily) will not change. Changes in technology could dramatically change this document. In particular, the widespread availability of more powerful search tools with integrated backup could remove the need for careful filing. As long as backing up files is not easy, most users will not do it. It needs to be as obvious as possible (e.g. make DVDs which contain all the files for a given year and you are set). A search technology without an easy backup solution will not necessarily change the filing policy.