Home Directory Organization Policy
Top-level directories:
In the home directory there will be various top-level directories
which help organize documents.
The main goals of the directory policy are:
- Make files easy to file and easy to find.
- Make backing up easy, even without specialized software.
- Make merging files from different home directories on different
computers easy.
The goal is not to have
everything perfectly organized (which usually presents an
overwhelming problem of learning a complex filing policy) but to
significantly reduce the search space and greatly increase the
likelihood of finding files by hand, or with search tools (such
as find).
There are several broad methods of
classification:
- authorship (did I write it, or did someone else?)
- date (when was it made or last edited?)
- event (a file related to a certain event, a trip, etc.)
- topic (what is the file about or most related to?)
- type (is it a movie, a text file, etc.?)
In the absence of a database filesystem, we usually cannot
easily use multiple classification methods, so the following
gives us a policy for how to sort various files.
Names
The names of files should in general make it clear what their contents are.
Directories should be named with a noun or a numerical date, as appropriate.
Non-dated Directories:
In general (but not strictly), items placed in top-level non-dated directories
are files which we are not the authors of, or which are tied to some existing
piece of software (e.g. mail readers or web servers).
- bin: executable programs and scripts. This name is
not as descriptive as it could be, but since it is
well known in the Unix world, we keep with that
standard. Any scripts in here that we are the
authors of should be kept in a version control
system and only copied here so that the shell can
find them to execute them.
- media: Media files organized by type. These in general
should be documents from other sources, not
those which you are the author of. These are
the kinds of files that in general you can get
from the web or some other source. They are not
as critical to back up as the items from your dated
directories. Files here should be given
names which make them recognizable by their name
alone. Don't be afraid of longer names.
  - articles: Copies of articles downloaded from
    the internet. These are NOT works being
    written by us. This is not where we put
    software documentation, etc. (see docs).
  - audio: music, voice messages, speeches, etc.
    Each author should have their own directory, with
    each of their works in a subdirectory (albums,
    speeches, etc.).
  - docs: Holds documentation for software or
    hardware used. This can include programming
    manuals, software installation howtos, etc.
    These are things that were downloaded,
    not items of original authorship.
  - images:
  - video:
- tmp: personal temporary directory. This is your junk
drawer. If everything gets deleted from here,
you won't be sad. Don't put anything in here
that is better suited somewhere else.
A key concept in the organization of files is the notion of a project.
Projects are dynamic objects which grow and change over time.
A single project may have many works associated with it (proposals,
talks, papers, code), or it may have only a single work (one paper,
for instance).
For the purposes of filing, we assign a name string to each project. As with
all names in this filing standard, the project name should not contain any
spaces. In fact, it is preferred that names use only characters which are
allowed in URL strings. In general, only alphanumerics [0-9a-zA-Z] and
the special characters [$-_.+!*'(),] should be used in project names. Once
established, the project name will be used in many places. At some point,
using a vocabulary such as DOAP (Description of a Project) to describe the
projects might be useful (particularly when coupled with a search engine).
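The naming rule above can be checked mechanically. The following is a minimal Python sketch, not part of the policy itself; the function name is_valid_name is hypothetical. It spells the allowed characters out as a set, because inside a regular-expression character class the text "$-_" would be read as a range and accept more characters than intended.

```python
import string

# Characters the policy allows: ASCII alphanumerics plus $-_.+!*'(),
# (written as a plain set so that "-" is a literal hyphen, not a range).
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def is_valid_name(name: str) -> bool:
    """Return True if name is non-empty and uses only allowed characters.

    Hypothetical helper: the policy states the character set, not this API.
    """
    return bool(name) and all(ch in ALLOWED_CHARS for ch in name)
```

For instance, is_valid_name("RetinalNeurons") holds, while a name containing a space, such as "Retinal Neurons", is rejected.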
There is a top-level directory named years.
In the years directory one keeps a directory
for each year (e.g. 2004/ ). All of the following
directories fall in these year subdirectories. At the
start of each year, all new work goes into the new year,
but the old work stays in the old directory!
If a file is to be edited in the new year, COPY
the latest version from the old year to the new year and
start work from there.
It may be tempting to put that one file from the next year in
with the previous: DON'T DO IT! If you do, you will break the
rule that once a year has passed, no new files will enter that
directory. A dated directory should become read-only after
that year has passed.
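The copy-forward rule and the read-only rule can be sketched in Python as follows. This is an illustration under assumptions, not a prescribed tool: the function names carry_forward and lock_year are hypothetical, paths are given relative to the year directory, and the permission handling assumes a POSIX system.

```python
import shutil
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def carry_forward(years_root: Path, old_year: str, new_year: str,
                  rel_path: str) -> Path:
    """Copy one file from the old year into the same place in the new year.

    The original stays where it is; an existing copy is never overwritten.
    """
    src = years_root / old_year / rel_path
    dst = years_root / new_year / rel_path
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst


def lock_year(year_dir: Path) -> None:
    """Remove write permission from a finished year and everything in it."""
    for path in [year_dir, *year_dir.rglob("*")]:
        if not path.is_symlink():  # leave symlink targets untouched
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
```

A call such as carry_forward(Path.home() / "years", "2004", "2005", "notes/todo.txt") followed, once the year is over, by lock_year(Path.home() / "years" / "2004") would follow the rule described above.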
This policy makes backups more straightforward (you only need to back up
the current year's material). These directories are
"creative" directories, directories where you will be
creating content on a regular basis. They are not for
content that you primarily obtain from other sources.
A main motivation for a year-based structure is that it
can only get so messy. At the end of the year, you turn over
a new leaf. You know you have backed up all the files from
the previous year, and there won't be any more for that year.
It keeps things manageable.
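Because each year lives in a single directory, a backup without specialized software can be as small as one archive per year. The sketch below uses Python's standard tarfile module; the function name archive_year and the destination layout are assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def archive_year(years_root: Path, year: str, dest_dir: Path) -> Path:
    """Write dest_dir/<year>.tar.gz containing everything under years_root/<year>."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{year}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. "2004/notes/..."
        tar.add(years_root / year, arcname=year)
    return archive
```

During the year, only the current year needs to be archived; past years are archived once and never change again.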
As a convenience, we recommend creating a link in the top level
from "current" to the year you are working in.
On UNIX systems this can be done with the ln command
(e.g. "ln -s years/2004 current").
In that manner, you can always use the same paths for your latest
material, even though the years go by.
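At the start of a new year the link has to be moved. A small Python sketch of that step follows; the function name point_current_at is hypothetical, and the code assumes a system that supports symbolic links. It creates the same relative link as the ln command above and refuses to touch "current" if it is a real file or directory.

```python
from pathlib import Path


def point_current_at(home: Path, year: str) -> Path:
    """Make home/current a relative symlink to years/<year>."""
    target = Path("years") / year  # relative, as in "ln -s years/2004 current"
    if not (home / target).is_dir():
        raise FileNotFoundError(f"{home / target} is not a directory")
    link = home / "current"
    if link.is_symlink():
        link.unlink()  # drop last year's link
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link
```

Keeping the link relative means it still resolves if the home directory is copied or mounted somewhere else.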
- advising: This is where we keep files associated with the
advising of students.
- application_data: Here we place backups of important
application files (such as GnuPG keys, SSH keys, signatures,
address books, etc.).
- employment: This is where we put documents related to our employment:
job forms, department personnel documents, benefits information,
health insurance information, etc. If you have more than one
job, make a directory for each job and place materials inside
the directory for the appropriate job.
- misc: Miscellaneous. Hopefully, nothing winds up here, but
there may be files that do not fit properly into any
other category, and this directory is the catch-all.
- notes: This is where we put various notes to ourselves.
These may remind us how we set up something on
our computer system, or a list of todo items, or
an encrypted list of passwords. These are short
items that we wrote ourselves and we reference
regularly.
- personal: Invariably some personal information will find
its way onto the system. This includes saved
personal emails, tax filing forms, product ordering
emails, health information, personal finance,
and the like. This is a bit of a miscellaneous
directory. Note that it excludes anything that
fits better in any other directory.
- photos: This is where we put snapshots. If the number
of photos grows too rapidly we may put them in
subdirectories named by month number (01/ 02/
etc.). These photos may be personal or work
related. They may be taken by us, or by
co-workers or friends. Generally, they should be
related to us somehow, and not random pictures
downloaded from the Internet.
- projects: This represents the "Project oriented" data files.
For each project there is a main name associated with
it and these names form subdirectories of the projects
directory. In each project, all the files associated
with that project are placed in directories specifying
the type of data, for instance:
projects
|
+-Brunet
| |
| +---papers
| |
| +---proposals
| |
| +---talks
|
+-RetinalNeurons
A project may have: code, data, papers, posters, proposals,
or talks. Other works may be associated with projects
in the future, but it is preferred to use a name from
the above list when possible.
Note that some of these items (for instance source
code or papers) should be versioned, but the working
directories for those versioned entities should be
kept in the subdirectory of the project they are
associated with.
- service: where we keep files associated with different service
or volunteer activities.
- teaching: This is where we put files related to courses we are
teaching.
The subdirectories should be given a name of the
form: "number-title", such as
"cs233-cryptography". This allows us to see the
topic even if we have forgotten the mapping of
names onto numbers.
- travel: This is where we put ALL travel related
documents. Each trip will get a name in this
directory. All documents related to that trip
go into that directory (reimbursements,
conference registrations, etc.).
- versions: This is where we put files for version control
systems, such as CVS repositories, Arch archives,
Subversion repositories, etc. For convenience, we may put
symbolic links from public_html or even our home directory to
the most recent dated archive if we like, but the actual files
reside in this dated directory (or a subdirectory). We assume
that these directories can be browsed with web based software
or by looking inside the directories directly. We recommend
putting a text file called "objects.txt" in this directory
which lists the names of all the modules in the version control
for this year. This allows us to easily look in this one file
to remind ourselves what "works" are contained in the version
control. Often, we will want our version control
repository to be on a computer which is addressable on the
internet (which sometimes excludes desktop workstations).
That is fine. The account where the repository is stored
should follow this part of the directory hierarchy policy.
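The dated layout described above can be created in one step at the start of each year. The Python sketch below is only an illustration: the function names create_year and create_project are hypothetical, and rejecting work names outside the list given for projects is a choice made here for simplicity (the policy itself only prefers names from that list).

```python
from pathlib import Path
from typing import Iterable

# Directory names taken from the list of dated directories above.
YEAR_DIRS = (
    "advising", "application_data", "employment", "misc", "notes",
    "personal", "photos", "projects", "service", "teaching",
    "travel", "versions",
)

# Kinds of work a project may contain, as listed under "projects".
PROJECT_WORKS = ("code", "data", "papers", "posters", "proposals", "talks")


def create_year(years_root: Path, year: str) -> Path:
    """Create years_root/<year> with the standard subdirectories."""
    year_dir = years_root / year
    for name in YEAR_DIRS:
        (year_dir / name).mkdir(parents=True, exist_ok=True)
    return year_dir


def create_project(year_dir: Path, project: str, works: Iterable[str]) -> Path:
    """Create projects/<project>/<work> directories inside a year."""
    works = list(works)
    unknown = [w for w in works if w not in PROJECT_WORKS]
    if unknown:
        raise ValueError(f"unexpected work names: {unknown}")
    project_dir = year_dir / "projects" / project
    project_dir.mkdir(parents=True, exist_ok=True)
    for work in works:
        (project_dir / work).mkdir(exist_ok=True)
    return project_dir
```

For example, create_project(create_year(Path.home() / "years", "2004"), "Brunet", ["papers", "proposals", "talks"]) would reproduce the Brunet branch of the tree shown under projects.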
Mail
So far, the filing system we have described need not be particular to any
information system (such as a computer file system). It might also be applied
to email messages for instance. For filing of email, we use the same
dated classifications as above. However, we do add one directory: sent. The
sent folder contains all the messages sent during a particular year.
Ideally, email is stored on a mail server, and only locally cached copies are
kept by any client. Thus, we do not need to consider where these files are
kept locally. At the end of each year, it might be prudent to back up all the
messages from the server onto a local filesystem containing all the files for
that year.
The public_html directory should not keep copies of the above filed material.
Instead, when an item is to be made public, a link is made from the public web
directory into the directory structure above. As such, backing up only
requires that we back up the main directory structure, and nothing from the
public_html directory. Note that the public_html directory should have the
same directory hierarchy (for public files) as exists in the usual directory
hierarchy.
To make publishing easier, we recommend the development of a simple script
that can take a list of files in the directory structure, and make links to
them in the public_html directory.
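One possible shape for such a script is sketched below in Python. It is an assumption-laden illustration rather than the script the policy has in mind: the function name publish is hypothetical, the links are symbolic links, and paths are given relative to the home directory so that public_html mirrors the main hierarchy.

```python
import os
from pathlib import Path
from typing import Iterable, List


def publish(home: Path, rel_paths: Iterable[str],
            public_dir: str = "public_html") -> List[Path]:
    """Link each home/<rel_path> to home/public_html/<rel_path>."""
    links = []
    for rel in rel_paths:
        src = home / rel
        if not src.exists():
            raise FileNotFoundError(src)
        link = home / public_dir / rel
        # Mirror the directory hierarchy under public_html.
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # refresh an existing link
        # A relative target keeps the link valid if the home directory moves.
        link.symlink_to(os.path.relpath(src, link.parent))
        links.append(link)
    return links
```

For example, publish(Path.home(), ["years/2004/projects/Brunet/papers/draft.pdf"]) would create public_html/years/2004/projects/Brunet/papers/draft.pdf as a link to the filed copy. Whether a web server follows such links depends on its configuration.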
Revising This Document
This document specifies a "best practice" for filing. These practices may be
a bit biased towards academic usage or software development. As usage
patterns change, this document may need modifications, but the basic goals
(finding files easily and backing up easily) will not change.
Changes in technology could dramatically change this document. In particular,
the widespread availability of more powerful search tools with integrated backup
could remove the need for careful filing. As long as backing up files is not
easy, most users will not do it. It needs to be as obvious as possible (e.g.
make DVDs which contain all the files for a given year and you are set). A
search technology without an easy backup solution will not necessarily change
the filing policy.