Looking at cheap storage for hosting backups/archives. The driving force is cost: NetApp is too expensive.

The following can be taken into consideration:

— Amazon S3 web services. Since it would only be used for backup/archive, the cost should not be too high, though it still needs to be analyzed. Another potential issue is legal/privacy…
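As a starting point for that cost analysis, here is a rough sketch in Python. All prices are placeholders passed in by the caller (hypothetical parameters, not actual Amazon rates), and per-request charges are ignored, so treat it as a back-of-the-envelope estimate only.

```python
def estimate_monthly_cost(stored_gb, uploaded_gb, restored_gb,
                          storage_price_per_gb,
                          upload_price_per_gb,
                          download_price_per_gb):
    """Rough monthly cost of a backup/archive workload on S3-style storage.

    All *_price_per_gb arguments are assumptions supplied by the caller;
    look up the current price list before trusting the result.
    Per-request fees are deliberately left out of this sketch.
    """
    storage = stored_gb * storage_price_per_gb      # data kept at rest
    upload = uploaded_gb * upload_price_per_gb      # new backups sent in
    restore = restored_gb * download_price_per_gb   # data pulled back out
    return {
        "storage": storage,
        "upload": upload,
        "restore": restore,
        "total": storage + upload + restore,
    }


if __name__ == "__main__":
    # Made-up numbers purely to show the call; replace with real volumes and prices.
    costs = estimate_monthly_cost(
        stored_gb=1000, uploaded_gb=100, restored_gb=10,
        storage_price_per_gb=0.10,
        upload_price_per_gb=0.05,
        download_price_per_gb=0.20,
    )
    for item, amount in costs.items():
        print(f"{item}: {amount:.2f}")
```

For a backup/archive workload the restore volume is usually small, so the storage term tends to dominate; that is worth checking against real numbers before comparing with the NetApp cost.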

     Technically, the best way is probably to use an S3 file system driver, so that backup/access is transparent to the apps and existing apps don’t need to be modified. S3 file system drivers include, for example:

  •   Fuse over Amazon: http://code.google.com/p/s3fs/wiki/FuseOverAmazon

     Note that the Hadoop project provides two file systems that use S3: http://wiki.apache.org/hadoop/AmazonS3 . However, it seems you have to go through Hadoop to access these file systems, and they are not accessible by normal apps.
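To illustrate what "transparent to the apps" means with a FUSE-style driver: once the bucket is mounted, a backup is just an ordinary file copy. The sketch below assumes a mount point such as /mnt/s3backup (a hypothetical path, as is the archive_file helper); nothing in the code is S3-specific.

```python
import shutil
from pathlib import Path


def archive_file(source, archive_root):
    """Copy one file into the archive directory using plain file I/O.

    archive_root is assumed to be a directory on a mounted file system
    (for example an S3 bucket mounted via FUSE). The code itself does not
    know or care what backs that directory.
    """
    source = Path(source)
    target_dir = Path(archive_root)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the archive dir if missing
    target = target_dir / source.name
    shutil.copyfile(source, target)                # copies file contents only
    return target


if __name__ == "__main__":
    import tempfile

    # Demo with temporary directories; in real use the second argument would be
    # a path under the (hypothetical) mount point, e.g. /mnt/s3backup/daily.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "report.txt"
        sample.write_text("example data")
        print(archive_file(sample, Path(tmp) / "archive"))
```

The same code would work unchanged against any other mountable file system, which is the main argument for going through a file system driver rather than calling the S3 API directly.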

— GFS-like distributed file systems, so that we can use cheap commodity Intel hardware to build the storage cluster. Currently there are two open source GFS-like DFS implementations:

  • CloudStore (formerly Kosmos File System / KFS): http://kosmosfs.sourceforge.net/ . Quoted from the web site: “Web-scale applications require a scalable storage infrastructure to process vast amounts of data. CloudStore (formerly, Kosmos filesystem) is an open-source high performance distributed filesystem designed to meet such an infrastructure need”. It’s written in C++ and can be mounted as a file system via FUSE on Linux.
  • Hadoop HDFS File System: part of the Hadoop Core project. http://hadoop.apache.org/core/docs/current/hdfs_design.html . Hadoop is developed in Java. There is also some effort to make HDFS mountable on Linux systems: http://wiki.apache.org/hadoop/MountableHDFS .