
 

Tuesday, May 02, 2006

Why ZFS for home

I’m getting annoyed at people who keep saying ZFS is fine for servers but unnecessary at home. ZFS scales from a single drive to an effectively unlimited number of drives, and it has benefits at every size.

Let’s take a look at the average home computer: a single drive holding a mix of files, with drives up to 300GB now common. That is a lot of data to lose, and it’s getting easier to lose data these days. Furthermore, new hard drives aren’t getting any more reliable with time. And of course you can lose things on a perfectly healthy drive just by misplacing them in one of the thousands of directories you create in an attempt to organize your files.

What do other operating systems and file systems provide to fight this situation? In Linux you can use RAID (redundant array of inexpensive disks), so that if a hard drive fails your data is safe. The fun part begins when you try to enable RAID. The obvious choices are RAID 1 (mirroring your data) or RAID 5 (which uses part of your drives as parity to protect your data; it uses less space than mirroring but requires a minimum of three drives). I won’t bore you with the technical details; I will just show a small sample of the commands to create a RAID 1 setup, a mirror image of one drive onto a second drive.

These instructions were taken from http://unthought.net/Software-RAID.HOWTO/Software-RAID.HOWTO-5.html#ss5.6

You have two devices of approximately the same size, and you want the two to be mirrors of each other. You may eventually have more devices, which you want to keep as stand-by spare-disks that will automatically become part of the mirror if one of the active devices breaks.

Set up the /etc/raidtab file like this:

raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        persistent-superblock 1
        device          /dev/sdb6
        raid-disk       0
        device          /dev/sdc5
        raid-disk       1

If you have spare disks, you can add them to the end of the device specification like this:

        device          /dev/sdd5
        spare-disk      0

Remember to set the nr-spare-disks entry correspondingly.

Ok, now we're all set to start initializing the RAID. The mirror must be constructed, i.e. the contents of the two devices (however unimportant now, since the device is still not formatted) must be synchronized.

Issue the

  mkraid /dev/md0

command to begin the mirror initialization.

Check out the /proc/mdstat file. It should tell you that the /dev/md0 device has been started, that the mirror is being reconstructed, and an ETA of the completion of the reconstruction.

Reconstruction is done using idle I/O bandwidth. So, your system should still be fairly responsive, although your disk LEDs should be glowing nicely.

The reconstruction process is transparent, so you can actually use the device even though the mirror is currently under reconstruction.

Try formatting the device while the reconstruction is running. It will work. You can also mount it and use it while reconstruction is running. Of course, if the wrong disk breaks while the reconstruction is running, you're out of luck.

Looks like fun, right? Before ZFS the situation wasn't much better in Solaris. A typical home user will see this and say, "I'll do this next week." Then next week never comes. Doing RAID 5 only gets more complex, in Linux at least; with ZFS it's just a slight change to the commands used to create a mirror. In ZFS we execute two commands and we are done.

# zpool create data mirror c0t0d0 c0t1d0
# zfs create data/filesystem

Done. The only complex part is getting the two device names, and you can find those by running the Solaris format command.

# format < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0
          /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0
       1. c1t2d0
          /sbus@1f,0/SUNW,fas@1,8800000/sd@2,0
#

That takes care of drive failure. Another problem is accidental deletion, accidentally installing a broken application, or any change you would like to undo. Linux’s answer to this is backups, on optical media, tape, or perhaps another hard drive. That is either expensive or time consuming: choose one. So the typical home user will most likely put it off till another day and won’t have a backup of their data.


ZFS has snapshots; they are easy and painless and cost very little in resources to create. A snapshot is basically a picture of your data at a point in time. Snapshots are taken live and are nearly instant in ZFS; to get them on any other OS you need expensive RAID hardware or an expensive software package, something no home user will want to buy.

For example, I want to protect my mp3 collection, so I put it on a filesystem all its own.


# du -sh /mp3
17G /mp3
#

And then I took a snapshot of it for protection.

# time zfs snapshot data/mp3@may-1-2006
real 0m0.317s
user 0m0.017s
sys 0m0.030s
#

Not bad: a third of a second to protect 17 gigabytes of data, which can easily be restored should I make a mistake and delete or corrupt a file, or all of them.

And here is a little script I created that takes snapshots of all my ZFS filesystems and puts a date stamp on each one. Each snapshot takes very little space, so you can make as many as you need to be safe.

#!/bin/sh
# Snapshot every ZFS filesystem, stamping each with today's date.
date=`date +%b-%d-%Y`

for i in `/usr/sbin/zfs list -H -t filesystem -o name`
do
    /usr/sbin/zfs snapshot "$i@$date"
done
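If you prefer snapshot names that sort chronologically, here is a variation on the script above — just a sketch, not from the original post — that uses an ISO-style date stamp and defaults to a dry run, printing the zfs commands instead of executing them, so you can preview what it would do even on a box without ZFS. The filesystem names used in the preview are hypothetical.

```shell
#!/bin/sh
# Sketch: snapshot script with sortable ISO date stamps.
# DRY_RUN defaults to 1 here so the sketch can be previewed safely;
# set DRY_RUN=0 on a real Solaris box to actually take the snapshots.

DRY_RUN=${DRY_RUN:-1}
ZFS=${ZFS:-/usr/sbin/zfs}
date=`date +%Y-%m-%d`   # e.g. 2006-05-02; these names sort chronologically

snap() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$ZFS snapshot $1@$date"
    else
        "$ZFS" snapshot "$1@$date"
    fi
}

if [ "$DRY_RUN" = 1 ]; then
    # Hypothetical filesystem names, just for the preview.
    for fs in data/mp3 data/home; do snap "$fs"; done
else
    for fs in `"$ZFS" list -H -t filesystem -o name`; do snap "$fs"; done
fi
```

Set DRY_RUN=0 in the environment (or edit the default) when you are ready to run it for real.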



A few minutes in crontab (or your desktop's graphical crontab editor) and you can have this script execute daily with no user intervention. Below is a sample line to add to your crontab that takes snapshots at 3:15 am.

15 3 * * * /export/home/username/bin/snapshot_all
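Daily snapshots do pile up eventually. If you stamp them with ISO-style dates (YYYY-MM-DD), pruning old ones needs no date arithmetic at all, because ISO dates compare chronologically as plain strings. The sketch below uses a hard-coded, hypothetical snapshot list in place of the real `zfs list -H -t snapshot -o name` output, and only echoes the destroy commands, so treat it as an illustration rather than a ready-to-run tool.

```shell
#!/bin/sh
# Sketch: find snapshots older than a cutoff date and show the
# zfs destroy commands that would remove them. The snapshot list
# below is a hypothetical stand-in for:
#   /usr/sbin/zfs list -H -t snapshot -o name

cutoff=2006-04-01

snapshots="data/mp3@2006-03-15
data/mp3@2006-04-20
data/home@2006-03-30"

prune=""
for snap in $snapshots; do
    stamp=${snap#*@}    # the date stamp is everything after the @
    # ISO dates sort chronologically as strings, so expr's string
    # comparison is enough to spot snapshots older than the cutoff.
    if [ "$(expr "$stamp" \< "$cutoff")" = 1 ]; then
        prune="$prune $snap"
        echo "would run: zfs destroy $snap"
    fi
done
```

On a real system you would feed it the actual snapshot list and replace the echo with the zfs destroy call itself.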

Seeing your snapshots is easy: just look in the .zfs/snapshot directory at the root of each ZFS filesystem. You can even browse the individual files inside a snapshot by changing directories further into it. This even works if the filesystem is shared via NFS.


Now let’s take a look at how to recover from mistakes using snapshots. First let’s create a filesystem and populate it with a few files.


# zfs create data/test1
# cd /data/test1
# mkfile 100m file1 file2 file3 file4 file5
# ls
file1 file2 file3 file4 file5
#

We now have five files of 100 megabytes each. Let's take a snapshot and then delete a couple of files.

# zfs snapshot data/test1@backup
# rm file2 file3
# ls
file1 file4 file5

The files are gone. Oops! A day later, or a month later, I realize I need those files.

# cd ..
# zfs rollback data/test1@backup

So all we do is roll back using the saved snapshot, and the files are back.

# ls
file1 file2 file3 file4 file5
#
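Rollback is all-or-nothing: it throws away every change made since the snapshot. If you only need one file back, you can instead copy it out of the read-only snapshot directory under .zfs/snapshot. The sketch below mocks that layout with ordinary directories under /tmp so it can run anywhere; on a real system the path /data/test1/.zfs/snapshot/backup would be maintained by ZFS itself.

```shell
#!/bin/sh
# Sketch: recover a single file from a snapshot with plain cp.
# The .zfs/snapshot layout is mocked with ordinary directories here
# so the demo runs on any machine; ZFS provides it for real.

live=/tmp/zfsdemo/data/test1
snapdir=$live/.zfs/snapshot/backup

mkdir -p "$live" "$snapdir"
printf 'important bits\n' > "$snapdir/file2"   # the copy preserved in the snapshot

# file2 was deleted from the live filesystem; fetch it back:
cp "$snapdir/file2" "$live/file2"

ls "$live"
```

This is often the better habit: cherry-pick the file you lost and leave the rest of the filesystem, including changes made after the snapshot, untouched.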

ZFS makes it easy to create lots of filesystems. In Linux you are limited to 16 filesystems per drive (yes, I know you can use the Linux volume manager, but that adds even more complexity to the RAID setup outlined above). As drives get bigger you end up with hundreds or even thousands of files and directories per drive, making it easy to lose files in the levels of directories. With ZFS there is no practical limit to the number of filesystems; they all share the storage pool, and they are quick and easy to create.


# time zfs create data/word_processor_files
real 0m0.571s
user 0m0.019s
sys 0m0.040s
#

A little over half a second to create a filesystem, and you can create as many as you like.

The next problem the home user may face is running out of space. Typically the user heads down to the local electronics or computer shop and gets another hard drive, or two if they want to be safe and use RAID, which means another trip through the RAID setup guide. Depending on the filesystem you may be able to grow it with more cryptic commands, turning your setup into RAID 1+0, but it's pretty complicated, so most people keep things simple and shuffle files back and forth between filesystems to get the space they need.

With ZFS it is only one command to add the drive(s) to the pool of storage.

# zpool add data mirror drive3 drive4

Afterward, all your filesystems have access to the additional space. If money is a little tight, you can turn on compression for any filesystem you like with a simple command; from then on, all files added to that filesystem are compressed, possibly using less space. Note that this usually doesn't slow down I/O at all; on some systems and workloads it actually speeds up data access.

# zfs set compression=on data/filesystem

ZFS is so simple you could talk your grandmother through creating filesystems or restoring data. This is just a small sample of what ZFS can do, but it's all just as simple as what I have shown you in this document. Even if you are more advanced, you can still benefit from ZFS's ease of use: no more hitting the web to study how-tos for setting up RAID or LVM. And even if you can't afford two drives in your home box, ZFS will be perfectly happy with one drive; you lose hardware redundancy, but snapshots are still there to take care of software- or user-introduced filesystem problems.

62 Comments:

Anonymous Anonymous said...

As a Unix admin with years of experience, I think you should be more intellectually honest with your audience. Hasn't Veritas offered most of this functionality for years?

3:05 PM  
Anonymous Anonymous said...

Veritas? For a "home user"? You must be joking. The cost alone would be prohibitive. Assuming it was free, the complexity (while still not as bad as linux lvm+raid) issue remains.

The only 2 limitations I've found with ZFS are that it strangely seems to lack user/group quotas (very important IMO) and that you cannot easily boot it (being fixed).

3:19 PM  
Blogger Platypus said...

If you think snapshots are a substitute for backups then you have a nasty surprise coming - the first time you have a drive failure. Snapshots are essentially no more than a different index into the exact same blocks as your live filesystem, except for those (relatively few) which have been overwritten since the snapshot. Since most of the blocks are the same, this achieves nothing in terms of data protection. If the live copy ever becomes unavailable, so will the snapshot.

Snapshots are not entirely without value. They're a handy way to implement backups because they make it easier to ensure that you're backing up a consistent copy of your data, but they don't replace backups. Many high-end disk arrays provide a "split mirror" type of snapshot, which does contain a separate copy of data and thus can be used as a form of short-term backup, but that's not the kind of snapshots you get with ZFS.

3:37 PM  
Anonymous Bart_M said...

Veritas is too expensive for a home user, and Sun doesn't deny that they copied the good features of Veritas. Also, there is no lack of user/group quotas. Check page 23 of the ZFS documentation slides (from opensolaris.org). I quote:
"Hierarchical filesystems with inherited properties: Filesystems become administrative control points."
This is also true for quotas.

Also, zfs will support booting in the future

3:38 PM  
Blogger jamesd_wi said...

Yes ZFS has some of the features of Veritas’s filesystem. But they are implemented differently. Take Snapshots. Here is a small description on how they are implemented in Veritas filesystem.

How a Snapshot File System Works
A snapshot file system is created by mounting an empty disk slice as a snapshot of a currently mounted file system. The bitmap, blockmap and super-block are initialized and then the currently mounted file system is frozen (see Chapter 6, "Application Interface," for a description of the VX_FREEZE ioctl). Once the file system to be snapped is frozen, the snapshot is enabled and mounted and the snapped file system is thawed. The snapshot appears as an exact image of the snapped file system at the time the snapshot was made.
http://ou800doc.caldera.com/en/ODM_FSadmin/fssag-7.html#HEADING7-12

The things to note here are:

• Requires an extra blank slice or partition.
• The filesystem is taken off-line (frozen) during the process.
• Requires a degree in filesystems. With ZFS, taking a snapshot is extremely simple.

Okay, now let's look at Veritas Volume Manager (VxVM), an excellent logical volume manager, but not a simple one. If you want a crash course on it, check out my friend Ben Rockwood's VxVM crash course. You will see that it's 11 pages just on how to set up a storage pool. That doesn't cover the filesystem, and the commands are a lot more complex. I'm sure Ben could do a crash course on VxFS (Veritas FileSystem) that would run another 20 pages or so. But even with both of those documents, it would take you weeks to get up to speed, and your grandmother never would.

The real bottom line is that Veritas VxVM and VxFS are NOT FREE! In fact, the combination of both packages would cost more than the home computer itself, so they are not usable solutions in the home market this blog entry is about. Furthermore, ZFS has features Veritas doesn't have that I left out of this document for brevity: raidz (faster writes than software RAID 5), and a simple quota and reservation system.

3:38 PM  
Anonymous Maynard Handley said...

It's nice to talk about things your FS can do well; but it's even nicer to run the problem backwards --- not "how can my pet project be shoe-horned into solving home user problems" but "what are home user problems and how can I solve them".

The number 1 home user problem is backup. We all know this. Snapshots + redundant storage across more than one disk are the start of dealing with this, but the point, in the real world, is not how many command-lines it takes. The point is what actually does the best job of protecting the data against real problems.

It is not clear to this reader the extent to which
* ZFS' ability to simply keep adding odd drives you find (some of which may be 3GB from five years ago and some of which may be 500GB bought last week) interacts seamlessly with an implicit promise that every piece of data is stored on at least two physical drives so that the physical failure of a drive does not mean disaster
* in the world of home users as opposed to enterprise users, the sad fact is that a high proportion of the drives you buy at Fry's go bad sooner or later (perhaps 15% in my experience). You mention little of how ZFS learns that a drive is going bad (since all these lameass home drive vendors are too damn cheap to use SMART, and since the one drive I had with SMART only started telling me the disk was failing after it was already pretty dodgy to read). The first step is obviously tracking that sectors on the disk are going bad and not using them, but there are higher order issue --- is that information actively sought out rather than learned about after data at the cluster is lost, is it persisted across formats, is it reported up to the user aggressively rather than stuck in some log that pretty much no-one ever checks?
* current figures are, what, more laptops than desk PCs now being sold, at least outside enterprise? To what extent does any of this work well with laptops. Obviously the ideal is something like my laptop works just fine by itself, and has on it my essential materials along with some subset of my music and photo and movie collection, and that when I plug it into my home firewire/USB cable it auto-syncs the changes made to the internal drive to other disks (for backup), and seamlessly presents a larger photo/music/movie/etc collection.

If I can't even boot off ZFS (as one of your commenters claims) then I suspect none of the issues raised above have even been thought about. At which point the issue is: why bother with a post like this? ZFS is probably fine for what it is designed for, but pretending that it is great for home users without having a clue about what real home users do, what their *real*problems are, and what they *really* want, is just silly. As the most obvious example, in the real world of home users, people care as little about how many command lines it takes to do anything as they care about how many x86 instructions or how many uops it takes, because they will be interacting with their file system, whether partitioning or otherwise, through a GUI, and what they will care about is the ease of use and capabilities of that GUI.

3:38 PM  
Anonymous Anonymous said...

Typos at the beginning of your article: it is "lose" not "loose" (e.g. "That is a lot of data to loose and its getting easier loose data these days."). Otherwise, thanks for the informative article.

3:38 PM  
Anonymous Anonymous said...

IMHO the Linux Software RAID HOWTO takes the long road to creating the array. The following commands are similar to your zpool & zfs create statements:

# mdadm --create /dev/md0 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdc1
# mke2fs -j /dev/md0

There you go! A RAID 1 array formatted with ext3 in just two commands.

I know ZFS is a lot more than this, but I couldn't resist promoting the wonderful tool mdadm.

Martin,

3:42 PM  
Anonymous Bill Bradford said...

I plan on using ZFS on single (external USB-connected) disks at home, so I can easily move them back and forth between my x86 and SPARC Solaris boxes - no endian issues to worry about! If one system dies, I simply unplug the disk and plug it into the other box.

3:52 PM  
Anonymous Anonymous said...

You guys are right, Veritas is in no way priced for a home consumer. But when LVM3 has most of these features, you might correctly claim that "Linux stole these features from Veritas" instead of "Linux stole these features from ZFS"

3:56 PM  
Anonymous Anonymous said...

Veritas should be out of the picture, as these are functions and methods that are available at the filesystem level, without the need to purchase third-party apps. Who cares if some company has a product which can nearly do the same things? Having these things built in is much better and of course cheaper.

From what i have read about ZFS, it seems like the ideal FS to use on modern large sized hard disk drives, with seagate bringing out 750GB drives, i see in the future that FS's like ZFS will be a requirement and not just a luxury extra.

4:31 PM  
Anonymous Anonymous said...

Quoted:

===
also, there is no lack of user/group quota's. check page 23 on the zfs documentation slides (from opensolaris.org) I qoute:
"Hierarchical filesystems with inherited properties: Filesystems become administrative control points."
This is also true for quota's.

===

Those are not user/group quotas. Those are filesystem-level quotas. This would require each user to have his own filesystem and mountpoint in order to implement user quotas.

Please explain how you would create a single ZFS filesystem mountpoint:

/work

and create per-user or per-group quotas on that directory. I do not believe that you can.

4:35 PM  
Blogger jamesd_wi said...


Please explain how you would create a single ZFS filesystem mountpoint:

/work


Simple:

# zfs create data/work
# zfs set mountpoint=/work data/work

Done. You can even do it all in one line, but two is easier to read in this case.


and create per-user or per-group quotas on that directory. I do not believe that you can.


With ZFS you can create filesystems as easily as you can create directories, so it's really not necessary to assign a quota to a user, though you can request this feature if you like at bugs.opensolaris.org.

# zfs create data/username
# zfs create data/projectname
# zfs create data/groupname

Then use normal Unix tools to associate them with whatever user or group you like. ZFS also has full support for NFSv4-style ACLs, which are much easier to set up.

To set a quota you do the following; this example uses 1 gigabyte, but you can use any size you need.

# zfs set quota=1G pool/filesystem

ZFS also has reservations. If you are working on a project and you know it will need at least 4 gigabytes of space available, you execute the following, and you can be assured the space will always be there.

# zfs set reservation=4G pool/filesystem

4:50 PM  
Anonymous squaxbarket said...

I like the sound of zfs, but you said 'home users'.

And you also mentioned data safety. "Pick one".

Home users won't be able to manage data until data is as manageable as it once was in the days of floppy disks: want to save your work, insert floppy, save. Want to back up? Repeat using a second floppy, or copy the first one using 'copy a:'.

I don't think even zfs for home users is helping arrive at this. It may be a useful underlying system, but the commands you illustrated, even though simple and dramatically improved on previous volume management systems, are still way too complex, and more importantly, non-human-parseable for home users.

Data integrity problems of home users need distribution management, not online volume management. Integrity checking is important in both, and simple interface also, but end-users need their data to be transparently dispersed and secured, and easily retrieved, from locations and technologies they know and care nothing about.

Eg, every time a device is plugged into a system, it becomes part of some technology that is a bit *like* zfs, but when moved to someone else's system, can be accessed with a mouse click and password.

Every time a system is connected to a network, it does discovery for volunteer storage servers, and starts uploading a trickle feed of bittorrent-like packets, and making local copies of packets that are signed with a recognised signature, but are not locally cached, ie 'synchronising/caching'.

etc. So, you are onto something, but it's not even close to the answer. :) Sorry to sound negative.

Even for a technofile like myself, zfs looks and sounds sexy, and when I have learnt it at work/study I'll maybe use it at home too, or maybe I'll try it for curiosity, but really, I already know plenty of tools that can do what I need.

2:33 AM  
Anonymous Anonymous said...

quote:

===
With zfs you can create filesystems as easy as you can create directories. So its really not necessaary to assign a quota to a user, though you can request this feature if you like at bugs.opensolaris.org
===

Yep, that's always the answer I hear. It's not really a good one, unfortunately. In a few of the environments we support, applications have been coded to read from and write to a shared directory. We have assigned tiered user and group quotas in order to keep these large filesystems under control.

As it stands, we cannot use ZFS for these applications due to its lack of quota support.

:(

8:24 AM  
Anonymous Anonymous said...

Hi!, I'm not so familiar with ZFS, I've just read the basics, but is there anyone working on porting ZFS to Linux ? I'd love to try it. Thanks.

8:24 AM  
Blogger jamesd_wi said...

Home users won't be able to manage data until data is as manageable as it once was in the days of floppy disks: want to save your work, insert floppy, save. Want to back up? Repeat using a second floppy, or copy the first one using 'copy a:'.

Home users can still do this; there are just more choices of where to put files: floppies (well, some home machines still have them), CD-RWs, the web, FTP, shared filesystems, or you can email them to yourself. ZFS doesn't change anything here.


I don't think even zfs for home users is helping arrive at this. It may be a useful underlying system, but the commands you illustrated, even though simple and dramatically improved on previous volume management systems, still are way to complex, and more importantly, non-human-parseable for home users.

ZFS is the starting point. Someone can easily put a GUI on top of it. In fact there already is one, but it's probably too complex for the users you are referring to; it would be easy to create one targeted at the typical home user.

Data integrity problems of home users need distribution management, not online volume management. Integrity checking is important in both, and simple interface also, but end-users need their data to be transparently dispersed and secured, and easily retrieved, from locations and technologies they know and care nothing about.

This is possibly a good idea, but a single filesystem cannot hope to do this by itself; it would be a project for a company or a third-party website. ZFS has most of the tools to make it possible, though, and the rest are coming: there is currently built-in support for backing up pools over the net, and encryption support is in the works. Without these, no informed home user would want to distribute their data.

Eg, every time a device is plugged into a system, it becomes part of some technology that is a bit *like* zfs, but when moved to someone else's system, can be accessed with a mouse click and password.


This one is simple: Solaris has the automounter, which probably needs to be enhanced to understand ZFS, but that should be straightforward. I'm not sure why a password is necessary for a typical removable drive, but ZFS will provide encryption support later and can require a password then.

Every time a system is connected to a network, it does discovery for volunteer storage servers, and starts uploading a trickle feed of bittorrent-like packets, and making local copies of packets that are signed with a recognised signature, but are not locally cached, ie 'synchronising/caching'.

This again would need a third-party company or a large community to make happen. I'm not sure it's even practical for most home users, since not everyone is on broadband; replicating large files over the net is not really usable at 56k. But it's not impossible.

etc. So, you are onto something, but it's not even close to the answer. :) Sorry to sound negative.

Well, what you ask for is a giant leap from any filesystem now in existence; ZFS is the best there is currently. If someone really wants to make ZFS even better for the home user, they could work on a GUI that makes it as simple as possible, and perhaps a company will step in and fill the need for data replication.

Even for a technofile like myself, zfs looks and sounds sexy, and when I have learnt it at work/study I'll maybe use it at home too, or maybe I'll try it for curiosity, but really, I already know plenty of tools that can do what I need.

I hope you do take a look at ZFS. It really is incredibly easy to use and maintain, and it's a huge leap over what is currently available. It also does things differently from older solutions, so you may need to break some preconceived notions before you see the true power of ZFS.

12:50 PM  
Anonymous Anonymous said...

I think the key to ZFS is the simplicity and power for the Unix admin. I also think that it would be quite easy for a company like Apple Computer to put a nice front end to the commands for ZFS and make it super easy to use all that power.

5:25 PM  
Anonymous Anonymous said...

Your title should have been ZFS howto instead it seems you said you were going to write a hi-level why zfs but rather wrote a bunch of techo no jumble that I will need when I am ready to implement ZFS ... still dont see why I would want to use ZFS. -1

5:28 PM  
Anonymous Anonymous said...

ZFS is perfect for Apple to use. Maybe in the world of Windows or Linux backups are difficult, but on the Mac I just use "Backup", which uses less than 5% CPU, comes with every Mac, and backs up certain data or the entire drive. It's as simple as plugging in the USB drive when the computer asks you and forgetting about it.

Not only that but in the event that my entire computer melts, i can simply boot off cd, and hit restore. 30 minutes later I'm back in my old system.

Apple getting interested in ZFS is a good thing, they have a way of making very complex great features EASY for everyone to use. Bring it on!

5:32 PM  
Blogger JB Hewitt said...

I think you're underestimating LVM for Linux quite a bit. Still, this ZFS looks interesting. LVM is very flexible, but perhaps not quite as simple in execution as ZFS.


btw - it's a digg.com spin for the apple thing, as people click 'DIGG IT' considerably more with apple in the title.

6:22 PM  
Anonymous Damjan said...

RAID5 in Linux now supports online growing, just add disks :)

6:35 PM  
Anonymous Anonymous said...

Don't forget LVM2 under Linux.

Combined with the aforementioned two-line mdadm raid setup, it's pretty easy to get snapshot functionality, as well as extendable filesystems, all sitting atop a RAID 1 or RAID 5 array.

I should know, I just did it in about 6 shell lines last night.

7:38 PM  
Anonymous Kenn Christ said...

Snapshots are basically a picture of your data; these are taken in real time and are nearly instant in ZFS, to get these in any other OS you need to buy expensive raid hardware or an expensive software package something that no home user will want to buy.

Or use the free rsnapshot, which does nearly the same thing.

Nearly, but not quite, as it does require a single full copy of your data, plus incrementals, but the plus side to that is that you can store your snapshots on another disk and get both types of protection at once. And it runs on any flavor of *nix (including OS X).

8:07 PM  
Anonymous Anonymous said...

Exactly. The end user never sees the commands so it really doesn't matter for them. Consider this: what is the command to create an encrypted directory on Mac OS X? How many users know that command? Almost none? Case closed.

10:43 PM  
Blogger jamesd_wi said...

I have taken another look at the Linux LVM and RAID options and compared them in this chart.

And for those who think rsnapshot is a valid replacement for ZFS snapshots, compare the times: I don't think rsnapshot will take a snapshot in 0.299 seconds.


# du -sh /mp3
17G /mp3
# time zfs snapshot data/mp3@5-4-2006

real 0m0.299s
user 0m0.018s
sys 0m0.032s
#

12:16 AM  
Anonymous Anonymous said...

As platypus said, snapshots aren't a substitute for backups.

However, for home use, raid-z (zfs's version of raid) is probably more than adequate. In zfs, all data is checksummed; in a non-raid-z zfs filesystem, the system panics when data corruption is detected (ugh), but a raid-z filesystem will simply detect, report, and correct the error. There's a zfs demo video somewhere showing someone creating a raid-z filesystem (it's really easy), and then doing a "dd if=/dev/zero ..." to one of the disks. The raid-z zfs filesystem notices it and keeps on chunking away. One of the solaris developers is actually using a pc with a flaky power supply, which occasionally corrupts his disk data; with raid-z zfs, the corruption is detected and corrected. It's really quite scary. Oh, and don't forget that zfs does background disk scrubbing, which reads all of the disk data, and verifies the integrity; data correction doesn't have to occur during the actual file read.

For home use, raid-z is often enough. (But you still need to do backups, as only backups can protect you against nasties like fire and theft.)

12:27 AM  
Anonymous Anonymous said...

Sorry, I meant to say, "raid-z is zfs's version of raid-5".

12:29 AM  
Anonymous Anonymous said...

Veritas is hardly as difficult as the apple-and-now-zfs fanboys posting here make it out to be.

make a 100gb volume:
vxassist -g mydg make myvolume 100g

make a filesystem:
mkfs -F vxfs /dev/vx/rdsk/mydg/myvolume

mirror volume:
vxassist -b mirror myvolume mirror1 mirror2

and netapp/emc/etc have been doing end-to-end checksumming for years.

not to dis ZFS, i'm sure it's a fine product, but the oozing of 'wow look at how unbelievably incredible' type comments about it on this blog, lead me to believe it's just more mac fanboys regurgitating ZFS talking points, who've never actually used any of the -many- other file storage systems out there that have been doing this stuff for years.

pretty typical.

2:30 AM  
Blogger logicnazi said...

There are several problems with most home users making use of ZFS. First problem is that most home users keep most of their data on their root drive. Last I checked using ZFS for the root filesystem was still a bit of a hack, though this seems likely to be fixed soon if not already.

A bigger issue is the applicability of pools to the home environment. Pools are only really useful when you have several disks you want to treat as one massive storage system. This doesn't describe the use of most home users. They often have a root drive and then maybe an external drive that they keep their music on, so they can take it with them, as well as a bunch of DVDs they burn. This setup doesn't benefit at all from ZFS's pools, because the user wants to send different content to different devices, not treat them indistinguishably.

Having 3 drives you want to all treat the same way just isn't the typical home setup.

Worse, these pools are likely to confuse most home users, who are going to end up adding their external drive to a raid-z pool and later be confused about how to take some files with them on their external drive.

Admittedly, the COW feature is a potential advantage, especially if the filesystem is used to automatically keep a certain number of previous versions as long as they don't use too much space. The capacity to be used as an FS undo feature is promising but not really home-user accessible yet.

Also if ZFS has good metadata support I could see why apple might be interested.

3:21 AM  
Blogger jamesd_wi said...

Veritas is hardly as difficult as the apple-and-now-zfs fanboys posting here make it out to be.

I'm not an Apple fanboy; I have never even owned a Mac. My last Apple was an Apple ][+. Not that I have anything against them, I just never bought a Mac.



make a 100gb volume:
vxassist -g mydg make myvolume 100g

make a filesystem:
mkfs -F vxfs /dev/vx/rdsk/mydg/myvolume

mirror volume:
vxassist -b mirror myvolume mirror1 mirror2


BTW, Veritas is pretty off-topic for the home user. Isn't it about $2000 for a single-machine license just for VxVM? And VxFS doubles that price.

VERITAS Foundation Suite™ for Solaris, with VERITAS File System 3.2.2 and Volume Manager™ 2.5.2, is available immediately. Pricing for the Foundation Suite™ starts at US $5,095. File System 3.2.2 and Volume Manager™ 2.5.2 are also available separately with prices for the individual products starting at US $3,000.

You're right, it is only 3 commands. But how long does it take to figure out that you only need those 3 commands? Take a look at the Veritas manuals sometime; I believe there are 3 of them, each 200+ pages, and one is almost 400. Even Ben Rockwood's Krash Kourse is 20 pages long just for VxVM.

and netapp/emc/etc have been doing end-to-end checksumming for years.

Well, if you want Veritas for the home user, why should it surprise me that you now want to move to $15,000+ NAS storage boxes?

bottom line once again.

* ZFS is free
* veritas /EMC, Netapp, etc are not

7:09 AM  
Blogger ylon said...

I'm extremely excited about this. Sure, things don't sound perfect, but they are certainly moving in the right direction. I would like to see this combined with a GFS (GoogleFS or Red Hat's GFS, if you like) type of solution as well, where you can create a homogeneous hierarchical filesystem out of all of the hard drives in your organization autonomously. I've had this on my todo list for quite a number of years now but just haven't had the time/money to invest in such a development.

7:15 AM  
Anonymous Ceri Davies said...

Snapshots are basically a picture of your data; they are taken in real time and are nearly instant in ZFS. To get these in any other OS you need to buy expensive raid hardware or an expensive software package, something that no home user will want to buy.

FreeBSD has had UFS snapshots which have the same functionality for years.

Can you mount a snapshot with ZFS? I'm worried about the example where you roll back to an older snapshot in order to recover two files - what if the other files have changed since the snapshot was taken?

As to the guy worrying about quotas, he just needs to get his head around the new paradigm. I seriously doubt that user and group quotas will be added, as they are considered obsolete under the ZFS model. You can't make any tool do every job.
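To be concrete, the new paradigm replaces user quotas with one lightweight filesystem per user, each with its own quota. A sketch, where the pool and user names are invented:

```shell
# One filesystem per user instead of user quotas on a shared
# filesystem (pool and user names invented for illustration):
zfs create tank/home/alice
zfs set quota=10G tank/home/alice

zfs create tank/home/bob
zfs set quota=5G tank/home/bob

# Verify a setting:
zfs get quota tank/home/alice
```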

8:04 AM  
Anonymous Anonymous said...

Quote:

===

As to the guy worrying about quotas, he just needs to get his head around the new paradigm. I seriously doubt that user and group quotas will be added, as they are to be considered obsoleted by ZFS. You can't make any tool do every job.

===

It's not a matter of getting my head around a new paradigm. It's a matter of supporting servers where I'm not the sole user and, as such, do not play a large role in the selection, development, and configuration of applications.

It's nice to live in a bubble where user requirements don't interfere with the ability to "do the right thing" and you can simply "obsolete" any functions that you no longer have a need for; however, I don't have that luxury.

The fact stands that some big dumb corporate multi-user applications and software systems (both legacy and new) take the path of using one big filesystem that a number of (in our case) non-technical users/groups use to dump files. Since quotas are a near necessity in this realm, ZFS cannot be used for these sorts of applications. And, as a result, Solaris is becoming less of an option for these systems. And that's unfortunate.

9:04 AM  
Anonymous iapx said...

Your entry gives me some keys to understanding the interest Apple has in ZFS: as a tool for IT professionals, and as a tool for home users too, given the great GUI Apple has built over the BSD/Mach/GNU foundations of OS X.

9:24 AM  
Anonymous Anonymous said...

Wow, this post was surprisingly uninformative. Try using layman's English with proper grammar and sentence structure. It sounds like you know your stuff, but that doesn't help us unless you can effectively relate that information.

9:54 AM  
Blogger jamesd_wi said...

Can you mount a snapshot with ZFS? I'm worried about the example where you rollback to an older snapshot in order to recover two files - what if the other files have changed since the snapshot was taken.

Yes, you can mount a snapshot, and you can just cd into one, as shown below:

# pwd
/data/jamesd/.zfs/snapshot/backup_jan26_06/insomnia
#

Here I'm in a directory that is part of a snapshot. You can also clone the snapshot to get a writable filesystem just like the original, so you can work off the clone while also continuing to work on the filesystem that was snapshotted.
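Taking the snapshot and cloning it are each a single command. A sketch, with the filesystem and snapshot names loosely following the path shown above (and partly invented):

```shell
# Take a snapshot (filesystem/snapshot names are illustrative):
zfs snapshot data/jamesd@backup_jan26_06

# Browse it read-only through the hidden .zfs directory:
ls /data/jamesd/.zfs/snapshot/backup_jan26_06

# Or clone it into a writable filesystem and work off the clone:
zfs clone data/jamesd@backup_jan26_06 data/jamesd_scratch
```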

11:33 AM  
Anonymous Anonymous said...

Versioning is the main advantage even for the home user. Dell sells a RAID system on its desktops for backup usage, so why can't Apple (except for the lack of enough hard drive bays in the PowerMac chassis)?

I see no reason why Apple shouldn't just switch both kinds of users over to ZFS, even if the advantages aren't there right now for home users. It also might help to make ZFS the default scheme for external drives that might get used with Windows, Linux and/or Mac platforms, as it would be truly cross-platform. Better than FAT32, which is pretty much the default nowadays.

1:56 PM  
Anonymous Ceri Davies said...

Since quotas are a near necessity in this realm, ZFS cannot be used for these sorts of applications.

That's what I was getting at when I said that there is no tool that works for every job.

And, as a result, Solaris is becoming less of an option for these systems

That's probably a bit harsh considering that ZFS isn't even in Solaris yet.

2:05 PM  
Anonymous Anonymous said...

And, as a result, Solaris is becoming less of an option for these systems.

That's probably a bit harsh considering that ZFS isn't even in Solaris yet.


Not harsh at all. With filesystem sizes increasing for many of these apps (I just built a 4TB fs last week - yuck), UFS+logging just isn't cutting it. ZFS was my hope for these. Say what you will about Linux, but its filesystems certainly outperform Solaris's UFS and fit these situations better.

Thankfully, we can run Solaris and ZFS on the others (currently a small number of dev servers running Express).

2:15 PM  
Anonymous Anonymous said...

I'm very impressed with what I've read about ZFS so far. But I'm still unsure of a really critical question, which I've seen asked here but I don't see a clear answer to yet: if I want to plug in a FW drive, copy a file to it, unplug the drive, and carry it to another computer, how can I be sure my file is on that drive if it was "pooled" with my other drives and there was no distinct disk to copy the file to?

3:45 PM  
Blogger jamesd_wi said...

I'm very impressed with what I've read about ZFS so far. But I'm still unsure of a really critical question, which I've seen asked here but I don't see a clear answer to yet: if I want to plug in a FW drive, copy a file to it, unplug the drive, and carry it to another computer, how can I be sure my file is on that drive if it was "pooled" with my other drives and there was no distinct disk to copy the file to?

You have two choices: either put the firewire drive in a second pool and write your data to it, then export the drive (making it ready for transport to another system); or make the firewire drive mirror the internal drive, in which case you can take the entire pool with you, given that the external drive is at least as large as the internal drive.

For the mirror option, your pool would look like:

zpool create pool mirror internaldrive externaldrive

After you move the data to the firewire drive, you execute:

# zpool detach pool externaldrive


"Detach device from a mirror. The operation is refused if there are no other valid replicas of the data."

When the command returns, you are free to remove the external drive and take it with you.

A few Solaris Express users do this exact thing to back up their laptops: connect the external drive in the office, and then detach it when they leave.
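The first option, a separate pool that you export and import, is about as short. A sketch, with the pool, device, and path names invented:

```shell
# Option 1: the firewire drive as its own single-disk pool
# (pool and device names are invented):
zpool create portable firewiredisk
cp -r /data/myfiles /portable/

# Make it safe to unplug and move to another zfs-aware machine:
zpool export portable

# ...and on the other machine:
zpool import portable
```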

5:01 PM  
Anonymous Anonymous said...

"you have two choices, either put the firewire drive in a second pool, and write your data to it then you can export the drive (make ready for transport to another system) or you can make the firewire drive mirror the internal drive. then you can take the entire pool with you.. given that the external drive is larger than the internal drive."

I'm not sure the mirror option is going to work for "home users". Suppose I plug in my iPod (used as a drive), a USB pen drive, etc so I can copy a file to it and then take that to another computer? Does my iPod then have to mirror another drive? Maybe the separate pool option could be handled such that each attached drive is, by default, its own pool unless you specify otherwise?

6:14 PM  
Blogger jamesd_wi said...

I'm not sure the mirror option is going to work for "home users". Suppose I plug in my iPod (used as a drive), a USB pen drive, etc so I can copy a file to it and then take that to another computer?

All of those devices won't be ZFS-enabled; the OS usually sees them as MS-DOS filesystems. You can keep a copy of the files on them, but the devices themselves won't be part of the ZFS pool.


Does my iPod then have to mirror another drive?

Your iPod should be mirrored to the computer's hard disk via iTunes.

Any hard disk or device that you don't want to be part of the pool isn't. The firewire drive is large, so you would want it to be part of the pool.

10:38 PM  
Blogger bnitz said...

James,

Excellent post! One of the first things I tried with ZFS was to create mirrored pools which lived inside files, one file on NFS and the other on my laptop, and relying on mirror resilvering to resync everything. I also heard a lot of "that's not what ZFS is for." But it works and until ZFS is available on OSX, Linux and other OSs, I intend to have a fast booting Solaris ZFS fileserver be my digital video/digital photo repository.

11:52 AM  
Anonymous Anonymous said...

bnitz,
Comments from a "real" home user using ZFS are not in keeping with the flavor of the rest of the postings. Facts should NOT be used to refute the posters, who probably spent more time learning the other products than I did and don't want to see anything that might lower the importance of their previous experience on their resume!

9:39 PM  
Anonymous Anonymous said...

Veritas' recent launch of a free version of SF (SF Basic 5.0) should, I assume, take care of any pricing issues and work perfectly for home purposes :)

5:49 AM  
Anonymous Tim Foster said...

I wrote about ZFS on YOUR Desktop, as well as some simple ways to automate snapshot taking - ZFS for home? Hell yeah!

4:57 AM  
Anonymous Anonymous said...

This seems extremely interesting. I plan on building a desktop with one 150GB Raptor and 2 Barracuda 300GB drives at home. On the Barracuda drives I will use ZFS. As I understand it, they will be raided.

My question: do I have to defrag my Barracuda pool each month? Does ZFS cope badly with fragmentation? Say two filesystems grow and fragment each other... That's not a nice scenario from a performance viewpoint...

3:38 AM  
Blogger jamesd_wi said...

This seems extremely interesting. I plan on building a desktop with one 150GB Raptor and 2 Barracuda 300GB drives at home. On the Barracuda drives I will use ZFS. As I understand it, they will be raided.

Cool. 150GB is probably more than the OS needs, but too much never hurts; you could also set much of it aside for temporary storage and use ZFS on it.


My question: do I have to defrag my Barracuda pool each month? Does ZFS cope badly with fragmentation? Say two filesystems grow and fragment each other... That's not a nice scenario from a performance viewpoint...

Currently, fragmentation has not been found to be a problem. The general rule with all properly designed filesystems is that if you use no more than 90% of the space, you will have no problems with fragmentation. There currently isn't a defragger; if fragmentation becomes a problem, defragmentation will be integrated into zpool scrub, which can run in the background or in the middle of the night. In the future they will come up with recommendations for when and if you should run zpool scrub. It checks all your data for errors and fixes any it finds.

I haven't experienced problems with fragmentation, even though I have exceeded the 90% rule quite frequently, almost constantly in fact. I currently have 48 filesystems and over 300 snapshots on approximately 100GB of storage.
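For what it's worth, both watching the 90% rule and scrubbing are one-liners. A sketch; the pool name "tank" is just an example:

```shell
# See how full the pool is (the 90% rule of thumb):
zpool list tank

# Verify every block against its checksum, repairing as needed;
# this runs in the background, so it can happen overnight:
zpool scrub tank
zpool status tank    # shows scrub progress and any errors found
```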

4:15 AM  
Anonymous Anonymous said...

There's a couple of great Flash demos of ZFS at the OpenSolaris site including this one on ZFS basics... http://www.opensolaris.org/os/community/zfs/demos/basics/

6:08 PM  
Anonymous Anonymous said...

I have not yet built my Raptor + 2 disc Solaris system. I plan to devote part of the Raptor to Solaris and the rest to Windows (for playing games). I want to use the 300GB discs as raid-z storage for films, MP3s, etc.

I saw a demo I liked: someone overwrote one of the discs with garbage and ZFS automatically repaired the entire pool. I want to be able to do that too, if a disc crashes or so. So, would 2 Barracuda discs suffice for automatic repair? Or must I use a minimum of 3 discs? Where can I read more about this? And the bandwidth increases with 2 discs, I hope?

8:07 AM  
Blogger jamesd_wi said...

"I saw a demo I liked: someone overwrote one of the discs with garbage and ZFS automatically repaired the entire pool. I want to be able to do that too, if a disc crashes or so. So, would 2 Barracuda discs suffice for automatic repair? Or must I use a minimum of 3 discs? Where can I read more about this? And the bandwidth increases with 2 discs, I hope?"

ZFS supports both raidz and mirroring (raid-1); if you only have 2 drives, it's best to use mirroring. Yes, your read bandwidth will increase because of mirroring; I don't think raidz will increase performance with just 2 drives, but more than 2 drives will see a benefit. You can't increase or decrease the size of a raidz pool yet. You also pay a greater penalty with raidz than you do with a simple mirror; it's not much, but more than with a mirror.

The other option I would highly recommend is getting a 3rd drive and using raidz; then your data protection only costs you 1/3 of your storage instead of 1/2, and sooner or later you will need the extra 300GB. It's amazing how fast you can fill any amount of disk space.

9:12 AM  
Anonymous Danny Org said...

Nice blog thanks for the info

Danny

6:03 AM  
Anonymous Anonymous said...

Oh, thanks for your suggestion. So you say that to use raid-z optimally, I have to use 3 discs for my zpool, not 2 discs. With 3 discs I can lose one disc and the entire pool won't crash; I just have to exchange the damaged disc and the pool will automatically repair. Is this correct? And there will be increased bandwidth too?


And I wonder, would it be difficult to release a ZFS driver for Windows XP/Vista, now that ZFS is open source? I want to store all my data in a zpool with 3 discs on my computer, occasionally boot Windows for gaming and stuff, and still be able to access all my MP3s on my ZFS pool.

2:18 PM  
Blogger jamesd_wi said...

Oh, thanks for your suggestion. So you say that to use raid-z optimally, I have to use 3 discs for my zpool, not 2 discs.
Yes; raidz works best with 3-9 drives. A 2-disk raidz group brings no extra benefit, and a performance decrease because of the extra calculations.
With 3 discs I can damage one disc and the entire pool won't crash; I just have to exchange the damaged disc and the pool will automatically repair. Is this correct?
Yes; just execute one command and it will replace the drive, resilver it, and all will be fine.
And there will be increased bandwidth too?
Yes, the more drives in a ZFS pool, the more bandwidth available.
And I wonder, would it be difficult to release a ZFS driver for Windows XP/Vista, now that ZFS is open source?
This could happen, but I expect Microsoft, or another company with full access to Microsoft's source code, would have to write it, given that they would need extensive access to underlying kernel interfaces.
I want to store all my data in a zpool with 3 discs on my computer, occasionally boot Windows for gaming and stuff, and still be able to access all my MP3s on my ZFS pool.
The easiest way to do this is to buy or find a used computer and make it a fileserver that can share the data regardless of the OS your main desktop computer is running. The other option is to use VMware on your desktop and have Solaris running inside a virtual machine, using it to hold your data.
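That one command for a failed drive is zpool replace. A sketch; the pool and device names here are hypothetical:

```shell
# Tell zfs to replace the failed disk with the new one
# (pool and device names are hypothetical):
zpool replace tank c1t1d0 c1t2d0

# The pool resilvers in the background; watch progress with:
zpool status tank
```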

4:51 PM  
Anonymous Anonymous said...

You say that ZFS does not fragment as long as there is more than 10% free disk space. How can you or I verify this to be the case?

8:59 AM  
Anonymous Anonymous said...

Holy crap! I read now that Solaris doesn't support SATA drives really well. That really suxx. I hoped to build my raid-z with 3 SATA hard drives. Maybe I should wait until some decent SATA drivers arrive. :o(

5:41 PM  
Anonymous Anonymous said...

Suppose I have 3 Barracuda 300GB hard drives. Then I remove one of them and insert a 500GB hard drive instead. Then I issue a command to repair the whole pool, et voilà, I have a working pool with 300GB, 300GB and 500GB hard drives. Then I repeat the procedure 2 more times, and at the end, I have migrated my whole zpool from 300GB to 500GB hard drives.

Would this be possible?
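In other words, something like this, one disc at a time (the pool and device names are made up, and I'm only guessing at the commands):

```shell
# Replace each 300GB disc with a 500GB one, waiting for the
# resilver to finish before doing the next (names are made up):
zpool replace tank disc300a disc500a
zpool status tank            # wait until resilvering completes
zpool replace tank disc300b disc500b
zpool status tank
zpool replace tank disc300c disc500c
zpool status tank
# with all three replaced, the pool could grow to the new capacity
```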

PS. I often visit your site; it is good info. Thanks for doing this for us on the web!

4:58 PM  
Blogger Jon said...

"Anonymous" wrote:

I think the key to ZFS is the simplicity and power for the Unix admin. I also think that it would be quite easy for a company like Apple Computer to put a nice front end to the commands for ZFS and make it super easy to use all that power.

Is that exactly what they are doing with "Time Machine", promised in Leopard (Mac OS X 10.5), due this spring?

See:
http://www.apple.com/macosx/leopard/timemachine.html

1:17 AM  
Anonymous Anonymous said...

Home user here. I think that the real value for the home user would be appliance-based, and I see real benefit in the workplace for the same hardware.
The ability to add raid-protected storage as simply as plugging in a drive would be a real boon to the home user.
I store music, videos, and, for my children, copies of their favorite movies and games on the network, using virtual CD images and a virtual CD driver on the client computer.
When I run out of disk storage space, if all I had to do was purchase a new disk, mount it in the appliance, and add the new disk to the storage pool through a web interface, that would be a slamming possibility.
MOST home users are running Windows boxes, and the less messing around on the command line typical home users have to do, the better off they are.
Integration with Windows would make ZFS a have-to-have for the home user.

For the corporate user, particularly for the SMB without a dedicated IT staff, the ability to simply add storage by adding a drive to a storage appliance would make ZFS a huge seller to that market, again with a simple web-based interface. Remember: no dedicated IT staff.

On another note, I think I remember reading that as data is accessed across drives, fragmented data is rewritten across the drives in unfragmented blocks, as part of auditing the integrity of the data. If I understand it correctly, this eliminates any problem with data fragmentation.

5:40 PM  
