diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 5717858..2ef035e 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -84,6 +84,8 @@ udf.txt - info and mount options for the UDF filesystem. ufs.txt - info on the ufs filesystem. +unionfs/ + - info on the unionfs filesystem vfat.txt - info on using the VFAT filesystem used in Windows NT and Windows 95 vfs.txt diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX new file mode 100644 index 0000000..96fdf67 --- /dev/null +++ b/Documentation/filesystems/unionfs/00-INDEX @@ -0,0 +1,10 @@ +00-INDEX + - this file. +concepts.txt + - A brief introduction of concepts. +issues.txt + - A summary of known issues with unionfs. +rename.txt + - Information regarding rename operations. +usage.txt + - Usage information and examples. diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt new file mode 100644 index 0000000..eb74aac --- /dev/null +++ b/Documentation/filesystems/unionfs/concepts.txt @@ -0,0 +1,181 @@ +Unionfs 2.0 CONCEPTS: +===================== + +This file describes the concepts needed by a namespace unification file +system. + + +Branch Priority: +================ + +Each branch is assigned a unique priority - starting from 0 (highest +priority). No two branches can have the same priority. + + +Branch Mode: +============ + +Each branch is assigned a mode - read-write or read-only. This allows +directories on media mounted read-write to be used in a read-only manner. + + +Whiteouts: +========== + +A whiteout removes a file name from the namespace. Whiteouts are needed when +one attempts to remove a file on a read-only branch. + +Suppose we have a two-branch union, where branch 0 is read-write and branch +1 is read-only. And a file 'foo' on branch 1: + +./b0/ +./b1/ +./b1/foo + +The unified view would simply be: + +./union/ +./union/foo + +Since 'foo' is stored on a read-only branch, it cannot be removed. A +whiteout is used to remove the name 'foo' from the unified namespace. Again, +since branch 1 is read-only, the whiteout cannot be created there. So, we +try on a higher priority (lower numerically) branch and create the whiteout +there. + +./b0/ +./b0/.wh.foo +./b1/ +./b1/foo + +Later, when Unionfs traverses branches (due to lookup or readdir), it +eliminate 'foo' from the namespace (as well as the whiteout itself.) + + +Duplicate Elimination: +====================== + +It is possible for files on different branches to have the same name. +Unionfs then has to select which instance of the file to show to the user. +Given the fact that each branch has a priority associated with it, the +simplest solution is to take the instance from the highest priority +(numerically lowest value) and "hide" the others. + + +Copyup: +======= + +When a change is made to the contents of a file's data or meta-data, they +have to be stored somewhere. The best way is to create a copy of the +original file on a branch that is writable, and then redirect the write +though to this copy. The copy must be made on a higher priority branch so +that lookup and readdir return this newer "version" of the file rather than +the original (see duplicate elimination). + + +Cache Coherency: +================ + +Unionfs users often want to be able to modify files and directories directly +on the lower branches, and have those changes be visible at the Unionfs +level. This means that data (e.g., pages) and meta-data (dentries, inodes, +open files, etc.) have to be synchronized between the upper and lower +layers. In other words, the newest changes from a layer below have to be +propagated to the Unionfs layer above. If the two layers are not in sync, a +cache incoherency ensues, which could lead to application failures and even +oopses. The Linux kernel, however, has a rather limited set of mechanisms +to ensure this inter-layer cache coherency---so Unionfs has to do most of +the hard work on its own. + +Maintaining Invariants: + +The way Unionfs ensures cache coherency is as follows. At each entry point +to a Unionfs file system method, we call a utility function to validate the +primary objects of this method. Generally, we call unionfs_file_revalidate +on open files, and __Unionfs_d_revalidate_chain on dentries (which also +validates inodes). These utility functions check to see whether the upper +Unionfs object is in sync with any of the lower objects that it represents. +The checks we perform include whether the Unionfs superblock has a newer +generation number, or if any of the lower objects mtime's or ctime's are +newer. (Note: generation numbers change when branch-management commands are +issued, so in a way, maintaining cache coherency is also very important for +branch-management.) If indeed we determine that any Unionfs object is no +longer in sync with its lower counterparts, then we rebuild that object +similarly to how we do so for branch-management. + +While rebuilding Unionfs's objects, we also purge any page mappings and +truncate inode pages (see fs/Unionfs/dentry.c:purge_inode_data). This is to +ensure that Unionfs will re-get the newer data from the lower branches. We +perform this purging only if the Unionfs operation in question is a reading +operation; if Unionfs is performing a data writing operation (e.g., ->write, +->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is +because (1) a self-deadlock could occur and (2) the upper Unionfs pages are +considered more authoritative anyway, as they are newer and will overwrite +any lower pages. + +Unionfs maintains the following important invariant regarding mtime's, +ctime's, and atime's: the upper inode object's times are the max() of all of +the lower ones. For non-directory objects, there's only one object below, +so the mapping is simple; for directory objects, there could me multiple +lower objects and we have to sync up with the newest one of all the lower +ones. This invariant is important to maintain, especially for directories +(besides, we need this to be POSIX compliant). A union could comprise +multiple writable branches, each of which could change. If we don't reflect +the newest possible mtime/ctime, some applications could fail. For example, +NFSv2/v3 exports check for newer directory mtimes on the server to determine +if the client-side attribute cache should be purged. + +To maintain these important invariants, of course, Unionfs carefully +synchronizes upper and lower times in various places. For example, if we +copy-up a file to a top-level branch, the parent directory where the file +was copied up to will now have a new mtime: so after a successful copy-up, +we sync up with the new top-level branch's parent directory mtime. + +Implementation: + +This cache-coherency implementation is efficient because it defers any +synchronizing between the upper and lower layers until absolutely needed. +Consider the example a common situation where users perform a lot of lower +changes, such as untarring a whole package. While these take place, +typically the user doesn't access the files via Unionfs; only after the +lower changes are done, does the user try to access the lower files. With +our cache-coherency implementation, the entirety of the changes to the lower +branches will not result in a single CPU cycle spent at the Unionfs level +until the user invokes a system call that goes through Unionfs. + +We have considered two alternate cache-coherency designs. (1) Using the +dentry/inode notify functionality to register interest in finding out about +any lower changes. This is a somewhat limited and also a heavy-handed +approach which could result in many notifications to the Unionfs layer upon +each small change at the lower layer (imagine a file being modified multiple +times in rapid succession). (2) Rewriting the VFS to support explicit +callbacks from lower objects to upper objects. We began exploring such an +implementation, but found it to be very complicated--it would have resulted +in massive VFS/MM changes which are unlikely to be accepted by the LKML +community. We therefore believe that our current cache-coherency design and +implementation represent the best approach at this time. + +Limitations: + +Our implementation works in that as long as a user process will have caused +Unionfs to be called, directly or indirectly, even to just do +->d_revalidate; then we will have purged the current Unionfs data and the +process will see the new data. For example, a process that continually +re-reads the same file's data will see the NEW data as soon as the lower +file had changed, upon the next read(2) syscall (even if the file is still +open!) However, this doesn't work when the process re-reads the open file's +data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens +it). Once we respond to ->readpage(s), then the kernel maps the page into +the process's address space and there doesn't appear to be a way to force +the kernel to invalidate those pages/mappings, and force the process to +re-issue ->readpage. If there's a way to invalidate active mappings and +force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do +the trick). + +Our current Unionfs code has to perform many file-revalidation calls. It +would be really nice if the VFS would export an optional file system hook +->file_revalidate (similarly to dentry->d_revalidate) that will be called +before each VFS op that has a "struct file" in it. + + +For more information, see . diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt new file mode 100644 index 0000000..6101ebf --- /dev/null +++ b/Documentation/filesystems/unionfs/issues.txt @@ -0,0 +1,12 @@ +KNOWN Unionfs 2.1 ISSUES: +========================= + +1. Unionfs should not use lookup_one_len() on the underlying f/s as it + confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the + lower file-system, this eliminates part of the problem. The remaining + calls to lookup_one_len may need to be changed to pass an intent. We are + currently introducing VFS changes to fs/namei.c's do_path_lookup() to + allow proper file lookup and opening in stackable file systems. + + +For more information, see . diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt new file mode 100644 index 0000000..e20bb82 --- /dev/null +++ b/Documentation/filesystems/unionfs/rename.txt @@ -0,0 +1,31 @@ +Rename is a complex beast. The following table shows which rename(2) operations +should succeed and which should fail. + +o: success +E: error (either unionfs or vfs) +X: EXDEV + +none = file does not exist +file = file is a file +dir = file is a empty directory +child= file is a non-empty directory +wh = file is a directory containing only whiteouts; this makes it logically + empty + + none file dir child wh +file o o E E E +dir o E o E o +child X E X E X +wh o E o E o + + +Renaming directories: +===================== + +Whenever a empty (either physically or logically) directory is being renamed, +the following sequence of events should take place: + +1) Remove whiteouts from both source and destination directory +2) Rename source to destination +3) Make destination opaque to prevent anything under it from showing up + diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt new file mode 100644 index 0000000..2316670 --- /dev/null +++ b/Documentation/filesystems/unionfs/usage.txt @@ -0,0 +1,97 @@ +Unionfs is a stackable unification file system, which can appear to merge +the contents of several directories (branches), while keeping their physical +content separate. Unionfs is useful for unified source tree management, +merged contents of split CD-ROM, merged separate software package +directories, data grids, and more. Unionfs allows any mix of read-only and +read-write branches, as well as insertion and deletion of branches anywhere +in the fan-out. To maintain Unix semantics, Unionfs handles elimination of +duplicates, partial-error conditions, and more. + +# mount -t unionfs -o branch-option[,union-options[,...]] none MOUNTPOINT + +The available branch-option for the mount command is: + + dirs=branch[=ro|=rw][:...] + +specifies a separated list of which directories compose the union. +Directories that come earlier in the list have a higher precedence than +those which come later. Additionally, read-only or read-write permissions of +the branch can be specified by appending =ro or =rw (default) to each +directory. + +Syntax: + + dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw] + +Example: + + dirs=/writable_branch=rw:/read-only_branch=ro + + +DYNAMIC BRANCH MANAGEMENT AND REMOUNTS +====================================== + +You can remount a union and change its overall mode, or reconfigure the +branches, as follows. + +To downgrade a union from read-write to read-only: + +# mount -t unionfs -o remount,ro none MOUNTPOINT + +To upgrade a union from read-only to read-write: + +# mount -t unionfs -o remount,rw none MOUNTPOINT + +To delete a branch /foo, regardless where it is in the current union: + +# mount -t unionfs -o remount,del=/foo none MOUNTPOINT + +To insert (add) a branch /foo before /bar: + +# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT + +To insert (add) a branch /foo (with the "rw" mode flag) before /bar: + +# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT + +To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a +new highest-priority branch), you can use the above syntax, or use a short +hand version as follows: + +# mount -t unionfs -o remount,add=/foo none MOUNTPOINT + +To append a branch to the very end (new lowest-priority branch): + +# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT + +To append a branch to the very end (new lowest-priority branch), in +read-only mode: + +# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT + +Finally, to change the mode of one existing branch, say /foo, from read-only +to read-write, and change /bar from read-write to read-only: + +# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT + + +CACHE CONSISTENCY +================= + +If you modify any file on any of the lower branches directly, while there is +a Unionfs 2.0 mounted above any of those branches, you should tell Unionfs +to purge its caches and re-get the objects. To do that, you have to +increment the generation number of the superblock using the following +command: + +# mount -t unionfs -o remount,incgen none MOUNTPOINT + +Note that the older way of incrementing the generation number using an +ioctl, is no longer supported in Unionfs 2.0. Ioctls in general are not +encouraged. Plus, an ioctl is per-file concept, whereas the generation +number is a per-file-system concept. Worse, such an ioctl requires an open +file, which then has to be invalidated by the very nature of the generation +number increase (read: the old generation increase ioctl was pretty racy). + + +For more information, see . diff --git a/MAINTAINERS b/MAINTAINERS index 277877a..d694ced 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3364,6 +3364,15 @@ L: linux-kernel@vger.kernel.org W: http://www.kernel.dk S: Maintained +UNIONFS +P: Erez Zadok +M: ezk@cs.sunysb.edu +P: Josef "Jeff" Sipek +M: jsipek@cs.sunysb.edu +L: unionfs@filesystems.org +W: http://unionfs.filesystems.org +S: Maintained + USB ACM DRIVER P: Oliver Neukum M: oliver@neukum.name diff --git a/fs/Kconfig b/fs/Kconfig index 3c4886b..ab68288 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -1034,6 +1034,47 @@ config CONFIGFS_FS endmenu +menu "Layered filesystems" + +config ECRYPT_FS + tristate "eCrypt filesystem layer support (EXPERIMENTAL)" + depends on EXPERIMENTAL && KEYS && CRYPTO && NET + help + Encrypted filesystem that operates on the VFS layer. See + to learn more about + eCryptfs. Userspace components are required and can be + obtained from . + + To compile this file system support as a module, choose M here: the + module will be called ecryptfs. + +config UNION_FS + tristate "Union file system (EXPERIMENTAL)" + depends on EXPERIMENTAL + help + Unionfs is a stackable unification file system, which appears to + merge the contents of several directories (branches), while keeping + their physical content separate. + + See for details + +config UNION_FS_XATTR + bool "Unionfs extended attributes" + depends on UNION_FS + help + Extended attributes are name:value pairs associated with inodes by + the kernel or by users (see the attr(5) manual page). + + If unsure, say N. + +config UNION_FS_DEBUG + bool "Debug Unionfs" + depends on UNION_FS + help + If you say Y here, you can turn on debugging output from Unionfs. + +endmenu + menu "Miscellaneous filesystems" config ADFS_FS @@ -1086,18 +1127,6 @@ config AFFS_FS To compile this file system support as a module, choose M here: the module will be called affs. If unsure, say N. -config ECRYPT_FS - tristate "eCrypt filesystem layer support (EXPERIMENTAL)" - depends on EXPERIMENTAL && KEYS && CRYPTO && NET - help - Encrypted filesystem that operates on the VFS layer. See - to learn more about - eCryptfs. Userspace components are required and can be - obtained from . - - To compile this file system support as a module, choose M here: the - module will be called ecryptfs. - config HFS_FS tristate "Apple Macintosh file system support (EXPERIMENTAL)" depends on BLOCK && EXPERIMENTAL diff --git a/fs/Makefile b/fs/Makefile index 9edf411..b490b1a 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -114,3 +114,4 @@ obj-$(CONFIG_HPPFS) += hppfs/ obj-$(CONFIG_DEBUG_FS) += debugfs/ obj-$(CONFIG_OCFS2_FS) += ocfs2/ obj-$(CONFIG_GFS2_FS) += gfs2/ +obj-$(CONFIG_UNION_FS) += unionfs/ diff --git a/fs/drop_caches.c b/fs/drop_caches.c index 03ea769..6a7aa05 100644 --- a/fs/drop_caches.c +++ b/fs/drop_caches.c @@ -3,6 +3,7 @@ */ #include +#include #include #include #include @@ -12,7 +13,7 @@ /* A global variable is a bit ugly, but it keeps the code simple */ int sysctl_drop_caches; -static void drop_pagecache_sb(struct super_block *sb) +void drop_pagecache_sb(struct super_block *sb) { struct inode *inode; @@ -24,6 +25,7 @@ static void drop_pagecache_sb(struct super_block *sb) } spin_unlock(&inode_lock); } +EXPORT_SYMBOL(drop_pagecache_sb); void drop_pagecache(void) { diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c index cb20b96..a8c1686 100644 --- a/fs/ecryptfs/dentry.c +++ b/fs/ecryptfs/dentry.c @@ -62,7 +62,7 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd) struct inode *lower_inode = ecryptfs_inode_to_lower(dentry->d_inode); - fsstack_copy_attr_all(dentry->d_inode, lower_inode, NULL); + fsstack_copy_attr_all(dentry->d_inode, lower_inode); } out: return rc; diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 1548be2..d37cc12 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -280,7 +280,9 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry, int rc = 0; struct dentry *lower_dir_dentry; struct dentry *lower_dentry; + struct dentry *dentry_save; struct vfsmount *lower_mnt; + struct vfsmount *mnt_save; char *encoded_name; unsigned int encoded_namelen; struct ecryptfs_crypt_stat *crypt_stat = NULL; @@ -308,9 +310,13 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry, } ecryptfs_printk(KERN_DEBUG, "encoded_name = [%s]; encoded_namelen " "= [%d]\n", encoded_name, encoded_namelen); - lower_dentry = lookup_one_len(encoded_name, lower_dir_dentry, - encoded_namelen - 1); + dentry_save = nd->dentry; + mnt_save = nd->mnt; + lower_dentry = lookup_one_len_nd(encoded_name, lower_dir_dentry, + (encoded_namelen - 1), nd); kfree(encoded_name); + nd->mnt = mnt_save; + nd->dentry = dentry_save; if (IS_ERR(lower_dentry)) { ecryptfs_printk(KERN_ERR, "ERR from lower_dentry\n"); rc = PTR_ERR(lower_dentry); @@ -597,9 +603,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, lower_new_dir_dentry->d_inode, lower_new_dentry); if (rc) goto out_lock; - fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode, NULL); + fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode); if (new_dir != old_dir) - fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode, NULL); + fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode); out_lock: unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry); dput(lower_new_dentry->d_parent); @@ -892,7 +898,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia) } rc = notify_change(lower_dentry, ia); out: - fsstack_copy_attr_all(inode, lower_inode, NULL); + fsstack_copy_attr_all(inode, lower_inode); return rc; } diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index fc4a3a2..07c3a58 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -151,7 +151,7 @@ int ecryptfs_interpose(struct dentry *lower_dentry, struct dentry *dentry, d_add(dentry, inode); else d_instantiate(dentry, inode); - fsstack_copy_attr_all(inode, lower_inode, NULL); + fsstack_copy_attr_all(inode, lower_inode); /* This size will be overwritten for real files w/ headers and * other metadata */ fsstack_copy_inode_size(inode, lower_inode); diff --git a/fs/namei.c b/fs/namei.c index ee60cc4..436e9fa 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1125,6 +1125,10 @@ static int fastcall do_path_lookup(int dfd, const char *name, nd->mnt = mntget(fs->rootmnt); nd->dentry = dget(fs->root); read_unlock(&fs->lock); + } else if (flags & LOOKUP_ONE) { + /* nd->mnt and nd->dentry already set, just grab references */ + mntget(nd->mnt); + dget(nd->dentry); } else if (dfd == AT_FDCWD) { read_lock(&fs->lock); nd->mnt = mntget(fs->pwdmnt); @@ -1293,29 +1297,37 @@ static struct dentry *lookup_hash(struct nameidata *nd) } /* SMP-safe */ -struct dentry * lookup_one_len(const char * name, struct dentry * base, int len) +static inline int __lookup_one_len(const char *name, struct qstr *this, struct dentry *base, int len) { unsigned long hash; - struct qstr this; unsigned int c; - this.name = name; - this.len = len; + this->name = name; + this->len = len; if (!len) - goto access; + return -EACCES; hash = init_name_hash(); while (len--) { c = *(const unsigned char *)name++; if (c == '/' || c == '\0') - goto access; + return -EACCES; hash = partial_name_hash(c, hash); } - this.hash = end_name_hash(hash); + this->hash = end_name_hash(hash); + return 0; +} + +struct dentry *lookup_one_len_nd(const char *name, struct dentry *base, + int len, struct nameidata *nd) +{ + int err; + struct qstr this; - return __lookup_hash(&this, base, NULL); -access: - return ERR_PTR(-EACCES); + err = __lookup_one_len(name, &this, base, len); + if (err) + return ERR_PTR(err); + return __lookup_hash(&this, base, nd); } /* @@ -2758,7 +2770,7 @@ EXPORT_SYMBOL(follow_up); EXPORT_SYMBOL(get_write_access); /* binfmt_aout */ EXPORT_SYMBOL(getname); EXPORT_SYMBOL(lock_rename); -EXPORT_SYMBOL(lookup_one_len); +EXPORT_SYMBOL(lookup_one_len_nd); EXPORT_SYMBOL(page_follow_link_light); EXPORT_SYMBOL(page_put_link); EXPORT_SYMBOL(page_readlink); diff --git a/fs/stack.c b/fs/stack.c index 67716f6..a548aac 100644 --- a/fs/stack.c +++ b/fs/stack.c @@ -1,8 +1,20 @@ +/* + * Copyright (c) 2006-2007 Erez Zadok + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek + * Copyright (c) 2006-2007 Stony Brook University + * Copyright (c) 2006-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + #include #include #include -/* does _NOT_ require i_mutex to be held. +/* + * does _NOT_ require i_mutex to be held. * * This function cannot be inlined since i_size_{read,write} is rather * heavy-weight on 32-bit systems @@ -14,11 +26,11 @@ void fsstack_copy_inode_size(struct inode *dst, const struct inode *src) } EXPORT_SYMBOL_GPL(fsstack_copy_inode_size); -/* copy all attributes; get_nlinks is optional way to override the i_nlink +/* + * copy all attributes; get_nlinks is optional way to override the i_nlink * copying */ -void fsstack_copy_attr_all(struct inode *dest, const struct inode *src, - int (*get_nlinks)(struct inode *)) +void fsstack_copy_attr_all(struct inode *dest, const struct inode *src) { dest->i_mode = src->i_mode; dest->i_uid = src->i_uid; @@ -29,14 +41,6 @@ void fsstack_copy_attr_all(struct inode *dest, const struct inode *src, dest->i_ctime = src->i_ctime; dest->i_blkbits = src->i_blkbits; dest->i_flags = src->i_flags; - - /* - * Update the nlinks AFTER updating the above fields, because the - * get_links callback may depend on them. - */ - if (!get_nlinks) - dest->i_nlink = src->i_nlink; - else - dest->i_nlink = (*get_nlinks)(dest); + dest->i_nlink = src->i_nlink; } EXPORT_SYMBOL_GPL(fsstack_copy_attr_all); diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile new file mode 100644 index 0000000..b05549a --- /dev/null +++ b/fs/unionfs/Makefile @@ -0,0 +1,13 @@ +UNIONFS_VERSION="2.1.2 (for 2.6.21.7)" + +EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\" + +obj-$(CONFIG_UNION_FS) += unionfs.o + +unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \ + rdstate.o copyup.o dirhelper.o rename.o unlink.o \ + lookup.o commonfops.o dirfops.o sioq.o mmap.o + +unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o + +unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c new file mode 100644 index 0000000..d77608e --- /dev/null +++ b/fs/unionfs/commonfops.c @@ -0,0 +1,837 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* + * 1) Copyup the file + * 2) Rename the file to '.unionfs' - obviously + * stolen from NFS's silly rename + */ +static int copyup_deleted_file(struct file *file, struct dentry *dentry, + int bstart, int bindex) +{ + static unsigned int counter; + const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2; + const int countersize = sizeof(counter) * 2; + const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1; + char name[nlen + 1]; + int err; + struct dentry *tmp_dentry = NULL; + struct dentry *lower_dentry; + struct dentry *lower_dir_dentry = NULL; + + lower_dentry = unionfs_lower_dentry_idx(dentry, bstart); + + sprintf(name, ".unionfs%*.*lx", + i_inosize, i_inosize, lower_dentry->d_inode->i_ino); + + /* + * Loop, looking for an unused temp name to copyup to. + * + * It's somewhat silly that we look for a free temp tmp name in the + * source branch (bstart) instead of the dest branch (bindex), where + * the final name will be created. We _will_ catch it if somehow + * the name exists in the dest branch, but it'd be nice to catch it + * sooner than later. + */ +retry: + tmp_dentry = NULL; + do { + char *suffix = name + nlen - countersize; + + dput(tmp_dentry); + counter++; + sprintf(suffix, "%*.*x", countersize, countersize, counter); + + printk(KERN_DEBUG "unionfs: trying to rename %s to %s\n", + dentry->d_name.name, name); + + tmp_dentry = lookup_one_len(name, lower_dentry->d_parent, + nlen); + if (IS_ERR(tmp_dentry)) { + err = PTR_ERR(tmp_dentry); + goto out; + } + /* don't dput here because of do-while condition eval order */ + } while (tmp_dentry->d_inode != NULL); /* need negative dentry */ + dput(tmp_dentry); + + err = copyup_named_file(dentry->d_parent->d_inode, file, name, bstart, + bindex, file->f_path.dentry->d_inode->i_size); + if (err) { + if (err == -EEXIST) + goto retry; + goto out; + } + + /* bring it to the same state as an unlinked file */ + lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry)); + if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) { + atomic_inc(&lower_dentry->d_inode->i_count); + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, + lower_dentry->d_inode); + } + lower_dir_dentry = lock_parent(lower_dentry); + err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry); + unlock_dir(lower_dir_dentry); + +out: + if (!err) + unionfs_check_dentry(dentry); + return err; +} + +/* + * put all references held by upper struct file and free lower file pointer + * array + */ +static void cleanup_file(struct file *file) +{ + int bindex, bstart, bend; + struct file **lower_files; + struct file *lower_file; + struct super_block *sb = file->f_path.dentry->d_sb; + + lower_files = UNIONFS_F(file)->lower_files; + bstart = fbstart(file); + bend = fbend(file); + + for (bindex = bstart; bindex <= bend; bindex++) { + int i; /* holds (possibly) updated branch index */ + int old_bid; + + lower_file = unionfs_lower_file_idx(file, bindex); + if (!lower_file) + continue; + + /* + * Find new index of matching branch with an open + * file, since branches could have been added or + * deleted causing the one with open files to shift. + */ + old_bid = UNIONFS_F(file)->saved_branch_ids[bindex]; + i = branch_id_to_idx(sb, old_bid); + if (i < 0) { + printk(KERN_ERR "unionfs: no superblock for " + "file %p\n", file); + continue; + } + + /* decrement count of open files */ + branchput(sb, i); + /* + * fput will perform an mntput for us on the correct branch. + * Although we're using the file's old branch configuration, + * bindex, which is the old index, correctly points to the + * right branch in the file's branch list. In other words, + * we're going to mntput the correct branch even if branches + * have been added/removed. + */ + fput(lower_file); + UNIONFS_F(file)->lower_files[bindex] = NULL; + UNIONFS_F(file)->saved_branch_ids[bindex] = -1; + } + + UNIONFS_F(file)->lower_files = NULL; + kfree(lower_files); + kfree(UNIONFS_F(file)->saved_branch_ids); + /* set to NULL because caller needs to know if to kfree on error */ + UNIONFS_F(file)->saved_branch_ids = NULL; +} + +/* open all lower files for a given file */ +static int open_all_files(struct file *file) +{ + int bindex, bstart, bend, err = 0; + struct file *lower_file; + struct dentry *lower_dentry; + struct dentry *dentry = file->f_path.dentry; + struct super_block *sb = dentry->d_sb; + + bstart = dbstart(dentry); + bend = dbend(dentry); + + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + if (!lower_dentry) + continue; + + dget(lower_dentry); + unionfs_mntget(dentry, bindex); + branchget(sb, bindex); + + lower_file = + dentry_open(lower_dentry, + unionfs_lower_mnt_idx(dentry, bindex), + file->f_flags); + if (IS_ERR(lower_file)) { + err = PTR_ERR(lower_file); + goto out; + } else + unionfs_set_lower_file_idx(file, bindex, lower_file); + } +out: + return err; +} + +/* open the highest priority file for a given upper file */ +static int open_highest_file(struct file *file, int willwrite) +{ + int bindex, bstart, bend, err = 0; + struct file *lower_file; + struct dentry *lower_dentry; + struct dentry *dentry = file->f_path.dentry; + struct inode *parent_inode = dentry->d_parent->d_inode; + struct super_block *sb = dentry->d_sb; + size_t inode_size = dentry->d_inode->i_size; + + bstart = dbstart(dentry); + bend = dbend(dentry); + + lower_dentry = unionfs_lower_dentry(dentry); + if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) { + for (bindex = bstart - 1; bindex >= 0; bindex--) { + err = copyup_file(parent_inode, file, bstart, bindex, + inode_size); + if (!err) + break; + } + atomic_set(&UNIONFS_F(file)->generation, + atomic_read(&UNIONFS_I(dentry->d_inode)-> + generation)); + goto out; + } + + dget(lower_dentry); + unionfs_mntget(dentry, bstart); + lower_file = dentry_open(lower_dentry, + unionfs_lower_mnt_idx(dentry, bstart), + file->f_flags); + if (IS_ERR(lower_file)) { + err = PTR_ERR(lower_file); + goto out; + } + branchget(sb, bstart); + unionfs_set_lower_file(file, lower_file); + /* Fix up the position. */ + lower_file->f_pos = file->f_pos; + + memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state)); +out: + return err; +} + +/* perform a delayed copyup of a read-write file on a read-only branch */ +static int do_delayed_copyup(struct file *file) +{ + int bindex, bstart, bend, err = 0; + struct dentry *dentry = file->f_path.dentry; + struct inode *parent_inode = dentry->d_parent->d_inode; + loff_t inode_size = dentry->d_inode->i_size; + + bstart = fbstart(file); + bend = fbend(file); + + BUG_ON(!S_ISREG(dentry->d_inode->i_mode)); + + unionfs_check_file(file); + unionfs_check_dentry(dentry); + for (bindex = bstart - 1; bindex >= 0; bindex--) { + if (!d_deleted(dentry)) + err = copyup_file(parent_inode, file, bstart, + bindex, inode_size); + else + err = copyup_deleted_file(file, dentry, bstart, + bindex); + + if (!err) + break; + } + if (err || (bstart <= fbstart(file))) + goto out; + bend = fbend(file); + for (bindex = bstart; bindex <= bend; bindex++) { + if (unionfs_lower_file_idx(file, bindex)) { + branchput(dentry->d_sb, bindex); + fput(unionfs_lower_file_idx(file, bindex)); + unionfs_set_lower_file_idx(file, bindex, NULL); + } + if (unionfs_lower_mnt_idx(dentry, bindex)) { + unionfs_mntput(dentry, bindex); + unionfs_set_lower_mnt_idx(dentry, bindex, NULL); + } + if (unionfs_lower_dentry_idx(dentry, bindex)) { + BUG_ON(!dentry->d_inode); + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex)); + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, + NULL); + dput(unionfs_lower_dentry_idx(dentry, bindex)); + unionfs_set_lower_dentry_idx(dentry, bindex, NULL); + } + } + /* for reg file, we only open it "once" */ + fbend(file) = fbstart(file); + set_dbend(dentry, dbstart(dentry)); + ibend(dentry->d_inode) = ibstart(dentry->d_inode); + +out: + unionfs_check_file(file); + unionfs_check_dentry(dentry); + return err; +} + +/* + * Revalidate the struct file + * @file: file to revalidate + * @willwrite: 1 if caller may cause changes to the file; 0 otherwise. + */ +int unionfs_file_revalidate(struct file *file, int willwrite) +{ + struct super_block *sb; + struct dentry *dentry; + int sbgen, fgen, dgen; + int bstart, bend; + int size; + int err = 0; + + dentry = file->f_path.dentry; + unionfs_lock_dentry(dentry); + sb = dentry->d_sb; + + /* + * First revalidate the dentry inside struct file, + * but not unhashed dentries. + */ + if (!d_deleted(dentry) && + !__unionfs_d_revalidate_chain(dentry, NULL, willwrite)) { + err = -ESTALE; + goto out_nofree; + } + + sbgen = atomic_read(&UNIONFS_SB(sb)->generation); + dgen = atomic_read(&UNIONFS_D(dentry)->generation); + fgen = atomic_read(&UNIONFS_F(file)->generation); + + BUG_ON(sbgen > dgen); + + /* + * There are two cases we are interested in. The first is if the + * generation is lower than the super-block. The second is if + * someone has copied up this file from underneath us, we also need + * to refresh things. + */ + if (!d_deleted(dentry) && + (sbgen > fgen || dbstart(dentry) != fbstart(file))) { + int orig_brid = /* save orig branch ID */ + UNIONFS_F(file)->saved_branch_ids[fbstart(file)]; + + /* First we throw out the existing files. */ + cleanup_file(file); + + /* Now we reopen the file(s) as in unionfs_open. */ + bstart = fbstart(file) = dbstart(dentry); + bend = fbend(file) = dbend(dentry); + + size = sizeof(struct file *) * sbmax(sb); + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL); + if (!UNIONFS_F(file)->lower_files) { + err = -ENOMEM; + goto out; + } + size = sizeof(int) * sbmax(sb); + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL); + if (!UNIONFS_F(file)->saved_branch_ids) { + err = -ENOMEM; + goto out; + } + + if (S_ISDIR(dentry->d_inode->i_mode)) { + /* We need to open all the files. */ + err = open_all_files(file); + if (err) + goto out; + } else { + int new_brid; + /* We only open the highest priority branch. */ + err = open_highest_file(file, willwrite); + if (err) + goto out; + new_brid = UNIONFS_F(file)-> + saved_branch_ids[fbstart(file)]; + if (new_brid != orig_brid && sbgen > fgen) { + /* + * If we re-opened the file on a different + * branch than the original one, and this + * was due to a new branch inserted, then + * update the mnt counts of the old and new + * branches accordingly. + */ + unionfs_mntget(dentry, bstart); /* new branch */ + unionfs_mntput(sb->s_root, /* orig branch */ + branch_id_to_idx(sb, orig_brid)); + } + } + atomic_set(&UNIONFS_F(file)->generation, + atomic_read(&UNIONFS_I(dentry->d_inode)-> + generation)); + } + + /* Copyup on the first write to a file on a readonly branch. */ + if (willwrite && IS_WRITE_FLAG(file->f_flags) && + !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) && + is_robranch(dentry)) { + printk(KERN_DEBUG "unionfs: do delay copyup of \"%s\"\n", + dentry->d_name.name); + err = do_delayed_copyup(file); + } + +out: + if (err) { + kfree(UNIONFS_F(file)->lower_files); + kfree(UNIONFS_F(file)->saved_branch_ids); + } +out_nofree: + if (!err) + unionfs_check_file(file); + unionfs_unlock_dentry(dentry); + return err; +} + +/* unionfs_open helper function: open a directory */ +static int __open_dir(struct inode *inode, struct file *file) +{ + struct dentry *lower_dentry; + struct file *lower_file; + int bindex, bstart, bend; + + bstart = fbstart(file) = dbstart(file->f_path.dentry); + bend = fbend(file) = dbend(file->f_path.dentry); + + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = + unionfs_lower_dentry_idx(file->f_path.dentry, bindex); + if (!lower_dentry) + continue; + + dget(lower_dentry); + unionfs_mntget(file->f_path.dentry, bindex); + lower_file = dentry_open(lower_dentry, + unionfs_lower_mnt_idx(file->f_path.dentry, + bindex), + file->f_flags); + if (IS_ERR(lower_file)) + return PTR_ERR(lower_file); + + unionfs_set_lower_file_idx(file, bindex, lower_file); + + /* + * The branchget goes after the open, because otherwise + * we would miss the reference on release. + */ + branchget(inode->i_sb, bindex); + } + + return 0; +} + +/* unionfs_open helper function: open a file */ +static int __open_file(struct inode *inode, struct file *file) +{ + struct dentry *lower_dentry; + struct file *lower_file; + int lower_flags; + int bindex, bstart, bend; + + lower_dentry = unionfs_lower_dentry(file->f_path.dentry); + lower_flags = file->f_flags; + + bstart = fbstart(file) = dbstart(file->f_path.dentry); + bend = fbend(file) = dbend(file->f_path.dentry); + + /* + * check for the permission for lower file. If the error is + * COPYUP_ERR, copyup the file. + */ + if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) { + /* + * if the open will change the file, copy it up otherwise + * defer it. + */ + if (lower_flags & O_TRUNC) { + int size = 0; + int err = -EROFS; + + /* copyup the file */ + for (bindex = bstart - 1; bindex >= 0; bindex--) { + err = copyup_file( + file->f_path.dentry->d_parent->d_inode, + file, bstart, bindex, size); + if (!err) + break; + } + return err; + } else + lower_flags &= ~(OPEN_WRITE_FLAGS); + } + + dget(lower_dentry); + + /* + * dentry_open will decrement mnt refcnt if err. + * otherwise fput() will do an mntput() for us upon file close. + */ + unionfs_mntget(file->f_path.dentry, bstart); + lower_file = + dentry_open(lower_dentry, + unionfs_lower_mnt_idx(file->f_path.dentry, bstart), + lower_flags); + if (IS_ERR(lower_file)) + return PTR_ERR(lower_file); + + unionfs_set_lower_file(file, lower_file); + branchget(inode->i_sb, bstart); + + return 0; +} + +int unionfs_open(struct inode *inode, struct file *file) +{ + int err = 0; + struct file *lower_file = NULL; + struct dentry *dentry = NULL; + int bindex = 0, bstart = 0, bend = 0; + int size; + + unionfs_read_lock(inode->i_sb); + + file->private_data = + kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL); + if (!UNIONFS_F(file)) { + err = -ENOMEM; + goto out_nofree; + } + fbstart(file) = -1; + fbend(file) = -1; + atomic_set(&UNIONFS_F(file)->generation, + atomic_read(&UNIONFS_I(inode)->generation)); + + size = sizeof(struct file *) * sbmax(inode->i_sb); + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL); + if (!UNIONFS_F(file)->lower_files) { + err = -ENOMEM; + goto out; + } + size = sizeof(int) * sbmax(inode->i_sb); + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL); + if (!UNIONFS_F(file)->saved_branch_ids) { + err = -ENOMEM; + goto out; + } + + dentry = file->f_path.dentry; + unionfs_lock_dentry(dentry); + + bstart = fbstart(file) = dbstart(dentry); + bend = fbend(file) = dbend(dentry); + + /* increment, so that we can flush appropriately */ + atomic_inc(&UNIONFS_I(dentry->d_inode)->totalopens); + + /* + * open all directories and make the unionfs file struct point to + * these lower file structs + */ + if (S_ISDIR(inode->i_mode)) + err = __open_dir(inode, file); /* open a dir */ + else + err = __open_file(inode, file); /* open a file */ + + /* freeing the allocated resources, and fput the opened files */ + if (err) { + atomic_dec(&UNIONFS_I(dentry->d_inode)->totalopens); + for (bindex = bstart; bindex <= bend; bindex++) { + lower_file = unionfs_lower_file_idx(file, bindex); + if (!lower_file) + continue; + + branchput(file->f_path.dentry->d_sb, bindex); + /* fput calls dput for lower_dentry */ + fput(lower_file); + } + } + + unionfs_unlock_dentry(dentry); + +out: + if (err) { + kfree(UNIONFS_F(file)->lower_files); + kfree(UNIONFS_F(file)->saved_branch_ids); + kfree(UNIONFS_F(file)); + } +out_nofree: + unionfs_read_unlock(inode->i_sb); + unionfs_check_inode(inode); + if (!err) { + unionfs_check_file(file); + unionfs_check_dentry(file->f_path.dentry->d_parent); + } + return err; +} + +/* + * release all lower object references & free the file info structure + * + * No need to grab sb info's rwsem. + */ +int unionfs_file_release(struct inode *inode, struct file *file) +{ + struct file *lower_file = NULL; + struct unionfs_file_info *fileinfo; + struct unionfs_inode_info *inodeinfo; + struct super_block *sb = inode->i_sb; + int bindex, bstart, bend; + int fgen, err = 0; + + unionfs_read_lock(sb); + /* + * Yes, we have to revalidate this file even if it's being released. + * This is important for open-but-unlinked files, as well as mmap + * support. + */ + if ((err = unionfs_file_revalidate(file, 1))) + goto out; + unionfs_check_file(file); + fileinfo = UNIONFS_F(file); + BUG_ON(file->f_path.dentry->d_inode != inode); + inodeinfo = UNIONFS_I(inode); + + /* fput all the lower files */ + fgen = atomic_read(&fileinfo->generation); + bstart = fbstart(file); + bend = fbend(file); + + for (bindex = bstart; bindex <= bend; bindex++) { + lower_file = unionfs_lower_file_idx(file, bindex); + + if (lower_file) { + fput(lower_file); + branchput(sb, bindex); + } + } + kfree(fileinfo->lower_files); + kfree(fileinfo->saved_branch_ids); + + if (fileinfo->rdstate) { + fileinfo->rdstate->access = jiffies; + printk(KERN_DEBUG "unionfs: saving rdstate with cookie " + "%u [%d.%lld]\n", + fileinfo->rdstate->cookie, + fileinfo->rdstate->bindex, + (long long)fileinfo->rdstate->dirpos); + spin_lock(&inodeinfo->rdlock); + inodeinfo->rdcount++; + list_add_tail(&fileinfo->rdstate->cache, + &inodeinfo->readdircache); + mark_inode_dirty(inode); + spin_unlock(&inodeinfo->rdlock); + fileinfo->rdstate = NULL; + } + kfree(fileinfo); + +out: + unionfs_read_unlock(sb); + return err; +} + +/* pass the ioctl to the lower fs */ +static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct file *lower_file; + int err; + + lower_file = unionfs_lower_file(file); + + err = security_file_ioctl(lower_file, cmd, arg); + if (err) + goto out; + + err = -ENOTTY; + if (!lower_file || !lower_file->f_op) + goto out; + if (lower_file->f_op->unlocked_ioctl) { + err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg); + } else if (lower_file->f_op->ioctl) { + lock_kernel(); + err = lower_file->f_op->ioctl(lower_file->f_path.dentry->d_inode, + lower_file, cmd, arg); + unlock_kernel(); + } + +out: + return err; +} + +/* + * return to user-space the branch indices containing the file in question + * + * We use fd_set and therefore we are limited to the number of the branches + * to FD_SETSIZE, which is currently 1024 - plenty for most people + */ +static int unionfs_ioctl_queryfile(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int err = 0; + fd_set branchlist; + int bstart = 0, bend = 0, bindex = 0; + int orig_bstart, orig_bend; + struct dentry *dentry, *lower_dentry; + struct vfsmount *mnt; + + dentry = file->f_path.dentry; + unionfs_lock_dentry(dentry); + orig_bstart = dbstart(dentry); + orig_bend = dbend(dentry); + if ((err = unionfs_partial_lookup(dentry))) + goto out; + bstart = dbstart(dentry); + bend = dbend(dentry); + + FD_ZERO(&branchlist); + + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + if (!lower_dentry) + continue; + if (lower_dentry->d_inode) + FD_SET(bindex, &branchlist); + /* purge any lower objects after partial_lookup */ + if (bindex < orig_bstart || bindex > orig_bend) { + dput(lower_dentry); + unionfs_set_lower_dentry_idx(dentry, bindex, NULL); + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex)); + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, + NULL); + mnt = unionfs_lower_mnt_idx(dentry, bindex); + if (!mnt) + continue; + unionfs_mntput(dentry, bindex); + unionfs_set_lower_mnt_idx(dentry, bindex, NULL); + } + } + /* restore original dentry's offsets */ + set_dbstart(dentry, orig_bstart); + set_dbend(dentry, orig_bend); + ibstart(dentry->d_inode) = orig_bstart; + ibend(dentry->d_inode) = orig_bend; + + err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set)); + if (err) + err = -EFAULT; + +out: + unionfs_unlock_dentry(dentry); + return err < 0 ? err : bend; +} + +long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + long err; + + unionfs_read_lock(file->f_path.dentry->d_sb); + + if ((err = unionfs_file_revalidate(file, 1))) + goto out; + + /* check if asked for local commands */ + switch (cmd) { + case UNIONFS_IOCTL_INCGEN: + /* Increment the superblock generation count */ + printk("unionfs: incgen ioctl deprecated; " + "use \"-o remount,incgen\"\n"); + err = -ENOSYS; + break; + + case UNIONFS_IOCTL_QUERYFILE: + /* Return list of branches containing the given file */ + err = unionfs_ioctl_queryfile(file, cmd, arg); + break; + + default: + /* pass the ioctl down */ + err = do_ioctl(file, cmd, arg); + break; + } + +out: + unionfs_read_unlock(file->f_path.dentry->d_sb); + unionfs_check_file(file); + return err; +} + +int unionfs_flush(struct file *file, fl_owner_t id) +{ + int err = 0; + struct file *lower_file = NULL; + struct dentry *dentry = file->f_path.dentry; + int bindex, bstart, bend; + + unionfs_read_lock(dentry->d_sb); + + if ((err = unionfs_file_revalidate(file, 1))) + goto out; + unionfs_check_file(file); + + if (!atomic_dec_and_test(&UNIONFS_I(dentry->d_inode)->totalopens)) + goto out; + + unionfs_lock_dentry(dentry); + + bstart = fbstart(file); + bend = fbend(file); + for (bindex = bstart; bindex <= bend; bindex++) { + lower_file = unionfs_lower_file_idx(file, bindex); + + if (lower_file && lower_file->f_op && + lower_file->f_op->flush) { + err = lower_file->f_op->flush(lower_file, id); + if (err) + goto out_lock; + + /* if there are no more refs to the dentry, dput it */ + if (d_deleted(dentry)) { + dput(unionfs_lower_dentry_idx(dentry, bindex)); + unionfs_set_lower_dentry_idx(dentry, bindex, + NULL); + } + } + + } + + /* on success, update our times */ + unionfs_copy_attr_times(dentry->d_inode); + /* parent time could have changed too (async) */ + unionfs_copy_attr_times(dentry->d_parent->d_inode); + +out_lock: + unionfs_unlock_dentry(dentry); +out: + unionfs_read_unlock(dentry->d_sb); + unionfs_check_file(file); + return err; +} diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c new file mode 100644 index 0000000..04bedb1 --- /dev/null +++ b/fs/unionfs/copyup.c @@ -0,0 +1,888 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* + * For detailed explanation of copyup see: + * Documentation/filesystems/unionfs/concepts.txt + */ + +#ifdef CONFIG_UNION_FS_XATTR +/* copyup all extended attrs for a given dentry */ +static int copyup_xattrs(struct dentry *old_lower_dentry, + struct dentry *new_lower_dentry) +{ + int err = 0; + ssize_t list_size = -1; + char *name_list = NULL; + char *attr_value = NULL; + char *name_list_buf = NULL; + + /* query the actual size of the xattr list */ + list_size = vfs_listxattr(old_lower_dentry, NULL, 0); + if (list_size <= 0) { + err = list_size; + goto out; + } + + /* allocate space for the actual list */ + name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX); + if (!name_list || IS_ERR(name_list)) { + err = PTR_ERR(name_list); + goto out; + } + + name_list_buf = name_list; /* save for kfree at end */ + + /* now get the actual xattr list of the source file */ + list_size = vfs_listxattr(old_lower_dentry, name_list, list_size); + if (list_size <= 0) { + err = list_size; + goto out; + } + + /* allocate space to hold each xattr's value */ + attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX); + if (!attr_value || IS_ERR(attr_value)) { + err = PTR_ERR(name_list); + goto out; + } + + /* in a loop, get and set each xattr from src to dst file */ + while (*name_list) { + ssize_t size; + + /* Lock here since vfs_getxattr doesn't lock for us */ + mutex_lock(&old_lower_dentry->d_inode->i_mutex); + size = vfs_getxattr(old_lower_dentry, name_list, + attr_value, XATTR_SIZE_MAX); + mutex_unlock(&old_lower_dentry->d_inode->i_mutex); + if (size < 0) { + err = size; + goto out; + } + if (size > XATTR_SIZE_MAX) { + err = -E2BIG; + goto out; + } + /* Don't lock here since vfs_setxattr does it for us. */ + err = vfs_setxattr(new_lower_dentry, name_list, attr_value, + size, 0); + /* + * Selinux depends on "security.*" xattrs, so to maintain + * the security of copied-up files, if Selinux is active, + * then we must copy these xattrs as well. So we need to + * temporarily get FOWNER privileges. + * XXX: move entire copyup code to SIOQ. + */ + if (err == -EPERM && !capable(CAP_FOWNER)) { + cap_raise(current->cap_effective, CAP_FOWNER); + err = vfs_setxattr(new_lower_dentry, name_list, + attr_value, size, 0); + cap_lower(current->cap_effective, CAP_FOWNER); + } + if (err < 0) + goto out; + name_list += strlen(name_list) + 1; + } +out: + unionfs_xattr_kfree(name_list_buf); + unionfs_xattr_kfree(attr_value); + /* Ignore if xattr isn't supported */ + if (err == -ENOTSUPP || err == -EOPNOTSUPP) + err = 0; + return err; +} +#endif /* CONFIG_UNION_FS_XATTR */ + +/* + * Determine the mode based on the copyup flags, and the existing dentry. + * + * Handle file systems which may not support certain options. For example + * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless + * errors, rather than propagating them up, which results in copyup errors + * and errors returned back to users. + */ +static int copyup_permissions(struct super_block *sb, + struct dentry *old_lower_dentry, + struct dentry *new_lower_dentry) +{ + struct inode *i = old_lower_dentry->d_inode; + struct iattr newattrs; + int err; + + newattrs.ia_atime = i->i_atime; + newattrs.ia_mtime = i->i_mtime; + newattrs.ia_ctime = i->i_ctime; + newattrs.ia_gid = i->i_gid; + newattrs.ia_uid = i->i_uid; + newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME | + ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE | + ATTR_GID | ATTR_UID; + err = notify_change(new_lower_dentry, &newattrs); + if (err) + goto out; + + /* now try to change the mode and ignore EOPNOTSUPP on symlinks */ + newattrs.ia_mode = i->i_mode; + newattrs.ia_valid = ATTR_MODE | ATTR_FORCE; + err = notify_change(new_lower_dentry, &newattrs); + if (err == -EOPNOTSUPP && + S_ISLNK(new_lower_dentry->d_inode->i_mode)) { + printk(KERN_WARNING + "unionfs: changing \"%s\" symlink mode unsupported\n", + new_lower_dentry->d_name.name); + err = 0; + } + +out: + return err; +} + +/* + * create the new device/file/directory - use copyup_permission to copyup + * times, and mode + * + * if the object being copied up is a regular file, the file is only created, + * the contents have to be copied up separately + */ +static int __copyup_ndentry(struct dentry *old_lower_dentry, + struct dentry *new_lower_dentry, + struct dentry *new_lower_parent_dentry, + char *symbuf) +{ + int err = 0; + umode_t old_mode = old_lower_dentry->d_inode->i_mode; + struct sioq_args args; + + if (S_ISDIR(old_mode)) { + args.mkdir.parent = new_lower_parent_dentry->d_inode; + args.mkdir.dentry = new_lower_dentry; + args.mkdir.mode = old_mode; + + run_sioq(__unionfs_mkdir, &args); + err = args.err; + } else if (S_ISLNK(old_mode)) { + args.symlink.parent = new_lower_parent_dentry->d_inode; + args.symlink.dentry = new_lower_dentry; + args.symlink.symbuf = symbuf; + args.symlink.mode = old_mode; + + run_sioq(__unionfs_symlink, &args); + err = args.err; + } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) || + S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) { + args.mknod.parent = new_lower_parent_dentry->d_inode; + args.mknod.dentry = new_lower_dentry; + args.mknod.mode = old_mode; + args.mknod.dev = old_lower_dentry->d_inode->i_rdev; + + run_sioq(__unionfs_mknod, &args); + err = args.err; + } else if (S_ISREG(old_mode)) { + args.create.parent = new_lower_parent_dentry->d_inode; + args.create.dentry = new_lower_dentry; + args.create.mode = old_mode; + args.create.nd = NULL; + + run_sioq(__unionfs_create, &args); + err = args.err; + } else { + printk(KERN_ERR "unionfs: unknown inode type %d\n", + old_mode); + BUG(); + } + + return err; +} + +static int __copyup_reg_data(struct dentry *dentry, + struct dentry *new_lower_dentry, int new_bindex, + struct dentry *old_lower_dentry, int old_bindex, + struct file **copyup_file, loff_t len) +{ + struct super_block *sb = dentry->d_sb; + struct file *input_file; + struct file *output_file; + struct vfsmount *output_mnt; + mm_segment_t old_fs; + char *buf = NULL; + ssize_t read_bytes, write_bytes; + loff_t size; + int err = 0; + + /* open old file */ + unionfs_mntget(dentry, old_bindex); + branchget(sb, old_bindex); + /* dentry_open calls dput and mntput if it returns an error */ + input_file = dentry_open(old_lower_dentry, + unionfs_lower_mnt_idx(dentry, old_bindex), + O_RDONLY | O_LARGEFILE); + if (IS_ERR(input_file)) { + dput(old_lower_dentry); + err = PTR_ERR(input_file); + goto out; + } + if (!input_file->f_op || !input_file->f_op->read) { + err = -EINVAL; + goto out_close_in; + } + + /* open new file */ + dget(new_lower_dentry); + output_mnt = unionfs_mntget(sb->s_root, new_bindex); + branchget(sb, new_bindex); + output_file = dentry_open(new_lower_dentry, output_mnt, + O_RDWR | O_LARGEFILE); + if (IS_ERR(output_file)) { + err = PTR_ERR(output_file); + goto out_close_in2; + } + if (!output_file->f_op || !output_file->f_op->write) { + err = -EINVAL; + goto out_close_out; + } + + /* allocating a buffer */ + buf = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!buf) { + err = -ENOMEM; + goto out_close_out; + } + + input_file->f_pos = 0; + output_file->f_pos = 0; + + old_fs = get_fs(); + set_fs(KERNEL_DS); + + size = len; + err = 0; + do { + if (len >= PAGE_SIZE) + size = PAGE_SIZE; + else if ((len < PAGE_SIZE) && (len > 0)) + size = len; + + len -= PAGE_SIZE; + + read_bytes = + input_file->f_op->read(input_file, + (char __user *)buf, size, + &input_file->f_pos); + if (read_bytes <= 0) { + err = read_bytes; + break; + } + + write_bytes = + output_file->f_op->write(output_file, + (char __user *)buf, + read_bytes, + &output_file->f_pos); + if ((write_bytes < 0) || (write_bytes < read_bytes)) { + err = write_bytes; + break; + } + } while ((read_bytes > 0) && (len > 0)); + + set_fs(old_fs); + + kfree(buf); + + if (!err) + err = output_file->f_op->fsync(output_file, + new_lower_dentry, 0); + + if (err) + goto out_close_out; + + if (copyup_file) { + *copyup_file = output_file; + goto out_close_in; + } + +out_close_out: + fput(output_file); + +out_close_in2: + branchput(sb, new_bindex); + +out_close_in: + fput(input_file); + +out: + branchput(sb, old_bindex); + + return err; +} + +/* + * dput the lower references for old and new dentry & clear a lower dentry + * pointer + */ +static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry, + int old_bstart, int old_bend, + struct dentry *new_lower_dentry, int new_bindex) +{ + /* get rid of the lower dentry and all its traces */ + unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL); + set_dbstart(dentry, old_bstart); + set_dbend(dentry, old_bend); + + dput(new_lower_dentry); + dput(old_lower_dentry); +} + +/* + * Copy up a dentry to a file of specified name. + * + * @dir: used to pull the ->i_sb to access other branches + * @dentry: the non-negative dentry whose lower_inode we should copy + * @bstart: the branch of the lower_inode to copy from + * @new_bindex: the branch to create the new file in + * @name: the name of the file to create + * @namelen: length of @name + * @copyup_file: the "struct file" to return (optional) + * @len: how many bytes to copy-up? + */ +int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart, + int new_bindex, const char *name, int namelen, + struct file **copyup_file, loff_t len) +{ + struct dentry *new_lower_dentry; + struct dentry *old_lower_dentry = NULL; + struct super_block *sb; + int err = 0; + int old_bindex; + int old_bstart; + int old_bend; + struct dentry *new_lower_parent_dentry = NULL; + mm_segment_t oldfs; + char *symbuf = NULL; + + verify_locked(dentry); + + old_bindex = bstart; + old_bstart = dbstart(dentry); + old_bend = dbend(dentry); + + BUG_ON(new_bindex < 0); + BUG_ON(new_bindex >= old_bindex); + + sb = dir->i_sb; + + if ((err = is_robranch_super(sb, new_bindex))) + goto out; + + /* Create the directory structure above this dentry. */ + new_lower_dentry = create_parents(dir, dentry, name, new_bindex); + if (IS_ERR(new_lower_dentry)) { + err = PTR_ERR(new_lower_dentry); + goto out; + } + + old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex); + /* we conditionally dput this old_lower_dentry at end of function */ + dget(old_lower_dentry); + + /* For symlinks, we must read the link before we lock the directory. */ + if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) { + + symbuf = kmalloc(PATH_MAX, GFP_KERNEL); + if (!symbuf) { + __clear(dentry, old_lower_dentry, + old_bstart, old_bend, + new_lower_dentry, new_bindex); + err = -ENOMEM; + goto out_free; + } + + oldfs = get_fs(); + set_fs(KERNEL_DS); + err = old_lower_dentry->d_inode->i_op->readlink( + old_lower_dentry, + (char __user *)symbuf, + PATH_MAX); + set_fs(oldfs); + if (err < 0) { + __clear(dentry, old_lower_dentry, + old_bstart, old_bend, + new_lower_dentry, new_bindex); + goto out_free; + } + symbuf[err] = '\0'; + } + + /* Now we lock the parent, and create the object in the new branch. */ + new_lower_parent_dentry = lock_parent(new_lower_dentry); + + /* create the new inode */ + err = __copyup_ndentry(old_lower_dentry, new_lower_dentry, + new_lower_parent_dentry, symbuf); + + if (err) { + __clear(dentry, old_lower_dentry, + old_bstart, old_bend, + new_lower_dentry, new_bindex); + goto out_unlock; + } + + /* We actually copyup the file here. */ + if (S_ISREG(old_lower_dentry->d_inode->i_mode)) + err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex, + old_lower_dentry, old_bindex, + copyup_file, len); + if (err) + goto out_unlink; + + /* Set permissions. */ + if ((err = copyup_permissions(sb, old_lower_dentry, + new_lower_dentry))) + goto out_unlink; + +#ifdef CONFIG_UNION_FS_XATTR + /* Selinux uses extended attributes for permissions. */ + if ((err = copyup_xattrs(old_lower_dentry, new_lower_dentry))) + goto out_unlink; +#endif /* CONFIG_UNION_FS_XATTR */ + + /* do not allow files getting deleted to be re-interposed */ + if (!d_deleted(dentry)) + unionfs_reinterpose(dentry); + + goto out_unlock; + +out_unlink: + /* + * copyup failed, because we possibly ran out of space or + * quota, or something else happened so let's unlink; we don't + * really care about the return value of vfs_unlink + */ + vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry); + + if (copyup_file) { + /* need to close the file */ + + fput(*copyup_file); + branchput(sb, new_bindex); + } + + /* + * TODO: should we reset the error to something like -EIO? + * + * If we don't reset, the user may get some nonsensical errors, but + * on the other hand, if we reset to EIO, we guarantee that the user + * will get a "confusing" error message. + */ + +out_unlock: + unlock_dir(new_lower_parent_dentry); + +out_free: + /* + * If old_lower_dentry was a directory, we need to dput it. If it + * was a file, then it was already dput indirectly by other + * functions we call above which operate on regular files. + */ + if (old_lower_dentry && old_lower_dentry->d_inode && + (S_ISDIR(old_lower_dentry->d_inode->i_mode) || + S_ISLNK(old_lower_dentry->d_inode->i_mode))) + dput(old_lower_dentry); + kfree(symbuf); + + if (err) + goto out; + if (!S_ISDIR(dentry->d_inode->i_mode)) { + unionfs_postcopyup_release(dentry); + if (!unionfs_lower_inode(dentry->d_inode)) { + /* + * If we got here, then we copied up to an + * unlinked-open file, whose name is .unionfsXXXXX. + */ + struct inode *inode = new_lower_dentry->d_inode; + atomic_inc(&inode->i_count); + unionfs_set_lower_inode_idx(dentry->d_inode, + ibstart(dentry->d_inode), + inode); + } + } + unionfs_postcopyup_setmnt(dentry); + /* sync inode times from copied-up inode to our inode */ + unionfs_copy_attr_times(dentry->d_inode); + unionfs_check_inode(dir); + unionfs_check_dentry(dentry); +out: + return err; +} + +/* + * This function creates a copy of a file represented by 'file' which + * currently resides in branch 'bstart' to branch 'new_bindex.' The copy + * will be named "name". + */ +int copyup_named_file(struct inode *dir, struct file *file, char *name, + int bstart, int new_bindex, loff_t len) +{ + int err = 0; + struct file *output_file = NULL; + + err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex, + name, strlen(name), &output_file, len); + if (!err) { + fbstart(file) = new_bindex; + unionfs_set_lower_file_idx(file, new_bindex, output_file); + } + + return err; +} + +/* + * This function creates a copy of a file represented by 'file' which + * currently resides in branch 'bstart' to branch 'new_bindex'. + */ +int copyup_file(struct inode *dir, struct file *file, int bstart, + int new_bindex, loff_t len) +{ + int err = 0; + struct file *output_file = NULL; + struct dentry *dentry = file->f_path.dentry; + + err = copyup_dentry(dir, dentry, bstart, new_bindex, + dentry->d_name.name, dentry->d_name.len, + &output_file, len); + if (!err) { + fbstart(file) = new_bindex; + unionfs_set_lower_file_idx(file, new_bindex, output_file); + } + + return err; +} + +/* purge a dentry's lower-branch states (dput/mntput, etc.) */ +static void __cleanup_dentry(struct dentry *dentry, int bindex, + int old_bstart, int old_bend) +{ + int loop_start; + int loop_end; + int new_bstart = -1; + int new_bend = -1; + int i; + + loop_start = min(old_bstart, bindex); + loop_end = max(old_bend, bindex); + + /* + * This loop sets the bstart and bend for the new dentry by + * traversing from left to right. It also dputs all negative + * dentries except bindex + */ + for (i = loop_start; i <= loop_end; i++) { + if (!unionfs_lower_dentry_idx(dentry, i)) + continue; + + if (i == bindex) { + new_bend = i; + if (new_bstart < 0) + new_bstart = i; + continue; + } + + if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) { + dput(unionfs_lower_dentry_idx(dentry, i)); + unionfs_set_lower_dentry_idx(dentry, i, NULL); + + unionfs_mntput(dentry, i); + unionfs_set_lower_mnt_idx(dentry, i, NULL); + } else { + if (new_bstart < 0) + new_bstart = i; + new_bend = i; + } + } + + if (new_bstart < 0) + new_bstart = bindex; + if (new_bend < 0) + new_bend = bindex; + set_dbstart(dentry, new_bstart); + set_dbend(dentry, new_bend); + +} + +/* set lower inode ptr and update bstart & bend if necessary */ +static void __set_inode(struct dentry *upper, struct dentry *lower, + int bindex) +{ + unionfs_set_lower_inode_idx(upper->d_inode, bindex, + igrab(lower->d_inode)); + if (likely(ibstart(upper->d_inode) > bindex)) + ibstart(upper->d_inode) = bindex; + if (likely(ibend(upper->d_inode) < bindex)) + ibend(upper->d_inode) = bindex; + +} + +/* set lower dentry ptr and update bstart & bend if necessary */ +static void __set_dentry(struct dentry *upper, struct dentry *lower, + int bindex) +{ + unionfs_set_lower_dentry_idx(upper, bindex, lower); + if (likely(dbstart(upper) > bindex)) + set_dbstart(upper, bindex); + if (likely(dbend(upper) < bindex)) + set_dbend(upper, bindex); +} + +/* + * This function replicates the directory structure up-to given dentry + * in the bindex branch. + */ +struct dentry *create_parents(struct inode *dir, struct dentry *dentry, + const char *name, int bindex) +{ + int err; + struct dentry *child_dentry; + struct dentry *parent_dentry; + struct dentry *lower_parent_dentry = NULL; + struct dentry *lower_dentry = NULL; + const char *childname; + unsigned int childnamelen; + int nr_dentry; + int count = 0; + int old_bstart; + int old_bend; + struct dentry **path = NULL; + struct super_block *sb; + + verify_locked(dentry); + + if ((err = is_robranch_super(dir->i_sb, bindex))) { + lower_dentry = ERR_PTR(err); + goto out; + } + + old_bstart = dbstart(dentry); + old_bend = dbend(dentry); + + lower_dentry = ERR_PTR(-ENOMEM); + + /* There is no sense allocating any less than the minimum. */ + nr_dentry = 1; + path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL); + if (!path) + goto out; + + /* assume the negative dentry of unionfs as the parent dentry */ + parent_dentry = dentry; + + /* + * This loop finds the first parent that exists in the given branch. + * We start building the directory structure from there. At the end + * of the loop, the following should hold: + * - child_dentry is the first nonexistent child + * - parent_dentry is the first existent parent + * - path[0] is the = deepest child + * - path[count] is the first child to create + */ + do { + child_dentry = parent_dentry; + + /* find the parent directory dentry in unionfs */ + parent_dentry = child_dentry->d_parent; + unionfs_lock_dentry(parent_dentry); + + /* find out the lower_parent_dentry in the given branch */ + lower_parent_dentry = + unionfs_lower_dentry_idx(parent_dentry, bindex); + + /* grow path table */ + if (count == nr_dentry) { + void *p; + + nr_dentry *= 2; + p = krealloc(path, nr_dentry * sizeof(struct dentry *), + GFP_KERNEL); + if (!p) { + lower_dentry = ERR_PTR(-ENOMEM); + goto out; + } + path = p; + } + + /* store the child dentry */ + path[count++] = child_dentry; + } while (!lower_parent_dentry); + count--; + + sb = dentry->d_sb; + + /* + * This code goes between the begin/end labels and basically + * emulates a while(child_dentry != dentry), only cleaner and + * shorter than what would be a much longer while loop. + */ +begin: + /* get lower parent dir in the current branch */ + lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex); + unionfs_unlock_dentry(parent_dentry); + + /* init the values to lookup */ + childname = child_dentry->d_name.name; + childnamelen = child_dentry->d_name.len; + + if (child_dentry != dentry) { + /* lookup child in the underlying file system */ + lower_dentry = lookup_one_len(childname, lower_parent_dentry, + childnamelen); + if (IS_ERR(lower_dentry)) + goto out; + } else { + /* + * Is the name a whiteout of the child name ? lookup the + * whiteout child in the underlying file system + */ + lower_dentry = lookup_one_len(name, lower_parent_dentry, + strlen(name)); + if (IS_ERR(lower_dentry)) + goto out; + + /* Replace the current dentry (if any) with the new one */ + dput(unionfs_lower_dentry_idx(dentry, bindex)); + unionfs_set_lower_dentry_idx(dentry, bindex, + lower_dentry); + + __cleanup_dentry(dentry, bindex, old_bstart, old_bend); + goto out; + } + + if (lower_dentry->d_inode) { + /* + * since this already exists we dput to avoid + * multiple references on the same dentry + */ + dput(lower_dentry); + } else { + struct sioq_args args; + + /* it's a negative dentry, create a new dir */ + lower_parent_dentry = lock_parent(lower_dentry); + + args.mkdir.parent = lower_parent_dentry->d_inode; + args.mkdir.dentry = lower_dentry; + args.mkdir.mode = child_dentry->d_inode->i_mode; + + run_sioq(__unionfs_mkdir, &args); + err = args.err; + + if (!err) + err = copyup_permissions(dir->i_sb, child_dentry, + lower_dentry); + unlock_dir(lower_parent_dentry); + if (err) { + struct inode *inode = lower_dentry->d_inode; + /* + * If we get here, it means that we created a new + * dentry+inode, but copying permissions failed. + * Therefore, we should delete this inode and dput + * the dentry so as not to leave cruft behind. + */ + if (lower_dentry->d_op && lower_dentry->d_op->d_iput) + lower_dentry->d_op->d_iput(lower_dentry, + inode); + else + iput(inode); + lower_dentry->d_inode = NULL; + dput(lower_dentry); + lower_dentry = ERR_PTR(err); + goto out; + } + + } + + __set_inode(child_dentry, lower_dentry, bindex); + __set_dentry(child_dentry, lower_dentry, bindex); + /* + * update times of this dentry, but also the parent, because if + * we changed, the parent may have changed too. + */ + unionfs_copy_attr_times(parent_dentry->d_inode); + unionfs_copy_attr_times(child_dentry->d_inode); + + parent_dentry = child_dentry; + child_dentry = path[--count]; + goto begin; +out: + /* cleanup any leftover locks from the do/while loop above */ + if (IS_ERR(lower_dentry)) + while (count) + unionfs_unlock_dentry(path[count--]); + kfree(path); + return lower_dentry; +} + +/* + * Post-copyup helper to ensure we have valid mnts: set lower mnt of + * dentry+parents to the first parent node that has an mnt. + */ +void unionfs_postcopyup_setmnt(struct dentry *dentry) +{ + struct dentry *parent, *hasone; + int bindex = dbstart(dentry); + + if (unionfs_lower_mnt_idx(dentry, bindex)) + return; + hasone = dentry->d_parent; + /* this loop should stop at root dentry */ + while (!unionfs_lower_mnt_idx(hasone, bindex)) + hasone = hasone->d_parent; + parent = dentry; + while (!unionfs_lower_mnt_idx(parent, bindex)) { + unionfs_set_lower_mnt_idx(parent, bindex, + unionfs_mntget(hasone, bindex)); + parent = parent->d_parent; + } +} + +/* + * Post-copyup helper to release all non-directory source objects of a + * copied-up file. Regular files should have only one lower object. + */ +void unionfs_postcopyup_release(struct dentry *dentry) +{ + int bindex; + + BUG_ON(S_ISDIR(dentry->d_inode->i_mode)); + for (bindex=dbstart(dentry)+1; bindex<=dbend(dentry); bindex++) { + if (unionfs_lower_mnt_idx(dentry, bindex)) { + unionfs_mntput(dentry, bindex); + unionfs_set_lower_mnt_idx(dentry, bindex, NULL); + } + if (unionfs_lower_dentry_idx(dentry, bindex)) { + dput(unionfs_lower_dentry_idx(dentry, bindex)); + unionfs_set_lower_dentry_idx(dentry, bindex, NULL); + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex)); + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, + NULL); + } + } + bindex = dbstart(dentry); + set_dbend(dentry, bindex); + ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bindex; +} diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c new file mode 100644 index 0000000..94f0e84 --- /dev/null +++ b/fs/unionfs/debug.c @@ -0,0 +1,494 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* + * Helper debugging functions for maintainers (and for users to report back + * useful information back to maintainers) + */ + +/* it's always useful to know what part of the code called us */ +#define PRINT_CALLER() \ + do { \ + if (!printed_caller) { \ + printk("PC:%s:%s:%d\n",fname,fxn,line); \ + printed_caller = 1; \ + } \ + } while (0) + +/* + * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on + * the fan-out of various Unionfs objects. We check that no lower objects + * exist outside the start/end branch range; that all objects within are + * non-NULL (with some allowed exceptions); that for every lower file + * there's a lower dentry+inode; that the start/end ranges match for all + * corresponding lower objects; that open files/symlinks have only one lower + * objects, but directories can have several; and more. + */ +void __unionfs_check_inode(const struct inode *inode, + const char *fname, const char *fxn, int line) +{ + int bindex; + int istart, iend; + struct inode *lower_inode; + struct super_block *sb; + int printed_caller = 0; + + /* for inodes now */ + BUG_ON(!inode); + sb = inode->i_sb; + istart = ibstart(inode); + iend = ibend(inode); + if (istart > iend) { + PRINT_CALLER(); + printk(" Ci0: inode=%p istart/end=%d:%d\n", + inode, istart, iend); + } + if ((istart == -1 && iend != -1) || + (istart != -1 && iend == -1)) { + PRINT_CALLER(); + printk(" Ci1: inode=%p istart/end=%d:%d\n", + inode, istart, iend); + } + if (!S_ISDIR(inode->i_mode)) { + if (iend != istart) { + PRINT_CALLER(); + printk(" Ci2: inode=%p istart=%d iend=%d\n", + inode, istart, iend); + } + } + + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) { + if (!UNIONFS_I(inode)) { + PRINT_CALLER(); + printk(" Ci3: no inode_info %p\n", inode); + return; + } + if (!UNIONFS_I(inode)->lower_inodes) { + PRINT_CALLER(); + printk(" Ci4: no lower_inodes %p\n", inode); + return; + } + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (lower_inode) { + if (bindex < istart || bindex > iend) { + PRINT_CALLER(); + printk(" Ci5: inode/linode=%p:%p bindex=%d " + "istart/end=%d:%d\n", inode, + lower_inode, bindex, istart, iend); + } else if ((int)lower_inode == 0x5a5a5a5a) { + /* freed inode! */ + PRINT_CALLER(); + printk(" Ci6: inode/linode=%p:%p bindex=%d " + "istart/end=%d:%d\n", inode, + lower_inode, bindex, istart, iend); + } + } else { /* lower_inode == NULL */ + if (bindex >= istart && bindex <= iend) { + /* + * directories can have NULL lower inodes in + * b/t start/end, but NOT if at the + * start/end range. + */ + if (!(S_ISDIR(inode->i_mode) && + bindex > istart && bindex < iend)) { + PRINT_CALLER(); + printk(" Ci7: inode/linode=%p:%p " + "bindex=%d istart/end=%d:%d\n", + inode, lower_inode, bindex, + istart, iend); + } + } + } + } +} + +void __unionfs_check_dentry(const struct dentry *dentry, + const char *fname, const char *fxn, int line) +{ + int bindex; + int dstart, dend, istart, iend; + struct dentry *lower_dentry; + struct inode *inode, *lower_inode; + struct super_block *sb; + struct vfsmount *lower_mnt; + int printed_caller = 0; + + BUG_ON(!dentry); + sb = dentry->d_sb; + inode = dentry->d_inode; + dstart = dbstart(dentry); + dend = dbend(dentry); + BUG_ON(dstart > dend); + + if ((dstart == -1 && dend != -1) || + (dstart != -1 && dend == -1)) { + PRINT_CALLER(); + printk(" CD0: dentry=%p dstart/end=%d:%d\n", + dentry, dstart, dend); + } + /* + * check for NULL dentries inside the start/end range, or + * non-NULL dentries outside the start/end range. + */ + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) { + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + if (lower_dentry) { + if (bindex < dstart || bindex > dend) { + PRINT_CALLER(); + printk(" CD1: dentry/lower=%p:%p(%p) " + "bindex=%d dstart/end=%d:%d\n", + dentry, lower_dentry, + (lower_dentry ? lower_dentry->d_inode : + (void *) 0xffffffff), + bindex, dstart, dend); + } + } else { /* lower_dentry == NULL */ + if (bindex >= dstart && bindex <= dend) { + /* + * Directories can have NULL lower inodes in + * b/t start/end, but NOT if at the + * start/end range. Ignore this rule, + * however, if this is a NULL dentry or a + * deleted dentry. + */ + if (!d_deleted((struct dentry *) dentry) && + inode && + !(inode && S_ISDIR(inode->i_mode) && + bindex > dstart && bindex < dend)) { + PRINT_CALLER(); + printk(" CD2: dentry/lower=%p:%p(%p) " + "bindex=%d dstart/end=%d:%d\n", + dentry, lower_dentry, + (lower_dentry ? + lower_dentry->d_inode : + (void *) 0xffffffff), + bindex, dstart, dend); + } + } + } + } + + /* check for vfsmounts same as for dentries */ + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) { + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex); + if (lower_mnt) { + if (bindex < dstart || bindex > dend) { + PRINT_CALLER(); + printk(" CM0: dentry/lmnt=%p:%p bindex=%d " + "dstart/end=%d:%d\n", dentry, + lower_mnt, bindex, dstart, dend); + } + } else { /* lower_mnt == NULL */ + if (bindex >= dstart && bindex <= dend) { + /* + * Directories can have NULL lower inodes in + * b/t start/end, but NOT if at the + * start/end range. Ignore this rule, + * however, if this is a NULL dentry. + */ + if (inode && + !(inode && S_ISDIR(inode->i_mode) && + bindex > dstart && bindex < dend)) { + PRINT_CALLER(); + printk(" CM1: dentry/lmnt=%p:%p " + "bindex=%d dstart/end=%d:%d\n", + dentry, lower_mnt, bindex, + dstart, dend); + } + } + } + } + + /* for inodes now */ + if (!inode) + return; + istart = ibstart(inode); + iend = ibend(inode); + BUG_ON(istart > iend); + if ((istart == -1 && iend != -1) || + (istart != -1 && iend == -1)) { + PRINT_CALLER(); + printk(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n", + dentry, inode, istart, iend); + } + if (istart != dstart) { + PRINT_CALLER(); + printk(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n", + dentry, inode, istart, dstart); + } + if (iend != dend) { + PRINT_CALLER(); + printk(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n", + dentry, inode, iend, dend); + } + + if (!S_ISDIR(inode->i_mode)) { + if (dend != dstart) { + PRINT_CALLER(); + printk(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n", + dentry, inode, dstart, dend); + } + if (iend != istart) { + PRINT_CALLER(); + printk(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n", + dentry, inode, istart, iend); + } + } + + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (lower_inode) { + if (bindex < istart || bindex > iend) { + PRINT_CALLER(); + printk(" CI5: dentry/linode=%p:%p bindex=%d " + "istart/end=%d:%d\n", dentry, + lower_inode, bindex, istart, iend); + } else if ((int)lower_inode == 0x5a5a5a5a) { + /* freed inode! */ + PRINT_CALLER(); + printk(" CI6: dentry/linode=%p:%p bindex=%d " + "istart/end=%d:%d\n", dentry, + lower_inode, bindex, istart, iend); + } + } else { /* lower_inode == NULL */ + if (bindex >= istart && bindex <= iend) { + /* + * directories can have NULL lower inodes in + * b/t start/end, but NOT if at the + * start/end range. + */ + if (!(S_ISDIR(inode->i_mode) && + bindex > istart && bindex < iend)) { + PRINT_CALLER(); + printk(" CI7: dentry/linode=%p:%p " + "bindex=%d istart/end=%d:%d\n", + dentry, lower_inode, bindex, + istart, iend); + } + } + } + } + + /* + * If it's a directory, then intermediate objects b/t start/end can + * be NULL. But, check that all three are NULL: lower dentry, mnt, + * and inode. + */ + if (S_ISDIR(inode->i_mode)) + for (bindex = dstart+1; bindex < dend; bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + lower_dentry = unionfs_lower_dentry_idx(dentry, + bindex); + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex); + if (!((lower_inode && lower_dentry && lower_mnt) || + (!lower_inode && !lower_dentry && !lower_mnt))) { + PRINT_CALLER(); + printk(" Cx: lmnt/ldentry/linode=%p:%p:%p " + "bindex=%d dstart/end=%d:%d\n", + lower_mnt, lower_dentry, lower_inode, + bindex, dstart, dend); + } + } + /* check if lower inode is newer than upper one (it shouldn't) */ + if (is_newer_lower(dentry)) { + PRINT_CALLER(); + for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (!lower_inode) + continue; + printk(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu " + "ctime/lctime=%lu.%lu/%lu.%lu\n", + bindex, + inode->i_mtime.tv_sec, + inode->i_mtime.tv_nsec, + lower_inode->i_mtime.tv_sec, + lower_inode->i_mtime.tv_nsec, + inode->i_ctime.tv_sec, + inode->i_ctime.tv_nsec, + lower_inode->i_ctime.tv_sec, + lower_inode->i_ctime.tv_nsec); + } + } +} + +void __unionfs_check_file(const struct file *file, + const char *fname, const char *fxn, int line) +{ + int bindex; + int dstart, dend, fstart, fend; + struct dentry *dentry; + struct file *lower_file; + struct inode *inode; + struct super_block *sb; + int printed_caller = 0; + + BUG_ON(!file); + dentry = file->f_path.dentry; + sb = dentry->d_sb; + dstart = dbstart(dentry); + dend = dbend(dentry); + BUG_ON(dstart > dend); + fstart = fbstart(file); + fend = fbend(file); + BUG_ON(fstart > fend); + + if ((fstart == -1 && fend != -1) || + (fstart != -1 && fend == -1)) { + PRINT_CALLER(); + printk(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n", + file, dentry, fstart, fend); + } + if (fstart != dstart) { + PRINT_CALLER(); + printk(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n", + file, dentry, fstart, dstart); + } + if (fend != dend) { + PRINT_CALLER(); + printk(" CF2: file/dentry=%p:%p fend=%d dend=%d\n", + file, dentry, fend, dend); + } + inode = dentry->d_inode; + if (!S_ISDIR(inode->i_mode)) { + if (fend != fstart) { + PRINT_CALLER(); + printk(" CF3: file/inode=%p:%p fstart=%d fend=%d\n", + file, inode, fstart, fend); + } + if (dend != dstart) { + PRINT_CALLER(); + printk(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n", + file, dentry, dstart, dend); + } + } + + /* + * check for NULL dentries inside the start/end range, or + * non-NULL dentries outside the start/end range. + */ + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) { + lower_file = unionfs_lower_file_idx(file, bindex); + if (lower_file) { + if (bindex < fstart || bindex > fend) { + PRINT_CALLER(); + printk(" CF5: file/lower=%p:%p bindex=%d " + "fstart/end=%d:%d\n", + file, lower_file, bindex, fstart, fend); + } + } else { /* lower_file == NULL */ + if (bindex >= fstart && bindex <= fend) { + /* + * directories can have NULL lower inodes in + * b/t start/end, but NOT if at the + * start/end range. + */ + if (!(S_ISDIR(inode->i_mode) && + bindex > fstart && bindex < fend)) { + PRINT_CALLER(); + printk(" CF6: file/lower=%p:%p " + "bindex=%d fstart/end=%d:%d\n", + file, lower_file, bindex, + fstart, fend); + } + } + } + } + + __unionfs_check_dentry(dentry,fname,fxn,line); +} + +/* useful to track vfsmount leaks that could cause EBUSY on unmount */ +void __show_branch_counts(const struct super_block *sb, + const char *file, const char *fxn, int line) +{ + int i; + struct vfsmount *mnt; + + printk("BC:"); + for (i=0; is_root) + mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt; + else + mnt = NULL; + printk("%d:", (mnt ? atomic_read(&mnt->mnt_count) : -99)); + } + printk("%s:%s:%d\n",file,fxn,line); +} + +void __show_inode_times(const struct inode *inode, + const char *file, const char *fxn, int line) +{ + struct inode *lower_inode; + int bindex; + + for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (!lower_inode) + continue; + printk("IT(%lu:%d): ", inode->i_ino, bindex); + printk("%s:%s:%d ",file,fxn,line); + printk("um=%lu/%lu lm=%lu/%lu ", + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec, + lower_inode->i_mtime.tv_sec, + lower_inode->i_mtime.tv_nsec); + printk("uc=%lu/%lu lc=%lu/%lu\n", + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec, + lower_inode->i_ctime.tv_sec, + lower_inode->i_ctime.tv_nsec); + } +} + +void __show_dinode_times(const struct dentry *dentry, + const char *file, const char *fxn, int line) +{ + struct inode *inode = dentry->d_inode; + struct inode *lower_inode; + int bindex; + + for (bindex=ibstart(inode); bindex <= ibend(inode); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (!lower_inode) + continue; + printk("DT(%s:%lu:%d): ", dentry->d_name.name, inode->i_ino, bindex); + printk("%s:%s:%d ",file,fxn,line); + printk("um=%lu/%lu lm=%lu/%lu ", + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec, + lower_inode->i_mtime.tv_sec, + lower_inode->i_mtime.tv_nsec); + printk("uc=%lu/%lu lc=%lu/%lu\n", + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec, + lower_inode->i_ctime.tv_sec, + lower_inode->i_ctime.tv_nsec); + } +} + +void __show_inode_counts(const struct inode *inode, + const char *file, const char *fxn, int line) +{ + struct inode *lower_inode; + int bindex; + + if (!inode) { + printk("SiC: Null inode\n"); + return; + } + for (bindex=sbstart(inode->i_sb); bindex <= sbend(inode->i_sb); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (!lower_inode) + continue; + printk("SIC(%lu:%d:%d): ", inode->i_ino, bindex, + atomic_read(&(inode)->i_count)); + printk("lc=%d ", atomic_read(&(lower_inode)->i_count)); + printk("%s:%s:%d\n",file,fxn,line); + } +} diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c new file mode 100644 index 0000000..f3c1258 --- /dev/null +++ b/fs/unionfs/dentry.c @@ -0,0 +1,480 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* + * Revalidate a single dentry. + * Assume that dentry's info node is locked. + * Assume that parent(s) are all valid already, but + * the child may not yet be valid. + * Returns 1 if valid, 0 otherwise. + */ +static int __unionfs_d_revalidate_one(struct dentry *dentry, + struct nameidata *nd) +{ + int valid = 1; /* default is valid (1); invalid is 0. */ + struct dentry *lower_dentry; + int bindex, bstart, bend; + int sbgen, dgen; + int positive = 0; + int locked = 0; + int interpose_flag; + struct nameidata lowernd; /* TODO: be gentler to the stack */ + + if (nd) + memcpy(&lowernd, nd, sizeof(struct nameidata)); + else + memset(&lowernd, 0, sizeof(struct nameidata)); + + verify_locked(dentry); + + /* if the dentry is unhashed, do NOT revalidate */ + if (d_deleted(dentry)) { + printk(KERN_DEBUG "unionfs: unhashed dentry being " + "revalidated: %*s\n", + dentry->d_name.len, dentry->d_name.name); + goto out; + } + + BUG_ON(dbstart(dentry) == -1); + if (dentry->d_inode) + positive = 1; + dgen = atomic_read(&UNIONFS_D(dentry)->generation); + sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation); + /* + * If we are working on an unconnected dentry, then there is no + * revalidation to be done, because this file does not exist within + * the namespace, and Unionfs operates on the namespace, not data. + */ + if (sbgen != dgen) { + struct dentry *result; + int pdgen; + + /* The root entry should always be valid */ + BUG_ON(IS_ROOT(dentry)); + + /* We can't work correctly if our parent isn't valid. */ + pdgen = atomic_read(&UNIONFS_D(dentry->d_parent)->generation); + BUG_ON(pdgen != sbgen); /* should never happen here */ + + /* Free the pointers for our inodes and this dentry. */ + bstart = dbstart(dentry); + bend = dbend(dentry); + if (bstart >= 0) { + struct dentry *lower_dentry; + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = + unionfs_lower_dentry_idx(dentry, + bindex); + dput(lower_dentry); + } + } + set_dbstart(dentry, -1); + set_dbend(dentry, -1); + + interpose_flag = INTERPOSE_REVAL_NEG; + if (positive) { + interpose_flag = INTERPOSE_REVAL; + /* + * During BRM, the VFS could already hold a lock on + * a file being read, so don't lock it again + * (deadlock), but if you lock it in this function, + * then release it here too. + */ + if (!mutex_is_locked(&dentry->d_inode->i_mutex)) { + mutex_lock(&dentry->d_inode->i_mutex); + locked = 1; + } + + bstart = ibstart(dentry->d_inode); + bend = ibend(dentry->d_inode); + if (bstart >= 0) { + struct inode *lower_inode; + for (bindex = bstart; bindex <= bend; + bindex++) { + lower_inode = + unionfs_lower_inode_idx( + dentry->d_inode, + bindex); + iput(lower_inode); + } + } + kfree(UNIONFS_I(dentry->d_inode)->lower_inodes); + UNIONFS_I(dentry->d_inode)->lower_inodes = NULL; + ibstart(dentry->d_inode) = -1; + ibend(dentry->d_inode) = -1; + if (locked) + mutex_unlock(&dentry->d_inode->i_mutex); + } + + result = unionfs_lookup_backend(dentry, &lowernd, + interpose_flag); + if (result) { + if (IS_ERR(result)) { + valid = 0; + goto out; + } + /* + * current unionfs_lookup_backend() doesn't return + * a valid dentry + */ + dput(dentry); + dentry = result; + } + + if (positive && UNIONFS_I(dentry->d_inode)->stale) { + make_bad_inode(dentry->d_inode); + d_drop(dentry); + valid = 0; + goto out; + } + goto out; + } + + /* The revalidation must occur across all branches */ + bstart = dbstart(dentry); + bend = dbend(dentry); + BUG_ON(bstart == -1); + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + if (!lower_dentry || !lower_dentry->d_op + || !lower_dentry->d_op->d_revalidate) + continue; + if (!lower_dentry->d_op->d_revalidate(lower_dentry, + &lowernd)) + valid = 0; + } + + if (!dentry->d_inode) + valid = 0; + + if (valid) { + /* + * If we get here, and we copy the meta-data from the lower + * inode to our inode, then it is vital that we have already + * purged all unionfs-level file data. We do that in the + * caller (__unionfs_d_revalidate_chain) by calling + * purge_inode_data. + */ + unionfs_copy_attr_all(dentry->d_inode, + unionfs_lower_inode(dentry->d_inode)); + fsstack_copy_inode_size(dentry->d_inode, + unionfs_lower_inode(dentry->d_inode)); + } + +out: + return valid; +} + +/* + * Determine if the lower inode objects have changed from below the unionfs + * inode. Return 1 if changed, 0 otherwise. + */ +int is_newer_lower(const struct dentry *dentry) +{ + int bindex; + struct inode *inode; + struct inode *lower_inode; + + /* ignore if we're called on semi-initialized dentries/inodes */ + if (!dentry || !UNIONFS_D(dentry)) + return 0; + inode = dentry->d_inode; + if (!inode || !UNIONFS_I(inode) || + ibstart(inode) < 0 || ibend(inode) < 0) + return 0; + + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) { + lower_inode = unionfs_lower_inode_idx(inode, bindex); + if (!lower_inode) + continue; + /* + * We may want to apply other tests to determine if the + * lower inode's data has changed, but checking for changed + * ctime and mtime on the lower inode should be enough. + */ + if (timespec_compare(&inode->i_mtime, + &lower_inode->i_mtime) < 0) { + printk("unionfs: new lower inode mtime " + "(bindex=%d, name=%s)\n", bindex, + dentry->d_name.name); + show_dinode_times(dentry); + return 1; /* mtime changed! */ + } + if (timespec_compare(&inode->i_ctime, + &lower_inode->i_ctime) < 0) { + printk("unionfs: new lower inode ctime " + "(bindex=%d, name=%s)\n", bindex, + dentry->d_name.name); + show_dinode_times(dentry); + return 1; /* ctime changed! */ + } + } + return 0; /* default: lower is not newer */ +} + +/* + * Purge/remove/unmap all date pages of a unionfs inode. This is called + * when the lower inode has changed, and we have to force processes to get + * the new data. + * + * XXX: Our implementation works in that as long as a user process will have + * caused Unionfs to be called, directly or indirectly, even to just do + * ->d_revalidate; then we will have purged the current Unionfs data and the + * process will see the new data. For example, a process that continually + * re-reads the same file's data will see the NEW data as soon as the lower + * file had changed, upon the next read(2) syscall (even if the file is + * still open!) However, this doesn't work when the process re-reads the + * open file's data via mmap(2) (unless the user unmaps/closes the file and + * remaps/reopens it). Once we respond to ->readpage(s), then the kernel + * maps the page into the process's address space and there doesn't appear + * to be a way to force the kernel to invalidate those pages/mappings, and + * force the process to re-issue ->readpage. If there's a way to invalidate + * active mappings and force a ->readpage, let us know please + * (invalidate_inode_pages2 doesn't do the trick). + */ +static inline void purge_inode_data(struct dentry *dentry) +{ + /* remove all non-private mappings */ + unmap_mapping_range(dentry->d_inode->i_mapping, 0, 0, 0); + + if (dentry->d_inode->i_data.nrpages) + truncate_inode_pages(&dentry->d_inode->i_data, 0); +} + +/* + * Revalidate a parent chain of dentries, then the actual node. + * Assumes that dentry is locked, but will lock all parents if/when needed. + * + * If 'willwrite' is 1, and the lower inode times are not in sync, then + * *don't* purge_inode_data, as it could deadlock if ->write calls us and we + * try to truncate a locked page. Besides, if unionfs is about to write + * data to a file, then there's the data unionfs is about to write is more + * authoritative than what's below, therefore we can safely overwrite the + * lower inode times and data. + */ +int __unionfs_d_revalidate_chain(struct dentry *dentry, struct nameidata *nd, + int willwrite) +{ + int valid = 0; /* default is invalid (0); valid is 1. */ + struct dentry **chain = NULL; /* chain of dentries to reval */ + int chain_len = 0; + struct dentry *dtmp; + int sbgen, dgen, i; + int saved_bstart, saved_bend, bindex; + + /* find length of chain needed to revalidate */ + /* XXX: should I grab some global (dcache?) lock? */ + chain_len = 0; + sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation); + dtmp = dentry->d_parent; + dgen = atomic_read(&UNIONFS_D(dtmp)->generation); + /* XXX: should we check if is_newer_lower all the way up? */ + if (is_newer_lower(dtmp)) { + /* + * Special case: the root dentry's generation number must + * always be valid, but its lower inode times don't have to + * be, so sync up the times only. + */ + if (IS_ROOT(dtmp)) + unionfs_copy_attr_times(dtmp->d_inode); + else { + /* + * reset generation number to zero, guaranteed to be + * "old" + */ + dgen = 0; + atomic_set(&UNIONFS_D(dtmp)->generation, dgen); + } + purge_inode_data(dtmp); + } + while (sbgen != dgen) { + /* The root entry should always be valid */ + BUG_ON(IS_ROOT(dtmp)); + chain_len++; + dtmp = dtmp->d_parent; + dgen = atomic_read(&UNIONFS_D(dtmp)->generation); + } + if (chain_len == 0) + goto out_this; /* shortcut if parents are OK */ + + /* + * Allocate array of dentries to reval. We could use linked lists, + * but the number of entries we need to alloc here is often small, + * and short lived, so locality will be better. + */ + chain = kzalloc(chain_len * sizeof(struct dentry *), GFP_KERNEL); + if (!chain) { + printk("unionfs: no more memory in %s\n", __FUNCTION__); + goto out; + } + + /* + * lock all dentries in chain, in child to parent order. + * if failed, then sleep for a little, then retry. + */ + dtmp = dentry->d_parent; + for (i=chain_len-1; i>=0; i--) { + chain[i] = dget(dtmp); + dtmp = dtmp->d_parent; + } + + /* + * call __unionfs_d_revalidate_one() on each dentry, but in parent + * to child order. + */ + for (i=0; id_sb)->generation); + dgen = atomic_read(&UNIONFS_D(chain[i])->generation); + + valid = __unionfs_d_revalidate_one(chain[i], nd); + /* XXX: is this the correct mntput condition?! */ + if (valid && chain_len > 0 && + sbgen != dgen && chain[i]->d_inode && + S_ISDIR(chain[i]->d_inode->i_mode)) { + for (bindex = saved_bstart; bindex <= saved_bend; + bindex++) + unionfs_mntput(chain[i], bindex); + } + unionfs_unlock_dentry(chain[i]); + + if (!valid) + goto out_free; + } + + +out_this: + /* finally, lock this dentry and revalidate it */ + verify_locked(dentry); + dgen = atomic_read(&UNIONFS_D(dentry)->generation); + if (is_newer_lower(dentry)) { + /* root dentry special case as aforementioned */ + if (IS_ROOT(dentry)) + unionfs_copy_attr_times(dentry->d_inode); + else { + /* + * reset generation number to zero, guaranteed to be + * "old" + */ + dgen = 0; + atomic_set(&UNIONFS_D(dentry)->generation, dgen); + } + if (!willwrite) + purge_inode_data(dentry); + } + valid = __unionfs_d_revalidate_one(dentry, nd); + + /* + * If __unionfs_d_revalidate_one() succeeded above, then it will + * have incremented the refcnt of the mnt's, but also the branch + * indices of the dentry will have been updated (to take into + * account any branch insertions/deletion. So the current + * dbstart/dbend match the current, and new, indices of the mnts + * which __unionfs_d_revalidate_one has incremented. Note: the "if" + * test below does not depend on whether chain_len was 0 or greater. + */ + if (valid && sbgen != dgen) + for (bindex = dbstart(dentry); + bindex <= dbend(dentry); + bindex++) + unionfs_mntput(dentry, bindex); + +out_free: + /* unlock/dput all dentries in chain and return status */ + if (chain_len > 0) { + for (i=0; id_sb); + + unionfs_lock_dentry(dentry); + err = __unionfs_d_revalidate_chain(dentry, nd, 0); + unionfs_unlock_dentry(dentry); + unionfs_check_dentry(dentry); + + unionfs_read_unlock(dentry->d_sb); + + return err; +} + +/* + * At this point no one can reference this dentry, so we don't have to be + * careful about concurrent access. + */ +static void unionfs_d_release(struct dentry *dentry) +{ + int bindex, bstart, bend; + + unionfs_read_lock(dentry->d_sb); + + unionfs_check_dentry(dentry); + /* this could be a negative dentry, so check first */ + if (!UNIONFS_D(dentry)) { + printk(KERN_DEBUG "unionfs: dentry without private data: %.*s", + dentry->d_name.len, dentry->d_name.name); + goto out; + } else if (dbstart(dentry) < 0) { + /* this is due to a failed lookup */ + printk(KERN_DEBUG "unionfs: dentry without lower " + "dentries: %.*s", + dentry->d_name.len, dentry->d_name.name); + goto out_free; + } + + /* Release all the lower dentries */ + bstart = dbstart(dentry); + bend = dbend(dentry); + for (bindex = bstart; bindex <= bend; bindex++) { + dput(unionfs_lower_dentry_idx(dentry, bindex)); + unionfs_set_lower_dentry_idx(dentry, bindex, NULL); + /* NULL lower mnt is ok if this is a negative dentry */ + if (!dentry->d_inode && !unionfs_lower_mnt_idx(dentry,bindex)) + continue; + unionfs_mntput(dentry, bindex); + unionfs_set_lower_mnt_idx(dentry, bindex, NULL); + } + /* free private data (unionfs_dentry_info) here */ + kfree(UNIONFS_D(dentry)->lower_paths); + UNIONFS_D(dentry)->lower_paths = NULL; + +out_free: + /* No need to unlock it, because it is disappeared. */ + free_dentry_private_data(dentry); + +out: + unionfs_read_unlock(dentry->d_sb); + return; +} + +struct dentry_operations unionfs_dops = { + .d_revalidate = unionfs_d_revalidate, + .d_release = unionfs_d_release, +}; diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c new file mode 100644 index 0000000..980f125 --- /dev/null +++ b/fs/unionfs/dirfops.c @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* Make sure our rdstate is playing by the rules. */ +static void verify_rdstate_offset(struct unionfs_dir_state *rdstate) +{ + BUG_ON(rdstate->offset >= DIREOF); + BUG_ON(rdstate->cookie >= MAXRDCOOKIE); +} + +struct unionfs_getdents_callback { + struct unionfs_dir_state *rdstate; + void *dirent; + int entries_written; + int filldir_called; + int filldir_error; + filldir_t filldir; + struct super_block *sb; +}; + +/* based on generic filldir in fs/readir.c */ +static int unionfs_filldir(void *dirent, const char *name, int namelen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct unionfs_getdents_callback *buf = dirent; + struct filldir_node *found = NULL; + int err = 0; + int is_wh_entry = 0; + + buf->filldir_called++; + + if ((namelen > UNIONFS_WHLEN) && + !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) { + name += UNIONFS_WHLEN; + namelen -= UNIONFS_WHLEN; + is_wh_entry = 1; + } + + found = find_filldir_node(buf->rdstate, name, namelen); + + if (found) + goto out; + + /* if 'name' isn't a whiteout, filldir it. */ + if (!is_wh_entry) { + off_t pos = rdstate2offset(buf->rdstate); + u64 unionfs_ino = ino; + + if (!err) { + err = buf->filldir(buf->dirent, name, namelen, pos, + unionfs_ino, d_type); + buf->rdstate->offset++; + verify_rdstate_offset(buf->rdstate); + } + } + /* + * If we did fill it, stuff it in our hash, otherwise return an + * error. + */ + if (err) { + buf->filldir_error = err; + goto out; + } + buf->entries_written++; + if ((err = add_filldir_node(buf->rdstate, name, namelen, + buf->rdstate->bindex, is_wh_entry))) + buf->filldir_error = err; + +out: + return err; +} + +static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + int err = 0; + struct file *lower_file = NULL; + struct inode *inode = NULL; + struct unionfs_getdents_callback buf; + struct unionfs_dir_state *uds; + int bend; + loff_t offset; + + unionfs_read_lock(file->f_path.dentry->d_sb); + + if ((err = unionfs_file_revalidate(file, 0))) + goto out; + + inode = file->f_path.dentry->d_inode; + + uds = UNIONFS_F(file)->rdstate; + if (!uds) { + if (file->f_pos == DIREOF) { + goto out; + } else if (file->f_pos > 0) { + uds = find_rdstate(inode, file->f_pos); + if (!uds) { + err = -ESTALE; + goto out; + } + UNIONFS_F(file)->rdstate = uds; + } else { + init_rdstate(file); + uds = UNIONFS_F(file)->rdstate; + } + } + bend = fbend(file); + + while (uds->bindex <= bend) { + lower_file = unionfs_lower_file_idx(file, uds->bindex); + if (!lower_file) { + uds->bindex++; + uds->dirpos = 0; + continue; + } + + /* prepare callback buffer */ + buf.filldir_called = 0; + buf.filldir_error = 0; + buf.entries_written = 0; + buf.dirent = dirent; + buf.filldir = filldir; + buf.rdstate = uds; + buf.sb = inode->i_sb; + + /* Read starting from where we last left off. */ + offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET); + if (offset < 0) { + err = offset; + goto out; + } + err = vfs_readdir(lower_file, unionfs_filldir, &buf); + + /* Save the position for when we continue. */ + offset = vfs_llseek(lower_file, 0, SEEK_CUR); + if (offset < 0) { + err = offset; + goto out; + } + uds->dirpos = offset; + + /* Copy the atime. */ + fsstack_copy_attr_atime(inode, lower_file->f_path.dentry->d_inode); + + if (err < 0) + goto out; + + if (buf.filldir_error) + break; + + if (!buf.entries_written) { + uds->bindex++; + uds->dirpos = 0; + } + } + + if (!buf.filldir_error && uds->bindex >= bend) { + /* Save the number of hash entries for next time. */ + UNIONFS_I(inode)->hashsize = uds->hashentries; + free_rdstate(uds); + UNIONFS_F(file)->rdstate = NULL; + file->f_pos = DIREOF; + } else + file->f_pos = rdstate2offset(uds); + +out: + unionfs_read_unlock(file->f_path.dentry->d_sb); + return err; +} + +/* + * This is not meant to be a generic repositioning function. If you do + * things that aren't supported, then we return EINVAL. + * + * What is allowed: + * (1) seeking to the same position that you are currently at + * This really has no effect, but returns where you are. + * (2) seeking to the beginning of the file + * This throws out all state, and lets you begin again. + */ +static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin) +{ + struct unionfs_dir_state *rdstate; + loff_t err; + + unionfs_read_lock(file->f_path.dentry->d_sb); + + if ((err = unionfs_file_revalidate(file, 0))) + goto out; + + rdstate = UNIONFS_F(file)->rdstate; + + /* + * we let users seek to their current position, but not anywhere + * else. + */ + if (!offset) { + switch (origin) { + case SEEK_SET: + if (rdstate) { + free_rdstate(rdstate); + UNIONFS_F(file)->rdstate = NULL; + } + init_rdstate(file); + err = 0; + break; + case SEEK_CUR: + err = file->f_pos; + break; + case SEEK_END: + /* Unsupported, because we would break everything. */ + err = -EINVAL; + break; + } + } else { + switch (origin) { + case SEEK_SET: + if (rdstate) { + if (offset == rdstate2offset(rdstate)) + err = offset; + else if (file->f_pos == DIREOF) + err = DIREOF; + else + err = -EINVAL; + } else { + rdstate = find_rdstate(file->f_path.dentry->d_inode, + offset); + if (rdstate) { + UNIONFS_F(file)->rdstate = rdstate; + err = rdstate->offset; + } else + err = -EINVAL; + } + break; + case SEEK_CUR: + case SEEK_END: + /* Unsupported, because we would break everything. */ + err = -EINVAL; + break; + } + } + +out: + unionfs_read_unlock(file->f_path.dentry->d_sb); + return err; +} + +/* + * Trimmed directory options, we shouldn't pass everything down since + * we don't want to operate on partial directories. + */ +struct file_operations unionfs_dir_fops = { + .llseek = unionfs_dir_llseek, + .read = generic_read_dir, + .readdir = unionfs_readdir, + .unlocked_ioctl = unionfs_ioctl, + .open = unionfs_open, + .release = unionfs_file_release, + .flush = unionfs_flush, + .fsync = unionfs_fsync, + .fasync = unionfs_fasync, +}; diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c new file mode 100644 index 0000000..a72f711 --- /dev/null +++ b/fs/unionfs/dirhelper.c @@ -0,0 +1,271 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +/* + * Delete all of the whiteouts in a given directory for rmdir. + * + * lower directory inode should be locked + */ +int do_delete_whiteouts(struct dentry *dentry, int bindex, + struct unionfs_dir_state *namelist) +{ + int err = 0; + struct dentry *lower_dir_dentry = NULL; + struct dentry *lower_dentry; + char *name = NULL, *p; + struct inode *lower_dir; + int i; + struct list_head *pos; + struct filldir_node *cursor; + + /* Find out lower parent dentry */ + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex); + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode)); + lower_dir = lower_dir_dentry->d_inode; + BUG_ON(!S_ISDIR(lower_dir->i_mode)); + + err = -ENOMEM; + name = __getname(); + if (!name) + goto out; + strcpy(name, UNIONFS_WHPFX); + p = name + UNIONFS_WHLEN; + + err = 0; + for (i = 0; !err && i < namelist->size; i++) { + list_for_each(pos, &namelist->list[i]) { + cursor = + list_entry(pos, struct filldir_node, + file_list); + /* Only operate on whiteouts in this branch. */ + if (cursor->bindex != bindex) + continue; + if (!cursor->whiteout) + continue; + + strcpy(p, cursor->name); + lower_dentry = + lookup_one_len(name, lower_dir_dentry, + cursor->namelen + + UNIONFS_WHLEN); + if (IS_ERR(lower_dentry)) { + err = PTR_ERR(lower_dentry); + break; + } + if (lower_dentry->d_inode) + err = vfs_unlink(lower_dir, lower_dentry); + dput(lower_dentry); + if (err) + break; + } + } + + __putname(name); + + /* After all of the removals, we should copy the attributes once. */ + fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode); + +out: + return err; +} + +/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */ +int delete_whiteouts(struct dentry *dentry, int bindex, + struct unionfs_dir_state *namelist) +{ + int err; + struct super_block *sb; + struct dentry *lower_dir_dentry; + struct inode *lower_dir; + struct sioq_args args; + + sb = dentry->d_sb; + + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode)); + BUG_ON(bindex < dbstart(dentry)); + BUG_ON(bindex > dbend(dentry)); + err = is_robranch_super(sb, bindex); + if (err) + goto out; + + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex); + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode)); + lower_dir = lower_dir_dentry->d_inode; + BUG_ON(!S_ISDIR(lower_dir->i_mode)); + + mutex_lock(&lower_dir->i_mutex); + if (!permission(lower_dir, MAY_WRITE | MAY_EXEC, NULL)) + err = do_delete_whiteouts(dentry, bindex, namelist); + else { + args.deletewh.namelist = namelist; + args.deletewh.dentry = dentry; + args.deletewh.bindex = bindex; + run_sioq(__delete_whiteouts, &args); + err = args.err; + } + mutex_unlock(&lower_dir->i_mutex); + +out: + return err; +} + +#define RD_NONE 0 +#define RD_CHECK_EMPTY 1 +/* The callback structure for check_empty. */ +struct unionfs_rdutil_callback { + int err; + int filldir_called; + struct unionfs_dir_state *rdstate; + int mode; +}; + +/* This filldir function makes sure only whiteouts exist within a directory. */ +static int readdir_util_callback(void *dirent, const char *name, int namelen, + loff_t offset, u64 ino, unsigned int d_type) +{ + int err = 0; + struct unionfs_rdutil_callback *buf = dirent; + int whiteout = 0; + struct filldir_node *found; + + buf->filldir_called = 1; + + if (name[0] == '.' && (namelen == 1 || + (name[1] == '.' && namelen == 2))) + goto out; + + if (namelen > UNIONFS_WHLEN && + !strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN)) { + namelen -= UNIONFS_WHLEN; + name += UNIONFS_WHLEN; + whiteout = 1; + } + + found = find_filldir_node(buf->rdstate, name, namelen); + /* If it was found in the table there was a previous whiteout. */ + if (found) + goto out; + + /* + * if it wasn't found and isn't a whiteout, the directory isn't + * empty. + */ + err = -ENOTEMPTY; + if ((buf->mode == RD_CHECK_EMPTY) && !whiteout) + goto out; + + err = add_filldir_node(buf->rdstate, name, namelen, + buf->rdstate->bindex, whiteout); + +out: + buf->err = err; + return err; +} + +/* Is a directory logically empty? */ +int check_empty(struct dentry *dentry, struct unionfs_dir_state **namelist) +{ + int err = 0; + struct dentry *lower_dentry = NULL; + struct super_block *sb; + struct file *lower_file; + struct unionfs_rdutil_callback *buf = NULL; + int bindex, bstart, bend, bopaque; + + sb = dentry->d_sb; + + + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode)); + + if ((err = unionfs_partial_lookup(dentry))) + goto out; + + bstart = dbstart(dentry); + bend = dbend(dentry); + bopaque = dbopaque(dentry); + if (0 <= bopaque && bopaque < bend) + bend = bopaque; + + buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL); + if (!buf) { + err = -ENOMEM; + goto out; + } + buf->err = 0; + buf->mode = RD_CHECK_EMPTY; + buf->rdstate = alloc_rdstate(dentry->d_inode, bstart); + if (!buf->rdstate) { + err = -ENOMEM; + goto out; + } + + /* Process the lower directories with rdutil_callback as a filldir. */ + for (bindex = bstart; bindex <= bend; bindex++) { + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + if (!lower_dentry) + continue; + if (!lower_dentry->d_inode) + continue; + if (!S_ISDIR(lower_dentry->d_inode->i_mode)) + continue; + + dget(lower_dentry); + unionfs_mntget(dentry, bindex); + branchget(sb, bindex); + lower_file = + dentry_open(lower_dentry, + unionfs_lower_mnt_idx(dentry, bindex), + O_RDONLY); + if (IS_ERR(lower_file)) { + err = PTR_ERR(lower_file); + dput(lower_dentry); + branchput(sb, bindex); + goto out; + } + + do { + buf->filldir_called = 0; + buf->rdstate->bindex = bindex; + err = vfs_readdir(lower_file, + readdir_util_callback, buf); + if (buf->err) + err = buf->err; + } while ((err >= 0) && buf->filldir_called); + + /* fput calls dput for lower_dentry */ + fput(lower_file); + branchput(sb, bindex); + + if (err < 0) + goto out; + } + +out: + if (buf) { + if (namelist && !err) + *namelist = buf->rdstate; + else if (buf->rdstate) + free_rdstate(buf->rdstate); + kfree(buf); + } + + + return err; +} diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h new file mode 100644 index 0000000..c5bf454 --- /dev/null +++ b/fs/unionfs/fanout.h @@ -0,0 +1,353 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef _FANOUT_H_ +#define _FANOUT_H_ + +/* + * Inode to private data + * + * Since we use containers and the struct inode is _inside_ the + * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL + * inode pointer), return a valid non-NULL pointer. + */ +static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode) +{ + return container_of(inode, struct unionfs_inode_info, vfs_inode); +} + +#define ibstart(ino) (UNIONFS_I(ino)->bstart) +#define ibend(ino) (UNIONFS_I(ino)->bend) + +/* Superblock to private data */ +#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info) +#define sbstart(sb) 0 +#define sbend(sb) (UNIONFS_SB(sb)->bend) +#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1) +#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id) + +/* File to private Data */ +#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data)) +#define fbstart(file) (UNIONFS_F(file)->bstart) +#define fbend(file) (UNIONFS_F(file)->bend) + +/* macros to manipulate branch IDs in stored in our superblock */ +static inline int branch_id(struct super_block *sb, int index) +{ + BUG_ON(!sb || index < 0); + return UNIONFS_SB(sb)->data[index].branch_id; +} + +static inline void set_branch_id(struct super_block *sb, int index, int val) +{ + BUG_ON(!sb || index < 0); + UNIONFS_SB(sb)->data[index].branch_id = val; +} + +static inline void new_branch_id(struct super_block *sb, int index) +{ + BUG_ON(!sb || index < 0); + set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id); +} + +/* + * Find new index of matching branch with an existing superblock of a known + * (possibly old) id. This is needed because branches could have been + * added/deleted causing the branches of any open files to shift. + * + * @sb: the new superblock which may have new/different branch IDs + * @id: the old/existing id we're looking for + * Returns index of newly found branch (0 or greater), -1 otherwise. + */ +static inline int branch_id_to_idx(struct super_block *sb, int id) +{ + int i; + for (i = 0; i < sbmax(sb); i++) { + if (branch_id(sb, i) == id) + return i; + } + /* in the non-ODF code, this should really never happen */ + printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id); + return -1; +} + +/* File to lower file. */ +static inline struct file *unionfs_lower_file(const struct file *f) +{ + BUG_ON(!f); + return UNIONFS_F(f)->lower_files[fbstart(f)]; +} + +static inline struct file *unionfs_lower_file_idx(const struct file *f, + int index) +{ + BUG_ON(!f || index < 0); + return UNIONFS_F(f)->lower_files[index]; +} + +static inline void unionfs_set_lower_file_idx(struct file *f, int index, + struct file *val) +{ + BUG_ON(!f || index < 0); + UNIONFS_F(f)->lower_files[index] = val; + /* save branch ID (may be redundant?) */ + UNIONFS_F(f)->saved_branch_ids[index] = + branch_id((f)->f_dentry->d_sb, index); +} + +static inline void unionfs_set_lower_file(struct file *f, struct file *val) +{ + BUG_ON(!f); + unionfs_set_lower_file_idx((f), fbstart(f), (val)); +} + +/* Inode to lower inode. */ +static inline struct inode *unionfs_lower_inode(const struct inode *i) +{ + BUG_ON(!i); + return UNIONFS_I(i)->lower_inodes[ibstart(i)]; +} + +static inline struct inode *unionfs_lower_inode_idx(const struct inode *i, + int index) +{ + BUG_ON(!i || index < 0); + return UNIONFS_I(i)->lower_inodes[index]; +} + +static inline void unionfs_set_lower_inode_idx(struct inode *i, int index, + struct inode *val) +{ + BUG_ON(!i || index < 0); + UNIONFS_I(i)->lower_inodes[index] = val; +} + +static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val) +{ + BUG_ON(!i); + UNIONFS_I(i)->lower_inodes[ibstart(i)] = val; +} + +/* Superblock to lower superblock. */ +static inline struct super_block *unionfs_lower_super( + const struct super_block *sb) +{ + BUG_ON(!sb); + return UNIONFS_SB(sb)->data[sbstart(sb)].sb; +} + +static inline struct super_block *unionfs_lower_super_idx( + const struct super_block *sb, + int index) +{ + BUG_ON(!sb || index < 0); + return UNIONFS_SB(sb)->data[index].sb; +} + +static inline void unionfs_set_lower_super_idx(struct super_block *sb, + int index, + struct super_block *val) +{ + BUG_ON(!sb || index < 0); + UNIONFS_SB(sb)->data[index].sb = val; +} + +static inline void unionfs_set_lower_super(struct super_block *sb, + struct super_block *val) +{ + BUG_ON(!sb); + UNIONFS_SB(sb)->data[sbstart(sb)].sb = val; +} + +/* Branch count macros. */ +static inline int branch_count(const struct super_block *sb, int index) +{ + BUG_ON(!sb || index < 0); + return atomic_read(&UNIONFS_SB(sb)->data[index].open_files); +} + +static inline void set_branch_count(struct super_block *sb, int index, int val) +{ + BUG_ON(!sb || index < 0); + atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val); +} + +static inline void branchget(struct super_block *sb, int index) +{ + BUG_ON(!sb || index < 0); + atomic_inc(&UNIONFS_SB(sb)->data[index].open_files); +} + +static inline void branchput(struct super_block *sb, int index) +{ + BUG_ON(!sb || index < 0); + atomic_dec(&UNIONFS_SB(sb)->data[index].open_files); +} + +/* Dentry macros */ +static inline struct unionfs_dentry_info *UNIONFS_D(const struct dentry *dent) +{ + BUG_ON(!dent); + return dent->d_fsdata; +} + +static inline int dbstart(const struct dentry *dent) +{ + BUG_ON(!dent); + return UNIONFS_D(dent)->bstart; +} + +static inline void set_dbstart(struct dentry *dent, int val) +{ + BUG_ON(!dent); + UNIONFS_D(dent)->bstart = val; +} + +static inline int dbend(const struct dentry *dent) +{ + BUG_ON(!dent); + return UNIONFS_D(dent)->bend; +} + +static inline void set_dbend(struct dentry *dent, int val) +{ + BUG_ON(!dent); + UNIONFS_D(dent)->bend = val; +} + +static inline int dbopaque(const struct dentry *dent) +{ + BUG_ON(!dent); + return UNIONFS_D(dent)->bopaque; +} + +static inline void set_dbopaque(struct dentry *dent, int val) +{ + BUG_ON(!dent); + UNIONFS_D(dent)->bopaque = val; +} + +static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index, + struct dentry *val) +{ + BUG_ON(!dent || index < 0); + UNIONFS_D(dent)->lower_paths[index].dentry = val; +} + +static inline struct dentry *unionfs_lower_dentry_idx( + const struct dentry *dent, + int index) +{ + BUG_ON(!dent || index < 0); + return UNIONFS_D(dent)->lower_paths[index].dentry; +} + +static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent) +{ + BUG_ON(!dent); + return unionfs_lower_dentry_idx(dent, dbstart(dent)); +} + +static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index, + struct vfsmount *mnt) +{ + BUG_ON(!dent || index < 0); + UNIONFS_D(dent)->lower_paths[index].mnt = mnt; +} + +static inline struct vfsmount *unionfs_lower_mnt_idx( + const struct dentry *dent, + int index) +{ + BUG_ON(!dent || index < 0); + return UNIONFS_D(dent)->lower_paths[index].mnt; +} + +static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent) +{ + BUG_ON(!dent); + return unionfs_lower_mnt_idx(dent, dbstart(dent)); +} + +/* Macros for locking a dentry. */ +static inline void unionfs_lock_dentry(struct dentry *d) +{ + BUG_ON(!d); + mutex_lock(&UNIONFS_D(d)->lock); +} + +static inline void unionfs_unlock_dentry(struct dentry *d) +{ + BUG_ON(!d); + mutex_unlock(&UNIONFS_D(d)->lock); +} + +static inline void verify_locked(struct dentry *d) +{ + BUG_ON(!d); + BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock)); +} + +/* copy a/m/ctime from the lower branch with the newest times */ +static inline void unionfs_copy_attr_times(struct inode *upper) +{ + int bindex; + struct inode *lower; + + if (!upper) + return; + for (bindex=ibstart(upper); bindex <= ibend(upper); bindex++) { + lower = unionfs_lower_inode_idx(upper, bindex); + if (!lower) + continue; /* not all lower dir objects may exist */ + if (timespec_compare(&upper->i_mtime, &lower->i_mtime) < 0) + upper->i_mtime = lower->i_mtime; + if (timespec_compare(&upper->i_ctime, &lower->i_ctime) < 0) + upper->i_ctime = lower->i_ctime; + if (timespec_compare(&upper->i_atime, &lower->i_atime) < 0) + upper->i_atime = lower->i_atime; + /* XXX: should we notify_change on our upper inode? */ + } +} + +/* + * A unionfs/fanout version of fsstack_copy_attr_all. Uses a + * unionfs_get_nlinks to properly calcluate the number of links to a file. + * Also, copies the max() of all a/m/ctimes for all lower inodes (which is + * important if the lower inode is a directory type) + */ +static inline void unionfs_copy_attr_all(struct inode *dest, + const struct inode *src) +{ + dest->i_mode = src->i_mode; + dest->i_uid = src->i_uid; + dest->i_gid = src->i_gid; + dest->i_rdev = src->i_rdev; + + unionfs_copy_attr_times(dest); + + dest->i_blkbits = src->i_blkbits; + dest->i_flags = src->i_flags; + + /* + * Update the nlinks AFTER updating the above fields, because the + * get_links callback may depend on them. + */ + dest->i_nlink = unionfs_get_nlinks(dest); +} + +#endif /* not _FANOUT_H */ diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c new file mode 100644 index 0000000..3f6b2d0 --- /dev/null +++ b/fs/unionfs/file.c @@ -0,0 +1,250 @@ +/* + * Copyright (c) 2003-2007 Erez Zadok + * Copyright (c) 2003-2006 Charles P. Wright + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek + * Copyright (c) 2005-2006 Junjiro Okajima + * Copyright (c) 2005 Arun M. Krishnakumar + * Copyright (c) 2004-2006 David P. Quigley + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair + * Copyright (c) 2003 Puja Gupta + * Copyright (c) 2003 Harikesavan Krishnan + * Copyright (c) 2003-2007 Stony Brook University + * Copyright (c) 2003-2007 The Research Foundation of SUNY + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include "union.h" + +static ssize_t unionfs_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int err; + + unionfs_read_lock(file->f_path.dentry->d_sb); + if ((err = unionfs_file_revalidate(file, 0))) + goto out; + unionfs_check_file(file); + + err = do_sync_read(file, buf, count, ppos); + + if (err >= 0) + touch_atime(unionfs_lower_mnt(file->f_path.dentry), + unionfs_lower_dentry(file->f_path.dentry)); + +out: + unionfs_read_unlock(file->f_path.dentry->d_sb); + unionfs_check_file(file); + return err; +} + +sta