[linux-cifs-client] CIFS Umount apparently causes loss of data

Tue Jun 24 01:13:57 GMT 2008

On Mon, 23 Jun 2008 12:30:02 +0100
"Richard Walters" <richard.walters at tdbnetworks.com> wrote:

> Dear linux-cifs-client list,
> 
> As root user, I am mounting a cifs mount on a RHEL4 server, where the
> filestore is located on a Windows 2000 server on the same network
> segment.  There are no firewalls in between the two physical machines.
> CIFS version is 1.48a.RH.
> 
> I am using the following to mount the share, create a 150 MB sparsefile,
> associate the sparsefile with a loop device, create an ext3 filesystem
> on the sparsefile, and then mount the subsequently created filesystem.
> 
>    mount -t cifs --verbose \
>      -o forcedirectio,username=<DOMAIN>\<USERNAME>,password=<PASSWORD> \
>      //<WINDOWS MACHINE>/<MOUNT LOCATION> /mnt/mountpoint
>     
>    dd if=/dev/zero of="/mnt/mountpoint/SPARSEFILE" bs=1M count=1 seek="150"
>     
>    losetup /dev/loop20 /mnt/mountpoint/SPARSEFILE  
>     
>    mkfs -t ext3 /dev/loop20 
>     
>    mount -t ext3 /dev/loop20 /mnt/backup  
>  
> All of the above occurs without error. 
>  
> I then write files and directory structures to /mnt/backup and confirm
> that they are there with all the correct permissions etc etc.  I can
> manipulate the data in /mnt/backup - change permissions etc etc.
>  
> Then I issue: 
>  
>    umount /mnt/backup
>    losetup -d /dev/loop20
>    umount /mnt/mountpoint 
>  
> At this point I expect the files and directory structures written to
> /mnt/backup to have been populated to the file system created in the
> sparsefile, which is no longer mounted.
> 
> However, if I remount using the following: 
>  
>    mount -t cifs --verbose \
>      -o username=<DOMAIN>\<USERNAME>,password=<PASSWORD> \
>      //<WINDOWS MACHINE>/<MOUNT LOCATION> /mnt/mountpoint
>     
>    losetup /dev/loop20 /mnt/mountpoint/SPARSEFILE  
>     
>    mount -t ext3 /dev/loop20 /mnt/backup  
>  
> I find that the data on /mnt/backup is not as I expect.  Generally,
> files in the root (/mnt/backup) are as expected, but any directory
> structures have disappeared.  Changed permissions on files in the root
> remain changed.  Deleted files in the root partition have reappeared.
> I am even getting inconsistent results on the above - depending on how
> quickly I unmount, and on how many files/directories I copy to
> /mnt/backup.  
> 
> It appears from investigation that the files remain until the umount of
> the cifs filesystem. 
> 
> For example, if I just umount the loop device (and complete the losetup
> -d), and then complete 
> 
>    losetup /dev/loop20 /mnt/mountpoint/SPARSEFILE  
>     
>    mount -t ext3 /dev/loop20 /mnt/backup  
> 
> Everything is just as I would expect - there are no surprises at all.
>  
> Initially I suspected cifs data caching, so I employed the directio
> option on the cifs mount command - there has been no discernable
> difference.  dmesg shows that this option is NOT rejected, but looking
> at the /proc/mounts the option does not show up, although this could be
> a red herring: 
>  
>    //WINDOWS MACHINE/MOUNT LOCATION /mnt/mountpoint cifs
>    rw,mand,noatime,nodiratime,unc=\\WINDOWS MACHINE\MOUNT
>    LOCATION,username=<username>,domain=<windows
>    domain>,rsize=16384,wsize=57344 0 0 
> 
> It really does appear that the data loss occurs on the cifs umount - I
> suspected inode caching, but the directio option should have ensured
> that this was OK.
> 
> Out of interest, I have tried this on RHEL5 to a different Windows file
> server, but end up with a similar result.  I have also tried using ext2
> rather than ext3 filesystems - there is no change to the overall
> behaviour.
> 
> I have enabled debug logging on cifs, but dmesg does not provide any
> particular clue as to an underlying cause - every routine returns rc = 0.
> 
> Has anyone come across something similar, or can point me in the right
> direction to resolve this?
> 

A bit of a strange use-case, but in principle, it should work. This
particular problem isn't ringing any bells for me though...

1.48aRH is pretty old by now. My first suggestion would be to test a
RHEL4.7-ish or 5.2-ish kernel. Those have updated CIFS code. Even
better might be to test the kernels on my people page:

    http://people.redhat.com/jlayton

...they have some patches I'm considering for later updates that are
not yet in the current RHEL releases. If the problem isn't resolved in
there, then I'd suggest opening a support case and having the RH
support people escalate this up to a BZ and we can start working on it
there.

It sounds like there's a nice, well-defined reproducer, so we should be
able to figure out the problem. I'll be on vacation until next week
though, so I'll plan to have a closer look at this after then...
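
For anyone wanting to rerun this, the steps from the report could be
collected into one script roughly as follows. This is only a sketch and
has not been run against a real share: SHARE, OPTS and the DRY_RUN
switch are placeholders assumed for illustration, while the commands,
device and paths are the ones from the report. Note that the report's
remount left out forcedirectio, whereas this sketch reuses the same
OPTS for both mounts.

```shell
#!/bin/sh
# Reproducer sketch based on the steps in the report.
# SHARE, OPTS and DRY_RUN are assumptions added for illustration.

SHARE="${SHARE:-//SERVER/SHARE}"   # placeholder UNC path
OPTS="${OPTS:-forcedirectio}"      # add username=...,password=... as needed
IMG=/mnt/mountpoint/SPARSEFILE     # backing file on the CIFS mount
LOOP=/dev/loop20                   # loop device used in the report

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the share, create the sparse file, and put ext3 on it.
setup() {
    run mount -t cifs -o "$OPTS" "$SHARE" /mnt/mountpoint
    run dd if=/dev/zero of="$IMG" bs=1M count=1 seek=150
    run losetup "$LOOP" "$IMG"
    run mkfs -t ext3 "$LOOP"
    run mount -t ext3 "$LOOP" /mnt/backup
}

# Tear everything down in the order given in the report.
teardown() {
    run umount /mnt/backup
    run losetup -d "$LOOP"
    run umount /mnt/mountpoint
}

# Remount the share and reattach the existing image (no mkfs).
# The report's remount omitted forcedirectio; OPTS is reused here.
reattach() {
    run mount -t cifs -o "$OPTS" "$SHARE" /mnt/mountpoint
    run losetup "$LOOP" "$IMG"
    run mount -t ext3 "$LOOP" /mnt/backup
}
```

Sourcing this with DRY_RUN=1 and calling setup, teardown and reattach
only prints the commands. Without DRY_RUN it needs root, and files
would be copied into /mnt/backup between setup and teardown.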

Cheers,
-- 
Jeff Layton <jlayton at redhat.com>