CTDB on cluster filesystem without file locking

lazycy yi.chen1982 at gmail.com
Thu Aug 7 12:16:08 GMT 2008


Hi, ronnie
   Your suggestion was helpful! I downloaded samba 3.2.1 and installed it.
It works well with CTDB. I ran some simple tests of concurrent writes on my
cluster filesystem via samba, and the results are interesting:
   Test environment: a cluster of 2 nodes, with CTDB and the samba service
running on each node. I also have 2 Windows machines as clients (client A
and client B).
1) Copy a Word file to my cluster filesystem. First, open it via client A.
2) Open the same file via client B. Given my cluster filesystem's behavior,
the second open keeps asking the cluster filesystem for write permission,
which takes time. I found that samba does wait for the write permission
query to return, but then opens the file read-only.
   Since I don't have another cluster filesystem at hand that implements
standard file locking, I compared the result with the case where the 2
clients connect to the same ext3 directory on one of the nodes. I opened the
Word file on client A; when I then tried to open it via client B, Word
complained that the file was locked, which I think is the right behavior for
the file lock mechanism on ext3.
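
   For comparing filesystems, a small probe along these lines could show
whether fcntl locks are enforced between two processes. It is only a
minimal sketch in Python: the function name is made up, and it checks two
processes on one node, so it says nothing about coherence across nodes
unless the contender is run on a second node by hand.

```python
import fcntl
import os


def second_process_can_lock(path):
    """Hold an exclusive fcntl lock on `path`, then fork a child that tries
    to take the same lock without blocking.

    Returns True if the child ALSO got the lock (locking is not enforced),
    False if the child was refused (the expected behaviour).
    """
    with open(path, "w") as holder:
        fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks belong to a process and are not inherited
            # across fork(), so the child is a genuine second contender.
            status = 0
            try:
                with open(path, "r+") as contender:
                    fcntl.lockf(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                status = 1  # lock refused by the filesystem
            os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 0
```

Calling it on a file in an ext3 directory and again on a file in the
cluster filesystem would give a direct comparison; a True result means two
processes held the "exclusive" lock at the same time.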

   Next, I plan to use public IPs and rr-dns to implement fail-over and
load balancing. I'm a little confused about the public IP addresses. Each of
my nodes has 2 network interfaces: eth0 (e.g. 192.168.11.11...) for cluster
internal traffic and eth1 (e.g. 10.32.108.67...) for public use. My public IP
address file looks like this:
10.32.108.67/24 eth1
10.32.108.68/24 eth1 
  Then, when I restarted CTDB on 10.32.108.67, I found that the network on
10.32.108.68 was no longer available. Here is the CTDB log from 10.32.108.67:
  
2008/08/07 19:55:40.665395 [ 8557]: server/eventscript.c:365 eventscript
releaseip eth1 10.32.108.67 24 called with no timeout
2008/08/07 19:55:40.665975 [ 8559]: Refusing to run event scripts with
option 'releaseip eth1 10.32.108.67 24' while in recovery
2008/08/07 19:55:45.674853 [ 8560]: We are still serving a public address
'10.32.108.67' that we should not be serving.
2008/08/07 19:55:56.799872 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
2008/08/07 19:56:07.846786 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
2008/08/07 19:56:18.989257 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
2008/08/07 19:56:30.110881 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
2008/08/07 19:56:30.112411 [ 8557]: server/eventscript.c:365 eventscript
releaseip eth1 10.32.108.67 24 called with no timeout
2008/08/07 19:56:30.112881 [ 8557]: server/eventscript.c:365 eventscript
releaseip eth1 10.32.108.67 24 called with no timeout
2008/08/07 19:56:30.113414 [10679]: Refusing to run event scripts with
option 'releaseip eth1 10.32.108.67 24' while in recovery
2008/08/07 19:56:30.113698 [10678]: Refusing to run event scripts with
option 'releaseip eth1 10.32.108.67 24' while in recovery
2008/08/07 19:56:42.196079 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
2008/08/07 19:56:42.197549 [ 8557]: server/eventscript.c:365 eventscript
releaseip eth1 10.32.108.67 24 called with no timeout
2008/08/07 19:56:42.198111 [ 8557]: server/eventscript.c:365 eventscript
releaseip eth1 10.32.108.67 24 called with no timeout
2008/08/07 19:56:42.198615 [11175]: Refusing to run event scripts with
option 'releaseip eth1 10.32.108.67 24' while in recovery
2008/08/07 19:56:42.198882 [11174]: Refusing to run event scripts with
option 'releaseip eth1 10.32.108.67 24' while in recovery
2008/08/07 19:56:54.402273 [ 8560]: Public address '10.32.108.68' is missing
and we should serve this ip
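
Because the log lines above are wrapped, the repetition is easier to see
once each record is joined back together and counted. A rough sketch in
Python follows; the function name and the timestamp pattern are assumptions
based only on the lines shown here, not on any documented CTDB log format.

```python
import re
from collections import Counter

# Timestamp and "[pid]:" prefix as they appear in the excerpt above.
STAMP = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+ \[\s*\d+\]: ")


def tally(log_text):
    """Join wrapped lines into records and count each distinct message."""
    records = []
    for line in log_text.splitlines():
        if STAMP.match(line):
            # New record: drop the timestamp/pid prefix, keep the message.
            records.append(STAMP.sub("", line).strip())
        elif records and line.strip():
            # Continuation of the previous (wrapped) record.
            records[-1] += " " + line.strip()
    return Counter(records)
```

Passing the log text to tally() and calling most_common() on the result
lists the messages that recur most often.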

It seems I should not set the public IP addresses to my nodes' own IP
addresses on eth1. How should I choose the public IP addresses, then?
Please give me some suggestions~
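
To make the suspicion concrete, here is a minimal sketch in Python that
parses a public address file in the format shown earlier and lists the
entries that coincide with a node's fixed address. The function names are
made up for illustration, and taking 10.32.108.67 and 10.32.108.68 as the
nodes' fixed eth1 addresses is an assumption drawn from the setup described
above; the sketch does not model what CTDB itself does with the file.

```python
import ipaddress


def parse_public_addresses(text):
    """Parse lines of the form '<ip>/<prefix> <interface>'."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        addr, iface = line.split()
        entries.append((ipaddress.ip_interface(addr), iface))
    return entries


def overlapping(entries, fixed_ips):
    """Return the public IPs that are also a node's fixed address."""
    fixed = {ipaddress.ip_address(ip) for ip in fixed_ips}
    return [str(entry.ip) for entry, _ in entries if entry.ip in fixed]


PUBLIC_ADDRESSES = """\
10.32.108.67/24 eth1
10.32.108.68/24 eth1
"""

# Assumed fixed eth1 addresses of the two nodes (hypothetical input).
NODE_FIXED_IPS = ["10.32.108.67", "10.32.108.68"]

clashes = overlapping(parse_public_addresses(PUBLIC_ADDRESSES), NODE_FIXED_IPS)
```

With the file above, both entries end up in clashes, which matches the
situation where a releaseip for 10.32.108.67 would take away an address
the node also needs for itself.
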
Thanks,
Ethan 



ronnie sahlberg wrote:
> 
> Please update and try the latest git version of samba 3.2.
> There have been recent changes to how samba writes to a
> persistent database.
> 
> Your entry :  tdb(/var/ctdb/persistent/secrets.tdb.0):
> tdb_transaction_start: cannot start
> indicates that it is the samba client dbwrapper code that tries to
> perform a tdb transaction,
> something which it does not do in the most recent code since there
> were conflicts between transactions
> and tdb locking.
> 
> 
> Please test carefully and report any issues.
> If your filesystem does not provide coherent byte range locking I do
> not think you will get
> correct lock interaction between the CIFS and the NFS world, unless
> you also provide your own NLM replacement daemon.
> 
> CTDB/Samba has never been tested with, and has not been designed for
> use with, "lockless" cluster filesystems, so there are probably a lot of
> things that can (and probably will) break in subtle and
> hard-to-diagnose ways.
> 
> Good luck, and keep us informed.
> 
> 
> regards
> ronnie sahlberg
> 
> 
> On Wed, Aug 6, 2008 at 12:36 PM, lazycy <yi.chen1982 at gmail.com> wrote:
>>
>> Hi, Volker tdb(/var/ctdb/persistent/secrets.tdb.0):
>> tdb_transaction_start: cannot start
>>   First, thanks for your guidance. I found that if I put the recovery
>> lock file on my cluster filesystem, CTDB reports the error "recovery lock
>> file not locked when recovering" in RECOVERY, which is caused by an fcntl
>> failure. So I put this file in an NFS share directory as a work-around,
>> and this time CTDB seems to start correctly. But when I try to start
>> samba (with the CTDB patch), it dumps. I've posted the contents of the
>> log file and my configuration at
>> http://www.nabble.com/Samba-with-CTDB-td18827915.html
>>
>> Also, if I try to add a user, I get an error like the one below:
>> [root at COS1-001 ~]# /usr/local/samba/bin/smbpasswd -a cy
>> tdb(/var/ctdb/persistent/secrets.tdb.0): tdb_transaction_start: cannot
>> start
>> a t
>> tdb(/var/ctdb/persistent/secrets.tdb.0): tdb_transaction_start: cannot
>> start
>> a t
>> pdb_generate_sam_sid: Failed to store generated machine SID.
>> PANIC (pid 11201): Could not generate a machine SID
>>
>> BACKTRACE: 7 stack frames:
>>  #0 /usr/local/samba/bin/smbpasswd(log_stack_trace+0x1a) [0x4f3482]
>>  #1 /usr/local/samba/bin/smbpasswd(smb_panic+0x69) [0x4f331c]
>>  #2 /usr/local/samba/bin/smbpasswd(get_global_sam_sid+0x34) [0x46f5f3]
>>  #3 /usr/local/samba/bin/smbpasswd [0x45edab]
>>  #4 /usr/local/samba/bin/smbpasswd(main+0xcc) [0x45f517]
>>  #5 /lib64/tls/libc.so.6(__libc_start_main+0xda) [0x2ab960b65a7a]
>>  #6 /usr/local/samba/bin/smbpasswd [0x45e4ba]
>> Aborted
>>
>> This seems to be the same problem, caused by the CTDB transaction rather
>> than by samba itself. Can you give me more suggestions about anything I
>> may have missed in my CTDB configuration?
>>
>> Thanks,
>> Ethan
>>
>>
>> Volker Lendecke wrote:
>>>
>>> On Tue, Aug 05, 2008 at 06:55:43AM -0700, lazycy wrote:
>>>>
>>>> Hi, all
>>>>    I'm trying to implement clustered samba on my cluster filesystem. My
>>>> solution is CTDB+samba. I noticed that on the CTDB homepage there is
>>>> one sentence in the prerequisites part:
>>>>    We have primarily used the GPFS filesystem for our testing but any
>>>> cluster filesystem should work as long as it provides correct file
>>>> locking.
>>>>    My concern is that my own cluster filesystem has not implemented
>>>> standard file locking (fcntl) yet. (However, we use another solution to
>>>> get valid behavior when concurrent write operations happen.) So, is it
>>>> still possible for me to use CTDB? I'm a newbie to CTDB. I'd very much
>>>> appreciate any suggestions you can give me!
>>>
>>> ctdb uses fcntl locking in the shared file system to make
>>> sure only one ctdb master process is around during recovery
>>> and config changes. If more than one process gets this lock,
>>> it won't work. For normal operation we don't depend on
>>> shared fcntl locks though, many cluster file systems are
>>> MUCH too slow with this.
>>>
>>> Volker
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/CTDB-on-cluster-filesystem-without-file-locking-tp18831651p18843323.html
>> Sent from the Samba - samba-technical mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/CTDB-on-cluster-filesystem-without-file-locking-tp18831651p18869077.html
Sent from the Samba - samba-technical mailing list archive at Nabble.com.


