rsync lib

John E. Malmberg wb8tyw at qsl.net
Mon Jul 11 03:31:14 GMT 2005


Olivier Thauvin wrote:
> On Tuesday 5 July 2005 17:48, John E. Malmberg wrote:
> 
>>Olivier Thauvin wrote:
>>
>>>Currently there is no rsync library for the rsync network functions. The
>>>librsync project provides functions for the file access/md4 part.
>>>
>>>That is exactly the reason why I just started a rewrite of rsync using a
>>>struct, in order to create a real library.
>>> 
> It is a cvs co made recently, true.
> 
> Rsync code uses many global and static variables, which is not usable for a 
> library. So to make a lib, I took rsync, created a struct and its typedef 
> (see rspeer.h and rspeer.c), and I am removing all global/static variables 
> from the code to put them inside the rspeer struct.

There are about 300 to 400 global or static variables in rsync.

Of these, about 75% appear to be set at startup and only read by the 
other processes that are created.

Another group of them are only used by a single process ever.

> The current code doesn't work and doesn't compile at this time. Note that I 
> started this one or two weeks ago and I still have to modify all functions to 
> pass the rspeer struct as an argument:
> 
> -int allow_access(char *addr, char *host, char *allow_list, char *deny_list)
> +int allow_access(rspeer rsp, char *addr, char *host, char *allow_list, char *deny_list)

Since I was not looking at using this in a reentrant library, I took the 
approach that only an integer thread index was needed to find the correct 
copy of a variable.  Only the variables that are used by more than one 
process after the additional processes are forked needed to be changed.

I probably do not have all of the variables classified correctly yet. 
It would help if their names were tagged by use, or if each had a 
comment describing it.

I am told from this list that only three processes are active, and in 
getting just the client to work, I am seeing only two.  So I only 
need an array of three structures, which I set up as a static.  I 
still have to find out what the third process is used for and how to do 
the same on OpenVMS without a fork() routine.

A routine could get the thread index either by having it passed as a 
parameter, or by making a POSIX thread call to find it out.
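
As a rough sketch of the second option, POSIX thread-specific data is one
way such a lookup could be done.  The message does not say which POSIX call
is meant, so pthread_getspecific() is an assumption here, and the names
set_thread_index() and get_thread_index() are hypothetical and do not
appear in the rsync source.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical names; nothing here is taken from the rsync source. */
static pthread_key_t thread_index_key;
static pthread_once_t thread_index_once = PTHREAD_ONCE_INIT;

static void make_thread_index_key(void)
{
    /* No destructor: the stored value is a small integer, not allocated memory. */
    (void)pthread_key_create(&thread_index_key, NULL);
}

/* Called once by each thread as it starts, with the index assigned to it.
 * Returns 0 on success, or an error number from pthread_setspecific(). */
int set_thread_index(int idx)
{
    pthread_once(&thread_index_once, make_thread_index_key);
    /* Store idx + 1 so that a NULL value can mean "no index assigned". */
    return pthread_setspecific(thread_index_key, (void *)(intptr_t)(idx + 1));
}

/* Called by routines that are not passed the index as a parameter.
 * Returns -1 if the calling thread never set an index. */
int get_thread_index(void)
{
    pthread_once(&thread_index_once, make_thread_index_key);
    return (int)(intptr_t)pthread_getspecific(thread_index_key) - 1;
}
```

A routine such as write_int() would then call get_thread_index() on entry
instead of growing a new parameter, at the cost of one lookup per call.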

A library could also use a process id as an index for storage maintained 
internally, with some care for garbage collection.  On UNIX it appears 
that each image is run with its own process id.  On OpenVMS that is not 
the case, so a different method is needed to detect that the calling 
image exited without cleaning up.

Having many of the routines like write_int() look up the thread index 
instead of getting it passed significantly reduces the amount of 
source code that needs to be changed and the changes that need to be 
tracked.

In many cases, only the top of a file needs to be changed.

As per your example, access.c, which contains allow_access(), is one of 
the modules that I did not need to change at all, since it never 
references any of the global variables directly and has no local static 
variables.


Compiler macros are also used to minimize the code changes.

For example:

int am_sender; is a member of a structure of global thread-specific 
variables.

When I compile for POSIX threads, a macro gets defined:

#define am_sender main_global[thread].am_sender

So none of the references in the code to am_sender need to be changed.

These macros and structures reside in one module, thread_global.h.
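
To make the idea concrete, here is a minimal single-file sketch of how such
a header could be laid out.  Only am_sender, main_global, thread and
USE_PTHREADS come from the text above; the struct name, MAX_THREADS and the
two helper routines are assumptions made for illustration.

```c
#include <assert.h>

#define USE_PTHREADS 1
#define MAX_THREADS 3          /* assumed: one slot per rsync process role */

/* The structure must be declared before the macro below is defined,
 * otherwise the member name itself would be rewritten by the macro. */
struct thread_globals {
    int am_sender;
};

#ifdef USE_PTHREADS
static struct thread_globals main_global[MAX_THREADS];
/* Every plain use of am_sender now selects the calling thread's copy.
 * An int named 'thread' must be in scope wherever the macro expands. */
#define am_sender main_global[thread].am_sender
#else
static int am_sender;          /* original single-process global */
#endif

/* Hypothetical routines standing in for existing rsync code: their
 * bodies read exactly as they would without USE_PTHREADS. */
void become_sender(int thread)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    am_sender = 1;
}

int is_sender(int thread)
{
    assert(thread >= 0 && thread < MAX_THREADS);
    return am_sender;
}
```

One consequence of this layout is that, once the macro is defined, code can
no longer write main_global[i].am_sender explicitly, because the member
name would be expanded a second time; access has to go through the plain
name with a suitable thread variable in scope.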

> 
>>I have an interest in such a project as the normal user interface for
>>OpenVMS is a bit different than on UNIX.
>>
>>In order for me to use such a library, all routines must be thread safe,
>>and allow a single process to do the work.
>  
> I currently do not plan to change the rsync code beyond what is needed to 
> make it work as a library, but I am in the first step of the project, and 
> open to all improvements/suggestions.
> 
> I am open to any help too.

See http://encompasserve.org/rsync_pthread_pre.zip.

This is a GNU unified diff against a snapshot of the rsync source taken 
today, plus some additional files.

The *.gdiff files are difference files; the *_xxx.new files need to be 
renamed *.xxx.  The resulting files matching *_vms_*.*, *.com, or *.mms 
are only for OpenVMS use.

Some routines now take an integer thread parameter, others look it up.

I also made some changes as ANSI C will not allow unsigned char and char 
types to be mixed without a cast, and fixed some other things that VMS 
will need.
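
For illustration, the usual place this shows up is with pointers: char *
and unsigned char * are distinct, incompatible pointer types, so a strict
compiler diagnoses passing one where the other is expected.  The helper
below is hypothetical and is not taken from the rsync diff.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of a NUL-terminated byte
 * buffer that the caller holds as unsigned char. */
size_t byte_string_length(const unsigned char *buf)
{
    /* strlen() is declared to take const char *, so the pointer is
     * converted explicitly instead of relying on an implicit conversion
     * that a strict ANSI C compiler rejects or warns about. */
    return strlen((const char *)buf);
}
```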

The resulting code with the macro USE_PTHREADS defined currently 
compiles and links on my OpenVMS 8.2 system.

It will probably not run with USE_PTHREADS because I still have to write 
the routine that looks up the thread index.  This is a thread index 
number that I will assign to a thread when it starts, as I cannot 
predict what the actual thread number would be.  I also have to add code 
to set the stack size for each thread.  By default on OpenVMS Alpha, 
only an 8K stack per thread is allocated, and that is not enough for rsync.
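
Setting the stack size is normally done through a thread attributes object
before the thread is created.  The sketch below uses the standard POSIX
call pthread_attr_setstacksize(); the wrapper name, the example entry
routine and the 1 MB figure are assumptions for illustration, not measured
requirements for rsync.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed value for illustration only. */
#define EXAMPLE_THREAD_STACK_SIZE ((size_t)1024 * 1024)

/* Hypothetical wrapper: create a thread with an explicit stack size.
 * Returns 0 on success, or the error number from the failing call. */
int start_thread_with_stack(pthread_t *tid, void *(*entry)(void *),
                            void *arg, size_t stack_size)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_create(tid, &attr, entry, arg);
    /* The attributes object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return rc;
}

/* Trivial entry routine used only to exercise the wrapper. */
void *echo_arg(void *arg)
{
    return arg;
}
```

The same entry point would also be a natural place to assign the thread
index to the new thread as it starts.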

Without that macro defined, it should build on a UNIX/Linux system and 
produce the same binary as the snapshot it was taken from would.

Currently all my changes to existing routines are done by writing VMS 
specific editor macros.  The rsync.mms is a VMS specific type of 
makefile that has been set up to use them.  I have not included the 
*.tpu files, as I mainly wanted to make these difference files available 
for your inspection.

I will probably remove the files from that server in a few weeks because 
of quota limitations on that volume, and I do not know how long it will 
be before they are out of date.

My current broadband ISP prohibits me setting up my own public server, 
and they have no competition.

-John
wb8tyw at qsl.net
Personal Opinion Only


