[PATCH] Reduce memory usage

Johannes Altmanninger aclopte at gmail.com
Mon Sep 27 20:03:01 UTC 2021


> On Sun, Sep 26, 2021 at 01:54:13PM +0200, Jindřich Makovička via rsync wrote:
> 
> Applying the attached patch, which reduces the default allocation to 32
> pointers, and preallocates 32K pointers only for the main file lists in
> send_file_list and recv_file_list, reduces the peak memory usage in my case
> from 142MB to 12MB.

The patch looks very reasonable from what I can tell.
Out of curiosity, how did you measure peak memory?
I used "/bin/time -v rsync", and already get (very small) improvements in
maximum RSS when copying the rsync source tree :)

On Mon, Sep 27, 2021 at 04:42:25PM +0200, Jindřich Makovička via rsync wrote:

> Reduce memory usage
>  
> Start only with 32 entries for the partial file lists, instead of 32k.
>

The log message could be a bit more detailed. You already mentioned that
the big 32k allocation predates the recursive version, that's very useful
information to add here.
Maybe even explain why we change the first check from "flist->malloced !=
FLIST_START" to "flist->files".
(Also I'd use "git send-email" to send patches inline but I'm not sure what's
the convention here)

> diff --git a/flist.c b/flist.c
> index 3442d868..0f7a64e6 100644
> --- a/flist.c
> +++ b/flist.c
> @@ -303,11 +303,11 @@ static void flist_expand(struct file_list *flist, int extra)
>  	if (flist->malloced < flist->used + extra)
>  		flist->malloced = flist->used + extra;
>  
>  	new_ptr = realloc_array(flist->files, struct file_struct *, flist->malloced);
>  
> -	if (DEBUG_GTE(FLIST, 1) && flist->malloced != FLIST_START) {
> +	if (DEBUG_GTE(FLIST, 1) && flist->files) {

Yep, the new check makes more sense because now it's more obvious that the
debug message is only printed when flist->files is realloc'd, and not
when it's allocated for the first time.

>  		rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",
>  		    who_am_i(),
>  		    big_num(sizeof flist->files[0] * flist->malloced),
>  		    (new_ptr == flist->files) ? " not" : "");
>  	}
> @@ -2184,10 +2184,11 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
>  	if (preserve_hard_links && protocol_version >= 30 && !cur_flist)
>  		init_hard_links();
>  #endif
>  
>  	flist = cur_flist = flist_new(0, "send_file_list");
> +	flist_expand(flist, FLIST_START_LARGE);
>  	if (inc_recurse) {
>  		dir_flist = flist_new(FLIST_TEMP, "send_file_list");

This probably wants an flist_expand(dir_flist, FLIST_START_LARGE), because
dir_flist is a global.  I think the idea is that all flist_new() inside
loops/recursive calls should start small, but lists that are only ever
allocated once should stay at 32k.

>  		flags |= FLAG_DIVERT_DIRS;
>  	} else
>  		dir_flist = cur_flist;
> @@ -2539,10 +2540,11 @@ struct file_list *recv_file_list(int f, int dir_ndx)
>  	if (preserve_hard_links && !first_flist)
>  		init_hard_links();
>  #endif
>  
>  	flist = flist_new(0, "recv_file_list");
> +	flist_expand(flist, FLIST_START_LARGE);
>  
>  	if (inc_recurse) {
>  		if (flist->ndx_start == 1)
>  			dir_flist = flist_new(FLIST_TEMP, "recv_file_list");

Same here I guess. Maybe we should add an "initial size" parameter to
flist_new(), so it can call flist_expand() automatically?

>  		dstart = dir_flist->used;
> diff --git a/rsync.h b/rsync.h
> index 88319732..17f8700e 100644
> --- a/rsync.h
> +++ b/rsync.h
> @@ -916,12 +916,13 @@ extern int xattrs_ndx;
>  
>  /*
>   * Start the flist array at FLIST_START entries and grow it
>   * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
>   */
> -#define FLIST_START	(32 * 1024)
> -#define FLIST_LINEAR	(FLIST_START * 512)
> +#define FLIST_START	(32)
> +#define FLIST_START_LARGE	(32 * 1024)
> +#define FLIST_LINEAR	(FLIST_START_LARGE * 512)

Probably these should remain aligned (I'm assuming tab has width 8)



More information about the rsync mailing list