Memory/CPU usage

Frank J. Pellegrino pellegri at mailhost.sju.edu
Wed Oct 17 12:08:04 GMT 2001


I have upgraded to Samba 2.2.2.  Our memory problem seems to be
corrected, but we still have CPU usage issues.  Samba quickly starts
eating away at the available CPU.  We are running it on a Sun E3500
with four 400MHz processors and 2GB of RAM.  Below is top output
showing that the top processes are all smbd.  I currently have
approximately 210 smbd processes running.

last pid:  1473;  load averages: 17.02, 14.48, 12.18                   14:01:03
414 processes: 392 sleeping, 15 running, 3 zombie, 4 on cpu
CPU states:  0.0% idle, 17.9% user, 82.1% kernel,  0.0% iowait,  0.0% swap
Memory: 2048M real, 34M free, 1615M swap in use, 433M swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
  3768 root       1  33    0 5200K 3832K sleep   6:31  3.94% smbd
16898 root       1  33    0 5528K 4168K sleep   0:50  3.84% smbd
25270 root       1  12    0 5512K 4240K run     0:20  3.78% smbd
21839 root       1  21    0 5496K 4264K run     1:17  3.78% smbd
28038 root       1  13    0 5536K 4456K run     0:13  3.76% smbd
16262 root       1  21    0 5440K 4192K sleep   3:10  3.64% smbd
28783 root       1  12    0 5528K 4400K run     0:06  3.52% smbd
25273 root       1  22    0 5792K 4456K run     0:29  3.51% smbd
29415 root       1  31    0 5528K 4424K cpu15   0:20  3.44% smbd
  4026 root       1  20    0 5512K 4320K run     6:12  3.43% smbd
21534 root       1  22    0 5904K 4872K sleep   2:41  3.27% smbd
27639 root       1  21    0 5528K 4320K sleep   0:15  3.16% smbd
   842 root       1  22    0 5720K 4752K run     5:52  3.15% smbd
23081 root       1  38    0 5776K 4536K sleep   0:57  2.89% smbd
28384 root       1  13    0 5536K 4040K run     0:20  2.68% smbd

When I truss any of these processes, they appear to be in a loop for long
periods of time.  Below is a section of the truss output.

...
20232:  fcntl(13, F_SETLKW64, 0xEFFFE880)               = 0
20232:  fcntl(13, F_SETLKW64, 0xEFFFE810)               = 0
20232:  fcntl(13, F_SETLKW64, 0xEFFFE810)               = 0
20232:  fcntl(13, F_SETLKW64, 0xEFFFE810)               = 0
20232:  fcntl(13, F_SETLKW64, 0xEFFFE810)               = 0
20232:  fcntl(13, F_SETLKW64, 0xEFFFE810)               = 0
...

If anyone has any suggestions, they would be greatly appreciated.

Thanks,

Frank



At 09:14 AM 10/8/2001 +0200, Wagner Guenter wrote:
>RE: Memory/CPU usage
>
>Hi Frank,
>
>There has been a discussion of this subject on the samba-technical
>mailing list, under the subject
>"Time-critical problem at Sun: exploding smbd memory usage"
>
>We had the same problem when we upgraded Samba 2.0.6 to 2.0.7.
>A memory leak was introduced in 2.0.7 and is still present in
>2.2.1a.  It should be fixed in the upcoming 2.2.2.
>
>The problem shows up when there is a large number of printers.
>We also found that each time the configuration file smb.conf is
>changed (or even just "touch"ed) while Samba is running, the
>smbd processes grow in size ("explode").
>
>Richard Bollinger wrote an early patch for 2.0.7.
>We use this patch with 2.0.10 (the relevant part of the code
>is the same) and it works fine. But there have been
>discussions about this fix.
>
>Now, on 6 Sep 2001, Richard Bollinger wrote "the real fix",
>and it seems that this will go into 2.2.2.
>
>Here are some of the mailings on this subject
>(if you do not have time to read them all, look at the end):
>
>
>
>exploding-smbd-memory-2.txt
>
>From: samba-technical digest, Vol 1 #823 - 7 msgs
>From: samba-technical digest, Vol 1 #827 - 14 msgs
>From: samba digest, Vol 1 #550 - 41 msgs
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 4, (Final Patch!),5
>
>
>
>Message: 4
>Date: Thu, 23 Aug 2001 09:32:12 -0700
>From: Jeremy Allison <jeremy at valinux.com>
>To: Richard Bollinger <rabollinger at home.com>
>Cc: Gerald Carter <gcarter at valinux.com>,
>         Samba Technical <samba-technical at samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Richard Bollinger wrote:
> >
> > Funny... I don't see the change in 2_2 CVS... did you really apply it?
> >
> > Ahhh I see... Jeremy took it back out.  Nice of him to do so, but I think
>he's wrong.  The "main
> > loop" doesn't clean things up after each printer is added when we're using
>a [printers] clause in
> > smb.conf.  That loop occurs inside pcap_printer_fn.  Per my testing, this
>only seems to make a
> > difference on Solaris.  Maybe fragmentation is occurring inside their
>malloc() free()?
>
>I took it out as it's not safe to do that free
>inside the printer loop. It's only safe to
>do that talloc delete in the main loop, outside
>of any incoming smb processing.
>
>The talloc delete in the main loop should free
>any memory allocated in the tallocs inside the
>printer allocation. Why does this cause the RSS
>to grow on Solaris ?
>
>insure on Linux does not flag this as an alloc
>bug (and believe me, it would....).
>
>Jeremy.
>
>
>--__--__--
>
>Message: 5
>Date: Thu, 23 Aug 2001 12:48:57 -0400
>From: David Collier-Brown <davecb at canada.sun.com>
>Reply-To: David.Collier-Brown at Sun.COM
>Organization: Private Person
>To: Jeremy Allison <jeremy at valinux.com>
>Cc: Richard Bollinger <rabollinger at home.com>,
>         Gerald Carter <gcarter at valinux.com>,
>         Samba Technical <samba-technical at samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Jeremy Allison wrote:
>
> > The talloc delete in the main loop should free
> > any memory allocated in the tallocs inside the
> > printer allocation. Why does this cause the RSS
> > to grow on Solaris ?
> >
> > insure on Linux does not flag this as an alloc
> > bug (and believe me, it would....).
>
>         OK, let's look at this when the conference
>         is over: it may be a subtle Solaris bug,
>         and I like to find those.
>
>--dave
>--
>David Collier-Brown,           | Always do right. This will gratify
>Performance & Engineering Team | some people and astonish the rest.
>Americas Customer Engineering  |                      -- Mark Twain
>(905) 415-2849                 | davecb at canada.sun.com
>
>
>
>
>From: samba-technical digest, Vol 1 #827 - 14 msgs
>
>
>Message: 4
>From: "Richard Bollinger" <rabollinger at home.com>
>To: <David.Collier-Brown at sun.com>,
>         "Michael E Osborne" <mosborne at jacads.com>
>Cc: <jeremy at valinux.com>, <farrar at parc.xerox.com>,
>         <David.Collier-Brown at sun.com>, "Gerald Carter"
><gcarter at valinux.com>,
>         "Kris Desjardins" <kris_desjardins at hotmail.com>,
><tonys at aus.sun.com>,
>         <craig at aus.sun.com>, <allenw at sun.com>, <samba-technical at samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>Date: Tue, 21 Aug 2001 08:33:28 -0400
>
>This is a multi-part message in MIME format.
>
>------=_NextPart_000_000F_01C12A1B.F3D52400
>Content-Type: text/plain;
>         charset="iso-8859-1"
>Content-Transfer-Encoding: 7bit
>
>Try the attached patch to fix the printer related leakage on Solaris.  We
>also have about 300
>printers.
>
>Rich Bollinger, Elliott Company
>
>------=_NextPart_000_000F_01C12A1B.F3D52400
>Content-Type: application/octet-stream;
>         name="fixleaks.patch"
>Content-Transfer-Encoding: quoted-printable
>Content-Disposition: attachment;
>         filename="fixleaks.patch"
>
>*** ../samba-2.0.7/source.Linux/param/loadparm.c        Fri Nov 17 09:24:27 2000
>--- ./param/loadparm.c  Sat Nov 18 00:46:50 2000
>***************
>*** 2711,2716 ****
>--- 2711,2718 ----
>                 if ((i=lp_servicenumber(name)) >= 0)
>                         string_set(&iSERVICE(i).comment,comment);
>         }
>+       /* free up temporary memory */
>+       lp_talloc_free();
>   }
>
>
>/***************************************************************************
>*** ../samba-2.0.7/source.Linux/smbd/server.c   Thu Mar 16 17:59:52 2000
>--- ./smbd/server.c     Fri Nov 17 22:58:15 2000
>***************
>*** 183,188 ****
>--- 183,191 ----
>                 fd_set lfds;
>                 int num;
>
>+               /* free up temporary memory */
>+               lp_talloc_free();
>+
>                 memcpy((char *)&lfds, (char *)&listen_set,
>                        sizeof(listen_set));
>
>
>------=_NextPart_000_000F_01C12A1B.F3D52400--
>
>
>
>--__--__--
>
>Message: 5
>Date: Thu, 23 Aug 2001 16:30:21 +1000
>From: tony shepherd <tony.shepherd at aus.sun.com>
>To: Richard Bollinger <rabollinger at home.com>
>Cc: David.Collier-Brown at sun.com,
>         Michael E Osborne <mosborne at jacads.com>, jeremy at valinux.com,
>         farrar at parc.xerox.com, Gerald Carter <gcarter at valinux.com>,
>         Kris Desjardins <kris_desjardins at hotmail.com>, tonys at aus.sun.com,
>         craig at aus.sun.com, allenw at sun.com, samba-technical at samba.org
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>
>Many thanks to all of you for help in tracking down the problem and
>providing a fix.  I have applied the patch and it appears to be
>working.  I will let you know if our testing over the next week or so
>turn up any problems.
>
>Again, thanks for this.  I did not expect the fix to be so quick.
>
>regards
>
>tony
>
>Richard Bollinger wrote:
> >
> > Try the attached patch to fix the printer related leakage on Solaris.  We
>also have about 300
> > printers.
> >
> > Rich Bollinger, Elliott Company
> >
> >   ----------------------------------------------------------------------
> >                      Name: fixleaks.patch
> >    fixleaks.patch    Type: unspecified type (application/octet-stream)
> >                  Encoding: quoted-printable
>
>
>
>From: samba digest, Vol 1 #550 - 41 msgs
>
>Message: 3
>Date: Mon, 27 Aug 2001 08:18:16 +1000
>From: tony shepherd <tony.shepherd at aus.sun.com>
>To: "Baker, Byran" <Byran_Baker at bmc.com>
>Cc: "'samba at lists.samba.org'" <samba at lists.samba.org>
>Subject: Re: Ultra 60 with 500+ users running out of memory
>
>We also recently had a problem with memory usage, although in 2.0.10.
>There was a memory leak which was causing the parent smbd to get very
>large, and therefore all spawned smbd's for each new connection. This
>problem was not only in 2.0.10, but earlier versions.  Richard
>Bollinger provided a patch (see attached) which seems to have fixed
>the problem.
>
>We also found that the memory requirements under Solaris seemed to be
>quite a bit higher than under Linux (on Cobalt Qubes). The size
>of the smbd's was also dependent on the number of shares you were
>providing.  For example, on one particular installation on Solaris 8,
>Samba 2.0.10 (after patch installation):
>
>35 shares:
>                                         SIZE    RES-MEM
>parent smbd                             2952    1560
>each new smbd (for each new session)    4568    2416
>
>313 shares:
>                                         SIZE    RES-MEM
>parent smbd                             3536    1784
>each new smbd (for each new session)    5080    2968
>
>
>Also, old smbd processes were not "going away".  To recover resources,
>we set the "deadtime" parameter to 5.  This removed any inactive
>processes after 5 minutes.
>
>Hope this helps.
>
>tony
>
>"Baker, Byran" wrote:
> >
> > I administer a Sun Ultra60 (2x450MHz, 1.5GB real memory, 2.5GB Swap)
>running
> > Samba 2.0.7.  I run an average of 450 users at any given time without
> > problems.  When I get many more than 500 users, I begin to have memory
> > shortage problems.
> >
> > I am trying to find out how I can tune Samba, to reduce the amount of
>memory
> > needed per user so that I do not have to upgrade machines again (I have
> > upgraded from a SPARCstation 5, to an Ultra 10, to the current Ultra 60
>over
> > the years).  The CPUs are idle most of the time, so my only real concern
>is
> > cutting down the memory usage.
> >
> > Thanks in advance,
> > -Byran
> > --
> > To unsubscribe from this list go to the following URL and read the
> > instructions:  http://lists.samba.org/mailman/listinfo/samba
>
>[demime 0.98b removed an attachment of type application/octet-stream which
>had a name of fixleaks.patch]
>
>--__--__--
>
>Message: 4
>Date: Mon, 27 Aug 2001 08:51:25 +1000
>From: tony shepherd <tony.shepherd at aus.sun.com>
>To: "Baker, Byran" <Byran_Baker at bmc.com>, "'samba at lists.samba.org'"
>   <samba at lists.samba.org>
>Subject: Re: Ultra 60 with 500+ users running out of memory
>
>[snip]
>
> >
> > [demime 0.98b removed an attachment of type application/octet-stream which
>had a name of fixleaks.patch]
> > --
>[snip]
>
>Seems the patch got cut by the list server.  Here it is again.
>
>tony
>*** ../samba-2.0.7/source.Linux/param/loadparm.c        Fri Nov 17 09:24:27 2000
>--- ./param/loadparm.c  Sat Nov 18 00:46:50 2000
>***************
>*** 2711,2716 ****
>--- 2711,2718 ----
>                 if ((i=lp_servicenumber(name)) >= 0)
>                         string_set(&iSERVICE(i).comment,comment);
>         }
>+       /* free up temporary memory */
>+       lp_talloc_free();
>   }
>
>
>/***************************************************************************
>*** ../samba-2.0.7/source.Linux/smbd/server.c   Thu Mar 16 17:59:52 2000
>--- ./smbd/server.c     Fri Nov 17 22:58:15 2000
>***************
>*** 183,188 ****
>--- 183,191 ----
>                 fd_set lfds;
>                 int num;
>
>+               /* free up temporary memory */
>+               lp_talloc_free();
>+
>                 memcpy((char *)&lfds, (char *)&listen_set,
>                        sizeof(listen_set));
>
>
>
>
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 4
>
>Message: 4
>From: "Richard Bollinger" <rabollinger at home.com>
>To: <davecb at canada.sun.com>, "Gerald Carter" <gcarter at valinux.com>,
>         <jeremy at valinux.com>
>Cc: <samba-technical at lists.samba.org>
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage ---
>Here's the real fix!
>Date: Thu, 6 Sep 2001 00:02:36 -0400
>
>This is a multi-part message in MIME format.
>
>------=_NextPart_000_0040_01C13667.3F142220
>Content-Type: text/plain;
>         charset="iso-8859-1"
>Content-Transfer-Encoding: 7bit
>
>Using an old memory allocation debugging / tracking tool (mem_man), I
>monitored what was going
>on while smbd processed our 300+ printer printcap file...
>
>After processing 300 printers, the stats were as follows:
>Mem Manager : 196110 blocks, allocation 11553K, real allocation 11553K, 0
>errors
>
>Of that, talloc() accounted for 192036 of the malloc() calls and 11280K of
>the space allocated.
>
>Sure, all of that would be freed eventually, but it amounts to a torture
>test for the system's
>malloc() / free() capabilities, which apparently aren't as aggressive at
>recovering / releasing
>free space with Solaris as with Linux :-).
>
>I tracked the problem to an O(N^2) loop... add_all_printers() calls
>pcap_printer_fn(), which in
>turn calls lp_add_one_printer(), which in turn calls lp_servicenumber(),
>which in turn calls
>lp_servicename(), which in turn calls lp_string(), which in turn calls
>talloc().
>
>Here's the fix to lp_servicenumber(), based on similar code in
>getservicebyname()...
>
>--- ../source/param/loadparm.c Fri Aug 31 07:15:36 2001
>+++ ./param/loadparm.c Wed Sep  5 22:11:36 2001
>@@ -3418,7 +3424,8 @@
>
>   for (iService = iNumServices - 1; iService >= 0; iService--)
>   if (VALID(iService) &&
>-     strequal(lp_servicename(iService), pszServiceName))
>+     ServicePtrs[iService]->szService &&
>+     strequal(ServicePtrs[iService]->szService, pszServiceName))
>   break;
>
>   if (iService < 0)
>
>After the fix is in, the same memory monitoring tools reveal these stats:
>Mem Manager : 4054 blocks, allocation 537K, real allocation 537K, 0 errors
>
>Of that, talloc() now accounts for only 7 of the malloc() calls and 357
>bytes allocated.
>
>Rich Bollinger, Elliott Company
>
>------=_NextPart_000_0040_01C13667.3F142220
>Content-Type: application/octet-stream;
>         name="fixleaks.patch"
>Content-Transfer-Encoding: quoted-printable
>Content-Disposition: attachment;
>         filename="fixleaks.patch"
>
>--- ../source/param/loadparm.c  Fri Aug 31 07:15:36 2001
>+++ ./param/loadparm.c  Wed Sep  5 22:11:36 2001
>@@ -3418,7 +3424,8 @@
>
>         for (iService = iNumServices - 1; iService >= 0; iService--)
>                 if (VALID(iService) &&
>-                   strequal(lp_servicename(iService), pszServiceName))
>+                   ServicePtrs[iService]->szService &&
>+                   strequal(ServicePtrs[iService]->szService, pszServiceName))
>                         break;
>
>         if (iService < 0)
>
>------=_NextPart_000_0040_01C13667.3F142220--
>
>
>
>--__--__--
>
>
>
>
>From: samba-technical digest, Vol 1 #852 - 10 msgs, 5
>
>Message: 5
>Date: Wed, 05 Sep 2001 23:54:22 -0700
>From: Jeremy Allison <jeremy at valinux.com>
>Reply-To: jra at samba.org
>To: Richard Bollinger <rabollinger at home.com>
>Cc: davecb at canada.sun.com, Gerald Carter <gcarter at valinux.com>,
>         samba-technical at lists.samba.org
>Subject: Re: Time-critical problem at Sun: exploding smbd memory usage ---
>Here's
>  the real fix!
>
>Richard Bollinger wrote:
> >
> > Using an old memory allocation debugging / tracking tool (mem_man), I
>monitored what was going
> > on while smbd processed our 300+ printer printcap file...
> >
> > After processing 300 printers, the stats were as follows:
> > Mem Manager : 196110 blocks, allocation 11553K, real allocation 11553K, 0
>errors
> >
> > Of that, talloc() accounted for 192036 of the malloc() calls and 11280K of
>the space allocated.
> >
> > Sure, all of that would be freed eventually, but it amounts to a torture
>test for the system's
> > malloc() / free() capabilities, which apparently aren't as aggressive at
>recovering / releasing
> > free space with Solaris as with Linux :-).
> >
> > I tracked the problem to an O(N^2) loop... add_all_printers() calls
>pcap_printer_fn(), which in
> > turn calls lp_add_one_printer(), which in turn calls lp_servicenumber(),
>which in turn calls
> > lp_servicename(), which in turn calls lp_string(), which in turn calls
>talloc().
> >
> > Here's the fix to lp_servicenumber(), based on similar code in
>getservicebyname()...
> >
> > --- ../source/param/loadparm.c Fri Aug 31 07:15:36 2001
> > +++ ./param/loadparm.c Wed Sep  5 22:11:36 2001
> > @@ -3418,7 +3424,8 @@
> >
> >   for (iService = iNumServices - 1; iService >= 0; iService--)
> >   if (VALID(iService) &&
> > -     strequal(lp_servicename(iService), pszServiceName))
> > +     ServicePtrs[iService]->szService &&
> > +     strequal(ServicePtrs[iService]->szService, pszServiceName))
> >   break;
> >
> >   if (iService < 0)
> >
> > After the fix is in, the same memory monitoring tools reveal these stats:
> > Mem Manager : 4054 blocks, allocation 537K, real allocation 537K, 0 errors
> >
> > Of that, talloc() now accounts for only 7 of the malloc() calls and 357
>bytes allocated.
>
>*Great* detective work ! Thanks. I'll commit this fix
>to 2.2 and HEAD as soon as samba.org comes back on
>line for me :-).
>
>Jeremy.
>
>
>
>
>
>
>
>
>G. Wagner
>
>--------------------------------------
>Günter Wagner
>MKG Kreditbank GmbH
>Schieferstein 5
>D-65439 Flörsheim
>
>Telefon:  +49 6145 506 358
>FAX:      +49 6145 506 356
>E-Mail:   g.wagner at mkg-bank.de
>--------------------------------------




