[Samba] "wbinfo -u" considered harmful towards Winbindd...
Peter Eriksson
peter at ifm.liu.se
Sun Dec 9 19:20:00 UTC 2018
Our setup:
Windows AD realm with ~115K users (and numerous groups etc)
FreeBSD servers with Samba 4.7.6 and Samba 4.9.3 (both show the same growth)
We just noticed that one of the ‘winbindd’ daemons on the servers seems to be growing and growing forever. A bit of detective work pointed us at the “wbinfo -u” command being that culprit. As part of a systems monitoring script we ran that once a minute (now disabled) in order to see if all AD users were detected, but somehow that seems to fail sometime and also cause the Winbindd daemon to grow around 455MB per hour… the memory used is not a huge problem on the production servers (they have 256GB RAM) so we didn’t notice this at first (since we restart smbd&winbindd every morning at 7am) - but an old test server with much less RAM ran out of memory around 4:30am… :-)
This is what the process growth look like with a cron job running “wbinfo -u” once a minute:
> 10:50/ps-auxwww.log:root 7658 0.0 0.7 1999680 1861712 - S 07:00 21:54.66 winbindd: domain child [AD] (winbindd)
> 11:00/ps-auxwww.log:root 7658 0.0 0.7 2077504 1943248 - I 07:00 22:48.07 winbindd: domain child [AD] (winbindd)
> 11:10/ps-auxwww.log:root 7658 0.0 0.8 2157376 2022720 - S 07:00 23:41.83 winbindd: domain child [AD] (winbindd)
> 11:20/ps-auxwww.log:root 7658 95.3 0.8 2218816 2063896 - R 07:00 24:26.61 winbindd: domain child [AD] (winbindd)
> 11:30/ps-auxwww.log:root 7658 0.0 0.8 2243392 2107712 - I 07:00 24:39.43 winbindd: domain child [AD] (winbindd)
> 11:40/ps-auxwww.log:root 7658 0.0 0.8 2321216 2185384 - S 07:00 25:35.37 winbindd: domain child [AD] (winbindd)
> 11:50/ps-auxwww.log:root 7658 0.0 0.9 2452288 2317168 - S 07:00 27:21.18 winbindd: domain child [AD] (winbindd)
> 12:00/ps-auxwww.log:root 7658 0.0 0.9 2534208 2399384 - S 07:00 28:15.81 winbindd: domain child [AD] (winbindd)
> 12:10/ps-auxwww.log:root 7658 96.4 0.9 2587456 2433792 - R 07:00 29:52.67 winbindd: domain child [AD] (winbindd)
> 12:20/ps-auxwww.log:root 7658 0.0 0.9 2609984 2474460 - S 07:00 30:07.13 winbindd: domain child [AD] (winbindd)
> 12:30/ps-auxwww.log:root 7658 0.0 1.0 2736960 2672352 - S 07:00 32:03.49 winbindd: domain child [AD] (winbindd)
> 12:40/ps-auxwww.log:root 7658 0.0 1.0 2736960 2672744 - S 07:00 32:03.51 winbindd: domain child [AD] (winbindd)
This is a listing of the user listings from “wbinfo -u” during that same time:
> -rw-r--r-- 1 root wheel 0 Dec 9 10:48 10:48/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:49 10:49/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:50 10:50/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:51 10:51/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:52 10:52/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:53 10:53/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:54 10:54/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:55 10:55/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:56 10:56/wbinfo-u.log
> -rw-r--r-- 1 root wheel 0 Dec 9 10:57 10:57/wbinfo-u.log
> -rw-r--r-- 1 root wheel 0 Dec 9 10:58 10:58/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 10:59 10:59/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:00 11:00/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:01 11:01/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:02 11:02/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:03 11:03/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:04 11:04/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:05 11:05/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:06 11:06/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:07 11:07/wbinfo-u.log
> -rw-r--r-- 1 root wheel 0 Dec 9 11:08 11:08/wbinfo-u.log
> -rw-r--r-- 1 root wheel 0 Dec 9 11:09 11:09/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:10 11:10/wbinfo-u.log
> -rw-r--r-- 1 root wheel 1062356 Dec 9 11:11 11:11/wbinfo-u.log
An attempt I just made at manually running the “wbinfo -u” command gave:
> # ps uxwwww 7658
> USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
> root 7658 0.0 1.0 2736960 2675044 - S 07:00 32:04.70 winbindd: domain child [AD] (winbindd)
> # /usr/bin/time /liu/bin/wbinfo -u > /tmp/wbinfo-u.out
> Error looking up domain users
> 64.82 real 0.00 user 0.00 sys
> # ls -l /tmp/wbinfo-u.out
> -rw-r--r-- 1 root wheel 0 Dec 9 19:50 /tmp/wbinfo-u.out
> # ps uxwwww 7658
> USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
> root 7658 0.0 1.0 2775872 2715664 - I 07:00 33:13.21 winbindd: domain child [AD] (winbindd)
I have saved a gcore-generated core dump from one of the big Winbindd:s in case it might be of use in order to pin-point the leak. Here’s a GDB stack trace (doubt that gives much info though):
> Reading symbols from /liu/sbin/winbindd...done.
> [New LWP 102857]
> Core was generated by `winbindd: domain child [AD]'.
> #0 0x0000000809b48948 in ?? () from /lib/libc.so.7
> (gdb) bt
> #0 0x0000000809b48948 in ?? () from /lib/libc.so.7
> #1 0x0000000809bad3c5 in snprintf () from /lib/libc.so.7
> #2 0x0000000806852f56 in namemap_cache_set_sid2name (sid=sid at entry=0x7fffffffccf0, domain=<optimized out>, domain at entry=0x8be412930 "AD", name=name at entry=0x8becdef10 "annha585",
> type=type at entry=SID_NAME_USER, timeout=1544354765) at ../source3/lib/namemap_cache.c:74
> #3 0x0000000001054ff6 in wcache_save_sid_to_name (domain=domain at entry=0x812669fe0, status=..., sid=sid at entry=0x7fffffffccf0, domain_name=0x8be412930 "AD", name=0x8becdef10 "annha585",
> type=SID_NAME_USER) at ../source3/winbindd/winbindd_cache.c:969
> #4 0x0000000001059b0f in wb_cache_rids_to_names (domain=domain at entry=0x812669fe0, mem_ctx=0x81261d660, domain_sid=domain_sid at entry=0x8b3e881e0, rids=rids at entry=0x8b800dfa0,
> num_rids=num_rids at entry=117864, domain_name=domain_name at entry=0x7fffffffcd90, names=0x7fffffffcd98, types=0x7fffffffcda0) at ../source3/winbindd/winbindd_cache.c:2202
> #5 0x000000000107b4d0 in _wbint_LookupRids (p=p at entry=0x7fffffffce30, r=r at entry=0x8a9444080) at ../source3/winbindd/winbindd_dual_srv.c:660
> #6 0x00000000010cf243 in api_wbint_LookupRids (p=0x7fffffffce30) at default/librpc/gen_ndr/srv_winbind.c:1353
> #7 0x000000000107a345 in winbindd_dual_ndrcmd (domain=<optimized out>, state=0x7fffffffd0c8) at ../source3/winbindd/winbindd_dual_ndr.c:369
> #8 0x0000000001076082 in child_process_request (state=0x7fffffffd0c8, child=<optimized out>) at ../source3/winbindd/winbindd_dual.c:745
> #9 child_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=0x7fffffffd0c0) at ../source3/winbindd/winbindd_dual.c:1602
> #10 0x000000080984bafd in tevent_common_invoke_fd_handler () from /usr/local/lib/libtevent.so.0
> #11 0x000000080984e85b in ?? () from /usr/local/lib/libtevent.so.0
> #12 0x000000080984ae4e in _tevent_loop_once () from /usr/local/lib/libtevent.so.0
> #13 0x00000000010797de in fork_domain_child (child=0x812681760) at ../source3/winbindd/winbindd_dual.c:1817
> #14 0x0000000001079956 in wb_child_request_waited (subreq=0x0) at ../source3/winbindd/winbindd_dual.c:238
> #15 0x000000080984bf37 in tevent_common_invoke_immediate_handler () from /usr/local/lib/libtevent.so.0
> #16 0x000000080984bf94 in tevent_common_loop_immediate () from /usr/local/lib/libtevent.so.0
> #17 0x000000080984e17c in ?? () from /usr/local/lib/libtevent.so.0
> #18 0x000000080984ae4e in _tevent_loop_once () from /usr/local/lib/libtevent.so.0
> #19 0x000000000104e9c2 in main (argc=<optimized out>, argv=<optimized out>) at ../source3/winbindd/winbindd.c:1911
smb.conf stuff related to Winbindd:
> ; Security type
> security = ADS
> realm = AD.LIU.SE
> workgroup = AD
>
> ;; ID Mappings
> idmap config * : backend = tdb
> idmap config * : range = 2000000001-2100000000
> idmap config AD : backend = ad
> idmap config AD : range = 1-2000000000
> idmap config AD : schema_mode = rfc2307
> idmap config AD : unix_primary_group = yes
> winbind nested groups = false
> winbind enum users = false
> winbind enum groups = false
> winbind use default domain = yes
> winbind normalize names = yes
> winbind max clients = 1000
> winbind max domain connections = 10
> winbind nss info = template
Let me know if there is something else I can check to pinpoint this leak…
- Peter
More information about the samba
mailing list