[Samba] Odd behavior since upgrading to 4.9.6
mray at xes-inc.com
Tue Apr 23 16:11:21 UTC 2019
About a week and a half ago I upgraded from 4.0.12 to 4.9.6. Overall, things are functioning.
However, I have come across several strange behaviors and wondered if anyone else has noticed similar behavior on 4.9.6 or has any suggestions about what might be occurring.
As background information, I have 3 DCs (dc3, dc4 and dc5) -- all running the same version (4.9.6) and all have the same configuration; dc3 was the original holder of all 7 FSMO roles, but as of last night, they were all transferred to dc4.
First off, all the DCs hold steady at different levels of memory utilized. dc3 hovers at about 1.5 GB used, dc4 hovers at about 0.75 GB used and dc5 hovers at a little less than 0.5 GB used. I think that the difference in memory used might be related to the number of samba/smbd processes running; dc3 has about 250 samba/smbd processes running, dc4 has about 100 and dc5 has about 30. But why are so many more clients connecting to dc3?
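For anyone wanting to compare the same numbers on their own DCs, the process counts above can be gathered with something like the following. This is only a minimal sketch: the function names are made up for illustration, and it assumes a Linux `ps` that accepts `-eo comm=` (command name only, no header).

```python
import subprocess
from collections import Counter


def count_samba_processes(ps_output, names=("samba", "smbd")):
    """Count processes by command name in the output of `ps -eo comm=`."""
    counts = Counter()
    for line in ps_output.splitlines():
        name = line.strip()
        if name in names:
            counts[name] += 1
    return counts


def live_counts():
    """Run ps on the local machine and count samba/smbd processes.

    Assumes a procps-style ps; not called automatically.
    """
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return count_samba_processes(out)
```

Calling `live_counts()` on each DC at the same time of day gives figures that can be compared directly with the memory readings.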
Secondly, dc5 has been having quirky issues ever since the upgrade. I run various health checks on the DCs nightly and it seems that every other day "samba-tool drs kcc dc5" from one of the other two DCs fails with "ERROR(runtime): DsExecuteKCC failed - (3221356597, 'The operation cannot be performed.')". dc5 also has issues creating an online backup and intermittently errors out with: "ERROR(<type 'exceptions.IndexError'>): uncaught exception - list index out of range". I did see a note about this in the troubleshooting section of the samba backup wiki page; however, the error comes and goes, so I don't know if this means it is something else.
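Since the KCC failure is intermittent, it may help to record every run of the nightly check with a timestamp and the numeric code, so the "every other day" pattern can be lined up against other events. The sketch below is an assumption about how such a wrapper could look, not how my checks are actually written; `run_check` and `extract_status` are hypothetical names. It also prints the code in hex (3221356597 is 0xC0020035), on the assumption that the number is an NTSTATUS-style value that is easier to search for in that form.

```python
import re
import subprocess
from datetime import datetime, timezone

# Matches the "(<number>, '<message>')" tuple that samba-tool prints on error.
_STATUS_RE = re.compile(r"\((\d+),\s*'")


def extract_status(text):
    """Return the numeric error code as 0x-prefixed hex, or None if absent."""
    match = _STATUS_RE.search(text)
    return "0x{:08X}".format(int(match.group(1))) if match else None


def run_check(cmd):
    """Run one health-check command and return a record suitable for logging."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "cmd": " ".join(cmd),
        "returncode": proc.returncode,
        "status": extract_status(proc.stdout + proc.stderr),
        "stderr": proc.stderr.strip(),
    }
```

For example, `run_check(["samba-tool", "drs", "kcc", "dc5"])` run from dc3 or dc4 each night would give one record per attempt that can be appended to a log file.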
Lastly (and most disturbingly), I moved the FSMO roles from dc3 to dc4 last night (to see if the load on dc3 was related to owning those roles) and had huge instability this morning. All DCs looked OK last night (I did restart samba on dc3 when the system was experiencing low memory), but I came in this morning and found that dc4 and dc5 had such high loads that clients communicating with those DCs were unable to log in. Our monitoring system showed huge CPU loads as of this morning and memory instability (jumping up and down) since just after the FSMO role transfer last night. Are there known issues with transferring FSMO roles that might explain the instability? Is it best practice to restart samba after doing an FSMO transfer just in case?
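One thing worth ruling out after a transfer is that the DCs disagree about who owns the roles. A rough way to check is to parse the output of `samba-tool fsmo show` on each DC and compare. The sketch below assumes each line has the form `<RoleName> owner: CN=NTDS Settings,CN=<server>,...`; that format and the helper names are assumptions, and the DN is split naively on commas.

```python
def parse_fsmo_owners(show_output):
    """Map each FSMO role name to the lower-cased owning server name.

    Assumes lines such as:
    'SchemaMasterRole owner: CN=NTDS Settings,CN=DC4,CN=Servers,...'
    """
    owners = {}
    for line in show_output.splitlines():
        role, sep, dn = line.partition(" owner: ")
        if not sep:
            continue  # not a role line
        rdns = dn.strip().split(",")
        # The server is the RDN that follows "CN=NTDS Settings".
        if len(rdns) > 1 and rdns[0].upper() == "CN=NTDS SETTINGS":
            server = rdns[1]
        else:
            server = rdns[0]
        owners[role.strip()] = server.split("=", 1)[-1].lower()
    return owners


def roles_not_on(owners, expected_dc):
    """Return the sorted role names whose owner is not expected_dc."""
    return sorted(r for r, dc in owners.items() if dc != expected_dc.lower())
```

Feeding it the `samba-tool fsmo show` output from dc3, dc4 and dc5 in turn, `roles_not_on(owners, "dc4")` should come back empty on all three if the transfer has replicated everywhere.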
I know this is a wide range of issues, but I appreciate any input on any of them.