removing stale replication targets

Matthieu Patou mat at samba.org
Wed Feb 1 02:39:05 MST 2012


Hi Tridge,

On 31/01/2012 22:31, Matthieu Patou wrote:
> On 31/01/2012 21:21, Andrew Tridgell wrote:
>> Hi Matthieu,
>>
>> I've just reverted your patch 5bfd6251. It was causing regular
>> segmentation faults in the build farm. As I said in my revert message, I
>> wasn't able to reproduce the segfault, but it is pretty clear that this
>> patch was the cause, given the backtraces and the complexity of the list
>> handling. What I think was happening was a DSA that had an in-progress
>> operation was being freed by the changes in this patch.
> I'm not sure that this could happen if it is then it's unwanted 
> because with this patch we should purge only the connection not purged 
> anymore.
> In order to free a DSA object that we still use we check
>         /* re-use an existing source if found */
>         for (s2=*oldlist; s2; s2=s2->next) {
>                 if (GUID_compare(&s2->repsFrom1->source_dsa_obj_guid,
> &source->repsFrom1->source_dsa_obj_guid) == 0) {
>                         talloc_free(s2->repsFrom1->other_info);
>                         *s2->repsFrom1 = *source->repsFrom1;
>                         talloc_steal(s2, s2->repsFrom1->other_info);
>                         talloc_free(source);
>                         source = s2;
>                         DLIST_REMOVE(*oldlist, s2);
>                         break;
>                 }
>         }
>
> In a "normal" mode all elements from the oldlist will be removed and 
> transfered to the current list.
>
>
>> I'm also curious why the change was needed. The
>> dreplsrv_refresh_partitions() code should already be coping with changes
>> made by the KCC, including removals.
> The KCC takes care of changing the repsTo and the repsFrom that for 
> sure, but  dreplsrv_refresh_partition() for the moment will only cope 
> if you add entries.
> Before this patch there wasn't any code that would actually remove a 
> source if it isn't anymore in repsFrom.
> If you restart samba then it will be ok but if not (and in production 
> you don't restart it every 2 minutes) you'll see errors messages 
> saying that you can't replicate with a DSA. This DSA is the one that 
> has been removed.
>
> A way to reproduce it is to join a Windows DC to a Samba Domain then 
> remove it and then see error message appearing in the logs.
>> Can you explain what situation this
>> was trying to cope with? If the patch is needed then we'll need to work
>> out how to make it work without the segfaults :-)
Found with the help of valgrind the possible cause of problem, it's that
         if (oldsources) {
                 src = oldsources;
                 while(src) {
                         struct dreplsrv_partition_source_dsa *tmp = 
src->next;
                         talloc_free(src);
                         src = tmp;
                 }
         }

Frees a source but for some reasons this source is still used in 
dreplsrv_notify_op_callback
with op->source_dsa->notify_uSN = op->uSN;

I don't understand yet why this is occuring but obviously not freeing 
the source in dreplsrv_refresh_partition() should make it but I have the 
impression that it will leak then source structures.

Any remarks ?


-- 
Matthieu Patou
Samba Team
http://samba.org



More information about the samba-technical mailing list