[PATCH] Fix CTDB protocol marshalling

Amitay Isaacs amitay at gmail.com
Mon Aug 28 08:10:02 UTC 2017


Hi,

The CTDB protocol (client-server over unix domain socket & server-server
over TCP socket) has been structs on wire.  This has few drawbacks:

- The sizes of the structures vary between 32-bit and 64-bit
- The protocol is not endian-neutral

Both are not such a big issues nowadays with majority of platforms are
little-endian 64-bit.  Still there is still a concern over sending
uninitialized (padding) bytes over wire.

The new CTDB code (mostly client and little bit of server code) uses
protocol abstraction layer (ctdb/protocol/*.[ch]).  This gives an
opportunity to fix (and improve) protocol marshalling without affecting the
rest of the code.  The protocol marshalling code so far has been writing
structs on wire.  It had following API for ctdb data types:

  size_t TYPE_len(TYPE *in);
  void TYPE_push(TYPE *in, uint8_t *buf);
  int TYPE_pull(uint8_t *buf, size_t buflen, TALLOC_CTX *mem_ctx, TYPE
**out);

With the fixes to the protocol marshalling, structures are not directly
copied to wire (with the exception of system defined structures).  Instead
every structure is split into it's basic types and only the basic types are
copied.  This introduces new API for basic types:

  size_t TYPE_len(TYPE *in);
  void TYPE_push(TYPE *in, uint8_t *buf, size_t *npush);
  int TYPE_pull(uint8_t *buf, size_t buflen, TYPE *out, size_t *npull);

And new API for derived types:

  size_t TYPE_len(TYPE *in);
  void TYPE_push(TYPE *in, uint8_t *buf, size_t *npush);
  int TYPE_pull(uint8_t *buf, size_t buflen, TALLOC_CTX *mem_ctx, TYPE
**out, size_t *npull);

To handle the padding, a dummy padding type is introduced with the similar
api as of basic types.

  size_t padding_len(int padding);
  void padding_push(int padding, uint8_t *buf, size_t *npush);
  int padding_pull(uint8_t *buf, size_t buflen, int padding, size_t *npull);

To ensure that the above changes do not introduce regressions in
marshalling, the old marshalling code is copied in test code and
compatibility tests are added.  All the tests are driven by templates to
ensure conformity of API.  There are no changes to the top-level API for
ctdb protocol elements.

This is a large change comprising of 100 patches and has following changes:

  - marshalling basic data types (e.g. uint32_t)  (10)
  - marshalling system defined data types (e.g. struct timeval)  (2)
  - marshalling ctdb defined data types (e.g. ctdb_connection)  (48)
  - marshalling ctdb protocol elements (e.g. ctdb_req_control)  (14)
  - marshalling eventd protocol elements (e.g. event_request)  (14)
  - Changes in test code  (8)

Other changes include fixing marshalling for control GET_DB_SEQNUM and
minor fixes.

I have tested the protocol compatibility on following platforms:

  - Fedora26 on x86_64 (little-endian, 64-bit)
  - Centos6 on i686 (little-endian, 32-bit)
  - RHEL6 on ppc64 (big-endian, 64-bit)
  - Freebsd on amd64 (little-endian, 64-bit)

This is in preparation to splitting the main daemon into multiple daemons
as outlined in SambaXP 2017.  Once the old implementation of protocol in
the ctdb daemon goes away, many of the idiosyncrasies in the protocol can
be fixed and padding can be dropped.  This will be a backward incompatible
change.  Hopefully then we will be able to drop the hand-written code and
auto-generate it.

Martin has already reviewed the current version (and various previous
iterations).  I would like to hear any comments/suggestions that others may
have.

The changes are available in my tree ctdb-protocol2 branch.


https://git.samba.org/?p=amitay/samba.git;a=shortlog;h=refs/heads/ctdb-protocol2

Amitay.


More information about the samba-technical mailing list