[clug] DGSH - directed graph shell. adding parallelism to shell & pipes

Luke Mewburn lukem-clug at mewburn.net
Tue Jul 25 03:41:05 UTC 2017

On Fri, Jul 14, 2017 at 01:58:33PM +1000, Brenton Ross via linux wrote:
  | I've had a preliminary look at dgsh, and I'm not overly taken with the
  | approach they took.
  | They have replaced the normal Unix pipe interface for stdin and stdout
  | with sockets, which means that the core utilities (and anything else you
  | want to use via pipes) have to be the modified versions for dgsh.

Does the "dgsh-wrap" tool they provide assist with interfacing with
existing stdin/stdout tools?

Could you just (ab)use socat to interface between stdin/stdout and
the dgsh sockets? I've used that technique elsewhere; socat is awesome
(if complex to use).

  | However, it got me wondering if there was another way, one that did not
  | require modifying the programs.
  | I think I could add a couple of extensions to VICI that would cover a
  | lot of dgsh's capabilities, and have some further advantages.
  | The first change would be to introduce named streams - the data flows
  | could be given a label. If a program connected to a named stream used
  | the name as a filename parameter, then VICI would substitute the label
  | with the path to a Unix named pipe. This would allow programs to connect
  | to multiple pipes. Of course it would not help for the cases where dgsh
  | has modified the actual interface to the program, such as grep having
  | multiple inputs and outputs, but you could create a modified grep with
  | that capability that would still be compatible with bash etc.

If your platform provides /dev/fd/* (which Linux does), then opening
extra file descriptors via shell redirection when invoking the command,
and passing the matching /dev/fd/N paths as filenames, may just work.
(This can fail when a tool assumes that its input file is seekable.)
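
A minimal sketch of that /dev/fd trick, done from Python rather than
the shell so that it is self-contained. Everything named here is my
own illustration: cat_two_streams() is a hypothetical helper, and
"cat" merely stands in for any unmodified tool that takes filenames.
It assumes a platform with /dev/fd (e.g. Linux).

```python
import os
import subprocess
import threading


def _feed(write_fd, data):
    # Write the data, then close the write end so the reader sees EOF.
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(data)


def cat_two_streams(first, second):
    """Give an unmodified `cat` two pipe inputs by naming them /dev/fd/N."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    feeders = [
        threading.Thread(target=_feed, args=(w1, first)),
        threading.Thread(target=_feed, args=(w2, second)),
    ]
    for t in feeders:
        t.start()
    try:
        # pass_fds keeps only the two read ends open in the child; the
        # child then opens them by path, like ordinary files.
        result = subprocess.run(
            ["cat", f"/dev/fd/{r1}", f"/dev/fd/{r2}"],
            pass_fds=(r1, r2),
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.close(r1)
        os.close(r2)
        for t in feeders:
            t.join()
    return result.stdout
```

The pipes are not seekable, so a tool that tries to lseek() on its
input file would still fail with this approach.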

  | The second change is to introduce what I call a "manifold". This object
  | can have any number of stdin and stdout streams. It would have several
  | modes of operation:
  |      1. Sequential, where it reads from its first stream until it's
  |         exhausted (closed), then reads from the second until that is
  |         finished, etc
  |      2. Merge, where any input is sent immediately to the output (line
  |         by line)
  |      3. Parallel, where reading blocks until something is ready on all
  |         the input streams. This would help to synchronise processing.
  |      4. Copy, where each input is sent to all the output streams
  |      5. Distribute, where the input lines are sent to the output streams
  |         in round-robin fashion.
  | The manifold would start a new thread for each of its output streams to
  | achieve the multiprocessing capability of dgsh.
  | Hence, I think it would have been possible to create dgsh without having
  | to fork the core utility programs to create a new set of incompatible
  | programs.

That manifold idea is interesting.

As an implementation detail, I would personally prototype that tool
in Python using an async I/O mechanism and some generator trickery,
rather than using a thread per stream.

(Or just write it in C++ and play with boost::asio; only using
threads as a thread pool behind the boost::asio io_service runner.
I digress :)

That's just a personal choice - YMMV.
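
To make the async idea a little more concrete, here is a rough sketch
of two of the manifold modes ("Sequential" and "Merge") as asyncio
async generators, with no thread per stream. This is only my guess at
a prototype, not anything VICI or dgsh actually does; the names
sequential(), merge() and ticker() are hypothetical, and ticker() is
just a stand-in for a real input stream.

```python
import asyncio


async def ticker(lines, delay):
    # Stand-in input stream: yields each line after a short pause.
    for line in lines:
        await asyncio.sleep(delay)
        yield line


async def sequential(*sources):
    # "Sequential" mode: drain the first input fully, then the next.
    for source in sources:
        async for line in source:
            yield line


async def merge(*sources):
    # "Merge" mode: yield a line as soon as any input produces one.
    iterators = [source.__aiter__() for source in sources]
    # One outstanding "next line" request per input that is still open.
    pending = {asyncio.ensure_future(it.__anext__()): it for it in iterators}
    try:
        while pending:
            done, _ = await asyncio.wait(
                list(pending), return_when=asyncio.FIRST_COMPLETED
            )
            for fut in done:
                it = pending.pop(fut)
                try:
                    line = fut.result()
                except StopAsyncIteration:
                    continue  # this input is exhausted
                # Ask the same input for its next line, then emit this one.
                pending[asyncio.ensure_future(it.__anext__())] = it
                yield line
    finally:
        # Stop waiting on inputs if the consumer goes away early.
        for fut in pending:
            fut.cancel()


async def collect(stream):
    # Convenience for trying things out: gather a stream into a list.
    return [line async for line in stream]
```

For example, asyncio.run(collect(merge(ticker(["a1", "a2"], 0.01),
ticker(["b1"], 0.02)))) interleaves the two inputs in arrival order,
while sequential() with the same arguments yields all of the first
input before any of the second.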
