[clug] DGSH - directed graph shell. adding parallelism to shell & pipes
Luke Mewburn
lukem-clug at mewburn.net
Tue Jul 25 03:41:05 UTC 2017
On Fri, Jul 14, 2017 at 01:58:33PM +1000, Brenton Ross via linux wrote:
| I've had a preliminary look at dgsh, and I'm not overly taken with the
| approach they took.
| They have replaced the normal Unix pipe interface for stdin and stdout
| with sockets, which means that the core utilities (and anything else you
| want to use via pipes) have to be the modified versions for dgsh.
Does the "dgsh-wrap" tool they provide assist with interfacing with
existing stdin/stdout tools?
https://www.spinellis.gr/sw/dgsh/dgsh-wrap.html
Could you just (ab)use socat to interface between stdin/stdout and
the dgsh sockets? I've used that technique elsewhere; socat is awesome
(if complex to use):
http://www.dest-unreach.org/socat/
| However, it got me wondering if there was another way, one that did not
| require modifying the programs.
|
| I think I could add a couple of extensions to VICI that would cover a
| lot of dgsh's capabilities, and have some further advantages.
|
| The first change would be to introduce named streams - the data flows
| could be given a label. If a program connected to a named stream used
| the name as a filename parameter, then VICI would substitute the label
| with the path to a Unix named pipe. This would allow programs to connect
| to multiple pipes. Of course it would not help for the cases where dgsh
| has modified the actual interface to the program, such as grep having
| multiple inputs and outputs, but you could create a modified grep with
| that capability that would still be compatible with bash etc.
If your platform provides /dev/fd/* (which Linux does), creative
use of shell redirection to open extra fds in the invocation of the
command, and passing the matching /dev/fd/N paths as filenames, may
just work.
(This can fail when tools assume that a file is seekable.)
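A minimal sketch of that /dev/fd trick, done from Python rather than
the shell. The helper name run_with_fd_inputs is made up for this
example, and it assumes a Linux-style /dev/fd and payloads small enough
to fit in a pipe buffer (they are written before the child starts):

```python
import os
import subprocess


def run_with_fd_inputs(argv_prefix, payloads):
    """Run a command, handing it each payload as a /dev/fd/N "filename".

    Hypothetical helper: assumes /dev/fd exists and that every payload
    fits in the kernel pipe buffer, since nothing reads until the child runs.
    """
    read_fds = []
    for data in payloads:
        r, w = os.pipe()
        os.write(w, data)
        os.close(w)          # closing the write end lets the reader see EOF
        read_fds.append(r)

    # The child keeps these descriptors under the same numbers (pass_fds),
    # so /dev/fd/N names the same pipe inside the child.
    argv = list(argv_prefix) + [f"/dev/fd/{fd}" for fd in read_fds]
    try:
        result = subprocess.run(
            argv, pass_fds=read_fds, stdout=subprocess.PIPE, check=True
        )
    finally:
        for fd in read_fds:
            os.close(fd)
    return result.stdout


if __name__ == "__main__":
    # cat sees two ordinary-looking filenames, each backed by a pipe.
    print(run_with_fd_inputs(["cat"], [b"first\n", b"second\n"]))
```

The unmodified tool just opens the paths it is given. As noted, anything
that tries to seek on those paths will fail, because they are pipes.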
| The second change is to introduce what I call a "manifold". This object
| can have any number of stdin and stdout streams. It would have several
| modes of operation:
|
| 1. Sequential, where it reads from its first stream until it's
| exhausted (closed), then reads from the second until that is
| finished, etc
| 2. Merge, where any input is sent immediately to the output (line
| by line)
| 3. Parallel, where reading blocks until something is ready on all
| the input streams. This would help to synchronise processing.
| 4. Copy, where each input is sent to all the output streams
| 5. Distribute, where the input lines are sent to the output streams
| in round-robin fashion.
|
| The manifold would start a new thread for each of its output streams to
| achieve the multiprocessing capability of dgsh.
|
| Hence, I think it would have been possible to create dgsh without having
| to fork the core utility programs to create a new set of incompatible
| programs.
That manifold idea is interesting.
As an implementation detail, personally I would probably prototype
that tool in Python using an async I/O mechanism and some
generator trickery, rather than using a thread per stream.
(Or just write it in C++ and play with boost::asio, using threads
only as a thread pool behind the boost::asio io_service runner.
I digress :)
That's just a personal choice - YMMV.
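As a rough illustration of the async approach, here is a sketch of
only the "Merge" mode using Python's asyncio and an async generator.
The names _pump, merge and merge_commands are invented for this
example; it is one possible shape for a prototype, not a design for
the actual tool:

```python
import asyncio


async def _pump(reader, queue):
    # Forward lines from one input stream into the shared queue.
    while True:
        line = await reader.readline()
        if not line:             # b"" signals EOF on this input
            break
        await queue.put(line)
    await queue.put(None)        # sentinel: this input is finished


async def merge(readers):
    """Yield lines from several asyncio.StreamReader inputs as they arrive."""
    queue = asyncio.Queue()
    tasks = [asyncio.create_task(_pump(r, queue)) for r in readers]
    remaining = len(tasks)
    while remaining:
        line = await queue.get()
        if line is None:
            remaining -= 1       # one more input has reached EOF
        else:
            yield line


async def merge_commands(*commands):
    """Run each argv concurrently and collect their merged stdout lines."""
    procs = [
        await asyncio.create_subprocess_exec(
            *argv, stdout=asyncio.subprocess.PIPE
        )
        for argv in commands
    ]
    lines = [line async for line in merge([p.stdout for p in procs])]
    await asyncio.gather(*(p.wait() for p in procs))
    return lines


if __name__ == "__main__":
    # Arrival order depends on scheduling, so it is not guaranteed.
    for line in asyncio.run(merge_commands(["echo", "one"], ["echo", "two"])):
        print(line)
```

Everything runs on one event loop, with one lightweight task per input
instead of one thread per stream. The other modes (sequential, copy,
distribute) would presumably be different consumers of the same readers.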
cheers,
Luke.