[clug] DGSH - directed graph shell. adding parallelism to shell & pipes

steve jenkin sjenkin at canb.auug.org.au
Tue Jul 11 23:35:41 UTC 2017

This is an interesting take on the 25+ year-old idea of ‘Multipipes’ in the Unix shell. It goes well beyond the ‘parallel’ command or managing a bunch of named pipes by hand.
This one is based on ‘bash’, with another 12 or so commands modified to read from & write to multiple pipes.

One that appeals to me is ‘grep’. It takes 0-2 input streams and writes to 0-4 streams.
	Available output streams (via arguments): matching files, non-matching files, matching lines, and non-matching lines
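
To make the idea of one command feeding several output streams concrete, here is a minimal Python sketch. It is not dgsh code and not how the modified ‘grep’ is implemented; the function name split_lines and the sample data are made up for illustration. It reads the input once and routes each line to either a "matching" or a "non-matching" stream.

```python
import io
import re


def split_lines(source, pattern, matching, non_matching):
    """Read `source` once; write each line to exactly one of two output streams."""
    regex = re.compile(pattern)
    for line in source:
        # re.search mimics grep's "match anywhere in the line" behaviour.
        target = matching if regex.search(line) else non_matching
        target.write(line)


# Hypothetical sample input, standing in for a pipe.
source = io.StringIO("error: disk full\nok: started\nerror: timeout\n")
hits, misses = io.StringIO(), io.StringIO()
split_lines(source, r"^error", hits, misses)

print(hits.getvalue(), end="")
print(misses.getvalue(), end="")
```

In a multipipe shell the two outputs would be pipes into further commands rather than in-memory buffers, so neither stream needs an intermediate file.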

The paper uses the same examples & diagrams as the website, but has much more discussion, a good history of the topic and 46 references.

The design & examples are about a very Unix-y thing: streaming data and processing it just once, without having to save intermediate files and reprocess them multiple times.
In a world of many cores and ‘Big Data’, being able to ‘naturally’ process data streams in parallel is an important new facility.
It’s even useful at the other end of the spectrum, where I/O bandwidth & storage space are limited: low-power, low-performance “IoT” devices like Single Board Computers and low-end smartphones.
Will we see a version built for ‘busybox’? It’s possible because of the design’s “coupling and cohesion” choices.
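
As a rough illustration of the "read the stream once, process it several ways in parallel" idea above, here is a small Python sketch. It makes no claim to resemble dgsh internals: dgsh connects separate processes with pipes, whereas this uses threads and bounded queues only to keep the example short (so it shows the topology, not real multi-core speedup). The names fan_out and drain, and the sample numbers, are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks end of stream on every queue


def drain(inbox):
    """Yield items from a queue until the end-of-stream sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            return
        yield item


def fan_out(source, consumers):
    """Read `source` once and feed every item to each consumer in its own thread."""
    results = {}
    # Bounded queues play the role of pipes: a slow consumer applies back-pressure.
    inboxes = {name: queue.Queue(maxsize=16) for name in consumers}

    def run(name, func):
        results[name] = func(drain(inboxes[name]))

    threads = [
        threading.Thread(target=run, args=(name, func))
        for name, func in consumers.items()
    ]
    for t in threads:
        t.start()
    for item in source:  # the single pass over the input
        for inbox in inboxes.values():
            inbox.put(item)
    for inbox in inboxes.values():
        inbox.put(SENTINEL)
    for t in threads:
        t.join()
    return results


numbers = iter([3, 1, 4, 1, 5, 9, 2, 6])  # an iterator can only be read once
results = fan_out(
    numbers,
    {"total": sum, "largest": max, "count": lambda it: sum(1 for _ in it)},
)
print(results)
```

Each consumer sees the whole stream, yet the source is traversed only once and nothing is written to disk in between.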

They’ve thought about the design and implementation, restricting it to a small syntax change to the (bash) shell.
Not sure how well tested & debugged it is, but given the design you’d think there wouldn’t be many bugs.



dgsh — directed graph shell
> The directed graph shell, dgsh (pronounced /dæɡʃ/ — dagsh), provides an expressive way to construct sophisticated and efficient big data set and stream processing pipelines using existing Unix tools as well as custom-built components.
> It is a Unix-style shell (based on bash) allowing the specification of pipelines with non-linear non-uniform operations.
> These form a directed acyclic process graph, which is typically executed by multiple processor cores, thus increasing the operation's processing throughput.
> If you want to get a feeling on how dgsh works in practice, skip right down to the examples section.
> For a more formal introduction to dgsh or to cite it in your work, see:
> Diomidis Spinellis and Marios Fragkoulis. Extending Unix Pipelines to DAGs. IEEE Transactions on Computers, 2017. doi: 10.1109/TC.2017.2695447

Nuclear magnetic resonance processing - 12-stage pipeline run in parallel

Extending Unix Pipelines to DAGs
	Diomidis Spinellis, Senior Member, IEEE
	Marios Fragkoulis
> Abstract—The Unix shell dgsh provides an expressive way to construct sophisticated and efficient non-linear pipelines using standard Unix tools, as well as third-party and custom-built components. 
> Dgsh allows the specification of pipelines that perform non-uniform non-linear processing. 
> These form a directed acyclic process graph, which is typically executed by multiple processor cores, thus increasing the processing task’s throughput. 
> A number of existing Unix tools have been adapted to take advantage of the shell’s multiple pipe input/output capabilities. 
> The shell supports visualization of the process graphs, which can also aid debugging. 
> Dgsh was evaluated through a number of common data processing and domain-specific examples, and was found to offer an expressive way to specify processing topologies, while also generally increasing processing throughput.
> Index Terms—Process-level parallelism, Unix, pipeline, pipes and filters architecture

Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin
