[clug] DGSH - directed graph shell. adding parallelism to shell & pipes

Brenton Ross rossb at fwi.net.au
Wed Jul 12 11:44:41 UTC 2017


Steve,

Thanks for posting this. 
I have been contemplating adding something similar to VICI. 
I will have to read up on this to see how they manage the interaction
between the data flow and the flow of control. 

Cheers
Brenton

On Wed, 2017-07-12 at 09:35 +1000, steve jenkin via linux wrote:

> This is an interesting take on the 25+ year-old idea of ‘Multipipes’ in the Unix shell. It goes well beyond the ‘parallel’ command or managing a bunch of named pipes.
> This one is based on ‘bash’, with another 12 or so commands modified to read from & write to multiple pipes.
> 
> One that appeals to me is ‘grep’. It takes 0-2 input streams and writes to 0-4 streams.
> 	Available output streams (via arguments): matching files, non-matching files, matching lines, and non-matching lines
> 
> The paper uses the same examples & diagrams as the website, but has much more discussion, a good history of the topic and 46 references.
> 
> The design & examples are about a very Unix-y thing: streaming data and processing it just once, without having to save intermediate files and reprocess them multiple times.
> In a world of many cores and ‘Big Data’, being able to ‘naturally’ process data streams in parallel is an important new facility.
> It’s even useful at the other end of the spectrum, where I/O bandwidth & storage space are limited: on low-power, low-performance “IoT” devices like Single Board Computers and low-end smartphones.
> Will we see a version built for ‘busybox’? It’s possible because of the design’s “coupling and cohesion” choices.
> 
> They’ve thought about the design and implementation, restricting it to a small syntax change to the (bash) shell.
> Not sure how well tested & debugged it is, but because of the design you’d think there wouldn’t be many bugs.
> 
> regards
> steve
> 
> ———————————
> 
> dgsh — directed graph shell
> <https://www.spinellis.gr/sw/dgsh/#intro>
> > The directed graph shell, dgsh (pronounced /dæɡʃ/ — dagsh), provides an expressive way to construct sophisticated and efficient big data set and stream processing pipelines using existing Unix tools as well as custom-built components.
> > It is a Unix-style shell (based on bash) allowing the specification of pipelines with non-linear non-uniform operations.
> > These form a directed acyclic process graph, which is typically executed by multiple processor cores, thus increasing the operation's processing throughput.
> > 
> > If you want to get a feeling on how dgsh works in practice, skip right down to the examples section.
> > 
> > For a more formal introduction to dgsh or to cite it in your work, see:
> > Diomidis Spinellis and Marios Fragkoulis. Extending Unix Pipelines to DAGs. IEEE Transactions on Computers, 2017. doi: 10.1109/TC.2017.2695447
> 
> 
> Nuclear magnetic resonance processing - 12-stage pipeline run in parallel
> <https://www.spinellis.gr/sw/dgsh/#NMRPipe>
> 
> 
> Extending Unix Pipelines to DAGs
> 	Diomidis Spinellis, Senior Member, IEEE
> 	Marios Fragkoulis
> <http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7903579>
> > 
> > Abstract—The Unix shell dgsh provides an expressive way to construct sophisticated and efficient non-linear pipelines using standard Unix tools, as well as third-party and custom-built components. 
> > Dgsh allows the specification of pipelines that perform non-uniform non-linear processing. 
> > These form a directed acyclic process graph, which is typically executed by multiple processor cores, thus increasing the processing task’s throughput. 
> > A number of existing Unix tools have been adapted to take advantage of the shell’s multiple pipe input/output capabilities. 
> > The shell supports visualization of the process graphs, which can also aid debugging. 
> > Dgsh was evaluated through a number of common data processing and domain-specific examples, and was found to offer an expressive way to specify processing topologies, while also generally increasing processing throughput.
> > 
> > Index Terms—Process-level parallelism, Unix, pipeline, pipes and filters architecture
> 
> --
> Steve Jenkin, IT Systems and Design 
> 0412 786 915 (+61 412 786 915)
> PO Box 38, Kippax ACT 2615, AUSTRALIA
> 
> mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin
> 
> 



