[clug] Dependency resolution for non-source code

Tue Feb 8 02:29:37 MST 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/07/2011 07:47 AM, Brad Hards wrote:
> G'day Paul,
> 
> On Sunday, February 06, 2011 10:24:10 pm Paul Wayper wrote:
>> Alternately, I can write a program to work out the commands to run and even
>> provide the order, but I haven't yet worked through the dependency
>> resolution and the whole 'determining which files need to be updated'
>> process.
> Sounds like this is the root cause of the problem. If you don't know this, you 
> can't really express it, and if you can't express it, then the tools can't 
> possibly infer it.

Sorry, I was perhaps a bit inaccurate there:

I know how the dependencies work - to munge some Makefile syntax:

"%{num} - %{longname}.ogg": "%{num} - %{longname}.wav"
	oggenc -b 160 ${ogg_options} \
	 "%{num} - %{longname}.wav" -o "%{num} - %{longname}.ogg"

"%{num} - %{longname}.mp3": "%{num} - %{longname}.wav"
	lame --abr 160 ${mp3_options} \
	 "%{num} - %{longname}.wav" "%{num} - %{longname}.mp3"

%{num}-ogg.torrent: "%{num} - %{longname}.ogg"
	mktorrent "%{num} - %{longname}.ogg" -o %{num}-ogg.torrent

%{num}-mp3.torrent: "%{num} - %{longname}.mp3"
	mktorrent "%{num} - %{longname}.mp3" -o %{num}-mp3.torrent

The complication is that ${ogg_options} and ${mp3_options} are built from
much the same information (artist name, album name, track name, etc.); some of
it is the same for every track, some of it is derived from the long name, and
some of it needs to be sourced from a separate config file.

What I'm trying to figure out, in my own program, is how to start building the
mp3.torrent as soon as the mp3 file has been built, and how to run as many of
the build steps in parallel as possible.  I started looking at
Algorithm::Dependency, which seemed to handle working out what needs to be
done in what order, but it didn't seem to do more than statically compute a
dependency graph from a predefined and precalculated set of relationships -
which I'm fairly sure I can work out on my own.
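
As a rough illustration of the scheduling half of that problem (not of
Algorithm::Dependency itself), here is a minimal Python sketch using the
standard library's graphlib and concurrent.futures.  The names run_parallel
and build, and the shortened file names, are made up for the example; build
stands in for whatever actually runs oggenc, lame or mktorrent.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_parallel(deps, build, max_workers=4):
    """Run build(node) for every node, each as soon as its prerequisites finish.

    deps maps each target to the set of things it is built from.
    Returns the nodes in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> node currently being built
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Everything whose prerequisites are all done can start now.
            for node in sorter.get_ready():
                running[pool.submit(build, node)] = node
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the build step
                node = running.pop(future)
                sorter.done(node)  # unlocks the nodes that depend on it
                finished.append(node)
    return finished
```

With deps = {"01.mp3": {"01.wav"}, "01-mp3.torrent": {"01.mp3"}}, the torrent
step is only submitted once the mp3 step has been marked done.  Source files
such as 01.wav also pass through build, so it should do nothing for them.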

What I'd figured out so far was to start with a list of files that are
present.  For each one, create an object (i.e. just a dict, really) which
remembers the metadata about that file.  Then work out which targets it can be
used to generate and start each generation process in turn (serially at first,
parallelism later); each process puts its target back into the sources list.
We alternate between finding new rules to apply to sources, and turning
sources into targets, until no rules apply to any of the current 'sources'
(i.e. all 'sources' are actually end products).
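
A serial version of that loop might look something like the following Python
sketch.  Everything named here (build_all, suffix_rule, the "path" and
"artist" keys, and the simplified ".ogg.torrent" naming) is an assumption for
illustration only, and the rules just rename files rather than running any
encoder.

```python
def suffix_rule(src_ext, dst_ext):
    """Hypothetical rule: anything ending in src_ext yields a dst_ext file."""
    def applies(item):
        return item["path"].endswith(src_ext)

    def make(item):
        # A real rule would run oggenc, lame or mktorrent at this point.
        new_path = item["path"][:-len(src_ext)] + dst_ext
        return {**item, "path": new_path}  # metadata is passed along

    return applies, make


def build_all(items, rules):
    """Apply rules to sources until only end products remain."""
    sources = list(items)
    end_products = []
    while sources:
        item = sources.pop(0)
        produced = [make(item) for applies, make in rules if applies(item)]
        if produced:
            sources.extend(produced)   # new targets become sources in turn
        else:
            end_products.append(item)  # no rule applies: an end product
    return end_products


rules = [
    suffix_rule(".wav", ".ogg"),
    suffix_rule(".wav", ".mp3"),
    suffix_rule(".ogg", ".ogg.torrent"),
    suffix_rule(".mp3", ".mp3.torrent"),
]
```

Calling build_all([{"path": "01 - Intro.wav", "artist": "Someone"}], rules)
walks wav to ogg and mp3, then each of those to its torrent, and returns only
the two torrent objects, each still carrying the artist entry.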

Each target rule has an associated subroutine that turns that source object
into a target object, passing on whatever metadata it needs in the process.
Each target sub does the test of whether or not its target needs to be
regenerated internally, using a helper that knows the last generated checksum
for the source - if that differs from the current checksum, the target needs
to be regenerated.  (The rules are hard coded subroutines to start with.)
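
The checksum helper could be sketched along these lines in Python.  The class
name ChecksumStore, the JSON file used to remember checksums, the choice of
SHA-256, and the extra "target is missing" check are all assumptions made for
the example rather than anything already decided.

```python
import hashlib
import json
import os


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ChecksumStore:
    """Remembers the source checksum each target was last generated from."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.seen = {}
        if os.path.exists(db_path):
            with open(db_path) as handle:
                self.seen = json.load(handle)

    def needs_rebuild(self, source, target):
        key = f"{source} -> {target}"
        if not os.path.exists(target):
            return True  # assumption: a missing target always needs building
        return self.seen.get(key) != file_checksum(source)

    def record(self, source, target):
        """Call after a successful build to store the source's checksum."""
        self.seen[f"{source} -> {target}"] = file_checksum(source)
        with open(self.db_path, "w") as handle:
            json.dump(self.seen, handle)
```

A target sub would call needs_rebuild() first, run its command only if that
returns true, and then call record().  Unlike make's timestamp comparison,
touching the source without changing its contents does not trigger a rebuild.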

But I figured this had to be such a well-trodden problem that I could use an
existing tool rather than hack something up on my own.  This may seem to be
ignoring Tridgell's Maxim ("why do you need an excuse to reinvent the
wheel?"), but here I am writing source code because nothing I've found so far
does what I want.

Have fun,

Paul
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk1RDPAACgkQu7W0U8VsXYJGUACeNu2/ELfce2f/VQ2RMmAKLaEN
DdoAnjXXgbIXHsZpNbV33dfDFFYcNyDW
=OhVD
-----END PGP SIGNATURE-----