[clug] July Programming SIG: the fourteen principles for new programmers

Paul Wayper paulway at mabula.net
Wed Jul 16 09:20:45 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David wrote:
| Paul Wayper wrote:
|> 2) If you do have to write from scratch, try to think of the large cases.
|> The more you build in limitations, the more you will inevitably have to
|> remove them later.  Your choices in the future will be harder than they
|> are now.
|
| If you try and think of all the limitations you end up with a floating
| car.  It drives everywhere but it's a crap boat and a crap car.

I was trying to avoid the "plan everything out in advance" process that has
doomed so many projects.  As I said in 10, you learn more by writing than by
theorising.  I guess a better way to say what I wanted, at the risk of
revisionism, is:

2) When starting each module, think intelligently (but only a little) about its
abilities and scope.  Don't build in limitations (e.g. constants, fixed
behaviour or dependence on external configuration) if it doesn't take much
more work to be flexible.  This doesn't mean "assume that we might not have
Perl in the standard location", it means "don't assume that you'll always get
data from that file in exactly those positions".
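Here is a minimal Python sketch of the "don't assume exactly those positions" idea. It assumes CSV input with a header row; `read_records` and the column names `name` and `phone` are hypothetical. Fields are looked up by header name, so reordering or adding columns doesn't silently shift the data, and a missing column is reported by name:

```python
import csv
import io


def read_records(stream, wanted=("name", "phone")):
    """Return the wanted columns from a CSV stream, looked up by header name.

    Column order in the input does not matter (hypothetical column names);
    a missing column is reported instead of silently reading the wrong field.
    """
    reader = csv.DictReader(stream)
    present = reader.fieldnames or []
    missing = [col for col in wanted if col not in present]
    if missing:
        raise ValueError("input is missing column(s): " + ", ".join(missing))
    return [{col: row[col] for col in wanted} for row in reader]


# Same data with a different column order gives the same records.
first = io.StringIO("name,phone\nAlice,6555 1234\n")
second = io.StringIO("phone,email,name\n6555 1234,a@example.org,Alice\n")
print(read_records(first) == read_records(second))  # True
```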

For instance, I'm writing a program that takes two directories and shuffles
the files around between them, e.g. so I have a different set of music on my
music player each time I take it away.  It's not difficult to design that to
deal with multiple source directories, so I'm doing that, even though that's
not in my current configuration.  Likewise, I could assume that it will always
be able to do an 'export' and an 'import' simultaneously, since the source file
share is on a network, but what if that setup changes?  I shouldn't build in
bad performance for that case when it takes little extra effort to write a
'synchronous copies' function as well.
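This isn't the program described above; it's a rough, hypothetical Python sketch of the same design idea (`gather_files`, `pick_files` and `copy_files` are made-up names). It takes any number of source directories rather than one, and the `concurrent` flag switches between parallel copies and plain one-at-a-time ("synchronous") copies, so the simpler path is there if the storage setup changes:

```python
import random
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gather_files(source_dirs):
    """Collect regular files from any number of source directories."""
    files = []
    for directory in source_dirs:
        files.extend(p for p in sorted(Path(directory).rglob("*")) if p.is_file())
    return files


def pick_files(source_dirs, count, seed=None):
    """Randomly choose up to `count` files across all the source directories."""
    files = gather_files(source_dirs)
    rng = random.Random(seed)
    return rng.sample(files, min(count, len(files)))


def copy_files(files, dest_dir, concurrent=True, workers=4):
    """Copy files into dest_dir, either in parallel or one at a time.

    Assumption: file names are unique across sources; if two sources share
    a name, one copy overwrites the other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        return Path(shutil.copy2(src, dest / Path(src).name))

    if concurrent:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(copy_one, files))
    # Synchronous fallback: one copy at a time.
    return [copy_one(src) for src in files]
```

Adding a third source directory later is then a change to the caller's list, not to the code.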

OTOH, we've all seen what happens when people hard wire configuration into
programs, or hard wire which configuration file it reads, and so forth.  One
program I worked with recently assumed that if you fork()ed off a process in C
then the child could use the parent's database handle without affecting other
children, something that MySQL (at least in this configuration) does not seem
to like at all.  Some of those things are easy to fix; others require lots of
extra work in substantial rewrites.  By thinking ahead a little, you may save
yourself lots of problems later on.
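A minimal sketch of the safer pattern, written in Python rather than C and with SQLite standing in for MySQL purely so it runs without a server: the parent closes its own connection before forking, and each child opens a fresh one. It relies on `os.fork`, so it is POSIX-only; `run_workers` and the `results` table are hypothetical.

```python
import os
import sqlite3


def run_workers(db_path, n_workers):
    """Fork n_workers children; each writes one row over its OWN connection.

    Returns the number of children that did not exit cleanly.
    """
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE IF NOT EXISTS results (worker INTEGER)")
    setup.commit()
    setup.close()  # close before forking so no child inherits this handle

    pids = []
    for worker_id in range(n_workers):
        pid = os.fork()
        if pid == 0:  # in the child
            status = 1
            try:
                conn = sqlite3.connect(db_path, timeout=30)  # fresh handle
                conn.execute("INSERT INTO results (worker) VALUES (?)", (worker_id,))
                conn.commit()
                conn.close()
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        pids.append(pid)

    failures = 0
    for pid in pids:
        _, wait_status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
            failures += 1
    return failures
```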

This doesn't mean that you should think ahead a lot, or get carried away with
scope early on.  While we've been talking about the LUG-in-a-box concept, I've
been keeping in the back of my mind a general club web system that I've wanted
to build for a while.  If there are chances to slot in ideas that could be
used for that system (and thus CLUG and the LUG-in-a-box programmers could
contribute to the wider community) then I do so.  But letting the meeting
management system get bogged down in whether we should support payment
systems, and if so what type, and which provider we might use in that case,
and how to store the financial data securely, leads nowhere and writes no
code.  In that, I think we both agree.

|> 3) Write things to be modular.  Make them behave well.  Be liberal in
|> what you accept, and strict in what you output.  Use standards where
|> possible - after all, there are so many to choose from.
|
| I really don't like being liberal in what you accept.  It makes your
| code complex and hideous.  It also leads to unpredictable behaviour: if
| your program takes a date string, you have terrible issues.  What's
| 08/05/03?  year/month/day, day/month/year, month/day/year?  And if you
| have two modules written by different people who are liberal in what
| they accept, are they going to parse it in the same way?

Your date example isn't liberal; it's pathological, and any reading of it ends
up being an arbitrary decision.  Accepting 08/05/03, 8/5/03 and 8/5/2003 as
day/month/year strings is a better example of being liberal in what you
accept.  Faced with a genuinely ambiguous date, good code would probably throw
a warning or exception, depending on how severe the consequences of getting the
date wrong are.  I
just had eleven million rows of completely useless data go into a database
because Perl's DBI and MySQL didn't catch the fact that I was sending a date
string as an integer and vice versa.  So I deleted it - the data is still
waiting to be imported correctly and I haven't lost anything.
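Here is a sketch of that kind of liberal-but-not-pathological date parsing. It assumes the day/month/year order is known in advance and that two-digit years mean 20xx (both are assumptions; `parse_dmy` is a hypothetical name). Anything that doesn't fit, including impossible dates such as 31/02/2003, raises `ValueError` rather than being guessed at. Handing a real `datetime.date` to the database layer, instead of a string, may also help avoid the kind of date/integer mix-up described above.

```python
import re
from datetime import date

# Day/month/year with a 1-2 digit day and month and a 2- or 4-digit year.
_DMY = re.compile(r"^\s*(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\s*$", re.ASCII)


def parse_dmy(text):
    """Parse '08/05/03', '8/5/03' or '8/5/2003' as 8 May 2003."""
    match = _DMY.match(text)
    if match is None:
        raise ValueError(f"not a day/month/year date: {text!r}")
    day, month, year = (int(group) for group in match.groups())
    if len(match.group(3)) == 2:
        year += 2000  # assumption: two-digit years are in the 2000s
    return date(year, month, day)  # raises ValueError for e.g. 31/02/2003
```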

It's a true pain in the arse for users to be told "Please format your phone
number correctly" when there's absolutely no instruction on which way to
format it, and when 6555 1234 is a perfectly valid way to write a phone
number.  If the database absolutely cannot have a phone number with spaces in
it, and there are spaces in what you've got supplied, then at the very least
the user should be told 'please remove spaces'.  Or, if the programmer could
be reasonably certain that removing spaces from the phone number didn't
actually change its meaning, then just remove the offending characters for the
user.
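A minimal sketch of that approach, assuming spaces are the only characters that can be dropped without changing a number's meaning (`normalise_phone` is hypothetical). Anything else is rejected with a message that says exactly what is allowed:

```python
import string


def normalise_phone(text):
    """Return the phone number with spaces removed, e.g. '6555 1234' -> '65551234'.

    Assumption: spaces never change a number's meaning, so they are
    stripped silently; anything else is rejected with a specific message.
    """
    digits = text.replace(" ", "")
    if not digits:
        raise ValueError("Please enter a phone number.")
    if any(ch not in string.digits for ch in digits):
        raise ValueError("Phone numbers may contain only digits and spaces.")
    return digits
```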

Maybe my point here is a bit of a reiteration of #2 and #8 - a bit of thought
about what you _might_ get in the future may save you a lot of heartache in
debugging (and a lot of money in bad things going wrong).

I also realise that the 'be liberal in what you accept' principle has caused lots of
problems in the HTML rendering domain.  The cycle of browser writers trying to
cope with nearly-but-not-quite-valid syntax and web page authors not noticing
that they're writing nearly-but-not-quite-valid syntax is a feedback-loss loop
we should try to avoid.  The key there is that the browser authors have no way
to take a big stick to the authors of the dodgy HTML; we as programmers
probably have a lot more influence over the people that produce the data our
programs consume.  When you lose the ability to give useful feedback that would
prevent the error in future, you should (in my opinion) go back to being
stricter and let your users do your complaining for you.
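One way to be liberal without feeding that loop is to accept the input but keep a record of everything that was wrong with it. The sketch below assumes a simple `key=value` input format (`parse_settings` is hypothetical). In lenient mode it tolerates stray whitespace and skips malformed lines, recording every problem so it can be reported back to whoever produced the data. In strict mode it raises on the first problem instead.

```python
def parse_settings(lines, strict=False):
    """Parse 'key=value' lines, returning (settings, problems).

    Lenient mode tolerates stray whitespace and skips malformed lines but
    records each problem; strict mode raises ValueError on the first one.
    """
    settings = {}
    problems = []

    def complain(message):
        if strict:
            raise ValueError(message)
        problems.append(message)

    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are always fine
        if "=" not in line:
            complain(f"line {lineno}: no '=' found, line ignored")
            continue
        key, value = line.split("=", 1)
        if key != key.strip() or value != value.strip():
            complain(f"line {lineno}: stray whitespace around {key.strip()!r}")
        settings[key.strip()] = value.strip()
    return settings, problems
```

The `problems` list is what gives you the "big stick": it can be logged or mailed back to the producer of the data even when the program carries on.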

|> 10) Program the first one to be thrown away, because you will anyway.
|> This works with 2 because you will learn more about what your program
|> needs to do by writing it than in theorising, and you want to produce
|> code not theorise.  You may find it easier to not tell your
|> management about the first one in case they want to keep it.
|
| I think this concept doesn't work once you start to scale.  If it's
| something that takes two days and has one person working on it then this
| might be a good idea.  For a product that takes several years to develop
| it's completely insane.

I think Martijn's point here is brilliant - this need not be the entire
program to be thrown away, but only small parts of it in an evolutionary
process.  The Linux kernel gains new interfaces and loses old ones over time,
and what we see is increased performance built up over many programming hours,
not the entire kernel being thrown away and started again.  Joel Spolsky also points out that
keeping old code often keeps the little clevernesses and trap-avoidances that
one can easily forget to program in when writing a new one.  So throwing out
an entire application should only be a last resort.
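That evolutionary approach can be as small as keeping an old interface alive as a thin wrapper over its replacement while callers migrate. Here is a hypothetical Python sketch (`split_name` and `get_surname` are made-up names) using the standard `warnings` module:

```python
import warnings


def split_name(full_name):
    """New interface: return (given names, family name), ignoring extra spaces."""
    parts = full_name.split()
    if not parts:
        raise ValueError("empty name")
    return " ".join(parts[:-1]), parts[-1]


def get_surname(full_name):
    """Old interface, kept as a thin wrapper so existing callers keep working."""
    warnings.warn(
        "get_surname() is deprecated; use split_name()",
        DeprecationWarning,
        stacklevel=2,
    )
    return split_name(full_name)[1]
```

Once nothing calls `get_surname()` any more it can be deleted, without a rewrite of everything around it.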

The point I feel we were trying to reach in our conversation at the PSIG was
that we often ended up doing good programming because we'd made, and learned
from, mistakes in the past.  As we said, learning from the mistakes of others
is better than learning from your own.  You and I both probably now do things
like using sensible variable and function names, formatting our code to
consistent tab and bracketing standards, and extracting repeated code into
functions without thinking about it.  These are all things that we could (and perhaps should) have touched on
too.  I guess the feeling I got from the meeting was that these were more
general bits of advice to complement some of the more at-the-coal-face lessons
about actual coding that should also be taught.  (This, I freely admit, is one
of the goals of the LUG-in-a-box project.)

Anyway, good points, David and Martijn!

Have fun,

Paul
|
| Prototyping is one thing; creating a prototype isn't the same as
| creating the end program.
|
| Refactoring code after it's been worked on for a while is also a good
| idea, but it's not throwing away the earlier work and shouldn't be
| presented as such.
|
|
| David
|

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkh9vWwACgkQu7W0U8VsXYKJlQCfS+gE/LBbzsp2WG+cheRI7Jzh
2OMAniT5lrgT9hg6TYccAxc+bz8FPdtb
=MT7U
-----END PGP SIGNATURE-----


More information about the linux mailing list