[clug] Flash memory: Load-levelling question.

Fri Mar 27 15:42:04 GMT 2009

On Fri, Mar 27, 2009 at 10:36 AM, steve jenkin <sjenkin at canb.auug.org.au> wrote:
> With flash (NAND) memory, especially CF, 'load levelling' is critically
> important in creating a 'useful life'.
>
> But doesn't it make the assumption that the whole of the array is being
> written? Nothing I've read details the algorithms used, so I'm guessing
> how it works.
>
> When you've got FAT16 being used in a camera (write-upload-wipe), it's
> pretty obvious load-levelling will be a perfect solution because there
> is almost no static data 'blocking' writes to cells.
>
> But for an embedded device where:
>  - most of the system image is static
>  - the image mostly fills the CF
>
> Won't simple load-levelling be of little benefit in this case?
>
> Imagine you've got a 64Mb CF with 63Mb of static image.
> Simplistically, all the writes will be confined to 1Mb of free memory.
> i.e. just 1-in-64 blocks will be candidates for load-levelling writes.

It depends on the algorithm being used, but take this hypothetical,
simplified, hardware-based, filesystem-independent algorithm as an
example:

Disclaimer: as Daniel mentioned, the real hardware algorithms tend to
be trade secrets, and since I've never worked for one of these
companies I don't know how any of them actually work. This is just how
I imagine it may be done:

On a random fraction of block erases, swap that block with another
randomly selected block. Although you only have 1MB free in the
filesystem, the algorithm doesn't actually know which blocks those are
(as it's filesystem independent), so some of the time when blocks are
written it just randomly remaps them across the whole device. So, say
there is only a single block being written in the filesystem, and say
it maps to block 89 to begin with. It goes through a number of writes,
but then the algorithm randomly swaps block 89 with block 36: the
static data that lived in block 36 is copied into the worn block 89,
and block 36 now gets those writes instead. Then it does another swap
and that block gets the writes instead, then another, then another...
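To make the random-swap idea concrete, here is a small simulation of the scheme described above. It is a sketch under my own assumptions, not any vendor's algorithm: the function name `simulate_random_swap`, the 1% swap probability and the 64-block chip are all made up for illustration. Only one logical block is ever rewritten (the other 63 are "static"), yet the erases end up spread over every physical block.

```python
import random


def simulate_random_swap(num_blocks=64, num_writes=200_000,
                         swap_probability=0.01, seed=1):
    """Hypothetical filesystem-independent wear leveller (a sketch).

    Only logical block 0 is ever rewritten; the other blocks hold static
    data the leveller knows nothing about. On each erase of the hot block,
    with probability `swap_probability`, it is swapped with a randomly
    chosen block. Returns the erase count of every physical block.
    """
    rng = random.Random(seed)
    mapping = list(range(num_blocks))  # logical block -> physical block
    erase_counts = [0] * num_blocks    # erases seen by each physical block
    hot = 0                            # the one logical block being written

    for _ in range(num_writes):
        erase_counts[mapping[hot]] += 1  # rewriting a flash block erases it
        if rng.random() < swap_probability:
            other = rng.randrange(num_blocks)
            if other == hot:
                continue
            current, target = mapping[hot], mapping[other]
            # The swap rewrites both physical blocks: the static data moves
            # onto the worn block and the hot data moves onto the target.
            erase_counts[current] += 1
            erase_counts[target] += 1
            mapping[hot], mapping[other] = target, current
    return erase_counts


counts = simulate_random_swap()
# Without remapping, a single physical block would take all 200,000 erases.
print("most-erased:", max(counts), "least-erased:", min(counts))
```

Note that nothing in the simulation looks at which blocks hold "used" or "free" data, which is why the same spreading happens whether the chip is nearly full or nearly empty.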

The net result is that, over time, wear is spread across the whole
chip despite your lack of free space. I tried to emphasise that the
algorithm would be filesystem independent because the implication is
that this same remapping will occur even if the figures are reversed
and you have only 1MB used and 63MB free. It doesn't know which blocks
are used by files and which are "free" as far as the filesystem is
concerned; as far as it knows they are *all* in use at all times, and
the only thing it can see is which ones are being written to.

--warning: tangent ahead--

More sophisticated algorithms may keep track of which blocks are
written the most and which the least, so that they can always swap the
most written block with the least written one, which should
significantly increase the lifespan. I would really only trust a
filesystem-dependent wear levelling algorithm when it has been built
into the filesystem itself (such as JFFS2), but that can only be used
when the kernel has direct access to the MTD (Memory Technology
Device, including various NAND and NOR chips). And no, there's no
point in using the block2mtd and mtdchar* drivers to get that
filesystem onto the CF. That driver is for testing and accessing MTD
images only (e.g., to retrieve files; but please back up the image
first, and please triple-check that the erase size you pass in matches
the one used by the chip the image is for/from). It is not for putting
a special-purpose filesystem onto something that defeats its purpose
by design.
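The count-based variant from the first sentence of the paragraph above can be sketched the same way. Again this is an illustration under assumptions, not a real controller's algorithm: `simulate_count_based` and the `threshold` parameter are hypothetical. The leveller keeps a per-block erase tally and, once the block being hammered has been erased `threshold` times more than the least-erased block, swaps their contents.

```python
def simulate_count_based(num_blocks=64, num_writes=20_000, threshold=50):
    """Hypothetical count-based leveller: swap most-worn with least-worn.

    Only logical block 0 is rewritten. Once the physical block behind it
    has been erased more than `threshold` times more than the least-erased
    physical block, the two blocks swap contents.
    """
    mapping = list(range(num_blocks))  # logical block -> physical block
    erase_counts = [0] * num_blocks    # per-physical-block erase tally
    hot = 0                            # the one logical block being written

    for _ in range(num_writes):
        current = mapping[hot]
        erase_counts[current] += 1
        coldest = min(range(num_blocks), key=erase_counts.__getitem__)
        if erase_counts[current] - erase_counts[coldest] > threshold:
            cold_logical = mapping.index(coldest)  # whose data lives there
            # Both physical blocks are rewritten by the swap.
            erase_counts[current] += 1
            erase_counts[coldest] += 1
            mapping[hot], mapping[cold_logical] = coldest, current
    return erase_counts


counts = simulate_count_based()
print("spread between most- and least-erased:", max(counts) - min(counts))
```

Unlike the random version, this keeps the gap between the most and least worn blocks bounded (here by `threshold + 2`), at the cost of having to store and update that tally somewhere, which is one of the questions raised below.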

*mtdblock is also provided to support the uninitiated, who often get
confused by not having to specify /dev/ with the char bindings (mount
-t jffs2 mtd0 /mnt vs. mount -t jffs2 /dev/mtdblock0 /mnt), but my
understanding is that using mtdblock on a real MTD is a bad idea
(unless it's for /, when there isn't a choice) because the char
bindings give the filesystem control over useful things such as
erasing blocks...

Food for thought (the "oh wait, it's not quite that simple" section):
How does the hardware store its block mapping, and how is that being
wear levelled?
How does it store its sector write tally, if it has one, and how is
that being wear levelled?
How is the list of bad blocks stored? Almost all NAND chips ship with
bad blocks these days, as it is prohibitively expensive to make ones
that don't; this is why the exact amount of storage available on
newly purchased flash sticks tends to vary from stick to stick. Of
course, it may also want to store a list of blocks that have worn out
since then, so they can be remapped to spare blocks. Does this list
need to be wear levelled as well, or could some careful consideration
eliminate this need?
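As a rough picture of the bad-block side, here is a hypothetical remapping table with a pool of spare blocks. The class `BlockRemapper` and its parameters are my own invention for illustration. Factory-bad blocks are skipped when the table is built, which is why usable capacity differs between otherwise identical chips, and blocks that wear out later are swapped for spares. It deliberately keeps the table in RAM and ignores where it would be persisted, which is exactly the open question above.

```python
class BlockRemapper:
    """Hypothetical logical -> physical table with a spare-block pool."""

    def __init__(self, physical_blocks, factory_bad, spares):
        good = [b for b in range(physical_blocks) if b not in factory_bad]
        if len(good) <= spares:
            raise ValueError("not enough good blocks for the spare pool")
        # The last `spares` good blocks are held back as replacements.
        self.spare_pool = good[len(good) - spares:]
        self.table = good[:len(good) - spares]  # logical i -> table[i]
        self.bad = set(factory_bad)

    @property
    def capacity(self):
        """Number of logical blocks exposed to the filesystem."""
        return len(self.table)

    def physical(self, logical):
        return self.table[logical]

    def retire(self, logical):
        """Mark the block behind `logical` as worn out; remap it to a spare."""
        if not self.spare_pool:
            raise RuntimeError("out of spare blocks")
        self.bad.add(self.table[logical])
        self.table[logical] = self.spare_pool.pop()
        return self.table[logical]


chip = BlockRemapper(physical_blocks=64, factory_bad={3, 17}, spares=4)
print(chip.capacity)   # 58
print(chip.retire(0))  # 63
```

A chip with only one factory-bad block would expose 59 blocks instead of 58, which is the "varies from stick to stick" effect in miniature.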

Obviously the answers to these are implementation-specific, though
naturally there are some common techniques. A trawl through the
Openmoko wiki, amongst other resources, may provide some insight.

Cheers,
-Ian

-- 
http://darkstarshout.blogspot.com/