avoiding stat() races (Was: RE: Samba login)

Cole, Timothy D. timothy_d_cole at md.northgrum.com
Fri Nov 10 16:54:00 GMT 2000

> -----Original Message-----
> From:	Kenichi Okuyama [SMTP:okuyamak at dd.iij4u.or.jp]
> Sent:	Thursday, November 09, 2000 21:54
> To:	samba-technical at samba.org
> Subject:	Re: avoiding stat() races (Was: RE: Samba login)
> Dear Timothy,
> >>>>> "CTD" == Cole, Timothy D <timothy_d_cole at md.northgrum.com> writes:
> CTD> 	The essential problem here is that the verification of the statcache
> CTD> entry and the intended action really ought to be atomic (as you point
> CTD> out) -- but to do that, the statcache is going to need to "know" what
> CTD> needs to be done with the file once it is found.
> Why try to keep the interface so small?
> "statcache" is really some sort of OBJECT. It has data of its
> own, and so has multiple ways of accessing it. What you're
> trying to do is give a parameter to the object so that the interface
> will call a different method. That's what you're really doing.
	This is a good point; I was thinking of the problem inside-out.

> Make
> "StatCache_open()"
> "StatCache_open_caseinsensitive()"
> "StatCache_stat()"
> "StatCache_stat_caseinsensitive()"
> "StatCache_close()"
> etc.
> and let each parameter be same as that of dos_open etc. that we're
> using now ( well, you can add one extra parameter, "pointer to
> statcache" as means of object, if you wish ).
	Okay, so something to the effect of:

	 scache *scache_new(int flags);
	 int scache_open(scache *cache, const char *path, int flags, int
	 int scache_openi(scache *cache, const char *path, int flags, int
	 int scache_stat(scache *cache, const char *path, SMB_STAT_STRUCT
	 int scache_stati(scache *cache, const char *path, SMB_STAT_STRUCT
	 void scache_close(int fd);
	 void scache_destroy(scache *cache);
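
	For the open case, one way to make the verify-and-act step race-free
is to open first and then fstat() the descriptor, instead of stat()ing the
path and opening afterwards.  A minimal sketch, assuming the cache remembers
the device and inode numbers; struct cached_id and open_if_same_file() are
made-up names for illustration, not part of Samba or of the prototypes above:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cache entry: what an earlier stat() told us about the file. */
struct cached_id {
    dev_t dev;
    ino_t ino;
};

/*
 * Open `path`, then check with fstat() on the descriptor that we really got
 * the file the cache remembered.  Because the check is made on the open
 * descriptor, a rename or replace between "check" and "use" cannot slip in.
 *
 * `flags` must not contain O_CREAT (no mode argument is passed to open()).
 * Returns the descriptor on success, or -1 if the open failed or the file
 * no longer matches; the caller should then drop the cache entry.
 */
int open_if_same_file(const char *path, int flags, const struct cached_id *want)
{
    struct stat st;
    int fd;

    assert(path != NULL && want != NULL);

    fd = open(path, flags);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0 ||
        st.st_dev != want->dev || st.st_ino != want->ino) {
        close(fd);
        return -1;
    }
    return fd;
}
```

	This only covers identity (same inode), not whether the contents
changed; a cache that also keeps sizes or times would compare those from the
same fstat() result.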

> Let "StatCache" take care of all the validity. Let it only return
> valid information (like a valid file descriptor, valid stat
> information, etc.). StatCache can now do lazy closing, sharing
> information among processes, etc., without affecting what's outside.
> CTD> 	Incidentally, regarding the need you indicated for increased
> CTD> resolution timestamps -- increasing timestamp resolution would only
> CTD> serve to "shrink" the window wherein the stat information can be
> CTD> erroneously identified as still valid.
> You should rather say, the current timestamp only serves to give you
> information of "INVALIDNESS", like a hash function.
	I ... think that's what I said, isn't it?  Mmm.. wait, we're looking
at 'valid' from different directions.  Maybe 'not stale' would have been
better than 'valid' in this case.

> CTD> 	Since this would be mucking about with the kernel and filesystem
> CTD> layout anyway, I think an e.g. 32-bit "generation count" (not in the
> CTD> sense) on the inode, incremented with every modification would be a
> CTD> preferable (although still not ideal) solution.
> Well, what I meant as pico-sec is same thing. If you have accuracy of
> pico-second, and if access to file is being serialized somewhere,
> and as long as we do not have Peta-Hz order accessible HDD, we'll not
> have same value for any access, at least, for changing.
	Eh, I still like the idea of having a generation count on files,
really.  Even with picosecond timestamps and adequate serialization, the
method used to generate the timestamps (particularly in the picosecond
realm) may not be quite that accurate _or_ necessarily properly synchronized
-- consider weird situations with multiple.

	It is academic to some extent, though, since once you get down to
picoseconds the window for races is small enough that they become impossibly

> What I believe is, that we should have 256bits for timestamp.  128
> for describing over dot seconds, 128bit for under dot second.  If
> system time does not have accuracy of 128bits, like ... 30 bits for
> example ... use 128-30=98bits for reference counter within that time
> accuracy.
	This is reasonable.

	I still think keeping a separate 'reference counter' regardless of
the available time precision would be preferable.  It's a nice hedge against
access times passing timestamp resolution (or more important, accuracy),
which _will_ keep happening.
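
	To make the hedge concrete, here is a small sketch of two
modifications landing inside the same timestamp tick.  Everything in it is
hypothetical: struct file_meta stands in for per-inode metadata with an
assumed `gen` field, and one-second resolution is just an example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-inode metadata; the `gen` field is assumed, not real. */
struct file_meta {
    time_t   mtime;  /* modification time, one-second resolution here */
    uint32_t gen;    /* incremented on every modification */
};

/* What a stat cache would remember about the file. */
struct cache_entry {
    time_t   mtime;
    uint32_t gen;
};

/* Simulate a modification happening at time `now`. */
void modify(struct file_meta *f, time_t now)
{
    f->mtime = now;
    f->gen++;
}

/* Snapshot the file's metadata into the cache. */
void remember(struct cache_entry *c, const struct file_meta *f)
{
    c->mtime = f->mtime;
    c->gen = f->gen;
}

/* Timestamp-only check: misses changes made within the same tick. */
bool stale_by_time(const struct cache_entry *c, const struct file_meta *f)
{
    return c->mtime != f->mtime;
}

/* Counter check: catches every change, whatever the clock resolution. */
bool stale_by_gen(const struct cache_entry *c, const struct file_meta *f)
{
    return c->gen != f->gen;
}
```

	If the file is modified, cached, and modified again within the same
second, stale_by_time() reports the entry as still good while stale_by_gen()
does not.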

> #32bits was not enough for over dot seconds. nano-sec is within our
> # hand. So, we need at least 64 bits for over dots, and 64bits for 
> # under dots, this is minimum. Biggest problem is that,
> # accuracy of time is increasing in 10bits every 15 year or so.
> # ( not as accurate as moore's law though )
> # So, if we are to face that fact, 128bit as total for time
> # treatment is not enough.
	Well, that's the thing, though.  A 'generation count' would be
less sensitive to increases in timekeeping precision -- a 32 bit generation
count is probably enough for the century, at least.

	Even in a fast, fairly heavily used system, it seems to me that 2^32
operations on a given file would take a considerable amount of time;
certainly longer than whatever the unit of timestamp resolution is.
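
	As a rough back-of-the-envelope check, the time for a 32-bit counter
to wrap is 2^32 divided by the modification rate on that one file.  The helper
below is only an illustration (seconds_to_wrap32() is a made-up name, and any
rate plugged into it is an assumption, not a measurement):

```c
#include <stdint.h>

/*
 * Seconds until a 32-bit per-file generation counter wraps around, given a
 * sustained number of modifications per second on that file.
 * Assumes ops_per_sec > 0.
 */
double seconds_to_wrap32(double ops_per_sec)
{
    const double counter_values = (double)UINT32_MAX + 1.0;  /* 2^32 */
    return counter_values / ops_per_sec;
}
```

	For example, at an assumed 1000 modifications per second on a single
file, that works out to roughly 4.3 million seconds, or a bit under 50 days,
before the counter repeats.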

