avoiding stat() races (Was: RE: Samba login)

Fri Nov 10 16:54:00 GMT 2000

> -----Original Message-----
> From:	Kenichi Okuyama [SMTP:okuyamak at dd.iij4u.or.jp]
> Sent:	Thursday, November 09, 2000 21:54
> To:	samba-technical at samba.org
> Subject:	Re: avoiding stat() races (Was: RE: Samba login)
> 
> Dear Timothy,
> 
> >>>>> "CTD" == Cole, Timothy D <timothy_d_cole at md.northgrum.com> writes:
> CTD> 	The essential problem here is that the verification of the statcache
> CTD> entry and the intended action really ought to be atomic (as you point
> CTD> out) -- but to do that, the statcache is going to need to "know" what
> CTD> needs to be done with the file once it is found.
> 
> Why try to keep the interface so minimal?
> 
> "statcache" is really some sort of OBJECT. It has data of its
> own, and because of that, has multiple ways of accessing it. What you're
> trying to do is pass a parameter to the object so that the interface will
> call a different method. That's what you're really doing.
> 
	This is a good point; I was thinking of the problem inside-out.

> Make
> 
> "StatCache_open()"
> "StatCache_open_caseinsensitive()"
> "StatCache_stat()"
> "StatCache_stat_caseinsensitive()"
> "StatCache_close()"
> etc.
> 
> and let the parameters be the same as those of dos_open etc. that we're
> using now (well, you can add one extra parameter, a "pointer to
> statcache" to act as the object, if you wish).
> 
	Okay, so something to the effect of:

	 scache *scache_new(int flags);
	 int scache_open(scache *cache, const char *path, int flags, int mode);
	 int scache_openi(scache *cache, const char *path, int flags, int mode);
	 int scache_stat(scache *cache, const char *path, SMB_STAT_STRUCT *st_buf);
	 int scache_stati(scache *cache, const char *path, SMB_STAT_STRUCT *st_buf);
	 void scache_close(int fd);
	 void scache_destroy(scache *cache);
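
	A minimal sketch of how scache_open() might keep the check and the
action together, under some loud assumptions: the cache holds a single
entry, plain struct stat stands in for SMB_STAT_STRUCT, and the struct
layout and the 'refreshes' counter are made up for illustration.  It opens
first and then fstat()s the descriptor, so the stat data always describes
the file that was actually opened rather than whatever the path pointed to
a moment earlier.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical one-entry cache; a real one would hold many entries. */
typedef struct scache {
	char path[1024];
	struct stat st;    /* stands in for SMB_STAT_STRUCT */
	int valid;         /* nonzero once st has been filled in */
	int refreshes;     /* how many times the entry was (re)filled */
} scache;

scache *scache_new(int flags)
{
	(void)flags;       /* unused in this sketch */
	return calloc(1, sizeof(scache));
}

/*
 * Open the file, then fstat() the descriptor.  The cached entry is
 * replaced whenever it is missing or no longer matches the open file.
 * Returns the descriptor, or -1 on failure.
 */
int scache_open(scache *cache, const char *path, int flags, int mode)
{
	struct stat st;
	int fd = open(path, flags, (mode_t)mode);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}
	if (!cache->valid || strcmp(cache->path, path) != 0 ||
	    cache->st.st_dev != st.st_dev || cache->st.st_ino != st.st_ino ||
	    cache->st.st_mtime != st.st_mtime ||
	    cache->st.st_size != st.st_size) {
		strncpy(cache->path, path, sizeof(cache->path) - 1);
		cache->path[sizeof(cache->path) - 1] = '\0';
		cache->st = st;
		cache->valid = 1;
		cache->refreshes++;
	}
	return fd;
}

void scache_close(int fd)
{
	close(fd);
}

void scache_destroy(scache *cache)
{
	free(cache);
}
```

	A caller would do scache_new(0), then scache_open(cache, path,
O_RDONLY, 0), and never call stat() on the path itself.  Note that the
mtime comparison here is still only as good as the timestamp resolution.
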

> Let "StatCache" take care of all the validity checking. Let it return
> only valid information (a valid file descriptor, valid stat
> information, etc.). StatCache can then do lazy closing, share
> information among processes, etc., without affecting what's outside.
> 
> CTD> 	Incidentally, regarding the need you indicated for increased
> CTD> resolution timestamps -- increasing timestamp resolution would only
> CTD> serve to "shrink" the window wherein the stat information can be
> CTD> erroneously identified as still valid.
> 
> You should rather say that the current timestamp only serves to give you
> information about "INVALIDNESS", like a hash function.
> 
	I ... think that's what I said, isn't it?  Mmm.. wait, we're looking
at 'valid' from different directions.  Maybe 'not stale' would have been
better than 'valid' in this case.

> CTD> 	Since this would be mucking about with the kernel and filesystem
> CTD> layout anyway, I think an e.g. 32-bit "generation count" (not in the
> CTD> NFS sense) on the inode, incremented with every modification would
> CTD> be a preferable (although still not ideal) solution.
> 
> Well, what I meant by pico-sec is the same thing. If you have picosecond
> accuracy, and if access to the file is being serialized somewhere,
> and as long as we do not have HDDs accessible at Peta-Hz rates, we'll
> never get the same value for any two accesses, at least for modifications.
> 
	Eh, I still like the idea of having a generation count on files,
really.  Even with picosecond timestamps and adequate serialization, the
method used to generate the timestamps (particularly in the picosecond
realm) may not be quite that accurate _or_ necessarily properly synchronized
-- consider weird situations with multiple.

	It is academic to some extent, though, since once you get down to
picoseconds the window for races is small enough that they become
vanishingly unlikely.

> What I believe is that we should have 256 bits for the timestamp: 128
> bits for whole seconds and 128 bits for fractions of a second. If the
> system time does not have 128 bits of accuracy, say only 30 bits
> for example, use the remaining 128-30=98 bits as a reference counter
> within that time accuracy.
> 
	This is reasonable.

	I still think keeping a separate 'reference counter' regardless of
the available time precision would be preferable.  It's a nice hedge against
access times dropping below the timestamp resolution (or more important,
accuracy), which _will_ keep happening.
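
	To make that concrete, here is a small sketch of a staleness check
that uses both a timestamp and a counter.  Everything in it is hypothetical:
the struct, the field names, and the idea that the filesystem would bump
'generation' on every modification.  It is not an existing kernel or Samba
interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical version stamp a filesystem could keep per inode. */
struct file_version {
	time_t   mtime;        /* coarse timestamp, e.g. one-second ticks */
	uint32_t generation;   /* bumped on every modification */
};

/* Record one modification at time 'now'. */
void file_modified(struct file_version *v, time_t now)
{
	v->mtime = now;
	v->generation++;       /* unsigned, so it wraps to 0 after 2^32 - 1 */
}

/* A cached copy is stale if either field differs from the current one. */
bool entry_is_stale(const struct file_version *cached,
                    const struct file_version *current)
{
	return cached->mtime != current->mtime ||
	       cached->generation != current->generation;
}
```

	If a file is modified twice within one timestamp tick, mtime alone
looks unchanged, but the generation differs, so the cached entry is still
caught as stale.
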

> # 32 bits was not enough for whole seconds. Nanoseconds are within our
> # reach. So we need at least 64 bits for whole seconds and 64 bits for
> # fractions; this is the minimum. The biggest problem is that the
> # accuracy of time is increasing by 10 bits every 15 years or so
> # ( not as accurate as Moore's law though ).
> # So, if we are to face that fact, 128 bits in total for time
> # treatment is not enough.
> 
	Well, that's the thing, though.  A 'generation count' would be
less sensitive to increases in timekeeping precision -- a 32 bit generation
count is probably enough for the century, at least.

	Even in a fast, fairly heavily used system, it seems to me that 2^32
operations on a given file would take a considerable amount of time;
certainly longer than whatever the unit of timestamp resolution is.
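
	As a rough back-of-the-envelope check, the time for a 32-bit count
to wrap is just 2^32 divided by the modification rate.  The helper below is
only an illustration, and the rate you feed it is an assumption, not a
measurement of any real system.

```c
/*
 * Seconds until a 32-bit generation count wraps around, given a
 * sustained rate of modifications per second to a single file.
 */
double generation_wrap_seconds(double mods_per_second)
{
	const double states = 4294967296.0;   /* 2^32 distinct values */

	return states / mods_per_second;
}
```

	At an assumed one million modifications per second to the same file,
that works out to roughly 4295 seconds, a bit over 71 minutes, which is far
longer than any timestamp tick discussed here.
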