[jcifs] jCIFS can't contact LMB when on same machine as LMB?

Sat Dec 11 07:33:36 GMT 2004

Hi there, I am using jCIFS (1.1.3 currently, although 1.1.4 exhibits the same
behavior), and appear to have found a problem with browsing when jCIFS is
running on the machine that is the Master Browser. When this is the situation,
network browsing using jCIFS is non-functional. This message is a bit long,
as I'm filling it chock full of details. :)

On my test network (or any other network I've tried where the LMB is the same
machine as the jCIFS client), I have two PCs, Machine A and Machine B, both in
the same workgroup. I am running examples/SmbShell to simply run "ls ATLAS/" (my
workgroup is ATLAS..but that's obvious, huh?). When I run this command on the
LMB, I get:

jcifs.smb.SmbException: smb://ATLAS/
java.net.UnknownHostException: ATLAS
	at jcifs.UniAddress.getByName(UniAddress.java:297)
	at jcifs.smb.SmbFile.getAddress(SmbFile.java:789)
(etc...)

If I run the same command on the non-LMB, I get the list of servers as expected.
A simple matrix:

LMB host     jCIFS host   Result
Machine A    Machine A    Fail
Machine A    Machine B    Pass
Machine B    Machine A    Pass
Machine B    Machine B    Fail

(the LMB is changed by enabling/disabling the Browser service, and forcing an
election using browstat.exe from MS)

Using Ethereal, the packets are being sent to the broadcast addr as expected:

No.  Time                        Source        Destination      Protocol  Info
1    2004-12-10 22:24:47.577640  192.168.1.22  255.255.255.255  NBNS      Name query NB ATLAS<1d>

Frame 1 (92 bytes on wire, 92 bytes captured)
Ethernet II, Src: 00:30:bd:99:78:6e, Dst: ff:ff:ff:ff:ff:ff
Internet Protocol, Src Addr: 192.168.1.22 (192.168.1.22), Dst Addr: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: 4539 (4539), Dst Port: netbios-ns (137)
NetBIOS Name Service
    Transaction ID: 0x0009
    Flags: 0x0110 (Name query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Name query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... ...1 .... = Broadcast: Broadcast packet
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        ATLAS<1d>: type NB, class inet
            Name: ATLAS<1d> (Local Master Browser)
            Type: NB
            Class: inet

I see two packets sent to ATLAS<1d>, and two to ATLAS<20> (I don't remember what
the <20> name means....)
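
For anyone who wants to reproduce the query without jCIFS in the picture, here
is a minimal sketch of how a packet like the one above can be put together. It
is not jCIFS code: the class and method names (NbnsQuerySketch, encodeName,
buildQuery) are made up for illustration, and it assumes a plain ASCII name with
no NetBIOS scope. The field values (flags 0x0110, one question, type NB, class
inet) are taken from the Ethereal dump.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class NbnsQuerySketch {

    /**
     * First-level encoding of a NetBIOS name: pad the name to 15 bytes with
     * spaces, append the one-byte suffix (e.g. 0x1d), then write each byte as
     * two characters, one per nibble, each offset from 'A'.
     */
    public static String encodeName(String name, int suffix) {
        byte[] raw = new byte[16];
        Arrays.fill(raw, (byte) ' ');
        byte[] src = name.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, raw, 0, Math.min(src.length, 15));
        raw[15] = (byte) suffix;

        StringBuilder sb = new StringBuilder(32);
        for (byte b : raw) {
            sb.append((char) ('A' + ((b >> 4) & 0x0F))); // high nibble
            sb.append((char) ('A' + (b & 0x0F)));        // low nibble
        }
        return sb.toString();
    }

    /** Builds a broadcast name query (flags 0x0110) for name<suffix>. */
    public static byte[] buildQuery(int transactionId, String name, int suffix) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(50);
        writeShort(out, transactionId);
        writeShort(out, 0x0110); // query, recursion desired, broadcast
        writeShort(out, 1);      // Questions
        writeShort(out, 0);      // Answer RRs
        writeShort(out, 0);      // Authority RRs
        writeShort(out, 0);      // Additional RRs

        byte[] encoded = encodeName(name, suffix).getBytes(StandardCharsets.US_ASCII);
        out.write(encoded.length);             // label length, 32
        out.write(encoded, 0, encoded.length); // encoded name
        out.write(0);                          // end of name (empty scope)
        writeShort(out, 0x0020);               // Type: NB
        writeShort(out, 0x0001);               // Class: inet
        return out.toByteArray();
    }

    private static void writeShort(ByteArrayOutputStream out, int value) {
        out.write((value >> 8) & 0xFF);
        out.write(value & 0xFF);
    }

    public static void main(String[] args) {
        byte[] query = buildQuery(0x0009, "ATLAS", 0x1d);
        System.out.println(encodeName("ATLAS", 0x1d));
        System.out.println(query.length + " bytes");
    }
}
```

The resulting payload is 50 bytes, which lines up with the 92-byte frame above
once the Ethernet, IP and UDP headers (14 + 20 + 8 bytes) are added.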

When jCIFS is on a different machine than the LMB, I get the responses as
expected, but when the LMB is the same machine, jCIFS times out. This was tested
by running the exact same jcifs.jar & SmbShell.class on both machines at the
same time - the LMB instance will not work, the non-LMB instance will work.

So, this led me to believe that there's a problem with routing, and that the
packet is going over the windows equivalent of the loopback adapter, which I
have since found out doesn't exactly exist.

I downloaded TDIMon (http://www.sysinternals.com/ntw2k/freeware/tdimon.shtml),
and when the UDP query is sent, I see the following (columns edited for space,
but it's still way too wide for easy display):
Seq  Process        Request                     Local                 Remote               Result             Other
535  java.exe:1584  TDI_SEND_DATAGRAM           UDP:0.0.0.0:4539      255.255.255.255:137  SUCCESS-537        Length:50
536  System:4       TDI_EVENT_RECEIVE_DATAGRAM  UDP:192.168.1.22:137  192.168.1.22:4539    DATA_NOT_ACCEPTED  Bytes taken: 0 Flags: BROADCAST
538  java.exe:1584  TDI_SEND_DATAGRAM           UDP:0.0.0.0:4539      255.255.255.255:137  SUCCESS-540        Length:50
539  System:4       TDI_EVENT_RECEIVE_DATAGRAM  UDP:192.168.1.22:137  192.168.1.22:4539    DATA_NOT_ACCEPTED  Bytes taken: 50 Flags: BROADCAST

So, "DATA_NOT_ACCEPTED". Uhm, ok....I have no idea why that might be the case,
as the same queries work with the LMB on another machine. Also, the "Result"
field is interesting for the SUCCESS entries - they contain -537 or -540, which
apparently means that the TDI_SEND_DATAGRAM finished after the
TDI_EVENT_RECEIVE_DATAGRAM. At least that's my guess from reading the TDIMon web
page, please correct me if anyone knows what's really happening.

Has anyone else out there seen this? I don't have access to the Windows DDK to
research the DATA_NOT_ACCEPTED in response to the TDI_EVENT_RECEIVE_DATAGRAM,
but I'm going to try some more googling and playing with the DatagramSocket and
DatagramPacket bind addresses some more to see if I can make something break.
Before finding TDIMon, I tried playing with the receive socket in
NameServiceClient.ensureOpen() to various endpoints (0.0.0.0, null, etc), but I
don't think that the LMB is talking back, so that's having no effect.
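
The kind of bind-address experiment I mean can be sketched roughly as below.
This is standalone java.net code, not what jCIFS does internally; the class
name BindProbeSketch and the probe() helper are hypothetical, and the payload
in main() is only a placeholder that would need to be swapped for a real name
query before the LMB could be expected to answer.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;

public class BindProbeSketch {

    /**
     * Sends one datagram from a socket bound to the given local address
     * (null means the wildcard address, 0.0.0.0) and reports whether any
     * reply arrives before the timeout.
     */
    public static boolean probe(InetAddress local, InetAddress target, int port,
                                byte[] payload, int timeoutMs) throws IOException {
        try (DatagramSocket socket = new DatagramSocket(new InetSocketAddress(local, 0))) {
            socket.setBroadcast(true);      // allow sending to 255.255.255.255
            socket.setSoTimeout(timeoutMs); // receive() gives up after this long
            socket.send(new DatagramPacket(payload, payload.length, target, port));

            byte[] buf = new byte[576];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(reply);
                return true;
            } catch (SocketTimeoutException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress broadcast = InetAddress.getByName("255.255.255.255");
        byte[] payload = new byte[50]; // placeholder, not a valid name query

        // Try the wildcard address first, then each local address given on
        // the command line (e.g. 192.168.1.22).
        System.out.println("wildcard -> " + probe(null, broadcast, 137, payload, 3000));
        for (String arg : args) {
            InetAddress local = InetAddress.getByName(arg);
            System.out.println(arg + " -> " + probe(local, broadcast, 137, payload, 3000));
        }
    }
}
```

If every local address times out on the LMB but succeeds on the other machine,
that would at least point away from the Java side of the socket setup.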

Some more notes: 
- running "ls" in SmbShell also fails with the same type of error, although if
the network has an LMB for another workgroup, it will reply with the workgroup
list, but ls THISDOMAIN/ doesn't work
- I haven't tested yet with a backup browser on the same network, as most of the
time I don't get a backup browser even with both machines' Browser services started
- The reason this whole thing is important is that in a lot of small home
networks, there will be only one or two PCs, likely with only one workgroup
(MSHOME anyone? :))
- I need to run SmbShell on the non-LMB with TDImon running, but....the other
half of my test network ain't here right now. 

Anyone have any clue? I'm still learning the guts of CIFS, and man, they are
scary guts. :)

Thanks much in advance, and thanks for jCIFS, also!

Kerry Kopp