[PATCH v2] smb: client: Fix use-after-free of network namespace.

Tom Talpey tom at talpey.com
Sun Nov 3 02:47:34 UTC 2024


On 11/2/2024 5:24 PM, Kuniyuki Iwashima wrote:
> Recently, we got a customer report that CIFS triggers oops while
> reconnecting to a server.  [0]
> 
> The workload runs on Kubernetes, and some pods mount CIFS servers
> in non-root network namespaces.  The problem rarely happened, but
> it was always while the pod was dying.
> 
> The root cause is wrong reference counting for network namespace.

Saw your similar fix in the NFS client, thanks for attending to the
SMB client as well.

Question below...

> CIFS uses kernel sockets, which do not hold refcnt of the netns that
> the socket belongs to.  That means CIFS must ensure the socket is
> always freed before its netns; otherwise, use-after-free happens.
> 
> The repro steps are roughly:
> 
>    1. mount CIFS in a non-root netns
>    2. drop packets from the netns
>    3. destroy the netns
>    4. unmount CIFS
> 
> We can reproduce the issue quickly with the script [1] below and see
> the splat [2] if CONFIG_NET_NS_REFCNT_TRACKER is enabled.
> 
> When the socket is TCP, it is hard to guarantee the netns lifetime
> without holding refcnt due to async timers.
> 
> Let's hold netns refcnt for each socket as done for SMC in commit
> 9744d2bf1976 ("smc: Fix use-after-free in tcp_write_timer_handler().").
> 
> Note that we need to move put_net() from cifs_put_tcp_session() to
> clean_demultiplex_info(); otherwise, __sock_create() still could touch a
> freed netns while cifsd tries to reconnect from cifs_demultiplex_thread().
> 
> Also, maybe_get_net() cannot be put just before __sock_create() because
> the code is not under RCU and there is a small chance that the same
> address happened to be reallocated to another netns.
> 
> [0]:
> CIFS: VFS: \\XXXXXXXXXXX has not responded in 15 seconds. Reconnecting...
> CIFS: Serverclose failed 4 times, giving up
> Unable to handle kernel paging request at virtual address 14de99e461f84a07
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
> Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
> [14de99e461f84a07] address between user and kernel address ranges
> Internal error: Oops: 0000000096000004 [#1] SMP
> Modules linked in: cls_bpf sch_ingress nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver tcp_diag inet_diag veth xt_state xt_connmark nf_conntrack_netlink xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 sunrpc vfat fat aes_ce_blk aes_ce_cipher ghash_ce sm4_ce_cipher sm4 sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 sha1_ce ena button sch_fq_codel loop fuse configfs dmi_sysfs sha2_ce sha256_arm64 dm_mirror dm_region_hash dm_log dm_mod dax efivarfs
> CPU: 5 PID: 2690970 Comm: cifsd Not tainted 6.1.103-109.184.amzn2023.aarch64 #1
> Hardware name: Amazon EC2 r7g.4xlarge/, BIOS 1.0 11/1/2018
> pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : fib_rules_lookup+0x44/0x238
> lr : __fib_lookup+0x64/0xbc
> sp : ffff8000265db790
> x29: ffff8000265db790 x28: 0000000000000000 x27: 000000000000bd01
> x26: 0000000000000000 x25: ffff000b4baf8000 x24: ffff00047b5e4580
> x23: ffff8000265db7e0 x22: 0000000000000000 x21: ffff00047b5e4500
> x20: ffff0010e3f694f8 x19: 14de99e461f849f7 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: 3f92800abd010002
> x11: 0000000000000001 x10: ffff0010e3f69420 x9 : ffff800008a6f294
> x8 : 0000000000000000 x7 : 0000000000000006 x6 : 0000000000000000
> x5 : 0000000000000001 x4 : ffff001924354280 x3 : ffff8000265db7e0
> x2 : 0000000000000000 x1 : ffff0010e3f694f8 x0 : ffff00047b5e4500
> Call trace:
>   fib_rules_lookup+0x44/0x238
>   __fib_lookup+0x64/0xbc
>   ip_route_output_key_hash_rcu+0x2c4/0x398
>   ip_route_output_key_hash+0x60/0x8c
>   tcp_v4_connect+0x290/0x488
>   __inet_stream_connect+0x108/0x3d0
>   inet_stream_connect+0x50/0x78
>   kernel_connect+0x6c/0xac
>   generic_ip_connect+0x10c/0x6c8 [cifs]
>   __reconnect_target_unlocked+0xa0/0x214 [cifs]
>   reconnect_dfs_server+0x144/0x460 [cifs]
>   cifs_reconnect+0x88/0x148 [cifs]
>   cifs_readv_from_socket+0x230/0x430 [cifs]
>   cifs_read_from_socket+0x74/0xa8 [cifs]
>   cifs_demultiplex_thread+0xf8/0x704 [cifs]
>   kthread+0xd0/0xd4
> Code: aa0003f8 f8480f13 eb18027f 540006c0 (b9401264)
> 
> [1]:
> CIFS_CRED="/root/cred.cifs"
> CIFS_USER="Administrator"
> CIFS_PASS="Password"
> CIFS_IP="X.X.X.X"
> CIFS_PATH="//${CIFS_IP}/Users/Administrator/Desktop/CIFS_TEST"
> CIFS_MNT="/mnt/smb"
> DEV="enp0s3"
> 
> cat <<EOF > ${CIFS_CRED}
> username=${CIFS_USER}
> password=${CIFS_PASS}
> domain=EXAMPLE.COM
> EOF
> 
> unshare -n bash -c "
> mkdir -p ${CIFS_MNT}
> ip netns attach root 1
> ip link add eth0 type veth peer veth0 netns root
> ip link set eth0 up
> ip -n root link set veth0 up
> ip addr add 192.168.0.2/24 dev eth0
> ip -n root addr add 192.168.0.1/24 dev veth0
> ip route add default via 192.168.0.1 dev eth0
> ip netns exec root sysctl net.ipv4.ip_forward=1
> ip netns exec root iptables -t nat -A POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
> mount -t cifs ${CIFS_PATH} ${CIFS_MNT} -o vers=3.0,sec=ntlmssp,credentials=${CIFS_CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1
> touch ${CIFS_MNT}/a.txt
> ip netns exec root iptables -t nat -D POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
> "
> 
> umount ${CIFS_MNT}
> 
> [2]:
> ref_tracker: net notrefcnt at 000000004bbc008d has 1/1 users at
>       sk_alloc (./include/net/net_namespace.h:339 net/core/sock.c:2227)
>       inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
>       __sock_create (net/socket.c:1576)
>       generic_ip_connect (fs/smb/client/connect.c:3075)
>       cifs_get_tcp_session.part.0 (fs/smb/client/connect.c:3160 fs/smb/client/connect.c:1798)
>       cifs_mount_get_session (fs/smb/client/trace.h:959 fs/smb/client/connect.c:3366)
>       dfs_mount_share (fs/smb/client/dfs.c:63 fs/smb/client/dfs.c:285)
>       cifs_mount (fs/smb/client/connect.c:3622)
>       cifs_smb3_do_mount (fs/smb/client/cifsfs.c:949)
>       smb3_get_tree (fs/smb/client/fs_context.c:784 fs/smb/client/fs_context.c:802 fs/smb/client/fs_context.c:794)
>       vfs_get_tree (fs/super.c:1800)
>       path_mount (fs/namespace.c:3508 fs/namespace.c:3834)
>       __x64_sys_mount (fs/namespace.c:3848 fs/namespace.c:4057 fs/namespace.c:4034 fs/namespace.c:4034)
>       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> 
> Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu at amazon.com>
> ---
> Note:
> 
> After this fix lands on net-next.git in the next release cycle,
> I'll post the series [3] below and convert some users doing the
> similar conversion for sock_create_kern(), MPTCP, SMC, RDS, CIFS.
> Then, the CIFS code will look the mostly same with now [4].
> 
> [3]: https://github.com/q2ven/linux/commits/427_rename_sock_create_kern
> [4]: https://github.com/q2ven/linux/commit/41f8874bb4ac329b388c38b14d1c615a968abbb3
> 
> Changes:
>    v2: Convert socket to hold netns instead of passing @kern=0
>        to __sock_create() to avoid inet_release() BPF hook
> 
>    v1: https://lore.kernel.org/linux-cifs/20241031175709.20111-1-kuniyu@amazon.com/
> ---
>   fs/smb/client/connect.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index 15d94ac4095e..0ce2d704b1f3 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -1037,6 +1037,7 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
>   		 */
>   	}
>   
> +	put_net(cifs_net_ns(server));
>   	kfree(server->leaf_fullpath);
>   	kfree(server);
>   
> @@ -1635,8 +1636,6 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect)
>   	/* srv_count can never go negative */
>   	WARN_ON(server->srv_count < 0);
>   
> -	put_net(cifs_net_ns(server));
> -
>   	list_del_init(&server->tcp_ses_list);
>   	spin_unlock(&cifs_tcp_ses_lock);
>   
> @@ -3070,13 +3069,22 @@ generic_ip_connect(struct TCP_Server_Info *server)
>   	if (server->ssocket) {
>   		socket = server->ssocket;
>   	} else {
> -		rc = __sock_create(cifs_net_ns(server), sfamily, SOCK_STREAM,
> +		struct net *net = cifs_net_ns(server);
> +		struct sock *sk;
> +
> +		rc = __sock_create(net, sfamily, SOCK_STREAM,
>   				   IPPROTO_TCP, &server->ssocket, 1);

Wouldn't it be clearer and less fragile to use the sock_create_kernel()
wrapper instead of __sock_create()?

Apart from that,

Acked-by: Tom Talpey <tom at talpey.com>

Also, does this issue appear in the ksmbd server?
In fs/smb/server/transport_tcp.c, it's calling regular sock_create().

Tom.

>   		if (rc < 0) {
>   			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>   			return rc;
>   		}
>   
> +		sk = server->ssocket->sk;
> +		__netns_tracker_free(net, &sk->ns_tracker, false);
> +		sk->sk_net_refcnt = 1;
> +		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> +		sock_inuse_add(net, 1);
> +
>   		/* BB other socket options to set KEEPALIVE, NODELAY? */
>   		cifs_dbg(FYI, "Socket created\n");
>   		socket = server->ssocket;




More information about the samba-technical mailing list