In the Linux kernel, the following vulnerability has been resolved: platform/chrome: cros_ec_uart: properly fix race condition The cros_ec_uart_probe() function calls devm_serdev_device_open() before it calls serdev_device_set_client_ops(). This can trigger a NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... Call Trace: <TASK> ... ? ttyport_receive_buf A simplified version of crashing code is as follows: static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl, const u8 *data, size_t count) { struct serdev_device *serdev = ctrl->serdev; if (!serdev || !serdev->ops->receive_buf) // CRASH! return 0; return serdev->ops->receive_buf(serdev, data, count); } It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops will also exist. This conflicts with the existing cros_ec_uart_probe() logic, as it first calls devm_serdev_device_open() (which sets SERPORT_ACTIVE), and only later sets serdev->ops via serdev_device_set_client_ops(). Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition") attempted to fix a similar race condition, but while doing so, made the window of error for this race condition to happen much wider. Attempt to fix the race condition again, making sure we fully setup before calling devm_serdev_device_open().
Race condition in the tty_fasync function in drivers/char/tty_io.c in the Linux kernel before 2.6.32.6 allows local users to cause a denial of service (NULL pointer dereference and system crash) or possibly have unspecified other impact via unknown vectors, related to the put_tty_queue and __f_setown functions. NOTE: the vulnerability was addressed in a different way in 2.6.32.9.
Race condition in some Intel(R) Aptio* V UEFI Firmware Integrator Tools may allow an authenticated user to potentially enable denial of service via local access.
A race problem was found in fs/proc/task_mmu.c in the memory management sub-component in the Linux kernel. This issue may allow a local attacker with user privilege to cause a denial of service.
In the Linux kernel, the following vulnerability has been resolved: ipv4: Fix a data-race around sysctl_fib_sync_mem. While reading sysctl_fib_sync_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid a data-race.
In the Linux kernel, the following vulnerability has been resolved: raw: Fix a data-race around sysctl_raw_l3mdev_accept. While reading sysctl_raw_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: icmp: Fix data-races around sysctl_icmp_echo_enable_probe. While reading sysctl_icmp_echo_enable_probe, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: cipso: Fix data-races around sysctl. While reading cipso sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix a data-race around sysctl_tcp_ecn_fallback. While reading sysctl_tcp_ecn_fallback, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: igmp: Fix data-races around sysctl_igmp_llm_reports. While reading sysctl_igmp_llm_reports, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next. if (ipv4_is_local_multicast(pmc->multiaddr) && !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix a data-race around sysctl_tcp_notsent_lowat. While reading sysctl_tcp_notsent_lowat, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer There are 3 possible interrupt sources are handled by DP controller, HPDstatus, Controller state changes and Aux read/write transaction. At every irq, DP controller have to check isr status of every interrupt sources and service the interrupt if its isr status bits shows interrupts are pending. There is potential race condition may happen at current aux isr handler implementation since it is always complete dp_aux_cmd_fifo_tx() even irq is not for aux read or write transaction. This may cause aux read transaction return premature if host aux data read is in the middle of waiting for sink to complete transferring data to host while irq happen. This will cause host's receiving buffer contains unexpected data. This patch fixes this problem by checking aux isr and return immediately at aux isr handler if there are no any isr status bits set. Current there is a bug report regrading eDP edid corruption happen during system booting up. After lengthy debugging to found that VIDEO_READY interrupt was continuously firing during system booting up which cause dp_aux_isr() to complete dp_aux_cmd_fifo_tx() prematurely to retrieve data from aux hardware buffer which is not yet contains complete data transfer from sink. This cause edid corruption. Follows are the signature at kernel logs when problem happen, EDID has corrupt header panel-simple-dp-aux aux-aea0000.edp: Couldn't identify panel via EDID Changes in v2: -- do complete if (ret == IRQ_HANDLED) ay dp-aux_isr() -- add more commit text Changes in v3: -- add Stephen suggested -- dp_aux_isr() return IRQ_XXX back to caller -- dp_ctrl_isr() return IRQ_XXX back to caller Changes in v4: -- split into two patches Changes in v5: -- delete empty line between tags Changes in v6: -- remove extra "that" and fixed line more than 75 char at commit text Patchwork: https://patchwork.freedesktop.org/patch/516121/
In the Linux kernel, the following vulnerability has been resolved: ip: Fix a data-race around sysctl_fwmark_reflect. While reading sysctl_fwmark_reflect, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_slow_start_after_idle. While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_recovery. While reading sysctl_tcp_recovery, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept. While reading sysctl_tcp_fwmark_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_l3mdev_accept. While reading sysctl_tcp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: ipv4: Fix data-races around sysctl_fib_multipath_hash_fields. While reading sysctl_fib_multipath_hash_fields, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: ip: Fix data-races around sysctl_ip_fwd_update_priority. While reading sysctl_ip_fwd_update_priority, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: ice: Fix race condition during interface enslave Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the ---truncated---
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix a data-race around sysctl_tcp_probe_interval. While reading sysctl_tcp_probe_interval, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: ip: Fix data-races around sysctl_ip_fwd_use_pmtu. While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_mtu_probing. While reading sysctl_tcp_mtu_probing, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: ipv4: Fix data-races around sysctl_fib_multipath_hash_policy. While reading sysctl_fib_multipath_hash_policy, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout. While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: configfs: fix a race in configfs_{,un}register_subsystem() When configfs_register_subsystem() or configfs_unregister_subsystem() is executing link_group() or unlink_group(), it is possible that two processes add or delete list concurrently. Some unfortunate interleavings of them can cause kernel panic. One of cases is: A --> B --> C --> D A <-- B <-- C <-- D delete list_head *B | delete list_head *C --------------------------------|----------------------------------- configfs_unregister_subsystem | configfs_unregister_subsystem unlink_group | unlink_group unlink_obj | unlink_obj list_del_init | list_del_init __list_del_entry | __list_del_entry __list_del | __list_del // next == C | next->prev = prev | | next->prev = prev prev->next = next | | // prev == B | prev->next = next Fix this by adding mutex when calling link_group() or unlink_group(), but parent configfs_subsystem is NULL when config_item is root. So I create a mutex configfs_subsystem_mutex.
In the Linux kernel, the following vulnerability has been resolved: list: fix a data-race around ep->rdllist ep_poll() first calls ep_events_available() with no lock held and checks if ep->rdllist is empty by list_empty_careful(), which reads rdllist->prev. Thus all accesses to it need some protection to avoid store/load-tearing. Note INIT_LIST_HEAD_RCU() already has the annotation for both prev and next. Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") added the first lockless ep_events_available(), and commit c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") made some ep_events_available() calls lockless and added single call under a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") made the last ep_events_available() lockless. BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: INIT_LIST_HEAD include/linux/list.h:38 [inline] list_splice_init include/linux/list.h:492 [inline] ep_start_scan fs/eventpoll.c:622 [inline] ep_send_events fs/eventpoll.c:1656 [inline] ep_poll fs/eventpoll.c:1806 [inline] do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: list_empty_careful include/linux/list.h:329 [inline] ep_events_available fs/eventpoll.c:381 [inline] ep_poll fs/eventpoll.c:1797 [inline] do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
In the Linux kernel, the following vulnerability has been resolved: ibmvnic: fix race between xmit and reset There is a race between reset and the transmit paths that can lead to ibmvnic_xmit() accessing an scrq after it has been freed in the reset path. It can result in a crash like: Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0080000016189f8 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic] LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 Call Trace: [c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable) [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 [c000000000c9cfcc] sch_direct_xmit+0xec/0x330 [c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0 [c000000000c00ad4] __dev_queue_xmit+0x394/0x730 [c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding] [c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding] [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 [c000000000c00ca4] __dev_queue_xmit+0x564/0x730 [c000000000cf97e0] neigh_hh_output+0xd0/0x180 [c000000000cfa69c] ip_finish_output2+0x31c/0x5c0 [c000000000cfd244] __ip_queue_xmit+0x194/0x4f0 [c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0 [c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0 [c000000000d2d984] tcp_retransmit_skb+0x34/0x130 [c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0 [c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330 [c000000000d317bc] tcp_write_timer+0x5c/0x200 [c000000000243270] call_timer_fn+0x50/0x1c0 [c000000000243704] __run_timers.part.0+0x324/0x460 [c000000000243894] run_timer_softirq+0x54/0xa0 [c000000000ea713c] __do_softirq+0x15c/0x3e0 [c000000000166258] __irq_exit_rcu+0x158/0x190 [c000000000166420] irq_exit+0x20/0x40 [c00000000002853c] timer_interrupt+0x14c/0x2b0 [c000000000009a00] decrementer_common_virt+0x210/0x220 --- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c The immediate cause of the crash is the access of tx_scrq in the following snippet during a reset, where the tx_scrq can be either NULL or an address that will soon be invalid: ibmvnic_xmit() { ... tx_scrq = adapter->tx_scrq[queue_num]; txq = netdev_get_tx_queue(netdev, queue_num); ind_bufp = &tx_scrq->ind_buf; if (test_bit(0, &adapter->resetting)) { ... } But beyond that, the call to ibmvnic_xmit() itself is not safe during a reset and the reset path attempts to avoid this by stopping the queue in ibmvnic_cleanup(). However just after the queue was stopped, an in-flight ibmvnic_complete_tx() could have restarted the queue even as the reset is progressing. Since the queue was restarted we could get a call to ibmvnic_xmit() which can then access the bad tx_scrq (or other fields). We cannot however simply have ibmvnic_complete_tx() check the ->resetting bit and skip starting the queue. This can race at the "back-end" of a good reset which just restarted the queue but has not cleared the ->resetting bit yet. If we skip restarting the queue due to ->resetting being true, the queue would remain stopped indefinitely potentially leading to transmit timeouts. IOW ->resetting is too broad for this purpose. Instead use a new flag that indicates whether or not the queues are active. Only the open/ reset paths control when the queues are active. ibmvnic_complete_tx() and others wake up the queue only if the queue is marked active. So we will have: A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open() ->resetting = true ->tx_queues_active = false disable tx queues ... ->tx_queues_active = true start tx queues B. Tx interrupt in ibmvnic_complete_tx(): if (->tx_queues_active) netif_wake_subqueue(); To ensure that ->tx_queues_active and state of the queues are consistent, we need a lock which: - must also be taken in the interrupt path (ibmvnic_complete_tx()) - shared across the multiple ---truncated---
In the Linux kernel, the following vulnerability has been resolved: nbd: fix race between nbd_alloc_config() and module removal When nbd module is being removing, nbd_alloc_config() may be called concurrently by nbd_genl_connect(), although try_module_get() will return false, but nbd_alloc_config() doesn't handle it. The race may lead to the leak of nbd_config and its related resources (e.g, recv_workq) and oops in nbd_read_stat() due to the unload of nbd module as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000040 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: knbd16-recv recv_work [nbd] RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd] Call Trace: recv_work+0x3b/0xb0 [nbd] process_one_work+0x1ed/0x390 worker_thread+0x4a/0x3d0 kthread+0x12a/0x150 ret_from_fork+0x22/0x30 Fixing it by checking the return value of try_module_get() in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV), assign nbd->config only when nbd_alloc_config() succeeds to ensure the value of nbd->config is binary (valid or NULL). Also adding a debug message to check the reference counter of nbd_config during module removal.
In the Linux kernel, the following vulnerability has been resolved: fscache: Fix oops due to race with cookie_lru and use_cookie If a cookie expires from the LRU and the LRU_DISCARD flag is set, but the state machine has not run yet, it's possible another thread can call fscache_use_cookie and begin to use it. When the cookie_worker finally runs, it will see the LRU_DISCARD flag set, transition the cookie->state to LRU_DISCARDING, which will then withdraw the cookie. Once the cookie is withdrawn the object is removed the below oops will occur because the object associated with the cookie is now NULL. Fix the oops by clearing the LRU_DISCARD bit if another thread uses the cookie before the cookie_worker runs. BUG: kernel NULL pointer dereference, address: 0000000000000008 ... CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles] ... Call Trace: netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs] process_one_work+0x217/0x3e0 worker_thread+0x4a/0x3b0 kthread+0xd6/0x100
In the Linux kernel, the following vulnerability has been resolved: xsk: Fix race at socket teardown Fix a race in the xsk socket teardown code that can lead to a NULL pointer dereference splat. The current xsk unbind code in xsk_unbind_dev() starts by setting xs->state to XSK_UNBOUND, sets xs->dev to NULL and then waits for any NAPI processing to terminate using synchronize_net(). After that, the release code starts to tear down the socket state and free allocated memory. BUG: kernel NULL pointer dereference, address: 00000000000000c0 PGD 8000000932469067 P4D 8000000932469067 PUD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 25 PID: 69132 Comm: grpcpp_sync_ser Tainted: G I 5.16.0+ #2 Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015 RIP: 0010:__xsk_sendmsg+0x2c/0x690 [...] RSP: 0018:ffffa2348bd13d50 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000040 RCX: ffff8d5fc632d258 RDX: 0000000000400000 RSI: ffffa2348bd13e10 RDI: ffff8d5fc5489800 RBP: ffffa2348bd13db0 R08: 0000000000000000 R09: 00007ffffffff000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5fc5489800 R13: ffff8d5fcb0f5140 R14: ffff8d5fcb0f5140 R15: 0000000000000000 FS: 00007f991cff9400(0000) GS:ffff8d6f1f700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 0000000114888005 CR4: 00000000001706e0 Call Trace: <TASK> ? aa_sk_perm+0x43/0x1b0 xsk_sendmsg+0xf0/0x110 sock_sendmsg+0x65/0x70 __sys_sendto+0x113/0x190 ? debug_smp_processor_id+0x17/0x20 ? fpregs_assert_state_consistent+0x23/0x50 ? exit_to_user_mode_prepare+0xa5/0x1d0 __x64_sys_sendto+0x29/0x30 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae There are two problems with the current code. First, setting xs->dev to NULL before waiting for all users to stop using the socket is not correct. The entry to the data plane functions xsk_poll(), xsk_sendmsg(), and xsk_recvmsg() are all guarded by a test that xs->state is in the state XSK_BOUND and if not, it returns right away. But one process might have passed this test but still have not gotten to the point in which it uses xs->dev in the code. In this interim, a second process executing xsk_unbind_dev() might have set xs->dev to NULL which will lead to a crash for the first process. The solution here is just to get rid of this NULL assignment since it is not used anymore. Before commit 42fddcc7c64b ("xsk: use state member for socket synchronization"), xs->dev was the gatekeeper to admit processes into the data plane functions, but it was replaced with the state variable xs->state in the aforementioned commit. The second problem is that synchronize_net() does not wait for any process in xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() to complete, which means that the state they rely on might be cleaned up prematurely. This can happen when the notifier gets called (at driver unload for example) as it uses xsk_unbind_dev(). Solve this by extending the RCU critical region from just the ndo_xsk_wakeup to the whole functions mentioned above, so that both the test of xs->state == XSK_BOUND and the last use of any member of xs is covered by the RCU critical section. This will guarantee that when synchronize_net() completes, there will be no processes left executing xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() and state can be cleaned up safely. Note that we need to drop the RCU lock for the skb xmit path as it uses functions that might sleep. Due to this, we have to retest the xs->state after we grab the mutex that protects the skb xmit code from, among a number of things, an xsk_unbind_dev() being executed from the notifier at the same time.
In the Linux kernel, the following vulnerability has been resolved: cfg80211: fix race in netlink owner interface destruction My previous fix here to fix the deadlock left a race where the exact same deadlock (see the original commit referenced below) can still happen if cfg80211_destroy_ifaces() already runs while nl80211_netlink_notify() is still marking some interfaces as nl_owner_dead. The race happens because we have two loops here - first we dev_close() all the netdevs, and then we destroy them. If we also have two netdevs (first one need only be a wdev though) then we can find one during the first iteration, close it, and go to the second iteration -- but then find two, and try to destroy also the one we didn't close yet. Fix this by only iterating once.
In the Linux kernel, the following vulnerability has been resolved: ip: Fix a data-race around sysctl_ip_autobind_reuse. While reading sysctl_ip_autobind_reuse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: udp: Fix a data-race around sysctl_udp_l3mdev_accept. While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts. While reading sysctl_tcp_thin_linear_timeouts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix a data-race around sysctl_tcp_probe_threshold. While reading sysctl_tcp_probe_threshold, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_fastopen. While reading sysctl_tcp_fastopen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
In the Linux kernel, the following vulnerability has been resolved: netfilter: ipset: fix performance regression in swap operation The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback.
A use-after-free vulnerability was found in network namespaces code affecting the Linux kernel before 4.14.11. The function get_net_ns_by_id() in net/core/net_namespace.c does not check for the net::count value after it has found a peer network in netns_ids idr, which could lead to double free and memory corruption. This vulnerability could allow an unprivileged local user to induce kernel memory corruption on the system, leading to a crash. Due to the nature of the flaw, privilege escalation cannot be fully ruled out, although it is thought to be unlikely.
In the Linux kernel, the following vulnerability has been resolved: tls: fix race between async notify and socket close The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data. Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization. Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires.
In the Linux kernel, the following vulnerability has been resolved: tls: fix race between tx work scheduling and socket close Similarly to previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do.
roccat_report_event in drivers/hid/hid-roccat.c in the Linux kernel through 5.19.12 has a race condition and resultant use-after-free in certain situations where a report is received while copying a report->value is in progress.
In the Linux kernel, the following vulnerability has been resolved: f2fs: fix to truncate meta inode pages forcely Below race case can cause data corruption: Thread A GC thread - gc_data_segment - ra_data_block - locked meta_inode page - f2fs_inplace_write_data - invalidate_mapping_pages : fail to invalidate meta_inode page due to lock failure or dirty|writeback status - f2fs_submit_page_bio : write last dirty data to old blkaddr - move_data_block - load old data from meta_inode page - f2fs_submit_page_write : write old data to new blkaddr Because invalidate_mapping_pages() will skip invalidating page which has unclear status including locked, dirty, writeback and so on, so we need to use truncate_inode_pages_range() instead of invalidate_mapping_pages() to make sure meta_inode page will be dropped.
In the Linux kernel, the following vulnerability has been resolved: hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the VMBus channel"), napi_disable was getting called for all channels, including all subchannels without confirming if they are enabled or not. This caused hv_netvsc getting hung at napi_disable, when netvsc_probe() has finished running but nvdev->subchan_work has not started yet. netvsc_subchan_work() -> rndis_set_subchannel() has not created the sub-channels and because of that netvsc_sc_open() is not running. netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which netvsc_subchan_work did not run. netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the opposite. Now during netvsc_device_remove(), when napi_disable is called for those subchannels, napi_disable gets stuck on infinite msleep. This fix addresses this problem by ensuring that napi_disable() is not getting called for non-enabled NAPI struct. But netif_napi_del() is still necessary for these non-enabled NAPI struct for cleanup purpose. Call trace: [ 654.559417] task:modprobe state:D stack: 0 pid: 2321 ppid: 1091 flags:0x00004002 [ 654.568030] Call Trace: [ 654.571221] <TASK> [ 654.573790] __schedule+0x2d6/0x960 [ 654.577733] schedule+0x69/0xf0 [ 654.581214] schedule_timeout+0x87/0x140 [ 654.585463] ? __bpf_trace_tick_stop+0x20/0x20 [ 654.590291] msleep+0x2d/0x40 [ 654.593625] napi_disable+0x2b/0x80 [ 654.597437] netvsc_device_remove+0x8a/0x1f0 [hv_netvsc] [ 654.603935] rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc] [ 654.611101] ? do_wait_intr+0xb0/0xb0 [ 654.615753] netvsc_remove+0x7c/0x120 [hv_netvsc] [ 654.621675] vmbus_remove+0x27/0x40 [hv_vmbus]
In the Linux kernel, the following vulnerability has been resolved: wireguard: receive: annotate data-race around receiving_counter.counter Syzkaller with KCSAN identified a data-race issue when accessing keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE() annotations to mark the data race as intentional. BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0: counter_validate drivers/net/wireguard/receive.c:321 [inline] wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461 __napi_poll+0x60/0x3b0 net/core/dev.c:6536 napi_poll net/core/dev.c:6605 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6738 __do_softirq+0xc4/0x279 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499 process_one_work kernel/workqueue.c:2633 [inline] ... read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1: decrypt_packet drivers/net/wireguard/receive.c:252 [inline] wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 ...
In the Linux kernel, the following vulnerability has been resolved: quota: Fix potential NULL pointer dereference Below race may cause NULL pointer dereference P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) .... If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered. So let's fix it by using a temporary pointer to avoid this issue.
In the Linux kernel, the following vulnerability has been resolved: net: bridge: switchdev: Skip MDB replays of deferred events on offload Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration. While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event. The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed. This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario. To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge. For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it: root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0 And then destroy the bridge: root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$ The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed. Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled: root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1 All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0). Eliminate the race in two steps: 1. Grab the write-side lock of the MDB while generating the replay list. This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore: 2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions.
A race condition was found in the Linux kernel's media/dvb-core in dvbdmx_write() function. This can result in a null pointer dereference issue, possibly leading to a kernel panic or denial of service issue.
An issue was discovered in the Linux kernel through 5.19.8. drivers/firmware/efi/capsule-loader.c has a race condition with a resultant use-after-free.
A race condition flaw was found in the Linux kernel sound subsystem due to improper locking. It could lead to a NULL pointer dereference while handling the SNDCTL_DSP_SYNC ioctl. A privileged local user (root or member of the audio group) could use this flaw to crash the system, resulting in a denial of service condition