In the Linux kernel, the following vulnerability has been resolved: mt76: mt7921: fix possible invalid register access Disable the interrupt and synchronze for the pending irq handlers to ensure the irq tasklet is not being scheduled after the suspend to avoid the possible invalid register access acts when the host pcie controller is suspended. [17932.910534] mt7921e 0000:01:00.0: pci_pm_suspend+0x0/0x22c returned 0 after 21375 usecs [17932.910590] pcieport 0000:00:00.0: calling pci_pm_suspend+0x0/0x22c @ 18565, parent: pci0000:00 [17932.910602] pcieport 0000:00:00.0: pci_pm_suspend+0x0/0x22c returned 0 after 8 usecs [17932.910671] mtk-pcie 11230000.pcie: calling platform_pm_suspend+0x0/0x60 @ 22783, parent: soc [17932.910674] mtk-pcie 11230000.pcie: platform_pm_suspend+0x0/0x60 returned 0 after 0 usecs ... 17933.615352] x1 : 00000000000d4200 x0 : ffffff8269ca2300 [17933.620666] Call trace: [17933.623127] mt76_mmio_rr+0x28/0xf0 [mt76] [17933.627234] mt7921_rr+0x38/0x44 [mt7921e] [17933.631339] mt7921_irq_tasklet+0x54/0x1d8 [mt7921e] [17933.636309] tasklet_action_common+0x12c/0x16c [17933.640754] tasklet_action+0x24/0x2c [17933.644418] __do_softirq+0x16c/0x344 [17933.648082] irq_exit+0xa8/0xac [17933.651224] scheduler_ipi+0xd4/0x148 [17933.654890] handle_IPI+0x164/0x2d4 [17933.658379] gic_handle_irq+0x140/0x178 [17933.662216] el1_irq+0xb8/0x180 [17933.665361] cpuidle_enter_state+0xf8/0x204 [17933.669544] cpuidle_enter+0x38/0x4c [17933.673122] do_idle+0x1a4/0x2a8 [17933.676352] cpu_startup_entry+0x24/0x28 [17933.680276] rest_init+0xd4/0xe0 [17933.683508] arch_call_rest_init+0x10/0x18 [17933.687606] start_kernel+0x340/0x3b4 [17933.691279] Code: aa0003f5 d503201f f953eaa8 8b344108 (b9400113) [17933.697373] ---[ end trace a24b8e26ffbda3c5 ]--- [17933.767846] Kernel panic - not syncing: Fatal exception in interrupt
In the Linux kernel, the following vulnerability has been resolved: nexthop: Fix division by zero while replacing a resilient group The resilient nexthop group torture tests in fib_nexthop.sh exposed a possible division by zero while replacing a resilient group [1]. The division by zero occurs when the data path sees a resilient nexthop group with zero buckets. The tests replace a resilient nexthop group in a loop while traffic is forwarded through it. The tests do not specify the number of buckets while performing the replacement, resulting in the kernel allocating a stub resilient table (i.e, 'struct nh_res_table') with zero buckets. This table should never be visible to the data path, but the old nexthop group (i.e., 'oldg') might still be used by the data path when the stub table is assigned to it. Fix this by only assigning the stub table to the old nexthop group after making sure the group is no longer used by the data path. Tested with fib_nexthops.sh: Tests passed: 222 Tests failed: 0 [1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:nexthop_select_path+0x2d2/0x1a80 [...] Call Trace: fib_select_multipath+0x79b/0x1530 fib_select_path+0x8fb/0x1c10 ip_route_output_key_hash_rcu+0x1198/0x2da0 ip_route_output_key_hash+0x190/0x340 ip_route_output_flow+0x21/0x120 raw_sendmsg+0x91d/0x2e10 inet_sendmsg+0x9e/0xe0 __sys_sendto+0x23d/0x360 __x64_sys_sendto+0xe1/0x1b0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae
In the Linux kernel, the following vulnerability has been resolved: dm: Always split write BIOs to zoned device limits Any zoned DM target that requires zone append emulation will use the block layer zone write plugging. In such case, DM target drivers must not split BIOs using dm_accept_partial_bio() as doing so can potentially lead to deadlocks with queue freeze operations. Regular write operations used to emulate zone append operations also cannot be split by the target driver as that would result in an invalid writen sector value return using the BIO sector. In order for zoned DM target drivers to avoid such incorrect BIO splitting, we must ensure that large BIOs are split before being passed to the map() function of the target, thus guaranteeing that the limits for the mapped device are not exceeded. dm-crypt and dm-flakey are the only target drivers supporting zoned devices and using dm_accept_partial_bio(). In the case of dm-crypt, this function is used to split BIOs to the internal max_write_size limit (which will be suppressed in a different patch). However, since crypt_alloc_buffer() uses a bioset allowing only up to BIO_MAX_VECS (256) vectors in a BIO. The dm-crypt device max_segments limit, which is not set and so default to BLK_MAX_SEGMENTS (128), must thus be respected and write BIOs split accordingly. In the case of dm-flakey, since zone append emulation is not required, the block layer zone write plugging is not used and no splitting of BIOs required. Modify the function dm_zone_bio_needs_split() to use the block layer helper function bio_needs_zone_write_plugging() to force a call to bio_split_to_limits() in dm_split_and_process_bio(). This allows DM target drivers to avoid using dm_accept_partial_bio() for write operations on zoned DM devices.
In the Linux kernel, the following vulnerability has been resolved: sched: Fix sched_numa_find_nth_cpu() if mask offline sched_numa_find_nth_cpu() uses a bsearch to look for the 'closest' CPU in sched_domains_numa_masks and given cpus mask. However they might not intersect if all CPUs in the cpus mask are offline. bsearch will return NULL in that case, bail out instead of dereferencing a bogus pointer. The previous behaviour lead to this bug when using maxcpus=4 on an rk3399 (LLLLbb) (i.e. booting with all big CPUs offline): [ 1.422922] Unable to handle kernel paging request at virtual address ffffff8000000000 [ 1.423635] Mem abort info: [ 1.423889] ESR = 0x0000000096000006 [ 1.424227] EC = 0x25: DABT (current EL), IL = 32 bits [ 1.424715] SET = 0, FnV = 0 [ 1.424995] EA = 0, S1PTW = 0 [ 1.425279] FSC = 0x06: level 2 translation fault [ 1.425735] Data abort info: [ 1.425998] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 1.426499] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1.426952] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1.427428] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000004a9f000 [ 1.428038] [ffffff8000000000] pgd=18000000f7fff403, p4d=18000000f7fff403, pud=18000000f7fff403, pmd=0000000000000000 [ 1.429014] Internal error: Oops: 0000000096000006 [#1] SMP [ 1.429525] Modules linked in: [ 1.429813] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc4-dirty #343 PREEMPT [ 1.430559] Hardware name: Pine64 RockPro64 v2.1 (DT) [ 1.431012] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1.431634] pc : sched_numa_find_nth_cpu+0x2a0/0x488 [ 1.432094] lr : sched_numa_find_nth_cpu+0x284/0x488 [ 1.432543] sp : ffffffc084e1b960 [ 1.432843] x29: ffffffc084e1b960 x28: ffffff80078a8800 x27: ffffffc0846eb1d0 [ 1.433495] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 1.434144] x23: 0000000000000000 x22: fffffffffff7f093 x21: ffffffc081de6378 [ 1.434792] x20: 0000000000000000 x19: 0000000ffff7f093 x18: 00000000ffffffff [ 1.435441] x17: 3030303866666666 x16: 66663d736b73616d x15: ffffffc104e1b5b7 [ 1.436091] x14: 0000000000000000 x13: ffffffc084712860 x12: 0000000000000372 [ 1.436739] x11: 0000000000000126 x10: ffffffc08476a860 x9 : ffffffc084712860 [ 1.437389] x8 : 00000000ffffefff x7 : ffffffc08476a860 x6 : 0000000000000000 [ 1.438036] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 [ 1.438683] x2 : 0000000000000000 x1 : ffffffc0846eb000 x0 : ffffff8000407b68 [ 1.439332] Call trace: [ 1.439559] sched_numa_find_nth_cpu+0x2a0/0x488 (P) [ 1.440016] smp_call_function_any+0xc8/0xd0 [ 1.440416] armv8_pmu_init+0x58/0x27c [ 1.440770] armv8_cortex_a72_pmu_init+0x20/0x2c [ 1.441199] arm_pmu_device_probe+0x1e4/0x5e8 [ 1.441603] armv8_pmu_device_probe+0x1c/0x28 [ 1.442007] platform_probe+0x5c/0xac [ 1.442347] really_probe+0xbc/0x298 [ 1.442683] __driver_probe_device+0x78/0x12c [ 1.443087] driver_probe_device+0xdc/0x160 [ 1.443475] __driver_attach+0x94/0x19c [ 1.443833] bus_for_each_dev+0x74/0xd4 [ 1.444190] driver_attach+0x24/0x30 [ 1.444525] bus_add_driver+0xe4/0x208 [ 1.444874] driver_register+0x60/0x128 [ 1.445233] __platform_driver_register+0x24/0x30 [ 1.445662] armv8_pmu_driver_init+0x28/0x4c [ 1.446059] do_one_initcall+0x44/0x25c [ 1.446416] kernel_init_freeable+0x1dc/0x3bc [ 1.446820] kernel_init+0x20/0x1d8 [ 1.447151] ret_from_fork+0x10/0x20 [ 1.447493] Code: 90022e21 f000e5f5 910de2b5 2a1703e2 (f8767803) [ 1.448040] ---[ end trace 0000000000000000 ]--- [ 1.448483] note: swapper/0[1] exited with preempt_count 1 [ 1.449047] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 1.449741] SMP: stopping secondary CPUs [ 1.450105] Kernel Offset: disabled [ 1.450419] CPU features: 0x000000,00080000,20002001,0400421b [ ---truncated---
The do_umount function in fs/namespace.c in the Linux kernel through 3.17 does not require the CAP_SYS_ADMIN capability for do_remount_sb calls that change the root filesystem to read-only, which allows local users to cause a denial of service (loss of writability) by making certain unshare system calls, clearing the / MNT_LOCKED flag, and making an MNT_FORCE umount system call.
The pivot_root implementation in fs/namespace.c in the Linux kernel through 3.17 does not properly interact with certain locations of a chroot directory, which allows local users to cause a denial of service (mount-tree loop) via . (dot) values in both arguments to the pivot_root system call.
In the Linux kernel, the following vulnerability has been resolved: ksmbd: fix recursive locking in RPC handle list access Since commit 305853cce3794 ("ksmbd: Fix race condition in RPC handle list access"), ksmbd_session_rpc_method() attempts to lock sess->rpc_lock. This causes hung connections / tasks when a client attempts to open a named pipe. Using Samba's rpcclient tool: $ rpcclient //192.168.1.254 -U user%password $ rpcclient $> srvinfo <connection hung here> Kernel side: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/0:0 state:D stack:0 pid:5021 tgid:5021 ppid:2 flags:0x00200000 Workqueue: ksmbd-io handle_ksmbd_work Call trace: __schedule from schedule+0x3c/0x58 schedule from schedule_preempt_disabled+0xc/0x10 schedule_preempt_disabled from rwsem_down_read_slowpath+0x1b0/0x1d8 rwsem_down_read_slowpath from down_read+0x28/0x30 down_read from ksmbd_session_rpc_method+0x18/0x3c ksmbd_session_rpc_method from ksmbd_rpc_open+0x34/0x68 ksmbd_rpc_open from ksmbd_session_rpc_open+0x194/0x228 ksmbd_session_rpc_open from create_smb2_pipe+0x8c/0x2c8 create_smb2_pipe from smb2_open+0x10c/0x27ac smb2_open from handle_ksmbd_work+0x238/0x3dc handle_ksmbd_work from process_scheduled_works+0x160/0x25c process_scheduled_works from worker_thread+0x16c/0x1e8 worker_thread from kthread+0xa8/0xb8 kthread from ret_from_fork+0x14/0x38 Exception stack(0x8529ffb0 to 0x8529fff8) The task deadlocks because the lock is already held: ksmbd_session_rpc_open down_write(&sess->rpc_lock) ksmbd_rpc_open ksmbd_session_rpc_method down_read(&sess->rpc_lock) <-- deadlock Adjust ksmbd_session_rpc_method() callers to take the lock when necessary.
In the Linux kernel, the following vulnerability has been resolved: mfd: intel_soc_pmic_bxtwc: Use IRQ domain for USB Type-C device While design wise the idea of converting the driver to use the hierarchy of the IRQ chips is correct, the implementation has (inherited) flaws. This was unveiled when platform_get_irq() had started WARN() on IRQ 0 that is supposed to be a Linux IRQ number (also known as vIRQ). Rework the driver to respect IRQ domain when creating each MFD device separately, as the domain is not the same for all of them.
In the Linux kernel, the following vulnerability has been resolved: smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req() This happens when called from SMB2_read() while using rdma and reaching the rdma_readwrite_threshold.
In the Linux kernel, the following vulnerability has been resolved: smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work) In smbd_destroy() we may destroy the memory so we better wait until post_send_credits_work is no longer pending and will never be started again. I actually just hit the case using rxe: WARNING: CPU: 0 PID: 138 at drivers/infiniband/sw/rxe/rxe_verbs.c:1032 rxe_post_recv+0x1ee/0x480 [rdma_rxe] ... [ 5305.686979] [ T138] smbd_post_recv+0x445/0xc10 [cifs] [ 5305.687135] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687149] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687185] [ T138] ? __pfx_smbd_post_recv+0x10/0x10 [cifs] [ 5305.687329] [ T138] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 5305.687356] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687368] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687378] [ T138] ? _raw_spin_unlock_irqrestore+0x11/0x60 [ 5305.687389] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687399] [ T138] ? get_receive_buffer+0x168/0x210 [cifs] [ 5305.687555] [ T138] smbd_post_send_credits+0x382/0x4b0 [cifs] [ 5305.687701] [ T138] ? __pfx_smbd_post_send_credits+0x10/0x10 [cifs] [ 5305.687855] [ T138] ? __pfx___schedule+0x10/0x10 [ 5305.687865] [ T138] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 5305.687875] [ T138] ? queue_delayed_work_on+0x8e/0xa0 [ 5305.687889] [ T138] process_one_work+0x629/0xf80 [ 5305.687908] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687917] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687933] [ T138] worker_thread+0x87f/0x1570 ... It means rxe_post_recv was called after rdma_destroy_qp(). This happened because put_receive_buffer() was triggered by ib_drain_qp() and called: queue_work(info->workqueue, &info->post_send_credits_work);
In the Linux kernel, the following vulnerability has been resolved: spi: microchip-core-qspi: stop checking viability of op->max_freq in supports_op callback In commit 13529647743d9 ("spi: microchip-core-qspi: Support per spi-mem operation frequency switches") the logic for checking the viability of op->max_freq in mchp_coreqspi_setup_clock() was copied into mchp_coreqspi_supports_op(). Unfortunately, op->max_freq is not valid when this function is called during probe but is instead zero. Accordingly, baud_rate_val is calculated to be INT_MAX due to division by zero, causing probe of the attached memory device to fail. Seemingly spi-microchip-core-qspi was the only driver that had such a modification made to its supports_op callback when the per_op_freq capability was added, so just remove it to restore prior functionality.
In the Linux kernel, the following vulnerability has been resolved: drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session() The function mod_hdcp_hdcp1_create_session() calls the function get_first_active_display(), but does not check its return value. The return value is a null pointer if the display list is empty. This will lead to a null pointer dereference. Add a null pointer check for get_first_active_display() and return MOD_HDCP_STATUS_DISPLAY_NOT_FOUND if the function return null. This is similar to the commit c3e9826a2202 ("drm/amd/display: Add null pointer check for get_first_active_display()"). (cherry picked from commit 5e43eb3cd731649c4f8b9134f857be62a416c893)
In the Linux kernel, the following vulnerability has been resolved: dm-stripe: fix a possible integer overflow There's a possible integer overflow in stripe_io_hints if we have too large chunk size. Test if the overflow happened, and if it did, don't set limits->io_min and limits->io_opt;
In the Linux kernel, the following vulnerability has been resolved: unshare: fix unshare_fs() handling There's an unpleasant corner case in unshare(2), when we have a CLONE_NEWNS in flags and current->fs hadn't been shared at all; in that case copy_mnt_ns() gets passed current->fs instead of a private copy, which causes interesting warts in proof of correctness] > I guess if private means fs->users == 1, the condition could still be true. Unfortunately, it's worse than just a convoluted proof of correctness. Consider the case when we have CLONE_NEWCGROUP in addition to CLONE_NEWNS (and current->fs->users == 1). We pass current->fs to copy_mnt_ns(), all right. Suppose it succeeds and flips current->fs->{pwd,root} to corresponding locations in the new namespace. Now we proceed to copy_cgroup_ns(), which fails (e.g. with -ENOMEM). We call put_mnt_ns() on the namespace created by copy_mnt_ns(), it's destroyed and its mount tree is dissolved, but... current->fs->root and current->fs->pwd are both left pointing to now detached mounts. They are pinning those, so it's not a UAF, but it leaves the calling process with unshare(2) failing with -ENOMEM _and_ leaving it with pwd and root on detached isolated mounts. The last part is clearly a bug. There is other fun related to that mess (races with pivot_root(), including the one between pivot_root() and fork(), of all things), but this one is easy to isolate and fix - treat CLONE_NEWNS as "allocate a new fs_struct even if it hadn't been shared in the first place". Sure, we could go for something like "if both CLONE_NEWNS *and* one of the things that might end up failing after copy_mnt_ns() call in create_new_namespaces() are set, force allocation of new fs_struct", but let's keep it simple - the cost of copy_fs_struct() is trivial. Another benefit is that copy_mnt_ns() with CLONE_NEWNS *always* gets a freshly allocated fs_struct, yet to be attached to anything. That seriously simplifies the analysis... FWIW, that bug had been there since the introduction of unshare(2) ;-/
In the Linux kernel, the following vulnerability has been resolved: LoongArch: BPF: Fix jump offset calculation in tailcall The extra pass of bpf_int_jit_compile() skips JIT context initialization which essentially skips offset calculation leaving out_offset = -1, so the jmp_offset in emit_bpf_tail_call is calculated by "#define jmp_offset (out_offset - (cur_offset))" is a negative number, which is wrong. The final generated assembly are as follow. 54: bgeu $a2, $t1, -8 # 0x0000004c 58: addi.d $a6, $s5, -1 5c: bltz $a6, -16 # 0x0000004c 60: alsl.d $t2, $a2, $a1, 0x3 64: ld.d $t2, $t2, 264 68: beq $t2, $zero, -28 # 0x0000004c Before apply this patch, the follow test case will reveal soft lock issues. cd tools/testing/selftests/bpf/ ./test_progs --allow=tailcalls/tailcall_bpf2bpf_1 dmesg: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_progs:25056]
In the Linux kernel, the following vulnerability has been resolved: netfilter: br_netfilter: do not check confirmed bit in br_nf_local_in() after confirm When send a broadcast packet to a tap device, which was added to a bridge, br_nf_local_in() is called to confirm the conntrack. If another conntrack with the same hash value is added to the hash table, which can be triggered by a normal packet to a non-bridge device, the below warning may happen. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 96 at net/bridge/br_netfilter_hooks.c:632 br_nf_local_in+0x168/0x200 CPU: 1 UID: 0 PID: 96 Comm: tap_send Not tainted 6.17.0-rc2-dirty #44 PREEMPT(voluntary) RIP: 0010:br_nf_local_in+0x168/0x200 Call Trace: <TASK> nf_hook_slow+0x3e/0xf0 br_pass_frame_up+0x103/0x180 br_handle_frame_finish+0x2de/0x5b0 br_nf_hook_thresh+0xc0/0x120 br_nf_pre_routing_finish+0x168/0x3a0 br_nf_pre_routing+0x237/0x5e0 br_handle_frame+0x1ec/0x3c0 __netif_receive_skb_core+0x225/0x1210 __netif_receive_skb_one_core+0x37/0xa0 netif_receive_skb+0x36/0x160 tun_get_user+0xa54/0x10c0 tun_chr_write_iter+0x65/0xb0 vfs_write+0x305/0x410 ksys_write+0x60/0xd0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace 0000000000000000 ]--- To solve the hash conflict, nf_ct_resolve_clash() try to merge the conntracks, and update skb->_nfct. However, br_nf_local_in() still use the old ct from local variable 'nfct' after confirm(), which leads to this warning. If confirm() does not insert the conntrack entry and return NF_DROP, the warning may also occur. There is no need to reserve the WARN_ON_ONCE, just remove it.
In the Linux kernel, the following vulnerability has been resolved: can: j1939: implement NETDEV_UNREGISTER notification handler syzbot is reporting unregister_netdevice: waiting for vcan0 to become free. Usage count = 2 problem, for j1939 protocol did not have NETDEV_UNREGISTER notification handler for undoing changes made by j1939_sk_bind(). Commit 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But we need to call j1939_priv_put() against an extra ref held by j1939_sk_bind() call (as a part of undoing changes made by j1939_sk_bind()) as soon as NETDEV_UNREGISTER notification fires (i.e. before j1939_sk_sock_destruct() is called via j1939_sk_release()). Otherwise, the extra ref on "struct j1939_priv" held by j1939_sk_bind() call prevents "struct net_device" from dropping the usage count to 1; making it impossible for unregister_netdevice() to continue. [mkl: remove space in front of label]
In the Linux kernel, the following vulnerability has been resolved: net/sched: Fix backlog accounting in qdisc_dequeue_internal This issue applies for the following qdiscs: hhf, fq, fq_codel, and fq_pie, and occurs in their change handlers when adjusting to the new limit. The problem is the following in the values passed to the subsequent qdisc_tree_reduce_backlog call given a tbf parent: When the tbf parent runs out of tokens, skbs of these qdiscs will be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued, which accounts for both qlen and backlog. However, in the case of qdisc_dequeue_internal, ONLY qlen is accounted for when pulling from gso_skb. This means that these qdiscs are missing a qdisc_qstats_backlog_dec when dropping packets to satisfy the new limit in their change handlers. One can observe this issue with the following (with tc patched to support a limit of 0): export TARGET=fq tc qdisc del dev lo root tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000 echo ''; echo 'add child'; tc -s -d qdisc show dev lo ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null echo ''; echo 'after ping'; tc -s -d qdisc show dev lo tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0 echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo tc qdisc replace dev lo handle 2: parent 1:1 sfq echo ''; echo 'post graft'; tc -s -d qdisc show dev lo The second to last show command shows 0 packets but a positive number (74) of backlog bytes. The problem becomes clearer in the last show command, where qdisc_purge_queue triggers qdisc_tree_reduce_backlog with the positive backlog and causes an underflow in the tbf parent's backlog (4096 Mb instead of 0). To fix this issue, the codepath for all clients of qdisc_dequeue_internal has been simplified: codel, pie, hhf, fq, fq_pie, and fq_codel. qdisc_dequeue_internal handles the backlog adjustments for all cases that do not directly use the dequeue handler. The old fq_codel_change limit adjustment loop accumulated the arguments to the subsequent qdisc_tree_reduce_backlog call through the cstats field. However, this is confusing and error prone as fq_codel_dequeue could also potentially mutate this field (which qdisc_dequeue_internal calls in the non gso_skb case), so we have unified the code here with other qdiscs.
In the Linux kernel, the following vulnerability has been resolved: crypto: x86/aegis - Add missing error checks The skcipher_walk functions can allocate memory and can fail, so checking for errors is necessary.
In the Linux kernel, the following vulnerability has been resolved: ALSA: timer: fix ida_free call while not allocated In the snd_utimer_create() function, if the kasprintf() function return NULL, snd_utimer_put_id() will be called, finally use ida_free() to free the unallocated id 0. the syzkaller reported the following information: ------------[ cut here ]------------ ida_free called for id=0 which is not allocated. WARNING: CPU: 1 PID: 1286 at lib/idr.c:592 ida_free+0x1fd/0x2f0 lib/idr.c:592 Modules linked in: CPU: 1 UID: 0 PID: 1286 Comm: syz-executor164 Not tainted 6.15.8 #3 PREEMPT(lazy) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014 RIP: 0010:ida_free+0x1fd/0x2f0 lib/idr.c:592 Code: f8 fc 41 83 fc 3e 76 69 e8 70 b2 f8 (...) RSP: 0018:ffffc900007f79c8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 1ffff920000fef3b RCX: ffffffff872176a5 RDX: ffff88800369d200 RSI: 0000000000000000 RDI: ffff88800369d200 RBP: 0000000000000000 R08: ffffffff87ba60a5 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f6f1abc1740(0000) GS:ffff8880d76a0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6f1ad7a784 CR3: 000000007a6e2000 CR4: 00000000000006f0 Call Trace: <TASK> snd_utimer_put_id sound/core/timer.c:2043 [inline] [snd_timer] snd_utimer_create+0x59b/0x6a0 sound/core/timer.c:2184 [snd_timer] snd_utimer_ioctl_create sound/core/timer.c:2202 [inline] [snd_timer] __snd_timer_user_ioctl.isra.0+0x724/0x1340 sound/core/timer.c:2287 [snd_timer] snd_timer_user_ioctl+0x75/0xc0 sound/core/timer.c:2298 [snd_timer] vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x198/0x200 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x7b/0x160 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] The utimer->id should be set properly before the kasprintf() function, ensures the snd_utimer_put_id() function will free the allocated id.
In the Linux kernel, the following vulnerability has been resolved: drm/hisilicon/hibmc: fix irq_request()'s irq name variable is local The local variable is passed in request_irq (), and there will be use after free problem, which will make request_irq failed. Using the global irq name instead of it to fix.
In the Linux kernel, the following vulnerability has been resolved: net/tcp: Fix socket memory leak in TCP-AO failure handling for IPv6 When tcp_ao_copy_all_matching() fails in tcp_v6_syn_recv_sock() it just exits the function. This ends up causing a memory-leak: unreferenced object 0xffff0000281a8200 (size 2496): comm "softirq", pid 0, jiffies 4295174684 hex dump (first 32 bytes): 7f 00 00 06 7f 00 00 06 00 00 00 00 cb a8 88 13 ................ 0a 00 03 61 00 00 00 00 00 00 00 00 00 00 00 00 ...a............ backtrace (crc 5ebdbe15): kmemleak_alloc+0x44/0xe0 kmem_cache_alloc_noprof+0x248/0x470 sk_prot_alloc+0x48/0x120 sk_clone_lock+0x38/0x3b0 inet_csk_clone_lock+0x34/0x150 tcp_create_openreq_child+0x3c/0x4a8 tcp_v6_syn_recv_sock+0x1c0/0x620 tcp_check_req+0x588/0x790 tcp_v6_rcv+0x5d0/0xc18 ip6_protocol_deliver_rcu+0x2d8/0x4c0 ip6_input_finish+0x74/0x148 ip6_input+0x50/0x118 ip6_sublist_rcv+0x2fc/0x3b0 ipv6_list_rcv+0x114/0x170 __netif_receive_skb_list_core+0x16c/0x200 netif_receive_skb_list_internal+0x1f0/0x2d0 This is because in tcp_v6_syn_recv_sock (and the IPv4 counterpart), when exiting upon error, inet_csk_prepare_forced_close() and tcp_done() need to be called. They make sure the newsk will end up being correctly free'd. tcp_v4_syn_recv_sock() makes this very clear by having the put_and_exit label that takes care of things. So, this patch here makes sure tcp_v4_syn_recv_sock and tcp_v6_syn_recv_sock have similar error-handling and thus fixes the leak for TCP-AO.
In the Linux kernel, the following vulnerability has been resolved: net: ethernet: ti: am65-cpsw-nuss: Fix null pointer dereference for ndev In the TX completion packet stage of TI SoCs with CPSW2G instance, which has single external ethernet port, ndev is accessed without being initialized if no TX packets have been processed. It results into null pointer dereference, causing kernel to crash. Fix this by having a check on the number of TX packets which have been processed.
In the Linux kernel, the following vulnerability has been resolved: net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() A malicious USB device claiming to be a CDC Phonet modem can overflow the skb_shared_info->frags[] array by sending an unbounded sequence of full-page bulk transfers. Drop the skb and increment the length error when the frag limit is reached. This matches the same fix that commit f0813bcd2d9d ("net: wwan: t7xx: fix potential skb->frags overflow in RX path") did for the t7xx driver.
In the Linux kernel, the following vulnerability has been resolved: ALSA: hda: tas2781: Fix wrong reference of tasdevice_priv During the conversion to unify the calibration data management, the reference to tasdevice_priv was wrongly set to h->hda_priv instead of h->priv. This resulted in memory corruption and crashes eventually. Unfortunately it's a void pointer, hence the compiler couldn't know that it's wrong.
In the Linux kernel, the following vulnerability has been resolved: net: vlan: don't propagate flags on open With the device instance lock, there is now a possibility of a deadlock: [ 1.211455] ============================================ [ 1.211571] WARNING: possible recursive locking detected [ 1.211687] 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 Not tainted [ 1.211823] -------------------------------------------- [ 1.211936] ip/184 is trying to acquire lock: [ 1.212032] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_set_allmulti+0x4e/0xb0 [ 1.212207] [ 1.212207] but task is already holding lock: [ 1.212332] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0 [ 1.212487] [ 1.212487] other info that might help us debug this: [ 1.212626] Possible unsafe locking scenario: [ 1.212626] [ 1.212751] CPU0 [ 1.212815] ---- [ 1.212871] lock(&dev->lock); [ 1.212944] lock(&dev->lock); [ 1.213016] [ 1.213016] *** DEADLOCK *** [ 1.213016] [ 1.213143] May be due to missing lock nesting notation [ 1.213143] [ 1.213294] 3 locks held by ip/184: [ 1.213371] #0: ffffffff838b53e0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x1b/0xa0 [ 1.213543] #1: ffffffff84e5fc70 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x37/0xa0 [ 1.213727] #2: ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0 [ 1.213895] [ 1.213895] stack backtrace: [ 1.213991] CPU: 0 UID: 0 PID: 184 Comm: ip Not tainted 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 [ 1.213993] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 1.213994] Call Trace: [ 1.213995] <TASK> [ 1.213996] dump_stack_lvl+0x8e/0xd0 [ 1.214000] print_deadlock_bug+0x28b/0x2a0 [ 1.214020] lock_acquire+0xea/0x2a0 [ 1.214027] __mutex_lock+0xbf/0xd40 [ 1.214038] dev_set_allmulti+0x4e/0xb0 # real_dev->flags & IFF_ALLMULTI [ 1.214040] vlan_dev_open+0xa5/0x170 # ndo_open on vlandev [ 1.214042] __dev_open+0x145/0x270 [ 1.214046] __dev_change_flags+0xb0/0x1e0 [ 1.214051] netif_change_flags+0x22/0x60 # IFF_UP vlandev [ 1.214053] dev_change_flags+0x61/0xb0 # for each device in group from dev->vlan_info [ 1.214055] vlan_device_event+0x766/0x7c0 # on netdevsim0 [ 1.214058] notifier_call_chain+0x78/0x120 [ 1.214062] netif_open+0x6d/0x90 [ 1.214064] dev_open+0x5b/0xb0 # locks netdevsim0 [ 1.214066] bond_enslave+0x64c/0x1230 [ 1.214075] do_set_master+0x175/0x1e0 # on netdevsim0 [ 1.214077] do_setlink+0x516/0x13b0 [ 1.214094] rtnl_newlink+0xaba/0xb80 [ 1.214132] rtnetlink_rcv_msg+0x440/0x490 [ 1.214144] netlink_rcv_skb+0xeb/0x120 [ 1.214150] netlink_unicast+0x1f9/0x320 [ 1.214153] netlink_sendmsg+0x346/0x3f0 [ 1.214157] __sock_sendmsg+0x86/0xb0 [ 1.214160] ____sys_sendmsg+0x1c8/0x220 [ 1.214164] ___sys_sendmsg+0x28f/0x2d0 [ 1.214179] __x64_sys_sendmsg+0xef/0x140 [ 1.214184] do_syscall_64+0xec/0x1d0 [ 1.214190] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1.214191] RIP: 0033:0x7f2d1b4a7e56 Device setup: netdevsim0 (down) ^ ^ bond netdevsim1.100@netdevsim1 allmulticast=on (down) When we enslave the lower device (netdevsim0) which has a vlan, we propagate vlan's allmuti/promisc flags during ndo_open. This causes (re)locking on of the real_dev. Propagate allmulti/promisc on flags change, not on the open. There is a slight semantics change that vlans that are down now propagate the flags, but this seems unlikely to result in the real issues. Reproducer: echo 0 1 > /sys/bus/netdevsim/new_device dev_path=$(ls -d /sys/bus/netdevsim/devices/netdevsim0/net/*) dev=$(echo $dev_path | rev | cut -d/ -f1 | rev) ip link set dev $dev name netdevsim0 ip link set dev netdevsim0 up ip link add link netdevsim0 name netdevsim0.100 type vlan id 100 ip link set dev netdevsim0.100 allm ---truncated---
In the Linux kernel, the following vulnerability has been resolved: parisc: Revise __get_user() to probe user read access Because of the way read access support is implemented, read access interruptions are only triggered at privilege levels 2 and 3. The kernel executes at privilege level 0, so __get_user() never triggers a read access interruption (code 26). Thus, it is currently possible for user code to access a read protected address via a system call. Fix this by probing read access rights at privilege level 3 (PRIV_USER) and setting __gu_err to -EFAULT (-14) if access isn't allowed. Note the cmpiclr instruction does a 32-bit compare because COND macro doesn't work inside asm.
In the Linux kernel, the following vulnerability has been resolved: ASoC: simple-card-utils: Don't use __free(device_node) at graph_util_parse_dai() commit 419d1918105e ("ASoC: simple-card-utils: use __free(device_node) for device node") uses __free(device_node) for dlc->of_node, but we need to keep it while driver is in use. Don't use __free(device_node) in graph_util_parse_dai().
In the Linux kernel, the following vulnerability has been resolved: gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops Clears up the warning added in 7ee3647243e5 ("migrate: Remove call to ->writepage") that occurs in various xfstests, causing "something found in dmesg" failures. [ 341.136573] gfs2_meta_aops does not implement migrate_folio [ 341.136953] WARNING: CPU: 1 PID: 36 at mm/migrate.c:944 move_to_new_folio+0x2f8/0x300
In the Linux kernel, the following vulnerability has been resolved: net, hsr: reject HSR frame if skb can't hold tag Receiving HSR frame with insufficient space to hold HSR tag in the skb can result in a crash (kernel BUG): [ 45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1 [ 45.392559] ------------[ cut here ]------------ [ 45.392912] kernel BUG at net/core/skbuff.c:211! [ 45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef) [ 45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 45.395273] RIP: 0010:skb_panic+0x15b/0x1d0 <snip registers, remove unreliable trace> [ 45.402911] Call Trace: [ 45.403105] <IRQ> [ 45.404470] skb_push+0xcd/0xf0 [ 45.404726] br_dev_queue_push_xmit+0x7c/0x6c0 [ 45.406513] br_forward_finish+0x128/0x260 [ 45.408483] __br_forward+0x42d/0x590 [ 45.409464] maybe_deliver+0x2eb/0x420 [ 45.409763] br_flood+0x174/0x4a0 [ 45.410030] br_handle_frame_finish+0xc7c/0x1bc0 [ 45.411618] br_handle_frame+0xac3/0x1230 [ 45.413674] __netif_receive_skb_core.constprop.0+0x808/0x3df0 [ 45.422966] __netif_receive_skb_one_core+0xb4/0x1f0 [ 45.424478] __netif_receive_skb+0x22/0x170 [ 45.424806] process_backlog+0x242/0x6d0 [ 45.425116] __napi_poll+0xbb/0x630 [ 45.425394] net_rx_action+0x4d1/0xcc0 [ 45.427613] handle_softirqs+0x1a4/0x580 [ 45.427926] do_softirq+0x74/0x90 [ 45.428196] </IRQ> This issue was found by syzkaller. The panic happens in br_dev_queue_push_xmit() once it receives a corrupted skb with ETH header already pushed in linear data. When it attempts the skb_push() call, there's not enough headroom and skb_push() panics. The corrupted skb is put on the queue by HSR layer, which makes a sequence of unintended transformations when it receives a specific corrupted HSR frame (with incomplete TAG). Fix it by dropping and consuming frames that are not long enough to contain both ethernet and hsr headers. Alternative fix would be to check for enough headroom before skb_push() in br_dev_queue_push_xmit(). In the reproducer, this is injected via AF_PACKET, but I don't easily see why it couldn't be sent over the wire from adjacent network. Further Details: In the reproducer, the following network interface chain is set up: ┌────────────────┐ ┌────────────────┐ │ veth0_to_hsr ├───┤ hsr_slave0 ┼───┐ └────────────────┘ └────────────────┘ │ │ ┌──────┐ ├─┤ hsr0 ├───┐ │ └──────┘ │ ┌────────────────┐ ┌────────────────┐ │ │┌────────┐ │ veth1_to_hsr ┼───┤ hsr_slave1 ├───┘ └┤ │ └────────────────┘ └────────────────┘ ┌┼ bridge │ ││ │ │└────────┘ │ ┌───────┐ │ │ ... ├──────┘ └───────┘ To trigger the events leading up to crash, reproducer sends a corrupted HSR fr ---truncated---
A use-after-free flaw was found in reconn_set_ipaddr_from_hostname in fs/cifs/connect.c in the Linux kernel. The issue occurs when it forgets to set the free pointer server->hostname to NULL, leading to an invalid pointer request.
A NULL pointer dereference was found in io_file_bitmap_get in io_uring/filetable.c in the io_uring sub-component in the Linux Kernel. When fixed files are unregistered, some context information (file_alloc_{start,end} and alloc_hint) is not cleared. A subsequent request that has auto index selection enabled via IORING_FILE_INDEX_ALLOC can cause a NULL pointer dereference. An unprivileged user can use the flaw to cause a system crash.
In the Linux kernel, the following vulnerability has been resolved: x86/mce: use is_copy_from_user() to determine copy-from-user context Patch series "mm/hwpoison: Fix regressions in memory failure handling", v4. ## 1. What am I trying to do: This patchset resolves two critical regressions related to memory failure handling that have appeared in the upstream kernel since version 5.17, as compared to 5.10 LTS. - copyin case: poison found in user page while kernel copying from user space - instr case: poison found while instruction fetching in user space ## 2. What is the expected outcome and why - For copyin case: Kernel can recover from poison found where kernel is doing get_user() or copy_from_user() if those places get an error return and the kernel return -EFAULT to the process instead of crashing. More specifily, MCE handler checks the fixup handler type to decide whether an in kernel #MC can be recovered. When EX_TYPE_UACCESS is found, the PC jumps to recovery code specified in _ASM_EXTABLE_FAULT() and return a -EFAULT to user space. - For instr case: If a poison found while instruction fetching in user space, full recovery is possible. User process takes #PF, Linux allocates a new page and fills by reading from storage. ## 3. What actually happens and why - For copyin case: kernel panic since v5.17 Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the extable fixup type for copy-from-user operations, changing it from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. It breaks previous EX_TYPE_UACCESS handling when posion found in get_user() or copy_from_user(). - For instr case: user process is killed by a SIGBUS signal due to #CMCI and #MCE race When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed. ### Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1] Prior to Icelake memory controllers reported patrol scrub events that detected a previously unseen uncorrected error in memory by signaling a broadcast machine check with an SRAO (Software Recoverable Action Optional) signature in the machine check bank. This was overkill because it's not an urgent problem that no core is on the verge of consuming that bad data. It's also found that multi SRAO UCE may cause nested MCE interrupts and finally become an IERR. Hence, Intel downgrades the machine check bank signature of patrol scrub from SRAO to UCNA (Uncorrected, No Action required), and signal changed to #CMCI. Just to add to the confusion, Linux does take an action (in uc_decode_notifier()) to try to offline the page despite the UC*NA* signature name. ### Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1] Having decided that CMCI/UCNA is the best action for patrol scrub errors, the memory controller uses it for reads too. But the memory controller is executing asynchronously from the core, and can't tell the difference between a "real" read and a speculative read. So it will do CMCI/UCNA if an error is found in any read. Thus: 1) Core is clever and thinks address A is needed soon, issues a speculative read. 2) Core finds it is going to use address A soon after sending the read request 3) The CMCI from the memory controller is in a race with MCE from the core that will soon try to retire the load from address A. Quite often (because speculation has got better) the CMCI from the memory controller is delivered before the core is committed to the instruction reading address A, so the interrupt is taken, and Linux offlines the page (marking it as poison). ## Why user process is killed for instr case Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not ---truncated---
In the Linux kernel, the following vulnerability has been resolved: smb: client: let recv_done verify data_offset, data_length and remaining_data_length This is inspired by the related server fixes.
In the Linux kernel, the following vulnerability has been resolved: vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects When the "proxy" option is enabled on a VXLAN device, the device will suppress ARP requests and IPv6 Neighbor Solicitation messages if it is able to reply on behalf of the remote host. That is, if a matching and valid neighbor entry is configured on the VXLAN device whose MAC address is not behind the "any" remote (0.0.0.0 / ::). The code currently assumes that the FDB entry for the neighbor's MAC address points to a valid remote destination, but this is incorrect if the entry is associated with an FDB nexthop group. This can result in a NPD [1][3] which can be reproduced using [2][4]. Fix by checking that the remote destination exists before dereferencing it. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 4 UID: 0 PID: 365 Comm: arping Not tainted 6.17.0-rc2-virtme-g2a89cb21162c #2 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014 RIP: 0010:vxlan_xmit+0xb58/0x15f0 [...] Call Trace: <TASK> dev_hard_start_xmit+0x5d/0x1c0 __dev_queue_xmit+0x246/0xfd0 packet_sendmsg+0x113a/0x1850 __sock_sendmsg+0x38/0x70 __sys_sendto+0x126/0x180 __x64_sys_sendto+0x24/0x30 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] #!/bin/bash ip address add 192.0.2.1/32 dev lo ip nexthop add id 1 via 192.0.2.2 fdb ip nexthop add id 10 group 1 fdb ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 4789 proxy ip neigh add 192.0.2.3 lladdr 00:11:22:33:44:55 nud perm dev vx0 bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10 arping -b -c 1 -s 192.0.2.1 -I vx0 192.0.2.3 [3] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 13 UID: 0 PID: 372 Comm: ndisc6 Not tainted 6.17.0-rc2-virtmne-g6ee90cb26014 #3 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1v996), BIOS 1.17.0-4.fc41 04/01/2x014 RIP: 0010:vxlan_xmit+0x803/0x1600 [...] Call Trace: <TASK> dev_hard_start_xmit+0x5d/0x1c0 __dev_queue_xmit+0x246/0xfd0 ip6_finish_output2+0x210/0x6c0 ip6_finish_output+0x1af/0x2b0 ip6_mr_output+0x92/0x3e0 ip6_send_skb+0x30/0x90 rawv6_sendmsg+0xe6e/0x12e0 __sock_sendmsg+0x38/0x70 __sys_sendto+0x126/0x180 __x64_sys_sendto+0x24/0x30 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f383422ec77 [4] #!/bin/bash ip address add 2001:db8:1::1/128 dev lo ip nexthop add id 1 via 2001:db8:1::1 fdb ip nexthop add id 10 group 1 fdb ip link add name vx0 up type vxlan id 10010 local 2001:db8:1::1 dstport 4789 proxy ip neigh add 2001:db8:1::3 lladdr 00:11:22:33:44:55 nud perm dev vx0 bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10 ndisc6 -r 1 -s 2001:db8:1::1 -w 1 2001:db8:1::3 vx0
In the Linux kernel, the following vulnerability has been resolved: mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup() A soft lockup warning was observed on a relative small system x86-64 system with 16 GB of memory when running a debug kernel with kmemleak enabled. watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134] The test system was running a workload with hot unplug happening in parallel. Then kemleak decided to disable itself due to its inability to allocate more kmemleak objects. The debug kernel has its CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000. The soft lockup happened in kmemleak_do_cleanup() when the existing kmemleak objects were being removed and deleted one-by-one in a loop via a workqueue. In this particular case, there are at least 40,000 objects that need to be processed and given the slowness of a debug kernel and the fact that a raw_spinlock has to be acquired and released in __delete_object(), it could take a while to properly handle all these objects. As kmemleak has been disabled in this case, the object removal and deletion process can be further optimized as locking isn't really needed. However, it is probably not worth the effort to optimize for such an edge case that should rarely happen. So the simple solution is to call cond_resched() at periodic interval in the iteration loop to avoid soft lockup.
The WRMSR processing functionality in the KVM subsystem in the Linux kernel through 3.17.2 does not properly handle the writing of a non-canonical address to a model-specific register, which allows guest OS users to cause a denial of service (host OS crash) by leveraging guest OS privileges, related to the wrmsr_interception function in arch/x86/kvm/svm.c and the handle_wrmsr function in arch/x86/kvm/vmx.c.
In the Linux kernel, the following vulnerability has been resolved: smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path During tests of another unrelated patch I was able to trigger this error: Objects remaining on __kmem_cache_shutdown()
In nf_tables_updtable, if nf_tables_table_enable returns an error, nft_trans_destroy is called to free the transaction object. nft_trans_destroy() calls list_del(), but the transaction was never placed on a list -- the list head is all zeroes, this results in a NULL pointer dereference.
In the Linux kernel, the following vulnerability has been resolved: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock When netpoll is enabled, calling pr_warn_once() while holding kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock inversion with the netconsole subsystem. This occurs because pr_warn_once() may trigger netpoll, which eventually leads to __alloc_skb() and back into kmemleak code, attempting to reacquire kmemleak_lock. This is the path for the deadlock. mem_pool_alloc() -> raw_spin_lock_irqsave(&kmemleak_lock, flags); -> pr_warn_once() -> netconsole subsystem -> netpoll -> __alloc_skb -> __create_object -> raw_spin_lock_irqsave(&kmemleak_lock, flags); Fix this by setting a flag and issuing the pr_warn_once() after kmemleak_lock is released.
In the Linux kernel, the following vulnerability has been resolved: net: phy: transfer phy_config_inband() locking responsibility to phylink Problem description =================== Lockdep reports a possible circular locking dependency (AB/BA) between &pl->state_mutex and &phy->lock, as follows. phylink_resolve() // acquires &pl->state_mutex -> phylink_major_config() -> phy_config_inband() // acquires &pl->phydev->lock whereas all the other call sites where &pl->state_mutex and &pl->phydev->lock have the locking scheme reversed. Everywhere else, &pl->phydev->lock is acquired at the top level, and &pl->state_mutex at the lower level. A clear example is phylink_bringup_phy(). The outlier is the newly introduced phy_config_inband() and the existing lock order is the correct one. To understand why it cannot be the other way around, it is sufficient to consider phylink_phy_change(), phylink's callback from the PHY device's phy->phy_link_change() virtual method, invoked by the PHY state machine. phy_link_up() and phy_link_down(), the (indirect) callers of phylink_phy_change(), are called with &phydev->lock acquired. Then phylink_phy_change() acquires its own &pl->state_mutex, to serialize changes made to its pl->phy_state and pl->link_config. So all other instances of &pl->state_mutex and &phydev->lock must be consistent with this order. Problem impact ============== I think the kernel runs a serious deadlock risk if an existing phylink_resolve() thread, which results in a phy_config_inband() call, is concurrent with a phy_link_up() or phy_link_down() call, which will deadlock on &pl->state_mutex in phylink_phy_change(). Practically speaking, the impact may be limited by the slow speed of the medium auto-negotiation protocol, which makes it unlikely for the current state to still be unresolved when a new one is detected, but I think the problem is there. Nonetheless, the problem was discovered using lockdep. Proposed solution ================= Practically speaking, the phy_config_inband() requirement of having phydev->lock acquired must transfer to the caller (phylink is the only caller). There, it must bubble up until immediately before &pl->state_mutex is acquired, for the cases where that takes place. Solution details, considerations, notes ======================================= This is the phy_config_inband() call graph: sfp_upstream_ops :: connect_phy() | v phylink_sfp_connect_phy() | v phylink_sfp_config_phy() | | sfp_upstream_ops :: module_insert() | | | v | phylink_sfp_module_insert() | | | | sfp_upstream_ops :: module_start() | | | | | v | | phylink_sfp_module_start() | | | | v v | phylink_sfp_config_optical() phylink_start() | | | phylink_resume() v v | | phylink_sfp_set_config() | | | v v v phylink_mac_initial_config() | phylink_resolve() | | phylink_ethtool_ksettings_set() v v v phylink_major_config() | v phy_config_inband() phylink_major_config() caller #1, phylink_mac_initial_config(), does not acquire &pl->state_mutex nor do its callers. It must acquire &pl->phydev->lock prior to calling phylink_major_config(). phylink_major_config() caller #2, phylink_resolve() acquires &pl->state_mutex, thus also needs to acquire &pl->phydev->lock. phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is completely uninteresting, because it only call ---truncated---
In the Linux kernel, the following vulnerability has been resolved: media: venus: protect against spurious interrupts during probe Make sure the interrupt handler is initialized before the interrupt is registered. If the IRQ is registered before hfi_create(), it's possible that an interrupt fires before the handler setup is complete, leading to a NULL dereference. This error condition has been observed during system boot on Rb3Gen2.
In the Linux kernel, the following vulnerability has been resolved: ASoC: amd: acp: Fix incorrect retrival of acp_chip_info Use dev_get_drvdata(dev->parent) instead of dev_get_platdata(dev) to correctly obtain acp_chip_info members in the acp I2S driver. Previously, some members were not updated properly due to incorrect data access, which could potentially lead to null pointer dereferences. This issue was missed in the earlier commit ("ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot"), which only addressed set_tdm_slot(). This change ensures that all relevant functions correctly retrieve acp_chip_info, preventing further null pointer dereference issues.
In the Linux kernel, the following vulnerability has been resolved: drm/amd/display: Avoid a NULL pointer dereference [WHY] Although unlikely drm_atomic_get_new_connector_state() or drm_atomic_get_old_connector_state() can return NULL. [HOW] Check returns before dereference. (cherry picked from commit 1e5e8d672fec9f2ab352be121be971877bff2af9)
In the Linux kernel, the following vulnerability has been resolved: crypto: caam - Prevent crash on suspend with iMX8QM / iMX8ULP Since the CAAM on these SoCs is managed by another ARM core, called the SECO (Security Controller) on iMX8QM and Secure Enclave on iMX8ULP, which also reserves access to register page 0 suspend operations cannot touch this page. This is similar to when running OPTEE, where OPTEE will reserve page 0. Track this situation using a new state variable no_page0, reflecting if page 0 is reserved elsewhere, either by other management cores in SoC or by OPTEE. Replace the optee_en check in suspend/resume with the new check. optee_en cannot go away as it's needed elsewhere to gate OPTEE specific situations. Fixes the following splat at suspend: Internal error: synchronous external abort: 0000000096000010 [#1] SMP Hardware name: Freescale i.MX8QXP ACU6C (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : readl+0x0/0x18 lr : rd_reg32+0x18/0x3c sp : ffffffc08192ba20 x29: ffffffc08192ba20 x28: ffffff8025190000 x27: 0000000000000000 x26: ffffffc0808ae808 x25: ffffffc080922338 x24: ffffff8020e89090 x23: 0000000000000000 x22: ffffffc080922000 x21: ffffff8020e89010 x20: ffffffc080387ef8 x19: ffffff8020e89010 x18: 000000005d8000d5 x17: 0000000030f35963 x16: 000000008f785f3f x15: 000000003b8ef57c x14: 00000000c418aef8 x13: 00000000f5fea526 x12: 0000000000000001 x11: 0000000000000002 x10: 0000000000000001 x9 : 0000000000000000 x8 : ffffff8025190870 x7 : ffffff8021726880 x6 : 0000000000000002 x5 : ffffff80217268f0 x4 : ffffff8021726880 x3 : ffffffc081200000 x2 : 0000000000000001 x1 : ffffff8020e89010 x0 : ffffffc081200004 Call trace: readl+0x0/0x18 caam_ctrl_suspend+0x30/0xdc dpm_run_callback.constprop.0+0x24/0x5c device_suspend+0x170/0x2e8 dpm_suspend+0xa0/0x104 dpm_suspend_start+0x48/0x50 suspend_devices_and_enter+0x7c/0x45c pm_suspend+0x148/0x160 state_store+0xb4/0xf8 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x38/0x48 kernfs_fop_write_iter+0xb4/0x178 vfs_write+0x118/0x178 ksys_write+0x6c/0xd0 __arm64_sys_write+0x14/0x1c invoke_syscall.constprop.0+0x64/0xb0 do_el0_svc+0x90/0xb0 el0_svc+0x18/0x44 el0t_64_sync_handler+0x88/0x124 el0t_64_sync+0x150/0x154 Code: 88dffc21 88dffc21 5ac00800 d65f03c0 (b9400000)
In the Linux kernel, the following vulnerability has been resolved: net_sched: gen_estimator: fix est_timer() vs CONFIG_PREEMPT_RT=y syzbot reported a WARNING in est_timer() [1] Problem here is that with CONFIG_PREEMPT_RT=y, timer callbacks can be preempted. Adopt preempt_disable_nested()/preempt_enable_nested() to fix this. [1] WARNING: CPU: 0 PID: 16 at ./include/linux/seqlock.h:221 __seqprop_assert include/linux/seqlock.h:221 [inline] WARNING: CPU: 0 PID: 16 at ./include/linux/seqlock.h:221 est_timer+0x6dc/0x9f0 net/core/gen_estimator.c:93 Modules linked in: CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:__seqprop_assert include/linux/seqlock.h:221 [inline] RIP: 0010:est_timer+0x6dc/0x9f0 net/core/gen_estimator.c:93 Call Trace: <TASK> call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747 expire_timers kernel/time/timer.c:1798 [inline] __run_timers kernel/time/timer.c:2372 [inline] __run_timer_base+0x648/0x970 kernel/time/timer.c:2384 run_timer_base kernel/time/timer.c:2393 [inline] run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403 handle_softirqs+0x22c/0x710 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] run_ktimerd+0xcf/0x190 kernel/softirq.c:1043 smpboot_thread_fn+0x53f/0xa60 kernel/smpboot.c:160 kthread+0x70e/0x8a0 kernel/kthread.c:463 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK>
In the Linux kernel, the following vulnerability has been resolved: rcutorture: Fix rcutorture_one_extend_check() splat in RT kernels For built with CONFIG_PREEMPT_RT=y kernels, running rcutorture tests resulted in the following splat: [ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0 [ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture] [ 68.797601] Call Trace: [ 68.797602] <TASK> [ 68.797619] ? lockdep_softirqs_off+0xa5/0x160 [ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c] [ 68.797646] ? local_clock+0x19/0x40 [ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c] [ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c] [ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c] [ 68.797815] rcu-torture: rcu_torture_reader task started [ 68.797824] rcu-torture: Creating rcu_torture_reader task [ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c] [ 68.797836] ? kvm_sched_clock_read+0x15/0x30 Disable BH does not change the SOFTIRQ corresponding bits in preempt_count() for RT kernels, this commit therefore use softirq_count() to check the if BH is disabled.
In the Linux kernel, the following vulnerability has been resolved: net: dev_ioctl: take ops lock in hwtstamp lower paths ndo hwtstamp callbacks are expected to run under the per-device ops lock. Make the lower get/set paths consistent with the rest of ndo invocations. Kernel log: WARNING: CPU: 13 PID: 51364 at ./include/net/netdev_lock.h:70 __netdev_update_features+0x4bd/0xe60 ... RIP: 0010:__netdev_update_features+0x4bd/0xe60 ... Call Trace: <TASK> netdev_update_features+0x1f/0x60 mlx5_hwtstamp_set+0x181/0x290 [mlx5_core] mlx5e_hwtstamp_set+0x19/0x30 [mlx5_core] dev_set_hwtstamp_phylib+0x9f/0x220 dev_set_hwtstamp_phylib+0x9f/0x220 dev_set_hwtstamp+0x13d/0x240 dev_ioctl+0x12f/0x4b0 sock_ioctl+0x171/0x370 __x64_sys_ioctl+0x3f7/0x900 ? __sys_setsockopt+0x69/0xb0 do_syscall_64+0x6f/0x2e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> .... ---[ end trace 0000000000000000 ]--- Note that the mlx5_hwtstamp_set and mlx5e_hwtstamp_set functions shown in the trace come from an in progress patch converting the legacy ioctl to ndo_hwtstamp_get/set and are not present in mainline.
In the Linux kernel, the following vulnerability has been resolved: sched/ext: Fix invalid task state transitions on class switch When enabling a sched_ext scheduler, we may trigger invalid task state transitions, resulting in warnings like the following (which can be easily reproduced by running the hotplug selftest in a loop): sched_ext: Invalid task state transition 0 -> 3 for fish[770] WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0 ... RIP: 0010:scx_set_task_state+0x7c/0xc0 ... Call Trace: <TASK> scx_enable_task+0x11f/0x2e0 switching_to_scx+0x24/0x110 scx_enable.isra.0+0xd14/0x13d0 bpf_struct_ops_link_create+0x136/0x1a0 __sys_bpf+0x1edd/0x2c30 __x64_sys_bpf+0x21/0x30 do_syscall_64+0xbb/0x370 entry_SYSCALL_64_after_hwframe+0x77/0x7f This happens because we skip initialization for tasks that are already dead (with their usage counter set to zero), but we don't exclude them during the scheduling class transition phase. Fix this by also skipping dead tasks during class swiching, preventing invalid task state transitions.
In the Linux kernel, the following vulnerability has been resolved: erofs: fix invalid algorithm for encoded extents The current algorithm sanity checks do not properly apply to new encoded extents. Unify the algorithm check with Z_EROFS_COMPRESSION(_RUNTIME)_MAX and ensure consistency with sbi->available_compr_algs.