In the Linux kernel, the following vulnerability has been resolved: net: marvell: prestera: fix port event handling on init For some reason there might be a crash during port creation if port events are handled at the same time, because the fw may send an initial port event with down state. The crash points to cancel_delayed_work(), which is called when the port goes down. I have not yet found the real cause of the issue, so it is fixed by cancelling the port stats work only if the port's previous state was up & running. The following is the crash which can be triggered: [ 28.311104] Unable to handle kernel paging request at virtual address 000071775f776600 [ 28.319097] Mem abort info: [ 28.321914] ESR = 0x96000004 [ 28.324996] EC = 0x25: DABT (current EL), IL = 32 bits [ 28.330350] SET = 0, FnV = 0 [ 28.333430] EA = 0, S1PTW = 0 [ 28.336597] Data abort info: [ 28.339499] ISV = 0, ISS = 0x00000004 [ 28.343362] CM = 0, WnR = 0 [ 28.346354] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100bf7000 [ 28.352842] [000071775f776600] pgd=0000000000000000, p4d=0000000000000000 [ 28.359695] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 28.365310] Modules linked in: prestera_pci(+) prestera uio_pdrv_genirq [ 28.372005] CPU: 0 PID: 1291 Comm: kworker/0:1H Not tainted 5.11.0-rc4 #1 [ 28.378846] Hardware name: DNI AmazonGo1 A7040 board (DT) [ 28.384283] Workqueue: prestera_fw_wq prestera_fw_evt_work_fn [prestera_pci] [ 28.391413] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--) [ 28.397468] pc : get_work_pool+0x48/0x60 [ 28.401442] lr : try_to_grab_pending+0x6c/0x1b0 [ 28.406018] sp : ffff80001391bc60 [ 28.409358] x29: ffff80001391bc60 x28: 0000000000000000 [ 28.414725] x27: ffff000104fc8b40 x26: ffff80001127de88 [ 28.420089] x25: 0000000000000000 x24: ffff000106119760 [ 28.425452] x23: ffff00010775dd60 x22: ffff00010567e000 [ 28.430814] x21: 0000000000000000 x20: ffff80001391bcb0 [ 28.436175] x19: ffff00010775deb8 x18: 00000000000000c0 [ 28.441537] x17: 0000000000000000 x16: 
000000008d9b0e88 [ 28.446898] x15: 0000000000000001 x14: 00000000000002ba [ 28.452261] x13: 80a3002c00000002 x12: 00000000000005f4 [ 28.457622] x11: 0000000000000030 x10: 000000000000000c [ 28.462985] x9 : 000000000000000c x8 : 0000000000000030 [ 28.468346] x7 : ffff800014400000 x6 : ffff000106119758 [ 28.473708] x5 : 0000000000000003 x4 : ffff00010775dc60 [ 28.479068] x3 : 0000000000000000 x2 : 0000000000000060 [ 28.484429] x1 : 000071775f776600 x0 : ffff00010775deb8 [ 28.489791] Call trace: [ 28.492259] get_work_pool+0x48/0x60 [ 28.495874] cancel_delayed_work+0x38/0xb0 [ 28.500011] prestera_port_handle_event+0x90/0xa0 [prestera] [ 28.505743] prestera_evt_recv+0x98/0xe0 [prestera] [ 28.510683] prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci] [ 28.516660] process_one_work+0x1e8/0x360 [ 28.520710] worker_thread+0x44/0x480 [ 28.524412] kthread+0x154/0x160 [ 28.527670] ret_from_fork+0x10/0x38 [ 28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020) [ 28.537429] ---[ end trace 5eced933df3a080b ]---
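The guard described in the prestera fix above can be illustrated with a small model. The Python sketch below is not the driver code: `PortModel`, `handle_event`, and the `stats_work_scheduled` flag are hypothetical names standing in for the port structure, the event handler, and the delayed stats work. It only shows the idea of cancelling the stats work when the previous state was up.

```python
from dataclasses import dataclass


@dataclass
class PortModel:
    """Hypothetical stand-in for a switch port; not the prestera structure."""
    oper_up: bool = False               # last known link state
    stats_work_scheduled: bool = False  # models the delayed stats work
    cancel_calls: int = 0               # how often the work was cancelled


def handle_event(port: PortModel, link_up: bool) -> None:
    """Apply a link event, touching the stats work only on a real transition."""
    if link_up:
        if not port.oper_up:
            port.stats_work_scheduled = True  # start polling statistics
        port.oper_up = True
    else:
        # Cancel only if the port was previously up. An initial "down"
        # event on a port that never came up leaves the work untouched.
        if port.oper_up:
            port.stats_work_scheduled = False
            port.cancel_calls += 1
        port.oper_up = False
```

In this model, an initial down event on a fresh port performs no cancel at all, which mirrors the workaround: the work item is only touched after it has been set up by an earlier up event.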
A locally exploitable DoS vulnerability was found in pax-linux versions 2.6.32.33-test79.patch, 2.6.38-test3.patch, and 2.6.37.4-test14.patch. A bad bounds check in arch_get_unmapped_area_topdown, triggered by programs doing an mmap after a MAP_GROWSDOWN mmap, will create an infinite loop condition without releasing the VM semaphore, eventually leading to a system crash.
mm/huge_memory.c in the Linux kernel before 2.6.38-rc5 does not prevent creation of a transparent huge page (THP) during the existence of a temporary stack for an exec system call, which allows local users to cause a denial of service (memory consumption) or possibly have unspecified other impact via a crafted application.
The epoll implementation in the Linux kernel 2.6.37.2 and earlier does not properly traverse a tree of epoll file descriptors, which allows local users to cause a denial of service (CPU consumption) via a crafted application that makes epoll_create and epoll_ctl system calls.
In the Linux kernel, the following vulnerability has been resolved: clk: bcm: dvp: Assign ->num before accessing ->hws Commit f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by") annotated the hws member of 'struct clk_hw_onecell_data' with __counted_by, which informs the bounds sanitizer about the number of elements in hws, so that it can warn when hws is accessed out of bounds. As noted in that change, the __counted_by member must be initialized with the number of elements before the first array access happens, otherwise there will be a warning from each access prior to the initialization because the number of elements is zero. This occurs in clk_dvp_probe() due to ->num being assigned after ->hws has been accessed: UBSAN: array-index-out-of-bounds in drivers/clk/bcm/clk-bcm2711-dvp.c:59:2 index 0 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]') Move the ->num initialization to before the first access of ->hws, which clears up the warning.
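The ordering requirement in the clk-bcm2711-dvp record above can be sketched with a toy model. The Python below is an illustration only: `CountedArray`, `probe_wrong_order`, and `probe_fixed_order` are hypothetical names, and the model raises `IndexError` where the real bounds sanitizer would print a warning. It shows why the count must be assigned before the first element access when the bound is read from that count.

```python
class CountedArray:
    """Toy model of a flexible array whose bound is taken from `num`."""

    def __init__(self, capacity: int) -> None:
        self.num = 0                     # element count, zero until assigned
        self._slots = [None] * capacity  # backing storage is already allocated

    def _check(self, index: int) -> None:
        # The checker trusts `num`, not the real capacity.
        if not 0 <= index < self.num:
            raise IndexError(f"index {index} is out of range for num={self.num}")

    def __setitem__(self, index: int, value) -> None:
        self._check(index)
        self._slots[index] = value

    def __getitem__(self, index: int):
        self._check(index)
        return self._slots[index]


def probe_wrong_order() -> CountedArray:
    data = CountedArray(2)
    data[0] = "clk0"  # access while num is still 0: flagged by the checker
    data.num = 2
    return data


def probe_fixed_order() -> CountedArray:
    data = CountedArray(2)
    data.num = 2      # assign the count before the first access
    data[0] = "clk0"
    data[1] = "clk1"
    return data
```

The storage is large enough in both cases; only the order of the count assignment decides whether the checker complains.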
net/unix/af_unix.c in the Linux kernel 2.6.31.4 and earlier allows local users to cause a denial of service (system hang) by creating an abstract-namespace AF_UNIX listening socket, performing a shutdown operation on this socket, and then performing a series of connect operations to this socket.
In the Linux kernel, the following vulnerability has been resolved: drm/i915/hwmon: Get rid of devm When both hwmon and hwmon drvdata (on which hwmon depends) are device managed resources, the expectation, on device unbind, is that hwmon will be released before drvdata. However, in i915 there are two separate code paths, which both release either drvdata or hwmon and either can be released before the other. These code paths (for device unbind) are as follows (see also the bug referenced below): Call Trace: release_nodes+0x11/0x70 devres_release_group+0xb2/0x110 component_unbind_all+0x8d/0xa0 component_del+0xa5/0x140 intel_pxp_tee_component_fini+0x29/0x40 [i915] intel_pxp_fini+0x33/0x80 [i915] i915_driver_remove+0x4c/0x120 [i915] i915_pci_remove+0x19/0x30 [i915] pci_device_remove+0x32/0xa0 device_release_driver_internal+0x19c/0x200 unbind_store+0x9c/0xb0 and Call Trace: release_nodes+0x11/0x70 devres_release_all+0x8a/0xc0 device_unbind_cleanup+0x9/0x70 device_release_driver_internal+0x1c1/0x200 unbind_store+0x9c/0xb0 This means that in i915, if we use devm, we cannot guarantee that hwmon will always be released before drvdata, which means that we have a uaf if hwmon sysfs is accessed when drvdata has been released but hwmon hasn't. The only way out of this seems to be to get rid of devm_ and release/free everything explicitly during device unbind. v2: Change commit message and other minor code changes v3: Cleanup from i915_hwmon_register on error (Armin Wolf) v4: Eliminate potential static analyzer warning (Rodrigo) Eliminate fetch_and_zero (Jani) v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi)
In the Linux kernel, the following vulnerability has been resolved: blk-cgroup: fix list corruption from reorder of WRITE ->lqueued __blkcg_rstat_flush() can be run anytime, especially when blk_cgroup_bio_start is being executed. If WRITE of `->lqueued` is re-ordered with READ of 'bisc->lnode.next' in the loop of __blkcg_rstat_flush(), `next_bisc` can be assigned with one stat instance being added in blk_cgroup_bio_start(), then the local list in __blkcg_rstat_flush() could be corrupted. Fix the issue by adding one barrier.
The setup_arg_pages function in fs/exec.c in the Linux kernel before 2.6.36, when CONFIG_STACK_GROWSDOWN is used, does not properly restrict the stack memory consumption of the (1) arguments and (2) environment for a 32-bit application on a 64-bit platform, which allows local users to cause a denial of service (system crash) via a crafted exec system call, a related issue to CVE-2010-2240.
The socket implementation in net/core/sock.c in the Linux kernel before 2.6.34 does not properly manage a backlog of received packets, which allows remote attackers to cause a denial of service (memory consumption) by sending a large amount of network traffic, as demonstrated by netperf UDP tests.
The socket implementation in net/core/sock.c in the Linux kernel before 2.6.35 does not properly manage a backlog of received packets, which allows remote attackers to cause a denial of service by sending a large amount of network traffic, related to the sk_add_backlog function and the sk_rmem_alloc socket field. NOTE: this vulnerability exists because of an incomplete fix for CVE-2010-4251.
The blk_rq_map_user_iov function in block/blk-map.c in the Linux kernel before 2.6.37-rc7 allows local users to cause a denial of service (panic) via a zero-length I/O request in a device ioctl to a SCSI device, related to an unaligned map. NOTE: this vulnerability exists because of an incomplete fix for CVE-2010-4163.
The wait_for_unix_gc function in net/unix/garbage.c in the Linux kernel before 2.6.37-rc3-next-20101125 does not properly select times for garbage collection of inflight sockets, which allows local users to cause a denial of service (system hang) via crafted use of the socketpair and sendmsg system calls for SOCK_SEQPACKET sockets.
fs/exec.c in the Linux kernel before 2.6.37 does not enable the OOM Killer to assess use of stack memory by arrays representing the (1) arguments and (2) environment, which allows local users to cause a denial of service (memory consumption) via a crafted exec system call, aka an "OOM dodging issue," a related issue to CVE-2010-3858.
In the Linux kernel, the following vulnerability has been resolved: bcachefs: Check for journal entries overrunning end of sb clean section Fix a missing bounds check in superblock validation. Note that we don't yet have repair code for this case - repair code for individual items is generally low priority, since the whole superblock is checksummed, validated prior to write, and we have backups.
The KVM implementation in the Linux kernel before 2.6.36 does not properly reload the FS and GS segment registers, which allows host OS users to cause a denial of service (host OS crash) via a KVM_RUN ioctl call in conjunction with a modified Local Descriptor Table (LDT).
Rocket Software UniData versions prior to 8.2.4 build 3003 and UniVerse versions prior to 11.3.5 build 1001 or 12.2.1 build 2002 suffer from a memory-exhaustion issue, where a decompression routine will allocate increasing amounts of memory until all system memory is exhausted and the forked process crashes.
The sctp_auth_asoc_get_hmac function in net/sctp/auth.c in the Linux kernel before 2.6.36 does not properly validate the hmac_ids array of an SCTP peer, which allows remote attackers to cause a denial of service (memory corruption and panic) via a crafted value in the last element of this array.
Uncontrolled resource consumption in some Intel(R) Aptio* V UEFI Firmware Integrator Tools may allow an authenticated user to potentially enable denial of service via local access.
In the Linux kernel, the following vulnerability has been resolved: net: bridge: vlan: fix memory leak in __allowed_ingress When using per-vlan state, if vlan snooping and stats are disabled, an untagged or priority-tagged ingress frame will go on to the pvid state check. If the port state is forwarding and the pvid state is not learning/forwarding, the untagged or priority-tagged frame will be dropped but the skb memory is not freed. The skb should be freed when __allowed_ingress returns false.
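The leak pattern in the bridge record above, a drop path that returns false without releasing the buffer, can be sketched in a few lines. This Python model is an assumption-laden illustration: `BufferPool`, `allowed_ingress_leaky`, and `allowed_ingress_fixed` are hypothetical names, and the single `pvid_forwarding` flag stands in for the real state checks.

```python
class BufferPool:
    """Hypothetical pool that counts buffers not yet freed."""

    def __init__(self) -> None:
        self.outstanding = 0

    def alloc(self) -> object:
        self.outstanding += 1
        return object()

    def free(self, buf: object) -> None:
        self.outstanding -= 1


def allowed_ingress_leaky(pool: BufferPool, frame: object,
                          pvid_forwarding: bool) -> bool:
    """Drop path returns False but forgets the buffer (the bug pattern)."""
    if not pvid_forwarding:
        return False
    return True


def allowed_ingress_fixed(pool: BufferPool, frame: object,
                          pvid_forwarding: bool) -> bool:
    """Drop path frees the buffer before returning False."""
    if not pvid_forwarding:
        pool.free(frame)  # the callee owns the frame on the drop path
        return False
    return True
```

The sketch assumes the convention that a false return means the callee has already consumed the frame, so the caller must not free it again.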
In the Linux kernel, the following vulnerability has been resolved: ASoC: codecs: wcd938x: fix incorrect used of portid Mixer controls have the channel id in mixer->reg, which is not the same as the port id. The port id should be derived from the chan_info array. So fix this. Without this, it is possible that we could corrupt struct wcd938x_sdw_priv by accessing the port_map array out of range with the channel id instead of the port id.
The IP stack in the Linux kernel before 4.6 allows remote attackers to cause a denial of service (stack consumption and panic) or possibly have unspecified other impact by triggering use of the GRO path for packets with tunnel stacking, as demonstrated by interleaved IPv4 headers and GRE headers, a related issue to CVE-2016-7039.
The bio_map_user_iov and bio_unmap_user functions in block/bio.c in the Linux kernel before 4.13.8 do unbalanced refcounting when a SCSI I/O vector has small consecutive buffers belonging to the same page. The bio_add_pc_page function merges them into one, but the page reference is never dropped. This causes a memory leak and possible system lockup (exploitable against the host OS by a guest OS user, if a SCSI disk is passed through to a virtual machine) due to an out-of-memory condition.
In the Linux kernel, the following vulnerability has been resolved: lan966x: Fix crash when adding interface under a lag There is a crash when adding one of the lan966x interfaces under a lag interface. The issue can be reproduced like this: ip link add name bond0 type bond miimon 100 mode balance-xor ip link set dev eth0 master bond0 The reason is that when adding an interface under the lag, the driver goes through all the ports and tries to figure out which other ports are under that lag interface. The issue is that lan966x can have ports that are NULL pointers because they are not probed, so iterating over these ports crashes when they are dereferenced. The fix consists of checking for NULL pointers before accessing anything from the ports, as is done in other places.
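The iteration described in the lan966x record above can be modelled briefly. In this Python sketch, `LagPort` and `ports_in_lag` are hypothetical names, `None` plays the role of an unprobed (NULL) port slot, and the bitmask result is an assumption chosen for illustration rather than the driver's actual return type.

```python
from typing import List, Optional


class LagPort:
    """Hypothetical port record; `bond` names the lag it is attached to."""

    def __init__(self, bond: Optional[str]) -> None:
        self.bond = bond


def ports_in_lag(ports: List[Optional[LagPort]], bond: str) -> int:
    """Return a bitmask of the port indexes attached to `bond`."""
    mask = 0
    for index, port in enumerate(ports):
        # Unprobed slots are empty; skip them before touching any field.
        if port is None:
            continue
        if port.bond == bond:
            mask |= 1 << index
    return mask
```

Without the `None` check, the attribute access on an empty slot would fail, which is the analogue of the NULL pointer dereference in the record.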
In the Linux kernel, the following vulnerability has been resolved: KVM: Always flush async #PF workqueue when vCPU is being destroyed Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its completion queue, e.g. when a VM and all its vCPUs are being destroyed. KVM must ensure that none of its workqueue callbacks is running when the last reference to the KVM _module_ is put. Gifting a reference to the associated VM prevents the workqueue callback from dereferencing freed vCPU/VM memory, but does not prevent the KVM module from being unloaded before the callback completes. Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will result in deadlock. async_pf_execute() can't return until kvm_put_kvm() finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes: WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm] Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: events async_pf_execute [kvm] RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm] Call Trace: <TASK> async_pf_execute+0x198/0x260 [kvm] process_one_work+0x145/0x2d0 worker_thread+0x27e/0x3a0 kthread+0xba/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 </TASK> ---[ end trace 0000000000000000 ]--- INFO: task kworker/8:1:251 blocked for more than 120 seconds. Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:kworker/8:1 state:D stack:0 pid:251 ppid:2 flags:0x00004000 Workqueue: events async_pf_execute [kvm] Call Trace: <TASK> __schedule+0x33f/0xa40 schedule+0x53/0xc0 schedule_timeout+0x12a/0x140 __wait_for_common+0x8d/0x1d0 __flush_work.isra.0+0x19f/0x2c0 kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm] kvm_arch_destroy_vm+0x78/0x1b0 [kvm] kvm_put_kvm+0x1c1/0x320 [kvm] async_pf_execute+0x198/0x260 [kvm] process_one_work+0x145/0x2d0 worker_thread+0x27e/0x3a0 kthread+0xba/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 </TASK> If kvm_clear_async_pf_completion_queue() actually flushes the workqueue, then there's no need to gift async_pf_execute() a reference because all invocations of async_pf_execute() will be forced to complete before the vCPU and its VM are destroyed/freed. And that in turn fixes the module unloading bug as __fput() won't do module_put() on the last vCPU reference until the vCPU has been freed, e.g. if closing the vCPU file also puts the last reference to the KVM module. Note that kvm_check_async_pf_completion() may also take the work item off the completion queue and so also needs to flush the work queue, as the work will not be seen by kvm_clear_async_pf_completion_queue(). Waiting on the workqueue could theoretically delay a vCPU due to waiting for the work to complete, but that's a very, very small chance, and likely a very small delay. kvm_arch_async_page_present_queued() unconditionally makes a new request, i.e. will effectively delay entering the guest, so the remaining work is really just: trace_kvm_async_pf_completed(addr, cr2_or_gpa); __kvm_vcpu_wake_up(vcpu); mmput(mm); and mmput() can't drop the last reference to the page tables if the vCPU is still alive, i.e. the vCPU won't get stuck tearing down page tables. Add a helper to do the flushing, specifically to deal with "wakeup all" work items, as they aren't actually work items, i.e. are never placed in a workqueue. 
Trying to flush a bogus workqueue entry rightly makes __flush_work() complain (kudos to whoever added that sanity check). Note, commit 5f6de5cbebee ("KVM: Prevent module exit until al ---truncated---
A hash collision flaw was found in the IPv6 connection lookup table in the Linux kernel's IPv6 functionality when a user makes a new kind of SYN flood attack. A user located in the local network or with a high bandwidth connection can increase the CPU usage of the server that accepts IPv6 connections up to 95%.
Multiple memory leaks in error paths in fs/xfs/xfs_attr_list.c in the Linux kernel before 4.5.1 allow local users to cause a denial of service (memory consumption) via crafted XFS filesystem operations.
fs/namespace.c in the Linux kernel before 4.9 does not restrict how many mounts may exist in a mount namespace, which allows local users to cause a denial of service (memory consumption and deadlock) via MS_BIND mount system calls, as demonstrated by a loop that triggers exponential growth in the number of mounts.
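The exponential growth mentioned in the mount namespace record above is easy to underestimate, so a short worked calculation may help. The Python below assumes, purely for illustration, that each loop iteration doubles the number of mounts; the record only states that growth is exponential. `mounts_after` and `rounds_until_limit` are hypothetical helper names, and the limit is a free parameter of the sketch.

```python
def mounts_after(iterations: int, initial: int = 1) -> int:
    """Mount count after `iterations` rounds, assuming each round doubles it."""
    return initial * 2 ** iterations


def rounds_until_limit(limit: int, initial: int = 1) -> int:
    """Smallest number of doubling rounds after which the count exceeds `limit`."""
    rounds = 0
    count = initial
    while count <= limit:
        count *= 2
        rounds += 1
    return rounds
```

Under the doubling assumption, ten rounds already yield 1024 mounts and a cap of 100000 is passed after 17 rounds, which is why a per-namespace limit on the mount count bounds the damage quickly.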
The shmem_delete_inode function in mm/shmem.c in the tmpfs implementation in the Linux kernel before 2.6.26.1 allows local users to cause a denial of service (system crash) via a certain sequence of file create, remove, and overwrite operations, as demonstrated by the insserv program, related to allocation of "useless pages" and improper maintenance of the i_blocks count.
IBM Sterling Partner Engagement Manager 6.1.2, 6.2.0, and 6.2.1 could allow an authenticated user to exhaust server resources which could lead to a denial of service. IBM X-Force ID: 229705.
In the Linux kernel, the following vulnerability has been resolved: enetc: Fix illegal access when reading affinity_hint irq_set_affinity_hint() stores a reference to the cpumask_t parameter in the irq descriptor, and that reference can be accessed later from irq_affinity_hint_proc_show(). Since the cpu_mask parameter passed to irq_set_affinity_hint() has only temporary storage (it's on the stack memory), later accesses to it are illegal. Thus reads from the corresponding procfs affinity_hint file can result in a paging request oops. The issue is fixed by the get_cpu_mask() helper, which provides permanent storage for the cpumask_t parameter.
In the Linux kernel, the following vulnerability has been resolved: net: ipv4: fix memory leak in ip_mc_add1_src BUG: memory leak unreferenced object 0xffff888101bc4c00 (size 32): comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................ backtrace: [<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline] [<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline] [<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline] [<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095 [<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416 [<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline] [<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423 [<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857 [<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117 [<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline] [<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline] [<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125 [<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47 [<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae In commit 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed, because it was also called in igmpv3_clear_delrec(). Rough callgraph: inetdev_destroy -> ip_mc_destroy_dev -> igmpv3_clear_delrec -> ip_mc_clear_src -> RCU_INIT_POINTER(dev->ip_ptr, NULL) However, ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't release in_dev->mc_list->sources. And RCU_INIT_POINTER() assigns the NULL to dev->ip_ptr. 
As a result, in_dev cannot be obtained through inetdev_by_index() and then in_dev->mc_list->sources cannot be released by ip_mc_del1_src() in the sock_close. Rough call sequence goes like: sock_close -> __sock_release -> inet_release -> ip_mc_drop_socket -> inetdev_by_index -> ip_mc_leave_src -> ip_mc_del_src -> ip_mc_del1_src So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free in_dev->mc_list->sources.
In the Linux kernel, the following vulnerability has been resolved: net: Only allow init netns to set default tcp cong to a restricted algo tcp_set_default_congestion_control() is netns-safe in that it writes to &net->ipv4.tcp_congestion_control, but it also sets ca->flags |= TCP_CONG_NON_RESTRICTED which is not namespaced. This has the unintended side-effect of changing the global net.ipv4.tcp_allowed_congestion_control sysctl, despite the fact that it is read-only: 97684f0970f6 ("net: Make tcp_allowed_congestion_control readonly in non-init netns") Resolve this netns "leak" by only allowing the init netns to set the default algorithm to one that is restricted. This restriction could be removed if tcp_allowed_congestion_control were namespace-ified in the future. This bug was uncovered with https://github.com/JonathonReinhart/linux-netns-sysctl-verify
In the Linux kernel, the following vulnerability has been resolved: net: sched: fix memory leak in tcindex_partial_destroy_work Syzbot reported memory leak in tcindex_set_parms(). The problem was in non-freed perfect hash in tcindex_partial_destroy_work(). In tcindex_set_parms() new tcindex_data is allocated and some fields from old one are copied to new one, but not the perfect hash. Since tcindex_partial_destroy_work() is the destroy function for old tcindex_data, we need to free perfect hash to avoid memory leak.
In the Linux kernel, the following vulnerability has been resolved: tracing: Restructure trace_clock_global() to never block It was reported that a fix to the ring buffer recursion detection would cause a hung machine when performing suspend / resume testing. The following backtrace was extracted from debugging that case: Call Trace: trace_clock_global+0x91/0xa0 __rb_reserve_next+0x237/0x460 ring_buffer_lock_reserve+0x12a/0x3f0 trace_buffer_lock_reserve+0x10/0x50 __trace_graph_return+0x1f/0x80 trace_graph_return+0xb7/0xf0 ? trace_clock_global+0x91/0xa0 ftrace_return_to_handler+0x8b/0xf0 ? pv_hash+0xa0/0xa0 return_to_handler+0x15/0x30 ? ftrace_graph_caller+0xa0/0xa0 ? trace_clock_global+0x91/0xa0 ? __rb_reserve_next+0x237/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 ? trace_event_buffer_lock_reserve+0x3c/0x120 ? trace_event_buffer_reserve+0x6b/0xc0 ? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0 ? dpm_run_callback+0x3b/0xc0 ? pm_ops_is_empty+0x50/0x50 ? platform_get_irq_byname_optional+0x90/0x90 ? trace_device_pm_callback_start+0x82/0xd0 ? dpm_run_callback+0x49/0xc0 With the following RIP: RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200 Since the fix to the recursion detection would allow a single recursion to happen while tracing, this led to trace_clock_global() taking a spin lock and then trying to take it again: ring_buffer_lock_reserve() { trace_clock_global() { arch_spin_lock() { queued_spin_lock_slowpath() { /* lock taken */ (something else gets traced by function graph tracer) ring_buffer_lock_reserve() { trace_clock_global() { arch_spin_lock() { queued_spin_lock_slowpath() { /* DEAD LOCK! */ Tracing should *never* block, as it can lead to strange lockups like the above. Restructure the trace_clock_global() code so that, instead of simply taking a lock to update the recorded "prev_time", it simply uses it; when two events on two different CPUs call this at the same time, it really doesn't matter which one goes first. 
Use a trylock to grab the lock for updating the prev_time, and if it fails, simply try again the next time. If it failed to be taken, that means something else is already updating it. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
In the Linux kernel, the following vulnerability has been resolved: scsi: megaraid_sas: Fix resource leak in case of probe failure The driver doesn't clean up all the allocated resources properly when the scsi_add_host() or megasas_start_aen() function fails during the PCI device probe. Clean up all those resources.
In the Linux kernel, the following vulnerability has been resolved: nexthop: Fix memory leaks in nexthop notification chain listeners syzkaller discovered memory leaks [1] that can be reduced to the following commands: # ip nexthop add id 1 blackhole # devlink dev reload pci/0000:06:00.0 As part of the reload flow, mlxsw will unregister its netdevs and then unregister from the nexthop notification chain. Before unregistering from the notification chain, mlxsw will receive delete notifications for nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw will not receive notifications for nexthops using netdevs that are not dismantled as part of the reload flow. For example, the blackhole nexthop above that internally uses the loopback netdev as its nexthop device. One way to fix this problem is to have listeners flush their nexthop tables after unregistering from the notification chain. This is error-prone as evident by this patch and also not symmetric with the registration path where a listener receives a dump of all the existing nexthops. Therefore, fix this problem by replaying delete notifications for the listener being unregistered. This is symmetric to the registration path and also consistent with the netdev notification chain. The above means that unregister_nexthop_notifier(), like register_nexthop_notifier(), will have to take RTNL in order to iterate over the existing nexthops and that any callers of the function cannot hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN driver. To avoid a deadlock, change the latter to unregister its nexthop listener without holding RTNL, making it symmetric to the registration path. [1] unreferenced object 0xffff88806173d600 (size 512): comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s) hex dump (first 32 bytes): 41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa.... 08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............ 
backtrace: [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522 [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline] [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline] [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231 [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline] [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239 [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83 [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline] [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306 [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244 [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline] [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline] [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913 [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572 [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504 [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590 [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340 [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929 [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline ---truncated---
In the Linux kernel, the following vulnerability has been resolved: x86/xen: Drop USERGS_SYSRET64 paravirt call commit afd30525a659ac0ae0904f0cb4a2ca75522c3123 upstream. USERGS_SYSRET64 is used to return from a syscall via SYSRET, but a Xen PV guest will nevertheless use the IRET hypercall, as there is no sysret PV hypercall defined. So instead of testing all the prerequisites for doing a sysret and then mangling the stack for Xen PV again for doing an iret just use the iret exit from the beginning. This can easily be done via an ALTERNATIVE like it is done for the sysenter compat case already. It should be noted that this drops the optimization in Xen for not restoring a few registers when returning to user mode, but it seems as if the saved instructions in the kernel more than compensate for this drop (a kernel build in a Xen PV guest was slightly faster with this patch applied). While at it remove the stale sysret32 remnants. [ pawan: Brad Spengler and Salvatore Bonaccorso <carnil@debian.org> reported a problem with the 5.10 backport commit edc702b4a820 ("x86/entry_64: Add VERW just before userspace transition"). When CONFIG_PARAVIRT_XXL=y, CLEAR_CPU_BUFFERS is not executed in syscall_return_via_sysret path as USERGS_SYSRET64 is runtime patched to: .cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8, 0x48, 0x0f, 0x07 }, // swapgs; sysretq which is missing CLEAR_CPU_BUFFERS. It turns out dropping USERGS_SYSRET64 simplifies the code, allowing CLEAR_CPU_BUFFERS to be explicitly added to syscall_return_via_sysret path. Below is with CONFIG_PARAVIRT_XXL=y and this patch applied: syscall_return_via_sysret: ... <+342>: swapgs <+345>: xchg %ax,%ax <+347>: verw -0x1a2(%rip) <------ <+354>: sysretq ]
A CPU resource starvation flaw was found in the Linux kernel tracing module functionality in versions prior to 5.14-rc3, triggered when a user uses the trace ring buffer in a specific way. Only privileged local users (with the CAP_SYS_ADMIN capability) could use this flaw to starve resources, causing a denial of service.
In the Linux kernel, the following vulnerability has been resolved: jfs: fix slab-out-of-bounds Read in dtSearch Currently, while searching for the current page in the sorted entry table of the page, there is an out-of-bounds access. Add a bounds check to fix the error. Dave: Set return code to -EIO
A flaw was found in the Linux kernel. Measuring usage of the shared memory does not scale with large shared memory segment counts which could lead to resource exhaustion and DoS.
The IPv6 implementation in the Linux kernel before 6.3 has a net/ipv6/route.c max_size threshold that can be consumed easily, e.g., leading to a denial of service (network is unreachable errors) when IPv6 packets are sent in a loop via a raw socket.
In the Linux kernel, the following vulnerability has been resolved: pipe: wakeup wr_wait after setting max_usage In commit c73be61cede5 ("pipe: Add general notification queue support"), a regression was introduced that would lock up resized pipes under certain conditions. See the reproducer in [1]. The code resizing the pipe ring size was moved to a different function; doing that moved the wakeup for pipe->wr_wait before actually raising pipe->max_usage. If a pipe was full before the resize occurred, it would result in the wakeup never actually triggering pipe_write. Set @max_usage and @nr_accounted before waking writers if this isn't a watch queue. [Christian Brauner <brauner@kernel.org>: rewrite to account for watch queues]
The tcp_rcv_state_process function in net/ipv4/tcp_input.c in the Linux kernel before 3.2.24 allows remote attackers to cause a denial of service (kernel resource consumption) via a flood of SYN+FIN TCP packets, a different vulnerability than CVE-2012-2663.
A memory leak flaw was found in the Linux kernel's ccp_run_aes_gcm_cmd() function that allows an attacker to cause a denial of service. The vulnerability is similar to the older CVE-2019-18808. The highest threat from this vulnerability is to system availability.
The d_walk function in fs/dcache.c in the Linux kernel through 3.17.2 does not properly maintain the semantics of rename_lock, which allows local users to cause a denial of service (deadlock and system hang) via a crafted application.
arch/x86/kvm/vmx.c in the KVM subsystem in the Linux kernel before 3.17.2 on Intel processors does not ensure that the value in the CR4 control register remains the same after a VM entry, which allows host OS users to kill arbitrary processes or cause a denial of service (system disruption) by leveraging /dev/kvm access, as demonstrated by PR_SET_TSC prctl calls within a modified copy of QEMU.
The pivot_root implementation in fs/namespace.c in the Linux kernel through 3.17 does not properly interact with certain locations of a chroot directory, which allows local users to cause a denial of service (mount-tree loop) via . (dot) values in both arguments to the pivot_root system call.
cipso_v4_validate in include/net/cipso_ipv4.h in the Linux kernel before 3.11.7, when CONFIG_NETLABEL is disabled, allows attackers to cause a denial of service (infinite loop and crash), as demonstrated by icmpsic, a different vulnerability than CVE-2013-0310.
The sctp_assoc_lookup_asconf_ack function in net/sctp/associola.c in the SCTP implementation in the Linux kernel through 3.17.2 allows remote attackers to cause a denial of service (panic) via duplicate ASCONF chunks that trigger an incorrect uncork within the side-effect interpreter.