-===================- Kernel Janitor TODO list. $Id: TODO,v 1.23 2003/05/01 04:15:43 rdunlap Exp $ -===================- None of the following items are in any order of importance or difficulty. Where possible related items have been grouped together. Before attacking any of these, we suggest that you ask about it (them) on the mailing list (kernel-janitor-discuss@lists.sf.net). -------------------------------------------------------------------------- NOTE: scripts/kj.pl was a weekend hack. Don't take it too seriously. -------------------------------------------------------------------------- Audit return codes (and handle failure correctly) for.. - misc_register() - request_region() - register_reboot_notifier() - request_irq() - kmalloc(), vmalloc(), skb_alloc(), etc - register_netdev() has to be checked as well - misc_register() (yes, it can fail, murphy's law applies here as well) - scsi_register() - proc_*_create() - pci_map_* might return 0 for a valid mapping. Some code tests mapping for a non-zero value, which is incorrect. - get_free_pages() - init_etherdev() drivers allocating net_device with init_etherdev doesn't need zeroing it (init_etherdev does this for us) - ioremap() -- Some are using this as a pointer, which is wrong. SUSPECTS: - drivers/scsi/ips.c for resource leaks (ips_release doesn't seems to release all the kmalloc memory it got in ips_detect. (quick look) - drivers/char/ip2main.c (init_etherdev) ----------------------------------------------------------------------------- Balancing functions. Make sure calls to certain functions are matched by the relevant function at other end of the function, and also during ALL failure/early return paths. Examples: - pci_alloc_consistent() and pci_free_consistent() - ioremap() must be balanced by an iounmap() (this causes a memory leak). - check for balanced *_lock_* and *_unlock_* along all paths in a function (can cause dead-lock if not). - Make sure no-one is freeing skbs with kfree instead of kfree_skb - check that net_device interrupt functions use dev_kfree_skb_irq and not just dev_kfree_skb SUSPECTS: - check isapnp.c: doesn't release regions on failure ----------------------------------------------------------------------------- Remove unneeded historic code / New API conversions. - checking for NULL on probe routines for net drivers - convert drivers to new PCI API - proc_register() is dead. Use create_proc_read_entry() instead. (from Al Viro on lkml) - Replace uses of suser() and fsuser() with capability checks. - get rid of save_flags_cli, use local_irq_save instead - get rid of check_region, use just request_region checking its return (2.2 request_region returned void) and now the driver init sequence is not to be serialized anymore, so races are possible (look at cardbus/pcihotplug code) - converting cli to spinlocks (look at net/netrom/*.c, etc) - ALL PCI drivers should call pci_enable_device --before-- reading pdev->irq or pdev->resource[]. irq and resource[] may not have correct values until after PCI hotplug setup occurs at pci_enable_device() time. Many PCI drivers need to be evaluated and checked for this. Many of these fixed already at: http://www.lib.uaa.alaska.edu/linux-kernel/archive/2001-Week-11/0150.html - Net drivers should call SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT - Net drivers should set dev->last_rx immediately after netif_rx - There are still kernel threads which don't use up_and_exit() for dying. - Purely cosmetic, but far nicer to read.. - for (list = ymf_devs.next; list != &ymf_devs; list = list->next) { + list_for_each(list, &ymf_devs) { - Use pci_set_drvdata() to set dev->driver_data, likewise use pci_get_drvdata() instead of reading directly. - dev->driver_data = pcigame; + pci_set_drvdata(dev, pcigame); - Take end_request() definition out of blk.h and move it into the drivers that need it. IOW, ones that contain calls of end_request() and don't have LOCAL_END_REQUEST defined. once that's done, kill LOCAL_END_REQUEST... (suggested by Al Viro) SUSPECTS: - de4x5.c doesn't pci_enable_device() ----------------------------------------------------------------------------- Copying to/from userspace issues. - The copy_*_user() routines may sleep. For this reason, they shouldn't be called with locks held, or with interrupts disabled. - get rid of verify_area with copy_*_user get/put_user, only needed if using __copy_*_user et al - make sure that copy_*_user return values are checked - use "return copy_*_user(...) ? -EFAULT : ok_value;" SUSPECTS: - look at drivers/char/generic_serial.c - look at drivers/char/n_tty.c - check drivers/char/dz.c wrt return copy*user(...); has to be ... ? -EFAULT : 0; ----------------------------------------------------------------------------- - get rid of panic function in drivers (watchdogs need to use machine_restart instead of panic -- several char drivers do this happily :( ----------------------------------------------------------------------------- - get rid of isa_read/write[bwl], use ioremap instead - unsigned long addr = 0xC0000; - isa_writeb (0xC0, addr); + void *cookie = ioremap (0xC0000, region_size); + if (!cookie) { fail... } + writeb (0xC0, cookie); + iounmap (cookie); Note there is a ioremap step at driver init time, and an iounmap step at driver shutdown time. isa_xxx simply doesn't have the remap/unmap stuff. Read Documentation/IO-mapping.txt for more details. ----------------------------------------------------------------------------- [Al Viro] * try and convince XFS folks that they want to use negative numbers for error values (will be tricky - they really don't like to diverge from IRIX codebase) [They know this but it's not an issue for them.--rdd] * ditto for JFS - again, a bunch of functions use positive error values. [They know this and are slowly working on it.--rdd] ----------------------------------------------------------------------------- Date: Wed, 14 Feb 2001 12:51:53 +0100 From: Manfred Spraul bug in network drivers: * dev->mem_start: NULL means "not command line configuration" 0xffffffff means "default". several drivers only check for NULL, not for 0xffffffff. And then: Date: Wed, 14 Feb 2001 05:54:34 -0600 (CST) From: Jeff Garzik netdev->mem_start is unsigned long... Should the test be for ~0 instead? The value 0xFFFFFFFF seems wrong for 64-bit machines. ----------------------------------------------------------------------------- - drivers that try to find multiple boards, possibly successfully allocating for the first ones, then failing for, lets say, the third board, then it just returns failure for init_module, the module is unloaded but the resources remain allocated... in these cases we need to rollback the allocations, freeing it before returning from init_module or equivalent. - sound/maestro3.c doesn't pci_free_consistent any buffers if one allocation fails, but others succeeded. ----------------------------------------------------------------------------- - prumpf suggested: - make sure drivers never read loops_per_sec - it might change under them (prumpf did this in 2.2.18pre series, need forward port to 2.4) - fix watchdog drivers to use link order rather than explicit initialization calls (i810 is particularly broken) - get rid of init_module / cleanup_module (softdog in particular) - make sure BUG() is used correctly (i.e. if(function()) BUG(); is evil) i.e. even when no-op-ing BUG we still have an if (See also: BUG_ON) ----------------------------------------------------------------------------- From: Hans Grobler - audit an ioctl function to make sure there is no way a user can crash the machine or do sensitive stuff. for example, check for the necessary capabilities. check that no userspace pointers are accessed directly. be carefull with this... it takes a while to figure out which ioctls are covered by the parent ioctl function (these can be nested to great depths). - check for dev_close calls without rntl_lock held (cases assertion failures). - check that a freed pointer (*kfree_*) is not used again. ----------------------------------------------------------------------------- From: David Woodhouse - Anything which uses sleep_on() has a 90% chance of being broken. Fix them all, because we want to remove sleep_on() and friends in 2.5. Explanation by Andi Kleen lkml thread at: http://boudicca.tux.org/hypermail/linux-kernel/2001week05/0305.html > don't even know WHY it's bad. Not only that, but what am I supposed to use > instead? You can miss wakeups. The standard pattern is: get locks add_wait_queue(&waitqueue, &wait); for (;;) { if (condition you're waiting for is true) break; unlock any non sleeping locks you need for condition __set_task_state(current, TASK_UNINTERRUPTIBLE); schedule(); __set_task_state(current, TASK_RUNNING); reaquire locks } remove_wait_queue(&waitqueue, &wait); When you want to handle signals you can check for them before or after the condition check. Also use TASK_INTERRUPTIBLE in this case. When you need a timeout use schedule_timeout(). For some cases you can also use the wait_event_* macros which encapsulate that for you, assuming condition can be tested/used lockless. An alternative is to use a semaphore, although that behaves a bit differently under load. From: David Woodhouse > Ok, so how should this code have been written? DECLARE_WAIT_QUEUE(wait, current); while(1) { do_something_important() set_current_state(TASK_INTERRUPTIBLE); add_wait_queue(&q, &wait); /* Now if something arrives, we'll be 'woken' immediately - - that is; our state will be set to TASK_RUNNING */ if (!stuff_to_do()) { /* If the 'stuff' arrives here, we get woken anyway, so the schedule() returns immediately. You can use schedule_timeout() here if you need a timeout, obviously */ schedule(); } set_current_state(TASK_RUNNING); remove_wait_queue(&q, &wait); if (signal_pending(current)) { /* You've been signalled. Deal with it. If you don't want signals to wake you, use TASK_UNINTERRUPTIBLE above instead of TASK_INTERRUPTIBLE. Be aware that you'll add one to the load average all the time your task is sleeping then. */ return -EINTR; } } Alternatively, you could up() a semaphore for each task that's do be done, and down() it again each time you remove one from the queue. ----------------------------------------------------------------------------- From: Andrew Morton Here - have about 300 bugs: http://www.uwsg.iu.edu/hypermail/linux/kernel/0005.3/0269.html A lot of the timer deletion races are hard to fix because of the deadlock problem. ----------------------------------------------------------------------------- Date: Mon, 26 Feb 2001 16:53:50 -0800 (PST) From: "David S. Miller" Andrew Morton writes: > Could this be a driver problem? > netif_rx(skb); ... > new_skb = skb; This is illegal and broken and will never work. Once you give an skb to netif_rx() it is not yours to reference any longer. Donald Becker added: Easier fix: - np->stats.rx_bytes += skb->len; + np->stats.rx_bytes += pkt_len; Grouping the writes to np->stats results in better cache usage. Jeff Garzik added: It makes use of the existing local 'pkt_len', and it checks off another item that should probably be on the janitor's todo list: Set 'dev->last_rx=jiffies' immediately after netif_rx. Jeff ----------------------------------------------------------------------- From: Jeff Garzik > It is highly recommended to always compile with CONFIG_ISAPNP=y due > to these differences. If you grep around for CONFIG_ISAPNP versus > CONFIG_ISAPNP_MODULE, you'll see that many drivers are woefully > unprepared for isapnp support compiled as a module. Yep.. grep for CONFIG_ISAPNP, look at the code, and evaluate it to make sure that isapnp works for that drivers regardless of whether CONFIG_ISAPNP -or- CONFIG_ISAPNP_MODULE is defined. ------------------------------------------------------------------------ From: Jeff Garzik Audit all memcpy and memmove calls. If calling memmove, see if the areas ever overlap. If not, use memcpy instead. If calling memcpy, see if the areas ever overlap. If so, use memmmove instead. ------------------------------------------------------------------------ From: Andrey Panin - check printk() calls (should include appropriate KERN_* constant). ------------------------------------------------------------------------ From Dave Jones - Lots of drivers are doing evil looking things writing to PCI_CACHE_LINE_SIZE This should be done during PCI initialisation. - Checking LINUX_VERSION_CODE & KERNEL_VERSION for specific versions of 2.3 should be checked. - Lots of unnecessary casts in drivers can be removed. Example: - struct netdev_private *np = (struct netdev_private *)dev->priv; + struct netdev_private *np = dev->priv; Patch for a lot of these is available from ftp.suse.com/pub/people/davej/kernel/2.4/2.4.2ac20/unneeded-net-casts.diff - drivers/net/aironet*.c could use some functions being made statics. - Many drivers have alignments such as.. dev->priv = (void *)(((unsigned long)dev->priv + 7) & ~7); jgarzik: What they are doing here is DMA'ing into their private board (dev->priv), which is wrong -- they should be using PCI DMA instead. They can't use init_etherdev because it doesn't pass GFP_DMA to kmalloc for dev->priv, and since they are DMA'ing into dev->priv, they have to do things manually. - drivers/net/rcpci45.c Quiten debug printk's in RCioctl() - Check kmallocs for things like GFP_DMA without a memtype. - Signedness issues. We have _lots_ of these. Adding a -W to the kernel build line shows them. We have lots of code paths that are not executed because comparisons <0 on an unsigned int always evaluates to true. We also need to check for things like... char foo[4]={1,2,3,133}; /* 133 in signed char!*/ ------------------------------------------------------------------------ From: Jeff Garzik read_eeprom can be marked __init in other net drivers also. akpm: well, not blindly, of course. Need to follow the call graph, make sure it's safe. ------------------------------------------------------------------------ From: Jeff Garzik The string form [const] char *foo = "blah"; creates two variables in the final assembly output, a static string, and a char pointer to the static string. The alternate string form [const] char foo[] = "blah"; is better because it declares a single variable. For variables marked __initdata, the "*foo" form causes only the pointer, not the string itself, to be dropped from the kernel image, which is a bug. Using the "foo[]" form with regular 'ole local variables also makes the assembly shorter. Look at the diff between include/linux/etherdevice.h in pre7 versus pre8. The assembly for the pre8 version is ~10 lines shorter, with 6 of those removed lines being the string constant stored separately. Maybe this should go into CodingStyle? ------------------------------------------------------------------------ From: Jeff Garzik 1) "unsigned int" is preferred to "int", it generates better asm code on all platforms except sh5. This replacement needs to be done manually, because often 'int' is required due to negative values -Exxx commonly passed as error values. 2) someone who knows DocBook, or is willing to learn, should go through and clean up Documentation/DocBook to kill all the warnings that occur during "make pdfdocs" and generally make the documents look nicer, and render smaller PDFs. See, not everyone needs to be a kernel hacker to really help out :) ------------------------------------------------------------------------ - when using sizeof consider using the variable and not the type, that way if the type of the variable is changed we don't have to go seaching all instances of sizeof(type) Comments by jgarzik: That's kind of misleading... it should be sizeof(*var) not sizeof(var) in most cases. acme: yes, better clarify, if its a pointer you have to follow Jeff tip above and use sizeof(*var) ------------------------------------------------------------------------ strtok() isn't SMP/thread safe. Replace with strsep() where possible since it is considered safer. Read lib/string.c:14 + completed for 2.5; possibly not worth doing for 2.4. - for (opt = strtok(options, ","); - opt != NULL; - opt = strtok(NULL, ",")) { + while (opt = strsep(&options, ",")) { + if (!*opt) + continue; ----------------------------------------------------------------------------- - uninitialize static variables initialized to 0, to make it go to the .bss, instead of .data __initdata has to be initialized, even if to 0 tough ----------------------------------------------------------------------------- drivers/char/i8k.c duplicates arch/i386/kernel/dmi_scan.c Better would be to have dmi_scan.c set a flag which the i8k driver checks. ------------------------------------------------------------------------ Date: Fri, 21 Feb 2003 16:27:02 -0600 From: Eric Bresie Check here: http://www.osdl.org/archive/cherry/stability/ for additional kernel work. Also see here: http://www.osdl.org/projects/26lnxstblztn/results/ ------------------------------------------------------------------------ From several people: clean up deprecated functions/interfaces: Convert users (callers) of deprecated functions (interfaces) to new interfaces: - call SET_MODULE_OWNER or use .owner = THIS_MODULE instead of using MOD_{INC,DEC}_USE_COUNT - convert cli/sti/save_flags/save_flags_cli/restore_flags usage to accepted locking primitives; see Documentation /cli-sti-removal.txt for details. - These drivers need DMA usage attention: ./drivers/net/defxx.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/net/rcpci45.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/scsiiom.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/ini9100u.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/AM53C974.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/gdth.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/dpt_i2o.c:#error Please convert me to Documentation/DMA-mapping.txt ./drivers/scsi/pci2220i.c:#error Convert me to understand page+offset based scatterlists ------------------------------------------------------------------------ From Linus: Check stack usage and reduce it in the worst cases. See: http://bugme.osdl.org/show_bug.cgi?id=386 There is another script that checks function stack usage at http://kernelnewbies.org/scripts/ . --------------------------------end-----------------------------------