Friday, June 16, 2017

Linux device model

The Linux device model

The Linux device model is a general abstraction that describes the structure of the system. The kernel uses it to support a wide variety of tasks, including:
  • Power management and system shutdown
  • Communications with user space - sysfs is part of the device model.
  • Hot pluggable devices - handling plug/unplug events and communication with user space.
  • Device classes - Each device is associated with a class that describes its type/functionality. Many parts of the system may check which types of devices are available by looking at their classes.
  • Object life cycles - dealing with object life cycles, their relationships to each other, and their representation in user space.

Top-down presentation of the device model

If you look at the Linux device model design from the top, you will see:
Buses, Frameworks, Classes, Devices and Device Drivers, all of which have a sysfs representation.

Bus

A bus is a channel between the processor and one or more devices, for example USB, PCI, SPI, MMC or ISA. Buses primarily exist to gather similar devices together and to coordinate their initialization, shutdown and power management.

There are two types of drivers attached to the bus:
  • Adapter drivers: drivers for the main chip that manages the bus, for example USB host controllers, I2C adapters, etc.
  • Device drivers: drivers for the external chips attached to the bus, for example USB devices, I2C devices, PCI devices, etc.
All devices are connected via a bus, which should be able to detect when they are plugged in and manage them.

Framework

In Linux, many device drivers are not implemented directly as character drivers; instead, they are implemented under a framework specific to their type of device.
A framework implements the parts that are common to drivers of the same type (framebuffer, V4L, serial, etc.), and from user space these drivers are still seen as character devices.

Framework example - framebuffer framework

If you write an LCD driver, which usually gets its images from a framebuffer, you should use the framebuffer framework.

You will need to implement the set of framebuffer-specific operations defined by struct fb_ops, and register the new framebuffer. This creates a character device (/dev/fbN) that userspace applications can use through the generic framebuffer API.


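#include <linux/fb.h>
#include <linux/pci.h>

/* struct xxx_par (the driver's private data) and xxxfb_ops (its fb_ops
   table) are driver-specific. */
static int xxxfb_probe(struct pci_dev *dev,
                       const struct pci_device_id *ent)
{
    struct fb_info *info;
    ...
    info = framebuffer_alloc(sizeof(struct xxx_par), &dev->dev);
    if (!info)
        return -ENOMEM;
    ...
    info->fbops = &xxxfb_ops;
    ...
    if (register_framebuffer(info) < 0) {
        framebuffer_release(info);   /* undo framebuffer_alloc() */
        return -EINVAL;
    }
    ...
    return 0;
}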

 

Device

A device is a specific piece of hardware: an IC, a collection of ICs, or even something that has no real physical equivalent at all.
The compelling reason to represent some "thing" as a "device" seems to be the interfaces it gains: sysfs representation, power management, resource management, etc.
All devices are connected via a bus.

Platform device

On embedded systems, devices are often not connected through a discoverable bus but directly to the CPU (for example, as memory-mapped blocks of a system-on-chip). Such devices cannot be detected automatically, so the kernel cannot find a proper driver for them on its own. To fit them into the device model, we create them as platform devices, which are connected to a special, virtual platform bus.

Class

A class is a "device interface" in the object-oriented sense: it exposes a common API, while the real implementation, which is different for each device, is hidden in the device's logic.
For example, the 'led' class exposes blinking, flashing and brightness functionality. The class requires an underlying device to be available (such as a TCA6507 or a GPIO), which must fully or partially support the functions exposed by the class. A class may provide more than one interface to the outside; for example, the gpio class provides both the "gpiochip" devices and the individual "gpio" devices.

Device driver

Device drivers talk directly to hardware and may be associated with many devices, because the same driver may be compatible with many similar ICs.
A device driver defines a table listing the device identifiers it is able to manage:


static const struct pci_device_id rhine_pci_tbl[] = {
{ 0x1106, 0x3043, PCI_ANY_ID, PCI_ANY_ID, },    /* VT86C100A */
{ 0x1106, 0x3053, PCI_ANY_ID, PCI_ANY_ID, },    /* VT6105M */
{ }     /* terminate list */
};
MODULE_DEVICE_TABLE(pci, rhine_pci_tbl);


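The bus uses this table to match the driver with compatible devices: when a device matching one of the entries is found, the driver's probe() function is called for it. MODULE_DEVICE_TABLE() additionally exports the table as module aliases, so user space can load the right module automatically when such a device appears.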
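To make the matching step concrete, here is a small user-space C model of what the bus does with such a table. It is a sketch, not the kernel's pci_match_id(): struct model_pci_id, struct model_pci_dev and ANY_ID are simplified stand-ins for struct pci_device_id and PCI_ANY_ID, and the class fields are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plays the role of PCI_ANY_ID: "match any value in this field" */
#define ANY_ID (~0u)

/* Simplified stand-ins for struct pci_device_id and a PCI device */
struct model_pci_id  { uint32_t vendor, device, subvendor, subdevice; };
struct model_pci_dev { uint32_t vendor, device, subvendor, subdevice; };

/* A field matches if the table says "any" or the values are equal */
static int field_match(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* Walk the table up to the all-zero terminator; return the first matching entry */
static const struct model_pci_id *match_id(const struct model_pci_id *table,
                                           const struct model_pci_dev *dev)
{
    assert(table != NULL && dev != NULL);
    for (; table->vendor || table->device || table->subvendor; table++) {
        if (field_match(table->vendor, dev->vendor) &&
            field_match(table->device, dev->device) &&
            field_match(table->subvendor, dev->subvendor) &&
            field_match(table->subdevice, dev->subdevice))
            return table;
    }
    return NULL;   /* no entry matches: the driver is not bound to this device */
}

static const struct model_pci_id rhine_tbl[] = {
    { 0x1106, 0x3043, ANY_ID, ANY_ID },   /* VT86C100A */
    { 0x1106, 0x3053, ANY_ID, ANY_ID },   /* VT6105M */
    { 0 }                                 /* terminate list */
};
```

A device reporting vendor 0x1106 and device 0x3053 matches the second entry (VT6105M), whatever its subsystem IDs are, while a device from another vendor matches nothing, so this driver's probe() would not be called for it.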

sysfs

sysfs is a virtual file system which is mounted on /sys and offers a mechanism to export device model information to the user space.
Some special locations inside sysfs worth mentioning are:
  • /sys/bus - contains the list of all buses
  • /sys/devices - contains a list of all devices
  • /sys/class - enumerates devices by class (for example: net, input, block)
You may use the systool utility (from the sysfsutils package) to list devices by bus, class and topology.

Bottom-up presentation of the device model

kobject

The kobject is the fundamental structure that holds the device model together. Tasks which are handled by kobject include:
  • Reference counting of objects - used to find out when nobody is using the object anymore, so it can be released.
  • Sysfs representation - kobjects are used to create the representation of kernel objects in sysfs.
  • Hot plug event handling - handles the generation of events that notify user space about the comings and goings of hardware on the system.
A kobject extends the kernel object that contains it, providing that object with the kobject's capabilities. Kernel objects are represented as structures that embed a struct kobject inside them.

When a structure embeds a kobject (the kobject itself, not a pointer to one), it is effectively extended: it now has access to all the kobject capabilities when needed, and code that only holds a pointer to the kobject can get back to the containing structure with the container_of() macro.

In sysfs, each directory is represented as a kobject, and all files in that directory are 'attributes' of that kobject.

ktype

A ktype is the type of the object that embeds a kobject. The ktype controls what happens to the kobject when it is destroyed (through its release() callback), as well as its default representation in sysfs.

ktype is represented by the structure below:

struct kobj_type {
        void (*release)(struct kobject *kobj);
        const struct sysfs_ops  *sysfs_ops;
        struct attribute        **default_attrs;  /* replaced by default_groups in newer kernels */
};

default_attrs - the default attributes of the kobject; each file in sysfs is an attribute and is related to some kobject.
sysfs_ops - the handlers used to read and write those attributes:
struct sysfs_ops {
        ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *buf);
        ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
                         const char *buf, size_t count);
};

show() is called when user space reads one of the kobject's attribute files, and store() is called when user space writes one.
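The show/store contract can be sketched in user space. In this model, struct kobject, struct attribute and struct sysfs_ops are simplified stand-ins for the kernel types, and my_led with its "brightness" attribute is a hypothetical object; a real store() would typically parse its input with kstrtoint() rather than atoi().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Recover the enclosing structure from a pointer to one of its members */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified user-space stand-ins for the kernel types */
struct kobject   { const char *name; };
struct attribute { const char *name; };

struct sysfs_ops {
    ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *buf);
    ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
                     const char *buf, size_t count);
};

#define PAGE_SIZE_MODEL 4096   /* sysfs gives show() a one-page buffer */

/* Hypothetical object embedding a kobject, with one attribute file */
struct my_led { struct kobject kobj; int brightness; };
static struct attribute brightness_attr = { "brightness" };

/* Called on read: format the value into buf, return the number of bytes */
static ssize_t led_show(struct kobject *kobj, struct attribute *attr, char *buf)
{
    struct my_led *led = container_of(kobj, struct my_led, kobj);
    assert(attr == &brightness_attr);
    return snprintf(buf, PAGE_SIZE_MODEL, "%d\n", led->brightness);
}

/* Called on write: parse buf, return how many bytes were consumed */
static ssize_t led_store(struct kobject *kobj, struct attribute *attr,
                         const char *buf, size_t count)
{
    struct my_led *led = container_of(kobj, struct my_led, kobj);
    assert(attr == &brightness_attr);
    led->brightness = atoi(buf);
    return (ssize_t)count;
}

static const struct sysfs_ops led_sysfs_ops = { led_show, led_store };
```

Writing "42\n" to the file would go through led_store(), and a later read through led_show() returns the same text.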

kset

A kset is a container for a collection of kobjects embedded in structures of the same type.
When you see a sysfs directory full of other directories, generally each of those directories corresponds to a kobject in the same kset.

subsystem

A subsystem is a representation for a high-level portion of the kernel as a whole.
Subsystems tend to correspond to toplevel directories in the sysfs hierarchy.

$ ls /sys
block  bus  class  devices  firmware  module

In older 2.6 kernels, their names in the source tend to end in _subsys (they are declared with the decl_subsys() macro).

Examples you can find in the source: system_subsys, block_subsys, bus_subsys, class_subsys, devices_subsys, firmware_subsys, class_obj_subsys, acpi_subsys, edd_subsys, vars_subsys, efi_subsys, cdev_subsys, module_subsys, power_subsys, pci_hotplug_slots_subsys.

Some are not visible in sysfs.

Each kset is contained by a subsystem. A subsystem holds a read/write semaphore (rwsem) that serializes adding and removing kobjects to and from its kset.

struct subsystem {
        struct kset             kset;
        struct rw_semaphore     rwsem;
};

With all of that in mind, here are the kobject and kset structures:

struct kobject {
       const char              *name;
       struct kobject          *parent;
       struct kset             *kset;
       struct kobj_type        *ktype;
       struct kref             kref;

       ...
};

name - each kobject has a name, this will be the directory name on sysfs. 
parent -  a pointer used to position the object in the sysfs hierarchy.
kref -  One of the key functions of a kobject is to serve as a reference counter for the object in which it is embedded. As long as references to the object exist, it must continue to exist.
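Here is a minimal, single-threaded user-space sketch of the kref pattern, including the container_of() step that takes the release() callback from the embedded counter back to its owner. The kernel's struct kref is atomic (it wraps a refcount_t), so this model only shows the logic; my_obj is a hypothetical object type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Single-threaded stand-in for struct kref */
struct kref { int refcount; };

static void kref_init(struct kref *k) { k->refcount = 1; }

static void kref_get(struct kref *k)
{
    assert(k->refcount > 0);   /* taking a reference on a dead object is a bug */
    k->refcount++;
}

/* Drop a reference; call release() when the last one goes away.
 * Returns 1 if the object was released, 0 otherwise. */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

/* Hypothetical object embedding the counter, as a kobject embeds its kref */
struct my_obj {
    int payload;
    struct kref ref;
};

static int released_count;   /* lets the caller observe when release() ran */

static void my_obj_release(struct kref *k)
{
    struct my_obj *obj = container_of(k, struct my_obj, ref);
    released_count++;
    free(obj);
}

static struct my_obj *my_obj_create(int payload)
{
    struct my_obj *obj = malloc(sizeof(*obj));
    if (!obj)
        return NULL;
    obj->payload = payload;
    kref_init(&obj->ref);   /* the creator holds the first reference */
    return obj;
}
```

The object survives the first kref_put() because another reference is still held, and is freed exactly once, by the put that drops the count to zero.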


struct kset {
      struct subsystem *subsys;
      struct kobj_type *ktype;
      struct list_head list;
      struct kobject kobj;
      struct kset_hotplug_ops *hotplug_ops;
};

hotplug_ops - Ksets can support the "hotplugging" of kobjects and influence how uevent events are reported to user space.


A kset contains its own kobject, which is managed automatically by the kset core code; that's why a kset also appears as a directory in sysfs.

[Figure: the relation between a kset and the kobjects that belong to it]


The kernel can use a kset to track, for example, "all block devices" or "all PCI device drivers", because these are represented by kobjects that belong to the corresponding subsystem's kset.

After a kobject has been registered with the kobject core, you need to announce to the world that it has been created, using:

kobject_uevent(kobj, KOBJ_ADD);

This notifies the device manager in user space (udev), so it can handle the new changes.
When the kobject is removed, the kobject core sends the matching KOBJ_REMOVE event automatically.

That's it for today...

 

Monday, December 12, 2016

Interrupt Handling

Basic Interrupt explanation

An interrupt is a signal to the processor, emitted by hardware (for example: the mouse moved) or by software (for example: CPU exceptions such as divide by zero or a page fault), indicating an event that needs immediate attention.

This is the flow of handling an interrupt:
  • When an interrupt arrives, the CPU checks whether it is masked; if so, it ignores it (for now).
  • The CPU looks up the address of the handler for the current interrupt number in a special table called the interrupt vector.
  • The CPU masks interrupts, saves the contents of some registers, and executes the interrupt handler.
  • When the handler finishes executing, it executes a special return-from-interrupt instruction that restores the saved registers and unmasks interrupts.
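The steps above can be modeled as a toy CPU in user-space C. This is purely illustrative: struct toy_cpu, its vector table, the pc/saved_pc fields and sample_handler() are hypothetical simplifications of what real hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VECTORS 16

typedef void (*handler_fn)(int vector);

/* Toy CPU: a vector table, per-vector masks and a global interrupt-enable flag */
struct toy_cpu {
    handler_fn vector_table[NR_VECTORS];
    bool masked[NR_VECTORS];
    bool irqs_enabled;
    unsigned pc;         /* "program counter" of the interrupted code */
    unsigned saved_pc;   /* stands in for the registers saved on entry */
};

/* Deliver one interrupt; returns true if a handler ran */
static bool raise_interrupt(struct toy_cpu *cpu, int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    if (!cpu->irqs_enabled || cpu->masked[vector] || !cpu->vector_table[vector])
        return false;                      /* masked: ignored for now */

    cpu->saved_pc = cpu->pc;               /* save registers */
    cpu->irqs_enabled = false;             /* mask interrupts while handling */
    cpu->vector_table[vector](vector);     /* look up and run the handler */
    cpu->pc = cpu->saved_pc;               /* return from interrupt: restore... */
    cpu->irqs_enabled = true;              /* ...and unmask */
    return true;
}

/* Hypothetical handler that records which vector it served */
static int last_vector = -1;
static void sample_handler(int vector) { last_vector = vector; }
```

Installing sample_handler() at vector 3 and raising vector 3 runs it and leaves the CPU state as it was; masking vector 3 first makes the same interrupt be ignored.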

Interrupts in Linux

In Linux, each interrupting device gets an interrupt request number (IRQ). When the processor detects that an interrupt has been generated on an IRQ, it stops what it's doing and invokes an interrupt service routine (ISR) registered for the corresponding IRQ.
To compensate for interrupting the current thread of execution, ISRs are executed in a restricted environment called interrupt context (or atomic context).

Process Context and Interrupt Context

Kernel code that services system calls issued by user applications runs on behalf of the corresponding application processes and is said to execute in process context. Interrupt handlers, on the other hand, run asynchronously in interrupt context.

Kernel code running in process context can be preempted by the scheduler. Interrupt context, however, always runs to completion and cannot be preempted by the scheduler.
Because the scheduler only handles process context, if an ISR goes to sleep, nothing will ever resume it. That's why you may not use mutexes, or any other code that may sleep, inside an ISR.

More about ISRs

  • Different instances of the same ISR will not run simultaneously on multiple processors, because the IRQ is disabled while its ISR runs.
  • Different IRQs may be handled at the same time by different CPUs.
  • On older kernels, interrupt handlers could be interrupted by handlers associated with higher-priority IRQs. To prevent this, you had to declare your ISR as a fast handler. Fast handlers run with all interrupts disabled on the local processor. Since Linux 2.6.35, all handlers run this way.

How to request an IRQ

int request_irq (
    unsigned int irq,
    irq_handler_t handler,
    unsigned long irqflags,
    const char * devname,
    void * dev_id);
 


irq - Interrupt line to allocate
handler - Function to be called when the IRQ occurs
irqflags - Interrupt type flag, for example:
  • IRQF_TRIGGER_RISING - interrupt is generated on rising edge.
  • IRQF_TRIGGER_HIGH - interrupt is generated while the level is high.
  • IRQF_SHARED - specifies that this IRQ is shared among multiple devices; every device sharing the line must specify this flag.
  • SA_INTERRUPT - marks the ISR as fast, which tells the kernel to disable other interrupts on the current CPU until it finishes (otherwise, the kernel may jump to another ISR if its IRQ has higher priority). Other CPUs may still handle other IRQs. This flag was later renamed IRQF_DISABLED, and it was eventually removed once all handlers ran with interrupts disabled.
devname - Name for the claiming device, used in /proc/interrupts for identification.
dev_id - A cookie passed back to the handler function
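The role of IRQF_SHARED and dev_id can be illustrated with a user-space model of one shared line: every handler registered on the line runs, and each one uses its dev_id to check whether its own device raised the interrupt. irqreturn_t, IRQ_NONE and IRQ_HANDLED mirror the kernel's names, but model_request_irq(), dispatch() and struct fake_dev are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

#define MAX_SHARED 4

/* One interrupt line with up to MAX_SHARED registered handlers */
struct irq_line {
    irq_handler_t handler[MAX_SHARED];
    void *dev_id[MAX_SHARED];
    int count;
};

static int model_request_irq(struct irq_line *line, irq_handler_t h, void *dev_id)
{
    if (line->count == MAX_SHARED || dev_id == NULL)
        return -1;   /* a shared line needs a unique, non-NULL dev_id */
    line->handler[line->count] = h;
    line->dev_id[line->count] = dev_id;
    line->count++;
    return 0;
}

/* On an interrupt, every handler on the line is called with its own dev_id */
static int dispatch(struct irq_line *line, int irq)
{
    int handled = 0;
    for (int i = 0; i < line->count; i++)
        if (line->handler[i](irq, line->dev_id[i]) == IRQ_HANDLED)
            handled++;
    return handled;
}

/* Hypothetical device: 'pending' stands in for reading a status register */
struct fake_dev { int pending; int serviced; };

static irqreturn_t fake_isr(int irq, void *dev_id)
{
    struct fake_dev *dev = dev_id;
    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;   /* not our device */
    dev->pending = 0;      /* acknowledge the interrupt */
    dev->serviced++;
    return IRQ_HANDLED;
}
```

With two devices sharing one line, an interrupt raised by the second device is serviced only by the handler instance whose dev_id points at it; the other instance returns IRQ_NONE.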
 

Top half vs bottom half

ISRs must be fast, for the following reasons:
  • While an ISR runs, other interrupts are held off (on older kernels, only interrupts with higher priority could still run).
  • Further interrupts of the same type may be missed.
In Linux, interrupt handling is designed in two parts:
  1. Top half - the ISR part, which interacts with the hardware and should exit as soon as possible.
  2. Bottom half - the relaxed part that does most of the processing with all interrupts enabled. The kernel decides when to execute it.
If the interrupt handler could consistently process and acknowledge interrupts within a few microseconds, there would be absolutely no need for top-half/bottom-half delegation.

Bottom halves

There are 4 bottom-half mechanisms available in Linux: threaded IRQs, softirqs, tasklets and work queues.

threaded IRQs

A driver that wishes to request a threaded interrupt handler will use

int request_threaded_irq (
    unsigned int irq,
    irq_handler_t handler,
    irq_handler_t thread_fn,
    unsigned long irqflags,
    const char * devname,
    void * dev_id);

As you may notice, it is pretty similar to the request_irq() function discussed already, with one additional argument:
handler - Function to be called when the IRQ occurs. This is the top half, and it should return IRQ_WAKE_THREAD, which wakes up the handler thread and runs thread_fn.
thread_fn - The bottom-half handler, which runs in process context.
Threaded IRQ handlers are preferred for bottom-half processing that would consistently spill over half a jiffy (e.g., more than 500 microseconds if CONFIG_HZ is set to 1000).
When you call request_threaded_irq() with both functions, it creates a kthread, which is visible in the 'ps ax' process listing, for example:

$ ps -ax
  465 ?        S      4:16 [irq/34-iwlwifi]
6334 ?        S      0:00 [irq/35-mei_me]

The numbers 34 and 35 are IRQ numbers, and 'iwlwifi' and 'mei_me' are the names of the devices that requested those IRQs.

When thread_fn completes, the associated kthread takes itself out of the run queue and stays blocked until the top half wakes it up again.
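A user-space model of the handoff between the top half and the IRQ thread might look like this. Only the IRQ_WAKE_THREAD name comes from the kernel; struct threaded_irq, hardirq(), irq_thread_run() and the sensor device are hypothetical, and a plain flag stands in for waking the irq/N-name kthread.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD } irqreturn_t;

struct threaded_irq {
    irqreturn_t (*handler)(int irq, void *dev_id);     /* top half */
    irqreturn_t (*thread_fn)(int irq, void *dev_id);   /* bottom half */
    void *dev_id;
    bool thread_woken;   /* models the IRQ kthread being runnable */
};

/* Hard-IRQ part: run the top half, remember whether it asked for the thread */
static void hardirq(struct threaded_irq *d, int irq)
{
    if (d->handler(irq, d->dev_id) == IRQ_WAKE_THREAD)
        d->thread_woken = true;
}

/* Later, in "process context": the kthread runs thread_fn, then blocks again */
static void irq_thread_run(struct threaded_irq *d, int irq)
{
    if (!d->thread_woken)
        return;
    d->thread_woken = false;
    d->thread_fn(irq, d->dev_id);
}

/* Hypothetical device: the top half only latches data, the thread processes it */
struct sensor { int latched; int processed; };

static irqreturn_t sensor_top(int irq, void *dev_id)
{
    struct sensor *s = dev_id;
    (void)irq;
    s->latched++;
    return IRQ_WAKE_THREAD;
}

static irqreturn_t sensor_thread(int irq, void *dev_id)
{
    struct sensor *s = dev_id;
    (void)irq;
    s->processed += s->latched;
    s->latched = 0;
    return IRQ_HANDLED;
}
```

The point of the split is visible in the model: hardirq() does the minimum and returns, and the heavier work happens only when irq_thread_run() gets its turn.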

Softirqs

They are used only by a few performance-sensitive subsystems such as the networking layer, SCSI layer, and kernel timers. 

Different instances of a softirq can run simultaneously on different processors; that's why the data they share must be protected with spinlocks.

To define a softirq, you must statically add an entry to the softirq list in include/linux/interrupt.h, which means recompiling the kernel.

Softirqs run at high priority in interrupt context, with scheduler preemption disabled, so the CPU cannot run other processes/threads until they complete. If the functions registered as softirqs fail to finish within a jiffy (1 to 10 milliseconds, depending on the kernel's CONFIG_HZ), the responsiveness of the kernel suffers. In that case, any new softirqs raised by ISRs are delegated to run in process context via the ksoftirqd thread, so the softirqs compete for their CPU share along with the other processes and threads on the run queue. In general, softirqs are preferred for bottom-half processing that can consistently finish within a few hundred microseconds (well within a jiffy).

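/* SOFT_IRQ_NUM stands for the entry you added to the softirq list in
   include/linux/interrupt.h. open_softirq() is not exported to modules,
   so this code has to be built into the kernel. */

/* The bottom half: current kernels pass the handler a struct softirq_action * */
static void bottom_half_handler(struct softirq_action *h)
{
 //DO SOME BOTTOM-HALF WORK HERE
}

static int __init my_softirq_init(void)
{
 open_softirq(SOFT_IRQ_NUM, bottom_half_handler);
 return 0;
}

/* The top half */
static irqreturn_t some_interrupt(int irq, void *dev_id)
{
 //DO SOME TOP-HALF WORK HERE

 raise_softirq(SOFT_IRQ_NUM);
 return IRQ_HANDLED;
}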
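The raise/run cycle can be modeled in user space with a pending bitmask, roughly the way the kernel's __do_softirq() processes pending softirqs in number order. All names here (model_open_softirq() and so on) are hypothetical stand-ins, and the model ignores per-CPU state, the restart limit and ksoftirqd.

```c
#include <assert.h>
#include <stddef.h>

#define NR_MODEL_SOFTIRQS 4

typedef void (*softirq_action_fn)(void);

static softirq_action_fn softirq_vec[NR_MODEL_SOFTIRQS];
static unsigned int softirq_pending;   /* one bit per softirq number */

/* Registration: fixed at build/boot time in the real kernel */
static void model_open_softirq(int nr, softirq_action_fn action)
{
    assert(nr >= 0 && nr < NR_MODEL_SOFTIRQS);
    softirq_vec[nr] = action;
}

/* Called from the top half: only sets a bit, cheap in interrupt context */
static void model_raise_softirq(int nr)
{
    softirq_pending |= 1u << nr;
}

/* Run each pending softirq once, lowest number first; raising the same
 * softirq twice before this runs still runs its handler only once */
static void model_do_softirq(void)
{
    unsigned int pending = softirq_pending;
    softirq_pending = 0;
    for (int nr = 0; pending; nr++, pending >>= 1)
        if ((pending & 1u) && softirq_vec[nr])
            softirq_vec[nr]();
}

/* Hypothetical handlers that record the order in which they ran */
static char run_log[8];
static int run_len;
static void net_rx_model(void) { run_log[run_len++] = 'N'; }
static void timer_model(void)  { run_log[run_len++] = 'T'; }
```

Raising softirq 3, then 1, then 3 again results in two handler calls: first number 1, then number 3.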
 

Tasklets

Tasklets are built on top of softirqs and are easier to use. It's recommended to use tasklets rather than softirqs unless you have crucial scalability or speed requirements. A given tasklet can't run in parallel on different CPUs, but different tasklets may run simultaneously.

A tasklet can be statically allocated using DECLARE_TASKLET(name, func, data), or allocated dynamically and initialized at runtime using tasklet_init(t, func, data).

A tasklet can be scheduled to execute at normal priority (tasklet_schedule()) or high priority (tasklet_hi_schedule()). The latter group is always executed first.

Tasklets are implemented on top of two softirqs, HI_SOFTIRQ and TASKLET_SOFTIRQ, and they actually run from these softirqs. The only real difference between the two is that HI_SOFTIRQ-based tasklets run before TASKLET_SOFTIRQ-based ones.

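#include <linux/interrupt.h>
#include <linux/slab.h>

struct roller_device_struct { /* Device-specific structure */
    /* ... */
    struct tasklet_struct tsklt;
    /* ... */
};

static struct roller_device_struct *roller_dev;

/* The bottom half: 'data' is the value passed to tasklet_init() */
static void roller_analyze(unsigned long data)
{
    struct roller_device_struct *dev_struct = (struct roller_device_struct *)data;

    /* Analyze the waveforms stored in dev_struct and switch to
       polled mode if required */
}

/* The interrupt handler: dev_id is the pointer given to request_irq() */
static irqreturn_t
roller_interrupt(int irq, void *dev_id)
{
    struct roller_device_struct *dev_struct = dev_id;

    /* Capture the wave stream (roller_capture() is driver-specific) */
    roller_capture();
    /* Mark tasklet as pending */
    tasklet_schedule(&dev_struct->tsklt);
    return IRQ_HANDLED;
}

static int __init roller_init(void)
{
    roller_dev = kzalloc(sizeof(*roller_dev), GFP_KERNEL);
    if (!roller_dev)
        return -ENOMEM;
    /* ... */
    /* Initialize tasklet */
    tasklet_init(&roller_dev->tsklt, roller_analyze, (unsigned long)roller_dev);
    /* ... then request_irq(..., roller_interrupt, ..., roller_dev) */
    return 0;
}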
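Two tasklet properties can be shown with a user-space model: scheduling a tasklet that is already pending is a no-op, and the HI_SOFTIRQ list runs before the TASKLET_SOFTIRQ list. struct model_tasklet and its helpers are hypothetical; the scheduled flag plays the role of the kernel's TASKLET_STATE_SCHED bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_tasklet {
    struct model_tasklet *next;
    bool scheduled;                     /* role of TASKLET_STATE_SCHED */
    void (*func)(unsigned long data);
    unsigned long data;
};

static struct model_tasklet *hi_list, *normal_list;

static void push_tail(struct model_tasklet **head, struct model_tasklet *t)
{
    while (*head)
        head = &(*head)->next;
    t->next = NULL;
    *head = t;
}

static void model_schedule(struct model_tasklet *t, bool hi)
{
    if (t->scheduled)
        return;                         /* already pending: nothing to do */
    t->scheduled = true;
    push_tail(hi ? &hi_list : &normal_list, t);
}

static void run_list(struct model_tasklet **head)
{
    struct model_tasklet *t = *head;
    *head = NULL;
    while (t) {
        struct model_tasklet *next = t->next;
        t->scheduled = false;           /* cleared first, so it may re-schedule itself */
        t->func(t->data);
        t = next;
    }
}

/* Models the two softirqs: the HI list first, then the normal list */
static void model_run_tasklets(void)
{
    run_list(&hi_list);
    run_list(&normal_list);
}

/* Hypothetical tasklet body that records which tasklet ran */
static unsigned long ran[4];
static int nran;
static void record(unsigned long data) { ran[nran++] = data; }
```

Scheduling tasklet 1 twice at normal priority and tasklet 2 at high priority results in exactly two runs: tasklet 2 first, then tasklet 1.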

Work queues

 https://arkadiviner.blogspot.co.il/2016/11/work-queues.html

 

Summary

Threaded IRQs (bottom half)
  • Execution context: process context.
  • Reentrancy: cannot run simultaneously on different CPUs.
  • Sleep semantics: may go to sleep.
  • Preemption: the IRQ thread has a higher priority than work queue threads.
  • When to use: when interrupt processing can consistently take long periods of time (exceeding a jiffy in most cases).

Softirqs
  • Execution context: interrupt context.
  • Reentrancy: can run simultaneously on different CPUs.
  • Sleep semantics: cannot go to sleep.
  • Preemption: they run at high priority with scheduler preemption disabled, so processes/threads wait for them. If softirqs don't release the CPU for more than 1 jiffy and another softirq is pending execution, the pending ones are executed in a kthread (ksoftirqd).
  • When to use: when you must do work in real time, without sleeping.

Tasklets
  • Execution context: interrupt context.
  • Reentrancy: a given tasklet cannot run simultaneously on different CPUs; different tasklets may run on different CPUs.
  • Sleep semantics: cannot go to sleep.
  • Preemption: cannot be preempted/scheduled.
  • When to use: when you don't need to sleep.

Work queues
  • Execution context: process context.
  • Reentrancy: a given work item cannot run simultaneously on different CPUs.
  • Sleep semantics: may go to sleep.
  • Preemption: may be preempted/scheduled.
  • When to use: when you need to sleep.

In general, tasklets are a less recommended bottom-half interface, because they cannot sleep and are constrained to execute on the CPU that scheduled them.


Saturday, December 10, 2016

CHIP - my new linux toy!

A few days ago I received my CHIP board.
It is a $9 board with a very nice set of features:
  1. Runs Linux.
  2. Open source hardware.
  3. Good documentation.
  4. WiFi B/G/N Built-in.
  5. 1GHz Processor
  6. 4GB of Storage
  7. 512MB of RAM
  8. Bluetooth 4.0

The first thing I did was download a Debian server image and install it via the CHIP flasher utility. I work with Ubuntu Linux, which is supported very well; I just had to install the flasher utility, which comes as a Google Chrome plugin.

Next, I connected to the board via serial emulation:

minicom --baudrate 115200 --device /dev/ttyACM0

(user name: chip, password: chip)

To turn it off safely, do:
$ sudo shutdown now

I will present many experiments that I will do on this board, so stay tuned ;-)

Friday, November 25, 2016

Completions

Many parts of the kernel initiate certain activities as separate execution threads and then wait for those threads to complete. The completion interface was made just for that.

Look at the example:

#include <linux/module.h>
#include <linux/completion.h>
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_COMPLETION(my_thread_completion);
static DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static int request_to_end_thread = 0;

/* Written against the older kernel_thread()/daemonize() API; daemonize()
   was removed in Linux 3.8, where kthread_run() would be used instead. */
static int my_thread(void *unused)
{
 DECLARE_WAITQUEUE(wait, current);
 daemonize("my_thread");
 add_wait_queue(&my_wait_queue, &wait);

 while (1)
 {
  //Request to opt out of the scheduler run queue.
  set_current_state(TASK_INTERRUPTIBLE);

  //Check the exit request only after setting the state: a wake_up()
  //arriving now sets us back to TASK_RUNNING, so it cannot be lost.
  if (request_to_end_thread)
  {
   break;
  }

  //asks the scheduler to choose and run a new task from its run queue.
  schedule();

  /* We are back in the scheduler run queue, because an event
     on the wait queue woke us up. */
 }

 //Bail out of the wait queue
 __set_current_state(TASK_RUNNING);
 remove_wait_queue(&my_wait_queue, &wait);

 //Atomically signal completion and exit
 complete_and_exit(&my_thread_completion, 0);
}

static int __init my_init(void)
{
 int pid = kernel_thread(my_thread, NULL,
                         CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
 return pid < 0 ? pid : 0;
}

static void __exit my_release(void)
{
 request_to_end_thread = 1;
 wake_up(&my_wait_queue);

 //this function blocks until my_thread exits.
 wait_for_completion(&my_thread_completion);
}

module_init(my_init);
module_exit(my_release);


Note: There may be more than one waiter for the same completion, so the thread that signals its completion may use:
void complete_all(struct completion *c);
to notify all waiters.
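The difference between complete() and complete_all() comes down to a counter inside struct completion. The following user-space model mirrors that counter logic as recent kernels implement it (complete_all() sets the counter to UINT_MAX), but it is single-threaded: model_try_wait() returns false where a real wait_for_completion() would sleep. The model_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the counter inside struct completion */
struct model_completion { unsigned int done; };

/* complete(): let exactly one (current or future) waiter through */
static void model_complete(struct model_completion *x)
{
    if (x->done != UINT_MAX)
        x->done++;
}

/* complete_all(): let every waiter through, until reinitialized */
static void model_complete_all(struct model_completion *x)
{
    x->done = UINT_MAX;
}

/* Non-blocking wait: consume one "done" if available */
static bool model_try_wait(struct model_completion *x)
{
    if (x->done == 0)
        return false;        /* a real waiter would sleep here */
    if (x->done != UINT_MAX)
        x->done--;
    return true;
}
```

After one complete(), exactly one wait succeeds and the next one would block; after complete_all(), every subsequent wait succeeds.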

Saturday, November 19, 2016

Notifier Chains

The Linux kernel consists of many different subsystems.
Often one subsystem wants to be notified of something happening in another subsystem. Notifier chains solve this problem by providing a way for subsystems to subscribe to asynchronous events from other subsystems.

Examples of such notifications include the following:
  1. Die notification - sent when a kernel function triggers a trap or a fault.
  2. Net device notification - sent when a network interface goes up or down.
  3. Internet address notification - Sent when IP is changed on a network interface.
A custom notifier chain may also be written for a new kind of event; an example is shown later.

To receive notifications, a user must register a handler function with a specific notifier chain.

A handler function signature is:
int some_handler(struct notifier_block *self, unsigned long val, void *data)
  • val - type of event
  • data - pointer to some data structure
The meaning of both val and data depends on the notifier chain that sends the event.

For example, for a die event, val is DIE_OOPS for an "oops" event, and data points to a struct die_args, which includes a pointer to the saved CPU registers.

Some example code:

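#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/netdevice.h>

/* Net Device notification event handler. It is defined before the
   notifier_block below, which refers to it. */
static int my_net_dev_event_handler(struct notifier_block *self,
    unsigned long val, void *data)
{
 /* Since Linux 3.11, data points to a struct netdev_notifier_info;
    netdev_notifier_info_to_dev() extracts the net_device from it. */
 struct net_device *dev = netdev_notifier_info_to_dev(data);

 printk(KERN_INFO "my_net_dev_event_handler: status=%lu, Interface=%s\n",
        val, dev->name);
 return NOTIFY_DONE;
}

/* Net Device notifier definition */
static struct notifier_block my_net_dev_notifier =
{
 .notifier_call = my_net_dev_event_handler,
};

static int __init my_init(void)
{
 // Register Net Device Notifier
 return register_netdevice_notifier(&my_net_dev_notifier);
}

static void __exit my_exit(void)
{
 unregister_netdevice_notifier(&my_net_dev_notifier);
}

module_init(my_init);
module_exit(my_exit);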

You can generate this event by changing the state of a network interface, for example:
$ sudo ifconfig eth0 up

You should receive a message:
my_net_dev_event_handler: status=1, Interface=eth0

When status=1, it means NETDEV_UP event.

An example of a custom notifier chain:

#include <linux/module.h>
#include <linux/notifier.h>

/* User-defined notification event handler (defined before it is referenced) */
static int my_event_handler(struct notifier_block *self,
                        unsigned long val, void *data)
{
 printk(KERN_INFO "my_event_handler: status=%lu\n", val);
 return NOTIFY_DONE;
}

/* User-defined notifier chain implementation */
static BLOCKING_NOTIFIER_HEAD(my_notif_chain);

static struct notifier_block my_notifier =
{
 .notifier_call = my_event_handler,
};

/* Driver Initialization */
static int __init my_init(void)
{
 // Register a user-defined Notifier
 return blocking_notifier_chain_register(&my_notif_chain,
                                    &my_notifier);
}

You may invoke this new custom event anywhere in the code by calling:
blocking_notifier_call_chain(&my_notif_chain, 1000, NULL);
You should receive a message:
my_event_handler: status=1000

Note:
We have declared our notifier chain using BLOCKING_NOTIFIER_HEAD
and registered our notifier using the blocking_notifier_chain_register() function. By doing that, we declared that the chain will be called from process context, and because of that the handler function is allowed to sleep.
Now, if your chain can be called from interrupt context, declare it using ATOMIC_NOTIFIER_HEAD() and register notifiers via atomic_notifier_chain_register(); the handlers are then not allowed to sleep.
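Under the hood, a notifier chain is a linked list of notifier_block structures kept sorted by priority; calling the chain walks the list until a callback returns a value with NOTIFY_STOP_MASK set. Here is a user-space model of that logic that reuses the kernel's constant values; chain_register(), chain_call() and the logging callbacks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb, unsigned long val, void *data);
    struct notifier_block *next;
    int priority;
};

/* Keep the list sorted: higher priority first, equal priorities in FIFO order */
static void chain_register(struct notifier_block **head, struct notifier_block *n)
{
    while (*head && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Call each block in order; a callback can end the walk with NOTIFY_STOP */
static int chain_call(struct notifier_block *nb, unsigned long val, void *data)
{
    int ret = NOTIFY_DONE;
    for (; nb; nb = nb->next) {
        ret = nb->notifier_call(nb, val, data);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Hypothetical callbacks that log their priority; stopper() ends the walk */
static int order[4], norder;

static int logger(struct notifier_block *nb, unsigned long val, void *data)
{
    (void)val; (void)data;
    order[norder++] = nb->priority;
    return NOTIFY_OK;
}

static int stopper(struct notifier_block *nb, unsigned long val, void *data)
{
    (void)val; (void)data;
    order[norder++] = nb->priority;
    return NOTIFY_STOP;
}
```

Registering blocks with priorities 0, 10 and 5 (the last one using stopper()) yields the order 10, 5, 0; calling the chain runs priority 10, then 5, and never reaches 0.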

That's all for now... Thanks for reading!


Work queues

Hi,

Today I want to talk about work queues.
Work queues are used in situations when the caller cannot do the intended action itself, for instance because it is an interrupt service routine and the work is too long for an interrupt, or is otherwise inappropriate to run in an interrupt (because it requires a process context).

Code that should run at a later time is called a "work".
A "work" is an action that should complete in a reasonable time, because multiple work items share the same worker thread.

When you want to use the work queue mechanism, you have 3 options for the worker thread(s) that will run your work:
  1. create_singlethread_workqueue() - creates a work queue with a single worker thread.
  2. create_workqueue() - creates one worker thread per CPU in the system.
  3. Use the default work queue. Its per-CPU worker threads are created at boot time and are shared with the rest of the kernel.
A work can be submitted to a dedicated queue using queue_work(), or to the default kernel worker thread using schedule_work().

Example code:

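#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *wq;

/* Carries the caller's callback and data along with the work item */
struct my_work {
    struct work_struct work;
    void (*func)(void *data);
    void *data;
};

/* Runs later, in process context, in the worker thread of wq */
static void my_work_handler(struct work_struct *work)
{
    struct my_work *w = container_of(work, struct my_work, work);

    w->func(w->data);
    kfree(w);
}

static int __init mymodule_init(void)
{
    wq = create_singlethread_workqueue("my_wq");
    return wq ? 0 : -ENOMEM;
}

int add_work_to_my_wq(void (*func)(void *data), void *data)
{
    /* GFP_ATOMIC: this may be called from an ISR, where sleeping is not allowed */
    struct my_work *w = kmalloc(sizeof(*w), GFP_ATOMIC);

    if (!w)
        return -ENOMEM;
    w->func = func;
    w->data = data;
    INIT_WORK(&w->work, my_work_handler);  /* current INIT_WORK() takes no data argument */
    queue_work(wq, &w->work);
    return 0;
}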

You may submit work to a work queue with a delay (in jiffies), using a struct delayed_work initialized with INIT_DELAYED_WORK():
bool queue_delayed_work(struct workqueue_struct *wq,
    struct delayed_work *dwork, unsigned long delay);
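The FIFO behavior of a single-threaded work queue, and the fact that queueing an already-pending work item does nothing (queue_work() returns false in that case), can be sketched in user space. struct work_item, struct model_wq and print_job are hypothetical stand-ins; embedding the item in print_job and recovering it with container_of() mirrors how drivers embed struct work_struct.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified work item and single-threaded FIFO queue */
struct work_item {
    struct work_item *next;
    void (*func)(struct work_item *work);
    int pending;   /* set while queued */
};

struct model_wq { struct work_item *head, **tail; };

static void wq_init(struct model_wq *wq) { wq->head = NULL; wq->tail = &wq->head; }

/* Returns 1 if queued, 0 if the item was already pending */
static int wq_queue(struct model_wq *wq, struct work_item *w)
{
    assert(w->func != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    *wq->tail = w;
    wq->tail = &w->next;
    return 1;
}

/* What the worker thread does: pop items in FIFO order and run them */
static void wq_run(struct model_wq *wq)
{
    while (wq->head) {
        struct work_item *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = &wq->head;
        w->pending = 0;   /* cleared before running, so the work may re-queue itself */
        w->func(w);
    }
}

/* Hypothetical work type that embeds its work_item */
struct print_job { int id; struct work_item work; };
static int done_ids[4], ndone;

static void print_job_fn(struct work_item *w)
{
    struct print_job *job = container_of(w, struct print_job, work);
    done_ids[ndone++] = job->id;
}
```

Queueing job 1, job 2 and job 1 again runs job 1 and then job 2, once each.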

NOTE: The work queue API is only available to modules that are declared under the GPL license, using:

MODULE_LICENSE("GPL");

Friday, November 11, 2016

Kernel threads

So, today we will talk about Kernel Threads.

What are Kernel threads

The purpose of kernel threads is to run tasks in the background. Usually, kernel threads wait for some asynchronous events and then wake up to serve them. For example, kswapd is a kernel thread that runs in the background and sleeps until free memory runs low; it then reclaims memory, swapping pages out to disk if needed.
If you type the following in your terminal:
$ ps -aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0 185512  6100 ?        Ss   Oct06   0:04 /lib/systemd/systemd --system --deserialize 28
root         2  0.0  0.0      0     0 ?        S    Oct06   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Oct06   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Oct06   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    Oct06   0:23 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    Oct06   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    Oct06   0:00 [migration/0]
root        10  0.0  0.0      0     0 ?        S    Oct06   0:00 [watchdog/0]
root        11  0.0  0.0      0     0 ?        S    Oct06   0:00 [watchdog/1]
root        12  0.0  0.0      0     0 ?        S    Oct06   0:00 [migration/1]
root        13  0.0  0.0      0     0 ?        S    Oct06   0:00 [ksoftirqd/1]
...

All the processes whose names are surrounded by square brackets are kernel threads.

Let's see some code:

Low level kernel thread usage

I will begin by showing you the low-level thread API, and then we will look at the higher-level API, which should usually be used.

This is how to create a kernel thread:

ret = kernel_thread(mythread, NULL,
CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

The flags we pass here tell which resources should be shared between the parent thread and the child thread. For example, CLONE_FILES shares open files and CLONE_SIGHAND shares signal handlers.
Each thread in Linux has a single parent, and newly created kernel threads are usually re-parented to the kthreadd thread, so the new thread does not become a zombie if its parent process dies without waiting for its child to exit.



static DECLARE_WAIT_QUEUE_HEAD(myevent_waitqueue);
static DEFINE_RWLOCK(myevent_lock);

static int mythread(void *unused)
{
    unsigned int event_id = 0;
    DECLARE_WAITQUEUE(wait, current);

    /* performs initial housekeeping and changes the parent of the calling thread
    to a kernel thread called kthreadd (daemonize() was removed in Linux 3.8) */
    daemonize("mythread");

    /* all signals are blocked by default, so we need to enable a particular signal. */
    allow_signal(SIGKILL);

    /* The thread sleeps on this wait queue until it's woken up */
    add_wait_queue(&myevent_waitqueue, &wait);

    for (;;)
    {
        /* Request to opt out of the scheduler run queue. A wake-up that arrives
           between here and schedule() sets us back to TASK_RUNNING, so
           schedule() returns immediately instead of missing it. */
        set_current_state(TASK_INTERRUPTIBLE);

        /* asks the scheduler to choose and run a new task from its run queue;
           this thread does not run again until it is woken up */
        schedule();

        /* Die if SIGKILL received, it is the only signal that may be received... */
        if (signal_pending(current))
        {
            break;
        }

        /* The thread is back in the scheduler run queue.
           This could only happen because of an event over the wait queue. */
        read_lock(&myevent_lock);

        /* Do something here... */

        read_unlock(&myevent_lock);
    }

    /* Make sure we are marked runnable before leaving the wait queue */
    __set_current_state(TASK_RUNNING);

    remove_wait_queue(&myevent_waitqueue, &wait);

    return 0;
}

Notes about kthread

Kernel threads may be preempted if the kernel was compiled with the CONFIG_PREEMPT option. Code may prevent preemption, even with CONFIG_PREEMPT, by calling preempt_disable() or by disabling local interrupts.

A kernel thread can be in any of the following process states:

  • TASK_RUNNING: In the scheduler run queue, so it is running or will run soon.
  • TASK_INTERRUPTIBLE: Waiting for an event and not in the scheduler run queue; a signal also wakes it up.
  • TASK_UNINTERRUPTIBLE: Waiting like TASK_INTERRUPTIBLE, but receipt of a signal will not put the task back into the run queue.
  • TASK_STOPPED: Stopped execution as a result of receiving a signal.
  • TASK_TRACED: Being traced by an application such as strace.
  • EXIT_ZOMBIE: Has exited, but its parent hasn't yet waited for it.
  • EXIT_DEAD: The final state, after the parent has collected the exit status; the task is being removed from the system.
 
set_current_state() sets the state of the current kthread.

To wake up a thread which waits on a wait queue, we use wake_up_interruptible(&myevent_waitqueue) function.
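Which sleepers a wake-up call affects depends on their state. The user-space model below mirrors the kernel's TASK_* values: wake_up_interruptible() wakes only TASK_INTERRUPTIBLE waiters, while wake_up() uses TASK_NORMAL and wakes both kinds. struct task and the model_* functions are hypothetical, and exclusive waiters are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel's task state bits */
#define TASK_RUNNING         0x0000
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_NORMAL          (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

struct task { unsigned int state; };

/* Wake each waiter whose state matches 'mode'; returns how many were woken */
static int wake_waiters(struct task **waiters, int n, unsigned int mode)
{
    int woken = 0;
    for (int i = 0; i < n; i++) {
        if (waiters[i]->state & mode) {
            waiters[i]->state = TASK_RUNNING;   /* back on the run queue */
            woken++;
        }
    }
    return woken;
}

/* Like wake_up_interruptible(): only TASK_INTERRUPTIBLE sleepers */
static int model_wake_up_interruptible(struct task **w, int n)
{
    return wake_waiters(w, n, TASK_INTERRUPTIBLE);
}

/* Like wake_up(): TASK_NORMAL, i.e. both kinds of sleepers */
static int model_wake_up(struct task **w, int n)
{
    return wake_waiters(w, n, TASK_NORMAL);
}
```

With one interruptible and one uninterruptible sleeper on the queue, model_wake_up_interruptible() wakes only the first; a following model_wake_up() wakes the second.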

The higher level thread API

This API is actually one of several kernel helper interfaces which exist to make a kernel programmer's life a little easier...



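#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/err.h>

static DECLARE_WAIT_QUEUE_HEAD(my_thread_wait);
static struct task_struct *my_task;

static int my_thread(void *unused)
{
    DECLARE_WAITQUEUE(wait, current);

    add_wait_queue(&my_thread_wait, &wait);
    while (!kthread_should_stop())
    {
        set_current_state(TASK_INTERRUPTIBLE);
        /* Re-check after setting the state, so kthread_stop()'s wake-up is not lost */
        if (kthread_should_stop())
            break;
        schedule();
        /* ... woken up: handle the event here ... */
    }

    __set_current_state(TASK_RUNNING);
    remove_wait_queue(&my_thread_wait, &wait);

    return 0;
}

int my_thread_init(void)
{
    /* kthread_create() returns a thread that does not run until it is woken */
    my_task = kthread_create(my_thread, NULL, "%s", "my_thread");
    if (IS_ERR(my_task))
    {
        int err = PTR_ERR(my_task);

        my_task = NULL;
        return err;
    }
    wake_up_process(my_task);
    return 0;
}

void my_thread_wakeup(void)
{
    if (my_task)
    {
        wake_up_process(my_task);
    }
}

void my_thread_release(void)
{
    /* Sets the stop flag, wakes the thread and waits for my_thread() to return */
    if (my_task)
        kthread_stop(my_task);
}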

This is the preferred way to handle kernel threads.

That's it for today, thanks for reading!