Welcome to Technology Talks

Technology TALKS mainly provides articles on computer tips & tricks, software, useful websites, and tutorials. The aim of Technology TALKS is to experiment with all the hacks you can do on your PC. This blog contains a huge collection of computer articles covering the Internet, real-time technology updates, tips, tricks, utilities, Windows hacks, Linux, security, and a lot more…

Guides to Kernel Hacking
By Kumar Gaurav,B.Tech(E.C.E),New Delhi.

#1: Introduction to the kernel
What is a Kernel?
In the simplest of terms, a kernel can be thought of as the central component that provides basic services for every other part of the operating system. It is the first software to load when your computer boots (after the boot loader, of course), and it handles memory management, process management, and device management. It acts as a bridge between your applications and the hardware.
Newbie note: We say a computer “boots up” from the old saying – a joke, really – “pull yourself up by your bootstraps.” A boot loader is the first software program that runs when a computer starts. The boot loader looks at the information in the BIOS (basic input/output system) chip on your motherboard in order to learn what hardware is available to it. Then it launches the kernel software, which the boot loader usually will have found on a hard drive.
The existence of a kernel can be attributed to the design of a computer system as a series of abstraction layers that rely on each other. A kernel can thus also be termed the lowest level of abstraction implemented in software.
Newbie note: What the heck are abstraction layers? You could think of abstraction layers as being like levels of a tall office building. The boot loader is the very bottom of the foundation of this building, the base upon which all else depends. The kernel is the rest of the foundation, sitting on the boot loader, and it underlies all else in this building. Other abstraction layers are more like towers and wings of a building. They all depend upon the kernel, but they don't all pile up in a single stack of layers.
Why is a kernel essential?
A kernel becomes an absolute necessity – the foundation for all software running on your computer – as it performs critical hardware-related functions, which can mainly be divided as follows:
i) Memory Management
Memory here refers to the RAM (random access memory) installed on the user's system, and is used to store the instructions and data related to a program. It is where an application is loaded before it is executed. The kernel manages memory between different processes and is responsible for how much memory a process can use. If the memory installed on a user's system is not sufficient to hold the number of running processes, the kernel decides what to do (like allocating 'virtual memory' for idle processes, or swapping memory from RAM to a hard drive and back into RAM).
Newbie note: RAM is contained on one or more chips on your motherboard. Your computer runs much faster when it is using only RAM instead of moving memory onto and off of a hard drive.
ii) Process Management
The kernel is also responsible for managing the CPU (central processing unit) between the different applications.
Newbie note: There is a chip on your motherboard that holds one or more processing units. For example, a “dual core” CPU actually holds two processing units. Nowadays all CPUs also contain some memory, and because this memory is on the same chip as the CPU, it is even faster than RAM. This is why some people are willing to pay much more for a CPU chip that has a large amount of memory on it. Depending upon how the kernel is designed, your computer might be able to use dozens of processing units. That is an advantage of some Linux kernels. Windows, by contrast, can't make much use of more than four processing units on a single computer.
The kernel decides how the running applications are allocated to the processor(s), since a processor usually supports one process at a time (multi-threading processors are an exception to this rule); however, the kernel may behave as if it is running more threads than the computer is physically able to support, which is known as multitasking. The kernel decides which processes to run, and for how long, using a scheduling algorithm, the discussion of which is beyond this guide's scope.
Newbie note: Your computer seems as if it is running many processes at once, but in reality each of its processors takes turns running them, so fast that you usually won't notice. For example, as I write this Newbie Note, my computer is running a word processor, keeping up an Ethernet connection through a router to the Internet, and running the Chrome browser, which I am using to look things up for this Guide. The kernel manages all this swapping back and forth among tasks.
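The time-slicing just described can be sketched as a toy round-robin scheduler. This is only an illustration of the idea, not how any real kernel schedules: the process names, burst times, and the `round_robin` function are all invented for this example.

```python
from collections import deque

def round_robin(processes, quantum):
    """Simulate round-robin scheduling.

    processes: list of (name, burst_time) pairs, where burst_time is
    the total CPU time each process still needs.
    quantum: the time slice each process gets per turn.
    Returns the order in which (process, slice) turns were run.
    """
    queue = deque(processes)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)        # run for one time slice
        timeline.append((name, run))
        if remaining > run:                  # not finished: back of the queue
            queue.append((name, remaining - run))
    return timeline

# Three toy processes sharing one CPU, with 2-unit time slices.
order = round_robin([("editor", 3), ("browser", 5), ("daemon", 2)], quantum=2)
print(order)
```

Run fast enough, the interleaving in `order` looks to each "process" as if it had the CPU to itself, which is exactly the illusion multitasking provides.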
iii) Device Management
Applications need the peripherals attached to a system for the various features they provide, like printing, displaying, etc. The kernel controls all the devices using device drivers; if a particular device driver is missing, the kernel is unable to provide you the services that the device offers.
The kernel may know all the attached devices in advance (embedded systems), it may auto-detect them (plug and play, which is the standard in use today), or sometimes you might have to configure them yourself (non plug and play).
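The driver lookup described above can be sketched like this. The `drivers` table and `kernel_request` function are invented for illustration; a real kernel dispatches through tables of functions registered by each driver, but the effect of a missing driver is the same.

```python
# Toy model: the kernel keeps a table mapping each device to its
# driver, and a request for a device with no driver cannot be served.
drivers = {
    "printer": lambda req: f"printing {req}",
    "display": lambda req: f"showing {req}",
}

def kernel_request(device, req):
    driver = drivers.get(device)
    if driver is None:                 # missing driver: no service
        return f"error: no driver for {device}"
    return driver(req)                 # driver performs the operation

print(kernel_request("printer", "report.txt"))
print(kernel_request("scanner", "photo.png"))
```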

#2: Kernel Bloat
Here we discuss kernel bloat, not just with regard to the Linux kernel, but with regard to every kernel that exists.
Newbie note: Each Windows operating system gets bigger than the last, and in my opinion, Windows 7 has the most bloated kernel in history.
The term "kernel bloat" is a debated and VERY subjective one.
So what does kernel bloat actually mean? Does it mean the presence of 'things' that you do not use, or simply whatever others consider bloat? The answer lies somewhere in the middle of these extremes.
Suppose you are using your computer to browse the internet: are you using every single line of code compiled into the kernel? You are probably using only a fraction of it. Bloat can be the presence of unnecessary modules in the kernel, or anything that you think it is (we told you before, it is a VERY subjective term).
For developers, the presence of a large number of driver modules is something they want, as they might need them to run on the many different systems they encounter; but for a performance enthusiast, it is a mere waste of system resources. So you see how opinions differ as we move from one group of people to another.
As a security threat
Here we will see how running a lot of code inside supervisor mode poses security threats and efficiency problems.
As you know, the kernel runs in a dedicated memory space, and it has the power to control each and every resource present in the system. So running a lot of code inside that dedicated memory space poses a big security threat. More code means more bugs, and more bugs mean more chances of something going wrong. If something in that part of RAM undergoes a buffer overflow (the most common type of problem), it can bring the whole system down or might actually lead to data loss.
Also, if a kernel is compiled with a lot of modules, it occupies a large chunk of your memory, which ultimately leaves a smaller amount available for applications and directly affects your system resources, as NOTHING else is ever allowed to enter the 'kernel space'.

#3: Memory Management


Each process has its own private address space. The address space is initially divided into three logical segments: text, data, and stack. The text segment is read-only and contains the machine instructions of a program. The data and stack segments are both readable and writeable. The data segment contains the initialized and uninitialized data portions of a program, whereas the stack segment holds the application's run-time stack.
On most machines, the stack segment is extended automatically by the kernel as the process executes. A process can expand or contract its data segment by making a system call, whereas a process can change the size of its text segment only when the segment's contents are overlaid with data from the file system, or when debugging takes place. The initial contents of the segments of a child process are duplicates of the segments of a parent process.
The entire contents of a process address space do not need to be resident for a process to execute. If a process references a part of its address space that is not resident in main memory, the system pages the necessary information into memory. When system resources are scarce, the system uses a two-level approach to maintain available resources. If a modest amount of memory is available, the system will take memory resources away from processes if these resources have not been used recently.
Should there be a severe resource shortage, the system will resort to swapping the entire context of a process to secondary storage. The demand paging and swapping done by the system are effectively transparent to processes. A process may, however, advise the system about expected future memory utilization as a performance aid.
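The demand-paging behaviour described above can be sketched as follows. The `DemandPager` class and its names are invented for illustration; a real kernel faults pages in via hardware traps rather than method calls, but the bookkeeping is analogous.

```python
class DemandPager:
    """Toy model of demand paging: pages live on 'disk' until the
    process touches them, at which point they are faulted into RAM.
    All names here are invented for illustration."""

    def __init__(self, address_space):
        self.disk = dict(address_space)   # page number -> contents
        self.resident = {}                # pages currently in memory
        self.faults = 0

    def read(self, page):
        if page not in self.resident:     # page fault: bring it in
            self.faults += 1
            self.resident[page] = self.disk[page]
        return self.resident[page]

pager = DemandPager({0: "text", 1: "data", 2: "stack"})
pager.read(0)          # fault: page 0 loaded
pager.read(0)          # already resident: no fault
pager.read(2)          # fault: page 2 loaded
print(pager.faults)    # page 1 was never touched, so never loaded
```

Note that the process itself never sees the faults: every `read` simply returns the data, which is the transparency the text describes.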

Memory Management Inside the Kernel

The kernel often does allocations of memory that are needed for only the duration of a single system call. In a user process, such short-term memory would be allocated on the run-time stack. Because the kernel has a limited run-time stack, it is not feasible to allocate even moderate-sized blocks of memory on it.
Consequently, such memory must be allocated through a more dynamic mechanism. For example, when the system must translate a pathname, it must allocate a 1-Kbyte buffer to hold the name. Other blocks of memory must be more persistent than a single system call, and thus could not be allocated on the stack even if there was space. An example is protocol-control blocks that remain throughout the duration of a network connection.
Demands for dynamic memory allocation in the kernel have increased as more services have been added. A generalized memory allocator reduces the complexity of writing code inside the kernel. Thus, the 4.4BSD kernel has a single memory allocator that can be used by any part of the system. It has an interface similar to the C library routines malloc and free that provide memory allocation to application programs. Like the C library interface, the allocation routine takes a parameter specifying the size of memory that is needed. The range of sizes for memory requests is not constrained; however, physical memory is allocated and is not paged. The free routine takes a pointer to the storage being freed, but does not require the size of the piece of memory being freed.
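A minimal sketch of the allocator interface just described: allocation takes a size, freeing takes only the address, and the allocator itself remembers each block's size. The `KernelStyleAllocator` class is invented for illustration and uses a trivial bump pointer, far simpler than the real 4.4BSD allocator.

```python
class KernelStyleAllocator:
    """Sketch of a malloc/free-style interface: alloc() takes only a
    size, free() takes only the address. Invented for illustration."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.next = 0          # simple bump pointer into the pool
        self.sizes = {}        # address -> size of each live block
        self.freed = []        # (address, size) of returned blocks

    def alloc(self, size):
        if self.next + size > self.pool_size:
            raise MemoryError("pool exhausted")
        addr = self.next
        self.next += size
        self.sizes[addr] = size      # remember the size ourselves
        return addr

    def free(self, addr):
        # The caller passes no size: we look it up, as free() does.
        size = self.sizes.pop(addr)
        self.freed.append((addr, size))

mem = KernelStyleAllocator(4096)
buf = mem.alloc(1024)     # e.g. the 1-Kbyte pathname buffer
mem.free(buf)             # no size argument needed
print(mem.freed)
```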
The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, there has been a need for more memory than exists physically in a system. Strategies have been developed to overcome this limitation and the most successful of these is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing it between competing processes as they need it.
Virtual memory does more than just make your computer's memory go further. The memory management subsystem provides:
  • Large Address Spaces: The operating system makes the system appear as if it has a larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.
  • Protection: Each process in the system has its own virtual address space. These virtual address spaces are completely separate from each other, and so a process running one application cannot affect another. Also, the hardware virtual memory mechanisms allow areas of memory to be protected against writing. This protects code and data from being overwritten by rogue applications.
  • Memory Mapping: Memory mapping is used to map image and data files into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.
  • Fair Physical Memory Allocation: The memory management subsystem allows each running process in the system a fair share of the physical memory of the system.
  • Shared Virtual Memory: Although virtual memory allows processes to have separate (virtual) address spaces, there are times when you need processes to share memory. For example, there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each process's virtual address space, it is better to have only one copy in physical memory, shared by all of the processes running bash. Dynamic libraries are another common example of code shared between several processes.
    Shared memory can also be used as an Inter-Process Communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the Unix System V shared memory IPC.
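The shared-memory idea can be demonstrated with Python's `multiprocessing.shared_memory` module; a real System V IPC example would use shmget/shmat from C, but the principle, one region of physical memory visible through more than one handle, is the same. A minimal sketch:

```python
from multiprocessing import shared_memory

# Create one block of shared memory and attach to it a second time,
# the way two processes would map the same physical pages. (Both
# handles live in this one process here, to keep the demo
# self-contained.)
block = shared_memory.SharedMemory(create=True, size=16)
other = shared_memory.SharedMemory(name=block.name)  # second attach

block.buf[:5] = b"hello"        # write through one mapping...
data = bytes(other.buf[:5])     # ...and read it back through the other
print(data)

other.close()
block.close()
block.unlink()                  # release the shared segment
```

Because both handles refer to the same physical pages, the write made through `block` is immediately visible through `other`, with no copying.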

Swapping Out and Discarding Pages

When physical memory becomes scarce the Linux memory management subsystem must attempt to free physical pages. This task falls to the kernel swap daemon (kswapd).
The kernel swap daemon is a special type of process, a kernel thread. Kernel threads are processes that have no virtual memory; instead, they run in kernel mode in the physical address space. The kernel swap daemon is slightly misnamed in that it does more than merely swap pages out to the system's swap files. Its role is to make sure that there are enough free pages in the system to keep the memory management system operating efficiently.
The kernel swap daemon (kswapd) is started by the kernel init process at startup time and sits waiting for the kernel swap timer to periodically expire.
Every time the timer expires, the swap daemon looks to see if the number of free pages in the system is getting too low. It uses two variables, free_pages_high and free_pages_low to decide if it should free some pages.
So long as the number of free pages in the system remains above free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires. For the purposes of this check the kernel swap daemon takes into account the number of pages currently being written out to the swap file. It keeps a count of these in nr_async_pages; this is incremented each time a page is queued waiting to be written out to the swap file and decremented when the write to the swap device has completed. free_pages_low and free_pages_high are set at system startup time and are related to the number of physical pages in the system.
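The two-watermark check described above can be sketched as follows. The function name and return values are invented for illustration, and the real kernel's accounting differs in detail; the point is only that pages already queued for write-out count toward the free total, and that there are two thresholds.

```python
def kswapd_should_free(free_pages, nr_async_pages,
                       free_pages_low, free_pages_high):
    """Sketch of kswapd's check: pages queued for write-out
    (nr_async_pages) count toward the free total. Returns 'sleep',
    'free', or 'free_hard'. Names invented for illustration."""
    effective_free = free_pages + nr_async_pages
    if effective_free >= free_pages_high:
        return "sleep"                 # plenty of pages: do nothing
    if effective_free < free_pages_low:
        return "free_hard"             # dangerously low: work harder
    return "free"                      # below the high-water mark

print(kswapd_should_free(500, 0, free_pages_low=100, free_pages_high=300))   # sleep
print(kswapd_should_free(80, 10, free_pages_low=100, free_pages_high=300))   # free_hard
```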
If the number of free pages in the system has fallen below free_pages_high or, worse still, free_pages_low, the kernel swap daemon will try three ways to reduce the number of physical pages being used by the system:
  • Reducing the size of the buffer and page caches
  • Swapping out System V shared memory pages
  • Swapping out and discarding pages.
If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon will try to free 6 pages before it next runs; otherwise it will try to free 3 pages. Each of the above methods is tried in turn until enough pages have been freed. The kernel swap daemon remembers which method it was using the last time it attempted to free physical pages; each time it runs, it will start by trying to free pages using that last successful method.
After it has freed sufficient pages, the swap daemon sleeps again until its timer expires. If the reason the kernel swap daemon freed pages was that the number of free pages in the system had fallen below free_pages_low, it sleeps for only half its usual time. Once the number of free pages is more than free_pages_low, the kernel swap daemon goes back to sleeping longer between checks.
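The retry loop described above, try each method in turn and remember the last one that worked, can be sketched like this. The method functions and all names are stand-ins invented for illustration.

```python
def kswapd_free_pages(methods, target, start_index=0):
    """Try each page-freeing method in turn, starting from
    start_index (the method that last succeeded), until 'target'
    pages have been freed or every method has been tried once.
    Each 'method' is a function returning how many pages it freed.
    Names invented for illustration."""
    freed = 0
    i = start_index
    tried = 0
    last_success = start_index
    while freed < target and tried < len(methods):
        got = methods[i]()
        if got > 0:
            last_success = i           # remember what worked last
        freed += got
        i = (i + 1) % len(methods)     # move on to the next method
        tried += 1
    return freed, last_success

# Toy stand-ins for the three methods listed above.
shrink_caches = lambda: 2    # freed 2 pages from the caches
swap_shm = lambda: 0         # no shared memory pages to swap
swap_out = lambda: 4         # swapped out 4 process pages

freed, last = kswapd_free_pages([shrink_caches, swap_shm, swap_out], target=6)
print(freed, last)
```

On the next run, `last` would be passed back in as `start_index`, so the daemon begins with the method that last succeeded.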

Reducing the Size of the Page and Buffer Caches

The pages held in the page and buffer caches are good candidates for being freed into the free_area vector. The page cache, which contains pages of memory-mapped files, may contain unnecessary pages that are filling up the system's memory. Likewise the buffer cache, which contains buffers read from or being written to physical devices, may also contain unneeded buffers. When the physical pages in the system start to run out, discarding pages from these caches is relatively easy, as it requires no writing to physical devices (unlike swapping pages out of memory). Discarding these pages does not have too many harmful side effects other than making access to physical devices and memory-mapped files slower; and if the discarding is done fairly, all processes suffer equally.
Every time the kernel swap daemon tries to shrink these caches, it examines a block of pages in the mem_map page vector to see if any can be discarded from physical memory. The size of the block of pages examined is higher if the kernel swap daemon is intensively swapping, that is, if the number of free pages in the system has fallen dangerously low. The blocks of pages are examined in a cyclical manner; a different block of pages is examined each time an attempt is made to shrink the memory map. This is known as the clock algorithm because, rather like the minute hand of a clock, the whole mem_map page vector is examined a few pages at a time.
Each page being examined is checked to see if it is cached in either the page cache or the buffer cache. You should note that shared pages are not considered for discarding at this time, and that a page cannot be in both caches at the same time. If the page is not in either cache, then the next page in the mem_map page vector is examined.
Pages are cached in the buffer cache (or rather, the buffers within the pages are cached) to make buffer allocation and deallocation more efficient. The memory-map shrinking code tries to free the buffers contained within the page being examined. If all the buffers are freed, the pages that contain them are also freed. If the examined page is in the Linux page cache, it is removed from the page cache and freed.
When enough pages have been freed on this attempt, the kernel swap daemon waits until the next time it is periodically woken. As none of the freed pages were part of any process's virtual memory (they were cached pages), no page tables need updating. If not enough cached pages were discarded, the swap daemon will try to swap out some shared pages.
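The clock algorithm described above can be sketched as follows, with mem_map modelled as a plain list and the caches as a single set; all names are invented for illustration.

```python
def clock_shrink(mem_map, hand, block_size, cache):
    """Examine a block of pages starting at the clock 'hand', discard
    those held in the cache, and advance the hand so the next run
    examines the next block. mem_map is a list of page numbers and
    cache a set of cached pages. Names invented for illustration."""
    freed = []
    for offset in range(block_size):
        page = mem_map[(hand + offset) % len(mem_map)]
        if page in cache:          # cached and unshared: safe to discard
            cache.discard(page)
            freed.append(page)
    new_hand = (hand + block_size) % len(mem_map)
    return freed, new_hand

mem_map = list(range(8))
cache = {1, 2, 5, 6}
freed, hand = clock_shrink(mem_map, hand=0, block_size=4, cache=cache)
print(freed, hand)    # pages 1 and 2 discarded; hand advances to 4
freed2, hand = clock_shrink(mem_map, hand, block_size=4, cache=cache)
print(freed2, hand)   # next block: pages 5 and 6 discarded; hand wraps to 0
```

Like the minute hand of a clock, successive calls sweep through the whole mem_map a block at a time rather than always starting from page 0.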

The Swap Cache

When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap file and in physical memory. This happens when a page that was swapped out of memory was then brought back into memory when it was again accessed by a process. So long as the page in memory is not written to, the copy in the swap file remains valid.
Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one per physical page in the system. Each entry corresponds to a swapped-out page and describes which swap file the page is held in, together with its location in that swap file. If a swap cache entry is non-zero, it represents a page held in a swap file that has not been modified. If the page is subsequently modified (by being written to), its entry is removed from the swap cache.
When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there is a valid entry for this page, it does not need to write the page out to the swap file. This is because the page in memory has not been modified since it was last read from the swap file.
The entries in the swap cache are page table entries for swapped out pages. They are marked as invalid but contain information which allow Linux to find the right swap file and the right page within that swap file.
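The swap-cache bookkeeping described above can be sketched like this; the `SwapCache` class and its method names are invented for illustration, and the real structure is an array of page table entries rather than a dictionary.

```python
class SwapCache:
    """Sketch of the swap cache: maps a physical page to its
    (swap_file, slot) location while the on-disk copy is still valid;
    the entry is dropped as soon as the page is written to.
    Names invented for illustration."""

    def __init__(self):
        self.entries = {}   # page -> (swap_file, slot)

    def swapped_in(self, page, swap_file, slot):
        # Page brought back from swap: the disk copy is still valid.
        self.entries[page] = (swap_file, slot)

    def page_written(self, page):
        # Page modified in memory: the disk copy is now stale.
        self.entries.pop(page, None)

    def needs_writeout(self, page):
        # A valid entry means the swap file already holds this page.
        return page not in self.entries

cache = SwapCache()
cache.swapped_in(page=7, swap_file="/swap0", slot=42)
print(cache.needs_writeout(7))   # False: clean copy already on disk
cache.page_written(7)
print(cache.needs_writeout(7))   # True: must be written out again
```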

