Hands-on Guide to the Debian GNU Operating System

Davor Ocelic

Last update: Jan, 2012. — Maintain sections up to date, document new developments

Copyright (C) 2002-2012 Davor Ocelic, Spinlock Solutions

This documentation is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You can find a copy of the GNU General Public License in the common-licenses directory on your system. If not, visit the GNU website.

The latest copy of this Guide can be found at http://techpubs.spinlocksolutions.com/dklar/.

This article is part of Spinlock Solutions's practical 5-piece introductory series containing the Debian GNU Guide, MIT Kerberos 5 Guide, OpenLDAP Guide, OpenAFS Guide and the FreeRADIUS Guide.



Preface

Welcome. The following Guide should help you make the first steps with Debian GNU, a Unix-like operating system.

If you encounter any unexpected problems in using Debian, have patience. Unix is a colorful collection of more than 50 years of professional research and development applied to computer hardware and software and there's a certain learning curve involved.

You need to learn the basics of Unix architectures and operating systems properly and efficiently, so that you can easily chew on a broad range of advanced Unix topics.

Linux has been progressing with an ever-increasing pace over the years, both technically and in widespread adoption, and it is easier than ever to download, install and start using a Linux-based operating system. Many underlying technologies have been successfully wrapped into graphical control panels, obscuring the technical workings of the system. Our aim here is to show the technical aspects of a Linux system that will give you a level of understanding beyond graphical screens and management interfaces.

The Linux kernel and almost all other software are free ("free" as in freedom). This has allowed many projects or companies to package the Linux kernel and tens of thousands of applications in easily-installable and functional wholes called "distributions". Our distribution of choice will be Debian GNU/Linux.

And of all Linux, why exactly Debian? Well, the technical solutions, immediate brain-power available and community organization in Debian GNU outperform all competition by a wide margin. There are some Linux distributions, Unix operating systems (such as Sun Solaris/OpenSolaris, SGI IRIX, QNX...) or kernels that do some specific tasks better, but Debian GNU is definitely a general-purpose winner with an enormous customization potential.

Also, we see that all interesting and exciting new developments are happening in the Linux arena; just focusing on Linux and the popular technologies (e.g. LAMP -- Linux, Apache, MySQL, Perl, or Git/RoR -- Git revision control and Ruby on Rails) may turn out to be your cool steady job and source of income.

You should read this Guide after you successfully install some variant of the Debian GNU system on your computer (with or without help from the Debian installation manual).

The Guide is a balanced mix between the administrator's and the user's guide; it is probably too broad for those who belong to either of the two extreme categories. The approach used should fit home users best: people who do have a Debian installation at hand, and want to learn and experiment.

Our end goal is that you develop the mindset to solve further problems on your own; the basic understanding and general logic matter, not the exact implementation or usage details. In most general terms, we could say we'll try to explain the principles of Unix system design, and how they work in practice using command line and text processing utilities; we will not bother describing point-and-click GUIs and menus; such documentation is available elsewhere (on say, Gnome, XFCE, CDE or KDE websites).

In a Debian guide, we will not hesitate to use Debian-specific features and commands, but note that most of our discussion will, at least generally, apply to other Linux or Unix systems as well. Additionally, by saying this is a beginner's guide, we definitely won't restrict ourselves to system basics; this Guide is hiding many details even experienced users would find useful or amusing.

Please read the following two sections (the Section called Conventions and the Section called Pre-requisites) carefully, as they explain some basic ideas and assumptions followed throughout the Guide.


Conventions

  • Application names are specified in "application" style, which just happens to be plain text with no special formatting. The executables (system commands that you can run) are given in strong, command mode (such as free, top, or ps) and are, at places where it helps readability, enclosed in single quotes (say, 'rm'). At places where we make explicit references to manual pages (program documentation), we'll use the "name(section)" style, such as man(1).

  • All file and directory names are given in "filename" style and, when the context requires exact location on the system, start with a slash("/") or tilde("~") character (/etc/syslog.conf, ~/.bash_profile, or /etc/init.d/). Although not mandated by system behavior, we consistently include "/" at the end of every directory name to make directories directly distinguishable from regular files.

  • Symbols that need to be replaced with your specific value use the REPLACEABLE style: they're simply given in all uppercase, and their final display depends on the enclosing element they're part of. In addition, they're slightly italicized. For example, in the shell command kill -9 PID, "PID" should be replaced with an appropriate value.

  • User session examples (consisting of user input and the corresponding system output) use the "screen" mode. User input is visually prefixed with "$" for user, and "#" for administrator commands. Program output is edited for brevity and has no prefix.

  • Unix, GNU, Debian and Linux are words that can sometimes, depending on a context, be used interchangeably. Throughout the Guide, we are consistent and, on each occasion, use the word with the broadest scope. For instance, we would talk about the Unix command line, GNU development tools, the Debian infrastructure, and Linux process management.

  • When a concrete system username will be needed to illustrate an example, mirko will play the role of an innocent user. In rare cases where two concrete usernames will be needed, ante will show up to help us out.

  • You'll notice that the Guide contains many links to external resources. This can make you unhappy if you find a lot of them interesting, and get distracted from your primary goal — this Guide namely. Therefore:

    • We will always include the minimum of text, the part that is crucial to understanding the subject, directly in the Guide. External links will only provide more detailed information. As a result, it will be possible to read the Guide without following any external links.

    • We will group all links appearing in the Guide in a separate appendix, along with proper descriptions. As a result, it will allow you to focus on the Guide exclusively, and think about all the additional resources later.


Pre-requisites

To make sure you can successfully follow this guide, there are a few key points we need to agree upon. Let's assume that you:

  • installed Debian from scratch (scratch = new, clean install). If you did not, and you're reading this on an existing, usable Debian system, keep in mind that most of our basic configuration steps have already been performed on your system. The easiest way to install is by using a bootable USB stick; obtain and run the unetbootin program, which will allow you to create a ready-to-boot USB stick in a matter of seconds

  • have the network properly configured. This is important if you have a decent Internet link and want to install software directly from the Debian repositories on the Internet or your local LAN. You were given the choice to do that at the installation phase

  • have just the base system installed (around 150 megabytes in total). To get a system like this, don't run dselect or tasksel at installation time. But if you do, and end up with a larger set of installed packages, keep in mind that most of the basic packages are already installed on your system

  • start working with the system by logging in to the superuser account (your login name is root, and the password is whatever you defined at installation time)


Chapter 1. Overview

 

"On our campus the UNIX system has proved to be not only an effective software tool, but an agent of technical and social change within the University."

 
--in memoriam, John Lions (University of New South Wales) 

Debian GNU software distribution

Free Software distributions, such as Debian GNU, are — as the name implies — comprised of a large number of Free Software programs. That's why, when you pick Debian GNU, you get not only a basic operating system, but a complete environment with more (much more) than 15,000 precompiled, prepackaged and ready to use software programs. Exactly how this complex job gets done in practice is a subject for itself, but the important thing is it's only possible because all of the included programs are released under one of the free, DFSG-compliant licenses.

For you, the end user, this means all the software you'll ever need is already prepared, tested and waiting for you in the Debian repositories. Debian repositories consist of a standard directory hierarchy, software packages (*.deb files), and the accompanying metadata which describes the repository contents. Repositories themselves are not tied to a particular access method or storage medium, so you can access them from almost anywhere: the Internet, LAN, USB sticks, local hard disks or CD/DVD-ROMs.


Advanced Package Tool

Most computer environments recognize the concept of software installation, and so does Debian GNU. Computer programs that you can install are, in a packaged form, archived in Debian repositories. Before use, they need to be retrieved from the repository, installed (unpacked onto the system and registered with the software management tool) and configured.

Debian specifically has developed a very sophisticated tool for this high level package management called APT (the Advanced Package Tool) that leaves any competition far behind. Combined with an easy and straightforward configuration process, apt is surely one of the winning Debian ideas.


Configuring APT

There's very little you need to do to configure your Debian package management system, but it's a critical step and you need to get it right.

  • remove the /etc/apt/sources.list file, if it exists (run rm /etc/apt/sources.list)

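  • if you have any Debian GNU CD/DVD-ROMs, run apt-cdrom add for each disc you place in the drive

  • add the following to /etc/apt/sources.list, replacing CC with the two-letter country code of a nearby Debian mirror (for example, de or us); this configures the Internet repositories from which you'll install the software: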

    Example 1-1. /etc/apt/sources.list

    		deb     http://ftp.CC.debian.org/debian stable main
    		deb-src http://ftp.CC.debian.org/debian stable main
    		deb http://security.debian.org/ stable/updates main
    
    		deb     http://ftp.CC.debian.org/debian testing main
    		deb-src http://ftp.CC.debian.org/debian testing main
    		deb http://security.debian.org/ testing/updates main
    
    		deb     http://ftp.CC.debian.org/debian unstable main
    		deb-src http://ftp.CC.debian.org/debian unstable main
    		
  • run apt-get update to make apt read /etc/apt/sources.list, retrieve package indexes and prepare local cache. Watch out for any error output; if it tells you to run the command one more time, do so

Having configured apt, you will now be able to install all the additional software you'll need. Just make sure to run apt-get update to start off with the up-to-date package lists (running it twice does no harm).


Chapter 2. First steps around the system

 

"Now I know someone out there is going to claim, 'Well then, UNIX is intuitive, because you only need to learn 5000 commands, and then everything else follows from that! Har har har!'"

 
--Andy Bates in comp.os.linux.misc, on "intuitive interfaces", slightly defending Macs 

Base Debian GNU installation

If you have ever tried any of the commercial Linux distributions (such as Ubuntu, Red Hat, Mandrake or SuSE), then the first thing you'll notice about Debian is that you don't get the X Window System or a fancy desktop during the installation (although the new "d-i" installer will perform all hardware auto-detection for you). In essence, you don't get any more than a minimum of system software installed.

The commercial Linux companies have set, would you guess it, their commercial interests as a priority, and the whole auto detection, out of the box and instant graphics ideas are there just to make their products more appealing on the market.

Those companies have played, do play, and will be playing positive roles in both the acceptance and development of free software (Note: today, in 2010, 8 years after I initially wrote this, we can see how much this influenced the Linux world in a positive way), but some of them provide a pretty poor learning or professional platform.

Debian, on the other hand, is the only non-profit organization that sticks to its original manifesto and can easily afford itself the luxury of pursuing technical excellence before "market time". For those and other reasons that will come to the surface along our journey, you'll see that Debian does things the way things should be done.

With Debian GNU and the basic installation, you get a minimal system with the black console and a Login: prompt. We'll move from there on.


Shell and filesystem

When you log in to the system (authenticate typically with your user name and password at the Login: prompt), you'll be confronted with a text command line, something that might remind you of DOS, but that's where their similarity ends (so don't bother with inappropriate comparisons). What you are actually seeing is bash, one of the popular shell programs. In general, shell programs serve as agents between the user and the system (accept commands, return output) and are all, in fact, more or less sophisticated programming languages.

Computer software is based on files. Files live in directories (folders), which are located on virtual or physical, local or network disk partitions. In Unix, every system has a root partition which is mounted (think "associated") as the root directory (denoted by /) and serves as an entry point to the filesystem. For example, /home/mirko/.profile (called the pathname, absolute path, or full path) tells us there's a file named .profile which resides in the directory /home/mirko/. In this example, mirko/ is obviously a subdirectory of home/, and home/ is a subdirectory of /, the root directory.
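The structure of a pathname can be explored directly from the shell. The following is a minimal sketch using the standard dirname and basename utilities on the example path from above; the path is treated purely as a string, so the file does not need to exist on your system.

```shell
# Split an absolute pathname into its directory part and its file name.
path=/home/mirko/.profile

dir=$(dirname "$path")     # everything before the last slash
file=$(basename "$path")   # the last path component

echo "directory: $dir"
echo "file name: $file"

# Walking further up the tree always ends at the root directory.
parent=$(dirname "$dir")
root=$(dirname "$parent")
echo "parent of $dir: $parent"
echo "parent of $parent: $root"
```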

Note

Files beginning with a dot (such as .profile from just above) are called dotfiles. They usually contain configuration settings for various programs and are — as such — considered of secondary interest and usually omitted in directory listings. It's the magic dot (.) at the beginning that makes them "invisible", but only for the purpose of not adding noise to the output, and definitely not in hope of "hiding" files.
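To see the effect for yourself, here is a small sketch that works in a throwaway directory (created with mktemp, so nothing in your home directory is touched). The file names "visible" and ".hidden" are made up for the example.

```shell
# Create a scratch directory with one regular file and one dotfile.
demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden"

plain=$(ls "$demo")      # dotfiles are left out of the listing
all=$(ls -a "$demo")     # -a includes them (as well as "." and "..")

echo "ls shows:"
echo "$plain"
echo "ls -a shows:"
echo "$all"

# Clean up the scratch directory.
rm -r "$demo"
```

Note that the dotfile was perfectly accessible all along; only the default listing skipped it.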

Each file in Unix belongs to one user and one user group. Additionally, there are file modes (or file permissions) that control which operations are allowed to the file's owner, the file's group, and everyone else. The only directories that regular system users can modify are their home directory (usually /home/USERNAME) and system temporary directories (/tmp and /var/tmp).
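As a rough illustration of owner, group and mode, the sketch below creates a scratch file, sets its permissions with chmod, and inspects the result. It assumes GNU stat (the variant shipped with Debian) for the -c format option; the file name notes.txt is arbitrary.

```shell
demo=$(mktemp -d)
f="$demo/notes.txt"
touch "$f"

# 640 = read/write for the owner, read-only for the group, nothing for others.
chmod 640 "$f"

# ls -l shows the mode string, owner and group in its first columns.
ls -l "$f"

# GNU stat can print the same fields individually.
mode=$(stat -c '%a' "$f")
echo "octal mode: $mode"
stat -c 'owner: %U  group: %G' "$f"

rm -r "$demo"
```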

Note: Unix filesystem permissions

Although, in the traditional concept, a file or directory belongs to one user and one group, various advancements have become available over time. Most notably, these include the POSIX Access Control Lists or Role-based Access Control solutions.

ACLs allow files and directories to have potentially different permissions for every system user or group. RBAC systems allow users to be assigned "roles", and thus granted access to role-related files.

It's important to note, however, that the traditional concept that doesn't use ACLs or RBAC still offers quite acceptable flexibility and satisfies general needs.

Concerning the filesystem "navigation", there's a concept of current directory which tells the active (working) position in the filesystem. You can discover the working directory at any time by typing pwd in your shell. When you log in to the system and the shell starts up, it drops you to your home directory — that is thus your starting point.

In addition, with Unix, you generally don't need to invent the directory locations for the system software you want to install. Software is installed to pre-determined locations, which is a superior scheme. Some of the directories you need to know about are:

  • /home - users' home directories

  • /etc - system-wide configuration files

  • /bin, /usr/bin, /usr/local/bin - directories with executable files (that is, programs you can run). bin comes from binary, not a dust bin

  • /lib, /usr/lib, /usr/local/lib - shared libraries needed to support the applications

  • /sbin, /usr/sbin, /usr/local/sbin - directories with executables supposed to be run by the Superuser

  • /tmp, /var/tmp - temporary directories, watch out as /tmp is, by default, cleaned out on each reboot

  • /usr/share/doc, /usr/share/man - complete system documentation

  • /dev - system device files. In Unix, hardware devices are represented as files

  • /proc - "virtual" directory (it doesn't really exist on the disk) containing files through which you can query or tune Linux kernel settings

Why software is installed to pre-determined locations is easy to explain: Unix and Unix filesystems allow seemingly standard directories and subdirectories to be mounted to completely different places — local disk partitions, network partitions, or RAM disks, to name a few.

When you need to manage the location of files and directories with finer granularity, then Unix symbolic and hard links, GNU Stow, or dpkg-divert come into play (but they're all advanced concepts that we'll return to later).

Note

In the above list of directories, you might have noticed a pattern. For example, there's a bin/ directory in each of /, /usr/ and /usr/local/.

Directories and files found in the root directory (/) are all official distribution files (those from Debian GNU or other packages, depending on which Unix we are talking about), and in fact those that are essential for the system to boot up. This is kind of a historic leftover, from times when systems were booted using the "root" tape, and then had another, "user" tape mounted on /usr/. This technique is, for its usefulness, still used today with all properly set-up Unix systems (well, except maybe the Hurd, if you consider it being part of the same league).

Directories and files in /usr/ are also official distribution files, but are not essential for the system to boot up. A large majority of software belongs to this group.

Finally, directories in /usr/local/ contain locally installed software (which does not come from the official Operating System distribution), hence the separate location and name "local".

There's also the share/ directory that you can see floating around; it contains files which are hardware platform-independent and are the same on all supported architectures.


Manipulating software packages

Now, recall the apt we mentioned some paragraphs above. As you're left with a minimal system, you need to install a few packages to make your Unix life easier. Let's start by typing and executing apt-get install less man-db manpages in your shell.

To see exactly what each of the programs is used for, you could run apt-cache show PACKAGE NAMES. If the output scrolls out of your screen too fast, help yourself with the Shift+PageUP/DOWN keys. If the retrievable area is still too small, run the command with a buffer: apt-cache show PACKAGE NAMES | less (to exit the "pager" program less, you press q). Anyhow, it will soon become pretty obvious what the above programs are used for, but don't let that prevent you from picking up the implicitly presented tips (apt-cache, Shift+PageUP/DOWN, less).

For a list of all installed packages, run dpkg -l. To see a list of files installed by specific packages, run dpkg -L PACKAGE NAMES, such as dpkg -L sysvinit. To find out which package a file belongs to, run dpkg -S FULLPATH, such as dpkg -S /bin/bash.

To remove a package, run apt-get --purge remove PACKAGE NAMES. The --purge switch deletes any program config files (which are normally left on the system) as well as the program itself.

There's also a program called aptitude which is nowadays the preferred method for software installation. It is generally more sophisticated and has better dependency resolution than the traditional apt-get. To try it out and install a package we'll need, run aptitude install sudo.

Sometimes it's not that easy to guess package names, so you'll need to search for them (based on key words). To do so, use apt-cache search KEY WORDS, such as apt-cache search console ogg player.

Let's now make use of those programs we installed a moment ago.


Administrative account wrapper

Run the id command to determine your user name. You should see the UID 0 (unique User ID zero) and the username root. Unix traditionally maintains the concept of a superuser or root, a special account with UID 0, who is free to perform any administration tasks. This is a little different in practice with Linux because it uses a much more flexible capabilities mechanism, but the principle stays the same. Logging in as root is strongly discouraged, so we will now see how to avoid it. You will always use a regular system account, and only execute specific privileged commands by using the sudo program, which will serve as a wrapper for administrative commands.

Create a non-privileged account now by running the adduser mirko command. Add it to the sudo group by running adduser mirko sudo. The sudo config file, /etc/sudoers, is configured to grant admin privileges to the group members.

Alternatively, you can run echo "mirko ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers to grant admin privileges to the specific user "mirko". The same effect can, of course, be achieved by opening the file in a text editor, adding the line and saving it.

Having done that, log out completely (by exiting all shells, which means typing logout, exit, or pressing Ctrl+d in all active terminals). Then log back in under the newly created, non-privileged username.

To see the sudo system at work, run id; it should tell you that mirko is your active user ID. Then run sudo id, and see how the id program is run with root privileges. You will use sudo and you will not log in as the superuser any more.

We will use sudo extensively; to edit system configuration files for example, you will use sudo nano FILENAME.


Unix text editors

Most of Unix system configuration is kept in plain text files, so you'll definitely want to pick your favourite text editor (out of the myriad available). You could start with joe, nano or pico (nano is probably the best choice, as it is included in the base Debian GNU system). These editors are simple and their common keystrokes are listed at the bottom of the screen. Try running nano now to see its simplicity for yourself. (Keep in mind that the "^" character, needed to access nano's menus, represents the Control / Ctrl key on your keyboard.)

The category of professional text editors, however, is reserved for the two old rivals - Richard Stallman's GNU Emacs and Bram Moolenaar's Vim (Vi IMproved), or the old traditional vi. Apart from having ultra fast keystrokes, macros, abbreviations, editing modes, syntax highlighting and keyword completion, Vim can literally solve its way out of a maze (take a look at /usr/share/doc/vim/macros/maze/ once). Another advantage of vi is that it's installed on just about every Unix system you can think of. References to the relevant Vim tutorial pages are given at the end of the Guide.

If you have no idea which editor to use, try using nano. Then, later on, you might get around to installing Vim and running vimtutor to learn its basics. But don't take this as a joke; Unix is about text (lots of text, even if you'll probably prefer GUIs at first), and learning Vim or GNU Emacs is an absolute necessity.

So invoke sudo update-alternatives --config editor now, and select your favourite text editor of the moment. (Once selected, this editor will be invoked when the generic editor command is run).

From now on, we'll assume you know how to open a file, change it and save the changes back to disk.


Basic system commands

You can't start wandering around the Unix system without being introduced to the basic available commands first. Now that you're using an unprivileged account and can't ruin your Debian GNU installation any more, we'll take a tour of system file and directory manipulation commands.

This section has turned into an action-packed bunch; get into the right mindset and let's roll!

To change directory, use the cd command. Run a few variants of it and verify the current directory with pwd or echo $PWD. Try and see where each of the commands cd /etc, cd -, cd .., cd $OLDPWD, cd ../.., and cd ~ positions you.

The special tilde character (~) in file names expands to the home directory of the user invoking the command. For user mirko, cd, cd ~, cd $HOME or cd /home/mirko would be equivalent. In addition, if mirko wanted to reach ante's home directory, he would only have to type cd ~ante. (This ability to reach a user's home directory using a syntax independent of the actual system setup is another winning concept in Unix).

The dash(-) and the $OLDPWD environment variable reposition you to the previous working directory. Running the command multiple times would cycle between two most recently used paths.
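The navigation commands above can be tried safely in a scratch directory. This is only a sketch: the directory names a/ and b/ are invented for the example, and the output of cd - is discarded because that form of cd prints the directory it switches to.

```shell
start=$PWD
demo=$(mktemp -d)
mkdir -p "$demo/a/b"

cd "$demo/a/b"
cd ..                        # one level up: now in .../a
echo "now in:    $PWD"
echo "came from: $OLDPWD"    # the directory we just left

cd - > /dev/null             # back to .../a/b
back=$PWD
cd - > /dev/null             # and once more to .../a
again=$PWD
echo "cd - cycled between: $back and $again"

# The tilde is expanded by the shell before the command ever sees it.
home=~
echo "tilde expands to: $home"

# Return to where we started and clean up.
cd "$start"
rm -r "$demo"
```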

You can check out the contents of a directory with the ls (list) command; try ls, ls -al /etc, or ls -al / (-a is needed to make ls display the dotfiles, those that start with a dot and are considered "hidden", remember? -l displays the output in long format, one entry per line with its details).

Create new files in your home directory using a text editor: use cd to reposition to your home directory and then run, say, nano test. Alternatively, you could use a variant of the command which is not dependent on the current working directory, nano ~/test for example. Try removing (deleting) the created files with rm. Create directories with mkdir, delete them with rm -r or rmdir.

Briefly note that the previous examples used your keyboard as the input and your monitor as the output device. It's important to understand that Unix can redirect such data streams to arbitrary locations (files, pipes, printers, network sockets and more). For example, run ls -al / | grep bin. The output of the ls -al / command will be piped or redirected to the next command, grep, which will filter the input and only display lines which contain the string "bin". This piping (a special case of redirection, where output is redirected to a program) is, you guessed it, denoted by the pipe ("|") character.
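Here is the same idea as a self-contained sketch. Instead of listing the real root directory, it builds a small scratch directory (the subdirectory names are chosen to mimic /), so that the result is predictable on any system.

```shell
demo=$(mktemp -d)
mkdir "$demo/bin" "$demo/sbin" "$demo/etc" "$demo/home"

# ls writes one name per line into the pipe; grep reads those lines on its
# standard input and passes on only the ones containing "bin".
matches=$(ls "$demo" | grep bin)
echo "$matches"

# Pipes can be chained: a third command counts the lines grep lets through.
count=$(ls "$demo" | grep bin | wc -l)
echo "matching entries: $count"

rm -r "$demo"
```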

If you'd like to redirect the output of a command to a file (instead of to the next command, as the above pipe would do), use the > or >> redirects; the difference is that the double >> appends content to a file without truncating (erasing, roughly) it first. Try, say, ls -al / > /tmp/dirlisting. Note that some interesting variants are possible with this that are equal in effect, such as > /tmp/dirlisting ls -al /.
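The difference between the two redirects is easiest to see side by side. A minimal sketch, writing to a file in a scratch directory rather than to /tmp/dirlisting:

```shell
demo=$(mktemp -d)
out="$demo/listing"

echo "first"  >  "$out"     # creates the file (or truncates an existing one)
echo "second" >> "$out"     # appends without truncating
lines_after_append=$(wc -l < "$out")
cat "$out"

echo "third" > "$out"       # truncates again: the earlier lines are gone
lines_after_truncate=$(wc -l < "$out")
cat "$out"

# The redirection may also be written before the command name.
> "$out" echo "fourth"
result=$(cat "$out")
echo "$result"

rm -r "$demo"
```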

An important point is that each process usually starts with three data streams open; they're called the standard input, standard output and standard error, identified by system "file descriptor" numbers 0, 1 and 2 respectively. This is where the ">" redirects and "|" pipes come into play: they influence those descriptors. For example, ls -al / > /tmp/dirlisting (or "1>" instead of just ">") redirects standard output from the screen into a named file; ls -al 2> /tmp/errors redirects any error output to a different file; ls -al / | grep bin connects the standard input of grep to the output of the previous command, instead of to the keyboard. In addition, you can of course mix these, such as ls -al / | grep bin > /tmp/stdout 2> /tmp/stderr, and, depending on the shell you use, you can use ">&" to redirect all output (stdout and stderr) at once. In a typical, non-redirected session, the location for all three channels is /dev/tty, a "magical" file that always corresponds to your current console or terminal.
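To watch the three standard streams in action, the following sketch defines a tiny shell function (the name "talk" is made up for this example) that writes one line to standard output and one to standard error, and then redirects the streams in different ways. On Unix systems standard input, output and error are file descriptors 0, 1 and 2.

```shell
demo=$(mktemp -d)

# Hypothetical helper: one line to each output stream.
talk() {
    echo "to stdout"          # file descriptor 1
    echo "to stderr" >&2      # file descriptor 2
}

talk > "$demo/out" 2> "$demo/err"    # split the two streams into two files
talk > "$demo/both" 2>&1             # send descriptor 2 wherever 1 goes

# Standard input (descriptor 0) can be redirected as well, here from a file.
both_lines=$(wc -l < "$demo/both")

out_text=$(cat "$demo/out")
err_text=$(cat "$demo/err")
echo "out file:  $out_text"
echo "err file:  $err_text"
echo "both file: $both_lines lines"

rm -r "$demo"
```

The portable "2>&1" form is used here instead of ">&", since the latter is only understood by some shells.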

Sometimes you want the output to appear on the screen and be saved in a file at the same time. This is Unix and it's easily done: ps aux | tee /tmp/ps-aux.listing. We can mention that, in the same way, the output can be "duplicated" to another console (yours or, subject to access permissions, one belonging to a different user). For those who wonder how to get the output on a printer, let's just say ps aux | lp is enough once the printer is configured.
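A short sketch of tee at work, using printf as a stand-in for ps aux so the result is predictable, and a scratch file instead of /tmp/ps-aux.listing:

```shell
demo=$(mktemp -d)

# tee copies its standard input both to the named file and to standard output.
printf 'one\ntwo\n' | tee "$demo/copy"

# With -a, tee appends to the file instead of overwriting it.
echo "three" | tee -a "$demo/copy" > /dev/null

saved=$(cat "$demo/copy")
echo "file now contains:"
echo "$saved"

rm -r "$demo"
```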

Some other times, you need to create an empty file. For example, try touch test2, > test2 or >> test2. The difference is that, if the file already existed, touch would just update its modification time, while the single redirection symbol would truncate it as well. Some people also use echo > test2 but they don't realize their file isn't exactly empty: it is 1 byte in size and contains a single ASCII "newline" character (this happens because the echo command appends a newline to its output, unless the -n option is given).
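The claims about file sizes are easy to verify with wc -c, which counts bytes. The sketch below assumes a Bourne-style shell such as bash or dash (a bare "> FILE" behaves differently in some other shells); the helper function "size" and the file names a to d are invented for the example.

```shell
demo=$(mktemp -d)

# Hypothetical helper: print a file's size in bytes.
size() { wc -c < "$1"; }

touch "$demo/a"        # creates the file, or only updates its timestamp
> "$demo/b"            # creates the file, or truncates an existing one
>> "$demo/c"           # creates the file, leaves existing content alone
echo > "$demo/d"       # not empty: echo writes a newline

for f in a b c d; do
    echo "$f: $(size "$demo/$f") byte(s)"
done
size_d=$(size "$demo/d")

# Existing content survives touch and >>, but not a single >.
echo "data" > "$demo/a"
touch "$demo/a"
>> "$demo/a"
kept=$(size "$demo/a")
> "$demo/a"
after_truncate=$(size "$demo/a")
echo "kept $kept bytes, then $after_truncate after truncation"

rm -r "$demo"
```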

You now know four command line methods to create a file; try testing your friends' knowledge.

To save a copy of your interactive (user input-output) session in a file named typescript, just run script. To finish the typescript, just exit the shell (by say, pressing Ctrl+d on an empty line. As we've said earlier, Ctrl+d is a standard combination for "End of Input").

When you type, say, ls, an instance of the /bin/ls program becomes a process (a running program). We'll cover processes thoroughly in a separate chapter but, for the moment, let's just say the program you run is assigned a unique process ID (the PID), it starts running on the processor (the CPU) and competes for the processor time with other running programs. To see your current processes, run ps. To see a complete list of processes on the system, run ps aux. To see the process list including threads, run ps auxH. When the job is complete, the corresponding process terminates, and the PID number is returned to the pool (by default, the Linux kernel allocates PIDs from a pool of 32768 values; the current limit is visible in /proc/sys/kernel/pid_max). For an interactive process monitor, see top. Within top, you can press "1" to see all processor cores, "M" to sort processes based on memory usage, or "P" to sort based on processor usage.
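The life cycle of a process can be followed using nothing but shell built-ins. This is a sketch only: it starts a harmless sleep in the background, looks up its PID, and terminates it again. The special parameters $$ and $! are standard shell features; the variable names are arbitrary.

```shell
echo "this shell's PID: $$"

sleep 30 &                  # start a process in the background
pid=$!                      # $! holds the PID of the latest background job
echo "sleep is running as PID $pid"

# kill -0 sends no signal at all; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "alive: $alive"

kill "$pid"                 # ask the process to terminate (SIGTERM)

# wait collects the exit status; only after that is the PID free for reuse.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status reported by wait: $status"
```

A status above 128 indicates that the process was ended by a signal rather than exiting on its own.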

Check system memory information with free. If you get into total/free memory mathematics, just keep in mind a lot of used memory is not a bad thing and it doesn't mean your system is wasting it; we'll discuss this later in the Guide.

Additional and interesting programs related to system and memory status that you could install are sysstat, memstat and htop.

To see a list of mounted filesystems, their mount points and mount options, simply run mount. To see disk usage statistics, run df.

Note

Note that mount only reads a file, /etc/mtab, and "prettifies" it a little before display. In turn, /etc/mtab should contain content that is essentially the same as /proc/mounts, but this is not always the case (for example, if /etc/ was mounted read-only at the time when mount wanted to update the file).

Unless you're really adventurous, you won't happen to see output from the mount command being incorrect, but remember that only /proc/mounts tells the definite truth.

Now, let's say you wanted to execute the commands uptime, free and df at once. You could do this simply by running uptime; echo; free; echo; df. (It is the semicolon ";" that is important here; the echo command without arguments only serves to produce empty lines in between.)

Use w or who to see the list of current system users. Run last -20 to see the last 20 logins to the system. If you'd like an interactive variant of w (in the top fashion), use the general-purpose watch command: watch -n 1 w (finish by pressing Ctrl+c, the generic "break" signal). watch is pretty interesting; by using quotes, you could run watch -n 1 'uptime; echo; free; echo; df' as well.

Run uname -a to see the most general system information - the hardware and kernel types and versions. For the run time statistics, run uptime; the program reports the current time, time since the machine boot (the machine uptime), the number of users logged in and the load (processor usage) averages for the last 1, 5 and 15 minutes.

To bring the idea to yet another level, let's say you had a complex chain of commands saved in a file, and wanted to execute the whole file at once (in "batch" mode, as people would say). This is, of course, easily doable with the piping principle mentioned above. Let's first run echo -e "uptime; echo\nfree; echo\ndf" > commands to create our "batch" file. (Additionally, run cat commands to see what the file really looks like and to understand the effect of the newline character \n.) To "batch-execute" it, all we would have to do is run cat commands | sh, or sh < commands, or just sh commands. What's more, if you set the executable bit on the batch file (chmod +x commands) and added an appropriate "shebang" line to the file (by opening it and inserting #!/bin/bash as the very first line), you could run it by invoking ./commands or, more generally, /path/to/commands.
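
The whole batch-file sequence can be tried safely in a scratch directory. The sketch below is only an illustration: it swaps uptime, free and df for plain echo commands so that the output is predictable, and uses #!/bin/sh as the shebang (both are assumptions of the example, not requirements).

```shell
workdir=$(mktemp -d)          # scratch directory so nothing in your home is touched
cd "$workdir"

# Create the batch file; printf interprets \n the same way echo -e does.
printf 'echo first; echo\necho second\n' > commands
cat commands

# Three equivalent ways to feed the file to a shell.
cat commands | sh
sh < commands
sh commands

# Make it directly executable: shebang on the first line, then the executable bit.
printf '#!/bin/sh\n' | cat - commands > commands.tmp && mv commands.tmp commands
chmod +x commands
out=$(./commands)
echo "$out"
```

The shebang has to be the very first line of the file; the kernel reads it to decide which interpreter should run the rest.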

Let's cover a few more things. One of them is the shell history. The commands you type, provided that they do not start with a space, are saved in the shell history. The history can be viewed with history, navigated with the Arrow-Up/Arrow-Down keys and searched by pressing Ctrl+r and typing some part of the command you are looking for. Also, the output of history numbers the previously executed commands. You can re-execute them with "!" expansions, such as !1, !14, !-2, !ls or !!. The first two execute the command corresponding to the number; !-2 executes the command before last, !ls executes the last command that started with "ls" and "!!" is a synonym for !-1 (re-execution of the last command). Results of the "!" expansion can be combined with other input, so if your last command was ls, typing !! -al would in fact execute ls -al.

Pressing Alt+. or Esc+_ inserts the last argument from the previous line into the current line. For example, if you ran mkdir DIR_NAME, then typing cd followed by Esc+_ would expand to cd DIR_NAME.

There's another cool trick for correcting the previous line. For example, if you typed ls- al and got "bash: ls-: command not found", one obvious way to correct it would be to press Arrow-Up and replace "- " with " -". The other solution, though, is to correct the previous line using carets: try running the invalid command shown (ls- al) and immediately after that run ^- ^ -^. The carets ("^") make the specified substitution in the previous line and execute the result.

The shell history of executed commands is saved to a history file (usually ~/.bash_history) when the shell program cleanly exits, and is thus preserved between sessions. To prevent history from being written, you can terminate the shell forcibly with kill -9 $$ or kill -9 0 ("$$" expands to the PID of the current shell, while "0" addresses every process in the current process group, the shell included; kill -9 forcibly terminates the target).

Shells also support a nice feature called "aliases". You can alias any text to a shorter form, and you can then invoke the alias, invoke the unaliased command (in case of the same name) and append extra arguments. Here's how: you define an alias with the syntax alias NAME='COMMAND ARGS....', such as alias today='find -maxdepth 1 -mtime -1', which would find all files in the current directory that have been modified within the last day. To invoke this, you simply call the new aliased command, today. You can also alias existing commands. For example, to alias the remove command ("rm") to always prompt for confirmation before deleting, you would run alias rm='rm -i'. Then you could invoke rm FILE..., and it would automatically mean rm -i FILE.... In case you want to use the original "rm" while the alias is defined, prepend the command with a backslash, \rm FILE.... You can see the list of defined aliases by simply typing alias, and you can check exactly what gets executed when you run a command with type COMMAND.... Aliases do not persist across sessions, so you have to define them in the shell startup files, such as ~/.bash_profile for the Bash shell. Note that aliases are not the only way of packing multiple or long commands into an easily accessible shorter form: you can do the same (and more) with shell script files, in which you type commands exactly as if you were executing them on the command line. Your ~/bin/ directory is the right place for this, as the programs copied there will be automatically found by the shell once you log in (see your shell startup file, such as ~/.profile or ~/.bash_profile, for how it happens).
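
To illustrate the script-file alternative to aliases, here is a hedged sketch that turns the "today" command into a small script. A temporary directory stands in for ~/bin (an assumption made so the example cannot disturb your real files), and the PATH line imitates what the startup file does at login.

```shell
bindir=$(mktemp -d)           # hypothetical stand-in for ~/bin
cat > "$bindir/today" <<'EOF'
#!/bin/sh
# List entries in the current directory modified within the last day.
# "$@" passes any extra arguments on to find.
find . -maxdepth 1 -mtime -1 "$@"
EOF
chmod +x "$bindir/today"

PATH="$bindir:$PATH"          # the startup file does the equivalent for ~/bin

workdir=$(mktemp -d)
cd "$workdir"
touch fresh.txt
today_output=$(today -name '*.txt')
echo "$today_output"
```

Unlike an alias, the script is available to every program that searches PATH, not only to your interactive shell.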

Shells also support a feature commonly called "backticks" (command substitution), invoked with the `` quotes. Text within backticks is executed as a separate command first, and its output is inserted in the original command, which is then executed. For example, to display the current date, you can simply run date. However, to demonstrate the point, you can also write echo `date`. In that case, date would be run first, and its output would not be printed to the screen but inserted in the parent command as plain text, such as echo Tue Feb 2 11:19:50 CET 2010. The echo command would then print it to the screen. In this simple example, the final effect is the same, but the principle is not.
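
A slightly more useful sketch of the same principle follows; the directory and file names are made up for the example. It also shows the $(...) spelling, which does the same job as backticks and is easier to nest.

```shell
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Backticks: the inner pipeline runs first and its output replaces the expression.
count=`ls "$dir" | wc -l`
echo "The directory holds $count files"

# $(...) is an equivalent spelling; here one substitution is nested in another.
name=$(basename "$(dirname "$dir/a")")
echo "They live in a directory called $name"
```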

Another thing to mention is searching for files and executing commands on them. Let's say you wanted to search for all MP3 files in your home directory. You'd do this with cd; find . -name '*.mp3'. The first argument to "find" is the starting directory. In case it's the current directory, you can use "." or (with GNU find) omit the argument altogether. To perform a case-insensitive search, use -iname instead of -name. Another thing you'll want to do with the files found is run some command on them. One (lame) way to move all MP3 files to a separate directory would be mv `find -name '*.mp3'` /some/directory. However, many MP3 files have spaces and various non-standard characters in their names, so this command would crash and burn. Besides just not working (because it would treat "abc def" as two files, "abc" and "def"), a specially crafted filename could wreak havoc. For example, a " -f " fragment in an MP3 filename would actually be understood as the --force option to "mv". So the first line of defense is to use "--" after "mv", which tells the command that everything that follows is a list of filenames, not command options. But the problem of spaces and special characters in filenames would still bite us, because the shell splits the output of backticks into words at the characters listed in its "IFS" variable (space, tab and newline by default). So to eliminate the whole issue, the proper and superior way to do this, one that is not vulnerable to unusual file names, is as follows: find -iname '*.mp3' -print0 | xargs -0 mv -t /some/directory. The "-print0" option to "find" will use "\0" (the null character) as the record separator; the "-0" option to xargs will make xargs understand that, and xargs will then run the specified command with the found filenames as arguments. Xargs appends all filenames at the end of the specified command, even honoring maximum command line lengths (so you avoid the "Argument list too long" errors).
If the command you're invoking does not support multiple filenames, or you want to run it on a file-by-file basis, pass the "-n 1" option to xargs. Finally, if the filename is supposed to be inserted in the middle of the xargs command line (and not automatically at the end), pass the "-I {}" option and use "{}" to mark the point of insertion.
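
The following sketch exercises the safe find/xargs combination on deliberately awkward filenames. All three directories are temporary and hypothetical, and the example assumes the GNU versions of find, xargs and mv (as shipped with Debian), since mv -t is a GNU extension.

```shell
src=$(mktemp -d)              # hypothetical music directory
dest=$(mktemp -d)             # hypothetical target directory
backup=$(mktemp -d)           # hypothetical backup directory
touch "$src/abc def.mp3" "$src/ -f .MP3" "$src/notes.txt"

# NUL-separated names survive spaces and option-like fragments intact.
find "$src" -iname '*.mp3' -print0 | xargs -0 mv -t "$dest" --
ls "$dest"

# -n 1 runs the command once per file.
find "$dest" -iname '*.mp3' -print0 | xargs -0 -n 1 basename

# -I {} places each name in the middle of the command line.
find "$dest" -iname '*.mp3' -print0 | xargs -0 -I {} cp -- {} "$backup/"
```

Only the two MP3 files are moved; notes.txt stays where it was because it does not match the -iname pattern.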

OK, fine. Unix programs come in many variants, and with many options (as has been demonstrated to you in this section ;-). They usually have a large set of supported options and successfully interact with each other, effectively multiplying their feature lists. I've read somewhere an interesting thought: that a Unix program can be considered successful if it becomes used in situations never predicted by its author. All of the programs I mentioned above are crucial to Unix and are successful. Their full potential greatly exceeds the basic usage examples we provided, and it is implicitly assumed you will look up their documentation (man PROG_NAME) for more detailed information, which is also the topic of our next chapter.


System documentation and reference

Debian GNU is a free and open system. All the issues that might pose a problem for you are already documented and explained somewhere. Surely a milestone in your Unix experience is learning where those information sources are, how to interpret their contents, and generally, how to help yourself in predicted and unpredicted situations without someone handholding you.

Unix documentation is extensive and easily available. Most of it is written by the software authors themselves (and not by the nearest marketing department), so you actually have the privilege of communicating with the authors themselves. The pool of people who write Free Software programs and documentation includes a large community of technology professionals.

The process starts with you reading the provided documentation to see what the software authors have to say to you, and to pick up a little of their mindset. Debian pays special attention to the documentation; if it's not available from the upstream (original) author, then the Debian developers write the missing pieces themselves.

In Debian, each package you install creates a /usr/share/doc/PACKAGE/ directory and places at least the changelog files there. The directory in addition often contains the INSTALL, README and other upstream files which are sometimes irrelevant for you, as the tasks described there have already been performed by the Debian developers (but the rest of the notes, of course, still provide good information about the program). What you definitely are looking for are Debian-specific notes (in README.Debian files). This is the first place you visit for more information about a package. It is not uncommon that Debian packages see an enormous "added value" from maintainer-provided README files and practical recipes. Sometimes, if the documentation is big, a standard naming convention is followed and the documentation is distributed in a separate PACKAGE-doc package.

The other, very often used, part of the documentation is the system manual pages, accessible with the man and info commands. Man follows the traditional Unix manpage approach while info is the GNU-style texinfo collection. Man pages are sorted into volumes or sections which include user commands, system calls, subroutines, devices, file formats, games, misc and system administration topics. The man and info systems don't read each other's manual pages, but coexist peacefully on your system, mostly in a way that the info pages are ignored on your part. One of the reasons for this, in my opinion, is the very annoying info user interface (unless you're accustomed to GNU Emacs, in which case you'd describe the feeling as normal). To remedy the problem a little, try installing the alternative pinfo browser (sudo aptitude install pinfo).

To get a feeling for manpages, try running man mkdir. You'll notice all manpages follow a pretty standard structure; they often include NAME, SYNOPSIS (Usage), DESCRIPTION, AUTHOR, BUGS, COPYRIGHT and SEE ALSO sections. The specific mkdir page you opened tells you that all program options (those starting with "-" or "--") are optional (denoted by square brackets, [], in the synopsis line), but at least one directory name to create is mandatory (required). In some manpages, mandatory options are enclosed in <less/greater-than> symbols (like <-s size>).

See the man ps or pinfo ls commands. It is absolutely necessary to develop the habit of reading manpages. Whenever we mention a system command or a config file, we implicitly expect you to skim through its manpage. Novices have trouble understanding the manual pages; even though all the information they want to learn is right there, in the manpage, they have a hard time connecting heads to tails. If you happen to have this problem, keep reading — and given enough material you'll naturally come to understanding!

Sometimes you only want a general and short, one-line description of a program. See whatis cp or whatis df du. If you're looking for a particular functionality but don't know the actual command name, try using apropos. As it searches for manpages that satisfy any of the key words you enter, it tends to return large sets of results, so restrict your searches to a single keyword, like apropos usage or apropos rename (or ideally, run man apropos and learn how to specify search mode).

If you feel you need to ask the community a question, you can either use the various mailing lists (MLs) or the "real time" Internet Relay Chat (IRC). Probably a few mailing lists exist for about any program or project you may have a question about, but the mailing lists are not suitable for informal discussion and amateur help requests (to be correct, no one says they're not — but in a few years time, you surely won't appreciate Google or AltaVista first linking your name to such content).

IRC is your best bet for going over "runtime" issues (although on some IRC channels there are now bots that collect logs and publish them online, and thus have the same problem as mailing lists). Install the ircii (text-mode) or xchat (graphical) package, run it, type /server irc.gnu.org to connect and join the Debian channel by typing /join #debian once you're connected to the server. There are always hundreds of people present on the channel; if your questions are meaningful and don't require people to answer in essays, they'll probably be helpful to you. Spend time on the channel, learn from other people's questions and answers, and don't add to the channel noise. Exit ircii by typing /quit. Mind you, the mere possibility of presenting your question before a technically proficient audience is a great privilege and, of course, you must follow some minimum of protocol: it's not required that you are familiar with the subject (if that was the case, you wouldn't be asking a question in the first place), but do read Eric Raymond's How to Ask Questions the Smart Way to increase your chances of receiving useful answers.


Chapter 3. Basic system administration tasks

 

"The learning and knowledge that we have, is, at the most, but little compared with that of which we are ignorant."

 
--Plato, 427-347 BC 

Booting the machine; runlevels and system services

The process of booting a machine starts with the computer loading system BIOS (or PROM, on Unix architectures) code from a known and fixed address in memory. Once that is done, BIOS tries to run user-specified code which is usually a bootloader (the thing that lets you choose the Operating System you want to boot).

In most common scenarios, you have Grub (GRand Unified Bootloader) or LILO (LInux LOader) installed as the bootloader. Both Grub and LILO accept parameters on the command line, but in Debian the bootloaders are configured not to show the boot prompt. To make it appear, hold the Alt, Ctrl or Shift (depending on the bootloader!) key at the 'LILO' or 'GRUB' message (during boot, just before it continues with the 'Loading linux ....'), and you'll be able to pass arbitrary parameters to the kernel. For Grub, this is achieved by pressing "e" to edit the entry, then "e" to edit the "kernel" line, and "b" to boot.

You can play and pass anything to the kernel via this command line; it won't cause harm unless you happen to choose a name that some part of the system actually uses, such as acpi, mem, root, hda or panic. Your value will be visible in the file /proc/cmdline later, once the system boots.

After the kernel is loaded, it will start the init program which is the first process started on almost all Unix systems (as such, it has a PID of 1), and is active as long as the system is running.

Note

As soon as the kernel initializes the keyboard driver, you will be able to pause terminal output by pressing Ctrl+s. This will allow you to stop boot messages scrolling and peacefully examine the output on the screen. Ctrl+q will have the effect of "releasing" the terminal.

Keep in mind that you can always use this trick in a terminal (it's not related to the boot-up phase), and the Scroll Lock key on your keyboard has the same effect as Ctrl+s/Ctrl+q.

And then, in relation to init, we come to system runlevels. Runlevels are simply agreed states of the machine. Entering a runlevel means starting some services (while stopping others) to make sure the system looks as specified by the corresponding runlevel. init first executes the tasks defined in the /etc/rcS.d/ directory (the boot-time "S" scripts, also associated with single-user mode). It then enters the default Debian GNU runlevel, 2, and consequently executes the tasks defined in the /etc/rc2.d/ directory. (Note that many other Linux distributions use runlevel 3 or 5 as their default, reserving 5 for the X Window System; on Debian, runlevels 2 through 5 are configured identically by default.)

Debian GNU uses the so-called SysV (System V, read as "system five") init system by default. It means that runlevels are represented as directories (/etc/rc?.d/), and the directories consist of symbolic links to files in the "main" /etc/init.d/ directory; here's an example:

$ ls -la /etc/rc2.d/ | cut -b 57-
...
S20net-acct -> ../init.d/net-acct
S20openldapd -> ../init.d/openldapd
S20postgresql -> ../init.d/postgresql
...

The 'S' prefix starts a service, while 'K' stops it (for the given runlevel). The numbers determine the order in which the scripts are run (0 being the first).
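
To see why the two-digit numbers translate into an execution order, consider this small simulation. It is only a sketch: the temporary directory stands in for /etc/rc2.d/, the link names are invented for the example, and the loop merely mimics the idea behind the real runlevel script rather than reproducing it.

```shell
rcdir=$(mktemp -d)            # hypothetical stand-in for /etc/rc2.d/
for link in S20postgresql S10sysklogd S89cron K01legacy; do
    # Strip the 3-character prefix (S or K plus two digits) to get the script name.
    ln -s "../init.d/${link#???}" "$rcdir/$link"
done

# Shell globs expand in sorted order, so S10 comes before S20, which comes before S89.
order=""
for link in "$rcdir"/S*; do
    name=$(basename "$link")
    echo "would start: ${name#???} (via $name)"
    order="$order ${name#???}"
done
```

The K01legacy link is skipped by the S* pattern; a matching K* loop would handle the services to be stopped.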

init then executes local scripts from /etc/rc.boot/ and performs the rest of the init tasks specified in /etc/inittab (in older versions, /etc/bootmisc.sh was also run). At that point, the system is booted, you see the Login: prompt, and life is even greater than usual.

Debian GNU provides a convenient tool to manage runlevels (to control when services are started and shut down); it's called update-rc.d and there are two commonly used invocation methods:

# update-rc.d -f cron remove
# update-rc.d cron defaults

The first line shows how to remove the cron service from startup; the second sets it back. Cron is very interesting; it's a scheduler that can automatically run your tasks at arbitrary times (even when the machine is completely unattended, of course). It definitely deserves a paragraph in this Guide and indeed, we'll get back to it later.

So, all files in /etc/init.d/ share a common invocation syntax (which is defined by Debian GNU Policy) and can, of course, be run manually - you don't have to wait for init to call them. All system services have their init scripts in the /etc/init.d/ directory (usually named after the services themselves), and these scripts accept a set of generic arguments. Let's see an example:

# ls -al /etc/init.d/s* | cut -b 55- 
/etc/init.d/sendsigs
/etc/init.d/setserial
/etc/init.d/single
/etc/init.d/skeleton
/etc/init.d/sudo
/etc/init.d/sysklogd

# /etc/init.d/sysklogd stop
Stopping system log daemon: syslogd.

# /etc/init.d/sysklogd start
Starting system log daemon: syslogd.

# /etc/init.d/sysklogd invalid
Usage: /etc/init.d/sysklogd {start|stop|reload|restart|force-reload|reload-or-restart}
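
The common invocation syntax boils down to a case statement on the first argument. Below is a minimal sketch of that pattern for an imaginary service called "mydaemon" (the name, the messages and the temporary file are all assumptions of the example; a real script would start and stop an actual program, and /etc/init.d/skeleton is the proper template).

```shell
script=$(mktemp)              # hypothetical stand-in for /etc/init.d/mydaemon
cat > "$script" <<'EOF'
#!/bin/sh
# Minimal init-style dispatcher for an imaginary service.
do_start() { echo "Starting mydaemon."; }
do_stop()  { echo "Stopping mydaemon."; }

case "$1" in
    start)   do_start ;;
    stop)    do_stop ;;
    restart) do_stop; do_start ;;
    *)       echo "Usage: $0 {start|stop|restart}" >&2; exit 1 ;;
esac
EOF

sh "$script" start
sh "$script" restart
sh "$script" invalid || echo "unknown action rejected with status $?"
```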

Note

Please note:

  • A generic init.d script template, /etc/init.d/skeleton, can be used as a starting point for your own scripts. For elements that do not require a full-fledged startup script, see files /etc/init.d/bootmisc.sh and /etc/rc.local.

  • On a side note, Debian supports a number of different init mechanisms. At some later point, you might take a look at file-rc and runit.

Now that we've covered the basics of a system boot process, we can move on to a subject that just logically follows.


Virtual consoles

Almost all Free Software distributions ship with predefined 'virtual terminals' - completely separate text screens or consoles which are available with left Alt + F1-F6 keystrokes (only about 6 consoles are enabled by default). Keep in mind that it is also possible to switch between the consoles from the command line (see the chvt command) and that you can open new consoles automatically (from scripts or otherwise) with the openvt command (called open on older systems). Some proprietary Unices use the Ctrl + Alt + 1-6 combination (standard numeric keys instead of Function keys).

To enable more virtual consoles than what you get by default, run sudo editor /etc/inittab and add more lines like these:

5:23:respawn:/sbin/getty 38400 tty5
6:23:respawn:/sbin/getty 38400 tty6

[The four colon-separated fields are id, runlevels, action and process; for each new console you increment the id and the tty number]. For changes in that file to take effect, exit the text editor and type telinit q.

If you create more than 12 consoles, you won't be able to access them with left Alt (since the last F key you have is 12), so use Right Alt key to reach consoles 13 - 24. You can also use Alt + left_arrow or Alt + right_arrow to cycle through open consoles. Alt + Print_Screen key switches between two last used virtual consoles.

We did not cover any of the X Window System (Unix graphical interface) stuff yet, but just remember that you'll need to use Ctrl+Alt instead of just Alt to switch from an X window to the console.

The deallocvt command frees memory still associated with virtual terminals which are no longer in use [by applications, not you of course], although this is probably not so important nowadays due to insane amounts of RAM in personal computers.

Some more useful stuff that you can do with the consoles includes changing the VGA font size. This can be quickly achieved by running something like consolechars -f lat1-08, where the available fonts are in the /usr/share/consolefonts/ directory.

The other way is to pass "vga=ask" at the GRUB boot prompt (or to run lilo -R 'linux vga=ask' for LILO before reboot), upon which the system will present a menu during boot and you'll be able to select the font size (size of 6 is small and fine). The lilo -R invocation sets the parameters for just the next boot. So when you find a nice VGA mode, you should edit /etc/lilo.conf and make it permanent there:

image=/vmlinuz
label=Linux
read-only
# Just add the line below
append="vga=X"

[X is replaced with the actual value you like, 6 for example]. Then, run lilo to apply changes (not forgetting sudo, of course).

In case of GRUB, you would do this by editing the commented "kopt=" line in /boot/grub/menu.lst and running update-grub.

It's also possible to drive the console in the high-resolution VESA mode, but that's quite tricky to set up and doesn't justify inclusion in our Guide.

Furthermore, if you see the penguin in the upper left corner of your screen while the system is booting, or the console text pointer is a blinking rectangle (instead of just a blinking underscore), then you are using a "framebuffer" graphics mode. In that case, there are more screen modes available to you, but they are just not listed in the boot-time selection menu; see the table on the Framebuffer HOWTO page for the full list. There's also fbset command available (from the fbset package) to control the framebuffer behavior.

Note

In the simplest terms, framebuffers directly map a portion of RAM onto the graphics display device. The amount of memory occupied equals horizontal resolution * vertical resolution * bits per pixel (which gives the size in bits; divide by 8 for bytes). All you need to do in order to draw to the screen, then, is modify the appropriate locations in RAM.
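
As a worked example of that formula, here is the arithmetic for one display mode. The 1024x768 resolution and 16 bits per pixel are arbitrary values picked for illustration.

```shell
width=1024
height=768
bpp=16                        # bits per pixel

bits=$((width * height * bpp))
bytes=$((bits / 8))
echo "${width}x${height} at $bpp bpp needs $bytes bytes ($((bytes / 1024)) KiB)"
```

For these values the result is 1572864 bytes, or 1536 KiB, of memory mapped to the display.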

Even though this framebuffer idea is very nice (and framebuffers play a much more important role on other architectures than they play on PCs), and even though framebuffers can be really fast, they never gained much popularity on GNU/Linux PCs. They are generally too hard to tune properly, and the Linux framebuffer documentation is terribly poor (the best starting point is the Programming Linux Games book by John R. Hall, Loki software).

However, since all VESA2 cards support framebuffers, during one period (roughly from 1998 to 2001) framebuffers were often used to run X graphical interfaces on graphics cards for which no free software drivers existed yet; they were slow and all (of course, because they didn't employ any acceleration techniques), but still made it possible to run displays with normal resolutions and color depths (as opposed to the also-supported VGA mode, which only allowed a 640x480 resolution in 16 colors).

Nowadays, framebuffers are mostly used in Operating System installation programs that hope to achieve greater compatibility on a wide range of hardware.

Besides the mentioned font size, you can also change font types. Install the fonter package and you will be able to edit and create your own fonts by running fonter, or use some of the standard ones you get:

$ consolechars -f /usr/share/fonter/crakrjak.fnt
$ consolechars -f /usr/share/fonter/elite.fnt
$ consolechars -f iso01.f16

Also nice to know is that you can easily change the mapping of keyboard keys. To see current keyboard mappings, simply run dumpkeys > keymap; editor keymap.

After you tune the keymap file to your needs, load it back with the loadkeys ./keymap command.

To see just how great the console is, run the loadkeys program, and type the following in its prompt:

string F1 = "Hello, World!"
[Ctrl+d]

Then just press the F1 key to see the consequences.

Note that this is not the best you can do with the console; it's just a small collection of quick and useful tricks to show you the direction to look into. If you're interested in more console tricks, take a look at (some of) setterm(1), stty(1), tput(1), tset(1) and possibly tic(1).


System messages and log files

Unix systems have a standard message logging interface, and all programs can freely use it. Besides having the advantage of being unified and easily parseable by the log monitoring programs, syslog messages offer a very convenient way to manually monitor overall system status and learn a lot about the system in general.

There are many actual implementations available but they're all commonly known as syslog daemons (daemon = server). In essence, each message contains the facility (category) and priority (importance) information (along with the message text itself, of course). The facilities (message categories) recognized are auth, authpriv, cron, daemon, ftp, kern, lpr, mail, mark, news, syslog, user, uucp or local0 - local7, and the priorities are debug, info, notice, warning, err, crit, alert or emerg.

Messages can be generated by the kernel, by programs or by system users. When a message reaches the syslog daemon, it is:

  • prepended with date, time and source information,

  • matched against the syslog config file rules, and

  • distributed accordingly. Common actions include writing messages to log files or named pipes, echoing to all (or selected) users' consoles, or forwarding to another computer.

The default Debian syslog daemon used to be a variant of the traditional BSD (Berkeley Software Distribution) syslog daemon, called sysklogd. Nowadays Debian uses rsyslog.

Note

On a BSD note:


"There are two major products that came from Berkeley: LSD and Unix. We don't believe this to be a coincidence."

 
--Jeremy S. Anderson 

Note, however, that this is not technically correct; morons.org says LSD is not a Berkeley product (it's from Sandoz), and J.S.Anderson is an otherwise anonymous figure, but the quote is still widely cited and worth mentioning.

So, following the consistent naming scheme, the rsyslog config file resides in /etc/rsyslog.conf. If you open it, you'll recognize a simple structure: selectors (facility.priority pairs) associated with actions (output destinations). For the rest of the config file details, see the rsyslog.conf(5) man page.

For both an educational example and a practical result, we're going to make two simple changes to the syslog configuration file:

  • move the ppp (Point-to-Point Protocol) messages to a separate log file, /var/log/ppp.log, and

  • make all messages also appear on one of our text consoles (for easy log monitoring).

To accomplish our first goal, simply run echo "local2.* TAB /var/log/ppp.log" >> /etc/rsyslog.conf as root (where you replace "TAB" with the actual Tab character by pressing Ctrl+v, Tab). As pppd logs to facility local2, the rule will redirect all of its messages (regardless of the severity) to a separate file.

For the second part, run echo "*.* TAB /dev/tty12" >> /etc/rsyslog.conf. That rule will output a copy of every message to your 12th console, /dev/tty12. That console should be empty and unused by other software, but note that technically you can have both a valid login console and syslog messages on the same terminal; the output would just clutter up eventually. To clear it up, you could use Ctrl+l, the standard "clear screen" key combination.

Now, since changes to the config files are generally never automatically detected by the programs that use them, we need to tell the syslog daemon to reload its configuration. We will use the standard /etc/init.d/ interface, which we've talked about already. Simply run sudo invoke-rc.d rsyslog reload or sudo invoke-rc.d rsyslog restart, for the changes to take effect. (Using invoke-rc.d is even better than using things like /etc/init.d/rsyslog reload directly as it does not depend on a particular init scheme).

Even by observing the logs of a seemingly idle (inactive) system, you'll see there actually are periodic jobs run by the cron daemon (system scheduler). Besides that, try running any privileged command (something as simple as sudo ls) and switch to your 12th console to see how it gets logged.

If you want to send your own messages to syslog, use the logger program (part of the bsdutils package). Try running logger -i -p user.info -- This is a test message.

If you are planning to use the X graphical interface, switching to the 12th console might not be the most convenient way to monitor system messages; your monitor or LCD display needs to adjust to a new pixel frequency every time you switch consoles; it takes a second or two to do that and it starts getting annoying after the initial amusement. It is possible to solve that by making the frequencies match, but that's out of the scope of this Guide. Our solution to the problem will consist of running X applications such as root-tail instead, which monitor log files and print messages to your root window (the X background).

To round off the section, we could just mention that all the log files are usually kept under the /var/log/ directory, and all of the messages you watched appear on your 12th console were also saved in one (or even more) of those files. You could figure out the purpose of each log file by looking at /etc/rsyslog.conf.

Particularly interesting is the /var/log/dmesg file - it keeps a copy of the messages that scrolled by at system boot time. You can also use the dmesg command, but instead of the bootup messages, it will display the last few kilobytes of kernel messages (which might or might not be the same as /var/log/dmesg contents, depending on the activity the system saw in the meantime).

Actually, newer systems also sport the bootlogd daemon that takes proper care of saving bootup messages. If bootlogd is enabled in the /etc/default/bootlogd file, the complete boot log will be saved to /var/log/boot.


Deeper look at the Debian package tools

Earlier in the Guide, we mentioned some of the truly basic package management commands along with their most used options. However, Debian offers a much wider range of packaging-related tools.

dpkg, the medium-level package manager for Debian, offers some more low-level functions than apt-get, and roughly corresponds to the rpm command on RPM-based (Red Hat Package Manager) Linux distributions.

Most notably, dpkg does not have any automatic package retrieval methods. To install a package with dpkg (say, package vim), you would first have to download the .deb package yourself and then run something like dpkg -i vim_6.0.093-1.deb. dpkg does check dependencies, but it makes no attempt to resolve them, so in this example, package vim could be unpacked but its configuration would be delayed until you install all the packages it depends on (which is a boring and uneducated way to install software; use apt-get). dpkg is, however, still indispensable for lower-level management and definitely worth the tour.

dpkg -r vim would remove the package if there are no installed programs that depend on it. Configuration files for the package (those listed as conffiles in the package control files) are left on the system. dpkg --purge vim would remove vim along with configuration. dpkg --configure --pending would configure all pending packages (for example, those that were left waiting for processing after an unsuccessful dpkg run).

Sometimes it's useful to copy the package list from one machine to the other, and get all the same software installed on the other system (or simply keep a list somewhere for future reference). Use dpkg --get-selections > list to retrieve the list, and dpkg --set-selections < list; apt-get dselect-upgrade later to load the list and trigger installation. (Note the input redirection: dpkg --set-selections reads the list from standard input.)
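Before loading a saved list on another machine, it can be handy to see what it would actually add. The following is only a rough sketch: the function name missing_packages and the file names in the usage line are made up for illustration, and it assumes both files are in the two-column "package state" format that dpkg --get-selections writes.

```shell
# missing_packages WANTED CURRENT
# Print packages marked "install" in the WANTED list that are not marked
# "install" in the CURRENT list (hypothetical helper, not part of dpkg).
missing_packages() {
    awk -v current="$2" '
        BEGIN {
            # Remember every package the CURRENT list already installs
            while ((getline line < current) > 0) {
                n = split(line, f)
                if (n >= 2 && f[2] == "install") have[f[1]] = 1
            }
        }
        # Report packages from WANTED that CURRENT does not install
        $2 == "install" && !($1 in have) { print $1 }
    ' "$1"
}
```

A possible use on the target machine would be dpkg --get-selections > here; missing_packages list here, where list is the file copied over from the first machine.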

It's also possible to put packages on hold, meaning you don't want the system to touch them. Use echo vim hold | dpkg --set-selections. To make it available for upgrading again, run echo vim install | dpkg --set-selections.

Sometimes a dpkg action can't be performed because of missing dependencies, duplicate files or something like that. Use the --force-all option with dpkg to ignore the problem and continue. This option can be used everywhere with dpkg, but it often leads to package database corruption (specifically, version mismatches) and total dependency chaos. If you later plan to use apt-get, never use this option as it instantly breaks apt (you can, however, try apt-get -f install and apt will do its best to clean up the mess).

dpkg-reconfigure can be used to reconfigure debconf-enabled packages (those which use debconf to ask questions and get answers about the local configuration). Use dpkg-reconfigure vim to reconfigure vim. dpkg-reconfigure debconf would reconfigure debconf itself. You can choose between a few types of interactive or non-interactive package configuration modes. Non-interactive mode is very useful if you are performing mass or automated installations.

Tip

Sometimes (due to a bug in a specific package's debconf interface), you won't be able to successfully configure the package; this is very likely to happen from time to time if you use the Debian unstable tree. A common example would be 'Accept' buttons which don't actually accept any input, or text fields which are (again, by mistake) always considered empty. A possible hack solution for this kind of problem is to reconfigure debconf to non-interactive, then configure the problematic package and finally reconfigure back to some sort of interactive mode. Packages have matured over the years though, and we can't remember any recent occurrences of this problem.

Tip

You will most probably be using dpkg-reconfigure to reconfigure the X Window System every now and then, so remember it as the elegant Debian-specific way to deal with the configuration: dpkg-reconfigure xserver-xfree86 (the package is named xserver-xorg on current systems).

Sometimes it's also useful to see a recursive dependency listing for a package. This feature is provided by the apt-rdepends package.

To reinstall a package, use either the above dpkg -i or apt-get --reinstall install PACKAGE NAMES. To install the specific version or branch of a package, run apt-get install vim=6.0.093-1 or apt-get install vim/testing.

To upgrade the system, you usually run apt-get update; apt-get upgrade. To upgrade only specific packages, run apt-get install PACKAGE NAMES or debfoster -u PACKAGE NAMES.

grep-dctrl is another tool in the stash. It can answer questions such as "What is the Debian package foo?", "Which version of the Debian package vim is now current?", "Which Debian packages does John Doe maintain?", "Which Debian packages are somehow related to the Scheme programming language?", and "Who maintains the essential packages of a Debian system?". See its manual page for more information.

To remove unnecessary Debian packages (unused libraries left on the system etc.) from your system, run debfoster or deborphan.

The dpkg-repack package provides us with a tool to bundle installed packages back into the .deb format. If any changes have been made to the package while it was unpacked (such as files in /etc modified), the new package would, of course, inherit the changes. This utility makes it easy to copy packages from one computer to another, or to recreate packages that are installed on your system, but no longer available elsewhere.

Use dpkg-divert to override a package's version of a specific file. You could use it to override some package's configuration file, or whenever some files (which aren't marked as 'conffiles' in the Debian package) need to be preserved by dpkg when installing a newer version of the package. In addition to (or instead of) dpkg-divert, you can use dpkg-statoverride to override ownership and permissions (and suid bits, of course) of installed files. Using this technique, you could also allow program execution only to a restricted user group.


Debian package files format

Even though Debian GNU packages are best manipulated using appropriate Debian package tools, it's quite useful to be introduced to their internal "constitution".

Debian package files (.deb files) need no special tools to be manipulated; they are simple ar archives consisting of three members: a small debian-binary file holding the format version, plus control.tar.gz and data.tar.gz. In other words, the generic tools needed to extract .deb contents are ar, tar and gzip, and are all present on just about every Unix system.

To extract package data, run dpkg -x package.deb /tmp/PACKAGE. To extract package control section, run dpkg -e package.deb.

If you have no dpkg at your disposal, you can extract the data section using ar to extract the data tarball, and tar/gzip to unpack it: ar x package.deb data.tar.gz; tar zxf data.tar.gz. The same way, you can extract the control.tar.gz section.
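To make the ar layout a bit more concrete, here is a minimal sketch that checks whether a file starts the way a .deb should: an ar archive begins with the 8-byte magic string "!<arch>" followed by a newline, and the first member of a .deb is named debian-binary. The function name looks_like_deb is invented for this example, and the check is only a sanity test, not a replacement for dpkg's own validation.

```shell
# looks_like_deb FILE
# Succeed if FILE starts with the ar magic string and its first archive
# member is called debian-binary (hypothetical helper for illustration).
looks_like_deb() {
    [ -r "$1" ] || return 1
    # Bytes 1-7: ar global header, without its trailing newline
    magic=$(head -c 7 "$1" | tr -d '\000')
    # Bytes 9-21: start of the first member's 16-byte name field
    member=$(dd if="$1" bs=1 skip=8 count=13 2>/dev/null | tr -d '\000')
    [ "$magic" = '!<arch>' ] && [ "$member" = 'debian-binary' ]
}
```

For example, looks_like_deb package.deb && ar x package.deb data.tar.gz would only attempt the extraction when the file at least looks like a package.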

Please Note:

You might need to use the above procedure in practice if, while upgrading the GNU libc package, you do something silly and end up in a half-installed state with no /sbin/ldconfig command (so all of the "heavier" programs start refusing to run). If that's why you are reading this, then one solution is to unpack the libc6 package manually and copy the ldconfig command back in place (to /sbin/). The other (and easier) thing you can do is temporarily create a stub /sbin/ldconfig script which does nothing and simply returns success:

# echo '#!/bin/sh' > /sbin/ldconfig
# chmod 755 /sbin/ldconfig

(Single quotes keep an interactive shell from treating the "!" as a history reference.) This way, however, you need to add the --force-overwrite switch when you go reinstalling libc, such as dpkg -i --force-overwrite libc6*.deb.


Useful extra packages

Two useful programs worth mentioning are vrms and popularity-contest. vrms notifies you of non-free packages installed on your system (ideally, there should be none!). popularity-contest produces weekly package usage statistics (frequency of use, etc.) and anonymously e-mails them to Debian, thus automating part of the feedback from the user base. The statistics are, for example, used to decide on the distribution of Debian GNU packages on CD-ROMs.


Monitoring installed files for correctness

Each Debian package inserts control information into the package database (/var/lib/dpkg/ directory). Among that information are the MD5 sums of all installed files. (File "sums" or "digests" are results of a one-way function, MD5 in our case, and for practical purposes identify file contents.) When a file's digest is compared to a previous good value from the database, we can immediately notice if the file contents (and contents alone, not other attributes like mode or ownership) have been changed, either as a consequence of legal system operation, a software/hardware bug, or a successful break-in. The debsums package provides a tool that performs this comparison.

For packages that do not have MD5 sums already generated (there are few cases), the sums can be generated directly at your site, during installation. (Debconf will present you with an appropriate question when you install debsums.)

The most common use is to run debsums PACKAGE to verify an individual package, or debsums -s to verify all packages and only display checksum mismatches. See the debsums man page for more information on available options and possible use.

If you wish to change a file's checksum, you no longer need to develop your own tools to edit /var/lib/dpkg/info/PACKAGE.md5sums files; newer debsums packages ship with the debsums_gen command.
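The .md5sums files are plain md5sum output with paths written relative to the filesystem root (no leading slash), so the check itself can be illustrated with md5sum alone. The sketch below is not how debsums is implemented; verify_md5sums is a made-up name, and the optional second argument exists only so the idea can be tried on a scratch directory instead of /.

```shell
# verify_md5sums SUMS_FILE [ROOT]
# Check every file listed in SUMS_FILE against ROOT (default /).
# Prints the entries that do not match and returns non-zero if any are found.
verify_md5sums() {
    root=${2:-/}
    [ -d "$root" ] || return 2
    # md5sum -c reads the checksum list from standard input here;
    # LC_ALL=C keeps the "OK" marker untranslated so it can be filtered out.
    failed=$( ( cd "$root" && LC_ALL=C md5sum -c ) < "$1" 2>/dev/null | grep -v ': OK$' )
    [ -z "$failed" ] && return 0
    printf '%s\n' "$failed"
    return 1
}
```

On a real system this could be tried as verify_md5sums /var/lib/dpkg/info/PACKAGE.md5sums, with PACKAGE replaced by an installed package name; debsums remains the proper tool for regular use.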

Also, two programs worth mentioning are changetrack and etckeeper. Etckeeper may be a bit more advanced, and it is used to put your whole /etc directory under revision control. To install and initialize it, run sudo aptitude install etckeeper; sudo etckeeper init; cd /etc && sudo git commit -am Initial. After that, you can see pending changes in /etc by cd-ing into it and running git status or git diff at any time (these commands need root privileges too, since the repository belongs to root), and you can see previous, committed changes by running git log or git log -p. You can discard pending changes to any file and return to the last committed version with git checkout FILENAME.


Shutting down the system

Recall that we have mentioned and briefly explained runlevels above. In Unix, system halt (shutdown) is simply runlevel 0, system reboot is runlevel 6.

To shut down the machine, any of shutdown -h now, halt, poweroff or init 0 will do. To reboot the system, shutdown -r now, reboot or init 6 are okay. Additionally, you can also use Ctrl+Alt+Del (in the console) to reboot, and this behavior is controlled by the /etc/inittab file (run init q to reload the file if you change it). Sometimes you want to cancel the ongoing shutdown; you can do it as long as your console is active. In that case, run shutdown -c or say, init 2.


Chapter 4. Interaction with system hardware

 

"In the UNIX world, people tend to interpret `non-technical user' as meaning someone who's only ever written one device driver."

 
--Daniel Pead 

Introduction

There are many computer architectures available. Most of them have a cleaner design and stronger characteristics than the nowadays-standard Intel-compatible Personal Computer, and a number of them were in existence before the PC was "invented". Today, though, we see that the PC-compatible processor series has taken over the workstation and server market.

On a side note, here is what "The Autodesk File" by John Walker had to say in the 1980s about the PC's predecessor:

Coming to Terms with the 8086

It's become clear that the plague called the 8086 architecture has sufficiently entrenched itself that it's not going to go away. For the last month or more, Mike Riddle, John Walker, Keith Marcelius, and Greg Lutz have been bashing their collective heads against it. The following is collected information on this unfortunate machine.

I think we'd be wise to diffuse our 8086 knowledge among as many people as possible. The main reference for the 8086 is a book called, imaginatively enough, The 8086 Book published by Osborne. This is the architecture and instruction set reference, but does not give sufficient information to write assembly code (of which, more later). However, it is the starting point to understand the machine. AI will reimburse the cost of your buying this book, which is available at computer and electronic stores.

I have never encountered a machine so hard to understand, one where the most basic decisions in designing a program are made so unnecessarily difficult, where the memory architecture seems deliberately designed to obstruct the programmer, where the instruction set seems contrived to induce the maximum confusion, and where the assembler is so bizarre and baroque that once you've decided what bits you want in memory you can't figure out how to get the assembler to put them there.

The list of other general-purpose architectures (besides the Intel-compatible) would include Motorola 68000 (m68k), Sun Microsystems Sparc (sparc), Digital Equipment Alpha (alpha), Motorola/IBM PowerPC (ppc or powerpc), Silicon Graphics/DEC MIPS (mips and mipsel), Intel IA-64 (ia64) and AMD64 (amd64). This list is by no means complete; it's only a random selection of architectures already supported by Debian GNU.

So, given different architectures which are not binary-compatible (that is, where programs are not compatible across architecture bounds), how do you get the same software running on all of them? We are, of course, interested in re-using existing applications: those that have a tradition, more features, and more hours in production than anything we would be able to create ourselves. The key to the problem is porting: changing the existing code base so that it makes provisions for the new target platform (at places where any specific handling is necessary). Given proper education, writing portable software is easy; porting ("behaving" existing software), however, can be a source of great despair. Sometimes you first have to deal with unacceptably poor programming practices and flawed design before you even come to portability issues.

Let's see what you would have to do if you had a completely usable but new architecture, for which no high-level kernels and application programs were developed yet.

The first step in getting software to run on our hypothetical architecture would come down to porting an existing Operating System kernel (the first piece implemented in software and not hardware), so that the kernel is able to recognize basic system components and initialize hardware-dependent subsystems. With that done, we'd have to write kernel drivers to support additional hardware, such as network, audio and graphics cards, or your custom boards. Modern kernels support Loadable Kernel Modules (LKMs) which you can load into the kernel at runtime, so this task would not be as pressing as if you had to put everything in a single kernel image. Nevertheless, the system is not very usable without those drivers.

Once we'd get the above done, we'd have to get the build-chain (development tools) supported on the platform, or we wouldn't be able to get any user applications running. (In reality, this step would probably be closer to the top of the list.) And once we'd have all that done, there would still be more work waiting for individual software programs. We'd have to try compiling them (converting to a binary, machine-runnable form) on the new hardware. Depending on how many portability issues the original authors addressed themselves, this could be an easy, difficult or impossible task. Sometimes doing a complete rewrite (and avoiding other design mistakes the original authors made) is easier than insisting on a code base that was written by poor programmers, or at a time when no standardized portability techniques were widespread.

To sum up, there's much more to programming than just the physical position of sitting in front of a computer and punching letters. (Even though we all probably know a person or two who seem to be using their existence to prove the contrary.)

When you get to understand the basics of computer hardware, Unix will start looking like the only reasonable and natural extension to hardware, and its concepts and design elements will be easy to understand.


Operating system kernels

As we've mentioned in the Introduction, a basic software element of a usable, general purpose computer is the kernel. Among the most widely known free kernels today is Linux, made by Linus Torvalds and first released in 1991. Linux is an extraordinary thing in the true meaning of the word; you can find Linus' USENET post from October 5, 1991 in the Google Groups' 20 Year Usenet Timeline. Another excellent site, ComputerHope, maintains a record of Linus saying "Hello everybody out there using minix - I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones."

Nowadays, many (or all) Free Software bundles sold by various companies are commonly called Linux distributions. As the name implies, of course, they contain the Linux kernel, but that is only a tiny piece of the complete bundle. In addition to the kernel, bundles contain loads of Free Software packages which have been ported and compiled for your platform, and have been pre-configured to work nicely with other packages from the "bundle".

Variety and freedom of choice have always existed in the Free Software applications arena. For example, if there were 10 usable (not even popular, just usable) music player or web server programs in existence, chances are you'd find all of them already packaged and waiting for installation in your Debian package archive (and probably in the majority of the Linux distributions as well). Kernels, however, are a completely different story. Today, Linux is by far the most popular Free Software kernel. Other kernels (such as FreeBSD or NetBSD) are getting up to speed, but they have a life of their own and are based on a different license, not GNU GPL.

Now you can understand why the authoritative people insist on calling distributions GNU/Linux instead of just Linux. "Linux" identifies the kernel flavour (since now you know there are many, not just Linux), and "GNU" identifies the so-called userland (collection of user and system software, mostly the parts crucial for any Unix to be usable). The kernel is only a tiny part; what you actually "feel" when you use the system is the "spirit" of the userland, that is, the behavior of basic commands and tools. Omitting "GNU" from the name is ignorant behavior; GNU/Linux should be called GNU/Linux, at least in writing.

I hope you will understand that most of the things you'll appreciate in "Linux" will actually be standard and decades old Unix features that have nothing to do with the Linux kernel itself.

Now, fitting the section title and this introduction together, we can reach the expected conclusion: since Debian is aiming to be a universal Operating System, it offers the choice of kernels!

Yes, that's right! Besides the most widely known Debian GNU/Linux, there are also FreeBSD, NetBSD and The Hurd available. The Hurd (visit The Hurd Wiki) microkernel itself is still in development and not ready for use in production, but NetBSD and FreeBSD are mature kernels, and the usability of their Debian ports should increase rapidly. (Currently the main issue with *BSD kernels and Debian is getting the kernels to play along with the GNU userland instead of the expected BSD one.) Just for the record, the Linux 2.6 kernel series seems to outperform the rival cavalry on just about every test performed (at least according to Gregory McGarry and Felix von Leitner), but the BSD people are coming up with no less smart ideas and valuable features.

In practice, this means that you'll be able to choose any hardware platform (machine type and architecture) and any of the supported kernels that best match your exact needs, while from the user perspective it will always be Debian with no visible and incompatible differences!


Kernel drivers

Free Software kernels come with all supported drivers already included in the kernel source packages. Loading or unloading a driver for any piece of hardware is an instant, one-command action.

Free Software drivers are often better than their proprietary counterparts (paradoxically, since the proprietary drivers were made by the hardware manufacturers themselves). The only notable exceptions to this rule happen when the manufacturers keep their hardware internals secret and don't publish openly accessible specifications, so the technical people can't easily find their way with the piece of hardware (but this is easy to deal with — don't buy such products and you will directly communicate the message to the manufacturers, in the only language that they understand without thinking twice).

The situation today looks MUCH better than it looked 8 years ago when I first wrote this. Today Linux works on just about everything, it supports accelerated graphics and DRI (Direct Rendering Infrastructure) on all major graphics cards (nVidia and AMD/ATI), and the drivers for those cards come from the manufacturers. Not all of them are completely open about it, but shining examples include Atheros (network cards) and ATI (after the takeover by AMD), who talk directly to the Free Software developers and have opened their specifications and drivers. And good times are yet ahead! (On a side note, for a completely open graphics card design and drivers, although not usable for 3D yet, see OpenGraphics.)

So in general, when the hardware is supported, the whole process of integrating it into your system consists of loading appropriate kernel or user-space drivers, performing any necessary configuration tasks, and making sure the hardware configures itself automatically on each system boot.

To save you the trouble, we will present an introduction to the general hardware identification techniques, then provide a series of sections describing different kinds of hardware and giving actual instructions how to properly integrate them in your system.

Note that this chapter will be losing importance over time, as new Debian installers perform the autodetection well, and nowadays there are generally no manual steps required to set up drivers on commodity hardware.

(As mentioned, this section is losing practical relevance as almost all hardware is nowadays auto-detected, auto-configured and everything else auto- that you can imagine. But it's still how things work, so it's a great source of knowledge that would today be considered "old school".)

We are going to provide the general techniques for identifying system hardware here, because you'll use them in all of the following hardware sections.

In general, to identify system hardware, you run less /var/log/dmesg. That will display most general and low-level information, such as system and processor type, available memory, attached hard disks, etc. This is all pretty nice, but does not help you in identifying unknown hardware because those lines are written only after the appropriate drivers are loaded (and if that was the case, you wouldn't be trying to identify those devices).

Since (as we just said above) the basic components such as mainboards, processors and hard disks are recognized without special setup on your part, you'll be interested in identifying the rest of the hardware — that would mostly be "attachable" devices such as PCI and AGP cards, serial or parallel adapters and USB gadgets. To do so, run lspci (pciutils package) and lsusb (usbutils package).

Note that the ID strings returned by above utilities will be your starting point in finding out the name (or names) of the drivers that support your specific devices. Free software drivers usually target chipsets instead of each device model of each vendor separately. For example, many old ISA network cards (from different manufacturers) worked with the ne2000 driver. Nowadays, most network cards work using drivers 8139too, 8139cp, via-rhine or e100. That said, most of the time you will not be interested in searching for the exact identification strings as reported by lspci or lsusb, but only look at them for clues about chipset names (and those can also be easily read from the chip labels, if you can take the cards out of the computer and look at them).


Driver name identification techniques

So when you find some hardware ID strings, you'll want to search for the appropriate driver names. There are a few approaches to this. First, you can run sudo modconf and try to locate your ID string in the list. If you find the device in the list, press Enter and try to load the driver module (you almost can't make a mistake at this step: if you pick a wrong module, it simply won't load successfully and you'll be able to try a different one). If you find the right one using modconf, the tool will do all the necessary steps by itself and you'll have the hardware configured.

Note

Just don't feel lucky if you manage to load the dummy module. The whole purpose of that module is to pretend that a device is there, so obviously it won't do you much good in practice.

If you don't get lucky with modconf, simply open your Web browser application and look up your identification string (say, "Realtek Semiconductor RTL-8139") on http://www.google.com/linux. In the results returned, you should learn the name of the driver to support the specific hardware. The ID string I used in this example belongs to a network card, and directly the first result returned by Google would tell you the appropriate NIC (network card) driver name is 8139too.

Finally, since the drivers are kept on your hard disk as normal files, the third way to learn the correct driver name includes visiting the modules directory and doing some manual file snoop & search operation there. To enter the correct directory, simply run cd /lib/modules/`uname -r`/. Once there, try using cd, ls and find commands to get some results. Again, using the above "Realtek Semiconductor RTL-8139" for example, running find . -iname '*8139*' in the modules directory should give you a hint right away.
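The manual search can be wrapped into a small helper. This is only a sketch under a few assumptions: the function name find_driver_candidates is invented here, module files are assumed to end in .ko (possibly followed by a compression suffix), and find is assumed to support -iname, which the GNU version on Debian does.

```shell
# find_driver_candidates PATTERN [MODULES_DIR]
# List module names whose file name contains PATTERN (case-insensitive).
# MODULES_DIR defaults to the directory of the running kernel.
find_driver_candidates() {
    dir=${2:-/lib/modules/$(uname -r)}
    find "$dir" -type f -iname "*$1*" 2>/dev/null |
        sed -e 's|.*/||' -e 's|\.ko.*$||' |   # strip the path and the .ko suffix
        sort -u
}
```

Running find_driver_candidates 8139 on a typical system should then print names such as 8139too, which can be handed to modprobe.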

Important: Note on drivers loading automatically on each boot

If you load the driver using sudo modconf, then modconf will also take care of adding the driver name to /etc/modules file. In effect, this would make the driver load on each machine boot.

If you don't succeed with modconf, then after finding out the driver name, you'll have to load it manually using sudo modprobe DRIVER_NAME. You'll also want to add the driver name to /etc/modules to make it load on every boot; you can do that by running echo DRIVER_NAME | sudo tee -a /etc/modules. (The -a switch makes tee append to the file; without it, the existing contents of /etc/modules would be overwritten.)
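Since /etc/modules usually lists other modules as well, a driver name should be appended to it, and preferably only once. A minimal sketch of that idea follows; remember_module is a hypothetical helper name, and the optional second argument is there only so the function can be tried on a scratch file first.

```shell
# remember_module NAME [FILE]
# Append NAME to FILE (default /etc/modules) unless it is already listed.
remember_module() {
    file=${2:-/etc/modules}
    # -x matches whole lines only, -F takes NAME literally, -q stays silent
    if ! grep -qxF -- "$1" "$file" 2>/dev/null; then
        printf '%s\n' "$1" >> "$file"
    fi
}
```

On a real system the function would have to run with root privileges (for example from a root shell) to be able to write to /etc/modules.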

On systems of today, we have a mechanism called "udev" which is configured by files in /etc/udev/. For example, once the network cards are (auto)detected, their names become persistent since udev updates the file /etc/udev/rules.d/70-persistent-net.rules. So that's the place to look if you want to force or change your network card name assignments. Find more introductory information about "udev" in /usr/share/doc/udev/README.Debian.gz.

Ok, those few simple hints are enough that we can move on to practical hardware setup scenarios.


Common hardware setup and performance; guidelines and practical examples

Network Interfaces (LAN)

The machine picked for this example had two PCI network cards. Let's see how we could detect and recognize both. We will use the lspci command and search for lines mentioning Ethernet. Real output of lspci | grep Ethernet looks like this (each line representing one ethernet card):

$ lspci | grep Ethernet
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
00:12.0 Ethernet controller: VIA Technologies, Inc. Ethernet Controller (rev 74)

Or a similar system might show:

$ lspci | grep Ethernet
0000:00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)

Today, popular low-cost PC network cards probably use one of 8139too or 8139cp (RealTek), winbond-840 or rl100a (Winbond), via-rhine (VIA) or eepro100 (Intel) drivers. We could try running modprobe MODULE_NAME for all of those driver module names above and we'd actually get lucky, but let's be a bit more clever than that; let's show how we could be successful using the identification tips we gave in the previous section.

If we fired up sudo modconf and entered the kernel/drivers/net section, we'd find both "RealTek RTL-8139 PCI Fast Ethernet Adapter" and "VIA Rhine" there. Selecting them and specifying no additional parameters (PCI devices don't usually need them) would successfully load the modules.

If we opened the Web browser and searched for the identification string "Realtek Semiconductor RTL-8139" on http://www.google.com/linux, the first result would hint 8139too to be the appropriate driver name. Searching for "VIA Technologies, Inc. Ethernet Controller" would reveal the via-rhine driver name in the 3rd result.

And if we did cd /lib/modules/`uname -r`/ and find -iname '*8139*'; find -iname '*via*', we would also have something to work with.
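When all you need from lspci is the identification string to search for, the bus address and revision can be trimmed off. The filter below is a small sketch written against the two output formats shown above; ethernet_chips is a made-up name, and other lspci versions may format their lines differently.

```shell
# ethernet_chips
# Read lspci output on standard input and print only the description of
# each Ethernet controller, without bus address and "(rev NN)" suffix.
ethernet_chips() {
    sed -n 's/^.*Ethernet controller: \(.*\)$/\1/p' |
        sed 's/ (rev [0-9a-f]*)$//'
}
```

It would be used as lspci | ethernet_chips, and each printed line is then a candidate search string.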

It's important to know that, unlike most things in Unix, network interfaces are not represented by files in the /dev/ directory (although there exists a patch for the Linux kernel which enables that); in Linux, the interfaces are simply given "virtual" names such as eth0, eth1 etc. The tool to manage network interfaces is traditionally called ifconfig. Running sudo ifconfig -a should give you some nice output that you might want to look at.

If you ran ifconfig, you probably noticed that the network interfaces for which you have just loaded the drivers have not been configured. The configuration file for this kind of stuff is named /etc/network/interfaces, and it defines the usual things, such as network interface IP addresses, netmasks and gateways. The interfaces(5) manual page is pretty informative, but we'll provide the two most common examples: simple static IP and dynamic IP (DHCP) setups. Our first interface, eth0, will be given a static IP; the other, eth1, will have a dynamic IP.

Example 4-1. /etc/network/interfaces

#
# Standard stuff
auto lo eth0 eth1

iface lo inet loopback
#
# Eth0: static
iface eth0 inet static
  address 192.168.7.3
  netmask 255.255.255.0
  broadcast 192.168.7.255
  gateway 192.168.7.1
#
# Eth1: DHCP
iface eth1 inet dhcp

Configuring that file and running sudo /etc/init.d/networking restart should set you on stage.
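To double-check what a given interfaces file defines, the iface lines can be pulled out with awk. This is just an illustrative sketch (list_ifaces is not a standard command); it relies on the "iface NAME FAMILY METHOD" layout used in the example above.

```shell
# list_ifaces [FILE]
# Print "NAME METHOD" for every iface stanza in FILE
# (default /etc/network/interfaces).
list_ifaces() {
    awk '$1 == "iface" { print $2, $4 }' "${1:-/etc/network/interfaces}"
}
```

For the example file above, the output would consist of one line per interface: lo with method loopback, eth0 with static and eth1 with dhcp.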

In some cases you might want to change the network card's ethernet ("hardware") address. This is possible with the ifconfig command, and you have to run it on each system boot, of course:

ifconfig eth0 down
ifconfig eth0 hw ether BA:BE:BE:EF:D0:0D
ifconfig eth0 up


Hard disks

As we've mentioned already, hard disks are recognized and work without any special setup. (Well, not really, they too are an option in the kernel configuration, but definitely all of the usual setups include the disk and filesystem drivers out of the box).

One thing that you can do, however, is check the performance of your disks (cached and raw read performance). Install hdparm, then run sudo hdparm /dev/sda to get a feeling of what disks are about. (Note: old ATA disks on older Linux systems were called "hda". The distinction between "hda" and "sda" is a leftover from times when "hda" were strictly ATA and "sda" were strictly SCSI devices.)

$ sudo hdparm /dev/sda

/dev/sda:
multcount    = 16 (on)
IO_support   =  1 (32-bit)
unmaskirq    =  1 (on)
using_dma    =  1 (on)
keepsettings =  0 (off)
readonly     =  0 (off)
readahead    =  8 (on)
geometry     = 1823/255/63, sectors = 29297520, start = 0

To actually measure the throughput, first make sure the system isn't busy with anything that could influence the outcome (top should report only 1 process running — and that being itself), then run sync three times (to sync, or clean memory buffers by synchronizing to the disk now), and finally, run sudo hdparm -tT /dev/TdL.

In /dev/TdL, T stands for disk type (which is s for SCSI, and h for IDE devices), and L stands for "number" which is actually a letter of the alphabet, starting at a. If you had an IDE disk, it would most probably be /dev/hda (or /dev/sda on newer systems, as noted above). Actually, it would probably be nice to explain this "numbering" scheme properly: on typical PC motherboards, there are two IDE disk connectors (primary and secondary), each accepting a 40- or 80-wire cable with two connectors (master and slave) for peripheral devices. Altogether, this makes a total of 4 disks/CD-ROMs that you can attach, and according to their position, they're assigned letters a, b, c or d. With some other bus types (SCSI for example), the devices are just numbered "in order of appearance"; regardless, the name/number assignment principle stays the same. In case of newer setups using LVM or MD, devices like /dev/sda won't do you much good; run mount to get an idea of proper device names in your setup.
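The position-to-letter rule for classic IDE naming is small enough to write down as a lookup. The function below is purely illustrative (ide_device_name is an invented name) and covers only the traditional hd* scheme with two channels.

```shell
# ide_device_name CHANNEL POSITION
# Map an IDE channel (primary|secondary) and cable position (master|slave)
# to the traditional device file name.
ide_device_name() {
    case "$1/$2" in
        primary/master)   echo /dev/hda ;;
        primary/slave)    echo /dev/hdb ;;
        secondary/master) echo /dev/hdc ;;
        secondary/slave)  echo /dev/hdd ;;
        *) echo "usage: ide_device_name primary|secondary master|slave" >&2
           return 1 ;;
    esac
}
```

So a disk moved from the first connector of the primary cable to the first connector of the secondary cable changes from ide_device_name primary master to ide_device_name secondary master.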

Anyway, speaking of hdparm -tT, here's a sample output:

$ sync; sync; sync; sudo hdparm -tT /dev/hda

/dev/hda:
Timing buffer-cache reads:   848 MB in  2.00 seconds = 424.00 MB/sec
Timing buffered disk reads:   90 MB in  3.01 seconds =  29.90 MB/sec

It is interesting to see how the statistics looked back in the day. For comparison, here's a more modern system from February 2010:

$ sync; sync; sync; sudo hdparm -tT /dev/sdc

/dev/sdc:
Timing cached reads:   13570 MB in  2.00 seconds = 6794.06 MB/sec
Timing buffered disk reads:  278 MB in  3.02 seconds =  92.08 MB/sec

Now, what can we see here? The cache test measures the raw throughput achievable from system RAM. The buffered test bypasses the cache and directly measures the disk's ability to read the magnetic medium. For a PC (with an Ultra ATA, 80-wire cable), 30 MB/s was nice (older disks were in the 15 - 25 MB/s range), but now we have managed to get into the 100 MB/s range per disk. One traditional SCSI/Unix disk manufacturer and popular SATA manufacturer today is Seagate, although the quality of their SATA disks at least has dropped enormously. Altogether, I've had the best experience (quality-wise) with Western Digital disks.
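If you want to compare several runs, the MB/sec figures can be pulled out of the hdparm output with a one-line filter. This is only a sketch: the function name hdparm_rates is invented here, and it assumes the output keeps the "Timing ... = N MB/sec" layout shown in the samples above.

```shell
# Hypothetical helper: read `hdparm -tT` output on standard input and print
# the MB/sec figure (the next-to-last field) of every "Timing" line.
hdparm_rates() {
    awk '/Timing/ { print $(NF-1) }'
}

# Fed with the older sample output shown above instead of a live disk:
hdparm_rates <<'EOF'
/dev/hda:
Timing buffer-cache reads:   848 MB in  2.00 seconds = 424.00 MB/sec
Timing buffered disk reads:   90 MB in  3.01 seconds =  29.90 MB/sec
EOF
```

On a real system the input would come from the command itself, as in sudo hdparm -tT /dev/sda | hdparm_rates.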

If your disk throughput is much worse than this, chances are that DMA (Direct Memory Access) is not enabled; check that first with sudo hdparm /dev/sda | grep dma. If it's disabled, run sudo hdparm -d1 /dev/sda and repeat the test.

In any case, if you figure out any hdparm settings that boost your disk performance, you'll want to add -k1K1 to the options (to preserve settings over disk reset — something that can happen during runtime), and you'll want to save the whole line in /etc/bootmisc.sh or somewhere, to have it run on each boot.

If you have two disks, putting them on separate cables (in case they're not SATA which is one disk per cable) can further improve transfer rates.

Caution

Just don't rush and reposition the disk to which you installed Debian GNU. If you do so, the device filename will change (from, say, /dev/hda to /dev/hdc) and your system will not boot properly any more! Don't do this until you read this Guide and learn how to use rescue CDs.

Actually, today you can even change disk order and partitions since most of the things target partitions by UUIDs (Unique Identifiers) and not physical position on the bus. Thumbs up for modern developments!


CD/DVD-Roms

Like hard disks, PC CD-Roms typically just work out of the box. There are some commands, however, that could be used to tune their exact behavior.

We could just say that the Linux kernel can drive CD-Roms in either standard IDE/ATAPI mode, or SCSI emulation. SCSI emulation was primarily used to allow cdrecord to burn CDs on ATAPI drives. This, however, is not needed anymore, as cdrecord can now write to ATAPI devices as well.

Of the commands you can use with CD-Rom drives, there's hdparm to retrieve their basic hardware information. And then there's the setcd package that can control various other CD-specific settings, including drive speed and identification of the inserted CDs.

Finally, the standard, low-level CD-burning application is called wodim and can, besides burning, be used to eject the drive if it hung in RAW mode and won't listen to the Eject button. You generally do not need to use wodim if you have a graphical interface and a GUI program. But using wodim is trivial: all you need to write the file (usually an ISO image) to the CD/DVD is wodim -vv -eject FILE_NAME.iso. In case you need to create the .iso first, copy all files to one directory (the easiest way, so that you don't have to search for them), and then produce the .iso with genisoimage -T -U -R -l -o mycd.iso DIR_NAME.
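The two steps just described (build the image, then burn it) can be chained in a small wrapper. The function name make_and_burn is made up for this sketch; the options are exactly the ones quoted above, and the burn is only attempted if the image was created successfully.

```shell
# Hypothetical wrapper around the two commands from the text.
#   $1 = directory with the files, $2 = name of the ISO image to create
make_and_burn() {
    genisoimage -T -U -R -l -o "$2" "$1" &&
        wodim -vv -eject "$2"
}
```

You would call it as make_and_burn DIR_NAME mycd.iso, with a blank disc already in the drive.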


Graphics

The graphics card arena is pretty interesting. If you know something about the evolution of PC graphics cards, you know that a small myriad of graphic "modes" was invented, each with its pros and cons, and each with a different set of graphic cards that supported it. Those "modes" have surely put a certain burden on backward-compatibility; even modern PC graphic cards like ATI or nVidia that probably never see a console (terminal) mode still support dozens or even hundreds of various text modes.

One of those modes is the standard character terminal mode, so you won't have any problems getting to the console (text mode) on PCs.

One other supported standard is the VGA mode which allows a resolution of 640x480 pixels in 256 colors. (Just by the way, it means the screen alone took 640x480x8 bits — or about 307 kilobytes — to fit in memory).
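The memory figure quoted above is simple arithmetic, and the shell can redo it for any other mode. The function name fb_bytes is invented for this sketch; it just multiplies width, height and bits per pixel, and divides by 8 to get bytes.

```shell
# Hypothetical helper: bytes needed to hold one full screen.
#   $1 = width, $2 = height, $3 = bits per pixel
fb_bytes() {
    echo $(( $1 * $2 * $3 / 8 ))
}

fb_bytes 640 480 8    # prints 307200
```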

Then there's VESA 2.0 which is supported by SVGATextMode (and which is quite a pain to set up) so we won't discuss it.

And there's VESA 2.0 in framebuffer mode. Instead of using the card's specific instruction set, you can initialize and use the card in framebuffer/VESA mode. This was quite attractive before, when it was easier to fire up the framebuffer mode than wait for months (should I say years?) before XFree86 (X Window System implementation) card-specific drivers became available. VESA 2.0 framebuffers allowed for good resolutions, but they too were limited to 256 colors. Another big drawback was a completely unaccelerated display which flickered even when you were moving a 2D window around the screen. Nowadays the Linux kernel does support the hardware framebuffer acceleration for most popular card groups.

Framebuffers are also a comfortable option for graphic cards that do not support native console mode (such as Apple's) or were using framebuffer by design, such as some from Sun Microsystems.

The X Window implementation used today on Linux systems is X.org. You install it and it just works; it autodetects everything and doesn't even need the config file (/etc/X11/xorg.conf). And if you have an nVidia or ATI card, a program called jockey-gtk will recognize that there are proprietary drivers available to increase performance, and it'll allow you to install the drivers on a point & click. We've come a long way, indeed.


Modems

Telephone-line modems (of historical interest?)

First of all, we must say that there are two main types of modems: real hardware modems, and handicapped modems called Winmodems.

"Hardware modems" are the usual, full-featured modems. They are sometimes also called Rockwell or Hayes-compatible modems, they are attached to serial ports, and (being Hayes-compatible) they all support the standard "AT" command set. All external modems that you attach to the computer via serial cable are "hardware modems". Of internal modems that you plug into an ISA or PCI slot on the mainboard, usually only old ones (prior to year 1998 or even older) are "hardware modems".

"Winmodems", on the other hand, are once again a disastrous invention from the PC world; basically they are modems with one $5 (five dollar, at the time of invention!) chip removed, whose tasks then need to be performed by the main system processor. Winmodems are only sold as internal (and that PCI) cards, and have practically flooded the post-millennium modem market; at most "modern" computer shops, they don't even know that Winmodem is not the only type of "modem". Winmodems do not have a single technical quality or a bright idea; it's quite the opposite. Such modems need special drivers to work at all, and even then you are lucky if they support some degenerated "AT" command set. Needless to say, Winmodems are a major annoyance because all the drivers are proprietary and closed-source. The linmodems.org site is trying hard to help Winmodem owners, but it's really not worth it. Winmodems are a disgusting piece of "hardware", they smell bad, and you are better off passing them to the next lucky owner. If you can buy a real modem yourself, you're good; if you can persuade someone into swapping a real modem for your Winmodem, you're really good.

In the meantime, since originally writing the Winmodems section, the manufacturers seem to have standardized on one or two chips, and basically it is possible to get Winmodems working quite comfortably. They are, of course, as crappy as they've always been, but at least the cost of setup is no longer higher than that of the Winmodem itself.

If you are interested in connecting to ISPs, you will use the PPP protocol. If you want to use the modem in "terminal" mode, then the infamous minicom, seyon, lrzsz or modemu will be at your disposal.

If you are interested in PPP connections, you simply need to run sudo pppconfig. It is a very convenient program that will let you configure different PPP ("dial-up") connections. If you name the connection ISP1, then running sudo pon ISP1 will call the specified number and connect. plog will show some system log lines, and poff will terminate the connection. Remember that the full log is accessible in /var/log/ppp.log; we redirected messages to that file earlier in the Guide when we configured the syslog daemon.

Finally, it's worth noting that pppconfig does no magic; it only modifies files /etc/ppp/peers/NAME and /etc/chatscripts/NAME (and, well, /etc/ppp/TYPE-secrets if your ISP uses PAP or CHAP authentication method). For completeness, we include a sample working configuration for a typical dial-up ISP connection.

Example 4-2. /etc/ppp/peers/provider

Notice that the string USERNAME should be replaced with the username you use to connect to the ISP. File /dev/ttyS0 represents the first COM port or, in DOS parlance, COM1 (for serial port 2, you would use /dev/ttyS1, and so on). Note that in case of internal modems, the ports would probably be /dev/ttyS2 or /dev/ttyS3; your computer does have 4 serial ports even if only one or two are available on external connectors.

# This optionfile was generated by pppconfig 2.0.10. 
# 
#
hide-password 
noauth
connect "/usr/sbin/chat -v -f /etc/chatscripts/provider"
debug
/dev/ttyS0
115200
defaultroute
noipdefault 
user USERNAME
remotename provider
ipparam provider

Example 4-3. /etc/chatscripts/provider

Notice that the number 1234567890 should be replaced with your ISP number. Also, the string ATx3m1 is a setting you should use in most European countries; it instructs the modem not to wait for a dial tone (this can be set from the pppconfig menu, Provider -> Advanced -> Modeminit).

# This chatfile was generated by pppconfig 2.0.10.
# Please do not delete any of the comments.  Pppconfig needs them.
# 
# ispauth PAP
# abortstring
ABORT BUSY ABORT 'NO CARRIER' ABORT VOICE ABORT 'NO DIALTONE' ABORT 'NO DIAL TONE' ABORT 'NO ANSWER' ABORT DELAYED
# modeminit
'' ATx3m1
# ispnumber
OK-AT-OK ATDT1234567890
# ispconnect
CONNECT \d\c
# prelogin

# ispname
# isppassword
# postlogin

# end of pppconfig stuff

Example 4-4. /etc/ppp/pap-secrets

USERNAME and PASSWORD need to be replaced with the actual values you use for authentication with the ISP, of course.

(none)  *       password
USERNAME provider PASSWORD

The Debian PPP tools mentioned above are definitely superior to any "ad-hoc" programs that you would employ, but wvdial is worth mentioning because it can do a lot of auto-detection (basically, all you need to know to use wvdial is your ISP phone number, user name and password). After installation, run sudo wvdialconf /etc/wvdial.conf to create the configuration file. Then, running wvdial should establish the connection. As usual, a working /etc/wvdial.conf is provided for completeness.

Example 4-5. /etc/wvdial.conf

1234567890, USERNAME and PASSWORD need to be replaced with the actual values you use for authentication with the ISP, of course.

/dev/modem is a symbolic link to the modem device (wvdial should detect it all). If it does not, and /dev/ttyS0 were the real port (modem attached to the first serial port, or COM1 in DOS parlance), you'd create the symlink manually by running sudo ln -sf /dev/ttyS0 /dev/modem.

Additionally, the l0m0 part of the "init string" makes the modem speaker quiet during both dialling and the duration of the connection.

[Dialer Defaults]
Modem = /dev/modem
Baud = 115200
Init1 = ATZ 
Init2 = ATx3l0m0 S0=0
#Init3 = ATI 5
Phone = 1234567890
Username = USERNAME
Password = PASSWORD

If wvdial does not succeed in its mission, try going for an "all bets are off" approach — add Stupid Mode = 1 to the config file and try one more time.


Cable modems

Cable modems first need to be connected to the computer by an ethernet (LAN) or USB cable.

If you're using the cable modem to connect to your ISP, then the ISP probably provides a DHCP server on their side. In other words, you can simply define the interface to pick up the configuration from a DHCP server. Your /etc/network/interfaces file should simply look like this:

auto eth0
iface eth0 inet dhcp

It's useful to know that ISPs match your ethernet card hardware address before giving you the connection details and finally access. But sometimes you're forced to change your hardware ethernet address; for example, your card might get destroyed by lightning, or your machine breaks down and you need to replace the card or temporarily move the modem to another computer. Calling the ISP to change your hardware address is sometimes not an option: they're suspicious when you call, or it takes a while for their changes to take effect. Fortunately, in Linux it is possible to manually change the hardware address of your network card. You can't write the address to the card's memory permanently (you need to repeat it on each reboot), but that doesn't affect its usefulness. For a real example, please see the Section called Network Interfaces (LAN).


ADSL modems

ADSL modems also arrive in ethernet or USB cable versions.

ADSL modems usually work using the PPPoE (PPP over Ethernet) protocol. To make your life real easy, install pppoe, pppoeconf and pppstatus packages.

If your modem is connected and turned on, pppoe -A should find at least one PPPoE provider "on air". This is very simple and should be done at the beginning to minimize problems with pppoeconf. If you see no providers listed by pppoe -A, then something's wrong. In newer Debian installations, pppoe was replaced by pppoe-discovery.

In the next step, you should read the files in /usr/share/doc/pppoeconf/, which contain a well-written ADSL introduction and practical HOW-TO. Running pppoeconf and following the instructions from the mentioned documentation should be enough to configure your ADSL modem. Once you do it, you can connect using pon dsl-provider and disconnect using poff. If you've redirected local2 stream to a separate file (as shown earlier in the Guide in the syslog daemon section), then you can also run plog to see latest "news" regarding the connection.


Mice (console)

Besides the X Window System (which will be covered separately), you can also use the mouse in your text consoles. That is, by the way, very convenient, because you can simply select text with the LMB (Left Mouse Button) and paste with the MMB (Middle Mouse Button).

The traditional program handling console mice is called gpm. After installing gpm, run sudo gpmconfig to configure it. The device file to use is /dev/psaux for PS/2 port mice, /dev/ttySN for serial, and /dev/usb/mouseN for USB (where N is a number dependent on the port assigned, starting at 0). Of protocol types, many exist and many are supported; on PCs, the "auto-sensing" autops2 is the best (and default) choice. Repeat protocol is a convenient way to let gpm handle mice, and "repeat" events on file /dev/gpmdata so that other applications (such as X Window System) could use the same events. This, however, is not necessary since nowadays they can both open the mouse file directly.

You can start, stop or restart the gpm daemon by invoking, for example, sudo /etc/init.d/gpm restart.

And again, gpmconfig does no magic, it simply modifies /etc/gpm.conf which I provide for completeness.

Example 4-6. /etc/gpm.conf

#  /etc/gpm.conf - configuration file for gpm(1)
#
#  If mouse response seems to be to slow, try using
#  responsiveness=15. append can contain any random arguments to be
#  appended to the commandline.  
#
#  If you edit this file by hand, please be aware it is sourced by
#  /etc/init.d/gpm and thus all shell meta characters must be
#  protected from evaluation (i.e. by quoting them).
#
#  This file is used by /etc/init.d/gpm and can be modified by
#  /usr/sbin/gpmconfig.
#

# PS/2 Mouse example
device=/dev/psaux
responsiveness=
repeat_type=
type=autops2
append=""
sample_rate=


Audio cards

The Linux audio drivers story is quite interesting. Traditionally, Linux used the Open Sound System (OSS) architecture. It was written by guys who ran their own company and wanted to make money on audio drivers, and only contributed old or uncommon drivers to the Linux kernel tree.

This was, as you might guess, pretty unfortunate, so the folks got together to rewrite the audio system and get rid of the hog. The result was ALSA - Advanced Linux Sound Architecture. ALSA is the default sound system in the Linux kernel 2.6 series.


Digital cameras

Some of the digital cameras were (or still are) supported directly by the Linux kernel, allowing you to mount their memory cards as the usual disks.

One other, and more common way, is to use user-space drivers for the cameras. The GNU gphoto2 application supports numerous camera models (533 in my version, not counting models that are not listed but work using an existing driver!), and does various camera-specific tricks. Here's how I use it:

gphoto2 --port usb: --get-all-files --camera "Kodak DC3400"

Another common trick these days is to plug the memory card in some universal reader device and then mount it as a standard disk. This way it's also possible to upload files to the card even if the camera itself doesn't support uploading (so you can't use gphoto2).


Printers

Traditionally, printing in Unix was done by the BSD lpr system (lp or lpr stand for Line Printer, although printing is by no means limited to line/matrix printers, of course). Afterwards, lprng (lpr New Generation) appeared as well, offering some enhancements while preserving compatibility. The main configuration file was /etc/printcap, describing printers' capabilities.

Probably the three most relevant printing-related commands are lpr (or lp), lpq and lprm. They print files, list print jobs, and cancel print jobs, respectively. All rival printing systems offer a compatibility mode in which all those commands are command-line compatible.

With PostScript printers it's quite easy. With PC printers, as always, things are a little different. To overcome printing problems (primarily on GNU/Linux PCs, but on other Unices as well), CUPS (the Common Unix Printing System) was developed, and it's what we are going to use. Let's first install it:

$ sudo apt-get install cupsys cupsys-{bsd,client} foomatic-db foomatic-filters-ppds

By default, CUPS listens on http://localhost:631/. Visit the page with your web browser, and you'll be able to perform printer administration tasks from quite a usable web GUI. The username and password you're asked for are "root", and root's password.

For details on supported printers and the drivers you can use, be sure to visit linuxprinting.org.

CUPS can also be configured directly, just using config files in /etc/, but that is out of the scope of this Guide.

Here's what I wrote years ago in this section: "Printer setup is still unnecessarily too hard to get right, but hopefully things will get better." Well, things have surely changed for the better. Almost all printers are nowadays recognized by CUPS (auto-detection finds them and the drivers exist). What's more, if you're using a modern desktop, such as Gnome, XFCE or KDE, or a more "out of the box" oriented flavor of Debian, such as Ubuntu, there's a great chance the printer will be automatically recognized and installed with the appropriate driver as soon as you plug the cable into the computer!


Chapter 5. Unix software technologies; constituent pieces and underlying concepts

 

"Knowing how things work is the basis for appreciation, and is thus a source of civilized delight."

 
--William Safire 

Introduction

In the section above, the Section called Booting the machine; runlevels and system services in Chapter 3, we've explained what the system does to get from power-on to some usable state. By now, you should have also learned how to log in, of course, and wander around the system a little. The system you see, however, is very much "alive"; it's not just a collection of commands and files waiting to be run or read. Those "live" parts (or subsystems) are crucial to Unix and account for a lot of what Unix stands for.

We are about to poke around the neighborhood and meet the crowd.


System Login

By the term system login, we mean the act of verifying one's credentials, setting up access rights, and letting users proceed with their computer session. Exactly what that session looks like depends on the actual service requested and the type of the users' client software.

In general, users are given individual accounts, to which they can log-in. There are two main groups of accounts:

  • System accounts - accounts that are registered on a system level, usually in files /etc/passwd, /etc/group and /etc/shadow. These files form the traditional Unix user authentication scheme, although such information can also be kept in various databases, for example in so-called directories which consist of key:value pairs and are optimized for massive read-only access (LDAP).

    System accounts are service-independent and deeply rooted in Unix philosophy. One of their key values is full accountability in terms of dates and times of access, performed actions and system resources used. Typical examples are the accounts you use to access all telnet, SSH and FTP services.

    Those "real" accounts will be of our primary concern, and we shall refer to them as system accounts or simply accounts.

  • Virtual accounts - accounts that are not registered on a system level, and instead live in service-specific databases. Those databases could be based on files or LDAP behind the scenes as well, but because virtual account solutions are popular for simplicity and ad-hoc setup (except for few notable implementations), most of them today seem to live in MySQL databases. Typical examples of virtual accounts in use are various Web shops, Web memberships, mailing list membership or "inventions" like e-mail-over-web. Virtual accounts have also been fairly popular in setups where users do access their e-mail using proper protocols, but only have "virtual mailboxes" on the servers instead of real accounts.

    As we've mentioned, virtual accounts are mostly service-dependent and are, lacking any formality in both design and implementation phase, inherently inconvenient to account for. Instead of re-using the established system infrastructure, applications must handle virtual users in their own ways. In addition, instead of performing tasks under the appropriate system accounts' privileges, such applications run under a single username, further complicating any deeper access control and usage statistics.

    We see how the computing world around us has changed over the past 10 years (for better or worse). Almost everything is nowadays in some form of virtual accounts, this is virtual, that is virtual, everything is virtual! Actually, this is too funny; here's what I wrote about virtual accounts in this Guide back in 2002: "Virtual accounts are a disaster (except, again, for few notable fields of use and implementations) and have blossomed since 1995 onwards, the period that was characterized by the advent of 'personal computers' and the disappearance of all technical correctness from the civil sector."


Console login

Probably the most straight-forward way to log-in to the system is to sit between keyboard and chair, and log-in at the local system's console prompt. In general, a variant of the getty program will be listening on the consoles to receive your authentication info. Debian default, /sbin/getty, was spawned by /sbin/init. /sbin/init, in turn, took its configuration from the already-mentioned /etc/inittab file. There are many getty variants available (try running apt-cache search --names-only getty for example) but "getty" has also been established as a kind of a generic name for the whole class.

It's interesting to note that getty reads in the initial username and password, and pulls out of the deal by passing control onto the /bin/login program. The question is, however, what happens if you type in a wrong username or password (or do not authenticate successfully for some other reason)? Since getty is out of the game, the login program itself will serve you with another prompt, although it will look exactly the same as the original getty one. Only if you fail to authenticate for a couple of times in a row, or terminate your session, will /bin/login close down and (thanks to init) /sbin/getty be respawned ("started again") to wait for new logins.

Note

Determining who's behind the login prompt: /sbin/getty or /bin/login?

If you press Enter on an empty console login prompt, and the system immediately serves you with a new one, you're talking to the system getty. Otherwise, such as if there is a timeout first, you're talking to /bin/login.


The 'login' shell

Supposing you manage to authenticate successfully and the getty or login programs let you through, what happens next?

Well, before we can answer that, you first need to get familiar with your entry in the /etc/passwd file. There are many ways to retrieve it; you could open the file in a text editor and search for your username, you could run grep $USER /etc/passwd, and you could run getent passwd $USER. The last variant is suggested as it can work with arbitrary user authentication scheme. A sample entry might look like this:

$ getent passwd $USER   
mirko:x:1000:1000:Mirko,,,:/home/mirko:/bin/bash

Fields 5, 6 and 7 specify users' GECOS information, their home directory, and the default shell.
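To see those fields pulled apart, you can split the entry on the colon. A minimal sketch follows; the function name passwd_fields and the gecos=/home=/shell= labels are made up here, while the field positions are the standard passwd layout (GECOS, home directory and shell are the last three of the seven colon-separated fields).

```shell
# Hypothetical helper: split one passwd entry (read from standard input)
# on ":" and print the GECOS, home directory and shell fields.
passwd_fields() {
    awk -F: '{ print "gecos=" $5; print "home=" $6; print "shell=" $7 }'
}

echo 'mirko:x:1000:1000:Mirko,,,:/home/mirko:/bin/bash' | passwd_fields
```

For your own account, the same filter works on live data: getent passwd "$USER" | passwd_fields.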

Generally, after you have been authenticated, the software spawns the specified shell for you and changes to your home directory. Since Unix sites and users often configure their environment, there are global tuning files available, /etc/profile and /etc/environment (the first is an executable script, the other is a collection of KEY=VALUE pairs and does not exist by default). The bash shell also reads /etc/bash.bashrc and possibly other /etc/bash* files (if configured to include them). After the site-wide configuration files are honored, the shell reads its corresponding user-specific dotfiles at startup. Again, in case of the bash shell, those are ~/.bash_profile or ~/.bashrc.

At this point, it is important to learn the difference between login- and non-login shells. Login shells are the special case where users are at the other end of a connection (instead of a batch script file or another program) and use the terminal interactively. When you log-in to the system using telnet or SSH, you're given a login shell. Login shells read ~/.bash_profile, which should contain settings relevant for interactive work (command aliases, prompt display, etc.). All other shells are non-login shells.

The root user does not read /etc/profile file and, by Debian convention, its dotfile is ~/.profile instead of ~/.bash_profile (but this is in no way enforced - if ~/.bash_profile was present, it would take precedence).

We could mention that the "shell language" was standardized by POSIX, so any shell files that are not bash-specific should be free of any bashisms. And indeed, there's a strong movement present in Debian to free the maintainer scripts of all non-POSIX-compliant constructs. Following the analogy, your root user's ~/.profile file should be written with POSIX sh standard in mind. Ksh (the Korn Shell) is very POSIX-compliant and you could use it to write POSIX-compliant scripts; see the Korn Shell (ksh) Programming page for additional information.

It is also useful to note that the shell does not use any secret techniques to read the dotfiles; it evaluates them in the context of the existing process using the source or . (a dot) command.
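The difference between running a file and sourcing it is easy to try out. The sketch below uses a throwaway file from mktemp instead of a real dotfile; the function name source_demo and the variable GREETING are invented for the demonstration.

```shell
# Hypothetical demo: a variable set by a file survives only when the file
# is sourced into the current process, not when it runs as a child process.
source_demo() {
    unset GREETING
    rc=$(mktemp) || return 1
    echo 'GREETING=hello' > "$rc"

    sh "$rc"                          # separate process: the change is lost
    after_exec=${GREETING:-unset}

    . "$rc"                           # current process: the change sticks
    after_source=${GREETING:-unset}

    rm -f "$rc"
    echo "after exec: $after_exec, after source: $after_source"
}

source_demo    # prints: after exec: unset, after source: hello
```

This is why settings placed in your dotfiles are still there once the prompt appears.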

When those "startup" tasks are performed, the system shows the command prompt and is ready to accept commands.


Account login regulation

Since most of the accounts on your machine will be used locally, by yourself, there's no reason to let people log in remotely, right? You could be interested in giving your friends access, but that's a different issue — you would give them their own accounts and take some basic precautions before opening the system to the World.

This is all pretty hard to explain right now, because it already touches that magic World of Unix security, which is so broad and diverse that any immediate commentary on it would distract us noticeably, even if we ignored the "intuitive" thinking and stuck to formal definition.

So anyway, as we concluded we don't want people logging in remotely, edit file /etc/security/access.conf, read the short introductory text included in the file, and add something like this to the end:

-:root mirko ante:ALL EXCEPT LOCAL

The above would deny access to root, mirko or ante, except from the localhost. Settings in the /etc/security/access.conf file are honored because the PAM subsystem can be configured to read them, as we'll just see explained in the next section.
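Whether a given service actually consults access.conf depends on its PAM configuration loading the pam_access.so module. A quick check can be sketched as below; the function name pam_access_enabled is made up, and which service file to inspect (for example /etc/pam.d/login) is an assumption about your setup.

```shell
# Hypothetical helper: succeed if the given PAM service file contains an
# active (not commented-out) line that loads pam_access.so.
pam_access_enabled() {
    grep -Eq '^[[:space:]]*[^#[:space:]].*pam_access\.so' "$1"
}
```

For example, pam_access_enabled /etc/pam.d/login && echo "access.conf is honored for console logins". If the check fails, look for a commented-out pam_access.so line in that file.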


Pluggable Authentication Modules (PAM)

So far, you should have understood that, in Unix, there are many data protocols (FTP, HTTP, telnet, IRC, ...) and their implementations (vsftpd, Apache, telnetd, dancer-ircd, ...).

Since most of the services require user authentication, it becomes obvious that supporting all kinds of authentication in every service would be hard, require a lot of manual and repetitive work, and be error-prone. On top of that, implementations would most probably end up being inconsistent, having different interpretations of "standard", and contain subtle, hard-to-find bugs.

Fortunately, computer science is old enough that people came about to spot the problem, and think about eventual remedies. The idea that Sun Microsystems came up with was a generic Pluggable Authentication Module layer, or simply — PAM. Generally, each service makes a straightforward call to PAM and expects a Yes or No type of answer. This allows for one size fits all approach in client software; to perform all authentication work, simply invoke PAM and don't worry.

Even though PAM only returns a positive or a negative final answer, one could suppose that PAM uses more sophisticated techniques in reaching this boolean (Yes/No) conclusion. And indeed it is so. Each service drops a piece of its PAM configuration to PAM config files. That configuration can request arbitrary authentication steps to be performed, combined and stacked in any order (including either-or variants). There's also a default which you can use to handle multiple services with the same config file.

For example, you could configure PAM to authenticate the user if either his retinal scan matches the database, or he possesses both the correct RSA private key and a one-time password. And supporting any other authentication scheme becomes as easy as writing a PAM module to handle the specific method.

There are three main PAM implementations in use today: Solaris PAM used by the Solaris OS, Linux-PAM used by all Linux "distributions", and OpenPAM used by BSD-derived systems.

Linux-PAM is also the PAM implementation used by Debian. One very unfortunate fact is that, while PAM itself provides a standardized API even for requesting additional input from the user (which is quite a feat), it does not standardize the logging interface. Some Linux-PAM modules do not log at all, and those that do are not forced to consistency by formal methods. This is such a critical omission that it consequently puts PAM practice in a completely different light.

The solution to the PAM logging problem, however, came unexpectedly. Sebastien Tricaud added Prelude support to PAM 0.79, so PAM can now consistently report all the action to the Prelude manager.


System task schedulers

Computers do one thing well: they happily execute highly repetitive tasks that you could never complete yourself in a reasonable amount of time (let alone the boredom experienced along the way). From that perspective, it's obvious that every serious operating system should have a way to schedule tasks for execution at some later time, or in a repetitive (periodic) fashion.

The "pioneering" work in automated schedulers was done by a chief of an IBM-powered farm (with crops, animals and all), back in the 1970s. He reduced three 8-hour shifts to two 8-hour shifts, replacing the third person (who had practically nothing to do but run one system command at 3:00 am) with a timer-powered Lego block that would drop from a height onto the Return key.

Unix systems today have two schedulers available: at and cron. Anyone looking for a non-trivial Unix batch processing system should look at Generic NQS.


At

As you might conclude from the command name, at is designed to run each job once, at the specified time. The time specification is very elaborate and supports all kinds of definitions, such as now, 12:00 or 12:00 tomorrow. It can also combine constructs, as in 10:00 pm + 3 days; for a complete specification see /usr/share/doc/at/timespec.

For example, you could try echo "echo Hello, World" | at now + 1 minute. In a minute, you should see "Hello, World" in your mailbox, because at mails the output of a job to the user who queued it. The example supplied the command "in place", but this is Unix, so you can also save the set of commands to execute in a file (say, cmds.at) and then run at -f cmds.at now + 1 minute.

You can view pending jobs by running atq, and remove them using atrm. The at package shipped with Debian also supports simple batch execution using batch. For details, see the at(1) manual page.
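The whole cycle (queue a job from a file, list it, remove it again) might look like this in practice. This is only a sketch: it assumes the at package is installed and skips the scheduling part otherwise, and it assumes at reports a new job with a line of the form "job N at DATE", which is worth verifying on your system.

```shell
#!/bin/sh
# Put the commands to run into a job file (named cmds.at, as in the text).
# A scratch directory keeps the example out of your working directory.
workdir=$(mktemp -d)
cat > "$workdir/cmds.at" <<'EOF'
echo "Hello, World"
date
EOF

if command -v at >/dev/null 2>&1; then
    # Queue the job and pick the job number out of at's report
    # (assumed format: "job N at DATE", printed on standard error).
    job=$(at -f "$workdir/cmds.at" now + 1 minute 2>&1 |
          awk '$1 == "job" { print $2 }')
    atq                               # the new job should be listed here
    # Remove the job again so that the example leaves nothing behind.
    if [ -n "$job" ]; then
        atrm "$job"
    fi
else
    echo "at is not installed on this system"
fi
```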


Cron

Cron, or the Unix scheduler, periodically executes system or user tasks. As you can guess, cron plays a significant role on every Unix system and is, as such, part of Debian GNU as well.

In essence, a cron configuration file defines each task on its own line. Each line, in turn, consists of a 5-field time specification followed by the task to execute. The five fields indicate minutes, hours, days of month, months, and days of week. Here are a couple of examples to clarify the subject:

# Run each minute
* * * * *  /usr/local/bin/syscommand

# Run every 15 minutes
0,15,30,45 * * * *  /usr/local/bin/syscommand

# Run every 15 minutes, enhanced specification
*/15 * * * *  /usr/local/bin/syscommand

# Run every 2 hours, on the hour
0 */2 * * *  /usr/local/bin/syscommand

# Run once every hour in period from 8:00 am to 3:00 pm
0 8-15 * * *  /usr/local/bin/syscommand

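The crontab format supports more than is shown here (names for months and weekdays, step values over ranges, environment variable settings). Make sure you read man 5 crontab.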
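To get a feeling for how a single time field is interpreted, here is a small sketch of a matcher for the forms used in the examples above (*, */N, A-B and comma lists). The function name field_matches is made up for this illustration and this is not how cron itself is implemented. Note the simplification: */N is treated as "divisible by N", which is only right for fields that start at 0 (minutes and hours).

```shell
#!/bin/sh
# field_matches FIELD VALUE
# Succeeds if one crontab time field (e.g. "*/15" or "8-15") matches VALUE.
# Simplified: "*/N" means "divisible by N" here, which only suits minutes
# and hours. Month/weekday names and "A-B/N" are not handled.
field_matches() {
    value=$2
    parts=$(printf '%s\n' "$1" | tr ',' ' ')   # comma list -> separate words
    set -f                  # keep the shell from expanding "*" as a glob
    set -- $parts
    set +f
    for part in "$@"; do
        case $part in
            '*')   return 0 ;;
            '*/'*) [ $((value % ${part#*/})) -eq 0 ] && return 0 ;;
            *-*)   [ "$value" -ge "${part%-*}" ] &&
                   [ "$value" -le "${part#*-}" ] && return 0 ;;
            *)     [ "$value" -eq "$part" ] && return 0 ;;
        esac
    done
    return 1
}

field_matches '*/15' 30       && echo "minute 30 matches */15"
field_matches '0,15,30,45' 20 || echo "minute 20 does not match 0,15,30,45"
field_matches '8-15' 15       && echo "hour 15 matches 8-15"
```

A complete crontab line fires when all five fields match the current time (cron(8) describes the special case where both day fields are restricted).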


System crontab

As usual, Debian GNU includes cron in the base system. There's a number of great things going on in the system, even when you've installed nothing but the minimal setup.

Debian GNU's system crontab file is (would you guess?) /etc/crontab. You can see that, in between the time specification and the command to execute, this specific file expects the Unix username to run the task as. (While this is convenient and easy to review, you can of course also switch to a different user within the command itself.) Furthermore, you can see that Debian GNU has prepared /etc/cron.*/ directories where both you and packages' postinstall scripts can simply drop tasks to execute. For example, if you want something executed once a day, just drop a script into /etc/cron.daily/. If, on the other hand, you want to control the exact time, drop a file into /etc/cron.d/, where files in the /etc/crontab format are expected (or, if you must, edit /etc/crontab directly).
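A file meant for /etc/cron.d/ could look like the one in the following sketch. The file name example-task is made up, and the file is written to a scratch directory rather than to the real /etc/cron.d/ (which needs root), so running the snippet changes nothing on the system.

```shell
#!/bin/sh
# Scratch directory standing in for /etc/cron.d/.
crond=$(mktemp -d)

# Five time fields, then the user to run as, then the command.
# The file name "example-task" is hypothetical.
cat > "$crond/example-task" <<'EOF'
# m   h  dom mon dow  user  command
30    3  *   *   *    root  /usr/local/bin/syscommand
EOF

# Print the user and the command of every active (non-comment) line.
task_summary=$(awk '!/^#/ && NF >= 7 { print $6 ": " $7 }' "$crond/example-task")
echo "$task_summary"
```

To activate such a file, copy it into /etc/cron.d/ as root.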


Users' crontab

System users can also have their own crontabs. All you have to do as a system user is run crontab -e, type in your specification, and exit the editor. The crontab will be installed automatically. You can review your crontab with crontab -l, and remove it (the whole crontab, not a single job) with crontab -r. See man 1 crontab for more information.

Besides running crontab -e, it's also possible to manually write the specification in an arbitrary file, then invoke crontab FILE.
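The file-based approach is handy for scripting. The following sketch prepares a new crontab that contains your current jobs plus one additional line, without installing anything yet. The helper name append_job is made up for this example.

```shell
#!/bin/sh
# append_job LINE
# Reads a crontab on standard input and prints it with LINE added at the
# end, unless an identical line is already present.
append_job() {
    awk -v job="$1" '
        { print }
        $0 == job { seen = 1 }
        END { if (!seen) print job }
    '
}

new_crontab=$(mktemp)
# crontab -l fails if you have no crontab yet (or cron is not installed);
# the error is ignored so that we simply start from an empty list.
{ crontab -l 2>/dev/null || true; } |
    append_job '*/15 * * * *  /usr/local/bin/syscommand' > "$new_crontab"

cat "$new_crontab"
# After reviewing the file, install it with:  crontab "$new_crontab"
```

Keeping the install step manual is deliberate here: crontab FILE replaces your whole crontab, so it pays to look at the result first.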

Administrators can allow or forbid system users to use crontab; see the description of /etc/cron.allow and /etc/cron.deny in the crontab(1) manual page.


Inet Meta Daemon

Inetd is yet another interesting concept but it needs a little general introduction first.

As you might or might not know, Unix daemons (or servers) do their work by listening on specified ports for incoming client connections. Typical examples are telnet, FTP or WWW servers, which listen on ports 23, 21 and 80 respectively. When a client connection arrives, the daemon accepts it, which gives it a new socket dedicated to that one client, and usually forks (or "duplicates", roughly) a child process to handle it. The child then handles the client's request and exchanges data back and forth, independently of the listening socket, which is immediately available again to accept new connections.

Following the above logic, it became meaningful to have a specialized server that only listens for client connections, and then forks the appropriate daemons to handle actual requests. The result is the Inet Meta Daemon.

Inetd, however, did not have a shining security record, and it became too inflexible and slow for today's standards. In addition, Inetd needed non-transparent support in every server program, so it's no wonder that it slowly dropped out of mainstream Unix.

But we still mention Inetd here for several reasons: it's an important part of Unix, it's still useful for particular applications, and it can be easily overlooked when trying to increase the overall network security of your system.

There are a few Inetd implementations, but the default used by Debian GNU is the openbsd-inetd variant. (Previously, Debian used the implementation from the venerable NetKit, still available as package netkit-inetd.) Its config file is /etc/inetd.conf, and you should disable all the unnecessary services in it (probably all of them) and then call the usual sudo invoke-rc.d openbsd-inetd reload.
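To see at a glance which services are still enabled, you can list every line of the config file that is not commented out. The sketch below runs on a sample file so that it works anywhere; the helper name enabled_inetd_services and the sample entries are made up for the illustration and are not Debian defaults.

```shell
#!/bin/sh
# enabled_inetd_services FILE
# Prints the service name (first field) of every line that is not
# commented out.
enabled_inetd_services() {
    awk '!/^[ \t]*#/ && NF { print $1 }' "$1"
}

# Sample file for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# service  socket  proto  wait    user     program         arguments
#ftp       stream  tcp    nowait  root     /usr/sbin/tcpd  /usr/sbin/in.ftpd
telnet     stream  tcp    nowait  telnetd  /usr/sbin/tcpd  /usr/sbin/in.telnetd
EOF

enabled_inetd_services "$sample"
```

On a real system, run enabled_inetd_services /etc/inetd.conf instead; every name it prints is a candidate for commenting out unless you know you need it.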


E-Mail

Debian uses an extensive e-mail system based on the Exim mailer. See packages exim4 and exim4-config.

Exim is too elaborate to cover here. What's important is that at installation time, it asks you a couple of questions and in most cases configures a basic, working email server on the machine. From that point on, it's easy (or "easier") to implement your modifications or setup requirements.

The upside is that, Exim being the Debian default, all related Debian packages work with it out of the box, so you can get many additional programs, such as greylistd (a greylisting implementation, one of the spam prevention methods) or mailman (a mailing list manager), to work with it with little or no effort on your part.

To get a grip on Exim, see documentation at Exim.org. To get a grip on Debian packaging and file layout, see /usr/share/doc/exim4/README.Debian.gz.


TCP Wrappers

To restrict access to our systems and services, we can use packet-level and application-level solutions. Packet-level solutions are usually called firewalls, and usually perform their work from within the system kernel. In other words, if the firewall rejects a packet, it will be discarded well before it gets the chance to reach the actual application at all.

We can, however, control access on an application level too. Application-level control can be implemented using proxies (content-based), TCP Wrappers (source/destination-based), custom methods, or a combination of those. As the section title says, we're going to take a look at TCP Wrappers here.

Basically, TCP Wrappers serve as a generic application-level access control mechanism, and were first developed by Wietse Venema. TCP Wrappers were most useful in combination with Inetd, but have since been integrated into a number of standalone services.

When a packet reaches the system (and the corresponding service listening for requests), all the application has to do is call for a TCP Wrappers check. Based on connection details (remote IP, remote username, destination service etc.), TCP Wrappers pass or deny requests. At that point, the application either continues with the client authentication (username/password mostly), or closes the connection.

TCP Wrappers are a standard part of Debian. For more information see the hosts.allow(5) and hosts.deny(5) manual pages.

TCP Wrappers can also serve as an example of professional programming practices — they come with a set of additional programs developed to conveniently test your configuration files and hypothetical connections; see tcpdchk(8) and tcpdmatch(8) manual pages.

To deny all services to remote addresses, make sure the file /etc/hosts.allow is empty, and put this in /etc/hosts.deny:

ALL: ALL EXCEPT LOCAL 127.0.0.1: DENY

For more information (including how to trigger system commands upon incoming requests) read the hosts_access(5) and hosts_options(5) manual pages.
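Before touching the real files in /etc, you can try out a rule set in a scratch directory. The sketch below uses a slightly different policy than the one above (deny everything, then allow SSH from one local network); the address range 192.168.1. and the test addresses are examples only. It relies on the -d option of tcpdmatch, which makes it read hosts.allow and hosts.deny from the current directory, and it skips the check if tcpdmatch is not installed.

```shell
#!/bin/sh
# Scratch directory for trial versions of the two files.
wrapdir=$(mktemp -d)

# Deny everything by default ...
echo 'ALL: ALL' > "$wrapdir/hosts.deny"
# ... then allow SSH from one local network. hosts.allow is consulted
# first, so a match here wins over the blanket rule in hosts.deny.
echo 'sshd: 192.168.1.' > "$wrapdir/hosts.allow"

if command -v tcpdmatch >/dev/null 2>&1; then
    (
        cd "$wrapdir" || exit 1
        # -d: use hosts.allow and hosts.deny from the current directory.
        tcpdmatch -d sshd 192.168.1.5     # should report access granted
        tcpdmatch -d sshd 203.0.113.9     # should report access denied
    )
else
    echo "tcpdmatch is not installed; the trial files are in $wrapdir"
fi
```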

Please Note:

TCP Wrappers and a firewall have very little in common; the level at which the allow/deny decision takes place is fundamentally different. With a firewall, it happens at the lowest, packet level: a packet targeted at, say, an FTP port could be dropped by the firewall as soon as it is received by the network hardware and processed by the operating system's network layer, so it would never reach the FTP daemon. With TCP Wrappers, the packet does reach its destination (Inetd, or a standalone service). The validity check must be explicitly called for by the handling application, and is usually performed before the server forks (starts) a new child process to service the incoming request.

Today, one of the best-known uses of TCP Wrappers is via the denyhosts package. It is a Debian package that works out of the box and protects your SSH daemon from endless password guessing attacks (which happen so often that it's unbelievable). When a remote system tries to log in one too many times, denyhosts puts its IP on a temporary deny list in /etc/hosts.deny. When that happens, the client will see the connection error: ssh_exchange_identification: Connection closed by remote host. The IP will expire from the list automatically after a while. (On a side note, your first line of defense on SSH is to deny direct root logins using "PermitRootLogin no" in /etc/ssh/sshd_config. Denyhosts will then take care of the rest.)
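A quick way to check the PermitRootLogin setting is to look for the first occurrence of the keyword in the config file. The sketch below does this on a sample file so that it runs anywhere; the helper name root_login_setting is made up, and the check is deliberately simplified (Match blocks and Include directives are not taken into account).

```shell
#!/bin/sh
# root_login_setting FILE
# Prints the first PermitRootLogin value found in an sshd_config style
# file. Commented-out lines are skipped because their first word starts
# with "#". Match blocks and Include directives are ignored.
root_login_setting() {
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$1"
}

# Demonstration on a sample file; on a real system, pass
# /etc/ssh/sshd_config instead.
sample=$(mktemp)
printf '%s\n' '# sample sshd_config fragment' \
              'Port 22' \
              '#PermitRootLogin yes' \
              'PermitRootLogin no' > "$sample"

setting=$(root_login_setting "$sample")
if [ "$setting" = "no" ]; then
    echo "direct root logins are disabled"
else
    echo "PermitRootLogin is '${setting:-not set explicitly}'"
fi
```

After changing the real file, the SSH daemon has to reload its configuration (on Debian, sudo invoke-rc.d ssh reload) before the new setting takes effect.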



Conclusion

Congratulations on following through the Guide.

I initially wrote it in 2002, and things have changed enormously since then. However, better understanding and deeper knowledge always have value, and with Linux maybe even more so today than before.

There are a couple other sections of the Guide I had in mind, but I either didn't get a chance to write them, or the systems I described changed their implementations radically enough that chapters needed a complete rewrite, and in the absence of time to do it I just removed them.

Some sections are also of lower quality, text-wise, but they nevertheless contain various interesting technical bits.

Anyways, altogether, I hope you enjoyed this brief "mix of everything"!

I invite you to continue reading other, more serious guides from the Spinlock Solutions' DKLAR series, the MIT Kerberos 5 Guide, OpenLDAP Guide, OpenAFS Guide and FreeRADIUS Guide.

Cheers!


Davor Ocelic,
Spinlock Solutions