Just a few minutes before sitting down to write this article, I managed to fix a problem that has been the bane of my existence for the last two weeks. Since it is a problem that I have often seen mentioned in the Linux Gazette, usually phrased in a manner that shows the writer to be standing on a chair with a noose around his neck and typing with his toes, I've decided to share it with other readers, hopefully saving them wear and tear on good rope. This may also serve as a good guide to troubleshooting software problems in general. Be aware, though, that a login problem could involve _any_ of the areas described - what fixed my particular machine may not be the solution for yours.
A couple of weeks ago, I decided to install an MUA (Mail User Agent) on my machine. A strange thing to do, considering that I live on a sailboat anchored well away from phone lines or electricity - but I had my reasons. I'd done this on land-based systems before; there was just a bit of experimentation that I wanted to do.
Well, as a pride of lemmings goeth before a fall off a cliff, so does an MTA (Mail Transfer Agent) go before an MUA - you need something that will deliver the mail, otherwise there's not much point in writing it! So, an MTA/MUA installation. No problem - I keep the entire Debian distribution on the Linux partition of my hard drive; this speeds up installations as well as making package searches a trivial task.
If truth be known, I don't like 'su', at least not for major tasks: the fact that it keeps the original user's environment variables, rather than assuming those of the account being "su"'d to, has caused me a few "interesting moments". Yeah, a quick permissions change or an /etc file modification - all right - but for serious work, like installing and uninstalling several major packages (I wasn't sure which MTA I wanted yet), I log in as `root'.
On to the task. Midnight Commander makes it the work of a few keystrokes to dive into and explore a directory tree, as well as letting you look inside - and install - any Debian or RedHat package. Let's see... `sendmail'? (Read the `man' page inside the package, look at the docs, install...) Nope, too big and complex. I need something a bit simpler. (Uninstall.) `exim'?... `exmh'?... `mh'?... `nmh'? All got the same "install/uninstall" treatment, with the exception of required libraries: whenever I install a library, it stays installed. After a bit of doing this on a new system, I don't get any complaints about `Required libraries missing' - if it wasn't for the fact that a number of libs in any given distribution are `either/or' choices (they'd conflict with each other), I'd install the entire "libs" directory and never worry about it again!
However, I still had an MTA to choose. Ah, `smail'! Easy to install, painless to configure - done. Easy choice for an MUA - I really like the configurability of `mutt' - and I'm finished! (Prophetic words...)
EXCEPT. Now, I found that I could not log in as a non-root user anymore. The message I got was:
Cannot execute /bin/bash: Permission denied
What in the heck was this?
`Was this some occult illusion?
Some maniacal intrusion?
These were choices Solomon
Himself had never faced before...'
I knew that I hadn't done anything in /etc/passwd - for that matter, anything in /etc - but I wasn't 100% sure of what those packages, safe as they're supposed to be, were doing under my auspices as `root'. So, I quickly did some double-checks - yes, user `ben' still existed in /etc/passwd; ditto for group `ben' in /etc/group; entering the wrong string as a password provoked the usual `Login incorrect' message instead of the `Cannot execute'. Hmm.
Another double-check: I created a new user ("joe"), new password and all
("joe"), and tried to log in as him - with the same result. At this point,
I let out a quiet "eep!" of minor panic, very
quickly switched to another VT, and tried to log in as `root'. WHEW; no
problems there. At least I would still have access to the machine when I next
brought it up... I'd have hated to do an immediate `live' backup and
reinstallation!
Open up /bin. What do the file permissions look like? Uh-huh...
everything is set to 755 (-rwxr-xr-x); in addition, `login', `mount',
`umount', `ping' and `su' are all SETUID (-rwsr-xr-x). So far, so good;
how about /etc permissions? They all look OK too - mostly 644
(-rw-r--r--), with an occasional 600 (-rw-------) here and there, for
files denied to everyone but `root'. All right, let's try something
silly; I overwrote `login' and `bash' with fresh copies, straight out of
their original packages, to make sure that they weren't corrupted. Nope;
still no luck.
Wait, how about /home? If the permissions on that got mis-set and the user
couldn't get in... Rats, it was fine too - 6775 (drwxrwsr-s). Checking the
.bashrc and .bash_profile showed nothing unusual - and their perms were
OK. Just for kicks, I checked all the other subdirectories in '/'; all
except /root were world-readable, which was fine.
There are a couple of files in /var that keep track of who's logged in,
when they logged out, and so on; if these guys get corrupted, *all* sorts
of strange unpredictable stuff happens. So - emergency measure time! - I
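typed `>/var/log/wtmp' and `>/var/run/utmp', which emptied both files, then
logged out on all VTs so that they would collect some fresh data. No luck.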
Permissions on /dev/ttyX and /dev/vcsX (terminals and virtual consoles)?
They all looked OK too; I was starting to lose hope.
Wait; what about a systematic approach? Let's get an idea of exactly
what's happening before running in every direction. A quick look at the
System Administrator's Guide (SAG) to refresh my memory - ah, there's the
login process:
First, init makes sure there is a getty program for the terminal
connection (or console). getty listens at the terminal and waits for the
user to notify that he is ready to login in (this usually means that the
user must type something). When it notices a user, getty outputs a welcome
message (stored in /etc/issue), and prompts for the username, and finally
runs the login program. login gets the username as a parameter, and
prompts the user for the password. If these match, login starts the shell
configured for the user; else it just exits and terminates the process
(perhaps after giving the user another chance at entering the username and
password). init notices that the process terminated, and starts a new
getty for the terminal.
Note that the only new process is the one created by init (using the
fork system call); getty and login only replace the program running in the
process (using the exec system call).
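The difference between "fork" and "exec" described in the SAG excerpt can be seen from any shell. The sketch below is an illustration only (the function names are invented): each function prints two process IDs, one from a shell and one from the program that shell starts next.

```sh
# The shell prints its PID, then exec's a second shell which prints its own.
# exec replaces the running program without creating a new process,
# so both lines show the same PID.
pids_with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

# Here the second shell is started as a child (fork, then exec).
# The trailing ":" keeps the outer shell around, so that it cannot
# simply exec its last command; the two PIDs therefore differ.
pids_with_fork() {
    sh -c 'echo $$; sh -c "echo \$\$"; :'
}
```

This is the same hand-off that getty makes to login, and login to the shell: one process, three programs in a row.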
Following the process, we can see that everything up until the last part
- the 'exec("/bin/sh")', that is - seems OK. It's during or after that
hand-off that things go wild. The problem was now down to system calls,
something I wasn't quite sure how to approach... and yet that piece of
information contained everything I needed to know; I just didn't know how
to apply it. Later on, it would become self-evident.
Over the next ten days or so, every time I logged in I would try
something new; some things totally outlandish and unlikely to work; some,
bright ideas that produced great disappointment when the Evil Message
once again showed its head. Nothing worked. I replaced `getty'; tried a
couple of shells other than /bin/bash; tried "su"ing to `ben'; checked
the logs (they showed `ben' as having successfully logged in (!), which
told me that `login' was fine; the failure occurred when it handed the
process off to `bash' - I knew that!)...
After finding only a few references to this on the Net - mostly in
Japanese, Swedish, and German (I managed to puzzle out the last two - one
of them suggested checking perms on '/' ! Excellent idea... which didn't
pan out in my case), I shot off a panicked resume of the problem to
The Answer Guy.
Ah - `strace'! Remember `strace'; `strace' is your friend... A really
fantastic piece of software that traces the execution of a program and
reports it, step by step. Let's go!
Since you have to be logged in to run a program, I ran
strace -s 10000 -vfo login.ben login ben
from my current VT; this meant "Run strace on `login ben'; print all
lines up to 10000 characters long (I didn't want to miss any messages, no
matter how long they were); make the output verbose; trace any forked
processes; output the result to a file called `login.ben'". Then, as a
baseline, I ran
strace -s 10000 -vfo login.root login root
`strace login' makes for very informative reading. If I hadn't already
read the System Administrator's Guide, this would have given me the exact
information - in far more detail. It shows all the libraries that are
read, every file examined by `login', the comparison procedure for
`group' and `password'... the only thing it did NOT show was the reason
for the failure; just the fact itself, at exactly the point in the
procedure where I expected it to be: the `execve' of /bin/bash came back
with EACCES (Permission denied).
Just great. The last thing poor `login' tried to do, before falling over
on its back with its legs twitching in the air, was to `execve' bash with
the defined variables collected from /etc/passwd, /etc/login.defs, and
so on - all of those looked OK - and write those 44 hateful characters to
"stderr" (file descriptor 2). Basically, the stuff I'd already figured
out.
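In a trace several hundred lines long, the failing call is easier to find by searching than by reading. A minimal sketch, assuming the trace was saved with `-o' as described above; the function names are hypothetical, and the search patterns rely on strace's usual way of reporting a failed call as "= -1" followed by the error name.

```sh
# Print, with line numbers, every traced call that returned an error.
failed_calls() {
    grep -n -e ' = -1 E' -- "$1"
}

# Narrow that down to permission failures only.
denied_calls() {
    grep -n -e ' = -1 EACCES' -- "$1"
}
```

Running `denied_calls login.ben' should point straight at the `execve' line. Expect `failed_calls' to be noisier, since even a healthy login normally probes for a few files that do not exist.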
I did notice, however, that `login' was opening a number of libraries in
/lib that were needed by the Name Service Switch configuration file
(/etc/nsswitch.conf). What if one of the mentioned libraries was
corrupted? That would be right in line with the `system calls' theory -
since libraries are where the system calls come from! Let's check the lib
that handles local logins for NSS (see `man nsswitch.conf'): `dpkg -S'
says that libnss_compat-2.0.7.so belongs to the `libc6' package.
Humm. The very core of the Linux libs. Well... a quick replacement of all
the /lib/libnss* ... and no change. Next idea.
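To see which NSS libraries matter for a given lookup, the service names can be read out of /etc/nsswitch.conf; each service "foo" is handled by a library named libnss_foo. The helper below is a sketch only: its name is invented, and it assumes the common layout of one database per line, with bracketed action items written without spaces.

```sh
# Print the services configured for one NSS database (e.g. passwd),
# one per line, skipping action items such as [NOTFOUND=return].
# Usage: nss_services DATABASE [CONFIG_FILE]
nss_services() {
    db=$1
    conf=${2:-/etc/nsswitch.conf}
    awk -v db="$db:" '
        $1 == db { for (i = 2; i <= NF; i++) if ($i !~ /^\[/) print $i }
    ' "$conf"
}
```

On the system described here, `nss_services passwd' would presumably print `compat', which is how one arrives at libnss_compat. Where the matching library file lives varies between distributions and releases, so the /lib location is specific to this machine.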
This procedure got me thinking, though. Something was indeed "rotten in
the state of Denmark" - perhaps I needed to check perms on the files in
/lib?
The only problem was, I didn't know what they were supposed to be. You
see, most of the libs are set to "root.root 644" - owner root, group root,
user - read/write, group - read-only, others - read-only. There are a few,
though, that should be set "root.root 755" - as above, but with "execute"
permissions for everyone added... and without looking at a fresh Linux
installation, I had no idea of what was right.
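For readers who do not yet have the octal notation in their fingers, here is a small sketch that spells it out. The function names are made up for this example, and it handles plain three-digit modes only (no setuid, setgid or sticky bits).

```sh
# Turn one octal digit (0-7) into its rwx triplet: 4 = read, 2 = write, 1 = execute.
triplet() {
    case $1 in
        0) echo '---' ;; 1) echo '--x' ;; 2) echo '-w-' ;; 3) echo '-wx' ;;
        4) echo 'r--' ;; 5) echo 'r-x' ;; 6) echo 'rw-' ;; 7) echo 'rwx' ;;
    esac
}

# Turn a three-digit octal mode into user/group/other form.
symbolic_mode() {
    u=${1%??}        # first digit: owner
    rest=${1#?}
    g=${rest%?}      # second digit: group
    o=${1#??}        # third digit: others
    echo "$(triplet "$u")$(triplet "$g")$(triplet "$o")"
}
```

So `symbolic_mode 644' gives rw-r--r-- and `symbolic_mode 755' gives rwxr-xr-x; the only difference between the two is the execute bit for each class.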
WAIT a minute! As I'd mentioned in a 2-cent tip that I'd sent in to LG, I
like to keep a copy of a Debian "base installation" file set (7 files,
about 15MB) on my DOS partition as a 'rescue' utility - it should have
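everything I need! Sure enough, a look inside it showed that ld-2.0.7.so
was supposed to be set to 755; once I fixed that, I could log in as `ben'
again.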
Yes, I did check the perms on all the other libraries; `ld-2.0.7.so' was
the only one that was affected. The only remaining `unknown' was how the
perms changed in the first place... but I suspect that question will never
be answered.
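The comparison against a known-good file set can be automated. This is a rough sketch, not the procedure used in this article (which was done by eye in Midnight Commander). It assumes the reference files have first been unpacked, with permissions preserved, into a scratch directory; the function names and the directory names in the usage note are hypothetical. As background, the execve(2) manual page lists EACCES as the error returned when execute permission is denied for the ELF interpreter, which is the role ld-2.0.7.so plays for a dynamically linked program such as bash.

```sh
# Print the 10-character mode string of a file, e.g. -rwxr-xr-x.
mode_of() {
    ls -ld -- "$1" | cut -c1-10
}

# For each regular file in the reference directory, report the files whose
# mode in the live directory differs. Symlinks are skipped, since their own
# mode says nothing useful.
# Usage: compare_modes REFERENCE_DIR LIVE_DIR
compare_modes() {
    ref=$1
    live=$2
    for f in "$ref"/*; do
        [ -L "$f" ] && continue
        [ -f "$f" ] || continue
        name=${f##*/}
        [ -e "$live/$name" ] || continue
        r=$(mode_of "$f")
        l=$(mode_of "$live/$name")
        [ "$r" = "$l" ] || echo "$name live=$l reference=$r"
    done
}
```

With the reference set unpacked under, say, /tmp/ref, `compare_modes /tmp/ref/lib /lib' would print one line per library whose permissions have drifted.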
As usual, the lessons that Linux teaches are hard - but fair. There's
*always* a way to solve a problem; admittedly, often the easiest way is to
reinstall the system, but this does not teach you the "innards" of an OS
the way tracking down a problem will. In my case, reinstallation would
have been relatively easy: I have a couple of spare drives, easily big
enough to hold my "up to the minute" data so that I don't even need to
touch my backups, and a basic Debian install takes me less than 10
minutes. I wasn't interested in that. The thought uppermost in my mind
was: "What would happen if this occurred at a customer's site?" I needed
to know what the right solution was... and through persistence - no, sheer
bloody-mindedness - I succeeded.
I don't suggest that every one of you beat his brains out against some
difficult problem once a week just to "keep in practice" - but I do
suggest that you use a methodical approach, based on knowledge gained
from reading the appropriate HOWTOs and other documentation available
before grabbing that installation CD yet another time. There will be
times when you'd like nothing better than to laugh maniacally as you
watch your system shrink to a pinpoint, dropping away from your lofty
perch on the Empire State Building... and there will be other times when
the satisfaction of having solved a knotty problem of this sort makes you
pound your chest and do Tarzan imitations.
Now, if you all will excuse me, I've got a chimpanzee and an elephant I'm
supposed to meet...
Happy Linuxing to all,
cat >/var/log/wtmp
cat >/var/run/utmp
which blew their contents away and left them as zero-length files.
[He actually typed this without the "cat", but I put the "cat" in to
make it clear that the ">" was part of the command line and not the
shell prompt. -Ed.] I
logged out on all VTs (just so `utmp' and `wtmp' would get some data),
and...
From the "System Administrator's Guide", by Lars Wirzenius
                    ------------                  ' ' ' ' ' ' ' ' '
                   |   Start    |                 '   GIF2ASCII   '
                    ------------                  ' conversion by '
                          V                       ' "fastfingers" '
                 -------------------              '    program    '
        ________| init: fork + exec |_________    ' Copyleft 2000 '
       |        |   "/sbin/getty"   |         |   ' ' ' ' ' ' ' ' '
       |         -------------------          |
       ^                  V                   ^
       |        ----------------------        |
       |       | getty: wait for user |       |
       |        ----------------------        |
       ^                  V                   ^
       |        ----------------------        |
       |       | getty: read username,|       |
       |       |  exec "/bin/login"   |       |
       |        ----------------------        |
       ^                  V                   ^
       |        ----------------------        |
       |       | login: read password |       |
       |        ----------------------        |
       ^                  V                   ^
       |                 / \                  |
       |                /   \                 |
 -------------         / Do  \                |
| login: exit |--<-No-/ they  \               |
 -------------        \ match?/               ^
                       \     /                |
                        \   /                 |
                         \ /                  |
                          | Yes               ^
                          V                   |
               ------------------------       |
              | login: exec("/bin/sh") |      |
               ------------------------       ^
                          V                   |
                ----------------------        |
               | sh: read and execute |       |
               |       commands       |       ^
                ----------------------        |
                          V                   |
                      ----------              |
                     | sh: exit |--------------
                      ----------
Figure 8.1: Logins via terminals: the interaction of init, getty, login,
and the shell.
strace -s 10000 -vfo login.ben login ben
strace -s 10000 -vfo login.root login root
- and now, I had two files to compare. The `root' one was about twice
as long as `ben' - that made sense, since a successful login goes on to
execute all the stuff in the "~/.bash*" files.
(300+ lines elided)
execve("/bin/bash", ["-bash"], ["TERM=linux", "HZ=100", "HOME=/home/ben",
"SHELL=/bin/bash", "PATH=/bin:/usr/bin", "USER=ben", "LOGNAME=ben",
"MAIL=/var/spool/mail/ben", "LANG=C", "HUSHLOGIN=FALSE"]) = -1 EACCES
(Permission denied)
write(2, "Cannot execute /bin/bash: Permission denied\n", 44) = 44
dpkg -S libnss_compat-2.0.7.so
("Tell me, O Mighty Debian Package Manager, whence cometh said program?"),
and the Debian Oracle, in his wisdom, replied -
libc6: /lib/libnss_compat-2.0.7.so
And so it was. Midnight Commander, via its "Virtual File System", allows
you to explore compressed files as if they were directories; a look
inside "base2_1.tgz#utar/lib" (the VFS syntax used by MC) showed me that
one of the very first libs - ld-2.0.7.so - was supposed to be set to
755. Ten seconds later, I was the owner of a brand-new Virtual Terminal -
as user `ben'.
Ben Okopnik
Copyright © 2000, Ben Okopnik
Published in Issue 52 of Linux Gazette, April 2000