For some reason, whenever I open up Wikipedia, I end up with tons of tabs in my web browser, and usually the tabs are completely unrelated to each other. :P

Yesterday, I ended up looking at the xargs Wikipedia article, where I found an interesting note:

Under the Linux kernel before version 2.6.23, arbitrarily long lists of parameters could not be passed to a command,[1] so xargs breaks the list of arguments into sublists small enough to be acceptable.

Along with a link to the GNU coreutils FAQ.

And from there a link to the Linux Kernel mainline git repository.

After a bit of googling, I found a very nice article describing in great detail the ARG_MAX variable, which defines the maximum length of the arguments passed to execve.
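
Just to make the limit tangible, here’s a small toy program of mine (not from the article, and assuming /bin/true exists) that keeps doubling the size of a single argument until execve() rejects it with E2BIG. Note that on newer kernels a single huge argument actually trips the per-argument limit discussed further down, rather than ARG_MAX itself:

    /* e2big.c -- keep doubling the length of a single argument until
     * execve() rejects it with E2BIG.  Fork first, so that a successful
     * exec only replaces the child, not the test program itself. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        for (size_t len = 4096; ; len *= 2) {
            char *arg = malloc(len + 1);
            if (arg == NULL)
                return 1;
            memset(arg, 'x', len);
            arg[len] = '\0';

            char *argv[] = { "/bin/true", arg, NULL };
            char *envp[] = { NULL };

            pid_t pid = fork();
            if (pid == 0) {
                execve("/bin/true", argv, envp);
                /* execve() only returns on failure */
                _exit(errno == E2BIG ? 42 : 1);
            }

            int status;
            waitpid(pid, &status, 0);
            if (WIFEXITED(status) && WEXITSTATUS(status) == 42) {
                printf("E2BIG at a single argument of %zu bytes\n", len);
                return 0;
            }
            free(arg);
        }
    }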

Traditionally Linux used a hardcoded:

#define MAX_ARG_PAGES 32

to limit the total size of the arguments passed to execve() (including the size of the ‘environment’). With 4KB pages, that limited the maximum length of the arguments to 32 × 4KB = 128KB (minus the size of the ‘environment’).

(Note: actually, very early Linux kernels did not have support for ARG_MAX and didn’t use MAX_ARG_PAGES, but back then I was probably 2-3 years old, so it’s ancient history for me :P)

With Linux 2.6.23, this hardcoded limit was removed; actually, it was replaced by a more ‘flexible’ limit. The maximum length of the arguments can now be as big as 1/4 of the user-space stack. For example, on my desktop, ulimit -s reports a stack size of 8192KB, which means a maximum length of 8192KB / 4 = 2097152 bytes for the arguments passed. You can obtain the same value using getconf ARG_MAX. Now, if I increase the soft limit on the stack size, the maximum length allowed will also increase, although with an 8192KB soft limit, ‘ARG_MAX’ is already big enough. Two new limits were also introduced: one on the maximum length of each argument (MAX_ARG_STRLEN, equal to PAGE_SIZE * 32), and one on the total number of arguments (MAX_ARG_STRINGS, equal to 0x7FFFFFFF, or as big as a signed integer can be).
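
Here’s a minimal sketch showing the relation between the two values; it just prints what libc reports for ARG_MAX next to the soft stack limit divided by 4:

    /* argmax.c -- compare what libc reports for ARG_MAX with the
     * "soft stack limit / 4" rule of the newer kernels */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        struct rlimit rl;

        printf("sysconf(_SC_ARG_MAX): %ld\n", sysconf(_SC_ARG_MAX));

        if (getrlimit(RLIMIT_STACK, &rl) == 0)
            printf("RLIMIT_STACK / 4:     %llu\n",
                   (unsigned long long)(rl.rlim_cur / 4));
        return 0;
    }

With an 8192KB stack soft limit, both lines should print 2097152; after raising the soft limit with ulimit -s, the sysconf() value should follow it.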

The Linux kernel headers, however, use, I think, the MAX_ARG_STRLEN value as the ARG_MAX limit, which forces libc to #undef it in its own header files. I’m not sure, since I haven’t looked into the code yet, but at least for Linux, ARG_MAX is not statically defined by libc anymore (i.e. in a header file); instead, libc computes its value from the user-space stack size.
(edit: that’s indeed how it works for >=linux-2.6.23; the code is in sysdeps/unix/sysv/linux/sysconf.c:

    case _SC_ARG_MAX:
  #if __LINUX_KERNEL_VERSION < 0x020617
        /* Determine whether this is a kernel 2.6.23 or later.  Only
           then do we have an argument limit determined by the stack
           size.  */
        if (GLRO(dl_discover_osversion) () >= 0x020617)
  #endif
          {
            /* Use getrlimit to get the stack limit.  */
            struct rlimit rlimit;
            if (__getrlimit (RLIMIT_STACK, &rlimit) == 0)
              return MAX (legacy_ARG_MAX, rlimit.rlim_cur / 4);
          }
  
        return legacy_ARG_MAX;

).

And the kernel code (from get_arg_page() in fs/exec.c) that enforces that limit:

    struct rlimit *rlim = current->signal->rlim;
    unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;

    /*
     * Limit to 1/4-th the stack size for the argv+env strings.
     * This ensures that:
     *  - the remaining binfmt code will not run out of stack space,
     *  - the program will have a reasonable amount of stack left
     *    to work from.
     */
    if (size > rlim[RLIMIT_STACK].rlim_cur / 4) {
            put_page(page);
            return NULL;
    }

The whole kernel patch is a bit complicated for me to understand, since I haven’t dug much into the kernel mm code, but from what I understand, instead of copying the arguments into pages and then mapping those pages into the new process’s address space, it sets up a new mm_struct and populates it with a stack VMA. It then copies the arguments into this VMA (expanding it as needed), and then takes care to ‘position’ it correctly in the new process. But since I’m not very familiar with the Linux kernel mm API, it’s very likely that what I said is totally wrong (I really have to read the mm chapters of “Understanding the Linux Kernel” :P).

I had blogged some time ago about using OpenVPN with multiple routing tables and packet marking. The configuration I described in that post worked fine on my laptop, with Debian installed, but when I tried it on my desktop, where I use Gentoo, it wouldn’t work.

It took me *3 days* of ‘debugging’ until I was able to find out why that happened!

I tried various changes to the iptables and iproute2 configuration, giving more hints to both utilities in order to use the correct routing table, mark the packets correctly etc., but it still wouldn’t work.
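
For context, the configuration was roughly along these lines. This is just a generic fwmark-based policy-routing sketch; the mark value, the table number, and the match on port 80 are placeholders, not my actual rules:

    # route packets marked with fwmark 1 via routing table 100, out of tap0
    iptables -t mangle -A OUTPUT -p tcp --dport 80 -j MARK --set-mark 1
    ip rule add fwmark 1 table 100
    ip route add default dev tap0 table 100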

After a lot of time tweaking the configuration without results, I saw that, although ping -I eth0 ${VPN_SERVER} didn’t ‘work’ (with openvpn running, and tap0 configured with the correct address/netmask), I could see with tcpdump the ECHO REPLY packets sent by the VPN server, with correct source and destination addresses.

After stracing the ping command, I saw that when ping issued a recvmsg syscall, it returned with -EAGAIN. So now I knew that the packets did arrive at the interface with the correct addresses, but they couldn’t ‘reach’ the upper layers of the kernel network stack.

Both machines were running vanilla kernels, so I couldn’t blame any Debian- or Gentoo-specific patches. But since I knew that the problem was in the kernel, I tried to see if any kernel .config options regarding NETFILTER and multiple routing tables didn’t match between the two configs. I couldn’t find anything that could cause that ‘bug’, though.

So, since the kernel sources were the same, and I couldn’t find anything in the .configs that could cause the problem, I tried tweaking some /proc/sys/net ‘files’, although I couldn’t see why these would differ between the two machines. And then I saw some /proc/sys/net/ipv4/ files on Gentoo that didn’t show up on Debian (/proc/sys/net/ipv4/cipso*).

I googled to find out what cipso is, and I finally found that it is part of the NetLabel project. CIPSO (Commercial IP Security Option) is an IETF draft (quite an old one, actually), implemented as a ‘security module’ in the Linux kernel, and it was what caused the problem: it probably tried to do some verification on the inbound packets, the verification failed, and the packets were ‘silently’ dropped. LWN has an article with more information about packet labeling and CIPSO, and there’s also related Documentation in the Linux kernel.

make defconfig enables NetLabel, but Debian’s default configuration had it disabled, and that’s why the OpenVPN/iproute2/iptables configuration worked on Debian but failed on Gentoo.

Instead of compiling a new kernel, one can just do

echo 0 > /proc/sys/net/ipv4/cipso_rbm_strictvalid

and disable CIPSO verification on inbound packets, so that multiple routing tables and packet marking work as expected.
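
To make that persistent across reboots, the same knob can be set from /etc/sysctl.conf (the sysctl name simply mirrors the /proc path):

    # /etc/sysctl.conf
    net.ipv4.cipso_rbm_strictvalid = 0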

A couple of days ago, we did some presentations about DNS at a FOSS NTUA meeting.

I prepared a presentation about DNS tunneling and how to bypass Captive Portals at Wifi Hotspots, which require authentication.
(We want to do another presentation, to test ICMP/ping tunnel too ;)).

I had blogged on that topic some time ago.
It was about time for a test-drive. :P

I set up iodine, a DNS tunneling server (and client), and I was ready to test it, since I would be travelling with Minoan Lines the next day.
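
For reference, the setup looks roughly like this; t.example.com stands in for the real delegated subdomain, and the password and IPs are made up:

    # on the server, which is the authoritative NS for t.example.com
    iodined -f -P secret 10.0.0.1 t.example.com
    # on the client, pointing at the hotspot's local resolver
    iodine -f -P secret 192.168.1.1 t.example.com

The client then gets an address on the same subnet as the server’s tun interface (a /27 by default, iirc).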

I first did some tests from my home 24Mbps ADSL connection, and the results weren’t very encouraging. The tunnel did work, and I could route all of my traffic through the DNS tunnel (and over a second, secure OpenVPN tunnel), but the bandwidth dropped to ~30Kbps when downloading from the NTUA FTP server through the DNS tunnel.
(The tunnel also worked with the NTUA Wifi Captive Portal, although at first we had some ‘technical issues’, i.e. I hadn’t set up NAT on the server to masquerade and forward the traffic coming from the tunnel :P).
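
The missing NAT setup was just the standard masquerading one; the 10.0.0.0/27 tunnel subnet and the eth0 uplink below are placeholders:

    # enable forwarding, and masquerade the traffic coming from the tunnel
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -A POSTROUTING -s 10.0.0.0/27 -o eth0 -j MASQUERADE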

The problem is that the bandwidth of the Minoan Lines Wifi (actually, Forthnet ‘runs’ it, afaik) to hosts outside the ‘local’ network was ~30Kbps (terrible, I know), without using DNS tunneling. So, I wasn’t very optimistic. (I think they have some satellite connection, or something like that, from the Wifi to the Internet.)

When I was on the ship, I tried to test it. At first, I encountered another technical issue: the local DNS server had an IP inside the Wifi’s local network, and due to NAT, the IP our server was ‘seeing’ was different from the IP in the DNS packets, so we had to run iodined with the -c flag. Luckily, the FOSS NTUA members (who had root access on the computer running iodined) are 1337, and fixed that in no time. :P

And at last, I had a ‘working’ DNS tunnel, but with extremely high ping times (2sec RTT) to the other end of the tunnel, and when I tried to route all my traffic through the tunnel, I got a ridiculous 22sec RTT to ntua.gr. Of course, even browsing the Web was impossible, since all the HTTP requests timed out before an answer could reach my laptop. :P

However, because I am a Forthnet customer (for my home ADSL connection), I was able to use my ADSL username/password to get free access to the Internet from their hotspot (with the amazing bandwidth of ~30Kbps :P). At least they do the authentication over SSL. :P

Although DNS tunneling didn’t really work in this case (the tunnel itself worked, but with the bandwidth being so low, I didn’t have a ‘usable’ connection to the Internet), I think that at other hotspots, which provide a better bandwidth/connection, it can be a very effective way to bypass the authentication and use them for free. ;)

Probably, there’ll be a Part 3, with results from bandwidth benchmarks inside the NTUA Wifi, and maybe some ICMP tunneling stuff.

Cheers! :)

sshd + reverse DNS lookup

October 19, 2009

This post is mainly for ‘self reference’, in case something like this happens again.

According to the sshd man page, by default, sshd will perform a reverse DNS lookup, based on the client’s IP, for various reasons.

A reverse DNS lookup is performed in order to add the hostname to the utmp file, which keeps track of logins/logouts to the system. One way to ‘switch it off’ is to use the -u0 option when starting sshd. The -u option specifies the size of the field of the utmp structure that holds the remote host name.
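
For example (the sshd path may differ between distros):

    /usr/sbin/sshd -u0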

A reverse lookup is also performed when the configuration (or the authentication mechanism used) requires such a lookup: the HostbasedAuthentication auth mechanism, a “from=hostname” option in the authorized_keys file, or an AllowUsers/DenyUsers option in sshd_config that includes hostnames, all require a reverse DNS lookup.
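
For example, a key restricted like this (the pattern and the key itself are of course made up) can only be matched against the client’s hostname, so sshd has to resolve it first:

    # ~/.ssh/authorized_keys
    from="*.example.org" ssh-rsa AAAAB3NzaC1yc2E... user@laptop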

Btw, the UseDNS option in sshd_config, which I think is enabled by default, will not prevent sshd from doing a reverse lookup, for the above-mentioned reasons. What it controls is the extra verification step: with UseDNS set to ‘yes’, sshd also verifies that the resolved hostname maps back to the same IP that the client provided (adding an extra ‘layer’ of security); with it set to ‘no’, that check is skipped.
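
i.e., in sshd_config:

    # /etc/ssh/sshd_config -- skip verifying that the resolved
    # hostname maps back to the client's IP
    UseDNS no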

So, the point is that if, for some reason, the ‘primary’ nameserver in resolv.conf is not responding, you’ll experience a lag when trying to log in using ssh, which can be confusing if you don’t know the whole reverse-DNS story.
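
One way to soften that lag, while the dead nameserver is still listed first, is to lower the resolver timeout in resolv.conf (the values here are just an example):

    # /etc/resolv.conf
    options timeout:1 attempts:2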

Another thing that I hadn’t thought about before learning about sshd’s reverse lookups is that a DNS problem can easily ‘lock you out’ of a computer if you use hostname-based patterns with TCP wrappers (hosts.allow, hosts.deny). And maybe this can explain some “Connection closed by remote host” errors when trying to log in to a remote computer. :P
