Saturday 23 February 2013

Setting up an FTP server on AWS

Recently, to test some code, I had to host an FTP server. I tried it on my local machine first, and it was easy: I just had to follow the Arch wiki page for vsftpd. File transfers worked in both directions, so I thought I could try it on an Amazon instance too.

My local machine ran Arch Linux, while the Amazon instance ran Fedora 8. After looking up the details of Fedora's package manager and getting some help from a friend, I installed vsftpd on the instance, applied the same config and started the FTP service. When we started testing, we could operate successfully from the command line but not from the code. From the command line we were using active mode, while the code was using passive mode, so we looked into the config to check the settings related to passive mode.

It turned out that passive mode is enabled by default. However, going through the various options, we found one called pasv_address. From prior experience I knew that AWS machines have a private LAN IP and a separate public IP, and that the OS on the cloud instance is not aware of the public IP it is serving. So we suspected that in its response the server was asking the client to connect to the private LAN IP, which would obviously fail. We set the pasv_address option to the public IP of the instance, and passive mode started working fine. We could connect to the server and transfer files.

So we decided to use the server for testing our code. However, when we tested it, our application tried to post files and failed every time. The error each time said '500: Invalid Port command'.

FTP handles ports in an unusual way. It uses separate connections for control and data, and the behaviour of the data port depends on the mode of operation. In active mode, the client selects a port and listens on it, and the server initiates the data connection to that port. In passive mode, the server selects a port and listens on it, and the client initiates the data connection. We were using passive mode, and the port the server had selected for the data connection later turned out to be blocked. To debug the situation, we connected to the FTP server with the command line utility 'ftp', using the following command.

ftp <ip address of FTP server>

To turn on passive mode and debug mode, we can use the commands 'passive' and 'debug' respectively. However, these only set options on the client and do not send any control data to the server. To test the FTP service, we need a command that actually sends control data, so we went with 'ls'. The following FTP commands were executed in sequence.

PASV
LIST
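The same check can be scripted. The following is a minimal sketch using Python's standard ftplib module, not the code we were actually testing. The helper name list_in_passive_mode, and the host and credentials in the usage comment, are hypothetical.

```python
import ftplib


def list_in_passive_mode(ftp):
    """Request a directory listing over a passive-mode data connection.

    `ftp` is an already connected and logged-in ftplib.FTP object.
    """
    ftp.set_debuglevel(1)  # print control commands and replies, like 'debug'
    ftp.set_pasv(True)     # use passive mode for data transfers, like 'passive'
    lines = []
    # retrlines sets up the data connection first and then sends LIST.
    ftp.retrlines("LIST", lines.append)
    return lines


# Usage (hypothetical host and credentials):
#
#     with ftplib.FTP("203.0.113.10", timeout=10) as ftp:
#         ftp.login("testuser", "testpassword")
#         for line in list_in_passive_mode(ftp):
#             print(line)
```

With the debug level set, ftplib prints each control command and reply, so the PASV exchange is visible just as it is in the 'ftp' utility.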


The PASV command [1] produces a reply like the following, indicating the port on which the data transfer will happen.

Entering Passive Mode (1,2,3,4,224,186)

The port has to be calculated from the last two numbers using the following formula.

n1 x 256 + n2

In the above instance, it is 224 x 256 + 186 = 57530. Once we knew that the problem was a blocked data port, we decided to configure the FTP server to use ports within the open port range. This can be done by setting the pasv_min_port and pasv_max_port options in vsftpd.conf. Once the server was using ports in the open range, the transfers worked fine.
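The port calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name parse_pasv_reply is made up for the example, and it assumes the reply contains the six comma-separated numbers in parentheses.

```python
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract the data-connection host and port from a PASV reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no (h1,h2,h3,h4,n1,n2) group in reply: %r" % reply)
    h1, h2, h3, h4, n1, n2 = (int(part) for part in match.groups())
    host = "%d.%d.%d.%d" % (h1, h2, h3, h4)
    port = n1 * 256 + n2  # the last two numbers are the high and low bytes
    return host, port


host, port = parse_pasv_reply("Entering Passive Mode (1,2,3,4,224,186)")
print(host, port)  # prints: 1.2.3.4 57530
```

Comparing the computed port against the range of open ports is a quick way to tell whether a blocked port is the cause of a failure.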

[1] A reference of FTP commands.

Friday 22 February 2013

Getting to know the Syslog protocol

Recently, while looking into FTP issues, I learnt some details about Syslog. I was using vsftpd to host an FTP service. I had enabled logging, but I was not seeing anything in the journalctl output. The reason turned out to be a configuration flaw: I had not turned on the option for vsftpd to use syslog. Once I turned the option on, proper logs were created.

I do not have syslog-ng or any other syslog package installed; I had uninstalled it when I switched to systemd. That is why I had not enabled that option in vsftpd. However, as it turns out, when the option is turned on, vsftpd uses the syslog protocol for logging. Systemd listens for messages sent using that protocol and creates the appropriate logs.

This is a great way of unifying all logging. Individual packages do not have to bother about logging. They just act as clients of the protocol and the listener will take care of maintaining the logs. There are instances of similar architecture being followed for logging in other domains too.
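To make the client and listener roles concrete, here is a minimal Python sketch. It builds a simplified BSD-style syslog message (only the priority, a tag and the text, without the timestamp and hostname fields) and sends it to a listener over loopback UDP. The function names and the 'demo-app' tag are made up for the example. Local daemons usually write to the /dev/log socket instead; UDP is used here only to keep the sketch self-contained.

```python
import socket

FACILITY_DAEMON = 3  # syslog facility code for system daemons
SEVERITY_INFO = 6    # syslog severity code for informational messages


def syslog_priority(facility, severity):
    """Combine facility and severity into the syslog PRI value."""
    return facility * 8 + severity


def format_message(facility, severity, tag, text):
    """Build a simplified syslog message: <PRI>tag: text."""
    return "<%d>%s: %s" % (syslog_priority(facility, severity), tag, text)


def roundtrip_over_loopback(message):
    """Send one message from a client socket to a listener and return it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
        listener.settimeout(2.0)
        client.sendto(message.encode("utf-8"), listener.getsockname())
        data, _sender = listener.recvfrom(1024)
        return data.decode("utf-8")
    finally:
        client.close()
        listener.close()


print(format_message(FACILITY_DAEMON, SEVERITY_INFO, "demo-app", "service started"))
# prints: <30>demo-app: service started
```

The client only formats a line and writes it to a socket. Everything else (storage, rotation, filtering) is left to whatever is listening, which is why the listener can be swapped without touching the clients.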

As it turns out, there are two documented versions of the syslog protocol: the IETF syslog protocol (RFC 5424) and the older BSD syslog protocol (RFC 3164).

The version commonly used is the BSD one, even though the former is more advanced. Now syslog is being replaced by systemd's journal, because the journal improves on syslog. It provides efficient transfer of binary data and supports JSON, and maintaining logs is much easier.

It was interesting to know that even though syslog packages are becoming obsolete, the syslog protocol is still the logging standard. It is actually a nice example of robust architecture surviving over the years.