Open standards are awesome, and the File Transfer Protocol (FTP), in spite of its flaws, has been in constant use for more than 40 years! FTP can be a pain to run through firewalls and load balancers, so this blog post explains how to configure Microsoft FTP and HAProxy to work together.
First, a little bit of history about FTP
FTP (RFC 114!) was first specified at MIT in 1971, and to this day it is still the standard protocol for file transfers. One of the reasons FTP has lasted so long is that clients are available for every platform you can think of.
FTP runs over two separate channels, the command channel and the data channel, both of which are unencrypted. Anything sent over these channels, including usernames and passwords, can be captured in a MITM (man-in-the-middle) attack using ARP poisoning and a packet sniffer. The protocol wasn't designed with security in mind: back in the 1970s and 1980s mainframes were more common, attacks on information security were not widespread, and firewalls and load balancers didn't exist yet.
That being said, FTP is mostly used for data exchange inside a company's private network. Although FTP can be secured with SSL/TLS to encrypt the content, username and password, external users tend to prefer SFTP (SSH File Transfer Protocol, sometimes called Secure File Transfer Protocol) over FTP, probably because it runs over a single port and is secure without further configuration or certificates.
FTP can run in two modes: Active and Passive.
Active mode FTP
Active mode is the older of the two. Here is a simplified explanation of how it works:
The client connects from a random port to port 21 on the server and sends the PORT command, specifying which client-side port the server should connect to for the data channel.
The server connects from port 20 to the client port designated for the data channel. Once the connection is established, file transfers are made through these client and server ports. (A firewall in front of the client will normally block an unsolicited inbound connection request like this!)
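To make the PORT command concrete: RFC 959 encodes the client's IPv4 address and data port as six comma-separated decimal numbers, h1,h2,h3,h4,p1,p2, where the port is p1 × 256 + p2. The minimal Python sketch below builds that command; the function name and the client address 192.168.1.50 are hypothetical, chosen only for illustration.

```python
import ipaddress


def build_port_command(client_ip: str, data_port: int) -> str:
    """Return an RFC 959 PORT command such as 'PORT h1,h2,h3,h4,p1,p2'."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError if not IPv4
    if not 0 < data_port < 65536:
        raise ValueError(f"port out of range: {data_port}")
    p1, p2 = divmod(data_port, 256)  # high byte and low byte of the port
    fields = [*addr.packed, p1, p2]  # packed is 4 bytes; iterating yields ints
    return "PORT " + ",".join(str(n) for n in fields)


# Hypothetical client asking the server to connect back to 192.168.1.50:50123.
print(build_port_command("192.168.1.50", 50123))  # PORT 192,168,1,50,195,203
```

Behind NAT, the client would advertise its private address in exactly this way, and the server usually has no route to it, which is one more reason active mode breaks.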
Passive mode FTP
Passive mode works in a similar manner to Active mode; however, instead of sending the PORT command, the client sends the PASV command, which asks the server for a port to connect to for data transmission. The server's reply indicates which port it has opened for the data transfer.
Here is a short explanation of how this mode works:
The client connects from a random port to port 21 on the server and issues the PASV command. The server replies, indicating which (random) port it has opened for data transfer.
The client connects from another random port to the port specified in the server's PASV reply. Once the connection is established, data transfers are made through these client and server ports.
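The PASV reply carries the address and port in the same six-number format, typically as "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". RFC 959 does not pin down the surrounding text, so a client should search for the six numbers rather than rely on the exact wording (Python's own ftplib does this with a regular expression). A minimal Python sketch of that parsing; the function name is hypothetical:

```python
import re
from typing import Tuple

# Six comma-separated numbers, e.g. "(192,168,1,123,39,16)".
_PASV_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Return the (host, port) a 227 reply tells the client to connect to."""
    if not reply.startswith("227"):
        raise ValueError(f"not a PASV reply: {reply!r}")
    match = _PASV_NUMBERS.search(reply)
    if match is None:
        raise ValueError(f"no address found in reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,123,39,16)"))
# ('192.168.1.123', 10000)
```

With this lab's settings, a reply ending in 39,16 points at port 39 × 256 + 16 = 10000, the first port of the passive range used throughout this post.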
Let’s get started
Enough introductory information about FTP. For a POC (proof of concept), I set up a lab using passive mode FTP, with HAProxy load balancing traffic across multiple backend FTP servers running on Windows Server 2016 Standard.
You want to use passive mode FTP and not active mode FTP:
Why? Because in a real-world scenario you will have a firewall and/or a load balancer, and any kind of Network Address Translation (NAT) in the path blocks the server's connection back to the client, which breaks active mode FTP.
You can get Active FTP working, but a big range of ephemeral ports has to be allowed through the client-side firewall, because the server opens the data connection to whichever client port was announced in the PORT command. Opening a large number of ports is considered very bad security practice.
Deploying HAProxy on CentOS and configuring an FTP cluster:
On the load balancer, which runs CentOS Linux release 7.3.1611 (Core), the built-in firewall daemon (firewalld) blocks the required ports. I opened port 21 and the passive port range 10000-10020, which FTP needs because in passive mode the client initiates the data connection, and then reloaded the firewall so the changes take effect:
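firewall-cmd --permanent --add-port=21/tcp
firewall-cmd --permanent --add-port=10000-10020/tcp
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-port=8080/tcp    # the http service only covers port 80; the stats page below listens on 8080
firewall-cmd --reload
firewall-cmd --list-ports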
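Once the rules are reloaded, it can help to confirm from a client machine that the ports are actually reachable. The Python sketch below simply attempts a TCP connection to each port; it is an extra check, not part of the original setup, and the helper names are hypothetical. Keep in mind that a port only reports open once HAProxy is running and bound to it, and that a timeout cannot distinguish a filtered port from a slow network.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out (filtered) or unreachable
        return False


def probe(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Check each port in turn and map it to True (open) or False."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For this lab, probe("192.168.1.123", [21, *range(10000, 10021)]) would cover the control port and the whole passive range.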
To install HAProxy, I used the following command, which installed HAProxy version 1.5.18 at the time of writing:
yum install haproxy
The HAProxy configuration file, /etc/haproxy/haproxy.cfg, is the following:
global
    daemon
    log 127.0.0.1 local2    #Log configuration
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option tcplog
    option dontlognull
    retries 3
    maxconn 10000
    option redispatch
    timeout connect 4s
    timeout client 5m
    timeout server 5m

listen stats
    bind *:8080
    mode http
    option forwardfor
    option httpclose
    stats enable
    stats show-legends
    stats refresh 5s
    stats uri /stats
    stats realm Haproxy\ Statistics
    stats auth loadbalancer:loadbalancer
    stats admin if TRUE

listen FTPVIP
    bind 192.168.1.123:21 transparent
    bind 192.168.1.123:10000-10020 transparent
    mode tcp
    option tcplog
    balance leastconn
    stick on src
    stick-table type ip size 10240k expire 30m
    server WinFTPServer2016FTP1 192.168.1.110 check port 21 inter 10s rise 2 fall 2
    server WinFTPServer2016FTP2 192.168.1.111 check port 21 inter 10s rise 2 fall 2
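The health-check options on the server lines (check port 21 inter 10s rise 2 fall 2) tell HAProxy to test TCP port 21 every 10 seconds, take a server out after 2 consecutive failed checks, and put it back after 2 consecutive successful ones. The server lines also specify no port, so HAProxy connects to each backend on the same port the client connected to, which is what lets a single listen section serve both port 21 and the passive range. The toy Python model below mimics only the documented rise/fall counting; it is a simplified sketch with a hypothetical class name, not HAProxy's actual implementation.

```python
class HealthState:
    """Simplified model of HAProxy's 'rise'/'fall' health-check counting.

    A server changes state only after `rise` consecutive successful checks
    (down -> up) or `fall` consecutive failed checks (up -> down).
    """

    def __init__(self, rise: int = 2, fall: int = 2, up: bool = True) -> None:
        self.rise = rise
        self.fall = fall
        self.up = up  # assumption: the server starts in the UP state
        self._streak = 0  # consecutive results that disagree with self.up

    def record(self, check_ok: bool) -> bool:
        """Feed one check result (e.g. a TCP connect to port 21); return state."""
        if check_ok == self.up:
            self._streak = 0  # result agrees with the current state
        else:
            self._streak += 1
            needed = self.rise if check_ok else self.fall
            if self._streak >= needed:
                self.up = check_ok
                self._streak = 0
        return self.up


server = HealthState(rise=2, fall=2)
print([server.record(ok) for ok in (False, False, True, True)])
# [True, False, False, True]
```

With inter 10s and fall 2, a dead FTP server can therefore keep receiving new connections for roughly 20 seconds before it is marked down.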
Last but not least, one issue I came across when setting up the load balancer was that SELinux is enabled by default and doesn't allow HAProxy to bind to any of the configured IPs, so I disabled it.
You must modify the PASV response from the Microsoft FTP server:
On the FTP servers running Windows Server 2016 Standard, I made sure that the same passive port range as in the HAProxy configuration file is enabled. This is because it is neither practical nor sensible to have a load balancer such as HAProxy listening on every port above 1024, any of which a passive-mode FTP server might otherwise hand out.
From the FTP Firewall Support section of the FTP site settings, configure the Data Channel Port Range as 10000 - 10020.
Set the External IP Address of Firewall to the VIP (the virtual IP of the load-balanced cluster), which in this lab is 192.168.1.123. This is the address the client connects to, and the FTP server must advertise it in its PASV response; otherwise passive mode will not work.
Then just apply changes and restart the FTP Service.
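To see why the External IP setting matters, here is a simplified Python sketch of the 227 reply a backend would send with and without it. It assumes, as a simplification, that without an external address the server advertises its own local address (192.168.1.110 for the first backend in this lab); the exact reply text IIS sends may differ, and the function name is hypothetical.

```python
import ipaddress
from typing import Optional, Tuple


def pasv_reply(external_ip: Optional[str], local_ip: str, port: int,
               port_range: Tuple[int, int] = (10000, 10020)) -> str:
    """Sketch of the 227 reply a passive-mode FTP server sends.

    Simplifying assumption: if an external (firewall/VIP) address is set it is
    advertised, otherwise the server advertises its own local address.
    """
    low, high = port_range
    if not low <= port <= high:
        raise ValueError(f"port {port} is outside the data channel range {low}-{high}")
    addr = ipaddress.IPv4Address(external_ip or local_ip)
    p1, p2 = divmod(port, 256)
    numbers = ",".join(str(n) for n in [*addr.packed, p1, p2])
    return f"227 Entering Passive Mode ({numbers})."


# External IP left empty: the client is sent straight to the backend.
print(pasv_reply(None, "192.168.1.110", 10005))
# 227 Entering Passive Mode (192,168,1,110,39,21).
# External IP set to the VIP: the data connection goes through HAProxy.
print(pasv_reply("192.168.1.123", "192.168.1.110", 10005))
# 227 Entering Passive Mode (192,168,1,123,39,21).
```

In the first case the client is told to open the data connection to the backend itself rather than to the VIP, so it either bypasses the load balancer or, behind NAT, cannot connect at all.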
I hope the above helps, but if you have any questions, feel free to leave a comment below or send me a message.
A recent comment brought an interesting point to my attention regarding the HAProxy configuration used for this lab.
When I first wrote this blog post, I didn't consider that persistence is essential when using FTP in passive mode. Here is why:
Once the server replies to the PASV command with a high-numbered port (one of 10000-10020), the client opens the data connection to that port on the VIP.
At this point, without persistence, HAProxy can forward the client's data connection to either backend server, not necessarily the one the control connection was established with. That other server never received the PASV command for this session, so the transfer will typically fail.
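The failure mode can be illustrated with a toy Python model of "balance leastconn" with and without "stick on src". It is a simplified sketch with hypothetical names, not HAProxy's code: connection closes and the 30-minute stick-table expiry are not modelled, ties go to the first server listed, and the client address is made up.

```python
from typing import Dict, List


class ToyBalancer:
    """Toy model of 'balance leastconn', optionally with 'stick on src'.

    Simplifications: connections are never closed, stick-table entries never
    expire, and ties go to the first server in the list.
    """

    def __init__(self, servers: List[str], sticky: bool = True) -> None:
        self.open_conns: Dict[str, int] = {s: 0 for s in servers}
        self.stick_table: Dict[str, str] = {}  # source IP -> server
        self.sticky = sticky

    def connect(self, src_ip: str) -> str:
        """Pick a backend for a new connection from src_ip and return its name."""
        server = self.stick_table.get(src_ip) if self.sticky else None
        if server is None:
            # leastconn: the server with the fewest open connections wins
            server = min(self.open_conns, key=self.open_conns.get)
            if self.sticky:
                self.stick_table[src_ip] = server
        self.open_conns[server] += 1
        return server


for sticky in (False, True):
    lb = ToyBalancer(["FTP1", "FTP2"], sticky=sticky)
    control = lb.connect("192.168.1.50")  # port 21 control connection
    data = lb.connect("192.168.1.50")     # passive data connection
    print(f"sticky={sticky}: control -> {control}, data -> {data}")
# sticky=False: control -> FTP1, data -> FTP2
# sticky=True: control -> FTP1, data -> FTP1
```

Keying the stick table on the source IP is what keeps both connections of one client on the same server; the trade-off is that many clients behind a single NAT address will all land on one backend.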
The HAProxy configuration has been updated to use sticky sessions, pinning each client to a particular server based on its source IP. Entries are stored in a stick table and can be viewed with the following command:
socat unix-connect:/var/lib/haproxy/stats stdio <<< 'show table FTPVIP'
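If socat isn't installed, the same query can be sent from Python over the Unix stats socket. The sketch below assumes the socket path from the "stats socket" line of the configuration and the usual "show table" entry format (key=... use=... exp=... server_id=...); the helper names and that format are assumptions to verify against your HAProxy version, and the script needs sufficient permissions on the socket file.

```python
import re
import socket
from typing import Dict

STATS_SOCKET = "/var/lib/haproxy/stats"  # from "stats socket" in haproxy.cfg

# Assumed entry format: "0x...: key=<ip> use=<n> exp=<ms> server_id=<id>"
_ENTRY = re.compile(r"key=(\S+).*?server_id=(\d+)")


def haproxy_command(command: str, socket_path: str = STATS_SOCKET) -> str:
    """Send one command to the HAProxy stats socket and return its reply.

    Outside interactive ("prompt") mode HAProxy closes the connection after
    answering, so the reply is read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # EOF: HAProxy closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def stick_entries(show_table_output: str) -> Dict[str, int]:
    """Map each source IP in 'show table' output to its server_id."""
    return {m.group(1): int(m.group(2)) for m in _ENTRY.finditer(show_table_output)}
```

For example, stick_entries(haproxy_command("show table FTPVIP")) would map each client IP to the numeric ID of the backend it is pinned to.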
Please note that the name of the HAProxy listen section (FTPVIP) and the path of the stats socket may differ in your environment.