Open standards are awesome, and the File Transfer Protocol (FTP), in spite of its flaws, has been in constant use for an amazing 40 years! FTP can be a pain to run through firewalls and load balancers, so this blog explains how to configure Microsoft FTP and HAProxy to work together.

First a little bit of history about FTP

FTP (RFC 114!) was first implemented at MIT more than 40 years ago, and to this day it is still the standard protocol for file transfers. One of the reasons FTP has lasted so long is that clients are available for every platform you can think of.

FTP runs over two separate channels, known as the command channel and the data channel, both of which are unencrypted. Any data sent over these channels can be intercepted with a MITM (man-in-the-middle) attack using ARP poisoning and packet sniffers. The inventor didn’t build the protocol with security in mind: mainframes were more common, and attacks on information security were not widespread back in the 1970s, when RFC 114 was published. Furthermore, firewalls and load balancers didn’t exist.

That being said, FTP is mostly used for data exchange inside a company’s private network. Although FTP can be secured with SSL/TLS to encrypt the content, username and password in transit, external users tend to prefer SFTP (SSH File Transfer Protocol, sometimes called Secure File Transfer Protocol) over FTP, probably because it runs over a single port and is secure without requiring further configuration and certificates.

FTP can run in two modes: Active and Passive.

Active mode FTP

Active mode is the older of the two modes. Here is a simplified explanation of how it works:

  1. The client connects from a random port to port 21 on the server. It then sends the PORT command, specifying which client-side port the server should connect to for the data channel.

  2. The server connects from port 20 to the client port designated for the data channel. Once the connection is established, file transfers are made through these client and server ports. (Most client-side firewalls will block an inbound connection request like this!)

[Figure: Active mode FTP]
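To make step 1 concrete, here is a minimal Python sketch of how a client encodes its address and data port in the PORT command (the six comma-separated numbers defined in RFC 959). The client address 192.168.1.50 and port 50000 are made-up values for illustration, and port_argument is a hypothetical helper name, not part of any library.

```python
def port_argument(ip: str, port: int) -> str:
    """Encode an IPv4 address and TCP port as h1,h2,h3,h4,p1,p2 for PORT."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The 16-bit port is split into a high byte and a low byte.
    high, low = divmod(port, 256)
    return ",".join(octets + [str(high), str(low)])


# Hypothetical client address and data port, for illustration only.
command = "PORT " + port_argument("192.168.1.50", 50000)
print(command)  # PORT 192,168,1,50,195,80
```

Because the client's own IP address travels inside the command, a client behind NAT will typically advertise a private address that the server cannot reach, which is one reason active mode struggles in real networks.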

Passive mode FTP

Passive mode works in a similar manner to Active mode. However, instead of sending the PORT command, the client sends the PASV command, which asks the server for a port to connect to for data transmission. The FTP server's reply indicates which port it has opened for the data transfer.

Here is a short explanation of how this mode works:

  1. The client connects from a random port to port 21 on the server and issues the PASV command. The server replies, indicating which (random) port it has opened for data transfer.

  2. The client connects from another random port to the random port specified in the server's response. Once connection is established, data transfers are made through these client and server ports.

[Figure: Passive mode FTP]
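The server's answer to PASV uses the same six-number encoding as PORT, so the client has to decode it before it can open the data connection. The sketch below shows that arithmetic in Python. The sample reply text is illustrative (real servers vary the wording), and parse_pasv_reply is a hypothetical helper written for this post.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (host, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port


# Illustrative reply; 39 * 256 + 16 = 10000.
host, port = parse_pasv_reply("227 Entering Passive Mode (192,168,1,110,39,16)")
print(host, port)  # 192.168.1.110 10000
```

The client then opens a second TCP connection to that host and port. Note that the address comes from the server, which matters once a load balancer sits in front of it.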

Let’s get started

Enough introductory information about FTP. For a POC (proof of concept), I have set up a lab using FTP Passive mode in which HAProxy load balances traffic across multiple backend FTP servers running Microsoft Windows Server 2016 Standard.

You want to use passive mode FTP and not active mode FTP:

Why? Because in a real-world scenario you will have a firewall and/or a load balancer in the path, and any kind of Network Address Translation (NAT) blocks the server's connection request back to the client, which breaks active FTP.

You can get Active FTP working, but a big range of ephemeral ports has to be allowed through the client-side firewall, so that the FTP server can connect back to whichever port the client announced in its PORT command. Opening a large number of ports is considered very bad practice for security.

Deploying HAProxy on CentOS and configuring an FTP cluster:

On the load balancer, which runs CentOS Linux release 7.3.1611 (Core), the built-in firewall daemon blocks these ports by default. I opened port 21 and the passive port range 10000 - 10020 required for the FTP traffic (in passive mode the client initiates the data connection, so these ports must be reachable), then reloaded the firewall so the changes take effect.

firewall-cmd --permanent --add-port=21/tcp
firewall-cmd --permanent --add-port=10000-10020/tcp
firewall-cmd --permanent --add-port=8080/tcp    # HAProxy stats page (bind *:8080 in the config below)
firewall-cmd --reload
firewall-cmd --list-ports
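To confirm from a client machine that the ports are actually reachable, a quick TCP connect test can help. The Python sketch below is one way to do it; open_ports is a hypothetical helper, and the check only makes sense once HAProxy is running, because an open firewall port with nothing listening behind it still refuses connections.

```python
import socket


def open_ports(host: str, ports, timeout: float = 2.0) -> list[int]:
    """Return the ports on host that accept a TCP connection."""
    reachable = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            # Refused, filtered or timed out: treat as not reachable.
            pass
    return reachable


# Example call against the VIP used later in this post (not run here):
#   wanted = [21, *range(10000, 10021)]
#   print(sorted(set(wanted) - set(open_ports("192.168.1.123", wanted))))
```

Any port printed by the example call is one that the firewall, or a missing bind line, is still blocking.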

To install HAProxy I used the command yum install haproxy, which installed HAProxy version 1.5.18.

The HAProxy configuration file, /etc/haproxy/haproxy.cfg, is as follows:

global
        daemon
        log         127.0.0.1 local2     # Log configuration
        chroot      /var/lib/haproxy
        pidfile     /var/run/haproxy.pid
        maxconn     4000
        user        haproxy
        group       haproxy
        stats socket /var/lib/haproxy/stats

defaults
        mode                    http
        log                     global
        option                  tcplog
        option                  dontlognull
        retries                 3
        maxconn                 10000
        option                  redispatch
        timeout connect         4000
        timeout client          42000
        timeout server          43000

listen stats
        bind *:8080
        mode http
        option forwardfor
        option httpclose
        stats enable
        stats show-legends
        stats refresh 5s
        stats uri /stats
        stats realm Haproxy\ Statistics
        stats auth loadbalancer:loadbalancer
        stats admin if TRUE

listen FTPVIP
        bind 192.168.1.123:21 transparent
        bind 192.168.1.123:10000-10020 transparent
        mode tcp
        option tcplog
        balance leastconn
        server WinFTPServer2016FTP1 192.168.1.110 check port 21 inter 10s rise 2 fall 2
        server WinFTPServer2016FTP2 192.168.1.111 check port 21 inter 10s rise 2 fall 2
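Since the passive range has to match in three places (firewall, HAProxy and the FTP servers), it can be worth checking the bind lines mechanically. The Python sketch below is a rough sanity check, not a real HAProxy parser: bound_ports is a hypothetical helper that only understands simple "bind address:port" and "bind address:first-last" lines inside a listen section.

```python
def bound_ports(config_text: str, section: str) -> set[int]:
    """Collect the ports bound by `bind` lines in one `listen` section."""
    ports: set[int] = set()
    in_section = False
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if not words:
            continue
        if words[0] in {"global", "defaults", "listen", "frontend", "backend"}:
            in_section = words[0] == "listen" and words[1:2] == [section]
            continue
        if in_section and words[0] == "bind" and len(words) > 1 and ":" in words[1]:
            port_spec = words[1].rsplit(":", 1)[1]
            first, _, last = port_spec.partition("-")
            ports.update(range(int(first), int(last or first) + 1))
    return ports


# Excerpt of the FTPVIP section from the configuration above.
sample = """
listen FTPVIP
        bind 192.168.1.123:21 transparent
        bind 192.168.1.123:10000-10020 transparent
        mode tcp
"""
required = {21, *range(10000, 10021)}
missing = required - bound_ports(sample, "FTPVIP")
print("missing ports:", sorted(missing))  # missing ports: []
```

An empty list means every port the FTP servers may advertise is covered by a bind line.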

Last but not least, one of the issues that I came across when setting up the load balancer was that SELinux is enabled by default and doesn't allow HAProxy to bind to any of the IPs configured. To switch SELinux to permissive mode (until the next reboot), run the command setenforce 0

You must modify the PASV response from the Microsoft FTP server:

On the FTP servers, which run Windows Server 2016 Standard, I made sure that the passive port range matches the one in the HAProxy configuration file. This is because it is not practical or sensible to have a layer 7 load balancer such as HAProxy listening on every available port above 1024 (the default for the FTP protocol).

  1. From the FTP Firewall Support section of the FTP site settings, configure the Data Channel Port Range as 10000 - 10020.

  2. Set the External IP Address of Firewall to the VIP (the virtual IP of the load balanced cluster), i.e. 192.168.1.123. This is the address that the client connects to, and the FTP server must advertise it in its PASV reply, otherwise passive mode will not work.

Then just apply changes and restart the FTP Service.

[Figure: FTP Firewall Support settings showing the passive port range and external IP address]
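Once the service has restarted, you can check from a client that the server now advertises the VIP and a port inside the configured range. The sketch below uses Python's standard ftplib for this; pasv_reply_ok and check_ftp_vip are hypothetical helper names, and the user name and password are whatever account exists on your FTP site.

```python
from ftplib import FTP, parse227


def pasv_reply_ok(reply: str, vip: str, first_port: int, last_port: int) -> bool:
    """True if a 227 reply advertises the VIP and a port in the allowed range."""
    host, port = parse227(reply)  # ftplib's own decoder for PASV replies
    return host == vip and first_port <= port <= last_port


def check_ftp_vip(vip: str, user: str, password: str) -> bool:
    """Log in through the load balancer and inspect the raw PASV reply."""
    with FTP(vip, timeout=10) as ftp:
        ftp.login(user, password)
        reply = ftp.sendcmd("PASV")
        return pasv_reply_ok(reply, vip, 10000, 10020)


# Example call (not run here; credentials are placeholders):
#   print(check_ftp_vip("192.168.1.123", "ftpuser", "secret"))
```

If this returns False, the usual suspects are an External IP Address that still points at the backend server's own address, or a Data Channel Port Range that differs from the HAProxy bind range.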

I hope the above helps, but if you have any questions, feel free to leave a comment below or send me a message.