The Lokad platform is geared toward terabyte-scale data processing and intended for predictive supply chain optimization. However, before data can be crunched, it must first be imported. Most web apps feature web APIs styled as REST, yet Lokad features FTPS and SFTP, which may appear surprising [1]. This choice was not accidental but dictated by data transfer performance requirements. While this choice was made a decade ago based on prior experience, the forces in favor of FTP are still as strong as ever.
Let’s clarify the use case of Lokad. A typical client company needs to transfer a full copy of 10 to 100 database tables, with 5 to 500 fields per table, to its Lokad account. This data represents the content of its ERP, MRP, WMS, POS, etc. Over time, the copies need to stay fresh, typically no more than one day old. The total amount of data involved, assuming uncompressed flat text files (i.e. CSV), ranges from 1 GB for a medium company to 10 TB for a large company. By adopting incremental data transfers for the larger tables, the volume of daily transfers typically ranges from 100 MB to 10 GB.
The client company, or its software vendor, or the integrator of the vendor, must succeed in pushing the relevant data to Lokad. Indeed, in most situations, Lokad never gets IT access to any production system. This is reasonable: for security reasons, third-party analytics should not have direct access to production systems. The client company must remain in control, as a gatekeeper, of the data flowing out of its system. As a bonus, this allows the client to strip out every single bit of personal data, which a third party like Lokad does not even need anyway.
In theory, REST can be made arbitrarily performant. In practice, this is not the case. With REST, reliably transferring 100 MB of fine-grained relational data, on a daily basis, invariably generates massive performance overheads unless a top-notch software engineering team is involved. Two angles must be handled with utmost care: chattiness and retries. Chattiness relates to the number of calls over the network. Retries relate to behaviors needed to cope with transient network failures. Both are difficult, exceedingly so in practice.
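To make the chattiness angle concrete, here is a back-of-the-envelope sketch. The function name and every figure in it (100 bytes per row, a 50 ms round trip, a 10 MB/s link, strictly sequential calls) are illustrative assumptions, not measurements from Lokad.

```python
def transfer_seconds(rows: int, rows_per_call: int, round_trip_s: float,
                     bytes_per_row: int, bandwidth_bytes_per_s: float) -> float:
    """Rough wall-clock estimate for sequential calls: latency plus payload time."""
    calls = -(-rows // rows_per_call)  # ceiling division
    latency = calls * round_trip_s     # one network round trip per call
    payload = rows * bytes_per_row / bandwidth_bytes_per_s
    return latency + payload


# Assumed scenario: 100 MB as 1,000,000 rows of 100 bytes each,
# a 50 ms round trip and a 10 MB/s link.
one_row_per_call = transfer_seconds(1_000_000, 1, 0.05, 100, 10_000_000)
paged_by_1000 = transfer_seconds(1_000_000, 1_000, 0.05, 100, 10_000_000)
single_file = transfer_seconds(1_000_000, 1_000_000, 0.05, 100, 10_000_000)

for label, seconds in [("one row per call", one_row_per_call),
                       ("pages of 1,000 rows", paged_by_1000),
                       ("single flat file", single_file)]:
    print(f"{label}: {seconds:,.2f} s")
```

Under these assumptions, one call per row spends close to 14 hours waiting on round trips, while a single flat file takes about 10 seconds. The payload is identical in all three cases. Only the number of calls changes.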
In the enterprise space, most software vendors can’t even get paging right for their own APIs. The proposition that the same vendors, while rolling out a quick-and-dirty integration of Lokad, are suddenly going to devote top-notch software engineering talent is not reasonable. My experience at Lokad indicates that, on the contrary, rushed IT jobs are the best that Lokad can expect. This is fine. There are tons of battles to be fought IT-wise. Realistically, the data integration of Lokad cannot be expected to be the one battle that requires bringing out the elite forces.
In practice, when pressured to obtain a decent performance, web APIs devolve into flat file transfers over HTTP. This devolution addresses the ‘chattiness’ angle mentioned above. However, it still leaves open the ‘retry’ angle. Once the ‘retry’ angle is addressed as well, congratulations, the API has finally re-invented a File Transfer Protocol (aka FTP).
However, this ad-hoc FTP is nowhere near as instrumented as the real thing. FTP is supported by all major operating systems. Both FTPS and SFTP have massive open source tooling support. The relevant implementations have been production-grade for decades. True, there are quirks, and those protocols may not be “fashionable” anymore, but when it comes to getting the job done within hours without involving rockstar engineers, an ad-hoc web API does not even compare.
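As an illustration of how little code the retry angle requires once a standard protocol is used, here is a minimal sketch of an FTPS upload with retries, built on `ftplib` from the Python standard library. The helper names `ftps_upload` and `with_retries`, the host, the credentials and the file names are all hypothetical placeholders, and the linear back-off is an arbitrary choice rather than a description of any actual Lokad integration.

```python
import ftplib
import time
from typing import Callable


def ftps_upload(host: str, user: str, password: str,
                local_path: str, remote_name: str) -> None:
    """Upload one flat file over explicit FTPS (FTP over TLS)."""
    with ftplib.FTP_TLS(host, timeout=60) as ftps:
        ftps.login(user, password)
        ftps.prot_p()  # encrypt the data channel, not only the control channel
        with open(local_path, "rb") as fp:
            ftps.storbinary(f"STOR {remote_name}", fp)


def with_retries(action: Callable[[], None], attempts: int = 3,
                 delay_s: float = 5.0,
                 transient: tuple = ftplib.all_errors) -> int:
    """Run `action`, retrying on errors; return the number of attempts used.

    Simplification: ftplib.all_errors also covers permanent failures such as
    a rejected login, which a production job would not retry.
    """
    for attempt in range(1, attempts + 1):
        try:
            action()
            return attempt
        except transient:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s * attempt)  # linear back-off between attempts
    raise ValueError("attempts must be at least 1")
```

A nightly job could then call, with placeholder values, `with_retries(lambda: ftps_upload("files.example.com", "user", "secret", "orders.csv", "orders.csv"))`. Because each attempt re-sends the whole file under the same remote name, a retry after a partial upload simply overwrites the incomplete copy.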
This is why Lokad has adopted FTP to support both inbound and outbound data transfers. Furthermore, a decade of operations has proven that this choice was the correct one. Even low-skill, low-cost, outsourced IT companies do succeed at transferring large amounts of relational data quickly and reliably to Lokad through FTP, which never happened, not even once, during all the years that we featured a web API.
[1] Throughout this article, for the sake of concision, FTP always refers to “FTPS and SFTP”. Those two protocols are quite different, but for the sake of this discussion, those differences are irrelevant. Favoring one protocol over the other is mostly a matter of alignment with pre-existing IT practices within the company of interest.