With the advent of cloud computing, a little more than a decade ago, it has become straightforward to acquire computing resources on demand (storage, compute, network) at pretty much any scale, as long as one is willing to pay for them. Yet, while it is straightforward to perform large-scale calculations on the cloud computing platform of your choice, that does not imply that those calculations will be worth their cost.

Data mountains

At Lokad, we do not charge our clients per GB of storage or per CPU hour. Instead, when clients opt for our professional services, the primary driver of our pricing is the complexity of the supply chain challenge to be addressed in the first place. Naturally, we do factor into our prices the computing resources needed to serve our clients, but ultimately, every euro that we spend on Microsoft Azure (spending-wise, we did become a “true” enterprise client) is a euro that we cannot spend on R&D or on the Supply Chain Scientist who is taking care of the account.

Thus, the Lokad software platform has been designed under the guiding principle that we should be as lean as possible in terms of computing resources[1]. The challenge is not to process 1 TB of data, which is easy, but to process 1 TB of data as cheaply as possible. This led us to a series of somewhat “unusual” decisions while designing Lokad.

Diff execution graphs. Supply Chain Scientists, like other data scientists, typically don’t write hundreds of lines of code at once before probing their code against the data. The process is typically highly incremental: add a few lines, crunch the data, rinse and repeat. These iterations are required because the results obtained from the data frequently guide what the data scientist will do next. Yet, most data science tools (e.g. NumPy or R) re-compute everything from scratch whenever the script is re-executed. In contrast, Envision performs a diff over successive execution graphs. Unlike a traditional diff, which finds differences between files, our diff finds differences between compute graphs: it identifies the new compute nodes, which still need to be computed. For all the other nodes, results have already been computed and get “recycled” instead. For the Supply Chain Scientist, diffing the execution graphs looks like an ultra-fast run where terabytes of data get crunched in seconds (hint: Lokad did not crunch terabytes in seconds, only the few hundred megabytes which differed from one script to the next).
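The recycling idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Envision's implementation: each node is identified by a hash of its source text (the hypothetical `label`) combined with the keys of its inputs, and nodes are assumed to be deterministic. A node whose code and upstream inputs are unchanged therefore maps to a key already present in the cache, while an edited statement changes its own key and the keys of everything downstream. The names `node_key`, `run` and the graph layout are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical node layout: (label, function, input node names).
# The label stands for the source text of the statement that defines the node.
Node = Tuple[str, Callable[..., object], List[str]]
Graph = Dict[str, Node]


def node_key(name: str, graph: Graph, memo: Dict[str, str]) -> str:
    """Content hash of a node: its label plus the keys of all its inputs."""
    if name not in memo:
        label, _, inputs = graph[name]
        h = hashlib.sha256(label.encode("utf-8"))
        for dep in inputs:
            h.update(node_key(dep, graph, memo).encode("ascii"))
        memo[name] = h.hexdigest()
    return memo[name]


def run(graph: Graph, cache: Dict[str, object]) -> Tuple[Dict[str, object], int]:
    """Evaluate all nodes, recycling cached results for unchanged keys.
    Returns the results by node name and how many nodes were actually computed."""
    keys: Dict[str, str] = {}
    results: Dict[str, object] = {}
    computed = 0

    def evaluate(name: str) -> object:
        nonlocal computed
        if name not in results:
            key = node_key(name, graph, keys)
            if key not in cache:  # new node in the diff: compute it
                _, fn, inputs = graph[name]
                cache[key] = fn(*(evaluate(dep) for dep in inputs))
                computed += 1
            results[name] = cache[key]  # otherwise the cached result is recycled
        return results[name]

    for name in graph:
        evaluate(name)
    return results, computed


# Usage: three successive versions of a script sharing one cache.
cache: Dict[str, object] = {}
v1: Graph = {
    "sales": ("read sales", lambda: [3, 5, 2, 8], []),
    "total": ("sum(sales)", lambda s: sum(s), ["sales"]),
}
_, n1 = run(v1, cache)  # both nodes are new
v2: Graph = dict(v1, peak=("max(sales)", lambda s: max(s), ["sales"]))
res2, n2 = run(v2, cache)  # only "peak" is new
v3: Graph = dict(v2, sales=("read sales (edited)", lambda: [1, 4], []))
res3, n3 = run(v3, cache)  # "sales" changed, so everything downstream is recomputed
```

In the second run only one node out of three is evaluated; in the third, editing the source node invalidates the whole chain that depends on it, which is exactly the behavior a content-addressed cache gives for free.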

Domain-driven datatypes. Probabilistic forecasting is a game-changing approach for supply chain: let’s consider all possible futures, instead of electing one future as if it were guaranteed to come to pass. Unfortunately, processing probability distributions requires an algebra of distributions that involves non-trivial computations. Thus, we invested significant effort into optimizing this algebra to perform large-scale operations over random variables at minimal CPU cost. In particular, we aggressively exploit the fact that, in supply chain, most random variables represent probabilities over small quantities, typically no more than a few dozen units[2]. Compared to generic approaches intended for, say, scientific computing, this domain-specific angle yields a computing speed-up of two orders of magnitude.
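At its core, such an algebra operates on discrete distributions over small integer quantities. The sketch below is our own minimal illustration, not Lokad's optimized implementation: it stores a distribution densely as a list of probabilities (the hypothetical `Dist` type) and implements the sum of two independent random variables as a discrete convolution. Because supports are only a few dozen units wide, the nested loop stays cheap. The function names are hypothetical.

```python
from typing import List

# A distribution over integer quantities 0, 1, ..., len(p) - 1, stored densely:
# p[k] is the probability of exactly k units. Dense storage is cheap here
# because supply chain quantities are usually small.
Dist = List[float]


def add(x: Dist, y: Dist) -> Dist:
    """Distribution of X + Y for independent X and Y: a discrete convolution."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, px in enumerate(x):
        if px == 0.0:
            continue  # skip empty buckets
        for j, py in enumerate(y):
            out[i + j] += px * py
    return out


def mean(x: Dist) -> float:
    """Expected number of units."""
    return sum(k * p for k, p in enumerate(x))


def prob_at_most(x: Dist, k: int) -> float:
    """P(X <= k), e.g. the probability that k units of stock cover demand X."""
    return sum(x[: k + 1])


# Usage: a slow mover selling 0, 1 or 2 units per day (illustrative numbers).
daily = [0.5, 0.3, 0.2]
two_days = add(daily, daily)         # demand over two independent days
service = prob_at_most(two_days, 2)  # chance that 2 units in stock suffice
```

Here `two_days` works out to [0.25, 0.30, 0.29, 0.12, 0.04] (up to floating-point rounding), so 2 units in stock cover two days of demand with probability 0.84, and the mean doubles from 0.7 to 1.4 units as expected for a sum of two independent variables.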

Defensive scalability. Most programming languages intended for large-scale data processing (e.g. Scala or Julia) offer tremendous capabilities to distribute computations over many nodes. However, this means that every line of code being written has the opportunity to eat up an arbitrarily large amount of computing resources. It takes a lot of engineering discipline to counter the seemingly ever-increasing resource needs of an application as changes make their way into it. In contrast, Envision takes a defensive stance: the language has been crafted to steer Supply Chain Scientists away from writing code that would be tremendously costly to scale. This explains, for example, why Envision has no loops: it is nearly impossible to offer predictable performance at compilation time when the language contains arbitrary loops.
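To see why removing loops helps, consider a toy cost model. The sketch below is our own illustration, not Envision's compiler: each statement of a hypothetical loop-free script is a load, map, aggregate or join, and its cost is measured in rows touched (an assumed, deliberately simplified metric). Because statements may only refer to earlier ones, a single forward pass bounds the cost of the whole script before any data is read. With an arbitrary loop, the iteration count would generally depend on the data, and no such bound could be computed up front. `Stmt` and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stmt:
    """One statement of a hypothetical loop-free script."""
    name: str
    kind: str                    # "load", "map", "aggregate" or "join"
    inputs: Tuple[str, ...] = ()
    rows: int = 0                # only for "load": known size of the source table


def estimate_cost(script: List[Stmt]) -> Dict[str, int]:
    """Bound the work (in rows touched) of every statement before running anything.
    Statements may only read earlier statements, so there are no cycles and
    one forward pass bounds every table size without reading any data."""
    rows: Dict[str, int] = {}
    cost: Dict[str, int] = {}
    for s in script:
        for dep in s.inputs:
            if dep not in rows:
                raise ValueError(f"{s.name} reads {dep!r} before it is defined")
        sizes = [rows[dep] for dep in s.inputs]
        if s.kind == "load":
            rows[s.name], cost[s.name] = s.rows, s.rows
        elif s.kind in ("map", "aggregate"):
            # a row-wise map keeps the size; a group-by can only shrink it
            rows[s.name], cost[s.name] = sizes[0], sizes[0]
        elif s.kind == "join":
            # lookup join: one output row per left row, both tables scanned
            rows[s.name], cost[s.name] = sizes[0], sizes[0] + sizes[1]
        else:
            raise ValueError(f"unknown statement kind: {s.kind}")
    return cost


# Usage: the total is known before execution and can be checked against a budget.
script = [
    Stmt("orders", "load", rows=1_000_000),
    Stmt("items", "load", rows=10_000),
    Stmt("priced", "join", ("orders", "items")),
    Stmt("revenue", "map", ("priced",)),
    Stmt("by_item", "aggregate", ("revenue",)),
]
total = sum(estimate_cost(script).values())
```

For this script the bound is 4,020,000 rows touched. A platform following this design could refuse, at compilation time, any script whose bound exceeds a budget, which is the kind of predictability that arbitrary loops rule out.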

Key-value storage only. Blob Storage[3] is the most bare-metal and cost-efficient data storage approach offered on the cloud, with prices as low as $20 per TB per month or so. Lokad operates directly over Blob Storage (plus local disks for caching); we do not have any relational or NoSQL databases, except the ones we have built ourselves on top of Blob Storage. In practice, our storage layer is deeply integrated with Envision, the language dedicated to Quantitative Supply Chain optimization within Lokad. This allows us to avoid the layers of overhead that traditionally exist at the intersection between the application and its data access layer. Instead of micro-optimizing the friction at the boundaries, we have removed those boundaries altogether.
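To illustrate what a table layer built directly on a key-value store can look like, here is a minimal sketch. Its assumptions are ours, not Lokad's: an in-memory dictionary stands in for the blob container (the hypothetical `InMemoryBlobStore` with `put` and `get`), each column of a table is stored as one blob keyed by `table/column`, and values are packed as 64-bit floats with Python's standard `struct` module.

```python
import struct
from typing import Dict, List


class InMemoryBlobStore:
    """Stand-in for a cloud blob container: opaque bytes addressed by a string key.
    A real deployment would call a cloud storage SDK instead of filling a dict."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def write_column(store: InMemoryBlobStore, table: str, column: str, values: List[float]) -> None:
    """Store one column as a single blob of packed little-endian 64-bit floats."""
    store.put(f"{table}/{column}", struct.pack(f"<{len(values)}d", *values))


def read_column(store: InMemoryBlobStore, table: str, column: str) -> List[float]:
    """Fetch exactly one blob; the other columns of the table are never read."""
    data = store.get(f"{table}/{column}")
    return list(struct.unpack(f"<{len(data) // 8}d", data))


# Usage: a two-column table stored as two blobs.
store = InMemoryBlobStore()
write_column(store, "orders", "qty", [2.0, 1.0, 5.0])
write_column(store, "orders", "price", [9.5, 12.0, 3.25])
qty = read_column(store, "orders", "qty")
```

The columnar layout is an assumption of this sketch; it shows how a computation that needs only `qty` fetches a single blob, with no database engine sitting between the application and its data.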

While achieving lean, scalable data processing may appear to be a mere “technicality”, for sizeable supply chains the IT overhead of crunching terabytes of data is real. Too frequently, the system is either too expensive or too slow, and the friction ends up eating a good part of the benefits the system was intended to generate in the first place. Cloud computing costs are still decreasing, but don’t expect much more than 20% per year. Thus, letting the general progress of computing hardware do its magic isn’t really an option anymore, unless you are willing to delay your data-driven supply chain by another decade or so.
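As a rough worked example, assume prices fall at a steady 20% per year (the upper figure quoted above) and take a tenfold cost reduction as an illustrative target. Starting from a cost $C_0$, the cost after $n$ years and the number of years needed are:

$$C_n = C_0\,(1 - 0.2)^n = C_0 \cdot 0.8^n$$

$$0.8^n = \frac{1}{10} \quad\Longrightarrow\quad n = \frac{\ln 10}{\ln 1.25} \approx \frac{2.303}{0.223} \approx 10.3 \text{ years}$$

Under the same assumption, merely halving the cost takes $\ln 2 / \ln 1.25 \approx 3.1$ years.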

You can also check out the Lokad TV episode we produced on Terabyte Scalability for Supply Chains.

  1. Enterprise software vendors who are selling computing resources typically have a perverse incentive: the more resources are consumed, the more they charge. Two decades ago, IBM was chronically facing this conundrum while charging for MIPS (million instructions per second). This frequently led to situations where IBM had little incentive to fine-tune the performance of its enterprise systems, as doing so only decreased its revenues. The problem largely went away as IBM (mostly) moved away from MIPS pricing. ↩︎

  2. It’s hard to have millions of SKUs, with each SKU associated with millions of inventory movements. If you have millions of SKUs, then most likely the majority of those SKUs are slow movers with few units in and out per month. Conversely, if you have SKUs that move millions of units a day, it’s unlikely that you have more than 100 of those SKUs. ↩︎

  3. Blob Storage on Azure is a simple key-value storage service. Nearly every cloud vendor offers a similar service: Amazon pioneered the domain with its S3 service, and Google refers to this service as Cloud Storage. ↩︎