Ionic file format for Big Data processing

Envision is geared towards processing tabular files in batches. However, while common flat text files (e.g. CSV) are straightforward to generate and to move around, the primary intent of using such formats is simplicity, not high-performance computing. In contrast, the Ionic file format is a columnar storage format intended for Big Data processing. Ionic files can be read and written through Envision scripts. This file format provides significant scalability benefits over flat text files, as well as stronger data typing.

Intended use for Ionic files

Ionic files (associated with the file extension .ion) are intended to support Big Data processing within Lokad. Not only can this type of file be processed much faster by Envision (about 5 to 10 times faster compared to CSV files), but it can also be processed more reliably thanks to stronger data typing. In fact, whenever the files to be processed exceed a few hundred megabytes, we suggest using the Ionic file format. With Ionic, processing even dozens of gigabytes at the same time can be done relatively swiftly.

The Ionic file format is intended as an internal data format restricted to the Lokad platform itself. We expect client companies neither to transfer files to Lokad already formatted as Ionic files, nor to retrieve Ionic data files for consumption in their enterprise systems.

Ionic files are typically generated when preparing the input data. In fact, raw input data obtained from ERP extractions is almost never suitable to be processed “as is” from a business optimization perspective. As a result, the raw input data files need to be transformed into prepared data files, which are better suited for the analysis that follows. These “prepared” data files are prime candidates to be written in the Ionic file format.

Technical overview of the Ionic storage format

Ionic is a columnar storage format. Within a Lokad account, this format appears in the form of binary files associated with the .ion file extension. This format bears similarities to Apache Parquet, although it is moderately simpler. However, Ionic also includes optimized algorithms that are specifically tailored for the predictive analysis frequently performed on the Lokad platform.

Unlike flat text files, Ionic files are strongly typed: each column has a data type, such as text, date or number. Strong typing is an important property that helps prevent data processing mistakes from creeping into a sequence of scripts, in particular when one script reads as input a file previously written as output by another script.
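
To make the benefit of strong typing concrete, here is a minimal Python sketch of a per-column type check applied before data is handed to the next script. It is an illustration only: the schema, the column names and the check_column helper are hypothetical, and nothing here reflects how Ionic or Envision actually enforce types.

```python
from datetime import date

# Hypothetical schema: one declared type per column (text, date, number).
SCHEMA = {"Id": str, "Day": date, "Quantity": float}

def check_column(name, values, schema=SCHEMA):
    """Return the values unchanged if they all match the declared column type."""
    expected = schema[name]
    for index, value in enumerate(values):
        if not isinstance(value, expected):
            raise TypeError(
                f"column {name!r}, row {index}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return values

# A well-typed column passes through untouched.
quantities = check_column("Quantity", [1.0, 2.5, 4.0])

# A number stored as text is caught at the boundary between two scripts,
# instead of silently propagating as it could with a flat text file.
try:
    check_column("Quantity", [1.0, "3"])
    mistake_caught = False
except TypeError:
    mistake_caught = True
```

With a flat text file, the value "3" would be read back as text or as a number depending on how the reading script chooses to parse it; a typed column removes that ambiguity.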

Columnar storage means that each column can be retrieved and read separately. This Ionic file property can help gain a significant boost in processing speed, first because it allows Lokad to only load the columns which are actually needed for a calculation, and second because it facilitates the implicit parallelization of the data processing. This property is particularly useful for supply chain or pricing analytics because tables with dozens, if not hundreds, of columns are frequently encountered in these two fields.
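
The idea of loading only the needed columns can be sketched in a few lines of Python. This is a toy in-memory layout meant to illustrate the principle, not the actual Ionic encoding; the sample table and the to_columns and read_columns helpers are assumptions made for the example.

```python
# Toy row-oriented table, as it would come out of a flat text file.
rows = [
    {"Id": "A1", "Name": "Widget", "Price": 9.5},
    {"Id": "B2", "Name": "Gadget", "Price": 12.0},
    {"Id": "C3", "Name": "Gizmo", "Price": 3.25},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def read_columns(columns, wanted):
    """Return only the requested columns; the other columns are never touched."""
    return {name: columns[name] for name in wanted}

columns = to_columns(rows)

# A calculation on prices only needs the Price column.
needed = read_columns(columns, ["Price"])
total = sum(needed["Price"])
```

Because each column is an independent sequence, separate columns can also be handed to separate workers, which is the property that makes implicit parallelization easier.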

Ionic's binary format provides a degree of lossless data compression. Consequently, Ionic files tend to be significantly smaller than their CSV counterparts, frequently coming relatively close to the size of a GZip-compressed CSV file. The file size reduction also contributes to processing speed because, unlike GZip compression, this binary format is designed to be very fast. The trade-off is that it is not generic: it only applies to tabular data, whereas GZip applies to any kind of data.
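
As a rough illustration of why a typed binary column can be smaller than its text form while remaining lossless, the Python sketch below stores a column of dates as fixed-width integers. The encoding chosen here (4-byte day ordinals) is an assumption made for the example; the source does not describe the actual Ionic encoding.

```python
import struct
from datetime import date, timedelta

# A column of 1000 consecutive dates.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(1000)]

# Text form, as in a CSV column: 10 characters per date plus a separator.
text_bytes = "\n".join(d.isoformat() for d in days).encode("ascii")

# Binary form: each date stored as a 4-byte little-endian day ordinal.
layout = f"<{len(days)}I"
binary_bytes = struct.pack(layout, *(d.toordinal() for d in days))

# Decoding restores exactly the original values, so the encoding is lossless.
decoded = [date.fromordinal(n) for n in struct.unpack(layout, binary_bytes)]

text_size = len(text_bytes)      # 10999 bytes
binary_size = len(binary_bytes)  # 4000 bytes
```

Decoding such a column is a plain fixed-width read with no text parsing involved, which is one reason a format restricted to tabular data can be fast in ways a generic compressor cannot.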

Using Ionic in Envision

From the Envision perspective, the Ionic data format is treated just like any other data format. It is identified by the .ion file extension. Generating a file with the Ionic format is done with the following:
show table "Products" export:"/sample/products.ion" with Id, Name
Then, reading a file with the Ionic format is done with:
read "/sample/products.ion" as Products[*]
Once again, nothing is specific to Ionic files in these statements apart from the .ion file extension.

When using the Ionic data format, it is possible to export distribution vectors into the .ion file.