Usage of the REST API of BotDefender

An online store can protect a price value by sending a request to BotDefender which returns the HTML snippet to be inserted in the HTML page - in place of the original value. This page provides a technical documentation of the REST API that supports this usage.


Authentication

All merchant requests must be authenticated using Basic Authentication with the login/password provided by BotDefender. In order to obtain those credentials, you need to sign up for a Lokad account.

GET stub

This method returns a list of HTML snippets to be inserted in the HTML page of the online store in place of the original price values.

GET https://bdapi.lokad.com/rest/stub/{page}/?prices={price1};{price2};{price3}

where
  • {page} identifies the page where the price is located.
  • {price1};{price2};{price3} is the list of price values to be protected against scraping. It is not advised to request more than 256 prices per call.

The HTML snippets are delimited by line returns (\n).
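
For illustration, here is a minimal sketch of such a request in Python with the requests library; the credentials, the page identifier, the prices, and the get_stubs helper name are placeholders of our own:

    import requests

    # Placeholder credentials; use the login/password provided with your Lokad account.
    LOGIN, PASSWORD = "my-login", "my-password"

    def get_stubs(page, prices):
        """Fetches one HTML snippet per price value for the given page."""
        # The URL is built by hand to keep the ';' delimiters unescaped.
        url = "https://bdapi.lokad.com/rest/stub/{0}/?prices={1}".format(
            page, ";".join(prices))
        response = requests.get(url, auth=(LOGIN, PASSWORD))
        response.raise_for_status()
        # The snippets are delimited by line returns (\n).
        return response.text.splitlines()

    snippets = get_stubs("product-42", ["19.99", "5.50"])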

Page identifier

An identifier of the page is required by BotDefender to ensure that no dictionary attack can be carried out against your online store to reverse engineer your prices. BotDefender is an injective function: two distinct prices will never be represented by the same snippet of HTML. However, in most online stores, there are not that many distinct price values.

Thus, if the same snippet were used across the entire site to represent the same price, an attacker could manually list the Top 50 most frequently occurring snippets and assign values to those HTML snippets, defeating the BotDefender protection.

That's why we require an identifier of the page to be made available to BotDefender: that way, two identical prices located on two distinct pages will not be represented by the same HTML snippet, defeating potential dictionary attacks.

In practice, it's not a requirement to have truly unique identifiers. Indeed, if a few pages share the same identifier, that is not enough to make a dictionary attack worthwhile for an attacker.

Price value

The value can include up to 16 characters comprising numbers, comma(s) or dot(s). The regular expression used to validate the values is: ^([0-9]|\.|\,){1,16}$.

The currency symbol (e.g. $ or €) should not be passed as an argument within the BotDefender request. BotDefender minimizes the impact on protected pages, and from a commerce viewpoint, the currency is not information worth protecting.
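
As an illustration, a price value can be validated and normalized before being sent; a minimal Python sketch, where the currency-stripping step is our own assumption about the merchant-side data:

    import re

    # Validation pattern from the documentation: digits, commas or dots, 16 chars max.
    PRICE_PATTERN = re.compile(r"^([0-9]|\.|\,){1,16}$")

    def prepare_price(raw):
        """Strips the currency symbol (not part of the request) and validates the value."""
        value = raw.strip().strip("$€").strip()
        if not PRICE_PATTERN.match(value):
            raise ValueError("invalid price value: " + repr(raw))
        return value

    print(prepare_price("$19.99"))  # prints: 19.99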

Time dependence

The HTML snippets returned by BotDefender vary over time. Again, this behavior is required to avoid potential dictionary attacks, but it's also necessary to minimize the amount of indirect information leakage.

Indeed, if the HTML snippets did not automatically change over time, an attacker could still monitor price changes (i.e. new prices leading to revised HTML snippets), which is already valuable information, because it would help your competitors to know exactly where they should be looking.

Since the HTML snippets vary over time, from an attacker's perspective all the prices appear to change all the time, providing no information to distinguish the real changes from the protective changes.

Account dependence

The HTML snippets returned by BotDefender vary from one account to another. It's not possible to open a Lokad account and leverage the responses obtained from BotDefender to reverse-engineer the prices of another company.

Recommended usage

As BotDefender touches the front end of an online store, it's very important to minimize its impact on performance. By following the guidelines below, you can achieve a near-zero performance impact, and also safeguard the store against any downtime of BotDefender itself.

Post-render cached lazy load

The usage pattern that we recommend when integrating BotDefender is post-render cached lazy load.

By lazy loading, we refer to a behavior where the HTML snippets are not requested from BotDefender until the containing page is itself requested by a client (either a visitor or a robot). Obviously, if BotDefender were called server-side for every single page served, the performance impact would be significant, so we strongly advise caching the HTML snippets for 24h. The amount of memory consumed for this purpose is minimal (each snippet weighs less than 0.5kB), and should not exceed a few MB even for a very large online store.

While caching is good, the "naive" implementation still degrades performance when the application is starting up, because every page request hits a cold cache, triggering a server-side request to BotDefender. Thus, we strongly suggest using a post-render mechanism. The process goes like this:

  1. When the page is rendered, for every price:
    a. The cache is tested. If the cache contains the BotDefender snippet, then the price value is replaced by its snippet; otherwise the price is immediately injected without any protection.
    b. The result of the cache test is appended to a price list.
  2. After the page is served (HTTP response sent), the server keeps processing the price list.
    a. All the prices that have a snippet counterpart less than 24h old are ignored.
    b. For the other prices (if any), a single request is made to BotDefender, leveraging the multi-price logic.
    c. The response of BotDefender is injected in the cache, and the process terminates.

This mechanism has several very good properties. First, it's non-blocking as far as the HTTP response is concerned: from the client viewpoint, the server never waits for BotDefender, even if the online store has just been rebooted. Second, no more than a single HTTP request toward BotDefender is made for a given page. This aspect is important, because a single page may contain dozens of prices.
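
As an illustration, here is a minimal Python sketch of this post-render pattern; the in-process dictionary stands in for a real cache, and get_stubs refers to the hypothetical helper sketched earlier:

    import time

    CACHE_TTL = 24 * 3600   # the snippets are cached for 24h, as recommended
    _cache = {}             # (page, price) -> (snippet, timestamp)

    def render_price(page, price):
        """Step 1: called while rendering; never blocks on BotDefender."""
        entry = _cache.get((page, price))
        if entry and time.time() - entry[1] < CACHE_TTL:
            return entry[0]   # cache hit: inject the protecting snippet
        return price          # cache miss: inject the raw, unprotected price

    def post_render(page, prices, get_stubs):
        """Step 2: called after the HTTP response is sent; refills the cache."""
        now = time.time()
        stale = [p for p in prices
                 if (page, p) not in _cache
                 or now - _cache[(page, p)][1] >= CACHE_TTL]
        if not stale:
            return
        # A single multi-price call to BotDefender for the whole page.
        for price, snippet in zip(stale, get_stubs(page, stale)):
            _cache[(page, price)] = (snippet, now)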

Some pages will be served unprotected. However, we firmly believe this is a non-issue. Indeed, bad robots - price crawlers - are only a tiny fraction of the overall web traffic. While the probability is hard to quantify, we can reasonably assume that there is a 99% probability that the first client to request a page - hence getting an unprotected version of the page - will NOT be a price crawler, but just a visitor, or a friendly search engine. From a competitive intelligence perspective, being able to extract a few prices once in a while is useless.

Auto-deactivation in case of downtime

While the Lokad team does its best to ensure quasi-zero downtime, we cannot promise no downtime will ever happen for BotDefender. Instead, we believe it's better to plan for downtime from the very start, to ensure that any downtime remains invisible to web visitors.

We recommend checking every 30s whether BotDefender is up and responsive, and immediately - but temporarily - disabling the add-on if it is not the case.

In order to implement this behavior, we suggest introducing an uptime flag stored as a cache value. Whenever a price is about to be replaced by an HTML snippet, the flag is tested; if the flag is not up, then no snippet is inserted, and the raw unprotected price is used instead.

Then, at the post-response stage, assuming the flag is up, any query that fails against BotDefender should change the flag value to down, immediately preventing further price protection. If no call needs to be made, we still suggest checking the age of the last call to BotDefender: if the last call is older than 30 seconds, then we suggest making a new call, for the sole purpose of checking the uptime status. Since this test happens after the HTTP response has been sent, there is no impact on the page retrieval latency.

Finally, if the status is down and the last call to BotDefender is less than 30 seconds old, then we suggest NOT making any further call. Indeed, this behavior ensures that in case of a brownout (degraded performance), the surge of calls trying to refresh the cache does not make the problem worse. Once the last failed call to BotDefender is more than 30 seconds old, another call can be made to BotDefender, resetting the flag to up if the call succeeds.
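
As an illustration, a condensed Python sketch of this flag logic; the module-level dictionary stands in for a flag stored as a shared cache value:

    import time
    import requests

    RETRY_DELAY = 30.0   # seconds between probes while BotDefender is down
    _state = {"up": True, "last_call": 0.0}

    def protect(price, snippet):
        """Inserts the snippet only when the flag is up; raw price otherwise."""
        return snippet if _state["up"] else price

    def post_response(call_botdefender):
        """Post-response stage: call_botdefender is either a regular refresh,
        or a call made for the sole purpose of checking the uptime status."""
        if not _state["up"] and time.time() - _state["last_call"] < RETRY_DELAY:
            return  # down and probed recently: don't pile calls onto a brownout
        try:
            call_botdefender()
            _state["up"] = True       # success resets the flag to up
        except requests.RequestException:
            _state["up"] = False      # any failure disables protection at once
        _state["last_call"] = time.time()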

Like the previous behavior, the auto-deactivation might cause unprotected prices to be sent. This is acceptable, as the downtime of BotDefender is too rare to be of any practical use to price crawlers. The auto-deactivation is first and foremost intended to preserve the user experience of visitors.

CURL options

When making HTTP requests to the BotDefender API, we recommend:

  • to treat any delay greater than 2 seconds as an API error, hence changing the BotDefender flag to down. Indeed, BotDefender must be very fast, otherwise the user experience (in tar pitting mode) would be impacted.
  • to make sure that HTTP compression is enabled. Indeed, when retrieving many snippets at once, most of the content is redundant from one snippet to the next; hence, there are massive gains in retrieving a compressed response.
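
Expressed with the Python requests library used in the sketches above (with libcurl, the equivalent options would be CURLOPT_TIMEOUT and CURLOPT_ACCEPT_ENCODING), those guidelines amount to:

    import requests

    def call_api(url, login, password):
        """One BotDefender call with the recommended options."""
        response = requests.get(
            url,
            auth=(login, password),
            timeout=2.0,   # any delay beyond 2 seconds counts as an API error
            # requests enables gzip by default; the header makes it explicit.
            headers={"Accept-Encoding": "gzip"})
        response.raise_for_status()   # on failure, flip the flag to down
        return response.text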