How does tarpitting work - BotDefender

How does tarpitting work?



Tarpitting is one of the two protection methods offered by BotDefender. In short, tarpitting consists of forcing all visitors (humans and bots alike) to download a tiny extra file from the BotDefender servers before the prices on the originating page can be displayed. This page explains in more detail how tarpitting works.

Flow of web requests

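[Diagram: flow of web requests between the client, the online store and BotDefender, with steps numbered (1) to (4)]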

When a client visits a web page, her browser sends a web request to the online store (1), which returns the HTML (3) to be rendered (i.e. displayed) on the client side.

In the usual unprotected approach, the HTML returned to the client contains all the prices. If the client is not a human but a robot, the prices are readily available to be extracted from the HTML, a process known as scraping.

When BotDefender is in place, the online store does not return the prices directly. Instead, the prices are first replaced by stubs, that is, self-contained snippets of HTML. The online store obtains those stubs by calling the BotDefender API (2).

Finally, when the client renders the protected HTML, the stub triggers a tiny web request toward the BotDefender servers (4) in order to obtain an extra piece of information required to correctly display the prices.
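The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FakeBotDefender` class, the `price` and `bd-stub` markup and the token format are all hypothetical stand-ins, and the real BotDefender API and stub format are not shown on this page.

```python
import re
import secrets
from typing import Dict, Optional

# Hypothetical markup: how the store's templates mark prices (an assumption).
PRICE_PATTERN = re.compile(r'<span class="price">(.*?)</span>')
TOKEN_PATTERN = re.compile(r'data-token="([0-9a-f]+)"')


class FakeBotDefender:
    """In-memory stand-in for the BotDefender service (not the real API)."""

    def __init__(self) -> None:
        self._prices: Dict[str, str] = {}

    def create_stub(self, price: str) -> str:
        """Step (2): the store asks for a stub that hides the price."""
        token = secrets.token_hex(8)  # opaque token, unrelated to the price
        self._prices[token] = price
        return f'<span class="bd-stub" data-token="{token}"></span>'

    def resolve(self, token: str) -> Optional[str]:
        """Step (4): the client asks for the missing piece of information."""
        return self._prices.get(token)


def protect_html(html: str, service: FakeBotDefender) -> str:
    """Replace clear-text prices with stubs before the HTML is returned (3)."""
    return PRICE_PATTERN.sub(lambda m: service.create_stub(m.group(1)), html)


service = FakeBotDefender()
page = '<p>Blue mug: <span class="price">12.90 EUR</span></p>'
protected = protect_html(page, service)

# The protected HTML no longer contains the price, only a stub.
print(protected)

# The client side needs one extra request to display the price.
token = TOKEN_PATTERN.search(protected).group(1)
print(service.resolve(token))
```

In this sketch, a scraper that only reads the HTML returned in step (3) sees the stub but not the price; the price only becomes available through the extra request of step (4).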

Identifying robots

With this setup, each visitor, human or robot, needs to request a piece of content from BotDefender in order to gain access to the prices. This gives BotDefender the opportunity to deny that information to abusive visitors.

The exact technology used by Lokad to tell humans and robots apart is deliberately not disclosed, because it would only help attackers (i.e. the people running scrapers) design robots that are better at evading our detection.

Technical FAQ

Is it going to have an impact on my Google ranking? No. Google keeps indexing your pages, minus the prices, which are no longer visible to the crawler, but this aspect is irrelevant for SEO. Technically, one of our first steps to protect your prices is to serve a robots.txt file on the BotDefender sub-domain bdapi.lokad.com that disallows all crawlers, including Google. This only affects a very specific part of your website, namely the prices themselves, and not the rest of your pages.
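To make the robots.txt point concrete, here is a small check using Python's standard `urllib.robotparser` module. The robots.txt content and the stub URL path below are assumptions made for illustration; only the host name bdapi.lokad.com comes from the text above.

```python
from urllib.robotparser import RobotFileParser

# Assumed content of the robots.txt served on the BotDefender sub-domain:
# a blanket rule that disallows every crawler.
BDAPI_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BDAPI_ROBOTS_TXT.splitlines())

# Hypothetical stub URL; the path is made up for this example.
stub_url = "https://bdapi.lokad.com/some-stub"

# A well-behaved crawler such as Googlebot must not fetch the stub content.
print(parser.can_fetch("Googlebot", stub_url))  # False
```

The rule lives on the BotDefender sub-domain only, so the robots.txt of the store itself, and therefore the indexing of its pages, is left as it was.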

Is it going to slow down the performance of my store? No. While the schema above shows a call (2) from the online store to BotDefender, in practice the stub returned by BotDefender is cached, that is, kept locally in the memory of your server for a relatively long period; we suggest 24h. Thus, for the overwhelming majority of pages served by your store, no request is made to BotDefender.
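The caching described here could look like the following in-memory cache with a 24h time-to-live. It is a minimal sketch, not the code shipped with BotDefender: `StubCache`, `fake_fetch_stub` and the `sku-42` key are hypothetical names, and the fetch function stands in for the real API call (2).

```python
import time
from typing import Callable, Dict, List, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24h retention suggested in the text


class StubCache:
    """Keeps stubs in memory so that most page views trigger no API call."""

    def __init__(
        self,
        fetch_stub: Callable[[str], str],
        ttl_seconds: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_stub = fetch_stub
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        # price key -> (stub, time at which the stub was fetched)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, price_key: str) -> str:
        now = self._clock()
        entry = self._entries.get(price_key)
        if entry is not None and now - entry[1] < self._ttl_seconds:
            return entry[0]  # cache hit: no request leaves the store
        stub = self._fetch_stub(price_key)  # cache miss: call (2)
        self._entries[price_key] = (stub, now)
        return stub


api_calls: List[str] = []


def fake_fetch_stub(price_key: str) -> str:
    """Stand-in for the real API call; records how often it is invoked."""
    api_calls.append(price_key)
    return f'<span class="bd-stub" data-key="{price_key}"></span>'


cache = StubCache(fake_fetch_stub)
for _ in range(1000):  # 1000 page views of the same product
    cache.get("sku-42")
print(len(api_calls))  # 1
```

With this design, the cost of call (2) is paid once per price and per 24h window, regardless of how many pages are served in between.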

Is it going to slow down the user experience? Very marginally, and no more than adding a web tracker (like Google Analytics). The stub itself plus its counterpart on the BotDefender server weigh less than 1 kB, that is, less than a really tiny image.

Can a human be tarpitted? Yes, but this is also true for Google Search. Just try to manually run thousands of web searches on Google.com within a few hours (granted, it's a tedious exercise), and you will get temporarily blocked by Google. In short, if a visitor starts to behave like a robot, then she ends up flagged as a robot. However, except for very eccentric users, such limitations remain forever invisible.
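The idea that a visitor who behaves like a robot ends up flagged as one can be illustrated with a simple sliding-window request counter. This is explicitly not the detection technology used by BotDefender, which is not disclosed; the `RequestRateFlagger` class and the thresholds below are arbitrary assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, Dict


class RequestRateFlagger:
    """Flags a visitor who sends too many requests within a time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self._max_requests = max_requests
        self._window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = {}

    def record(self, visitor_id: str, now: float) -> bool:
        """Record one request; return True if the visitor is over the limit."""
        history = self._history.setdefault(visitor_id, deque())
        history.append(now)
        # Forget requests that have fallen out of the sliding window.
        while now - history[0] > self._window_seconds:
            history.popleft()
        return len(history) > self._max_requests


# Arbitrary thresholds: more than 1000 requests within 3 hours.
flagger = RequestRateFlagger(max_requests=1000, window_seconds=3 * 3600)

# A shopper viewing 50 prices, one per minute.
shopper_flagged = any([flagger.record("shopper", now=60.0 * i) for i in range(50)])

# A scraper requesting 5000 prices, one per second.
scraper_flagged = any([flagger.record("scraper", now=1.0 * i) for i in range(5000)])

print(shopper_flagged, scraper_flagged)  # False True
```

A human who manually issued thousands of requests within the window would be flagged by the same rule, which is the trade-off described in the answer above.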