Merjbot

Introducing Merjbot

Merjbot is a collection of tailor-made, distributed, and scalable web crawlers that provide insights about your website. It allows us to discover, analyse, and resolve a wide range of technical issues for you. Our IP addresses can change from time to time, but a reverse DNS lookup on the crawling IP address will show that Merjbot resolves to bots.merj.com.
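
The reverse lookup described above can be scripted. The following is a minimal sketch in Python, assuming that genuine Merjbot hosts resolve to bots.merj.com or a subdomain of it; the helper names (hostname_matches, resolve_ptr, is_merjbot_ip) are hypothetical and not part of any Merj tooling.

```python
import socket
from typing import Callable

# Hostname taken from the text above; matching subdomains too is an assumption.
EXPECTED_SUFFIX = "bots.merj.com"


def hostname_matches(hostname: str, suffix: str = EXPECTED_SUFFIX) -> bool:
    """Return True if hostname is the suffix itself or a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == suffix or host.endswith("." + suffix)


def resolve_ptr(ip: str) -> str:
    """Reverse-resolve an IP address using the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def is_merjbot_ip(ip: str, resolver: Callable[[str], str] = resolve_ptr) -> bool:
    """Check whether the reverse lookup of ip points at the expected host."""
    try:
        hostname = resolver(ip)
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, lookup failure).
        return False
    return hostname_matches(hostname)
```

The resolver is passed in as a parameter so the check can be exercised without network access. A stricter variant would also resolve the returned hostname forward and confirm that it maps back to the same IP address.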

Why is Merjbot helpful?

Merjbot helps us build a detailed understanding of your website by analysing its structure, content changes, new pages, and both internal and external linking structures.

Why Merjbot may have crawled your site

We collect data for a variety of reasons, including:

  1. You are a client of ours and we are working with your internal teams.
  2. We are working for one of your competitors and are conducting competitive intelligence.
  3. We are collecting market data for our reports, legal cases, investor relations or due diligence.

Or, it might be that we're on one of our quests to conduct original and thought-provoking research. We often do this in order to help find interesting solutions to some of the technical community's most difficult and problematic issues.

Useful Information

Bot User Agent Strings

The full user agent strings we use for Merjbot are:

Mozilla/5.0: (compatible; Merjbot/1.0; +https://merj.com/bot)
Mozilla/5.0: (compatible; Googlebot/2.1; +https://merj.com/bot)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://merj.com/bot)
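
Two of the strings above carry a Googlebot token rather than a Merjbot one, so matching on the name "Merjbot" alone would miss them. What all three share is the +https://merj.com/bot URL. A minimal sketch of a log filter built on that observation follows; the function name is hypothetical.

```python
# URL token shared by every user agent string listed above.
MERJ_BOT_URL = "+https://merj.com/bot"


def claims_to_be_merjbot(user_agent: str) -> bool:
    """Return True if the user agent string carries the Merj bot URL token."""
    return MERJ_BOT_URL in user_agent


if __name__ == "__main__":
    sample = (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://merj.com/bot)"
    )
    print(claims_to_be_merjbot(sample))
```

A user agent string is only a claim made by the client and can be spoofed, so a match here is best combined with the reverse lookup against bots.merj.com.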

Like many web crawlers, Merjbot obeys robots.txt files, including disallow and allow rules, unless our research requires us not to.

You can control Merjbot on your website in a variety of ways.

Blocking Merjbot from your site

If you would prefer that Merjbot does not visit your site, you can block it by adding the following lines to your robots.txt file:

User-agent: Merjbot
Disallow: /

Bot Rules

We respect robots.txt where possible, so you can also stop Merjbot from crawling only certain pages or subdirectories of your site. For example:

User-agent: Merjbot
Disallow: /comments/ # Block all comments
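
Before deploying rules like these, it can help to check how a standards-following parser reads them. The sketch below uses Python's standard urllib.robotparser on the two rule sets shown above; the example.com URLs and the build_parser helper are placeholders, and Merjbot's own parser may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Rule sets copied from the examples above.
PARTIAL_BLOCK = """\
User-agent: Merjbot
Disallow: /comments/ # Block all comments
"""

FULL_BLOCK = """\
User-agent: Merjbot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    partial = build_parser(PARTIAL_BLOCK)
    full = build_parser(FULL_BLOCK)
    # example.com is a placeholder domain.
    for url in ("https://example.com/comments/42", "https://example.com/about"):
        print(url, partial.can_fetch("Merjbot", url), full.can_fetch("Merjbot", url))
```

With the partial rule set, only URLs under /comments/ are refused for Merjbot; with the full rule set, every URL is refused. Crawlers not named in the file are unaffected by either.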

Rate Limiting Merjbot

You can also make Merjbot crawl your site at a slower rate by adding a Crawl-delay rule. For example, this sets a 15-second delay between requests:

User-agent: Merjbot
Crawl-delay: 15 # seconds
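
To confirm that a Crawl-delay rule is read as intended, the value can be extracted with Python's standard urllib.robotparser. This is a minimal sketch: the delay_for helper and its default value are hypothetical, and it shows how a generic parser interprets the rule rather than how Merjbot schedules requests internally.

```python
from urllib.robotparser import RobotFileParser

# Rule copied from the example above.
ROBOTS_TXT = """\
User-agent: Merjbot
Crawl-delay: 15 # seconds
"""


def delay_for(robots_txt: str, agent: str, default: float = 0.0) -> float:
    """Return the Crawl-delay in seconds for agent, or default if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    print(delay_for(ROBOTS_TXT, "Merjbot"))
    print(delay_for(ROBOTS_TXT, "OtherBot"))
```

A polite crawler would then wait at least that many seconds between consecutive requests to the same site.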

How many web pages can Merjbot crawl?

Merjbot is built on our distributed scaling language, with the capacity to crawl and process approximately one billion web pages per day (roughly 11,500 requests per second). We scale our resources according to client or research requirements.
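
The per-second figure follows from dividing the daily capacity by the number of seconds in a day. A short sketch of that arithmetic, assuming the load is spread evenly over 24 hours (the function name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(pages_per_day: int) -> float:
    """Average request rate needed to reach pages_per_day at a steady pace."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # One billion pages per day, as quoted above.
    print(round(requests_per_second(1_000_000_000)))  # 11574
```

The exact result is about 11,574 requests per second, which the text rounds to 11,500.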

Why choose Merj?

Using a blend of scientific and data-led methodologies with our experience and in-depth knowledge, we can support the growth of your company with unique insights and strategies for powerful organic search campaigns. We tailor our services to the needs and objectives of each client and are fully flexible in how we work with clients, both in the UK and abroad.

Want to work together?

FAQs

We may be providing your company with various data insights. Alternatively, we could be running competitor analysis or industry research.

The crawlers use both fixed and floating IP addresses. We can direct fixed IP addresses at specific sites to help teams manage Merjbot through firewall routing and by whitelisting it in bot protection tools.

Let us help you solve your digital problems

We help leading organisations to optimise their digital presence the right way, by tailoring software to integrate business and digital processes, so the humans can focus on strategy, while the machines do the heavy lifting.