Getting Forbidden by robots.txt: Scrapy

Question :

While crawling a website, Scrapy reports "Forbidden by robots.txt" and the request fails with:

ERROR: No response downloaded for:

Answer #1:

In the new version (Scrapy 1.1), released 2016-05-11, the crawler first downloads robots.txt and obeys it by default before crawling. To change this behavior, set ROBOTSTXT_OBEY to False in your settings.py.
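As a minimal sketch, this is the one-line change in the project's settings.py (the file `scrapy startproject` generates); it is a config fragment, not a full settings file:

```python
# settings.py -- project settings file created by `scrapy startproject`
# Since Scrapy 1.1 this defaults to True, so robots.txt is fetched and
# obeyed before any other request. Setting it to False disables that check.
ROBOTSTXT_OBEY = False
```

The same key can also be set per spider via the `custom_settings` class attribute if only one spider should ignore robots.txt.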


Here are the release notes

Answered By: Rafael Almeida

Answer #2:

The first thing to check is the user agent your requests send; many sites block Scrapy's default user agent outright.
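A minimal sketch of overriding the user agent project-wide in settings.py; the browser string shown is just an illustrative example, not a required value:

```python
# settings.py -- send a browser-like user agent instead of Scrapy's default
# (example string for illustration; substitute whatever identifies your crawler)
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
```

Alternatively, a `User-Agent` header can be passed per request via the `headers` argument of `scrapy.Request`.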

Answered By: Ketan Patel

Answer #3:

Netflix’s Terms of Use state:

You also agree not to circumvent, remove, alter, deactivate, degrade or thwart any of the content protections in the Netflix service; use any robot, spider, scraper or other automated means to access the Netflix service;

They have their robots.txt set up to block web scrapers. If you override this by setting ROBOTSTXT_OBEY = False in settings.py, you are violating their Terms of Use, which could result in a lawsuit.

Answered By: CubeOfCheese
