Home | Trees | Indices | Help
object --+
         |
        Config.CrawlConfig

Instance Variables
useragent
    User-agent header to use during the crawl - specify a valid User-agent string while crawling.

crawlfrom
    From header to use during the crawl - specify your email address here.

obeyrobotstxt
    Whether to obey or ignore robots.txt - if possible, always obey robots.txt during a crawl.

obeymetarobots
    Whether to obey or ignore robots meta tags - crawler directives may also be specified directly within a page's HTML.

acceptencoding
    Accept-encoding header to use during the crawl - specify whether to accept gzip-compressed content or plain text only.

crawldelay
    Number of seconds (default 120) to wait before crawling the next URL within a website.

crawlscope
    CrawlScope to use while crawling a website.

allowedmimes
    MIME types to accept (default ['text/html']) during the crawl.

allowedextns
    File extensions to accept during the crawl.

levels
    Number of levels (default 2) to crawl deep within a website.

maxcontentbytes
    Upper limit, in bytes, of the page size (default 500 KB) to download during the crawl.

maxretries
    Number of times (default 3) to retry a failed or unavailable URL during the crawl - a URL that is temporarily unavailable may become available again after a while.

retrydelay
    Number of seconds (default 120) to wait before retrying a failed or unavailable URL.

maxcontenttruncate
    Whether to keep the content downloaded up to the maximum page size (default True).

maxcontentdiscard
    Whether to discard a page completely if its size exceeds the maximum page size (default False).
Note: Please refer to Instance Variables section for details on each parameter. |
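The documented parameters and their defaults can be sketched as a plain configuration object. This is a hypothetical illustration only: the class name `CrawlConfigExample`, the sample User-agent string, and the email address are assumptions, not the real `Config.CrawlConfig` API.

```python
# Hypothetical sketch mirroring the documented parameter names and defaults.
# This is NOT the real Config.CrawlConfig class; it only illustrates how the
# settings above fit together.
from dataclasses import dataclass, field

@dataclass
class CrawlConfigExample:
    useragent: str = "MyCrawler/1.0"            # assumed value; use a valid User-agent
    crawlfrom: str = "admin@example.com"        # assumed value; your email address
    obeyrobotstxt: bool = True                  # always obey robots.txt if possible
    obeymetarobots: bool = True                 # honor robots meta tags in HTML
    acceptencoding: str = "gzip"                # accept gzip-compressed content
    crawldelay: int = 120                       # seconds between URLs (documented default)
    levels: int = 2                             # crawl depth (documented default)
    maxcontentbytes: int = 500 * 1024           # 500 KB page-size cap (documented default)
    maxretries: int = 3                         # retry count (documented default)
    retrydelay: int = 120                       # seconds before a retry (documented default)
    allowedmimes: list = field(default_factory=lambda: ["text/html"])
    maxcontenttruncate: bool = True             # keep content up to the size cap
    maxcontentdiscard: bool = False             # do not drop oversized pages entirely

cfg = CrawlConfigExample()
print(cfg.maxcontentbytes)  # 512000
```

Note that `maxcontenttruncate` and `maxcontentdiscard` are complementary: with the defaults, an oversized page is truncated at `maxcontentbytes` rather than discarded outright.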
Generated by Epydoc 3.0beta1 on Sun May 06 20:47:05 2007 | http://epydoc.sourceforge.net