object --+
         |
         Config.CrawlConfig
Instance Variables
useragent - User-Agent header to use during the crawl. Specify a valid User-Agent string.

crawlfrom - From header to use during the crawl. Specify your email address here.

obeyrobotstxt - Whether to obey or ignore robots.txt. If possible, always obey robots.txt during a crawl.

obeymetarobots - Whether to obey or ignore meta robots tags; crawler options are now often specified directly within the HTML.

acceptencoding - Accept-Encoding header to use during the crawl. Specifies whether to accept gzip content or plain text only.

crawldelay - Number of seconds (default 120) to wait before crawling the next URL within a website.

crawlscope - CrawlScope to use while crawling a website.

allowedmimes - MIME types to accept (default ['text/html']) during the crawl.

allowedextns - File extensions to accept during the crawl.

levels - Number of levels (default 2) to crawl deeper within a website.

maxcontentbytes - Upper limit, in bytes, of the page size (default 500 KB) to download during the crawl.

maxretries - Number of times (default 3) to retry a failed or unavailable URL during the crawl. A URL might be temporarily unavailable but become available again after a while.

retrydelay - Number of seconds (default 120) to wait before retrying a failed or unavailable URL.

maxcontenttruncate - Whether to keep the content downloaded up to the maximum page size (default True).

maxcontentdiscard - Whether to ignore a page completely if its size exceeds the maximum page size (default False).
Note: Please refer to the Instance Variables section for details on each parameter.
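The parameters above can be sketched as a simple configuration class. This is a hypothetical illustration, not the actual Config.CrawlConfig implementation: the defaults follow the values documented above, while the placeholder User-Agent, email address, and Accept-Encoding values are assumptions.

```python
class CrawlConfig:
    """Hypothetical sketch of a crawl configuration object.

    Parameter names and defaults follow the documentation above; the
    real Config.CrawlConfig API may differ.
    """

    def __init__(self,
                 useragent='MyCrawler/1.0',    # valid User-Agent string (assumed value)
                 crawlfrom='you@example.com',  # From header: your email (assumed value)
                 obeyrobotstxt=True,           # obey robots.txt if possible
                 obeymetarobots=True,          # obey meta robots tags in the HTML
                 acceptencoding='gzip',        # accept gzip content or plain text only
                 crawldelay=120,               # seconds between URLs (documented default)
                 crawlscope=None,              # CrawlScope object; type not documented here
                 allowedmimes=None,            # documented default is ['text/html']
                 allowedextns=None,            # file extensions to accept
                 levels=2,                     # crawl depth (documented default)
                 maxcontentbytes=500 * 1024,   # 500 KB page-size cap (documented default)
                 maxretries=3,                 # retries for unavailable URLs (documented default)
                 retrydelay=120,               # seconds before a retry (documented default)
                 maxcontenttruncate=True,      # keep content downloaded up to the cap
                 maxcontentdiscard=False):     # or discard oversized pages entirely
        self.useragent = useragent
        self.crawlfrom = crawlfrom
        self.obeyrobotstxt = obeyrobotstxt
        self.obeymetarobots = obeymetarobots
        self.acceptencoding = acceptencoding
        self.crawldelay = crawldelay
        self.crawlscope = crawlscope
        # Avoid mutable default arguments: fall back to documented defaults here.
        self.allowedmimes = allowedmimes if allowedmimes is not None else ['text/html']
        self.allowedextns = allowedextns if allowedextns is not None else []
        self.levels = levels
        self.maxcontentbytes = maxcontentbytes
        self.maxretries = maxretries
        self.retrydelay = retrydelay
        self.maxcontenttruncate = maxcontenttruncate
        self.maxcontentdiscard = maxcontentdiscard


# Usage: override only the parameters you need; the rest keep their defaults.
cfg = CrawlConfig(useragent='TestBot/0.1', levels=3)
```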
Generated by Epydoc 3.0beta1 on Sun May 06 20:47:05 2007 - http://epydoc.sourceforge.net