Module ruya :: Class CrawlScope

Class CrawlScope

object --+
         |
        CrawlScope

Ruya's configuration object that determines which scope is used for a website while crawling.

Instance Methods

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods
boolean
isvalidscope(cls, scope)
Checks whether the scope is valid, i.e. one of the allowed scopes in CrawlScope.
Class Variables
  SCOPE_ALL = 100000
NOT SUPPORTED - Maybe next version?
  SCOPE_HOST = 100001
For the URL http://domain.ext/support/index.htm, crawl only pages under the host http://domain.ext/
  SCOPE_DOMAIN = 100002
For the URL http://domain.ext/support/index.htm, crawl pages from the domain domain.ext and its sub-domains: http://domain.ext/, http://second.domain.ext/, http://third.domain.ext/
  SCOPE_PATH = 100003
For the URL http://domain.ext/support/index.html, crawl only pages under the folder http://domain.ext/support/
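
The scope descriptions above can be illustrated with a small sketch. This is not Ruya's own code: `in_scope` is a hypothetical helper written for this example (in Python 3), and only the three constant values are taken from the listing above. It also assumes the seed URL's host is the bare domain (as in `domain.ext`), so SCOPE_DOMAIN is approximated by a simple suffix match on the host name.

```python
from urllib.parse import urlsplit

# Values copied from the class variables listed above.
SCOPE_HOST = 100001
SCOPE_DOMAIN = 100002
SCOPE_PATH = 100003


def in_scope(seed_url, candidate_url, scope):
    """Hypothetical helper (not part of Ruya): would candidate_url be
    crawled when starting from seed_url under the given scope?"""
    seed = urlsplit(seed_url)
    cand = urlsplit(candidate_url)
    seed_host = (seed.hostname or "").lower()
    cand_host = (cand.hostname or "").lower()

    if scope == SCOPE_HOST:
        # Same host only; sub-domains are excluded.
        return cand_host == seed_host
    if scope == SCOPE_DOMAIN:
        # Assumption: the seed host is the bare domain, so any host
        # ending in ".<seed host>" counts as a sub-domain.
        return cand_host == seed_host or cand_host.endswith("." + seed_host)
    if scope == SCOPE_PATH:
        # Folder of the seed URL, e.g. "/support/" for "/support/index.html".
        folder = seed.path.rsplit("/", 1)[0] + "/"
        return cand_host == seed_host and cand.path.startswith(folder)
    raise ValueError("unsupported scope: %r" % (scope,))
```

For example, with the seed http://domain.ext/support/index.html, this sketch accepts http://second.domain.ext/ under SCOPE_DOMAIN but rejects it under SCOPE_HOST and SCOPE_PATH.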
Properties

Inherited from object: __class__

Method Details

isvalidscope(cls, scope)
Class Method

Checks whether the scope is valid, i.e. one of the allowed scopes in CrawlScope.
Parameters:
  • scope (number) - The crawl scope to check.
Returns: boolean
True if the crawl scope is valid, False otherwise.
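
A minimal sketch of how such a check could look, for illustration only. The class name `CrawlScopeSketch` is hypothetical and this is not Ruya's actual source. The constant values come from the class variables listed on this page. Whether the real method accepts SCOPE_ALL (documented as not supported) is not stated here; the sketch assumes it does.

```python
class CrawlScopeSketch(object):
    """Hypothetical stand-in for CrawlScope, for illustration only."""

    SCOPE_ALL = 100000     # documented as NOT SUPPORTED
    SCOPE_HOST = 100001
    SCOPE_DOMAIN = 100002
    SCOPE_PATH = 100003

    @classmethod
    def isvalidscope(cls, scope):
        # Assumption: every SCOPE_* constant counts as valid,
        # including SCOPE_ALL.
        return scope in (cls.SCOPE_ALL, cls.SCOPE_HOST,
                         cls.SCOPE_DOMAIN, cls.SCOPE_PATH)
```

Because it is a class method, the check can be called on the class itself without creating an instance, for example `CrawlScopeSketch.isvalidscope(CrawlScopeSketch.SCOPE_HOST)`.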