object --+
         |
        Uri
url: URL with the query string removed.
hash: SHA hash for the URL.
parts: Returns a tuple consisting of the various parts of a URL.
domainurl: Returns the domain found after analyzing the URL.
robotstxturl: Returns the robots.txt path for a URL.
domains: Returns the valid domains found after analyzing the URL.
hashes: Returns the valid SHA hashes for the URL string.
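The `url` and `hash` properties can be approximated as follows. This is a minimal sketch, assuming Python 3's `urllib.parse` and `hashlib` (the 2007 original would have targeted Python 2). The helper names `strip_query` and `url_hash` are hypothetical and not part of `Uri`; SHA-1 as the "SHA" variant, and which form of the URL gets hashed, are also assumptions.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Hypothetical helper: return url with only its query string removed."""
    parts = urlsplit(url)
    # Keep scheme, host, path and fragment; blank out the query component.
    return urlunsplit(parts._replace(query=""))


def url_hash(url: str) -> str:
    """Hypothetical helper: hex digest of url, assuming SHA-1."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


print(strip_query("http://www.example.com/a/b?x=1&y=2"))  # http://www.example.com/a/b
```

Hashing the query-stripped form (for example `url_hash(strip_query(u))`) would make URLs that differ only in their query string collide, which may or may not be what a crawler wants.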
See Also: urlparse
See Also: issamedomain
Note: A sub-domain is determined simply by checking whether example.domain.ext ends in domain.ext.
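The suffix test in the Note can be sketched like this. Both function names are hypothetical. The first function is the plain check as described. The second is a tightened variant, not taken from the source, that only matches on a dot boundary, so that `notdomain.ext` is not mistaken for a sub-domain of `domain.ext`.

```python
def is_subdomain_naive(host: str, domain: str) -> bool:
    # Plain suffix test, as the Note describes.
    return host.endswith(domain)


def is_subdomain(host: str, domain: str) -> bool:
    # Normalise case and any trailing root dot before comparing.
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Match the domain itself, or anything ending in ".domain.ext".
    return host == domain or host.endswith("." + domain)
```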
parts
Returns a tuple consisting of the various parts of a URL.
See Also: urlparse
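The "See Also: urlparse" reference suggests the tuple comes from the standard URL parser. A sketch using Python 3's `urllib.parse.urlparse` (the successor to Python 2's `urlparse` module) is shown below. The helper name `url_parts` is hypothetical, and whether `Uri.parts` returns exactly these six fields is an assumption.

```python
from urllib.parse import urlparse


def url_parts(url: str) -> tuple:
    # urlparse returns a ParseResult named tuple with six fields:
    # (scheme, netloc, path, params, query, fragment)
    return tuple(urlparse(url))


print(url_parts("http://www.example.com/a/b?q=1#frag"))
# ('http', 'www.example.com', '/a/b', '', 'q=1', 'frag')
```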
domainurl
Returns the domain found after analyzing the URL.
robotstxturl
Returns the robots.txt path for a URL. A site at http://domain.ext/ usually places robots.txt in its root, as http://domain.ext/robots.txt.
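The `domainurl` and `robotstxturl` entries can be sketched as follows. The names `domain_url` and `robots_txt_url` are hypothetical. Treating the "domain URL" as `scheme://host/` is an assumption, since the description does not say exactly what is returned. The robots.txt location follows the root-placement rule stated above.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def domain_url(url: str) -> str:
    # Assumption: keep scheme and host (netloc), replace the rest with "/".
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def robots_txt_url(url: str) -> str:
    # Resolving an absolute path replaces the path and drops query/fragment.
    return urljoin(url, "/robots.txt")


print(robots_txt_url("http://domain.ext/some/page.html?x=1"))  # http://domain.ext/robots.txt
```

Using `urljoin` with an absolute path keeps any port in the host part, which matches how crawlers look up robots.txt per scheme, host and port.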
domains
Returns the valid domains found after analyzing the URL. http://www.domain.ext/ and http://domain.ext/ both point to the same domain, domain.ext, so they must be treated as the same. This helps the crawler decide whether two URLs belong to the same domain.
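The www-equivalence rule above, and the same-domain test it supports, might look like the sketch below. `candidate_domains` and `same_domain` are hypothetical names. Only a leading lower-case `www.` is stripped, which is an assumption about how `domains` behaves.

```python
from urllib.parse import urlsplit


def candidate_domains(url: str) -> list:
    # hostname is lower-cased and excludes any port or user info.
    host = urlsplit(url).hostname or ""
    if not host:
        return []
    if host.startswith("www."):
        # www.domain.ext and domain.ext are treated as the same domain.
        return [host, host[len("www."):]]
    return [host]


def same_domain(url_a: str, url_b: str) -> bool:
    # Two URLs share a domain if any of their candidate domains overlap.
    return bool(set(candidate_domains(url_a)) & set(candidate_domains(url_b)))
```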
hashes
Returns the valid SHA hashes for the URL string. Two different hashes are returned if the URL's host starts with www, because http://www.domain.ext/ and http://domain.ext/ both point to the same domain, domain.ext.
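The two-hash behaviour can be sketched as follows. It assumes SHA-1 and hashes the URL string exactly as given. `sha1_hex` and `url_hashes` are hypothetical names, and only a lower-case `www.` prefix on the host is handled.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def url_hashes(url: str) -> list:
    # Always hash the URL as given.
    hashes = [sha1_hex(url)]
    parts = urlsplit(url)
    if parts.netloc.startswith("www."):
        # Also hash the bare-domain form, e.g. http://domain.ext/ for http://www.domain.ext/.
        bare = urlunsplit(parts._replace(netloc=parts.netloc[len("www."):]))
        hashes.append(sha1_hex(bare))
    return hashes
```

With this design, a crawler that records every hash in `url_hashes(u)` as "seen" will skip http://domain.ext/ after visiting http://www.domain.ext/.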
Generated by Epydoc 3.0beta1 on Sun May 06 20:47:05 2007 (http://epydoc.sourceforge.net)