gc3libs.url

Utility classes and methods for dealing with URLs.

class gc3libs.url.Url

Represent a URL as a named-tuple object. Instances are immutable: their fields cannot be changed after creation.

The following read-only attributes are defined on objects of class Url.

Attribute  Index  Value                                 Value if not present
scheme     0      URL scheme specifier                  empty string
netloc     1      Network location part                 empty string
path       2      Hierarchical path                     empty string
query      3      Query component                       empty string
hostname   4      Host name (lower case)                None
port       5      Port number as integer (if present)   None
username   6      User name                             None
password   7      Password                              None
fragment   8      URL fragment (part after #)           empty string
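
To make the relationship between attribute names and indices concrete, here is a minimal sketch using a plain collections.namedtuple. SketchUrl is a hypothetical stand-in, not the gc3libs class; its field order is simply copied from the table above, and all values are supplied by hand.

```python
from collections import namedtuple

# Hypothetical stand-in for illustration only; NOT gc3libs.url.Url.
# Field order is assumed to follow the Index column of the table above.
SketchUrl = namedtuple(
    "SketchUrl",
    ["scheme", "netloc", "path", "query", "hostname",
     "port", "username", "password", "fragment"],
)

u = SketchUrl(
    scheme="http", netloc="www.example.org", path="/data", query="",
    hostname="www.example.org", port=None, username=None, password=None,
    fragment="",
)

# Each field is reachable both by name and by position.
by_name = u.scheme
by_index = u[0]

# Named tuples are immutable: assigning to a field raises AttributeError.
try:
    u.path = "/other"
    rejected = False
except AttributeError:
    rejected = True

# A modified copy can be made instead; the original is left unchanged.
u2 = u._replace(path="/other")
```

Because the object is a tuple, it is also hashable, which is what allows URL objects of this kind to serve as dictionary keys.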

There are two ways of constructing Url objects:

  • By passing a string urlstring:

    >>> u = Url('http://www.example.org/data')
    
    >>> u.scheme == 'http'
    True
    >>> u.netloc == 'www.example.org'
    True
    >>> u.path == '/data'
    True
    

    The default URL scheme is file:

    >>> u = Url('/tmp/foo')
    >>> u.scheme == 'file'
    True
    >>> u.path == '/tmp/foo'
    True
    

    However, if a # character is present in the path name, it will be taken as separating the path from the “fragment”:

    >>> u = Url('/tmp/foo#1')
    >>> u.path == '/tmp/foo'
    True
    >>> u.fragment == '1'
    True
    

    Please note that extra leading slashes ‘/’ are interpreted as the beginning of a network location:

    >>> u = Url('//foo/bar')
    >>> u.path == '/bar'
    True
    >>> u.netloc == 'foo'
    True
    >>> Url('///foo/bar').path == '/foo/bar'
    True
    

    (See RFC 3986, http://tools.ietf.org/html/rfc3986, for the parsing rules.)

    If force_abs is True (default), then the path attribute is made absolute, by calling os.path.abspath if necessary:

    >>> u = Url('foo/bar', force_abs=True)
    >>> os.path.isabs(u.path)
    True
    

    Otherwise, if force_abs is False, then the path attribute stores the passed string unchanged:

    >>> u = Url('foo', force_abs=False)
    >>> os.path.isabs(u.path)
    False
    >>> u.path == 'foo'
    True
    

    Other keyword arguments can specify defaults for missing parts of the URL:

    >>> u = Url('/tmp/foo', scheme='file', netloc='localhost')
    >>> u.scheme == 'file'
    True
    >>> u.netloc == 'localhost'
    True
    >>> u.path == '/tmp/foo'
    True
    

    Query attributes are also supported:

    >>> u = Url('http://www.example.org?foo=bar')
    >>> u.query == 'foo=bar'
    True
    

    and so are fragments:

    >>> u = Url('postgresql://user@db.example.org#table=data')
    >>> u.fragment == 'table=data'
    True
    
  • By passing keyword arguments only, to construct a Url object with exactly those values for the named fields:

    >>> u = Url(scheme='http', netloc='www.example.org', path='/data')
    

    In this form, the force_abs parameter is ignored.
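
The fragment and leading-slash rules described above come from RFC 3986, so they can be cross-checked with the standard library alone. The sketch below uses urllib.parse.urlsplit rather than gc3libs; whether Url delegates to it internally is an assumption, and urlsplit does not apply the file default scheme or force_abs.

```python
from urllib.parse import urlsplit

# Two leading slashes introduce a network location ("authority").
two_slashes = urlsplit("//foo/bar")

# Three leading slashes mean an empty network location followed by
# an absolute path.
three_slashes = urlsplit("///foo/bar")

# A '#' separates the path from the fragment, even without a scheme.
with_fragment = urlsplit("/tmp/foo#1")

# A '?' separates the query component from what precedes it.
with_query = urlsplit("http://www.example.org?foo=bar")
```

The results match the Url examples: netloc 'foo' with path '/bar', an empty netloc with path '/foo/bar', fragment '1', and query 'foo=bar'.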

See also: http://goo.gl/9WcRvR

adjoin(relpath)

Return a new Url, constructed by appending relpath to the path section of this URL.

Example:

>>> u0 = Url('http://www.example.org')
>>> u1 = u0.adjoin('data')
>>> str(u1)
'http://www.example.org/data'

>>> u2 = u1.adjoin('moredata')
>>> str(u2)
'http://www.example.org/data/moredata'

Even if relpath starts with /, it is still appended to the path in the base URL:

>>> u3 = u2.adjoin('/evenmore')
>>> str(u3)
'http://www.example.org/data/moredata/evenmore'

Optional query attribute is left untouched:

>>> u4 = Url('http://www.example.org?bar')
>>> u5 = u4.adjoin('foo')
>>> str(u5)
'http://www.example.org/foo?bar'
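
The appending behavior differs from the standard library's urljoin, which replaces the whole path when the relative part starts with /. The following is a minimal sketch of the semantics described above, written against urllib.parse only; adjoin_sketch is a hypothetical helper operating on strings, not the gc3libs implementation.

```python
import posixpath
from urllib.parse import urljoin, urlsplit, urlunsplit


def adjoin_sketch(url, relpath):
    """Append `relpath` to the path of `url`, leaving other parts alone.

    Hypothetical helper for illustration; not part of gc3libs.
    """
    parts = urlsplit(url)
    base = parts.path or "/"
    # Strip leading slashes so that relpath is always treated as relative.
    new_path = posixpath.join(base, relpath.lstrip("/"))
    return urlunsplit(parts._replace(path=new_path))


appended = adjoin_sketch("http://www.example.org/data/moredata", "/evenmore")
with_query = adjoin_sketch("http://www.example.org?bar", "foo")

# For contrast: urljoin resolves a leading '/' against the site root.
replaced = urljoin("http://www.example.org/data/moredata", "/evenmore")
```

Here appended keeps the existing path and ends in /data/moredata/evenmore, while replaced is http://www.example.org/evenmore.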

class gc3libs.url.UrlKeyDict(iter_or_dict=None, force_abs=False, **extra_kv)

A dictionary class enforcing that all keys are URLs.

Strings and/or objects returned by urlparse can be used as keys. Setting a string key automatically translates it to a URL:

>>> d = UrlKeyDict()
>>> d['/tmp/foo'] = 1
>>> for k in d.keys(): print (type(k), k.path) # doctest:+ELLIPSIS
<class '....Url'> /tmp/foo

Retrieving the value associated with a key works with either the string or the Url form of the key:

>>> d['/tmp/foo']
1
>>> d[Url('/tmp/foo')]
1

Membership tests likewise accept either the string or the Url form:

>>> '/tmp/foo' in d
True
>>> Url('/tmp/foo') in d
True
>>> 'file:///tmp/foo' in d
True
>>> 'http://example.org' in d
False
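
One way to get this kind of lookup is to normalize every key to a canonical, hashable form before touching the underlying mapping. The sketch below illustrates the idea with the standard library only. KeyNormalizingDict and normalize_key are hypothetical names, the canonical form is a urllib.parse SplitResult rather than a Url, and none of this is claimed to be how UrlKeyDict is implemented.

```python
from urllib.parse import SplitResult, urlsplit


def normalize_key(key):
    """Turn a string into a canonical SplitResult (hypothetical helper).

    The 'file' default scheme mirrors the default described for Url.
    """
    if isinstance(key, SplitResult):
        return key
    return urlsplit(key, scheme="file")


class KeyNormalizingDict(dict):
    """Sketch of a dict whose keys are normalized on every access.

    Caveat: dict's own constructor and update() bypass __setitem__,
    so this sketch only covers item assignment, lookup and `in`.
    """

    def __setitem__(self, key, value):
        super().__setitem__(normalize_key(key), value)

    def __getitem__(self, key):
        return super().__getitem__(normalize_key(key))

    def __contains__(self, key):
        return super().__contains__(normalize_key(key))


d = KeyNormalizingDict()
d["/tmp/foo"] = 1
```

Since '/tmp/foo' and 'file:///tmp/foo' normalize to the same tuple, both find the stored value, while 'http://example.org' does not.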

Class UrlKeyDict supports initialization by copying items from another dict instance or from an iterable of (key, value) pairs:

>>> d1 = UrlKeyDict({ '/tmp/foo':'foo', '/tmp/bar':'bar' })
>>> d2 = UrlKeyDict([ ('/tmp/foo', 'foo'), ('/tmp/bar', 'bar') ])
>>> d1 == d2
True

An empty UrlKeyDict instance is returned by the constructor when called with no parameters:

>>> d0 = UrlKeyDict()
>>> len(d0)
0

If force_abs is True, then all paths are converted to absolute ones in the dictionary keys.

>>> d = UrlKeyDict(force_abs=True)
>>> d['foo'] = 1
>>> for k in d.keys(): print(os.path.isabs(k.path))
True
>>> d = UrlKeyDict(force_abs=False)
>>> d['foo'] = 2
>>> for k in d.keys(): print(os.path.isabs(k.path))
False
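
The effect of force_abs on a relative path can be sketched with os.path alone. resolve_path is a hypothetical helper; it only mirrors the documented rule that os.path.abspath is called when necessary, so the result depends on the current working directory.

```python
import os.path


def resolve_path(path, force_abs):
    """Return `path`, made absolute only when `force_abs` is true.

    Hypothetical helper for illustration; not part of gc3libs.
    """
    if force_abs and not os.path.isabs(path):
        # abspath resolves against the current working directory.
        return os.path.abspath(path)
    return path


forced = resolve_path("foo", force_abs=True)
unforced = resolve_path("foo", force_abs=False)
already_abs = resolve_path(os.path.abspath("bar"), force_abs=True)
```

One consequence worth keeping in mind: with force_abs=True, the same relative string can map to different keys if the working directory changes between insertions.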

class gc3libs.url.UrlValueDict(iter_or_dict=None, force_abs=False, **extra_kv)

A dictionary class enforcing that all values are URLs.

Strings and/or objects returned by urlparse can be used as values. Setting a string value automatically translates it to a URL:

>>> d = UrlValueDict()
>>> d[1] = '/tmp/foo'
>>> d[2] = Url('file:///tmp/bar')
>>> for v in d.values(): print (type(v), v.path) # doctest:+ELLIPSIS
<class '....Url'> /tmp/foo
<class '....Url'> /tmp/bar

Retrieving the value associated with a key always returns the URL-type value, regardless of how it was set:

>>> d[1] == Url(scheme='file', netloc='', path='/tmp/foo',
...             hostname=None, port=None, query='',
...             username=None, password=None, fragment='')
True

Class UrlValueDict supports initialization by any of the methods that work with a plain dict instance:

>>> d1 = UrlValueDict({ 'foo':'/tmp/foo', 'bar':'/tmp/bar' })
>>> d2 = UrlValueDict([ ('foo', '/tmp/foo'), ('bar', '/tmp/bar') ])
>>> d3 = UrlValueDict(foo='/tmp/foo', bar='/tmp/bar')

>>> d1 == d2
True
>>> d2 == d3
True
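
For the three initialization forms to give equal dictionaries, each one has to pass through the same value-conversion step. The sketch below shows one way to arrange that by routing the constructor through update and __setitem__. ValueCoercingDict and coerce_value are hypothetical names, values become urllib.parse SplitResult tuples rather than Url objects, and the actual UrlValueDict may be structured differently.

```python
from urllib.parse import SplitResult, urlsplit


def coerce_value(value):
    """Convert a string to a SplitResult; pass parsed values through.

    Hypothetical helper; the 'file' default mirrors the Url default scheme.
    """
    if isinstance(value, SplitResult):
        return value
    return urlsplit(value, scheme="file")


class ValueCoercingDict(dict):
    """Sketch of a dict that converts every stored value."""

    def __init__(self, iter_or_dict=None, **extra_kv):
        super().__init__()
        # Funnel every initialization form through update().
        self.update(iter_or_dict or (), **extra_kv)

    def __setitem__(self, key, value):
        super().__setitem__(key, coerce_value(value))

    def update(self, other=(), **extra_kv):
        # dict() accepts both mappings and iterables of (key, value) pairs.
        for key, value in dict(other, **extra_kv).items():
            self[key] = value


d1 = ValueCoercingDict({"foo": "/tmp/foo", "bar": "/tmp/bar"})
d2 = ValueCoercingDict([("foo", "/tmp/foo"), ("bar", "/tmp/bar")])
d3 = ValueCoercingDict(foo="/tmp/foo", bar="/tmp/bar")
```

All three dictionaries compare equal because every value was converted by the same function before being stored.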

In particular, an empty UrlValueDict instance is returned by the constructor when called with no parameters:

>>> d0 = UrlValueDict()
>>> len(d0)
0

If force_abs is True, then all paths are converted to absolute ones in the dictionary values.

>>> d = UrlValueDict(force_abs=True)
>>> d[1] = 'foo'
>>> for v in d.values(): print(os.path.isabs(v.path))
True
>>> d = UrlValueDict(force_abs=False)
>>> d[2] = 'foo'
>>> for v in d.values(): print(os.path.isabs(v.path))
False