I recently pushed the v0.1.2 release of my new Django application, django-cloud-browser, to PyPI. Cloud Browser is a simple web-based file browser for cloud-based datastores, which so far include Amazon S3, Rackspace Cloud Files (including OpenStack), and the local file system. The application can expose read-only cloud files to users and/or administrators, with configurations for both normal and administrative deployments.
The basic reference points to get started include:
At work, we have some containers with very large numbers of objects – over 5 million and counting. While objects ultimately are named in a flat namespace, we divide up the namespace with slashes (e.g., “path/to/cloud/object/file.txt”). Nearly all cloud providers (Amazon S3, Microsoft Azure, and Rackspace) include functionality in their object APIs to return results grouped around hierarchical “implied” or “pseudo-” directory objects based on a separator (like the slash that we use).
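To make the idea of implied directories concrete, here is a minimal sketch that groups a flat list of object names into pseudo-directories and objects at one level. The function name `list_level` and the sample keys are hypothetical; nothing here calls a real cloud API, and actual providers do this grouping on the server side.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (pseudo_dirs, objects) found directly under `prefix`.

    `keys` is a flat iterable of object names. This is an illustration
    only; it does not talk to any cloud provider.
    """
    dirs, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix: implied directory.
            dirs.add(prefix + head + delimiter)
        elif rest:
            # No delimiter left: a real object at this level.
            objects.append(key)
    return sorted(dirs), sorted(objects)


keys = [
    "path/to/cloud/object/file.txt",
    "path/to/readme.txt",
    "path/other/data.xml",
    "top.txt",
]
print(list_level(keys))               # top level of the namespace
print(list_level(keys, "path/to/"))   # one pseudo-directory down
```

Each call only looks at one level, which is what keeps browsing manageable even when the flat namespace holds millions of names.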
Although Rackspace’s underlying REST and Python APIs both support implying directories from a delimiter like a slash, Rackspace’s own management console will only display objects in a completely flat namespace. As an example, I have an old side project where I stored US patent data in XML form in both Rackspace Cloud Files and Amazon S3, and gave each object a slash-delimited path. (I have obscured my bucket / container names in all of the screenshots in this post.) Let’s view the objects in the Rackspace container using their web interface:
Thus, you have to wade through multiple pages of flat objects to find one you are looking for, and all of the work in sectioning out the object namespace has basically gone to waste.
By contrast, Amazon’s web dashboard for S3 does imply pseudo-directories, as we can see from the analogous view in S3:
Given the hassles with trying to browse our millions of objects in the Rackspace dashboard, I finally decided to bite the bullet and just code up a Django application to do the job (such that I could easily integrate it into our system at work). A few months later, I finally hacked together a first version of the
Django Cloud Browser
At a high level, the Cloud Browser application exposes containers and objects in a cloud datastore to users or administrators. Let’s look at a basic example:
On your system, first install the application:
$ pip install django-cloud-browser
Then, edit your Django “settings.py”. Here, I’m setting up a Rackspace Cloud Files account for browsing:
INSTALLED_APPS = (
    # ...
    'cloud_browser',
)

CLOUD_BROWSER_DATASTORE = "Rackspace"
CLOUD_BROWSER_RACKSPACE_ACCOUNT = "<my_account>"
CLOUD_BROWSER_RACKSPACE_SECRET_KEY = "<my_secret_key>"
Next, add the URLs to your “urls.py”:
urlpatterns = patterns('',
    # ...
    url(r'^cb/', include('cloud_browser.urls')),
)
And start up your Django project! You should be able to click through container and object navigation links, as well as page through results when they exceed the maximum number of objects per page (configurable via CLOUD_BROWSER_DEFAULT_LIST_LIMIT, which defaults to 20 objects). Here is my navigation into the same Rackspace container as before, but now the objects have implied pseudo-directories.
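As a rough illustration of paging with a per-page limit, here is a small sketch. It assumes marker-style paging, where each page starts after the last name seen on the previous page. The function `page` and the sample names are hypothetical, and this is not taken from the Cloud Browser source.

```python
from bisect import bisect_right


def page(names, limit=20, marker=None):
    """Return up to `limit` names that sort strictly after `marker`.

    Assumption: marker-based paging over lexically sorted names.
    """
    ordered = sorted(names)
    start = bisect_right(ordered, marker) if marker is not None else 0
    return ordered[start:start + limit]


names = ["obj%03d" % i for i in range(45)]
first = page(names)
second = page(names, marker=first[-1])
third = page(names, marker=second[-1])
print(len(first), len(second), len(third))
```

With 45 names and the default limit of 20, the three calls return 20, 20, and 5 names respectively.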
Objects that correspond to actual files are downloaded or displayed when clicked. Breadcrumb links are displayed along the top of the page as well.
The application even figures out and sets appropriate headers for content type and content encoding, which means, e.g., that compressed files will be uncompressed on the fly by browsers that support the compression scheme. At this point, it is worth noting that all access through Cloud Browser is read-only, although in the future we will look to enhancements like uploads, metadata retrieval / modification, etc.
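For a feel of how content type and content encoding can be inferred from an object name, here is a sketch using Python’s standard `mimetypes` module. The helper `guess_headers` is hypothetical, and I am not claiming this is how Cloud Browser itself does it; it only shows the general technique.

```python
import mimetypes


def guess_headers(name):
    """Guess response headers from a file name (illustrative helper)."""
    content_type, encoding = mimetypes.guess_type(name)
    headers = {
        # Fall back to a generic binary type when nothing is recognized.
        "Content-Type": content_type or "application/octet-stream",
    }
    if encoding:
        # e.g. "gzip" for names ending in ".gz"
        headers["Content-Encoding"] = encoding
    return headers


print(guess_headers("notes.txt.gz"))
print(guess_headers("notes.txt"))
```

A browser that receives a `Content-Encoding` of `gzip` alongside the underlying content type can decompress the body and render it directly.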
In addition to Rackspace Cloud Files, the application supports Amazon S3 as well as the local file system (with a mock container / objects scheme imposed on real directories and files). Examples:
# AWS S3
CLOUD_BROWSER_DATASTORE = "AWS"
CLOUD_BROWSER_AWS_ACCOUNT = "<my_account>"
CLOUD_BROWSER_AWS_SECRET_KEY = "<my_secret_key>"

# Local filesystem
CLOUD_BROWSER_DATASTORE = "Filesystem"
CLOUD_BROWSER_FILESYSTEM_ROOT = "/usr/share/doc"
White / Black Lists
By default, all containers and all objects are browsable. However, the application supports basic white and black lists at the container level. If a whitelist is specified, only those containers will be browsable.
CLOUD_BROWSER_CONTAINER_WHITELIST = (
    'my_container',
    'more_stuff',
)
If a blacklist is specified, only those containers are excluded from browsing.
CLOUD_BROWSER_CONTAINER_BLACKLIST = (
    'secret_stuff',
    'archive',
)
If both are specified, access is effectively only what is allowed by the whitelist.
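The combined rule can be sketched as a small predicate. This is my reading of the behavior described above, not the application’s actual code: the name `container_permitted` and its parameters are assumptions, and I am assuming that a blacklisted container stays excluded even if it also appears in the whitelist.

```python
def container_permitted(name, whitelist=(), blacklist=()):
    """Return True if `name` may be browsed (illustrative sketch).

    Assumed semantics: the blacklist always excludes; an empty
    whitelist means no whitelist restriction is in effect.
    """
    if name in blacklist:
        return False
    return not whitelist or name in whitelist


white = ("my_container", "more_stuff")
black = ("secret_stuff", "archive")

print(container_permitted("anything"))                     # no lists set
print(container_permitted("my_container", white, black))   # whitelisted
print(container_permitted("other", white, black))          # not whitelisted
print(container_permitted("archive", blacklist=black))     # blacklisted
```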
The settings variable CLOUD_BROWSER_VIEW_DECORATOR can be set to a function (e.g., a decorator) that wraps all browsing views. Setting this to Django’s login_required decorator permits only logged-in users to use the Cloud Browser.
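In “settings.py”, that would look roughly like the following fragment. It assumes the standard `login_required` decorator from `django.contrib.auth.decorators` and a project that already has Django’s auth app configured.

```python
# settings.py (fragment): require login for all Cloud Browser views.
from django.contrib.auth.decorators import login_required

CLOUD_BROWSER_VIEW_DECORATOR = login_required
```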
CLOUD_BROWSER_STATIC_MEDIA_DIR variable to the relative path from
Cloud Browser also allows for integration with the Django admin. Here, we just make a simple change to our “urls.py”:
urlpatterns = patterns('',
    # ...
    # Admin URLs. Note: Include ``urls_admin`` **before** admin.
    url(r'^admin/cb/', include('cloud_browser.urls_admin')),
    url(r'^admin/', include(admin.site.urls)),
)
It is also a really good idea to manually set the “staff required” decorator in “settings.py” to mirror the other admin view restrictions:
from django.contrib.admin.views.decorators import staff_member_required

CLOUD_BROWSER_VIEW_DECORATOR = staff_member_required
And here’s the same view as above in the admin:
The administrative views also have a closable container element (click the +/- box in the upper right-hand corner to toggle):
This work is at a very early stage and will likely change a lot, as there are many unresolved issues: links don’t appear on the dashboard because there are no models, there are no configurable admin permissions for the Cloud Browser views (access is either all or none), etc. But it’s close enough for now.
Cloud Browser provides a very basic cloud file viewer for your Django projects, but it gets the job done, particularly for Rackspace Cloud Files. Feedback and bug reports are strongly encouraged; just open a ticket on the GitHub issues page. Happy browsing.