Wednesday, April 18, 2012

DSE 3.2.0 released

DSE version 3.2.0 has been released, with two fixes:

  • Patch from andornaut@gmail.com for compatibility with Django 1.4.0.
  • Patch from Hervé Cauwelier adding support for models with non-auto primary key fields.

Monday, January 09, 2012

Django-photofile v.0.4.0 released

New in 0.4.0:

Photofile can detect screen resolution using a decorator, like so:

    from django.http import HttpResponseRedirect, HttpResponse
    from photofile.decorators import provide_screen_info

    @provide_screen_info
    def index(request):
        return HttpResponse("%sx%s" % (request.session.get('screen_width'), request.session.get('screen_height')))

You also need to add photofile.urls to your URL configuration:

    from django.conf.urls.defaults import patterns, include, url
    import photofile.urls  # import the submodule explicitly so photofile.urls resolves

    urlpatterns = patterns('',
        url(r'^default\.html$', 'testme.views.index'),
    )
    urlpatterns += photofile.urls.urlpatterns


This also makes it possible for photofile to automatically generate maximized thumbnails depending on the screen resolution:

    {% generate_thumbnail imagefile max %}

Here max is passed as the resolution option.

Wednesday, July 06, 2011

Kolibri v.0.2.0 released - now taking user input for background processing

For the impatient:

PyPI and source. Modified BSD license.

Screencast showing off the user-input-part.

Comments and ideas are highly welcome, especially help in the UI/HTML/CSS department. My design skills suck :-(

Thanks for your attention.

Thomas

Friday, June 24, 2011

DSE v.3.0.0 Beta #1 released!

For the impatient: PyPI and source. Released under the Modified BSD license.

New in the 3.x version of DSE are the bulk_update method, a more intuitive syntax and code clean-up.
NB! The new syntax is not backwards compatible, so existing code using DSE must be updated to work.

New syntax:

    with Person.delayed as d:
        d.insert({'name': 'Thomas', 'age': 36, 'sex': 'M'})
        d.update({'id': 1, 'name': 'John'})
        d.delete(10) # Deletes record with id 10

I hope the syntax is more intuitive and easy to read. Comments wanted.
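
To illustrate the idea behind the with-block, here is a toy sketch of a context manager that queues operations and only hands them over when the block ends. DelayedBuffer and its attributes are hypothetical names made up for this example; this is not DSE's actual code, and discarding the queue when an exception occurs is an assumption of the sketch.

```python
class DelayedBuffer(object):
    """Toy stand-in for the object returned by Person.delayed (hypothetical)."""

    def __init__(self):
        self.pending = []   # operations queued inside the with-block
        self.executed = []  # operations handed over when the block ends

    def __enter__(self):
        return self

    def insert(self, values):
        self.pending.append(('insert', values))

    def update(self, values):
        self.pending.append(('update', values))

    def delete(self, pk):
        self.pending.append(('delete', pk))

    def __exit__(self, exc_type, exc_value, traceback):
        # Flush only if the block finished without an exception (assumption).
        if exc_type is None:
            self.executed.extend(self.pending)
        self.pending = []
        return False  # never swallow exceptions


buffer = DelayedBuffer()
with buffer as d:
    d.insert({'name': 'Thomas', 'age': 36, 'sex': 'M'})
    d.update({'id': 1, 'name': 'John'})
    d.delete(10)
    queued_inside = len(d.pending)  # nothing has been executed yet at this point
```

A real implementation would turn the queued operations into SQL at flush time; the point here is only that nothing is executed until the block is left.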

Bulk update

The bulk update takes a dictionary of values to update and requires a value for the primary key/id of the record, but it uses the Django ORM's own update method instead of plain SQL to reduce the number of statements to execute. This is helpful when your fields can have a limited set of values, like EXIF data from photos or metadata from MP3s.

An example::

    with Photo.delayed as d:
        d.update({'id': 1, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 200})
        d.update({'id': 2, 'camera_model': 'Nikon', 'fnumber': 11, 'iso_speed': 400})
        d.update({'id': 3, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 400})
        d.update({'id': 4, 'camera_model': 'Canon', 'fnumber': 3.5, 'iso_speed': 200})
        d.update({'id': 5, 'camera_model': 'Canon', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 6, 'camera_model': 'Pentax', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 7, 'camera_model': 'Sony', 'fnumber': 3.5, 'iso_speed': 1600})
        # and then some thousand more lines like that

Internally DSE will construct a structure like this::

    bulk_updates = {
        'camera_model': {
            'Nikon': [1, 2, 3],
            'Canon': [4, 5],
            'Pentax': [6],
            'Sony': [7],
        },
        'fnumber': {
            2.8: [1, 3],
            11: [2, 5, 6],
            3.5: [4, 7],
        },
        'iso_speed': {
            200: [1, 4],
            400: [2, 3],
            800: [5, 6],
            1600: [7],
        },
    }

And then execute those statements using::

    # pk = the primary key field for the model, in most cases id
    for field, values in bulk_updates.iteritems():
        for value, ids in values.iteritems():
            model.objects.filter(**{"%s__in" % pk: ids}).update(**{field: value})
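
The bulk_updates structure above can be assembled in a single pass over the update dictionaries. The following is a minimal sketch of that grouping step; group_updates is a hypothetical helper written for this illustration, not a function from DSE.

```python
from collections import defaultdict


def group_updates(rows, pk='id'):
    """Group primary keys by field and value: {field: {value: [pk, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = row[pk]
        for field, value in row.items():
            if field != pk:
                grouped[field][value].append(key)
    # Convert to plain dicts so the result prints and compares cleanly.
    return dict((field, dict(values)) for field, values in grouped.items())


rows = [
    {'id': 1, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 200},
    {'id': 2, 'camera_model': 'Nikon', 'fnumber': 11, 'iso_speed': 400},
    {'id': 3, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 400},
    {'id': 4, 'camera_model': 'Canon', 'fnumber': 3.5, 'iso_speed': 200},
    {'id': 5, 'camera_model': 'Canon', 'fnumber': 11, 'iso_speed': 800},
    {'id': 6, 'camera_model': 'Pentax', 'fnumber': 11, 'iso_speed': 800},
    {'id': 7, 'camera_model': 'Sony', 'fnumber': 3.5, 'iso_speed': 1600},
]
bulk_updates = group_updates(rows)
```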

For huge datasets where the fields take only a limited set of values, this has a big impact on performance. Whether to use update or bulk_update therefore depends on the data you want to process. For instance, importing a contact list where most fields have almost unique values would benefit from the update method, but importing data from photos, ID3 tags from your music collection and so on would run much faster using bulk_update.
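
A rough way to judge which method suits a dataset is to compare the number of statements each approach would issue: one per record for plain updates, one per distinct field/value pair for grouped updates. The sketch below uses made-up data and a hypothetical helper, count_statements; it only counts statements and says nothing about how long each one takes.

```python
def count_statements(rows, pk='id'):
    """Return (per_record, grouped) statement counts for a list of update dicts."""
    per_record = len(rows)
    # One grouped statement is needed per distinct (field, value) pair.
    grouped = len(set((field, value)
                      for row in rows
                      for field, value in row.items()
                      if field != pk))
    return per_record, grouped


cameras = ['Nikon', 'Canon', 'Pentax', 'Sony']
iso_speeds = [200, 400, 800, 1600]

# 10,000 photos whose fields take only a handful of values.
photos = [{'id': i, 'camera_model': cameras[i % 4], 'iso_speed': iso_speeds[i % 4]}
          for i in range(10000)]

# 10,000 contacts whose fields are unique per record.
contacts = [{'id': i, 'email': 'user%d@example.com' % i, 'phone': '555-%04d' % i}
            for i in range(10000)]

photo_counts = count_statements(photos)      # (10000, 8)
contact_counts = count_statements(contacts)  # (10000, 20000)
```

With few distinct values, grouping collapses 10,000 statements into 8. With unique values it doubles the count, which is why the contact list is better served by one update per record.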

Thanks to Cal Leeming [Simplicity Media Ltd] for inspiration on this one :-)

--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org

Tuesday, June 21, 2011

Kolibri released - Asynchronous Processors/Workflow management for django.

For the impatient


Released under the Modified BSD license.

Background

Kolibri is a reusable django app for designing and executing asynchronous processes
and workflows. A workflow is a collection of steps in a defined order,
processing data in each step. A step can break the flow if an exception is
raised, and/or a specified step can be executed to handle a specific exception.
Kolibri uses celery to handle processing in the background. All processors
and workflows can only be started by staff members, but more fine-grained access
control might be implemented in future versions.

The project got started because I needed to control how I added content to a
photo project I'm developing in django. The project involved lots of heavy processes
like thumbnail generation and metadata processing. Adding content consists of steps that
need to be done in a specific order, and I need to control what action to take
if one step throws an exception. I was using celery, but adding a new step or
process was tedious and I wanted a more dynamic way of defining and managing processors.

The current implementation is not stable; it is a proof of concept. Comments are very
welcome, especially on how to monitor the status of celery processes and provide
feedback to the user.

I've even included some screencasts showing it in action:
Screencast #1.
Screencast #2

Features

* asynchronous processes, which can process items/querysets or execute processes not
related to specific models or instances (sending email, scanning filesystems etc)

* connect several processors into workflows, with exception handling, clean-up
steps and an optional fluent interface

* template tags to handle execution of processors/workflows for an item or queryset
in your templates

* admin action integration for your models

* dashboard listing running processors

* a concept of pending processors and a history of what has been processed so you
don't execute unnecessary processors or workflows

* user exclusive processors so two users can execute the same processor at the
same time without touching the same data

* logging and history, with direct link to processed instances

* ajax integration using jquery


Planned features

* better examples, more detailed tutorial and actual documentation in the source

* option of giving a processor a form class to provide input from user prior to
starting process, like username and password for your Flickr-account before publishing photos

* full-blown dashboard with feedback and progress from running processes and some way of killing processes

* nicely formatted logs and history for processed items

* a way of telling users that something is going on with the item they're looking at
(progressbar, growl notification etc.)


Installation

pip install django-kolibri

or

hg clone https://bitbucket.org/weholt/django-kolibri
cd django-kolibri
python setup.py install

* set STATIC_ROOT and STATIC_URL in settings.py
* add 'kolibri' to your installed apps
* add url(r'^kolibri/', include('kolibri.urls')), to your urls.py
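
As a rough sketch, the settings part of the steps above might look like the fragment below. The STATIC_ROOT path and the apps listed before 'kolibri' are placeholders chosen for this example; adjust them to your own project.

```python
# settings.py (fragment); the path and the pre-existing apps are placeholders
STATIC_ROOT = '/var/www/example.com/static/'  # hypothetical path
STATIC_URL = '/static/'

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    # ... your other apps ...
    'kolibri',
)

# urls.py then needs this entry inside urlpatterns:
#     url(r'^kolibri/', include('kolibri.urls')),
```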

It would be smart to read through usage.txt first for a more detailed tutorial or experiment with
the working example project provided in the source, available at bitbucket.


Requirements

* Django
* Celery / django-celery


Example usage

The simplest processor you can define looks something like::

    from kolibri.core import Processor, manager
    from models import Article

    dirty_words = ('foo', 'fudge', 'bar',)

    class RemoveProfanity(Processor):
        model = Article

        def process(self, user, article, **kwargs):
            for dirty_word in dirty_words:
                article.text = article.text.replace(dirty_word,'*'*len(dirty_word))
            article.save()

    manager.register.processor(RemoveProfanity())

It's a very simple processor which replaces every word listed in
dirty_words with asterisks in the text of instances of a model called Article.
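
The replacement logic can be tried on its own, without Kolibri or django. The sketch below pulls it into a plain function; mask_profanity is a name made up for this example. Note that str.replace matches substrings and is case-sensitive, so 'barn' is partly masked while 'Fudge' is left alone.

```python
dirty_words = ('foo', 'fudge', 'bar')


def mask_profanity(text, words=dirty_words):
    """Replace each listed word with the same number of asterisks."""
    for word in words:
        text = text.replace(word, '*' * len(word))
    return text


masked = mask_profanity('What the fudge is a foobar?')
# masked == 'What the ***** is a ******?'
```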

To create a workflow, connecting a series of processors::

    from kolibri.core import manager
    from kolibri.core.workflow import Workflow

    workflow = Workflow('Publish article', model=Article)
    workflow.first(RemoveProfanity()).on_exception(ValueError, DirtyWordRemover()).\
        then(PublishArticle()).then(ArchiveArticle())

    manager.register.workflow(workflow)

Here we create a workflow called "Publish article" for the Article model. First
we remove all profanity using RemoveProfanity. If RemoveProfanity raises
a ValueError, we run the DirtyWordRemover processor. Then we publish
the article using a processor called PublishArticle, and finally we archive it.
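
To show how a fluent interface with per-step exception handlers can work, here is a small self-contained sketch. MiniWorkflow and the step functions are hypothetical and run synchronously on a plain dict; this is not Kolibri's implementation, which runs processors through celery.

```python
class MiniWorkflow(object):
    """Toy workflow with a fluent interface (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # list of [step, {exception_type: handler}]

    def first(self, step):
        self.steps = [[step, {}]]
        return self  # returning self is what allows method chaining

    def then(self, step):
        self.steps.append([step, {}])
        return self

    def on_exception(self, exc_type, handler):
        # Attach the handler to the most recently added step.
        self.steps[-1][1][exc_type] = handler
        return self

    def run(self, item):
        for step, handlers in self.steps:
            try:
                step(item)
            except Exception as exc:
                for exc_type, handler in handlers.items():
                    if isinstance(exc, exc_type):
                        handler(item)
                        break
                else:
                    raise  # no matching handler: the exception breaks the flow
        return item


def remove_profanity(article):
    if 'fudge' in article['text']:
        raise ValueError('dirty word found')


def dirty_word_remover(article):
    article['text'] = article['text'].replace('fudge', '*****')


def publish(article):
    article['published'] = True


def archive(article):
    article['archived'] = True


workflow = (MiniWorkflow('Publish article')
            .first(remove_profanity).on_exception(ValueError, dirty_word_remover)
            .then(publish).then(archive))

article = workflow.run({'text': 'oh fudge'})
```

In this sketch a handled exception lets the workflow continue with the next step, matching the description above; an exception without a registered handler stops the run.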

See the usage.txt document in the source for more examples and in-depth
explanation of features.

Wednesday, May 18, 2011

DSE: 2.0.0-RC1 released - now using BSD license

No change in code, just a new license. No new issues have been reported. I will try to add the announced and planned unit tests related to SQL injection in a few days, but this version of the source is most likely the same as the forthcoming stable 2.0.0.

Monday, May 16, 2011

DSE: When in Rome ..... or "What software license should I use?"

So far I've released all my django-apps using the GPL license. This is not by accident. GPL, FSF, RMS and the whole philosophy of free software was and still is important to me. It was the main motivation for my departure from Windows to Linux about 11-12 years ago.

After posting a question about what license to choose for my django-apps and seeing the amount of debate it generated I recognize that this is a pretty sensitive issue, but it also made it clear, especially after Jacob Kaplan-Moss posted a reply, that there's much more to choosing a software license than just the legal side. It also sends a strong signal into whatever community you're active in, in this case, the django community, where the more permissive BSD license is mostly used. 

So I've decided to release the forthcoming DSE version 2.0 under the same license as django. I'm still a strong believer in free software and the GPL, but I also think it's important to recognize the social values of the community I want to be a part of and contribute to. So my plan is to release all strictly django-related software using the BSD license in the future.

My motivation for releasing my private software development projects as open source has always been selfish; I hope that if my software is any good somebody else will use it too. And then they find bugs, add features and hopefully make those enhancements available for me as well, so we can make the software better - and everybody wins.

If I had to make a living from my private projects my thoughts about this might change, and I respect that people don't agree with me and/or believe they're better off choosing a different licensing scheme. I have a full-time job in a Windows-only, purely proprietary software company paying my bills, so open source is just a hobby.

To wrap this up: a good friend of mine told me that the people who want to contribute and give feedback, post source online, share new ideas etc. will do it no matter what license you choose (at least in most cases; it's obvious that the GPL is something of a repellent for some, even other open source developers). On the other hand, the people who don't respect a certain license or don't contribute to a community won't suddenly do so just because I use a specific license.