Friday, June 24, 2011

DSE v.3.0.0 Beta #1 released!

For the impatient: PyPI, Source. Released under the Modified BSD license.

New in the 3.x series of DSE: the bulk_update method, a more intuitive syntax, and code cleanup.
NB! The new syntax is not backwards compatible, so existing code that uses DSE must be updated to work.

New syntax::

    with Person.delayed as d:
        d.insert({'name': 'Thomas', 'age': 36, 'sex': 'M'})
        d.update({'id': 1, 'name': 'John'})
        d.delete(10) # Deletes record with id 10

I hope the new syntax is more intuitive and easier to read. Comments are welcome.

Bulk update:

bulk_update takes a dictionary of values to update and requires a value for the record's primary key/id, but it uses the Django ORM's own update method instead of plain SQL to reduce the number of statements to execute. Records that share a value for a field are updated together. This helps when your fields can only take a limited set of values, like EXIF data from photos or metadata from MP3s.

An example::

    with Photo.delayed as d:
        d.update({'id': 1, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 200})
        d.update({'id': 2, 'camera_model': 'Nikon', 'fnumber': 11, 'iso_speed': 400})
        d.update({'id': 3, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 400})
        d.update({'id': 4, 'camera_model': 'Canon', 'fnumber': 3.5, 'iso_speed': 200})
        d.update({'id': 5, 'camera_model': 'Canon', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 6, 'camera_model': 'Pentax', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 7, 'camera_model': 'Sony', 'fnumber': 3.5, 'iso_speed': 1600})
        # and then some thousand more lines like that

Internally DSE will construct a structure like this::

    bulk_updates = {
        'camera_model': {
            'Nikon': [1, 2, 3],
            'Canon': [4, 5],
            'Pentax': [6],
            'Sony': [7],
        },
        'fnumber': {
            2.8: [1, 3],
            11: [2, 5, 6],
            3.5: [4, 7],
        },
        'iso_speed': {
            200: [1, 4],
            400: [2, 3],
            800: [5, 6],
            1600: [7],
        },
    }
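
To make the grouping step concrete, here is a minimal sketch of how such a structure could be built from the update dictionaries. This is not DSE's actual code: the function name group_updates and its pk parameter are made up for illustration, and it assumes every value is hashable so it can be used as a dictionary key.

```python
from collections import defaultdict


def group_updates(rows, pk="id"):
    """Group row dicts into {field: {value: [pk, ...]}} (hypothetical helper)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        row_id = row[pk]  # every row must carry its primary key
        for field, value in row.items():
            if field != pk:
                grouped[field][value].append(row_id)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {field: dict(values) for field, values in grouped.items()}


rows = [
    {'id': 1, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 200},
    {'id': 2, 'camera_model': 'Nikon', 'fnumber': 11, 'iso_speed': 400},
    {'id': 3, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 400},
    {'id': 4, 'camera_model': 'Canon', 'fnumber': 3.5, 'iso_speed': 200},
    {'id': 5, 'camera_model': 'Canon', 'fnumber': 11, 'iso_speed': 800},
    {'id': 6, 'camera_model': 'Pentax', 'fnumber': 11, 'iso_speed': 800},
    {'id': 7, 'camera_model': 'Sony', 'fnumber': 3.5, 'iso_speed': 1600},
]
bulk_updates = group_updates(rows)
```

Fed the seven example records, the sketch yields the same field/value/ids mapping as the structure shown above.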

DSE then executes one update per distinct field/value pair::

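    # pk = the name of the model's primary key field, in most cases "id"
    # .items() works on both Python 2 and Python 3
    for field, values in bulk_updates.items():
        for value, ids in values.items():
            # One UPDATE ... WHERE pk IN (ids) per distinct field/value pair
            model.objects.filter(**{"%s__in" % pk: ids}).update(**{field: value})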
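
If you want to see the effect of that loop without a database, the following sketch swaps the ORM call for a plain dictionary acting as the table. The names apply_bulk_updates and table are invented for this illustration; only the loop shape mirrors the snippet above.

```python
def apply_bulk_updates(table, bulk_updates):
    """Apply grouped updates to an in-memory table; return the statement count."""
    statements = 0
    for field, values in bulk_updates.items():
        for value, ids in values.items():
            # Stand-in for a single filter(pk__in=ids).update(...) call:
            # every listed record gets the same value for this field.
            for row_id in ids:
                table[row_id][field] = value
            statements += 1
    return statements


# Three empty records keyed by primary key (an assumption for the demo).
table = {1: {}, 2: {}, 3: {}}
bulk_updates = {
    'camera_model': {'Nikon': [1, 2], 'Canon': [3]},
    'iso_speed': {200: [1, 3], 400: [2]},
}
count = apply_bulk_updates(table, bulk_updates)
```

Each inner iteration corresponds to one statement, so the number of statements depends on how many distinct values each field has, not on how many records there are.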

For huge datasets where the fields can only take a limited set of values, this has a big impact on performance. Whether to use update or bulk_update therefore depends on the data you want to process. Importing a contact list where most fields have almost unique values would benefit from the update method, but importing EXIF data from photos, ID3 tags from your music collection and the like would run much faster using bulk_update.
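
A rough way to judge which method suits a dataset is to compare the number of records with the number of distinct field/value pairs. The sketch below does that; statement_counts is a hypothetical helper, the sample data is invented, and it assumes for simplicity that a plain update costs one statement per record. Statement count is of course only one part of the real cost.

```python
def statement_counts(rows, pk="id"):
    """Return (per_record, grouped) statement counts for a list of row dicts."""
    per_record = len(rows)  # assumption: one statement per record
    distinct = set()
    for row in rows:
        for field, value in row.items():
            if field != pk:
                distinct.add((field, value))  # one grouped update per pair
    return per_record, len(distinct)


# Photo-like data: many records, few distinct values per field.
photos = [
    {'id': i,
     'camera_model': ['Nikon', 'Canon'][i % 2],
     'iso_speed': [200, 400, 800][i % 3]}
    for i in range(1000)
]

# Contact-like data: nearly every value is unique.
contacts = [
    {'id': i, 'name': 'person-%d' % i, 'phone': '555-%04d' % i}
    for i in range(1000)
]

photo_counts = statement_counts(photos)      # (1000, 5)
contact_counts = statement_counts(contacts)  # (1000, 2000)
```

With the photo-like data, grouping needs 5 statements for 1000 records. With the contact-like data it needs 2000, twice as many as updating record by record.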

Thanks to Cal Leeming [Simplicity Media Ltd] for inspiration on this one :-)

--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org
