better schema

This is a set of ongoing thoughts on how to improve the way data is actually stored in zena. There are two main problems to address:

  1. reduce the number of queries per page
  2. allow for more advanced collaborative editing

In the following lines, I’m going to review some thoughts on both these subjects.

fewer queries

The goal is to avoid the N+1 queries needed for a list of N elements (one query for the list of nodes, then one query per node to get its version), or at least to bring this down to 2 queries (one for the nodes, one for the versions).

We could solve the N+1 horror by using Rails’ :include => :version in the node query. That would mean simplifying the version lookup rule so that it can easily be scoped in SQL.

Getting the version from the node is quite complex for the moment and that’s bad. It’s bad because it’s too complex to be intuitive and it prevents us from using :include. The rule is:
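To make the difference concrete, here is a minimal sketch in plain Ruby. It does not use ActiveRecord: `FakeDb` and its methods are hypothetical stand-ins that only count how many queries each strategy would issue.

```ruby
# Toy in-memory "database" that counts queries. Hypothetical, not ActiveRecord.
class FakeDb
  attr_reader :query_count

  def initialize(nodes, versions)
    @nodes = nodes
    @versions = versions
    @query_count = 0
  end

  # One query: the list of nodes.
  def nodes
    @query_count += 1
    @nodes
  end

  # One query: the versions of a single node.
  def versions_for(node_id)
    @query_count += 1
    @versions.select { |v| v[:node_id] == node_id }
  end

  # One query: the versions of many nodes at once (WHERE node_id IN (...)).
  def versions_for_all(node_ids)
    @query_count += 1
    @versions.select { |v| node_ids.include?(v[:node_id]) }
  end
end

NODES = (1..5).map { |i| { id: i } }
VERSIONS = NODES.map { |n| { node_id: n[:id], title: "node #{n[:id]}" } }

# N+1: one query for the list, then one query per node.
def titles_n_plus_one(db)
  db.nodes.map { |n| db.versions_for(n[:id]).first[:title] }
end

# 2 queries: one for the list, one for all versions, matched up in Ruby.
def titles_two_queries(db)
  nodes = db.nodes
  by_node = db.versions_for_all(nodes.map { |n| n[:id] }).group_by { |v| v[:node_id] }
  nodes.map { |n| by_node[n[:id]].first[:title] }
end
```

Both methods return the same titles. Only the number of round trips differs: N+1 for the first, 2 for the second.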

  1. get a redaction (status = 30) from the visitor (user_id = xyz) in the current language (lang = 'en')
  2. get a publication (status = 50) with a preference for the current language (lang = 'en') if the visitor hasn’t got publish rights (!visitor.group_ids.include?(node.pgroup_id)).
  3. get anything that’s not a redaction (status != 30) if the visitor has publication rights on the node.
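As a rough illustration, the three rules could be written as a single plain Ruby function over the versions of one node. This is a sketch, not zena’s actual code: the hash layout, the `pick_version` name and its parameters are assumptions, and so is applying the language preference in rule 3 (the rule above only states it for rule 2).

```ruby
STATUS_REDACTION   = 30
STATUS_PUBLICATION = 50

# versions: array of hashes with :status, :user_id and :lang keys (assumed layout).
# can_publish stands for visitor.group_ids.include?(node.pgroup_id).
def pick_version(versions, visitor_id:, lang:, can_publish:)
  # Rule 1: the visitor's own redaction in the current language.
  own = versions.find do |v|
    v[:status] == STATUS_REDACTION && v[:user_id] == visitor_id && v[:lang] == lang
  end
  return own if own

  candidates =
    if can_publish
      # Rule 3: anything that is not a redaction.
      versions.reject { |v| v[:status] == STATUS_REDACTION }
    else
      # Rule 2: publications only.
      versions.select { |v| v[:status] == STATUS_PUBLICATION }
    end

  # Prefer the current language (assumption: applied to rule 3 as well).
  candidates.min_by { |v| v[:lang] == lang ? 0 : 1 }
end
```

The branching on the visitor and on publish rights is what makes the rule hard to express as a single SQL scope.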

In order to simplify this, we would need to drop some rules. For example, we could decide that redactions and publications are public to all writers (thus merging rules 1 and 3). We might be able to reduce further by properly preparing the query in two groups (one for the nodes with write access and one for the others). We would end up with 3 queries and some merging of the results.

I did some testing and found that the :include in Rails actually does 2 queries, not a single one with mixed columns and a LEFT JOIN. Before deciding anything, we will need to benchmark it all. This 3-step query can actually be slower if there are many versions for each node, since all the versions from all the nodes will be loaded (not just the right one). We could partly reduce this overhead by filtering before ActiveRecord instantiation.
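Filtering before instantiation could look roughly like the sketch below: the raw rows stay plain hashes, one row is chosen per node, and only the chosen rows are turned into objects. The `Version` struct is a hypothetical stand-in for an ActiveRecord model, and the selection criterion is simplified to a language preference.

```ruby
# Stand-in for a model class. Building these is the step we want to limit.
Version = Struct.new(:id, :node_id, :status, :lang)

# rows: raw hashes as they might come back from a plain SELECT (assumed layout).
# Returns { node_id => Version }, instantiating one object per node only.
def instantiate_best(rows, lang)
  rows.group_by { |r| r[:node_id] }.map do |node_id, node_rows|
    # Simplified criterion: prefer the current language.
    best = node_rows.min_by { |r| r[:lang] == lang ? 0 : 1 }
    [node_id, Version.new(*best.values_at(:id, :node_id, :status, :lang))]
  end.to_h
end
```

All rows still travel over the wire, so this only saves the object-building cost, not the query cost.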

dynamic attributes

Dynamic attributes are another issue: they add N more queries. The worst case: you list a bunch of images (node + version + content) with dynamic attributes and end up with 1 + N + N + N = 1 + 3N queries. Bad.
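To put numbers on this worst case, here is a small sketch. The batched variant is a hypothetical alternative (one query per table, whatever the list size), not something zena is stated to do.

```ruby
# Queries needed to list n images when the version, the content and the
# dynamic attributes are each fetched once per node.
def per_node_queries(n)
  1 + n + n + n
end

# Hypothetical batched alternative: one query for the nodes, plus one each
# for versions, contents and dynamic attributes.
def batched_queries
  1 + 3
end
```

For a list of 20 images, `per_node_queries(20)` gives 61 queries, against 4 for the batched variant.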

fuzzy nodes

This is to improve collaborative editing of text. The idea is to be able to:

  1. edit just a paragraph, like they do on Wikipedia, once sections are defined (we could use h2 titles).
  2. change only some attributes (without duplicating everything that does not change)
  3. etc

One option would be to drop the notion of numbered versions (1, 2, 3) and work with timestamps on the elements that make up the node. But this will not work because it’s hard to get “all attributes up to a certain date but only the latest for each key” (a max of min problem).
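To show what that query has to compute, here is the same selection done in memory in plain Ruby. The `attributes_as_of` name and the change-record layout are assumptions for illustration. Doing this in Ruby is easy; the difficulty is expressing it efficiently in SQL, where it becomes a "latest row per group" query.

```ruby
# changes: array of hashes with :key, :value and :at (a Time), one per edit.
# Returns the attribute values as they stood at the given date.
def attributes_as_of(changes, date)
  changes
    .select { |c| c[:at] <= date }   # keep everything up to the date
    .group_by { |c| c[:key] }        # one group per attribute key
    .transform_values { |list| list.max_by { |c| c[:at] }[:value] } # latest per key
end
```

Every change up to the date has to be loaded before the latest one per key can be picked, which is the same overhead as loading all versions of all nodes.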

comments

  1. Tuesday, September 22 2009 22:28 Gaspard Bucher

    Oh, and by the way, one of the motivations behind zena is to solve the horrible problem of having to fully understand your data set before even starting. You build huge tables, but then your client needs something different. That’s why relations in zena can be added and altered easily and are never mandatory, and why the closer we can get to a powerful key/value storage while retaining relations, complex queries and such, the better.

  2. Thursday, October 15 2009 01:09 Gaspard Bucher

    First part (fewer queries) implemented in zena 0.15.
