introduction
For those unfamiliar with zena, we currently store data in the following schema:

One “document” (called node) contains many “versions” with a “title”, “summary” and “text” field. If we need other fields, we either use another table “xxx_contents” (image_content, contact_content) or we use dynamic attributes (key/value pairs).
problems
The current implementation of dynamic attributes has the following problems:
a. cannot do a fulltext search on a specific key
b. bad performance / wrong result set when used in pseudo sql queries
c. scaling problems (produces too many table entries)
d. only support text fields (no dates, integers or floats)
Solving each of these problems requires some adjustments to the way we store dynamic attributes.
going “schema free” in a relational database
I always felt there was a lot of over-engeneering in deciding versions should have a “title”, a “summary” and a “text” field, the rest being dynamic attributes. What I really want is to store a “document” with whatever fields it needs.
The solution to this problem is obvious:
- serialize all document attributes in a blob using
Marshall. - store information on required fields/validations in the definition of the virtual classes
That would give us many advantages:
- we are no longer limited to storing text
- once we have a version, we do not need any more db queries to build the full object
- we could get rid of the “xyz_content” tables
indexing in a “schema free” environment
In order to permit both fulltext search on the document content and SQL queries using the fields, we need to provide a create/update hook. Hopefully, we now have rubyless, so we can use this to build custom indexes for each type of document.
The idea is to:
- compose a field for fulltext search
- generate key/value entries for pseudo sql queries
fulltext composition
fulltext(
:high => [title, "npa:#{zip_code}", "zip:#{zip_conde}"],
:medium => [text],
:low => [remark]
)
The example above would concatenate the title and zip codes into a single “high priority” field. The text would go into the “medium” priority field and the remark in the “low” one. A higher priority field has more importance to compute the result’s score. This example would enable searches such as “zip:14032” or “rails” (we just have to check our indexing engines to see how the split words…).
key/value index
We only build the index entries we need:
property :latitude, Float, :range => (-90..90), :indexed => true
property :longitude, Float, :range => (-90..90), :indexed => true
property :event_date, Time, :nullable => false, :indexed => true
This will produce three entries in the “query helpers” tables (two in the “float_index” table and one in the “time_index” table).
Note that we destroy the entries related to a version when the version is no longer visible (replaced, removed, deleted). This avoids the ever growing size of these index tables.
These tables could have the following schema:
node_id integer
version_id integer
key varchar(40)
value float / integer / datetime / varchar(100) / etc
Now we can build pseudo sql queries like this and the query compiler will use our “number_index” table to find the records:
events where abs(latitude - #{params[:lat}) < 0.2 and abs(longitude - #{params[:long]} < 0.2 limit 5
virtual classes types
For this big idea to work properly, we need to provide type/validation to the fields of our “documents”. This can be done using RubyLess again:
class Event < Note
property :latitude, Float, :range => (-90..90)
property :longitude, Float, :range => (-90..90)
property :zip_code, Integer, :nullable => false
property :text, String
end
Yes, this syntax is inspired by DataMapper :-)
drawbacks
Apart from the coding work implied to implement all this, this solution does have a few drawbacks:
- code complexity to ensure database consistency if the “properties” are altered
- slower writes
abstracting relations
Since we might be using something close to ruby to define attributes, we could as well use the same tool to define relations. Putting it all together makes class definitions look like this:
class Event < Note
habtm :related_events, :class => Event
has_one :icon, :class => Image
property :latitude, Float, :range => (-90..90), :indexed => true
property :longitude, Float, :range => (-90..90), :indexed => true
property :zip_code, Integer, :nullable => false, :indexed => true
property :title, String
property :text, String
fulltext(
:high => [title, "npa:#{zip_code}", "zip:#{zip_conde}"],
:medium => [text],
:low => [remark]
)
end
Now I’m not totally sure this is as easy to implement as it looks, especially the relations definitions.
Since we need some way to link together the has_one :icon from Event to the reverse relation in Image, we might need to add a ‘rel’ clause. In Image:
belongs_to :event, :rel => 'icon-event', ...
and in the Event:
hash_one :icon, :rel => 'icon-event', ...
comments