author: | Anton Gritsay, http://angri.ru |
---|---|
version: | 0.5.2 |
license: | 2-clause BSD (see LICENSE) |
download: | http://sqlamp.angri.ru/sqlamp-0.5.2.tar.gz |
sqlamp is an implementation of an efficient algorithm for working with hierarchical data structures — Materialized Path. sqlamp uses (and depends of) SQLAlchemy.
Materialized Path is a way to store (and fetch) a trees in a relational databases. It is the compromise between Nested Sets and Adjacency Relations in respect to simplicity and efficiency. Method was promoted by Vadim Tropashko in his book SQL Design Patterns. Vadim’s description of the method can be read in his article Trees in SQL: Nested Sets and Materialized Path (by Vadim Tropashko).
Implemented features:
- Setting up with declarative.ext or without it.
- Saving node roots — if no parent set for node. The tree will have a new tree_id.
- Saving child nodes — if node has some parent. The whole dirty job of setting values in tree_id, path and depth fields is done by sqlamp.
- Fetching node’s descendants, ancestors and children using the most efficient way available (see MPInstanceManager)
- Autochecking exhaustion of tree size limits — maximum number of children and maximum nesting level (see MPManager to learn more about limits fine-tuning) is done during session flush.
- Rebuilding all trees (see MPClassManager.rebuild_all_trees()) and any subtree (MPClassManager.rebuild_subtree()) on the basis of Adjacency Relations.
- Collapsing flat tree returned from query to recursive structure (see tree_recursive_iterator()).
- Node classes may use polymorphic inheritance.
Moving of nodes is not yet implemented.
Known-to-work supported DBMS include sqlite (tested with 3.6.14), MySQL (tested using both MyISAM and InnoDB with server version 5.1.34) and PostgreSQL (tested with 8.3.7), but sqlamp should work with any other DBMS supported by SQLAlchemy.
Note
Code examples here are all runable by copy-paste to interactive interpreter.
import sqlalchemy, sqlalchemy.orm
engine = sqlalchemy.create_engine('sqlite:///:memory:', echo=False)
metadata = sqlalchemy.MetaData(engine)
node_table = sqlalchemy.Table('node', metadata,
sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True),
sqlalchemy.Column('parent_id', sqlalchemy.ForeignKey('node.id')),
sqlalchemy.Column('name', sqlalchemy.String)
)
There is nothing special to sqlamp here. Note self-reference “child to parent” (‘parent_id’ is foreign key to table’s primary key) just as in any other implementation of adjacency relations.
import sqlamp
class Node(object):
mp = sqlamp.MPManager(node_table)
def __init__(self, name, parent=None):
self.name = name
self.parent = parent
def __repr__(self):
return '<Node %r>' % self.name
Attach instance of MPManager to class that represents node. The only required argument for MPManager constructor is the table object.
Now we can create the table and define the mapper (it is important to create table after MPManager was created as created MPManager appends three new columns and one index to the table):
node_table.create()
Setting up the mapper requires only one extra step — providing Node.mp as mapper extension:
mapper = sqlalchemy.orm.mapper(
Node, node_table,
extension=[Node.mp],
properties={
'parent': sqlalchemy.orm.relation(
Node, remote_side=[node_table.c.id]
)
}
)
You may see value provided as properties argument: this is a way recomended by the official SQLAlchemy documentation to set up an adjacency relation.
Alternative way to set up: ext.declarative
Starting from version 0.5 it is able and convenient to use declarative approach to set your trees up:
import sqlalchemy, sqlalchemy.orm
from sqlalchemy.ext.declarative import declarative_base
import sqlamp
engine = sqlalchemy.create_engine('sqlite:///:memory:', echo=False)
metadata = sqlalchemy.MetaData(engine)
BaseNode = declarative_base(metadata=metadata,
metaclass=sqlamp.DeclarativeMeta)
class Node(BaseNode):
__tablename__ = 'node'
__mp_manager__ = 'mp'
id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
parent_id = sqlalchemy.Column(sqlalchemy.ForeignKey('node.id'))
parent = sqlalchemy.orm.relation("Node", remote_side=[id])
name = sqlalchemy.Column(sqlalchemy.String())
def __init__(self, name, parent=None):
self.name = name
self.parent = parent
def __repr__(self):
return '<Node %r>' % self.name
Node.__table__.create()
As you can see it is pretty much the same as usual for sqlalchemy’s “declarative” extension. Only two things here are sqlamp-special: metaclass argument provided to declarative_base() factory function should be DeclarativeMeta and the node class should have an __mp_manager__ property with string value. See DeclarativeMeta for more information about options.
Now all the preparation steps are done. Lets try to use it!
session = sqlalchemy.orm.sessionmaker(engine)()
root = Node('root')
child1 = Node('child1', parent=root)
child2 = Node('child2', parent=root)
grandchild = Node('grandchild', parent=child1)
session.add_all([root, child1, child2, grandchild])
session.flush()
We have just created a sample tree. This is all about AL, nothing special to sqlamp here. The interesting part is fetching trees:
root.mp.query_children().all()
# should print [<Node 'child1'>, <Node 'child2'>]
root.mp.query_descendants().all()
# [<Node 'child1'>, <Node 'grandchild'>, <Node 'child2'>]
grandchild.mp.query_ancestors().all()
# [<Node 'root'>, <Node 'child1'>]
session.query(Node).order_by(Node.mp).all()
# [<Node 'root'>, <Node 'child1'>, <Node 'grandchild'>, <Node 'child2'>]
for node in root.mp.query_descendants(and_self=True):
print ' ' * node.mp_depth, node.name
# root
# child1
# grandchild
# child2
As you can see all sqlamp functionality is accessible via MPManager descriptor (called 'mp' in this example).
Note: Node.mp (a so-called “class manager”) is not the same as node.mp (“instance manager”). Do not confuse them as they are for different purposes and their APIs has no similar. Class manager (see MPClassManager) used to features that are not intended to particular node but for the whole tree: basic setup (mapper extension) and tree-maintainance functions. And an instance managers (MPInstanceManager) are each unique to and bounded to a node. They are implements a queries for a related nodes and other things specific to concrete node. There is also third kind of values that MPManager descriptor may return, see its reference for more info.
sqlamp had borrowed some implementation ideas from django-treebeard. In particular, sqlamp uses the same alphabet (which consists of numeric digits and latin-letters in upper case), sqlamp as like as django-treebeard doesn’t use path parts delimiter — path parts has fixed adjustable length. But unlike django-treebeard sqlamp stores each tree absolutelly stand-alone — two or more trees may (and will) have identical values in path and depth fields and be different only by values in tree_id field. This is the way that can be found in django-mptt.
sqlamp works only on basis of Adjacency Relations. This solution makes data more denormalized but more fault-tolerant. It is able to rebuild all pathes for all trees using only AL data. Also it makes applying sqlamp on existing project easer.
Feel free to email author directly to send bugreports, feature requests, patches or just to say “thanks”! :)
Base class for exceptions in calculations of node’s path.
Maximum children limit is exceeded. Raised during flush.
Maximum depth of nesting limit is exceeded. Raised during flush.
Descriptor for access class-level and instance-level API.
Basic usage is simple:
class Node(object):
mp = sqlamp.MPManager(node_table)
Now there is an ability to get instance manager or class manager via property 'mp' depending on way to access it. Node.mp will return mapper extension till class is mapped, class manager MPClassManager after that and instance_node.mp will return instance_node’s MPInstanceManager. See that classes for more details about their public API.
Changed in version 0.5.1: Previously mapper extension was accessible via class manager’s property.
Parameters: |
|
---|
Warning
Do not change the values of MPManager constructor’s attributes after saving a first tree node. Doing this will corrupt the tree.
There may be three kinds of return values from this getter.
The first one is used when the class which this descriptor is attached to is not yet mapped to any table. In that case the return value is an instance of MPMapperExtension. which is intended to be used as mapper extension.
The second scenario is access to MPManager via mapped class. The corresponding MPInstanceManager‘s instance is returned.
Note
If the nodes of your tree use polymorphic inheritance it is important to know that class manager is accessible only via the base class of inheritance hierarchy.
And the third way is accessing it from the node instance. Attached to that node instance manager is returned then.
Metaclass for declaratively defined node model classes.
New in version 0.5.
See usage example above in Quickstart.
All options that accepts MPManager can be provided with declarative definition. To provide an option you can simply assign value to class’ property with name like __mp_tree_id_field__ (for tree_id_field parameter) and so forth. See the complete list of options in MPManager‘s constructor parameters. Note that you can use only string options for field names, not the column objects.
A special class variable __mp_manager__ should exist and hold a string name which will be used as MPManager descriptor property.
Node class manager. No need to create it by hand: it created by MPManager.
Parameters: |
|
---|
Allows to use instances of MPClassManager directly as argument for sqlalchemy.orm.Query.order_by(). Sort query by tree_id and path fields. Can be used like this (assume that MPManager is attached to class Node and named 'mp'):
query = session.query(Node).filter(root.filter_children())
query.order_by(Node.mp)
The maximum number of children in each node, readonly.
The maximum level of nesting in this tree, readonly.
Query all stored trees.
Parameters: | session – a sqlalchemy Session object to bind a query. |
---|---|
Returns: | Query object with all nodes of all trees sorted as usual by (tree_id, path). |
Perform a complete rebuild of all trees on the basis of adjacency relations.
Drops indexes before processing and recreates it after.
Parameters: | order_by – an “order by clause” for sorting root nodes and a children nodes in each subtree. |
---|
Reset paths for all nodes in subtree defined by root_node_id on the basis of adjacency relations.
Parameters: |
|
---|
A node instance manager, unique for each node. First created on access to MPManager descriptor from instance. Implements API to query nodes related somehow to particular node: descendants, ancestors, etc.
Parameters: |
|
---|
Get a filter condition for node’s descendants.
Requires that node has path, tree_id and depth values available (that means it has “persistent version” even if the node itself is in “detached” state or it is in “pending” state in autoflush-enabled session).
Usage example:
session.query(Node).filter(root.mp.filter_descendants()) \
.order_by(Node.mp)
This example is silly and only shows an approach of using filter_descendants, dont use it for such purpose as there is a better way for such simple queries: query_descendants().
Parameters: | and_self – bool, if set to True self node will be selected by filter. |
---|---|
Returns: | a filter clause applicable as argument for sqlalchemy.orm.Query.filter() and others. |
Get a query for node’s descendants.
Requires that node is in “persistent” state or in “pending” state in autoflush-enabled session.
Parameters: |
|
---|---|
Returns: | a sqlalchemy.orm.Query object which contains only node’s descendants and is ordered by path. |
The same as filter_descendants() but filters children nodes and does not accepts and_self parameter.
The same as query_descendants() but queries children nodes and does not accepts and_self parameter.
The same as filter_descendants() but filters ancestor nodes.
The same as query_descendants() but queries node’s ancestors.
Make a recursive iterator from plain tree nodes sequence (Query instance for example). Generates two-item tuples: node itself and it’s children collection (which also generates two-item tuples...) Children collection evaluates to False if node has no children (it is zero-length tuple for leaf nodes), else it is a generator object.
Parameters: |
|
---|
Can be used when it is simpler to process tree structure recursively. Simple usage example:
def recursive_tree_processor(nodes):
print '<ul>'
for node, children in nodes:
print '<li>%s' % node.name,
if children:
recursive_tree_processor(children)
print '</li>'
print '</ul>'
query = root_node.mp.query_descendants(and_self=True)
recursive_tree_processor(
sqlamp.tree_recursive_iterator(query, Node.mp)
)
If flat_tree is a sqlalchemy.orm.Query instance, it will be ordered by class_manager. If it is plain list, do not forget that such ordering is strictly required for tree_recursive_iterator() to work right.
Warning
Process flat_tree items once and sequentially so works right only if used in depth-first recursive consumer.
Varchar field subtype representing node’s path.
Integer field subtype representing node’s depth level.
Integer field subtype representing node’s tree identifier.
This release contains some backward-incompatible changes in setup facilities. The main highligts are support of declarative SQLAlchemy extension and some cleaning up in MPManager‘s constructor options.
Small fixes in documentation: actually Tropashko was not the first who introduced MP, he only promoted it.
Implemented MPClassManager.query_all_trees()
Fixed a bug of MPClassManager.rebuild_all_trees() did not reset path for root nodes.
Implemented tree_recursive_iterator()
Changed the value of path field for a root nodes. Previously they used to had '0' * steplen path and so first-level children gain '0' * steplen * 2, but now new roots will have an ampty string in their path field. This change should be backward-compatible as it touches only new trees. But if you want to have no difference between two identical old and new trees in your table you can rebuild all your trees by Node.mp.rebuild_all_trees() or use sql query like this:
UPDATE <table> SET mp_path = substr(mp_path, <steplen> + 1);
This will remove steplen characters from the beginning of each node’s path. Do not forget to backup your data!