MEDOR User Guide

This user guide is under construction.

This user guide should help a novice MEDOR user get familiar with the MEDOR concepts. More detailed documentation is available from the MEDOR JavaDoc.

Table of contents

  1. Introduction
  2. Data structures
  3. Expressing a query
  4. Optimizing a query
  5. Compiling a query
  6. Evaluating a query
  7. Extending MEDOR

Introduction

MEDOR is a query framework. Its goal is the expression, optimization, compilation and evaluation of queries on data coming from distributed heterogeneous data sources.

Even though it can be used directly by a final user, it is intended to be used for building middleware which require query capacities. MEDOR personalities include EJB 2.0, with the JOnAS EJB server, and JDO, with the Speedo implementation.

MEDOR is in particular integrated with the persistence framework JORM. With regards to JORM, it is meant to be a complement for querying data made persistent through JORM.

The MEDOR user typically performs the following tasks:

  1. Express a query

    This is done using a programmatic algebra. A MEDOR query is represented as a query tree, which implements the QueryTree interface.

  2. Ask MEDOR to optimize the query

    MEDOR can optimize the QueryTree using various optimization rules, and produces an optimized QueryTree.

  3. Ask MEDOR to compile the optimized query

    This optimized QueryTree is compiled prior to evaluation. This compilation may involve various steps such as indexes generation, bytecode generation, etc.

  4. Ask MEDOR to evaluate the compiled query

    The evaluation of a query returns a TupleCollection, which gives access to the query results in a way similar to a JDBC ResultSet.

Data structures

MEDOR itself considers typed structured data. The typed structure is represented by a TupleStructure (see interface JavaDoc), which is composed of a list of named and typed Fields (see interface JavaDoc).

A TupleStructure is typically associated to a QueryTree (the basic structure in a query) or to a TupleCollection (the result of query evaluation).

In the case of JORM classes, the JORM meta schema implicitly defines corresponding MEDOR TupleStructures.

Expressing a query

A MEDOR query is not expressed using a query language, but progammatically using an algebra.

The main interface is QueryTree (see interface JavaDoc). In general, a QueryTree is an abstraction which represents data in the form of Tuples (see interface JavaDoc). As such, each QueryTree has an associated TupleStructure.

The QueryTree interface is specialized in QueryNode and QueryLeaf (see package JavaDoc). A QueryLeaf encapsulates data typically coming from an underlying data store, whereas a QueryNode represents an algebraic operation working on data represented by other QueryTrees.

A QueryTree has an associated TupleStructure, representing the fields (the data typing information) of this QueryTree (see Defining fields).

The leaves of the query tree

The leaves represent data available either directly as memory data, or encapsulate access to an underlying data store.

An example of a memory data leaf is the MedorTCQueryLeaf (org.objectweb.medor.query.lib.MedorTCQueryLeaf).

Leaves accessing relational data stores are the RdbExpQueryLeaf and RdbStringQueryLeaf (in org.objectweb.medor.query.rdb.lib).  RdbStringQueryLeaves allow the encapsulation of a SQL query expressed as a String. They are terminal in the sense that MEDOR will not attempt to rewrite such an RdbStringQueryLeaf in its optimization process. Instead, RdbExpQueryLeaves allow the encapsulation of a SQL query expressed in an abstract manner using QualifiedTables and Expressions. They is optimizable: several RdbExpQueryLeaves working on the same DataStore (in org.objectweb.medor.datasource.api) can be grouped by the optimizer, and transformed into a single RdbStringQueryLeaf for the evaluation; through this optimization process, MEDOR can delegate part of the query evaluation to the underlying database.

Other means of accessing existing data are the JormExtent and GenClassExtent (in org.objectweb.medor.query.jorm.lib). They represent data coming from JORM classes and GenClasses. Several constructors exist for JormExtent, leading to a QueryLeaf with either all Fields of the associated JORM class, only the PName, or some chosen fields.

Query nodes

A QueryNode can be built on one or several other QueryTrees (which are either QueryLeaves or QueryNodes), called its children. Depending on the type and constructor of the QueryNode, the children either explicit in the constructor of the QueryNode, or are implicitly derived from other parameters in the constructors, or from later added Propagated- or ConstructedFields, or from the filter Expression associated to the QueryNode (see table below).

A QueryNode defines its fields. Those fields are typically projected from existing fields of the children QueryTrees, or can be calculated (see Defining fields).

A QueryNode typically has an associated filter. This filter is used to define the conditions under which data from underlying children QueryTrees are considered as data of the current QueryNode.

Doing the parallel with SQL, the children represent the "FROM" part of the query, the filter represents the "WHERE" part of the query, and the fields represent the "SELECT" part of the query.

There are several predefined types of QueryNodes (in org.objectweb.medor.query.lib).

Node Description Children
Cartesian Cartesian product between tuples of two QueryTrees. The evaluation result will return all tuples which are the concatenation of one tuple from the left QueryTree with one tuple from the right QueryTree. explicit in constructor
Intersection Represents the intersection of two QueryTrees. The evaluation result will contain only those tuples which are in both QueryTrees. Both QueryTrees must have compatible TupleStructure. explicit in constructor
JoinProject Represents a join and project operation between two QueryTrees. The join condition is expressed as a filter. The Fields to project are described using the addPropagatedField and addCalculatedField methods (see Defining fields). derived from added fields and filter
Nest Nesting operation. This is equivalent to the GROUP BY statements of SQL. See the Nest and Unnest section for more details. derived from fields in constructor
Project Simple projection. It only works with one underlying QueryTree. The Fields to project are described using the addPropagatedField and addCalculatedField methods (see Defining fields). derived from added fields and filter
SelectProject Simple projection with a Filter. It is the extension of Project with an associated filter. derived from added fields and filter
Union Union of two QueryTrees. Like for Intersection, both QueryTrees must have compatible TupleStructure. explicit in constructor
Unnest Unnesting operation. This is the reverse of the Nest operation. See the Nest and Unnest section for more details. derived from fields in constructor

In the case of JORM classes, MEDOR also provides a utility class, NavigatorNodeFactory (in org.objectweb.medor.query.lib), to ease the construction of QueryNodes performing navigation operations. The QueryTree produced by the NavigatorNodeFactory methods can be constructed "by hand"; the factory is provided as a facility. Three methods are available, which take a NavigatorOperator as input. In all cases, the result is a QueryTree, which contains all fields of the QueryTree hosting the start Field of the path expression, and the field reached by the end of the path expression. If this latter field is a GenClass, the GenClass element is projected.

  1. getNavigatorNode. This is the basic constructor for path traversal.
  2. getIn. In this case, the end of the path expression must be a GenClass. An alias name is given for the name of the Field corresponding to the GenClass element.
  3. getMemberOf. This method also takes as input a FieldOperand. The end of the path expression must be a GenClass. The constructed QueryTree adds a filter checking that the field represented by the FieldOperand is equal to the GenClass element.

Filters

Filters are expressed using Expressions (in org.objectweb.medor.filter.api). Expressions are constructed from Operators and Operands.

MEDOR comes with a set of predefined operators and operands, as described in the figure below.

expression.jpg

We now detail the ParameterOperand and the FieldOperand.

ParameterOperand

A ParameterOperand allows the use of a variable for which the value in unknown at the time the query is being constructed. The actual value of the parameter will be passed as an argument to the MEDOR evaluator. This value will be used at evaluation time.

FieldOperand

A FieldOperand represents a pointer to an existing Field. For a given QueryNode, the associated filter may not manipulate FieldOperands built on Fields of its own TupleStructure, but only on Fields of other QueryTrees. One noticeable exception is that of RdbExpQueryLeaves, where the associated Expression manipulates fields of the associated RdbExpQueryLeaf.

NavigatorOperator

A NavigatorOperator is used in the JORM context to express a path traversal. A path traversal can be represented by f1.f2...fn, where f1, ..., fn-1 are fields of type reference to another class, and fn can either be a single-valued field (whether a RefClass or a primitive type), or a reference to a GenClass. An example of a path traversal can be department.manager.projects, representing the projects under the responsibility of the department manager.

Defining fields

QueryTree fields can be of different nature. For QueryLeaves, the fields are defined when creating the leave. For QueryNodes, a field can be defined either as:

For JoinProject, Project and SelectProject, the Fields which are present in the resulting QueryNode are defined using the addPropagatedField and addCalculatedField methods.

In the case of Union, Cartesian and Intersection, the fields are implicitly defined.

For Nest and Unnest, see the next section.

Nest and Unnest

Nest produces a new QueryTree where some fields are grouping fields (the GROUP BY statement): for all tuples sharing the same value of those grouping fields, the remaining fields are grouped into a new Field of type TupleCollection.

Simple example

The following figure describes a simple example with two QueryLeaves, emp and dept. The TupleStructure of emp contains fields named name and dept, that of dept fields named deptName and town. QueryNode worksIn is a JoinProject, built upon those two QueryLeaves. Its TupleStructure has two PropagatedFields, named name and town, respectively coming from name of emp and town of dept. QN also has a filter, which specifies that dept of emp should be equal to deptName of dept.

example.jpg

The tables below shows the result of evaluating QN given data for QL1 and QL2.

emp
name dept
paul sales
john mkt
mary sales
mark toys
lucy accounts
dept
deptName town
sales Paris
toys L.A.
accounts London
worksIn
name town
paul Paris
mary Paris
mark L.A.
lucy London

Optimizing a query

At optimization time, MEDOR uses a QueryRewriter (in org.objectweb.medor.optim.api) to transform the input QueryTree, applying various RewriteRules. The order in which the rules are applied is determined by a RuleConfiguration (in org.objectweb.medor.optim.api). For QueryLeaves, the LeafRewriteRule interface extends the RewriteRule interface, and relies on LeafRewriters which are capable of rewriting certain types of QueryLeaves.

The rules are responsible for various types of optimization. Below is the current list of rewrite rules, in the typical order in which they should be applied (this order is defined in the BasicQueryRewriter (org.objectweb.medor.optim.lib) and its extension, the JormQueryRewriter (in org.objectweb.medor.optim.jorm).

rule package description
PushNotInExpressionRule org.objectweb.medor.optim.lib Manipulates filters only (no changes to the QueryTree structure). Pushes negations inside the expression. For example, !(a > b) is replaced by (a <= b)
PushSelectionRule org.objectweb.medor.optim.lib Pushes a selection as far down as possible into its children nodes. The goal is to perform a selection on a Field as early as possible, i.e., as low in the QueryTree as possible.
DropUnusedProjFieldsRule org.objectweb.medor.optim.lib Drops fields which are projected in intermediary QueryNodes but which are not used by the parent QueryNode.
DropUselessNodeRule org.objectweb.medor.optim.lib Eliminates nodes which do not do any selection (no filter), and for which all fields are propagated fields with one ancestor only. Such nodes may appear after the previous PushSelection and DropUnusedProjFields rules have been applied.
GroupSameDBRule org.objectweb.medor.optim.rdb For RdbExpQueryLeaves, groups together as a single RdbExpQueryLeaf all leaves which operate on the same DataStore.
JormAssignMapperRule org.objectweb.medor.optim.jorm Assigns a mapper to each JORM class.
JormLeafRewriterRule org.objectweb.medor.optim.jorm Rewrites JORM leaves into the corresponding QueryLeaves on the underlying data store. For example, if a JORM class is mapped to a relational database, the corresponding class extent will be rewritten into an RdbExpQueryLeaf.
JormGoUpDecodeRule org.objectweb.medor.optim.jorm Moves the construction of the PName as far up in the QueryTree as possible. The reason behind this rule is as follows. When transforming JORM extents into QueryLeaves,  JORM PNames are replaced with the construction of the PName using the CompositePName and SinglePName operators (in org.objectweb.medor.filter.jorm.lib). By moving the construction of the PName up in the QueryTree, and replacing PName comparisons with the comparision of the underlying Fields, this will allow the delegation of larger parts of a QueryTree to the underlying data store.

Note that the BasicQueryRewriter (in org.objectweb.medor.optim.lib), which is both a QueryRewriter and a RuleConfiguration, contains a predefined combination of rules.

[TODO: add an example here showing a complete rewrite, or point to a complete example, to be written...]

Compiling a query

Evaluating a query

The result of a query evaluation is a TupleCollection (org.objectweb.medor.tuple.api.TupleCollection). It represents a collection of structured typed Tuples (org.objectweb.medor.tuple.api.Tuple). This typed structure is represented by a TupleStructure (org.objectweb.medor.api.TupleStructure).

Extending MEDOR

This manual dated 11 October 2002

Author: Alexandre Lefebvre