Q best implementation for hierarchical tree (like directory tree) with databases

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Wed, 05 Jun 2002 17:52:41



Hi all,

I am looking for implementing a database which represents
a directory tree; i'm still not quite decided as how this should
be modelled for best performance and minimal memory
requirements.  However, i'm quite decided as to use one or
other form of database (table?) for it.

The presupposition is that every (sub-) dir should be an entry
with a primary key and a foreign key as to indicate which
other dir (in that same table?) is the parent dir of it.

I will most probable need scoped indexes for optimal
performance, grouping the sub-dirs per parent, but that's
mostly all the way i ever came with my model.  I'm still
not sure whether it will be performant and memory friendly

Any suggestions?

Thank you

Frank van Nuffel

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Jerason Bane » Thu, 06 Jun 2002 00:02:05


Well, the most common solution looks like this:

Integer dir_id PRIMARY KEY
Varchar dir_name
Integer parent_id INDEXED

The top level directory would contain 0 in the parent_id field. If I wanted
to find the children of any directory, I would then do a query like this:

select * from directories where parent_id = <whatever the parent's dir_id
is>;

This can be done recursively from code to get all directories, or some
databases (such as Oracle) have a way of traversing from SQL. Check your
database's documentation to see if your database has this function. From
there, you can hang things off of the "directories" in other tables by
referencing the dir_id as a foreign key.

As for speed, this is more than fast enough for most apps. Is there anything
in specific about your app that you feel would need more speed? I use a
directory scheme that you can see at:

http://myrecipes.webhop.biz

Recipes are organized in a directory structure using the exact scheme I
listed above. The recipes themselves hang off of any level of the directory
by referencing the directory id as a foreign key. Hope this helps.

Jerason Banes

--
___________________________________
Need a good Database managment solution?
http://java.dnsalias.com



> Hi all,

> I am looking for implementing a database which represents
> a directory tree; i'm still not quite decided as how this should
> be modelled for best performance and minimal memory
> requirements.  However, i'm quite decided as to use one or
> other form of database (table?) for it.

> The presupposition is that every (sub-) dir should be an entry
> with a primary key and a foreign key as to indicate which
> other dir (in that same table?) is the parent dir of it.

> I will most probable need scoped indexes for optimal
> performance, grouping the sub-dirs per parent, but that's
> mostly all the way i ever came with my model.  I'm still
> not sure whether it will be performant and memory friendly

> Any suggestions?

> Thank you

> Frank van Nuffel



 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Thu, 06 Jun 2002 00:31:43




Quote:> Well, the most common solution looks like this:

> Integer dir_id PRIMARY KEY
> Varchar dir_name
> Integer parent_id INDEXED

> The top level directory would contain 0 in the parent_id field. If I
wanted
> to find the children of any directory, I would then do a query like this:

> select * from directories where parent_id = <whatever the parent's dir_id
> is>;

> This can be done recursively from code to get all directories, or some
> databases (such as Oracle) have a way of traversing from SQL. Check your
> database's documentation to see if your database has this function. From
> there, you can hang things off of the "directories" in other tables by
> referencing the dir_id as a foreign key.

I see you are using SQL for selections; the database i was thinking of
doesn't support SQL.  However, there's a way of getting the same results
with scoped indexes.  My question about speed would then come to the
issue of limiting the necessary indexes (for they are the only way of having
similar speed performance); i can hardly imagine i would need 1 index
per parent dir

As to what you write about traversing from SQL, what does this mean
precisely?

Quote:> As for speed, this is more than fast enough for most apps. Is there
anything
> in specific about your app that you feel would need more speed?

Is this still true with (ten) thousands of dirs?

Quote:

I use a
> directory scheme that you can see at:

> http://myrecipes.webhop.biz

> Recipes are organized in a directory structure using the exact scheme I
> listed above. The recipes themselves hang off of any level of the
directory
> by referencing the directory id as a foreign key. Hope this helps.

thank you for the reference ( bringing water to my lips ;-)

Frank

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Jerason Bane » Thu, 06 Jun 2002 00:55:31


Quote:> I see you are using SQL for selections; the database i was thinking of
> doesn't support SQL.  However, there's a way of getting the same results
> with scoped indexes.  My question about speed would then come to the
> issue of limiting the necessary indexes (for they are the only way of
having
> similar speed performance); i can hardly imagine i would need 1 index
> per parent dir

What do you mean? There really isn't any way of getting around the fact that
you need at least one index per node of your tree. It really isn't that bad
tho. Say you have 1000 records with 200 nodes. 200 unique indexes isn't that
bad. And it shouldn't really impact performance either.

Quote:> As to what you write about traversing from SQL, what does this mean
> precisely?

I've never actually tried this, but some databases can somehow traverse the
tree for you and return the records in tree order. It isn't very portable
SQL, but I have heard of people doing this to simpilfy their coding.

Quote:> > As for speed, this is more than fast enough for most apps. Is there
> anything
> > in specific about your app that you feel would need more speed?

> Is this still true with (ten) thousands of dirs?

For the most part, sure. Especially if you only display a couple levels at a
time (i.e. you traverse the directory tree). If you need to query the entire
tree at once, it might take a few seconds depending on your database, but
considering that it isn't much data (say about 300 bytes per record * 10000
= 3000000 bytes or 3 megs) you can actually cache all or part of the data in
memory. Depends on what you want to do .

If you could be a little more specific about what you want to accomplish, I
can possibly be more specific with answers.

Thanks,
Jerason

--
___________________________________
Need a good Database managment solution?
http://java.dnsalias.com

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Jame » Thu, 06 Jun 2002 01:06:35


Quote:> ... implementing a database which represents a directory tree
> ... modelled for best performance and minimal memory requirements.

See XDb (www.xdb1.com/Benchmark/8.asp).
If there is a faster please provide info.
 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Olivier Gaumo » Thu, 06 Jun 2002 02:36:28


Have a look at this article

http://www.intelligententerprise.com/001020/celko1_1.shtml

Olivier


> Hi all,

> I am looking for implementing a database which represents
> a directory tree; i'm still not quite decided as how this should
> be modelled for best performance and minimal memory
> requirements.  However, i'm quite decided as to use one or
> other form of database (table?) for it.

> The presupposition is that every (sub-) dir should be an entry
> with a primary key and a foreign key as to indicate which
> other dir (in that same table?) is the parent dir of it.

> I will most probable need scoped indexes for optimal
> performance, grouping the sub-dirs per parent, but that's
> mostly all the way i ever came with my model.  I'm still
> not sure whether it will be performant and memory friendly

> Any suggestions?

> Thank you

> Frank van Nuffel


 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Thu, 06 Jun 2002 21:44:11


Hi James,
----- Original Message -----

Newsgroups: comp.databases
Sent: Tuesday, June 04, 2002 18:06
Subject: Re: Q best implementation for hierarchical tree (like directory
tree) with databases

> > ... implementing a database which represents a directory tree
> > ... modelled for best performance and minimal memory requirements.

> See XDb (www.xdb1.com/Benchmark/8.asp).
> If there is a faster please provide info.

Neat stuff, really.  Is XDb a commercial library?  Could it be used
with CA-Visual Objects?

Thanks anyway for the info

Frank

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Thu, 06 Jun 2002 21:45:31


Olivier, i'll have a look into the article

Thanks

Frank

----- Original Message -----

Newsgroups: comp.databases
Sent: Tuesday, June 04, 2002 19:36
Subject: Re: Q best implementation for hierarchical tree (like directory
tree) with databases

> Have a look at this article

> http://www.intelligententerprise.com/001020/celko1_1.shtml

> Olivier

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by --CELKO » Sat, 08 Jun 2002 10:21:26


Quote:>> I am looking for implementing a database which represents a

directory tree; <<

The usual example of a tree structure in SQL books is called an
adjacency list model and it looks like this:

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
  boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp),
  salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);

OrgChart
emp       boss      salary
===========================
'Albert' 'NULL'    1000.00
'Bert'    'Albert'   900.00
'Chuck'   'Albert'   900.00
'Donna'   'Chuck'    800.00
'Eddie'   'Chuck'    700.00
'Fred'    'Chuck'    600.00

Another way of representing trees is to show them as nested sets.
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us
define a simple OrgChart table like this, ignoring the left (lft) and
right (rgt) columns for now. This problem is always given with a
column for the employee and one for his boss in the textbooks. This
table without the lft and rgt columns is called the adjacency list
model, after the graph theory technique of the same name; the pairs of
emps are adjacent to each other.

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
  lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
  rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
  CONSTRAINT order_okay CHECK (lft < rgt) );

OrgChart
emp         lft rgt
======================
'Albert'      1   12
'Bert'        2    3
'Chuck'       4   11
'Donna'       5    6
'Eddie'       7    8
'Fred'        9   10

The organizational chart would look like this as a directed graph:

            Albert (1,12)
            /        \
          /            \
    Bert (2,3)    Chuck (4,11)
                   /    |   \
                 /      |     \
               /        |       \
             /          |         \
        Donna (5,6)  Eddie (7,8)  Fred (9,10)

The first table is denormalized in several ways. We are modeling both
the OrgChart and the organizational chart in one table. But for the
sake of saving space, pretend that the names are job titles and that
we have another table which describes the OrgChart that hold those
positions.

Another problem with the adjacency list model is that the boss and
employee columns are the same kind of thing (i.e. names of OrgChart),
and therefore should be shown in only one column in a normalized
table.  To prove that this is not normalized, assume that "Chuck"
changes his name to "Charles"; you have to change his name in both
columns and several places. The defining characteristic of a
normalized table is that you have one fact, one place, one time.

The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.

To show a tree as nested sets, replace the emps with ovals, then nest
subordinate ovals inside each other. The root will be the largest oval
and will contain every other emp. The leaf emps will be the innermost
ovals with nothing else inside them and the nesting will show the
hierarchical relationship. The rgt and lft columns (I cannot use the
reserved words LEFT and RIGHT in SQL) are what shows the nesting.

If that mental model does not work, then imagine a little worm
crawling anti-clockwise along the tree. Every time he gets to the left
or right side of a emp, he numbers it. The worm stops when he gets all
the way around the tree and back to the top.

This is a natural way to model a parts explosion, since a final
assembly is made of physically nested assemblies that final break down
into separate parts.

At this point, the boss column is both redundant and denormalized, so
it can be dropped. Also, note that the tree structure can be kept in
one table and all the information about a emp can be put in a second
table and they can be joined on employee number for queries.

To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a emp, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each emp will get two numbers, one of the right side and one
for the left. Computer Science majors will recognize this as a
modified preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss column which used to represent the edges of a graph.

This has some predictable results that we can use for building
queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees
are defined by the BETWEEN predicate; etc. Here are two common queries
which can be used to build others:

1. An employee and all their Supervisors, no matter how deep the tree.

SELECT O2.*
   FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
    AND O1.emp = :myemployee;

2. The employee and all subordinates. There is a nice symmetry here.

SELECT O1.*
   FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
    AND O2.emp = :myemployee;

3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

SELECT O2.emp, SUM(S1.salary)
   FROM OrgChart AS O1, OrgChart AS O2,
        Salaries AS S1
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
    AND O1.emp = S1.emp
  GROUP BY O2.emp;

4. To find the level of each emp, so you can print the tree as an
indented listing.

DECLARE Out_Tree CURSOR FOR
SELECT O1.lft, COUNT(O2.emp) AS indentation, O1.emp
   FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
  GROUP BY O1.emp
  ORDER BY O1.lft;

5. The nested set model has an implied ordering of siblings which the
adjacency list model does not. To insert a new emp as the rightmost
sibling.

UPDATE OrgChart
   SET lft = lft + 2,
       rgt = rgt + 2
WHERE rgt >= (SELECT rgt -- right_most_sibling
                 FROM OrgChart
                WHERE emp = :your_boss);

INSERT INTO OrgChart (emp, lft, rgt)
VALUES ('New Guy', right_most_sibling, (right_most_sibling + 1))
END;

6. To convert a nested sets model into an adjacency list model:

SELECT B.emp AS boss, P.emp
  FROM OrgChart AS P
       LEFT OUTER JOIN
       OrgChart AS B
       ON B.lft
          = (SELECT MAX(lft)
               FROM OrgChart AS S
              WHERE P.lft > S.lft
                AND P.lft < S.rgt);

For details, see the chapter in my book JOE CELKO'S SQL FOR SMARTIES
(Morgan-Kaufmann, 1999, second edition)

http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci537290,00....

http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci801943,00....

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Sat, 08 Jun 2002 16:35:58


Impressive article, man!  It'll take some extra reading and
re-reading before the theory exposed will penetrate my
gray cells.  Does your book explore more about these
concepts?  And finally, a small question: what if the worm
eats all of my data? ;-)  No, really, SQL theoremes at
their best, i'd guess!  Are you a technical writer?

Frank

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by 9rowzn01i00 » Sat, 08 Jun 2002 18:01:27




Quote:> Impressive article, man!  It'll take some extra reading and
> re-reading before the theory exposed will penetrate my
> gray cells.  Does your book explore more about these
> concepts?  And finally, a small question: what if the worm
> eats all of my data? ;-)  No, really, SQL theoremes at
> their best, i'd guess!  Are you a technical writer?

                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Quote:

> Frank

Celko is one of _the_ gurus on SQL :-)

--
akmal chaudhri

IBM developerWorks - http://www.ibm.com/developerWorks/
XMLDatabases ------- http://www.btinternet.com/~xmldatabases/

 
 
 

Q best implementation for hierarchical tree (like directory tree) with databases

Post by Frank van Nuffe » Sat, 08 Jun 2002 18:06:32


Hi again Joe,

though i'm struggling with SQL syntax, there are major issues
i'm enthusiastic about;

Quote:> Another problem with the adjacency list model is that the boss and
> employee columns are the same kind of thing (i.e. names of OrgChart),
> and therefore should be shown in only one column in a normalized
> table.  To prove that this is not normalized, assume that "Chuck"
> changes his name to "Charles"; you have to change his name in both
> columns and several places. The defining characteristic of a
> normalized table is that you have one fact, one place, one time.

your set representation in one table with only one field denominating
emps (to be dir names) is precisely what i was considering for my
directory tree implementation; it is good representation, specifically
in oo context, for all entries would ultimately represent directories,
right?

Quote:> The first table is denormalized in several ways. We are modeling both
> the OrgChart and the organizational chart in one table. But for the
> sake of saving space, pretend that the names are job titles and that
> we have another table which describes the OrgChart that hold those
> positions.

Do you mean that i have to use another table with a 1-1 relational
projection into the main table, enlisting all necessary attribute fields
in order to minimize space taken up by OrgChart (to be the main
dir tree table in my case)?

Quote:> The final problem is that the adjacency list model does not model
> subordination. Authority flows downhill in a hierarchy, but If I fire
> Chuck, I disconnect all of his subordinates from Albert. There are
> situations (i.e. water pipes) where this is true, but that is not the
> expected situation in this case.

My dir tree would be more like a water pipe, however your argument
here is strong; good thinking!

Quote:> To show a tree as nested sets, replace the emps with ovals, then nest
> subordinate ovals inside each other. The root will be the largest oval
> and will contain every other emp. The leaf emps will be the innermost
> ovals with nothing else inside them and the nesting will show the
> hierarchical relationship. The rgt and lft columns (I cannot use the
> reserved words LEFT and RIGHT in SQL) are what shows the nesting.

more or less coming along, ok

Quote:> If that mental model does not work, then imagine a little worm
> crawling anti-clockwise along the tree. Every time he gets to the left
> or right side of a emp, he numbers it. The worm stops when he gets all
> the way around the tree and back to the top.

here i'm not quite with you, but i understand the numbering must result
in pairs, especially for bosses, to represent an interval which all
subordinates will fall in between; however, the exact way of assigning
the lft and rgt column integers still doesn't revele completely, or?

Quote:> At this point, the boss column is both redundant and denormalized, so
> it can be dropped. Also, note that the tree structure can be kept in
> one table and all the information about a emp can be put in a second
> table and they can be joined on employee number for queries.

you were referring to the boss column from the original adjacency list
representation of OrgChart, right? ok, 1-1 relation with an extra attributes
table, as stated above

Quote:> To convert the graph into a nested sets model think of a little worm
> crawling along the tree. The worm starts at the top, the root, makes a
> complete trip around the tree. When he comes to a emp, he puts a
> number in the cell on the side that he is visiting and increments his
> counter. Each emp will get two numbers, one of the right side and one
> for the left. Computer Science majors will recognize this as a
> modified preorder tree traversal algorithm. Finally, drop the unneeded
> OrgChart.boss column which used to represent the edges of a graph.

two questions here: would the worm jump from sibling leaf to leaf, or
doesn't it matter he's always making a passage back at the parent node
to visit the siblings?  I mean that when he passes the node over and over
he wouldn't fill in the right column of the node, only then when he's
finished with all the node's leafs?

other question: why numbering lft and rgt sides of a leaf with two different
consecutive numbers?  why not just using the same number for they are
leafs anyway?  it would break the calculation in

Quote:> The root is always (left = 1, right = 2 * (SELECT COUNT(*)
> FROM TreeTable))

is that the reason?

Finally, i'll have to study the query examples, and try to convert them
to accustomed VO syntax, but with some SQL literature this wouldn't
be too difficult.  Any good and simple SQL instruction book known by
you?

Thank you very much so far, the technique shown is most welcome!

Regards

Frank