How to best represent dimensional data

How to best represent dimensional data

Post by Alexander Stau » Mon, 20 Sep 1999 04:00:00



Boy, do I wish MSSQL and Sybase supported dimensional data types. It
would make this so much easier.

I have something akin to a company table. It only exists for search
purposes; that is, it does not contain anything but search data -- the
important stuff is elsewhere, and that's not what I'm having problems
with. I have columns like "country", "location", "name", "fields", and
so forth.

Unfortunately, companies exist in multiple countries, multiple
locations, and often span multiple fields. Effectively, each field is a
list of strings.

So far what I've done is define a master table, like so:

Company      Country         Location
-------      -------         --------
Xyzzy        USA             New York;Los Angeles
Xyzzy        France          Paris;Marseilles

This isn't conveniently searchable in SQL terms -- unless I use "LIKE",
which isn't fast, and speed is paramount in my case -- so I have a
generator script that expands the master table into a "search table"
like so:

Company      Country         Location
-------      -------         --------
Xyzzy        USA             New York
Xyzzy        USA             Los Angeles
Xyzzy        France          Paris
Xyzzy        France          Marseilles

...which is easy to search. Needless to say, the data grows huge.
Really, really huge. A company may have hundreds of outlets. For every
"dimensional" column that I add a value to, the number of rows for that
company could double. There is a lot of redundant data here. To optimize
the storage size I could reduce the columns to integer IDs that I'd
resolve through multiple lookup tables, but this feels like it would be
a pretty clunky system.

Suggestions?

--
Alexander Staubo             http://www.mop.no/~alex/
"Give me an underground laboratory, half a dozen atom smashers and a
beautiful girl in a diaphanous veil waiting to be turned into a
chimpanzee, and I care not who writes the nation's laws."
--S. J. Perelman

 
 
 

How to best represent dimensional data

Post by BPMargoli » Mon, 20 Sep 1999 04:00:00


Alexander,

To quote you, "speed is paramount in my case", and it sounds as if you do
not have performance issues, but rather are concerned about storage space.

There is a saying in database circles (and elsewhere): "Cost, Speed,
Reliability - pick any two". It would appear that you have optimized for
speed and reliability, and are concerned about cost (excess disk space).
Well, most sites that I have worked at have concluded that "disk space is
cheap" and have no problem purchasing extra disks if it will resolve either
speed or reliability issues.

Your approach sounds relatively standard to me, and I can't offer any real
improvements.

Perhaps the following will help chase away your concerns. Databases could be
significantly smaller (less disk space) if we'd only eliminate all indexes.
As a matter of fact, indexes are not a necessity in RDBMS to the extent that
the ANSI SQL-92 standard never once mentions indexing. Indeed, it does not
even have a definition for the common CREATE INDEX command found, to the
best of my knowledge, in all RDBMS.

Well, if ANSI doesn't require indexes, and they take so much space, why do
we have them. Simple: speed and reliability. Indexing, when done properly,
supports much quicker accessing of data. Indexing also supports a very
convenient and reliable method for defining primary keys, which by
definition have to be unique. Notice that the requirement is that a table
have a primary key and that it enforces uniqueness. The ***implementation***
of this condition is left up to the developers of the RDBMS. The standard
and accepted technique is via indexing, but it doesn't have to be.
Uniqueness ***could*** be enforced via table locking and table scans,
although performance would suffer significantly.

The point of all this, is that from what you have outlined, you have done a
very fine job, and should be congratulated, rather than suffering pangs of
"can I do this better".

With all that, to follow along the lines of what you said in your post,
consider a surrogate key for the Company/Country columns. Although to be
honest, if, once again, performance is your primary goal, the
denormalization of data represented by the repeating groups of
Company/Country is probably the better approach.


Quote:> Boy, do I wish MSSQL and Sybase supported dimensional data types. It
> would make this so much easier.

> I have something akin to a company table. It only exists for search
> purposes; that is, it does not contain anything but search data -- the
> important stuff is elsewhere, and that's not what I'm having problems
> with. I have columns like "country", "location", "name", "fields", and
> so forth.

> Unfortunately, companies exist in multiple countries, multiple
> locations, and often span multiple fields. Effectively, each field is a
> list of strings.

> So far what I've done is define a master table, like so:

> Company      Country         Location
> -------      -------         --------
> Xyzzy        USA             New York;Los Angeles
> Xyzzy        France          Paris;Marseilles

> This isn't conveniently searchable in SQL terms -- unless I use "LIKE",
> which isn't fast, and speed is paramount in my case -- so I have a
> generator script that expands the master table into a "search table"
> like so:

> Company      Country         Location
> -------      -------         --------
> Xyzzy        USA             New York
> Xyzzy        USA             Los Angeles
> Xyzzy        France          Paris
> Xyzzy        France          Marseilles

> ...which is easy to search. Needless to say, the data grows huge.
> Really, really huge. A company may have hundreds of outlets. For every
> "dimensional" column that I add a value to, the number of rows for that
> company could double. There is a lot of redundant data here. To optimize
> the storage size I could reduce the columns to integer IDs that I'd
> resolve through multiple lookup tables, but this feels like it would be
> a pretty clunky system.

> Suggestions?

> --
> Alexander Staubo             http://www.mop.no/~alex/
> "Give me an underground laboratory, half a dozen atom smashers and a
> beautiful girl in a diaphanous veil waiting to be turned into a
> chimpanzee, and I care not who writes the nation's laws."
> --S. J. Perelman


 
 
 

How to best represent dimensional data

Post by Alexander Stau » Mon, 20 Sep 1999 04:00:00



Quote:> To quote you, "speed is paramount in my case", and it sounds as if you do
> not have performance issues, but rather are concerned about storage space.

> There is a saying in database circles (and elsewhere): "Cost, Speed,
> Reliability - pick any two". It would appear that you have optimized for
> speed and reliability, and are concerned about cost (excess disk space).
> Well, most sites that I have worked at have concluded that "disk space is
> cheap" and have no problem purchasing extra disks if it will resolve either
> speed or reliability issues.

Very true.

While I agree that disk space is cheap, however, we are facing a
potentially large amount of data. (I won't say how large, simply because
I don't know. But 10,000 companies distributed over many countries and
cities and fields is, I think, not unrealistic.) My concern is that the
size of the data will eventually render even the "disk space is cheap"
argument (a sentiment to which I subscribe, very much) somewhat silly.
My second concern is that data size affects performance -- the trueness
of which is certainly determined by the hardware applied. For example,
it would be detrimental to performance if the tables do not fit entirely
in RAM.

My second concern is that of cleanness. Generating an "optimized" table
-- in lack of a better name -- is, to me, a dirty solution to an elegant
problem. :)

Performance _is_ paramount in my case. Data mass and size tend to go
hand in hand at least part of the way. An extension of my second concern
is that the algorithm/design itself is inefficient, and that potential
speed gains can be derived from improving the design.

Quote:> Your approach sounds relatively standard to me, and I can't offer any real
> improvements.

Isn't normalized tables really the norm? It's clunky, I think, and I
have no seen proof -- theoretical or practical -- that it's more
efficient, but this is the kind of design I see everywhere. Or maybe
it's just SQL columns in computer rags. Hmm.

Quote:> Perhaps the following will help chase away your concerns. Databases could be
> significantly smaller (less disk space) if we'd only eliminate all indexes.
> As a matter of fact, indexes are not a necessity in RDBMS to the extent that
> the ANSI SQL-92 standard never once mentions indexing. Indeed, it does not
> even have a definition for the common CREATE INDEX command found, to the
> best of my knowledge, in all RDBMS.

[snip]

Indexes is a must. My guess is that this is definitely a case for
Sybase/MSSQL clustered indexes.

Quote:> The point of all this, is that from what you have outlined, you have done a
> very fine job, and should be congratulated, rather than suffering pangs of
> "can I do this better".

You know, I value this comment very much. Pangs of "can I do this
better" have occurred. Several times.

Oh, the joys of relational database design!

Quote:> With all that, to follow along the lines of what you said in your post,
> consider a surrogate key for the Company/Country columns. Although to be
> honest, if, once again, performance is your primary goal, the
> denormalization of data represented by the repeating groups of
> Company/Country is probably the better approach.

Can you explain this surrogate key concept? I'm not familiar with the
term.

Thanks for an intelligent response, as usual.

--
Alexander Staubo             http://www.mop.no/~alex/
"Give me an underground laboratory, half a dozen atom smashers and a
beautiful girl in a diaphanous veil waiting to be turned into a
chimpanzee, and I care not who writes the nation's laws."
--S. J. Perelman

 
 
 

How to best represent dimensional data

Post by Anthony Mandi » Tue, 21 Sep 1999 04:00:00



> Boy, do I wish MSSQL and Sybase supported dimensional data types. It
> would make this so much easier.

        Dunno about MSSQL (but then who does?) but Sybase do have a
        spatial data add-on. But this isn't what you want according
        to your subsequent description. "dimensional data types" is
        probably a poor choice of description.

Quote:> Unfortunately, companies exist in multiple countries, multiple
> locations, and often span multiple fields. Effectively, each field is a
> list of strings.
...
> ...which is easy to search. Needless to say, the data grows huge.
> Really, really huge. A company may have hundreds of outlets. For every
> "dimensional" column that I add a value to, the number of rows for that
> company could double. There is a lot of redundant data here. To optimize
> the storage size I could reduce the columns to integer IDs that I'd
> resolve through multiple lookup tables, but this feels like it would be
> a pretty clunky system.

        The usual solution would be to normalise the data. One table
        with the unique company data and one table with the unique
        location data and a join table with the primary keys from
        both tables for each combination. This join table will grow
        huge, as you put it, but if the keys are kept to small numeric
        or interger data types, the table won't be overly large.
        Whether this is pretty clunky or not is a subjective issue
        that would be of no concern to the end user since the actual
        data model's physical structure is hidden from them.

-am

 
 
 

How to best represent dimensional data

Post by Chris Case » Wed, 22 Sep 1999 04:00:00


<snip>

Quote:>There is a saying in database circles (and elsewhere): "Cost, Speed,
>Reliability - pick any two". >#

<snip lots>

Please be aware that this saying should be applied in general to Relational
databases only.
Take a look at a post-relational database like Cache (www.intersys.com) and
you will find that you can have all three.
By the way - to answer the original poster (why can't Sybase etc...), forget
Sybase, if you want multi-dimensional and high performance take a look at
the url above.

 
 
 

How to best represent dimensional data

Post by Alexander Stau » Wed, 22 Sep 1999 04:00:00





> <snip>
> >There is a saying in database circles (and elsewhere): "Cost, Speed,
> >Reliability - pick any two". >#
> <snip lots>

> Please be aware that this saying should be applied in general to Relational
> databases only.
> Take a look at a post-relational database like Cache (www.intersys.com) and
> you will find that you can have all three.
> By the way - to answer the original poster (why can't Sybase etc...), forget
> Sybase, if you want multi-dimensional and high performance take a look at
> the url above.

Nice plug.

If you can get me a database connector for Zope (http://www.zope.org/)
on Linux, I'll certainly consider it.

I don't much like relational databases (that includes Sybase), but I
have not found a replacement, and neither has the rest of the world, it
seems.

--
Alexander Staubo             http://www.mop.no/~alex/
"What the hell, he thought, you're only young once, and threw
himself out of the window. That would at least keep the element of
surprise on his side."
--Douglas Adams, _The Hitchhiker's Guide to the Galaxy_

 
 
 

How to best represent dimensional data

Post by Chris Case » Thu, 23 Sep 1999 04:00:00






>> <snip>
>> >There is a saying in database circles (and elsewhere): "Cost, Speed,
>> >Reliability - pick any two". >#
>> <snip lots>

>> Please be aware that this saying should be applied in general to
Relational
>> databases only.
>> Take a look at a post-relational database like Cache (www.intersys.com)
and
>> you will find that you can have all three.
>> By the way - to answer the original poster (why can't Sybase etc...),
forget
>> Sybase, if you want multi-dimensional and high performance take a look at
>> the url above.

>Nice plug.

>If you can get me a database connector for Zope (http://www.zope.org/)
>on Linux, I'll certainly consider it.

>I don't much like relational databases (that includes Sybase), but I
>have not found a replacement, and neither has the rest of the world, it
>seems.

This implies that you didn't visit the url or didn't read the material.
This stuff is real and is used by some VERY big rest of the world companies
including mine (Yes, I am  a real user not an employee of the vendor).
If you want a relational view then you can have it with this and it will
still give you the three corners of the triangle. If you want to get better
still then you look at the underlying scripting opportunities.
Also if you want to Webify then this makes it simple.
 
 
 

How to best represent dimensional data

Post by Alexander Stau » Thu, 23 Sep 1999 04:00:00





[snip]
> This stuff is real and is used by some VERY big rest of the world companies
> including mine (Yes, I am  a real user not an employee of the vendor).
> If you want a relational view then you can have it with this and it will
> still give you the three corners of the triangle. If you want to get better
> still then you look at the underlying scripting opportunities.
> Also if you want to Webify then this makes it simple.

It sounds spiffy, and I've downloaded an evaluation copy to play around
with, but I'm a bit wary.

The last thing I want is tying myself to something proprietary. (At
least with Sybase or MSSQL I can move with relative ease to another ANSI
SQL-compliant system.) Or something that insists you do business logic
and application development in it. I'm not a fan of stored procedures or
any of the stuff that graft programming logic into the database server.
Furthermore I am looking for something that can do the third tier
without messing with the other tiers, and without forcing me to tie
myself to Java, DCOM, CORBA, or a proprietary scripting language.
Relational databases are useful because they're relatively opaque
storage systems. How does Cach compare to that?

And before I go on, what does Cach cost?

--
Alexander Staubo             http://www.mop.no/~alex/
"He could open a tin of sardines with his teeth, strike a Swan
Vestas on his chin, rope steers, drive a steam locomotive and hum
all the works of Gilbert and Sullivan without becoming confused or
breaking down in tears."
--Robert Rankin, _The Book of Ultimate Truths_

 
 
 

How to best represent dimensional data

Post by Chris Case » Fri, 24 Sep 1999 04:00:00


<snip>

Quote:>It sounds spiffy, and I've downloaded an evaluation copy to play around
>with, but I'm a bit wary.

>The last thing I want is tying myself to something proprietary. (At
>least with Sybase or MSSQL I can move with relative ease to another ANSI
>SQL-compliant system.) Or something that insists you do business logic
>and application development in it. I'm not a fan of stored procedures or
>any of the stuff that graft programming logic into the database server.
>Furthermore I am looking for something that can do the third tier
>without messing with the other tiers, and without forcing me to tie
>myself to Java, DCOM, CORBA, or a proprietary scripting language.
>Relational databases are useful because they're relatively opaque
>storage systems. How does Cach compare to that?

>And before I go on, what does Cach cost?

I am glad that you said 'relative ease' because, as I am sure you are aware,
as soon as you try to make these databases do anything useful (i.e. real
world rather than classroom samples) you end  up having to use the
'proprietary' parts of the system. This makes it difficult to move.
One of the better things about Cache is that you can choose where you do
things and, in the unlikely event that you find anything better, you have
export facilities for ODL and DDL.

As I said before, if you want to use Cache as a pure RDBMS (ANSI SQL) then
you can do, it will still beat the socks of anything that you have seen
before. As an example from Intersystems web site the Berne justice
department replaced an existing Sybase (I think it was Sybase - one of the
standard DB's anyway) back end with Cache without changing any of the front
end application. The interfacing was pure ODBC.

In terms of where to put logic - it's up to you.
You can use the RDBMS features and embed the logic in the db. You can use
the scripting language which will allow you to put the logic at whichever
tier you want (even better - truly distributed data and logic). If you
really want to then you can use Micro$oft VB (or anything else - including
Java) for the front or front + logic. You say what you want , the system is
flexible enough to allow you to have it.

If you want objects then Cache does is fully object oriented and, unlike the
others, does not try to bolt this on top of relational. Object structure fit
naturally into the underlying multi-dimensional structure.

As to Cost you would be best talking to Intersystems. Like most companies
the cost depends on a number of things and may not be related to published
price lists. One thing to take into account when looking at costs is that
the hardware requirements to run an app or dB under Cache are of an order of
magnitude smaller than any of the so called mainstream databases.
As an example we used to happily run an intensive 96 user system on an old
VAX. When things got swapped around we placed an Oracle app on the same
machine - it crawled at 10 users. (We have learnt since then and removed all
traces of Oracle from our systems..

My best advice is to talk to Intersystems, they are very good at working in
partnership with customers (and potential customers) to see where the best
fits are.

Let me know how you get on.
Chris
----------------------------------------------------------------------------
-----------------------------------------------------------
Usual disclaimers re personal views not those of my employers etc.

 
 
 

How to best represent dimensional data

Post by Alexander Stau » Wed, 29 Sep 1999 04:00:00




[snip]

Quote:> One of the better things about Cache is that you can choose where you do
> things and, in the unlikely event that you find anything better, you have
> export facilities for ODL and DDL.

And do what? Once you're tied to Cach, in order to move somewhere else
you would have to build some kind of bridge between the object model and
the new engine -- through CORBA or DCOM, perhaps -- and what about the
SQL? What do you do on Linux, where there is no [OS-provided,
disregarding things like OpenLink] ODBC? Don't you write directly to
some kind of OCI/CLI-type API?

My point is, while RDBMSs today are different, they're not so different
that -- bar extensive use of stored procedures, triggers, and other
proprietary extensions to SQL (Transact-SQL, PL/SQL) that I tend to stay
away from -- moving is a major problem. With Cach, it seems that there
is nowhere to go. This is not really a criticism of Cach, rather me
griping about proprietary, closed systems, I guess.

[snip]

Quote:> In terms of where to put logic - it's up to you.
> You can use the RDBMS features and embed the logic in the db. You can use
> the scripting language which will allow you to put the logic at whichever
> tier you want (even better - truly distributed data and logic). If you
> really want to then you can use Micro$oft VB (or anything else - including
> Java) for the front or front + logic. You say what you want , the system is
> flexible enough to allow you to have it.

I will be using Zope <http://www.zope.org/> on Linux.

Unless Cach can mimic DB-Library or CT-Library or provides an ODBC-
compliant call-level API, I'll be condemned to writing my own Zope
driver, which does not tempt me. Tell me, is this part of Cach well
documented? I'm blatantly assuming that it has a native API for SQL
access.

I installed the Windows evaluation version and I'm planning to test it
thoroughly. Unfortunately, the Cach ODBC driver does not work on
Win2000k RC1, which I'm running -- it crashes. I have to run the damn
thing on the NT-based Zope server and access it through Zope's ODBC
connector. *sigh*

Quote:> If you want objects then Cache does is fully object oriented and, unlike the
> others, does not try to bolt this on top of relational. Object structure fit
> naturally into the underlying multi-dimensional structure.

This is all well and good.

How do you work with Cach special data types? Say you have a table (or
whatever term Cach uses for collections of objects) and each object has
an array-type member. How do you access this stuff? Let's say I want to
find all companies which are located in the cities Paris and New York.
In a relational, normalized database it might be something like

        select distinct name
       from companies
        where city in ('Paris', 'New York')

What would be the analogous Cach query?

Quote:> As to Cost you would be best talking to Intersystems. Like most companies
> the cost depends on a number of things and may not be related to published
> price lists. One thing to take into account when looking at costs is that
> the hardware requirements to run an app or dB under Cache are of an order of
> magnitude smaller than any of the so called mainstream databases.
> As an example we used to happily run an intensive 96 user system on an old
> VAX. When things got swapped around we placed an Oracle app on the same
> machine - it crawled at 10 users. (We have learnt since then and removed all
> traces of Oracle from our systems..

> My best advice is to talk to Intersystems, they are very good at working in
> partnership with customers (and potential customers) to see where the best
> fits are.

I'll shoot off an email to InterSystems, meanwhile can't you give me
even a hint? Is the pricing on par with other RDBMSs, would you say?

Quote:> Let me know how you get on.
> Chris
> ----------------------------------------------------------------------------
> -----------------------------------------------------------
> Usual disclaimers re personal views not those of my employers etc.

--
Alexander Staubo             http://www.mop.no/~alex/
"`This must be Thursday,' said Arthur to himself, sinking low over
his beer, `I never could get the hang of Thursdays.'"
--Douglas Adams, _The Hitchhiker's Guide to the Galaxy_
 
 
 

How to best represent dimensional data

Post by Anthony Mandi » Wed, 29 Sep 1999 04:00:00



> My point is, while RDBMSs today are different, they're not so different
> that -- bar extensive use of stored procedures, triggers, and other
> proprietary extensions to SQL (Transact-SQL, PL/SQL) that I tend to stay
> away from -- moving is a major problem. With Cach, it seems that there
> is nowhere to go. This is not really a criticism of Cach, rather me
> griping about proprietary, closed systems, I guess.

        Actually the issues are deeper than that but just concern simple
        economics. Its long been acknowledged that the database market
        is saturated. This is why venders are looking at markets other
        than their traditional ones in order to grow and remain viable
        (either that or try and take market share from their competitors).
        This being the case, its difficult for new venders to enter the
        market in anything more than at niche level. If they fail to
        gain a toe hold or are unable to build on it, they will eventually
        become unviable and become bankrupt. Unless they can maintain
        a sufficient income base they won't be able to fund new development
        or marketing projects. In a sense it no different to natural selection
        and survival of the fittest (some may argue the dinosaur/mammal
        model but a cataclysmic change is required to upset the balance).

        Company beancounters and management understand this (to varying
        degrees). So they would tend to favour traditional systems that
        won't be gone tomorrow (no one wants to be left without support -
        we should already be familiar with scenarios of companies going
        out of business after being unable to continue trading due to
        disruptions of as little as a day or two in core areas). Its a
        sad state of affairs but you can't really argue against rationalism
        and pragmatism. Thus its best to pick a vendor who can manage
        to do enough research and development to bring out new products
        and/or evolve existing products in the areas you wish to go.

-am

 
 
 

How to best represent dimensional data

Post by Chris Case » Thu, 30 Sep 1999 04:00:00






>>[snip]
>>> One of the better things about Cache is that you can choose where you do
>>> things and, in the unlikely event that you find anything better, you
have
>>> export facilities for ODL and DDL.

>>And do what? Once you're tied to Cach, in order to move somewhere else
>>you would have to build some kind of bridge between the object model and
>>the new engine -- through CORBA or DCOM, perhaps -- and what about the
>>SQL? What do you do on Linux, where there is no [OS-provided,
>>disregarding things like OpenLink] ODBC? Don't you write directly to
>>some kind of OCI/CLI-type API?

>>My point is, while RDBMSs today are different, they're not so different
>>that -- bar extensive use of stored procedures, triggers, and other
>>proprietary extensions to SQL (Transact-SQL, PL/SQL) that I tend to stay
>>away from -- moving is a major problem. With Cach, it seems that there
>>is nowhere to go. This is not really a criticism of Cach, rather me
>>griping about proprietary, closed systems, I guess.

>I am still not completely sure why you think moving from Cach is any
harder
>than from any other RDBMS. If you are using it as a pure RDBMS then you
need
>to export the data (supported) and the Data definitions (supported).
>If using the object faciltities then you need to export the object
>definitions (supported). In fact Cach will even allow you to export
>relational defintions from Objects if you want to move to an
>Object-relational tool (ouch).

>I will be using Zope <http://www.zope.org/> on Linux.

>>Unless Cach can mimic DB-Library or CT-Library or provides an ODBC-
>>compliant call-level API, I'll be condemned to writing my own Zope
>>driver, which does not tempt me. Tell me, is this part of Cach well
>>documented? I'm blatantly assuming that it has a native API for SQL
>>access.

>Sorry, I don't know ZOPE at all. You would need to point these questions to
>Intersystems.
>I know that previous incarnations (on VMS/UNIX) have a low level call
>interface (C based) but don't know if this helps you.

>>How do you work with Cach special data types? Say you have a table (or
>>whatever term Cach uses for collections of objects) and each object has
>>an array-type member. How do you access this stuff? Let's say I want to
>>find all companies which are located in the cities Paris and New York.
>>In a relational, normalized database it might be something like

>> select distinct name
>>       from companies
>> where city in ('Paris', 'New York')

>>What would be the analogous Cach query?

>You could do it exactly the same.
>If you are using the Object architect to develop object definitions then
>Cach will also generate Relational mappings that can be used for query
>based retrieval. I have to say that I am no expert on the object side and
>much prefer to use the underlying scripting for business logic and database
>access. If you have object questions then I will have to point you
elsewhere
>for help.

>>I'll shoot off an email to InterSystems, meanwhile can't you give me
>>even a hint? Is the pricing on par with other RDBMSs, would you say?

>I would always expect the price to be FAR LOWER than the Oracles of this
>world.

>> Chris
>>> ------------------------------------------------------------------------
-
>---
>>> -----------------------------------------------------------
>>> Usual disclaimers re personal views not those of my employers etc.

 
 
 

How to best represent dimensional data

Post by Chris Case » Thu, 30 Sep 1999 04:00:00





>[snip]
>> One of the better things about Cache is that you can choose where you do
>> things and, in the unlikely event that you find anything better, you have
>> export facilities for ODL and DDL.

>And do what? Once you're tied to Cach, in order to move somewhere else
>you would have to build some kind of bridge between the object model and
>the new engine -- through CORBA or DCOM, perhaps -- and what about the
>SQL? What do you do on Linux, where there is no [OS-provided,
>disregarding things like OpenLink] ODBC? Don't you write directly to
>some kind of OCI/CLI-type API?

>My point is, while RDBMSs today are different, they're not so different
>that -- bar extensive use of stored procedures, triggers, and other
>proprietary extensions to SQL (Transact-SQL, PL/SQL) that I tend to stay
>away from -- moving is a major problem. With Cach, it seems that there
>is nowhere to go. This is not really a criticism of Cach, rather me
>griping about proprietary, closed systems, I guess.

I am still not completely sure why you think moving from Cach is any harder
than from any other RDBMS. If you are using it as a pure RDBMS then you need
to export the data (supported) and the Data definitions (supported).
If using the object faciltities then you need to export the object
definitions (supported). In fact Cach will even allow you to export
relational defintions from Objects if you want to move to an
Object-relational tool (ouch).

I will be using Zope <http://www.zope.org/> on Linux.

Quote:

>Unless Cach can mimic DB-Library or CT-Library or provides an ODBC-
>compliant call-level API, I'll be condemned to writing my own Zope
>driver, which does not tempt me. Tell me, is this part of Cach well
>documented? I'm blatantly assuming that it has a native API for SQL
>access.

Sorry, I don't know ZOPE at all. You would need to point these questions to
Intersystems.
I know that previous incarnations (on VMS/UNIX) have a low level call
interface (C based) but don't know if this helps you.

Quote:>How do you work with Cach special data types? Say you have a table (or
>whatever term Cach uses for collections of objects) and each object has
>an array-type member. How do you access this stuff? Let's say I want to
>find all companies which are located in the cities Paris and New York.
>In a relational, normalized database it might be something like

> select distinct name
>       from companies
> where city in ('Paris', 'New York')

>What would be the analogous Cach query?

You could do it exactly the same.
If you are using the Object architect to develop object definitions then
Cach will also generate Relational mappings that can be used for query
based retrieval. I have to say that I am no expert on the object side and
much prefer to use the underlying scripting for business logic and database
access. If you have object questions then I will have to point you elsewhere
for help.

Quote:

>I'll shoot off an email to InterSystems, meanwhile can't you give me
>even a hint? Is the pricing on par with other RDBMSs, would you say?

I would always expect the price to be FAR LOWER than the Oracles of this
world.

- Show quoted text -

Quote:> Chris
>> -------------------------------------------------------------------------
---
>> -----------------------------------------------------------
>> Usual disclaimers re personal views not those of my employers etc.