Joining two tables using *two* columns (instead of one)

Joining two tables using *two* columns (instead of one)

Post by Mateus » Fri, 06 Apr 2012 22:21:08



Hi,

I'm stuck with a (maybe simple) problem of JOINing two tables.. The thing is that I need to join these tables on two different columns.

Let's say I have a table called "subnet" with three columns: id, probe1, probe2. probe1 and probe2 are both foreign keys to a 'probes' table that contains columns: id, value. Now, I
need to join all this together...

This is what I tried so far:

SELECT id, prob1.value AS p1, prob2.value AS p2
  FROM subnet
  LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
  LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
  WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

This seemed good to me, because that's what I would do if I would need to join tables in the most simple way... Unfortunately the command doesn't produce anything - when I launch it, I
have to wait several minutes, only to get kicked out of the pgsql shell with a "out of memory" message. Of course if I join tables only once, I get the result immediately.

I'm pretty sure I missed something obvious here... Any idea?

best regards,
Mateusz

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Lennart Jonsso » Fri, 06 Apr 2012 23:30:31



Quote:> Hi,

> I'm stuck with a (maybe simple) problem of JOINing two tables.. The thing is that I need to join these tables on two different columns.

> Let's say I have a table called "subnet" with three columns: id, probe1, probe2. probe1 and probe2 are both foreign keys to a 'probes' table that contains columns: id, value. Now, I
> need to join all this together...

> This is what I tried so far:

> SELECT id, prob1.value AS p1, prob2.value AS p2
>   FROM subnet
>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

Without giving it to much thought, this should be equal with:

SELECT id, prob1.value AS p1, prob2.value AS p2
FROM subnet
JOIN probes AS prob1
    ON  prob1.id=probe1
JOIN probes AS prob2
    ON prob2.id=probe2;

With proper indexes there's a good chance that this will be less
resource demanding

/Lennart

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mateus » Sat, 07 Apr 2012 01:01:52


Hi,

I see that you replaced my "LEFT OUTER JOIN" by short "JOIN" - I tested that, but it doesn't change (in any noticeable way at least) the behavior of the database.. I still have to
wait several minutes, to end up with a "out of memory" message.

If I perform such request:
SELECT subnet.id, probes.value
   FROM subnet
   LEFT OUTER JOIN probes ON probes.id=subnet.probe1;

I get the result after less than 1 second. So doing it on 2 columns should (I guess) still be well under 2s... I don't understand why adding a second column is so much different from
doing the request two times?

best regards,
Mateusz



>> Hi,

>> I'm stuck with a (maybe simple) problem of JOINing two tables.. The thing
>> is that I need to join these tables on two different columns.

>> Let's say I have a table called "subnet" with three columns: id, probe1,
>> probe2. probe1 and probe2 are both foreign keys to a 'probes' table that
>> contains columns: id, value. Now, I need to join all this together...

>> This is what I tried so far:

>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>   FROM subnet
>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

> Without giving it to much thought, this should be equal with:

> SELECT id, prob1.value AS p1, prob2.value AS p2
> FROM subnet
> JOIN probes AS prob1
>     ON  prob1.id=probe1
> JOIN probes AS prob2
>     ON prob2.id=probe2;

> With proper indexes there's a good chance that this will be less
> resource demanding

> /Lennart

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Sat, 07 Apr 2012 04:48:50



> Hi,

> I'm stuck with a (maybe simple) problem of JOINing two tables.. The
> thing is that I need to join these tables on two different columns.

> Let's say I have a table called "subnet" with three columns: id, probe1,
> probe2. probe1 and probe2 are both foreign keys to a 'probes' table that
> contains columns: id, value. Now, I need to join all this together...

> This is what I tried so far:

> SELECT id, prob1.value AS p1, prob2.value AS p2
>   FROM subnet LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1 LEFT
>   OUTER JOIN probes AS prob2 ON prob2.id=probe2 WHERE probe1 IS NOT NULL
>   AND probe2 IS NOT NULL;

> This seemed good to me, because that's what I would do if I would need
> to join tables in the most simple way... Unfortunately the command
> doesn't produce anything - when I launch it, I have to wait several
> minutes, only to get kicked out of the pgsql shell with a "out of
> memory" message. Of course if I join tables only once, I get the result
> immediately.

> I'm pretty sure I missed something obvious here... Any idea?

> best regards,
> Mateusz

You are allowed to compare ordered pairs. That's the relational theory:

scott=# create table dept1 as select * from dept
scott-# ;
SELECT 4
scott=# select d.deptno,d1.loc
scott-# from dept d join dept1 d1 on ((d.deptno,d.loc)=
(d1.deptno,d1.loc));
 deptno |   loc    
--------+----------
     10 | NEW YORK
     20 | DALLAS
     30 | CHICAGO
     40 | BOSTON
(4 rows)

scott=#

--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Sat, 07 Apr 2012 04:50:46



> You are allowed to compare ordered pairs. That's the relational theory:

> scott=# create table dept1 as select * from dept scott-# ;
> SELECT 4 scott=# select d.deptno,d1.loc scott-# from dept d join dept1
> d1 on ((d.deptno,d.loc)= (d1.deptno,d1.loc));
>  deptno |   loc
> --------+----------
>      10 | NEW YORK 20 | DALLAS 30 | CHICAGO 40 | BOSTON
> (4 rows)

> scott=#

That can also be done with outer joins:

scott=# select d.deptno,d1.loc
from dept d left outer join dept1 d1 on ((d.deptno,d.loc)=
(d1.deptno,d1.loc));
 deptno |   loc    
--------+----------
     10 | NEW YORK
     20 | DALLAS
     30 | CHICAGO
     40 | BOSTON
(4 rows)

I have never tried with joints, but then again....
--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Robert Klemm » Sat, 07 Apr 2012 06:16:27



Btw, the subject does not match your problem description well: I would
consider a join on two columns as a join where there are criteria on two
columns.  You are doing two joins.

Quote:> I see that you replaced my "LEFT OUTER JOIN" by short "JOIN" - I tested that, but it doesn't change (in any noticeable way at least) the behavior of the database.. I still have to
> wait several minutes, to end up with a "out of memory" message.

> If I perform such request:
> SELECT subnet.id, probes.value
>     FROM subnet
>     LEFT OUTER JOIN probes ON probes.id=subnet.probe1;

> I get the result after less than 1 second. So doing it on 2 columns should (I guess) still be well under 2s... I don't understand why adding a second column is so much different from
> doing the request two times?

> best regards,
> Mateusz

Did you look at the execution plan?  What does it look like?

Kind regards

        robert

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sat, 07 Apr 2012 06:20:41



Quote:

>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>   FROM subnet
>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

> Without giving it to much thought, this should be equal with:

> SELECT id, prob1.value AS p1, prob2.value AS p2
> FROM subnet
> JOIN probes AS prob1
>     ON  prob1.id=probe1
> JOIN probes AS prob2
>     ON prob2.id=probe2;

No, that query may return different results, it won't return rows
where subnet.probe1 or subnet.probe2 have do not have a matches in
the probes table

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sat, 07 Apr 2012 06:28:13



Quote:> Hi,

> I'm stuck with a (maybe simple) problem of JOINing two tables.. The thing is that I need to join these tables on two different columns.

> Let's say I have a table called "subnet" with three columns: id, probe1, probe2. probe1 and probe2 are both foreign keys to a 'probes' table that contains columns: id, value. Now, I
> need to join all this together...

> This is what I tried so far:

> SELECT id, prob1.value AS p1, prob2.value AS p2
>   FROM subnet
>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

> This seemed good to me, because that's what I would do if I would need to join tables in the most simple way... Unfortunately the command doesn't produce anything - when I launch it, I
> have to wait several minutes, only to get kicked out of the pgsql shell with a "out of memory" message. Of course if I join tables only once, I get the result immediately.

> I'm pretty sure I missed something obvious here... Any idea?

for each row of the subnet table you're asking for the cross product
of all the probe1 values with all the probe2 values

I can't really say any more without sample data.

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Sat, 07 Apr 2012 10:00:09



> Did you look at the execution plan?  What does it look like?

> Kind regards

>    robert

Robert, I don't understand? This looks like a syntax question, why is the
plan important? Did I misunderstand anything here?

--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sat, 07 Apr 2012 13:12:03




>> Did you look at the execution plan?  What does it look like?

>> Kind regards

>>        robert

> Robert, I don't understand? This looks like a syntax question, why is the
> plan important? Did I misunderstand anything here?

the syntax is correct.  the plan will (for those that can read
plans) contain a lot of information not included in the original
post such as the number and diversity of the records in the tables
involved. also getting a plan proves the syntax was accepted.

I suspect the query is missing some constraint.

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Sat, 07 Apr 2012 13:44:10



> the syntax is correct.

I don't think it is the correct syntax for join on two columns. I have
shown an example of join on two columns. Syntax maybe correct, but I am
not so sure about the logical equivalence.

--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mateus » Sat, 07 Apr 2012 15:48:03


Hi all,

I'd like to thank everyone for your answers. I believe I wasn't concise enough in my first post, which made you guys going in different directions (and I think none of them is the one
I am looking for) :)
So to avoid wasting your time any more, I prepared a clear (I hope now) explanation of my problem, with real SQL requests, the example schema, etc.

First I create my tables:

CREATE TABLE probes (id INTEGER PRIMARY KEY,
                     value INTEGER NOT NULL);

CREATE TABLE subnet (id CIDR PRIMARY KEY,
                     probe1 INTEGER REFERENCES probes(id),
                     probe2 INTEGER REFERENCES probes(id));

Then, I populate tables with some fake data:

INSERT INTO probes (id, value) VALUES (1, 11);
INSERT INTO probes (id, value) VALUES (2, 12);
INSERT INTO probes (id, value) VALUES (3, 13);
INSERT INTO probes (id, value) VALUES (4, 14);
INSERT INTO probes (id, value) VALUES (5, 15);
INSERT INTO probes (id, value) VALUES (6, 16);
INSERT INTO probes (id, value) VALUES (7, 17);
INSERT INTO probes (id, value) VALUES (8, 18);
INSERT INTO subnet (id, probe1, probe2) VALUES ('1.1.1.0/24', 1, 2);
INSERT INTO subnet (id, probe1, probe2) VALUES ('1.1.2.0/24', 3, 4);
INSERT INTO subnet (id, probe1, probe2) VALUES ('1.1.3.0/24', 5, 6);
INSERT INTO subnet (id, probe1, probe2) VALUES ('1.1.4.0/24', 7, 8);

Now, what I'd need is to perform a request that would provide me with the corresponding 'probes' to every 'subnet'.

This request does a perfect job, seemingly:

SELECT subnet.id,val1.value,val2.value
  FROM subnet
  LEFT OUTER JOIN probes AS val1 ON (subnet.probe1=val1.id)
  LEFT OUTER JOIN probes AS val2 ON (subnet.probe2=val2.id);
     id     | value | value
------------+-------+-------
 1.1.1.0/24 |    11 |    12
 1.1.2.0/24 |    13 |    14
 1.1.3.0/24 |    15 |    16
 1.1.4.0/24 |    17 |    18
(4 rows)

It works like a charm on my test-purpose database that I created for the need of this post - that is, when there are 4 entries in subnets. When I test this on my 'real' table (with
over 300K subnets), it hangs for several minutes, and exits with an 'out of memory' error message.
However, doing the two following requests provides results immediately:

SELECT subnet.id,val1.value
  FROM subnet
  LEFT OUTER JOIN probes AS val1 ON (subnet.probe1=val1.id);

SELECT subnet.id,val2.value
  FROM subnet
  LEFT OUTER JOIN probes AS val2 ON (subnet.probe2=val2.id);

And that's exactly what I can't understand: why is it soo harder for the SQL engine to perform the 2xJOINs requests than it is to perform 2 times a single JOIN request? In both cases
I get the data I need. This makes me think that I missed something obvious, and that my first version (2xJOINs) is doing much more work than what I need it to do...

My fallback option would be to create a temporary table, and then populate it with the result of the two SELECTs above (done one after the other, possibly inside a single
transaction), and then GROUP BY all entries to get my data... But is there really no way to perform the same job with one single magic SQL incantation?

Best regards,
Mateusz


> Hi,

> I'm stuck with a (maybe simple) problem of JOINing two tables.. The thing
> is that I need to join these tables on two different columns.

> Let's say I have a table called "subnet" with three columns: id, probe1,
> probe2. probe1 and probe2 are both foreign keys to a 'probes' table that
> contains columns: id, value. Now, I need to join all this together...

> This is what I tried so far:

> SELECT id, prob1.value AS p1, prob2.value AS p2
>   FROM subnet
>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

> This seemed good to me, because that's what I would do if I would need to
> join tables in the most simple way... Unfortunately the command doesn't
> produce anything - when I launch it, I have to wait several minutes, only
> to get kicked out of the pgsql shell with a "out of memory" message. Of
> course if I join tables only once, I get the result immediately.

> I'm pretty sure I missed something obvious here... Any idea?

> best regards,
> Mateusz

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mateus » Sat, 07 Apr 2012 16:00:39



Quote:> for each row of the subnet table you're asking for the cross product
> of all the probe1 values with all the probe2 values

Hello Jasen,

Thank you for your answer. This is interesting - you are saying that instead of doing two simple JOINs on my primary table (subnet), I am in fact performing a JOIN and then a JOIN of the JOIN (which, as far as I
understand, would increase the workload exponentially). If I get this right, it looks like the cause of my problem... Now, how do I tell it to 'just' perform two simple JOINs in one request?
I answered to my initial post, with complete explanations - would you mind taking a look?

thank you
Mateusz

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mateus » Sat, 07 Apr 2012 16:13:09


Hi Mladen,

This is neat, thanks! I don't think it answers to my problem because in fact I don't have two distinct pairs, but two pairs sharing one common column (so two pairs formed from three elements). But nonetheless I
wasn't aware of the possibility of using double pairs on JOINs. This will definitely be useful to me in other cases :)
I answered to my initial post, providing explanations of what exactly my problem (and need) are. I'd be happy if you could spare some minutes taking a look at it. I'm sorry if my initial post wasn't clear enough
- I should be more specific from the beginning.

thank you!
Mateusz



>> Hi,

>> I'm stuck with a (maybe simple) problem of JOINing two tables.. The
>> thing is that I need to join these tables on two different columns.

>> Let's say I have a table called "subnet" with three columns: id, probe1,
>> probe2. probe1 and probe2 are both foreign keys to a 'probes' table that
>> contains columns: id, value. Now, I need to join all this together...

>> This is what I tried so far:

>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>   FROM subnet LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1 LEFT
>>   OUTER JOIN probes AS prob2 ON prob2.id=probe2 WHERE probe1 IS NOT NULL
>>   AND probe2 IS NOT NULL;

>> This seemed good to me, because that's what I would do if I would need
>> to join tables in the most simple way... Unfortunately the command
>> doesn't produce anything - when I launch it, I have to wait several
>> minutes, only to get kicked out of the pgsql shell with a "out of
>> memory" message. Of course if I join tables only once, I get the result
>> immediately.

>> I'm pretty sure I missed something obvious here... Any idea?

>> best regards,
>> Mateusz

> You are allowed to compare ordered pairs. That's the relational theory:

> scott=# create table dept1 as select * from dept
> scott-# ;
> SELECT 4
> scott=# select d.deptno,d1.loc
> scott-# from dept d join dept1 d1 on ((d.deptno,d.loc)=
> (d1.deptno,d1.loc));
>  deptno |   loc    
> --------+----------
>      10 | NEW YORK
>      20 | DALLAS
>      30 | CHICAGO
>      40 | BOSTON
> (4 rows)

> scott=#

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Lennart Jonsso » Sat, 07 Apr 2012 16:30:38




>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>>   FROM subnet
>>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

>> Without giving it to much thought, this should be equal with:

>> SELECT id, prob1.value AS p1, prob2.value AS p2
>> FROM subnet
>> JOIN probes AS prob1
>>     ON  prob1.id=probe1
>> JOIN probes AS prob2
>>     ON prob2.id=probe2;

> No, that query may return different results, it won't return rows
> where subnet.probe1 or subnet.probe2 have do not have a matches in
> the probes table

WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

/Lennart

 
 
 

1. joining two columns from two different tables in a query - solved

Thanks Michale. The problem is solved.:)

__________________________________________________________________
McAfee VirusScan Online from the Netscape Network.
Comprehensive protection for your entire computer. Get your free trial today!
http://channels.netscape.com/ns/computing/mcafee/index.jsp?promo=393397

Get AOL Instant Messenger 5.1 free of charge.  Download Now!
http://aim.aol.com/aimnew/Aim/register.adp?promo=380455

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql

2. data models tc++ v3.0

3. joining two columns from two different tables in a query

4. Help! Programming Wangtek tape driver

5. Merge tables, was [two table's columns into one columns w/o

6. The eBook Community -- The Internet's largest eBook discussion forum

7. Joins to one table form two columns

8. one query, two tables, two aggregate functions

9. how to join two tables and include all records from one

10. Join Two tables in one database

11. query of two tables returns far too many rows, more than the two tables

12. One DLL, two servers, two Oracle 8i databases, VC++ 6.0