Joining two tables using *two* columns (instead of one)

Joining two tables using *two* columns (instead of one)

Post by Robert Klemm » Sat, 07 Apr 2012 21:17:48





>> the syntax is correct.

> I don't think it is the correct syntax for join on two columns. I have
> shown an example of join on two columns. Syntax maybe correct, but I am
> not so sure about the logical equivalence.

I hinted at the badly chosen subject above.  OP apparently really wants
two joins.  And now it seems the problem has shifted towards long
execution time.

Kind regards

        robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Robert Klemm » Sat, 07 Apr 2012 21:33:06



Quote:> And that's exactly what I can't understand: why is it soo harder for
> the SQL engine to perform the 2xJOINs requests than it is to perform
> 2 times a single JOIN request? In both cases I get the data I need.
> This makes me think that I missed something obvious, and that my
> first version (2xJOINs) is doing much more work than what I need it
> to do...

Then please do "explain select ..." or even "explain analyze select ..."
and post results to http://explain.depesz.com/ and refer them here.

Quote:> My fallback option would be to create a temporary table, and then
> populate it with the result of the two SELECTs above (done one after
> the other, possibly inside a single transaction), and then GROUP BY
> all entries to get my data... But is there really no way to perform
> the same job with one single magic SQL incantation?

Well, you proved that you *can* execute it with a single SELECT.  It's
just not as quick (yet) as you expect.  So you need to find out what's
wrong.

Btw, since there a foreign key on the subnet.probe1 and subnet.probe2
you can (and probably should) use a regular join.  No need for an outer
join because you know that every occurrence of subnet.probe1 and
subnet.probe2 has an entry in probes.  So the query would be

SELECT subnet.id,val1.value,val2.value
   FROM subnet
   JOIN probes AS val1 ON subnet.probe1 = val1.id
   JOIN probes AS val2 ON subnet.probe2 = val2.id

Kind regards

        robert

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sun, 08 Apr 2012 08:20:27





>>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>>>   FROM subnet
>>>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

>>> Without giving it to much thought, this should be equal with:

>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>> FROM subnet
>>> JOIN probes AS prob1
>>>     ON  prob1.id=probe1
>>> JOIN probes AS prob2
>>>     ON prob2.id=probe2;

>> No, that query may return different results, it won't return rows
>> where subnet.probe1 or subnet.probe2 have do not have a matches in
>> the probes table

> WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

that's looking at the subnet table not at the probes table.

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sun, 08 Apr 2012 08:38:49




>> for each row of the subnet table you're asking for the cross product
>> of all the probe1 values with all the probe2 values

> Hello Jasen,

> Thank you for your answer. This is interesting - you are saying that instead of doing two simple JOINs on my primary table (subnet), I am in fact performing a JOIN and then a JOIN of the JOIN (which, as far as I
> understand, would increase the workload exponentially). If I get this right, it looks like the cause of my problem... Now, how do I tell it to 'just' perform two simple JOINs in one request?
> I answered to my initial post, with complete explanations - would you mind taking a look?

your example has value.id as a primary key, that means it's got an
implicit unique constraint on it, and my conjecture does not hold.

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Jasen Bett » Sun, 08 Apr 2012 08:42:16





>>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>>>   FROM subnet
>>>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

>>> Without giving it to much thought, this should be equal with:

>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>> FROM subnet
>>> JOIN probes AS prob1
>>>     ON  prob1.id=probe1
>>> JOIN probes AS prob2
>>>     ON prob2.id=probe2;

>> No, that query may return different results, it won't return rows
>> where subnet.probe1 or subnet.probe2 have do not have a matches in
>> the probes table

> WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

That's looking at the subnet table not at the probes table, without a
foreign key constraint (which wasn't established at the time) they could
still be different results.

--
?? 100% natural


 
 
 

Joining two tables using *two* columns (instead of one)

Post by Lennart Jonsso » Tue, 10 Apr 2012 03:39:09






>>>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>>>>   FROM subnet
>>>>>   LEFT OUTER JOIN probes AS prob1 ON prob1.id=probe1
>>>>>   LEFT OUTER JOIN probes AS prob2 ON prob2.id=probe2
>>>>>   WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

>>>> Without giving it to much thought, this should be equal with:

>>>> SELECT id, prob1.value AS p1, prob2.value AS p2
>>>> FROM subnet
>>>> JOIN probes AS prob1
>>>>     ON  prob1.id=probe1
>>>> JOIN probes AS prob2
>>>>     ON prob2.id=probe2;

>>> No, that query may return different results, it won't return rows
>>> where subnet.probe1 or subnet.probe2 have do not have a matches in
>>> the probes table

>> WHERE probe1 IS NOT NULL AND probe2 IS NOT NULL;

> that's looking at the subnet table not at the probes table.

Ah, you're right. I wrongly assumed the other way around since the where
clause does not make any sense as is.

/Lennart

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Tue, 10 Apr 2012 04:01:50



> I hinted at the badly chosen subject above.  OP apparently really wants
> two joins.  And now it seems the problem has shifted towards long
> execution time.

Well, controlling the execution plan is not one of the strong points of
PostgreSQL. This is as far as I am willing to go in this discussion.

--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Torsten Kirschne » Wed, 11 Apr 2012 12:58:37


Den 06.04.2012 08:48, skrev Mateusz:
[...]

Quote:> So to avoid wasting your time any more, I prepared a clear (I hope
> now) explanation of my problem, with real SQL requests, the example
> schema, etc.
[...]
> This request does a perfect job, seemingly:

> SELECT subnet.id,val1.value,val2.value
>   FROM subnet
>   LEFT OUTER JOIN probes AS val1 ON (subnet.probe1=val1.id)
>   LEFT OUTER JOIN probes AS val2 ON (subnet.probe2=val2.id);
>      id     | value | value
> ------------+-------+-------
>  1.1.1.0/24 |    11 |    12
>  1.1.2.0/24 |    13 |    14
>  1.1.3.0/24 |    15 |    16
>  1.1.4.0/24 |    17 |    18
> (4 rows)

> It works like a charm on my test-purpose database that I created
> for the need of this post - that is, when there are 4 entries in
> subnets. When I test this on my 'real' table (with over 300K sub-
> nets), it hangs for several minutes, and exits with an 'out of
> memory' error message.

This exact query took 6173 ms (i.e. 6 seconds) with 300k subnets on an
off the shelf PostgreSQL 9.1 running with default settings on a very
modest Windows PC.

Could it be that it's not PostgreSQL that is causing the problem?

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Fredrik Jonso » Wed, 11 Apr 2012 17:53:48



Quote:>  This exact query took 6173 ms (i.e. 6 seconds) with 300k subnets on an
>  off the shelf PostgreSQL 9.1 running with default settings on a very
>  modest Windows PC.

>  Could it be that it's not PostgreSQL that is causing the problem?

Its starting to sound like that the postgresql configuration is wrong.

Read the tuning advice on the postgres wiki[0] and especially check
shared_buffers and effective_cache_size and compare those to the actual
available physical memory on the machine.

Don't forget to account for memory actively used by other applications on the
system, before allocating memory for postgresql. If you are excessively liberal
the machine will start to swap and may run out of memory. That said, don't be
too cheap either. If you set either of the values above too low, postgresql
will schedule for slow disk access and may move lots of data between disk and
memory unnecessarily.

[0] http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

--
Fredrik Jonson

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Torsten Kirschne » Thu, 12 Apr 2012 07:06:35


Den 4/10/12 10:53 AM, skrev Fredrik Jonson:

>>   This exact query took 6173 ms (i.e. 6 seconds) with 300k subnets on an
>>   off the shelf PostgreSQL 9.1 running with default settings on a very
>>   modest Windows PC.

> Its starting to sound like that the postgresql configuration is wrong.

[...]
Right. When increasing to 500k subnets (or was it 1M?), performance
deteriorates. Suddenly, it takes about 1 minute on that particular PC to
run the query, with what appears to be way more disk activity.

Maybe Mateusz' server is set up with even less resources.
But could this really lead to an "out of memory" situation on the
server? No graceful degredation?

Anyway, what a fun little problem to play with, in order to learn.

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Mladen Gogal » Thu, 12 Apr 2012 12:21:12



> SELECT subnet.id,val1.value,val2.value
>    FROM subnet JOIN probes AS val1 ON subnet.probe1 = val1.id JOIN
>    probes AS val2 ON subnet.probe2 = val2.id

If it was version 9.1, I would try with hash indexes on all involved
columns. Hash algorithm is greatly improved in recent versions and it may
provide the desired speed-up.

--
http://mgogala.byethost5.com

 
 
 

Joining two tables using *two* columns (instead of one)

Post by Robert Klemm » Sat, 14 Apr 2012 03:38:56



Quote:> Maybe Mateusz' server is set up with even less resources.
> But could this really lead to an "out of memory" situation on the
> server? No graceful degredation?

> Anyway, what a fun little problem to play with, in order to learn.

Did anyone see a plan from OP's DB yet?

Kind regards

        robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/