Problems with order by, limit, and indices

Problems with order by, limit, and indices

Post by Denis Perchi » Mon, 08 Jan 2001 16:41:17



Quote:> > Example with variant_id = 2

> > slygreetings=> explain select * from users where variant_id=2 AND
> > active='f' order by rcptdate limit 60;
> > NOTICE:  QUERY PLAN:

> > Limit  (cost=77117.18..77117.18 rows=60 width=145)
> >   -> Sort  (cost=77117.18..77117.18 rows=162640 width=145)
> >     -> Seq Scan on users  (cost=0.00..33479.65 rows=162640 width=145)

> This plan looks fine to me, considering that variant_id=2 is the vast
> majority of the table.  An indexscan will be slower, except perhaps if
> you've recently CLUSTERed the table on this index.  (If you don't
> believe me, try it with ENABLE_SEQSCAN set to OFF.)

I would agree with you if there was no limit specified. As far as I can
understand it is possible to traverse users_rcptdate_vid_key Forward,
and get 60 tuples, than finish. And that tuples will be already sorted (index
includes rcptdate also).

Quote:> > Example with variant_id = 5

> > slygreetings=> explain select * from users where variant_id=5 AND
> > active='f' order by rcptdate limit 60;
> > NOTICE:  QUERY PLAN:

> > Limit  (cost=13005.10..13005.10 rows=60 width=145)
> >   -> Sort  (cost=13005.10..13005.10 rows=3445 width=145)
> >     -> Index Scan using users_rcptdate_vid_key on users
> > (cost=0.00..12658.35 rows=3445 width=145)

> You could probably get a plan without the sort step if you said
>    ... order by variant_id, rcptdate;

No way, it just get all tuples for the qual, sort them, and the limiting.
That's horrible...

slygreetings=> explain select * from users where variant_id=5 AND active='f'
order by rcptdate,variant_id limit 60;
NOTICE:  QUERY PLAN:

Limit  (cost=13005.10..13005.10 rows=60 width=145)
  ->  Sort  (cost=13005.10..13005.10 rows=3445 width=145)
        ->  Index Scan using users_rcptdate_vid_key on users  
(cost=0.00..12658.35 rows=3445 width=145)

EXPLAIN
slygreetings=> explain select * from users where variant_id=5 AND active='f'
order by rcptdate,variant_id,active limit 60;
NOTICE:  QUERY PLAN:

Limit  (cost=13005.10..13005.10 rows=60 width=145)
  ->  Sort  (cost=13005.10..13005.10 rows=3445 width=145)
        ->  Index Scan using users_rcptdate_vid_key on users  
(cost=0.00..12658.35 rows=3445 width=145)

EXPLAIN

--
Sincerely Yours,
Denis Perchine

----------------------------------

HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

 
 
 

Problems with order by, limit, and indices

Post by Denis Perchi » Mon, 08 Jan 2001 20:05:41


Hi,

another interesting thing...
This is current 7.1.

slygreetings=> explain select * from users where variant_id=5 AND active='f'
order by rcptdate,variant_id,active limit 60;
NOTICE:  QUERY PLAN:

Limit  (cost=13005.10..13005.10 rows=60 width=145)
  ->  Sort  (cost=13005.10..13005.10 rows=3445 width=145)
        ->  Index Scan using users_rcptdate_vid_key on users  
(cost=0.00..12658.35 rows=3445 width=145)

EXPLAIN
slygreetings=> set enable_sort to off;
SET VARIABLE
slygreetings=> explain select * from users where variant_id=5 AND active='f'
order by rcptdate,variant_id,active limit 60;
NOTICE:  QUERY PLAN:

Limit  (cost=100013005.10..100013005.10 rows=60 width=145)
  ->  Sort  (cost=100013005.10..100013005.10 rows=3445 width=145)
        ->  Index Scan using users_rcptdate_vid_key on users  
(cost=0.00..12658.35 rows=3445 width=145)

EXPLAIN

Cost is something really wierd.... Why?

--
Sincerely Yours,
Denis Perchine

----------------------------------

HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

 
 
 

Problems with order by, limit, and indices

Post by Tom La » Tue, 09 Jan 2001 02:45:14



>> You could probably get a plan without the sort step if you said
>> ... order by variant_id, rcptdate;
> No way, it just get all tuples for the qual, sort them, and the limiting.
> That's horrible...
> slygreetings=> explain select * from users where variant_id=5 AND active='f'
> order by rcptdate,variant_id limit 60;
> NOTICE:  QUERY PLAN:
> Limit  (cost=13005.10..13005.10 rows=60 width=145)
>    -> Sort  (cost=13005.10..13005.10 rows=3445 width=145)
>       -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145)

Now, that's not what I told you to do, is it?  It works fine for me:

regression=# create table users (variant_id int , active bool, rcptdate date);
CREATE
regression=# create index usersind on users( variant_id,rcptdate,active);
CREATE
regression=# explain  select * from users where variant_id=5 AND active='f'
regression-# order by rcptdate limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=8.22..8.22 rows=1 width=9)
  ->  Sort  (cost=8.22..8.22 rows=5 width=9)
        ->  Index Scan using usersind on users  (cost=0.00..8.16 rows=5 width=9)

EXPLAIN
regression=# explain  select * from users where variant_id=5 AND active='f'
regression-# order by variant_id,rcptdate limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.63 rows=1 width=9)
  ->  Index Scan using usersind on users  (cost=0.00..8.16 rows=5 width=9)

EXPLAIN

The specified sort order has to match the index if you hope to avoid
a sort step.

                        regards, tom lane

 
 
 

Problems with order by, limit, and indices

Post by Tom La » Tue, 09 Jan 2001 02:57:11



> Cost is something really wierd.... Why?

That's how the enable stuff works for plan types that can't be ignored
completely: it just adds a big constant to the estimated cost.  If there
is no other alternative plan, you get the unwanted plan type anyway.

                        regards, tom lane

 
 
 

Problems with order by, limit, and indices

Post by Denis Perchi » Tue, 09 Jan 2001 15:23:42


Quote:> > slygreetings=> explain select * from users where variant_id=5 AND
> > active='f' order by rcptdate,variant_id limit 60;
> > NOTICE:  QUERY PLAN:

> > Limit  (cost=13005.10..13005.10 rows=60 width=145)
> >    -> Sort  (cost=13005.10..13005.10 rows=3445 width=145)
> >       -> Index Scan using users_rcptdate_vid_key on users
> > (cost=0.00..12658.35 rows=3445 width=145)

> Now, that's not what I told you to do, is it?  It works fine for me:

> regression=# create table users (variant_id int , active bool, rcptdate
> date); CREATE
> regression=# create index usersind on users( variant_id,rcptdate,active);
> CREATE
> regression=# explain  select * from users where variant_id=5 AND active='f'
> regression-# order by rcptdate limit 1;
> NOTICE:  QUERY PLAN:

> Limit  (cost=8.22..8.22 rows=1 width=9)
>   ->  Sort  (cost=8.22..8.22 rows=5 width=9)
>         ->  Index Scan using usersind on users  (cost=0.00..8.16 rows=5
> width=9)

> EXPLAIN
> regression=# explain  select * from users where variant_id=5 AND active='f'
> regression-# order by variant_id,rcptdate limit 1;
> NOTICE:  QUERY PLAN:

> Limit  (cost=0.00..1.63 rows=1 width=9)
>   ->  Index Scan using usersind on users  (cost=0.00..8.16 rows=5 width=9)

> EXPLAIN

> The specified sort order has to match the index if you hope to avoid
> a sort step.

Dummy me... My aplogies for the time I stole...

--
Sincerely Yours,
Denis Perchine

----------------------------------

HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

 
 
 

1. Problem with indexes, LIMIT, ORDER BY ... DESC

Hi,

I'm having trouble with indexes in PostgreSQL 7.1.3.  Here is a
transcript:

==========================================================================
=
announce=# \d foo
                  Table "foo"
  Attribute |           Type           | Modifier
-----------+--------------------------+----------
  code      | character varying(4)     | not null
  date      | timestamp with time zone | not null
  price     | numeric(10,2)            | not null
  volume    | integer                  | not null
  other     | boolean                  | not null
Index: foo_code_date

announce=# \d foo_code_date
       Index "foo_code_date"
  Attribute |           Type
-----------+--------------------------
  code      | character varying(4)
  date      | timestamp with time zone
btree

announce=# explain select date from foo where date < '06/08/2001
23:59' and code = 'FOO' order by code, date limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..3.78 rows=1 width=20)
   ->  Index Scan using foo_code_date on foo  
(cost=0.00..23996.55 rows=6355 width=20)
==========================================================================
=

So far, so good.  The index is properly used, and the query is
fast.  However, if I want to sort *descending* (which is really
what I want to do, to find the closest row with a date less than
a given date), then a full table scan is done:

==========================================================================
=
announce=# explain select date from foo where date < '06/08/2001
23:59' and code = 'FOO' order by code, date DESC limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=24397.98..24397.98 rows=1 width=20)
   ->  Sort  (cost=24397.98..24397.98 rows=6355 width=20)
         ->  Index Scan using foo_code_date on foo  
(cost=0.00..23996.55 rows=6355 width=20)
==========================================================================
=

What can I do to improve this?

Might it help if I reversed the order of the fields in the
index?  There are only about 50 possible values for the 'code'
field.  I'm not sure which order is better in this case.  The
'date' field is well spread out over a year.

  -Ken

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

2. Searching replacement for DWB

3. Resultset order, ORDER BY, & Clustered Indexes

4. Calling stored procedure from MS Access

5. Clustered index, physical data order, order by clause

6. VFP Grids

7. clustered index order by problem

8. MDX from Excel

9. SQL Server index problem on order by

10. primary index order problem

11. INTERBASE, collation order and create index problem

12. SQL Server index problem on order by

13. Index order problem