SQL: delete rows based on a column being like-compared to other rows

SQL: delete rows based on a column being like-compared to other rows

Post by Richard A. DeVenezi » Sun, 22 Jun 2003 05:24:31



I have some metadata that is generated for me.
The metadata are like strings used to extract data for specialized
reporting.
Some of the metadata can cause one row to be reported more than once, which
I don't want.
In the following example I would want to 'reduce' the three rows to just one
row

data foo (label='Valley girl');
length like $20;
input like ;
cards;
jabba
jabba%
jabba%the%hut%
;
run;

proc sql;
  delete from foo
  where like < could be matched by any other different like >
  ;
quit;

jabba% should remain
"jabba" is like 'jabba%'  and should be removed from use as metadata
"jabba%the%hut" is like 'jabba%' and should be removed from use as metadata

Any suggestions ?

--
Richard A. DeVenezia, http://www.devenezia.com

 
 
 

SQL: delete rows based on a column being like-compared to other rows

Post by Huang, » Sun, 22 Jun 2003 05:42:53


I think it should be modified after reading your last paragraph:

proc sql;
  delete from foo
  where like like 'jabba%%' and not (like='jabba%')
  ;
quit;

This version will keep jabba% and drop jabba.

Ya

-----Original Message-----
From: Huang, Ya
Sent: Friday, June 20, 2003 1:37 PM

Subject: RE: SQL: delete rows based on a column being like-compared to

other rows

How about this?

proc sql;
  delete from foo
  where like like 'jabba%' and not (like='jabba')
  ;
quit;

It works for your sample data.

Ya
-----Original Message-----

Sent: Friday, June 20, 2003 1:25 PM

Subject: SQL: delete rows based on a column being like-compared to other
rows

I have some metadata that is generated for me.
The metadata are like strings used to extract data for specialized
reporting.
Some of the metadata can cause one row to be reported more than once, which
I don't want.
In the following example I would want to 'reduce' the three rows to just one
row

data foo (label='Valley girl');
length like $20;
input like ;
cards;
jabba
jabba%
jabba%the%hut%
;
run;

proc sql;
  delete from foo
  where like < could be matched by any other different like >
  ;
quit;

jabba% should remain
"jabba" is like 'jabba%'  and should be removed from use as metadata
"jabba%the%hut" is like 'jabba%' and should be removed from use as metadata

Any suggestions ?

--
Richard A. DeVenezia, http://www.devenezia.com


 
 
 

SQL: delete rows based on a column being like-compared to other rows

Post by Huang, » Sun, 22 Jun 2003 05:36:34


How about this?

proc sql;
  delete from foo
  where like like 'jabba%' and not (like='jabba')
  ;
quit;

It works for your sample data.

Ya

-----Original Message-----

Sent: Friday, June 20, 2003 1:25 PM

Subject: SQL: delete rows based on a column being like-compared to other
rows

I have some metadata that is generated for me.
The metadata are like strings used to extract data for specialized
reporting.
Some of the metadata can cause one row to be reported more than once, which
I don't want.
In the following example I would want to 'reduce' the three rows to just one
row

data foo (label='Valley girl');
length like $20;
input like ;
cards;
jabba
jabba%
jabba%the%hut%
;
run;

proc sql;
  delete from foo
  where like < could be matched by any other different like >
  ;
quit;

jabba% should remain
"jabba" is like 'jabba%'  and should be removed from use as metadata
"jabba%the%hut" is like 'jabba%' and should be removed from use as metadata

Any suggestions ?

--
Richard A. DeVenezia, http://www.devenezia.com

 
 
 

SQL: delete rows based on a column being like-compared to other rows

Post by Richard A. DeVenezi » Sun, 22 Jun 2003 07:15:54


I provided Ya Huang this additional clarification.

of data

jabba
jabba%
jabba%the%hut
xyz%abc
132%xyz
123%abc
12%3%

I would want to reduce to

jabba%
xyz%abc
132%xyz
12%3%

--
Richard A. DeVenezia, http://www.devenezia.com

 
 
 

SQL: delete rows based on a column being like-compared to other rows

Post by Huang, » Wed, 25 Jun 2003 07:14:25


Richard,

I just came out the following code. Under some strict assumptions,
it seems to work:

data foo;
length like $20;
input like ;
cards;
jabba
jabba%
jabba%the%hut
xyz%abc
xyz%abc%def
132%xyz
132%xyz%npq
132%dbc%asd%ijk
132
123%abc
12%3%
12345
12
234
23456
;
run;

proc sql;
create table root as
select distinct scan(tranwrd(like,'%','% '),1,' ') as root
from foo
having substr(reverse(trim(calculated root)),1,1)='%'
;
create table root1 as
select distinct a.root
from root a, root b
group by a.root
having max(compress(a.root,'%') contains trim(left(compress(b.root,'%')))
    and a.root ne b.root)=0
;
create table foo as
select a.*,
case when substr(a.like,1,length(b.root)-1)=compress(b.root,'%') then b.root
     else ' ' end as grp
from foo a, root1 b
having calculated grp ne ' '
;
create table foo as
select *
from foo
group by grp
having like contains trim(grp) and
   length(like)=min(length(case when like contains trim(grp) then like
     else repeat('*',100) end))
;

proc print;
run;
-----------------
  like      grp

 12%3%      12%
 132%xyz    132%
 jabba%     jabba%
 xyz%abc    xyz%

My assumptions are:

. There is always a common 'root' for similar strings,
  this 'root' has to be ended with a '%'. Therefore,
  jabba% is a root, but jabba is not.
. There is no strings with same 'root' and equal length,
  such as xyz%abc and xyz%bcd.
. Similar string without a 'root' will be dropped.
  For instance, 234 and 23456 are dropped because there is
  not a 234%.

I think regular express function might be better
for searching a common root, but I'm not good at those
function, so I can only build my code based on old fashion
string manipulation functions.

I know this is not even close to perfection, especially
if you have other roles in your mind. I hope this can provide
you some hints.

Kind regards,

Ya Huang

-----Original Message-----

Sent: Friday, June 20, 2003 3:16 PM

Subject: Re: delete rows based on a column being like-compared to other

rows

I provided Ya Huang this additional clarification.

of data

jabba
jabba%
jabba%the%hut
xyz%abc
132%xyz
123%abc
12%3%

I would want to reduce to

jabba%
xyz%abc
132%xyz
12%3%

--
Richard A. DeVenezia, http://www.devenezia.com

 
 
 

SQL: delete rows based on a column being like-compared to other rows

Post by Richard A. DeVenezi » Wed, 25 Jun 2003 10:57:19


Ya:

Thanks for all the investigations.
The solution I worked up involves an outer join.  But my problem space (N)
is small enough to not be worried about the size of the OJ (N**2).

data foo;
length like $20 deleteThis 8;
input like ;
cards;
jabba
jabba%
jabba%the%hut
xyz%abc
xyz%abc%def
132%xyz
132%xyz%npq
132%dbc%asd%ijk
132
123%abc
12%3%
12345
12
234
23456
;
run;

proc sql ;

  * likematrix is not used, but it demonstrates what the outer join looks
like
  * as well as the 'like-ness' measures;

  create table likematrix as
  select x.like as x
       , y.like as y
       , trim(x.like) like trim(y.like) as x_like_y
       , x.like eq y.like as x_equal_y
  from   foo as x
       , foo as y
  ;

  reset undo_policy=none;

  * delete all likes that are explicitly like a different like;

  delete from foo where
  like in (select x.like
           from foo as x
              , foo as y
           where trim(x.like) like trim(y.like)
             and (x.like ne y.like)
          ) ;
  quit;

--
Richard A. DeVenezia, http://www.devenezia.com


> Richard,

> I just came out the following code. Under some strict assumptions,
> it seems to work:

> data foo;
> length like $20;
> input like ;
> cards;
> jabba
> jabba%
> jabba%the%hut
> xyz%abc
> xyz%abc%def
> 132%xyz
> 132%xyz%npq
> 132%dbc%asd%ijk
> 132
> 123%abc
> 12%3%
> 12345
> 12
> 234
> 23456
> ;
> run;

> proc sql;
> create table root as
> select distinct scan(tranwrd(like,'%','% '),1,' ') as root
> from foo
> having substr(reverse(trim(calculated root)),1,1)='%'
> ;
> create table root1 as
> select distinct a.root
> from root a, root b
> group by a.root
> having max(compress(a.root,'%') contains trim(left(compress(b.root,'%')))
>     and a.root ne b.root)=0
> ;
> create table foo as
> select a.*,
> case when substr(a.like,1,length(b.root)-1)=compress(b.root,'%') then
b.root
>      else ' ' end as grp
> from foo a, root1 b
> having calculated grp ne ' '
> ;
> create table foo as
> select *
> from foo
> group by grp
> having like contains trim(grp) and
>    length(like)=min(length(case when like contains trim(grp) then like
>      else repeat('*',100) end))
> ;

> proc print;
> run;
> -----------------
>   like      grp

>  12%3%      12%
>  132%xyz    132%
>  jabba%     jabba%
>  xyz%abc    xyz%

> My assumptions are:

> . There is always a common 'root' for similar strings,
>   this 'root' has to be ended with a '%'. Therefore,
>   jabba% is a root, but jabba is not.
> . There is no strings with same 'root' and equal length,
>   such as xyz%abc and xyz%bcd.
> . Similar string without a 'root' will be dropped.
>   For instance, 234 and 23456 are dropped because there is
>   not a 234%.

> I think regular express function might be better
> for searching a common root, but I'm not good at those
> function, so I can only build my code based on old fashion
> string manipulation functions.

> I know this is not even close to perfection, especially
> if you have other roles in your mind. I hope this can provide
> you some hints.

> Kind regards,

> Ya Huang

> -----Original Message-----

> Sent: Friday, June 20, 2003 3:16 PM

> Subject: Re: delete rows based on a column being like-compared to other
> rows

> I provided Ya Huang this additional clarification.

> of data

> jabba
> jabba%
> jabba%the%hut
> xyz%abc
> 132%xyz
> 123%abc
> 12%3%

> I would want to reduce to

> jabba%
> xyz%abc
> 132%xyz
> 12%3%

> --
> Richard A. DeVenezia, http://www.devenezia.com

 
 
 

1. VBA code to delete row based on vaule of one cell within that row?

HI!
I am writing a little macro in Excel 97 and I have hit a little block.
I am sure this is do-able, I just lack the knowledge.
I have a cell (well actually I have a worksheet full of cells, but I
think I can work out the looping from one cell to the next...) that
contains a numerical value that is either positive or negative.
If the value is 0 or negitive I want the macro to delete the entire row.
Any suggestions? Thanks in advance!

Galen

2. HSM cache management

3. deleting duplicate rows and inserting rows based on cell differences

4. DSL & Hub

5. Delete the Rows Between the Last Row in Column A and the Last Row of Column B

6. looking for cross compiler

7. This newsgroup really really really really likes likes likes likes likes ddddduuuupppppllllliiiiiicccccaaaatttttiiiiiiooooonnnn

8. Closing MSDE database?

9. Delete entire column every time encounter N/A in row n

10. DELETE BLANK ROW AND COLUMN

11. VBA code to delete entire row based on missing value

12. EXCEL'97 VBA- Find and Delete Rows based on data in spreadsheet

13. Deleting Matching Rows Using SQL