Removing Duplicates

Removing Duplicates

Post by Rob Forste » Thu, 24 Jun 1999 04:00:00



I'm looking for a way to remove duplicates from a table, by putting the
duplicates in another table first, and then deleting them by a join.
I've done this in Oracle using the rowid pseudo-column.
I'd rather not delete them without examining them first.

What are my options in Sybase?

Thanks,
Rob.

 
 
 

Removing Duplicates

Post by TJ00 » Fri, 25 Jun 1999 04:00:00


This might work:

Delete a
Group by a.dup_col
Having count(a.dup_col) > 1

However, I don't think this is what you need. Another way to do it is to insert
into another table with an index with ignore_dup_row. Hope this helps.


>I'm looking for a way to remove duplicates from a table, by putting the
>duplicates in another table first, and then deleting them by a join.
>I've done this in Oracle using the rowid pseudo-column.
>I'd rather not delete them without examining them first.

>What are my options in Sybase?

>Thanks,
>Rob.


 
 
 

Removing Duplicates

Post by Anthony Mandi » Fri, 25 Jun 1999 04:00:00



> I'm looking for a way to remove duplicates from a table, by putting the
> duplicates in another table first, and then deleting them by a join.
> I've done this in Oracle using the rowid pseudo-column.
> I'd rather not delete them without examining them first.

> What are my options in Sybase?

        You can check for duplicates with a group by clause and "having
        count(X) > 1". The easiest way to delete them is by creating a
        clustered index that has the ignore_dup_row option set.

-am

 
 
 

1. NULLs and DISTINCT -- removing duplicates

I want to remove duplicates from a table.  I'm going to select one value
for each of a couple of unique identifying fields, but otherwise remove
duplicate rows on the basis of all the other fields.

Let's say TableX has 500,000 rows.

SELECT UniqueID=MIN(UniqueID),TxnTimestamp=MIN(TxnTimestamp),A,B,C,D,E
INTO DistinctTableX FROM TableX
GROUP BY A,B,C,D,E

That creates 400,000 rows.  Field D has some arbitrary IDs that I don't
want to consider (and mostly nulls), so I drop it from consideration....

SELECT UniqueID=MIN(UniqueID),TxnTimestamp=MIN(TxnTimestamp),A,B,C,E
INTO DistinctTableX FROM TableX
GROUP BY A,B,C,E

That creates 495,000 rows!  Somehow that is counter-intuiitive.  I would
have thought that ignoring a field would create more duplicates and
therefore eliminiate more rows, but instead the opposite has happened.

Can someone please explain this phenomenon?

Thanks, Jim Geissman

2. converting from Paradox to MS Access

3. Removing Duplicates

4. WordWrap in 4GL for Windows

5. remove Duplicates?

6. Connection Timeout

7. Removing Duplicate Rows?

8. Help with cross-tab data

9. remove duplicate records from a table

10. Removing duplicates from one field while sorting by a different field

11. removing duplicates from a table?

12. Trouble removing duplicates by Rank

13. Removing duplicate records