Complex string replacement of multiple values(?) Argh!

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Wed, 10 Apr 2002 23:13:50



I'm going batty trying to figure this one out, in fact I thought I had found
a solution last week but when I revisited this to actually implement it
yesterday, my presumed solution was not at all, a solution that is, i.e. it
didn't work!

Here's the scenario (sorry narrative first then DDL and all the good stuff):
I have data that has been imported from another source and has to be
massaged before it can be moved into our production tables. There is a
Description column that contains HTML codes that we need to be translated
into their actual character representations, e.g. " should become ".
The problem I have encountered is that there could be several different HTML
codes in the Description and I can't seem to find an easy way to replace all
of them at one go.

Ok, DDL and stuff:

CREATE TABLE Items
 (ItemID   INT PRIMARY KEY,
 Description VARCHAR(100))

CREATE TABLE HTMLCodes
 (HTMLCode CHAR(6) PRIMARY KEY,
 HTMLChar CHAR(1))

INSERT INTO Items
 VALUES (1, 'Some HTML Codes " ™ ½')

INSERT INTO Items
 VALUES (2, 'Some more HTML Codes " ¼ ¾')

INSERT INTO HTMLCodes
 VALUES ('"', '"')

INSERT INTO HTMLCodes
 VALUES ('™', 'T')

INSERT INTO dbo.HTMLMapping
 VALUES ('¼', '?')

INSERT INTO dbo.HTMLMapping
 VALUES ('½', '?')

INSERT INTO dbo.HTMLMapping
 VALUES ('¾', '?')

Obviously what I would like to see is this:

ItemID      Description
-------      --------------------------------------------
1              Some HTML Codes " T ?
2              Some more HTML Codes " ? ?

I tried various UPDATEs using REPLACE(Description, HTMLCode, HTMLChar),
here's one example I tried:

UPDATE Items SET
 Description = REPLACE(Description, HTMLCode, HTMLChar)
 FROM Items AS I JOIN HTMLCodes AS H ON I.Description LIKE '%' + H.HTMLCode
+ '%'

But of course this doesn't work, well not completely, it updates one of the
HTML Codes found but not all of them.

Any idea as to how I can do this?

TIA

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by lindawi » Wed, 10 Apr 2002 23:54:54


Mike,





    UPDATE  Items SET
            Description = REPLACE(Description, HTMLCode, HTMLChar)
    FROM    Items AS I
    JOIN    HTMLCodes AS H
    ON      I.Description LIKE '%' + H.HTMLCode + '%'

end
SELECT * FROM Items
go
drop table HTMLCodes
drop table Items

Linda

Quote:> I'm going batty trying to figure this one out, in fact I thought I
> had found a solution last week but when I revisited this to
> actually implement it yesterday, my presumed solution was not at
> all, a solution that is, i.e. it didn't work!

> Here's the scenario (sorry narrative first then DDL and all the
> good stuff): I have data that has been imported from another source
> and has to be massaged before it can be moved into our production
> tables. There is a Description column that contains HTML codes that
> we need to be translated into their actual character
> representations, e.g. " should become ". The problem I have
> encountered is that there could be several different HTML codes in
> the Description and I can't seem to find an easy way to replace all
> of them at one go.

> Ok, DDL and stuff:

> CREATE TABLE Items
>  (ItemID   INT PRIMARY KEY,
>  Description VARCHAR(100))

> CREATE TABLE HTMLCodes
>  (HTMLCode CHAR(6) PRIMARY KEY,
>  HTMLChar CHAR(1))

> INSERT INTO Items
>  VALUES (1, 'Some HTML Codes " ™ ½')

> INSERT INTO Items
>  VALUES (2, 'Some more HTML Codes " ¼ ¾')

> INSERT INTO HTMLCodes
>  VALUES ('"', '"')

> INSERT INTO HTMLCodes
>  VALUES ('™', 'T')

> INSERT INTO dbo.HTMLMapping
>  VALUES ('¼', '?')

> INSERT INTO dbo.HTMLMapping
>  VALUES ('½', '?')

> INSERT INTO dbo.HTMLMapping
>  VALUES ('¾', '?')

> Obviously what I would like to see is this:

> ItemID      Description
> -------      --------------------------------------------
> 1              Some HTML Codes " T ?
> 2              Some more HTML Codes " ? ?

> I tried various UPDATEs using REPLACE(Description, HTMLCode,
> HTMLChar), here's one example I tried:

> UPDATE Items SET
>  Description = REPLACE(Description, HTMLCode, HTMLChar)
>  FROM Items AS I JOIN HTMLCodes AS H ON I.Description LIKE '%' +
> H.HTMLCode + '%'

> But of course this doesn't work, well not completely, it updates
> one of the HTML Codes found but not all of them.

> Any idea as to how I can do this?

> TIA

> Michael MacGregor
> Database Architect
> SalesDriver


 
 
 

Complex string replacement of multiple values(?) Argh!

Post by dorianwol.. » Wed, 10 Apr 2002 23:51:00


Michael,

You'll need a cursor
based on the HTMLCodes table, and then issue an update statement for every loop in the cursor.

The update statement within the loop should be:

UPDATE Items SET

FROM Items where Description LIKE '%' + H.HTMLCode
+ '%'

Hope that helps

Dorian

**********************************************************************

Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 00:03:15


Linda,

I already tried that and performance is awful. The real Items table actually
has about 1300 rows and the Description column can contain upto 1500
characters. The loop is executed at least five times and in total takes well
over a minute to complete.

Problem is that this isn't the only massaging I have to do and I was really
hoping to cut that time down dramatically.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 00:10:17


Dorian,

I was really, really, really hoping not to have to resort to a cursor.

I had already tried Linda's suggestion of looping the UPDATE until no more
rows are updated but the performance is terrible. I'm currently working on a
cursor solution to see how that compares but I'm still hoping that some
bright spark might come up with a brilliant and highly efficient solution.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Isaac Blan » Thu, 11 Apr 2002 00:56:09


If it's SQL2000, you can create a UDF.  You can have all HTML codes
hardcoded there and loop through all possible codes one at a time.  Not
pretty, but it will all be in memory, so maybe performance won't hurt.
 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Joe Celk » Thu, 11 Apr 2002 01:29:16


Some minor DDL changes to avoid nulls and be flexible on the html_code
strings -- you had it set to give you trailing blanks.  

CREATE TABLE Items
(item_id INTEGER NOT NULLPRIMARY KEY,
 description VARCHAR(100) NOT NULL);

CREATE TABLE HTMLCodes
(html_code VARCHAR(6) NOT NULL PRIMARY KEY,
 html_char CHAR(1) NOT NULL);

Got a good "cut and paste" feature on your text editor?

UPDATE Items
  SET description
   =
   REPLACE(
    REPLACE(
    REPLACE(
      ...
     REPLACE(description, <html_code #n>, <html_char #n>),
      , <html_code #n-1>, <html_char #n-1>),
      ...
    , <html_code #2>, <html_char #2>),
   , <html_code #n>, <html_char #1>))

Someone told me how far you can go down in nesting levels in a funciton
call; but I cannot remember the number.  I think it is 32 for stored
procedures.  

If you have to write 2-3 updates to get the job done, it is still going
to be faster than a cursor.  

--CELKO--
 ===========================
 Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Steve Dassi » Thu, 11 Apr 2002 02:10:06


To easily replace parts of a string
check out Relational Application Companion at:
www.rac4sql.com
 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 02:13:56


Unfortunately I am totally unfamiliar with UDFs as I have not had any
recourse to use them yet. I don't particularly want to hardcode the HTML
codes either (there are at least 170+ HTML codes that I've found), I want to
be flexible enough to use a table.

Although I don't particularly like cursors, I do know how to code those and
I've tried it already and it does the job very nicely in about 20 seconds.
The looped update approach takes about 60 seconds.

As I don't have the luxury to experiment with UDFs, I'm going to go with the
cursor. Maybe later when I have some time to play, I'll try the UDF
solution.

Thanks anyway.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 02:29:14


Joe,

As always, thanks for you response.

The DDL that I posted was simplified and I hoped, sufficient to illustrate
the problem.

I have so far found 171 HTML codes, though not all of them are actually
being used, probably only about 20 - 30 of them so far. However I did
consider the idea of nested REPLACE() but rejected it because the overall
data massaging process is complex enough already and I don't want to have
nightmare SQL to maintain (I noticed that thread about the worst SQL ever
seen and your response - you really ought to write a book on some of your
experiences you know, it could be a real treat to read!)

Multiple updates, strangely enough, do not perform as well as the cursor
approach that I have come up with, and the cursor code takes up much less
real estate in the SP.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 02:32:01


Hey Watson,

How're you doing?

I wondered when I'd see a response from you or oj.

MTM

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Isaac Blan » Thu, 11 Apr 2002 02:44:13


    You can build your UPDATE dynamically based on the data in HTMLCodes.
It should be pretty straightforward - for each new code prefix the
expression wiht "REPLACE(" and suffix it with ",<html_code> ,<char_code>)".
And to eliminate dummy updates (when no replacements actually occured), you
can build a dynamic WHERE clause as well.  So you have a cursor on your HTML
codes, build an UPDATE string dynamically and execute it each time you reach
the 32 nesting level limit.

I think the main reason your multiple updates strategy was so slow lies in
the fact that it was updating the same row several times in each pass -
overriding each other.  So 5 iterations of the loop could amount to 15
updates.



Quote:> Joe,

> As always, thanks for you response.

> The DDL that I posted was simplified and I hoped, sufficient to illustrate
> the problem.

> I have so far found 171 HTML codes, though not all of them are actually
> being used, probably only about 20 - 30 of them so far. However I did
> consider the idea of nested REPLACE() but rejected it because the overall
> data massaging process is complex enough already and I don't want to have
> nightmare SQL to maintain (I noticed that thread about the worst SQL ever
> seen and your response - you really ought to write a book on some of your
> experiences you know, it could be a real treat to read!)

> Multiple updates, strangely enough, do not perform as well as the cursor
> approach that I have come up with, and the cursor code takes up much less
> real estate in the SP.

> Michael MacGregor
> Database Architect
> SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 03:01:12


I am averse to using dynamic SQL even more than I am averse to using
cursors.

I think that you are correct about the multiple updates and I could probably
tweak that to be more efficient by the use of nested REPLACE as per Joe's
suggestions, or dynamic SQL, but, and I reiterate, the cursor works very
nicely and is easier to read and maintain, so unless someone comes up with
an astoundingly brilliant but simple solution, I think the cursor has won
out.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Michael MacGrego » Thu, 11 Apr 2002 03:07:34


Thanks to everyone so far who has responded and provided suggestions, your
input is very much appreciated.

Here's the cursor approach I've come up with that is currently leading the
solution race. If anyone can improve upon this, I would love to hear from
you.



DECLARE html_crsr CURSOR
 FOR SELECT I.ItemID, I.Description
  FROM Items AS I, HTMLCodes AS H
  WHERE I.Description LIKE '%' + H.HTMLCode + '%'
 FOR UPDATE OF I.Description

OPEN html_crsr

FETCH NEXT FROM html_crsr


BEGIN

  FROM HTMLCodes

 UPDATE Items SET

  WHERE CURRENT OF html_crsr

 FETCH NEXT FROM html_crsr

END

CLOSE html_crsr
DEALLOCATE html_crsr

The beauty of this solution, at least IMHO, is the

  FROM HTMLCodes
which actually will replace all HTMLCodes in the variable in a single pass.

If I knew more, well actually anything at all, about how to code UDFs, I'd
try to do the same thing with that but I've spent way too much time on this
already.

Michael MacGregor
Database Architect
SalesDriver

 
 
 

Complex string replacement of multiple values(?) Argh!

Post by Joe Celk » Thu, 11 Apr 2002 03:27:48


Quote:>> I have so far found 171 HTML codes, though not all of them are

actually being used, probably only about 20 - 30 of them so far. However
I did consider the idea of nested REPLACE() but rejected it because the
overall data massaging process is complex enough already and I don't
want to have nightmare SQL to maintain <<

I think you would find that the nesting is pretty easy to see if you
format the code neatly.  

Another idea: Make a pass to find all of the html actually used in the
table.  Build nested REPLACE calls using only them.  

Remember that UPDATEs cause logging, so each row is being hit several
times when you do muliple updates, but only once with a cursor or nested
replace() calls.  Maybe doing a COMMIT at then end of each UPDATE would
help?

--CELKO--
 ===========================
 Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

 
 
 

1. string replacement to allow multiple item query

I've got a query that I want to amend to allow multiple search
terms......Here it is..........had a look at quoted identifiers stuff but I
dont think thats the issue.

--------------------------------------------------------------------
begin



--add code to allow multiple search terms


ProblemAnswer.AnswerText  LIKE ''%')

--check what this gives


returns the following :      phone%' OR ProblemAnswer.AnswerText  LIKE '%tv

--so try to run the query with the new string


+'%'

--this does not return any rows

--but when I insert the text manulaly it does run the wury sucessfully

select * from problemanswer where answertext like '%phone%' OR
ProblemAnswer.AnswerText  LIKE '%tv%'

end

TIA

----------------------
Adam Frantzis

2. ODBC fails, whilst Direct ADO does not

3. Creating a complex WHERE clause in a sp with parameters that have multiple values

4. Problem

5. Creating complex WHERE statement in sp with multiple parameters and values

6. service name

7. Argh Bit Values 1/True

8. cursel 0.0.8 : form and menu interpreter

9. String replacement

10. Memo String-Replacement ?

11. PL/SQL problem: parameter replacements in string

12. String parser needed for Full-Text replacement

13. Help with String Replacements in Selects??