Uhhh ... I'm really new to SQL Prrgramming, but since T-SQL supports loops
and branching, couldn't you write a recursive stored procedure that takes a
delimited string and a cursor as a parameters and returns the index of the
starting words of successful matches
In psuedo-code:
Declaration:
my_proc(string:words_to_match;cursor:my_cursor)
if length(words_match) = 0
return index of current my_cursor position
if word at my_cursor position = first word in delimited list
then
increment cursor
return my_proc(words_to_match - first word,my_cursor)
else
return -1
used something like
set my_cursor to first record
while my_cursor not at EOF
index_of_match = my_proc("string,to,match",my_cursor)
if index_of_match > 0
then insert into match table
increment cursor
This is only a very rough approximation of how the recursion would work. You
would need parse the string and translate everything into T-SQL, and debug
the heck out of it since recursion loves to overflow the stack if you don't
provide a correct exit.
Also, I am new to T-SQL and maybe none of this is supported.
Bob
> I have a 30 million record table containing (in sequence) all of the
> words from a set of texts. Looks like the following:
> ID word
> 1 Once
> 2 upon
> 3 a
> 4 time
> 5 there
> 6 was
> . . . .
> I want to be able to find sequences of words, i.e. all of the three
> word strings having the pattern:
> ate the [xxx]
> or all of the four word strings with the pattern:
> threw the [xxx] away
> The problem, of course, is that the words are in different (though
> adjacent and sequential) records. I've been playing around with
> subqueries, i.e.:
> select word
> from [table]
> where ID in
> (select ID from [table] where word = 'threw')
> and
> ID+1 in
> (select ID from [table] where word = 'the')
> and
> ID+3 in
> (select ID from [table] where word = 'away')
> But this is obviously fundamentally flawed.
> P.S. One option is to SELECT all of the records with WORD1, WORD2, and
> WORD 3 INTO three separate temp tables, and then do something like:
> select table1.word, table2.word, table3.word
> from table1, table2, table3
> where table1.word = 'threw' and table2.word = 'the' and
> table3.word = 'away'
> and table1.ID = table2.ID-1 and table1.ID = table3.ID-3
> But this is going to be too slow for my needs, especially since the
> table will eventually have 100 million records for a 100 million word
> corpus of texts.
> Any possible solutions?? Thanks in advance for your help.
Quote:> I have a 30 million record table containing (in sequence) all of the
> words from a set of texts. Looks like the following:
> ID word
> 1 Once
> 2 upon
> 3 a
> 4 time
> 5 there
> 6 was
> . . . .
> I want to be able to find sequences of words, i.e. all of the three
> word strings having the pattern:
> ate the [xxx]
> or all of the four word strings with the pattern:
> threw the [xxx] away
> The problem, of course, is that the words are in different (though
> adjacent and sequential) records. I've been playing around with
> subqueries, i.e.:
> select word
> from [table]
> where ID in
> (select ID from [table] where word = 'threw')
> and
> ID+1 in
> (select ID from [table] where word = 'the')
> and
> ID+3 in
> (select ID from [table] where word = 'away')
> But this is obviously fundamentally flawed.
> P.S. One option is to SELECT all of the records with WORD1, WORD2, and
> WORD 3 INTO three separate temp tables, and then do something like:
> select table1.word, table2.word, table3.word
> from table1, table2, table3
> where table1.word = 'threw' and table2.word = 'the' and
> table3.word = 'away'
> and table1.ID = table2.ID-1 and table1.ID = table3.ID-3
> But this is going to be too slow for my needs, especially since the
> table will eventually have 100 million records for a 100 million word
> corpus of texts.
> Any possible solutions?? Thanks in advance for your help.