Best way to validate/process this lang-like data list?

Post by M Sweg » Wed, 04 Oct 2000 04:00:00



Hi,
   I'm trying to write a program that reads a data table and uses it as a
template to validate an incoming data sequence, generating error messages
when the sequence does not match. The table describes the structure using
only two language-like patterns: a statement sequence and a nested loop. Each
incoming data pattern is compared against this structure, and on a mismatch
an error message reports that the current data value is out of order or
unknown.

This would imply some sort of state machine, built on the fly when the
template is read in, so that incoming data can sequence through it. Another
possibility is to read in the template, treat it as a regular expression,
generate a lot of regexp rules, and run a generalized regexp algorithm over
them (are there any regexp algorithms I should consider here?). A third
possibility would be some sort of tree structure (any suggestions on tree
types, or combinations of them?) with recursion to handle the loops. This is
still vague in concept, but similar to the abstract syntax trees used in
compilers.

Note: there will be no instruction-set generation or optimization, since
nothing is being computed, although code could be generated and compiled from
the template to speed up validation of the incoming data pattern. Moreover, I
don't want to use Lex/Yacc, since they can't handle lexical rules and grammars
that change at run time. The exception would be a generalized algorithm,
applicable to the two basic language-like patterns, that builds a virtual
state machine in memory for validating the incoming data.
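
As a minimal sketch of the regular-expression idea above (not a tested
design): because every loop has a fixed maximum repeat count, a template of
this kind can be translated mechanically into a bounded-repeat regular
expression over space-separated pattern names. The tuple notation, the names
HDR/REC/SUB/END, and the helpers to_regex and matches are all hypothetical,
made up for illustration.

```python
import re

# Hypothetical template notation (an assumption, not from the post):
#   "NAME"                     a single data pattern
#   ("seq", [children])        a statement sequence
#   ("loop", max, [children])  a loop repeated 0..max times
TEMPLATE = ("seq", [
    "HDR",
    ("loop", 3, ["REC", ("loop", 2, ["SUB"])]),
    "END",
])

def to_regex(node):
    """Translate a template node into a regex over 'NAME ' tokens."""
    if isinstance(node, str):
        # Each pattern name is followed by one space so that names
        # sharing a prefix (e.g. A and A0) cannot be confused.
        return re.escape(node) + " "
    if node[0] == "seq":
        return "".join(to_regex(child) for child in node[1])
    _, max_repeats, body = node
    inner = "".join(to_regex(child) for child in body)
    return "(?:%s){0,%d}" % (inner, max_repeats)

def matches(template, tokens):
    """Return True if the token sequence conforms to the template."""
    text = "".join(tok + " " for tok in tokens)
    return re.fullmatch(to_regex(template), text) is not None
```

A plain regexp match only answers yes or no. It does not say which value was
out of order or unknown, so the error messages would need extra work.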

The other thing to consider is that each pattern entry in the template can be
marked mandatory or optional. The one restriction concerns loops: if a
loop-end pattern, or any pattern inside a loop, appears in the incoming data,
the loop-start pattern must also be present. Apart from that, every pattern
that makes up the loop, including the loop-end pattern, can be optional. If an
optional loop-start pattern is absent, none of the other patterns in that loop
may appear, whether they are marked optional or mandatory; a mandatory marking
inside a loop applies only when the loop is present in the incoming data.
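
The mandatory/optional rule above could be sketched with the tree-plus-
recursion idea roughly as follows. This is only one reading of the rule, under
stated assumptions: the classes Item and Loop and the functions match_seq and
validate are hypothetical names, a loop-end pattern is modeled as an ordinary
last Item in the loop body, and validation reports only the first unexpected
value instead of trying to resynchronize.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    mandatory: bool = True

@dataclass
class Loop:
    start: str                 # loop-start pattern; its presence enters the loop
    body: list = field(default_factory=list)   # Items and nested Loops
    max_repeats: int = 1
    mandatory: bool = False

def names_in(entries):
    """Collect every pattern name that the template knows about."""
    found = set()
    for e in entries:
        if isinstance(e, Item):
            found.add(e.name)
        else:
            found.add(e.start)
            found |= names_in(e.body)
    return found

def match_seq(entries, tokens, pos, errors):
    """Match entries against tokens from pos; return the new position."""
    for e in entries:
        here = tokens[pos] if pos < len(tokens) else None
        if isinstance(e, Item):
            if here == e.name:
                pos += 1
            elif e.mandatory:
                errors.append(f"missing mandatory {e.name} before position {pos}")
            continue
        count = 0
        # The body (and its mandatory entries) is only checked when the
        # loop-start pattern is actually present in the data.
        while pos < len(tokens) and tokens[pos] == e.start:
            count += 1
            if count == e.max_repeats + 1:
                errors.append(f"loop {e.start} repeats more than {e.max_repeats} times")
            pos = match_seq(e.body, tokens, pos + 1, errors)
        if count == 0 and e.mandatory:
            errors.append(f"missing mandatory loop {e.start} before position {pos}")
    return pos

def validate(entries, tokens):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    pos = match_seq(entries, tokens, 0, errors)
    if pos < len(tokens):
        kind = "out-of-order" if tokens[pos] in names_in(entries) else "unknown"
        errors.append(f"{kind} value {tokens[pos]} at position {pos}")
    return errors
```

For example, with a template of A0, an optional loop started by A that holds a
mandatory B and an optional E, and then F, the sequence A0 F passes, because
the mandatory B only counts once A has been seen.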

One use of this algorithm is to validate a data file that has a language-like
structure so that, once validated, the data can be extracted and processed. An
example would be a DBMS that dumps multiple tables, each with one or more
records, into a merged file in a certain order. When the file reaches another
DBMS, this algorithm validates it, extracts the data, and loads each record
into the appropriate table for its data pattern. One can therefore think of a
data pattern as a table name, or as an alias if the two DBMSs' tables aren't
the same. Each application controlling its DBMS would know the alias data
patterns and would use an alias-to-table-name mapping to decide which table
each record goes into.
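
The alias-to-table-name mapping described above could be as small as a
dictionary lookup applied after validation. In this sketch the function route,
the (alias, payload) record shape, and the table names used in the example are
all assumptions for illustration.

```python
def route(records, alias_to_table):
    """Group validated (alias, payload) records by destination table.

    An alias missing from the mapping raises KeyError, which should not
    happen if the data has already been validated against the template.
    """
    tables = {}
    for alias, payload in records:
        table = alias_to_table[alias]
        tables.setdefault(table, []).append(payload)
    return tables

# Hypothetical usage: aliases A and B map to two made-up table names.
loaded = route(
    [("A", "r1"), ("B", "r2"), ("A", "r3")],
    {"A": "customers", "B": "orders"},
)
```

Each receiving application would keep its own copy of the mapping, so the
same alias can land in differently named tables on each side.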

Below is an example of what the data pattern template structure might look
like, drawn graphically; the incoming merged data file containing these data
patterns would have to be validated against it. What I'm not sure about is
which approach would make this easiest to do.

Any suggestions are appreciated.

Notes:  The D# labels are only here to give line numbers and won't be
        validated. A0 and A-F are the data patterns. The numbers
        next to the brackets show the loop structure and nesting, along
        with the maximum number of times each loop can be repeated.
        The mandatory/optional requirement isn't shown for each data
        pattern, but it controls whether a missing data pattern makes
        the validation algorithm generate an error message or is
        simply skipped over.

D0      A0

D1      A ------------------------------,
D2      B  --------------,              |
D3      C  --,           |              |
D3      C    |    3      |  20          |
D3      C  --' ----------'              |
D2      B                               |
D3      C                               |
D3      C                               |
D3      C                               |
                                        |
        .                               |
        .                               |         99
        .                               |
D4      D  -----,                       |
D4      D       |     10                |
D4      D       |                       |
D4      D  -----'                       |
        .                               |
        .                               |
        .                               |
                                        |
D5      E ------------------------------'

D6      F
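
One possible reading of the diagram, written out by hand as a verbose regular
expression over space-separated pattern names. The assumptions here are mine:
A opens and E closes the outer loop (at most 99 times), each B is followed by
up to 3 C's and that group repeats up to 20 times, D repeats up to 10 times,
and the rows elided with dots are not modeled at all. The names DIAGRAM and
conforms are hypothetical.

```python
import re

# In re.VERBOSE mode literal whitespace is ignored, so \s stands for the
# single space that follows each pattern name in the joined input.
DIAGRAM = re.compile(r"""
    A0\s
    (?: A\s
        (?: B\s (?: C\s ){0,3} ){0,20}
        (?: D\s ){0,10}
        E\s
    ){0,99}
    F\s
""", re.VERBOSE)

def conforms(tokens):
    """Return True if the pattern names follow the assumed diagram structure."""
    text = "".join(tok + " " for tok in tokens)
    return DIAGRAM.fullmatch(text) is not None
```

Under this reading the B group and the D run may be absent, but E is required
whenever A is present; the mandatory/optional flags from the template are not
represented.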

--
        Mike,


 
 
 

Post by M Sweg » Fri, 06 Oct 2000 04:00:00


(Verbatim repost of the message above.)