Database Design (image/txt)

Post by Jason Davi » Fri, 25 Oct 2002 10:22:38


Hello,

I have hundreds of "rich" (doc/rtf/ppt/xls/pdf) and HTML documents.
Using a special tool, we mark the "rich" documents, and the HTML documents
are parsed and stripped (to textual content, without tags or code).

After the tool has finished, we plan to insert each file's content into the
database.

I'm wondering what the best database design for this is.
Should I use two columns (text and image) to store the textual (parsed) content
and the rich files separately? Or can I use a text field for the "rich"
files as well?

I have no plan to use MS SQL full-text search; I just want to store the "rich"
files AS IS in the database for further processing.

Thanks!

 
 
 


Post by John Kan » Fri, 25 Oct 2002 12:09:43


Jason,
Hmm, interesting... After using your "special tool", can you still open,
read and edit all of the "rich" (doc/rtf/ppt/xls/pdf) and html documents
with the original applications, i.e., winword.exe, excel.exe, powerpnt.exe
and Acrord32.exe?

If so, then you should be able to store these "rich" documents in a column
with an IMAGE datatype and have the IMAGE column correctly FT Indexed.
If not, then your best bet is to "strip" out the raw text from these
altered/rich documents and put the text in a column defined as either
varchar or TEXT (depending upon size), then place the documents in an
IMAGE column and FT Index the varchar or TEXT column.
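
As a rough illustration of the two-column layout described above, here is a
minimal sketch in Python. It uses the built-in sqlite3 module purely as a
stand-in so the example runs anywhere; in SQL Server the parsed_text column
would be varchar or TEXT and the raw_file column would be IMAGE. The table
name, column names and helper functions are all hypothetical, not something
from the original setup. The point it shows is that the "rich" file goes in
as raw bytes in a binary column, since a character column may convert or
mangle binary data, while the stripped text sits next to it in a text column.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per document, text and binary side by side.
    # SQLite types are stand-ins (TEXT -> varchar/TEXT, BLOB -> IMAGE).
    conn.execute(
        """
        CREATE TABLE documents (
            doc_id      INTEGER PRIMARY KEY,
            file_name   TEXT NOT NULL,
            file_ext    TEXT NOT NULL,   -- e.g. 'doc', 'pdf', 'html'
            parsed_text TEXT,            -- stripped textual content, if any
            raw_file    BLOB             -- the original file, stored as is
        )
        """
    )


def store_document(conn, file_name, raw_bytes, parsed_text=None):
    # Derive a lowercase extension from the file name ('' if there is none).
    ext = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    cur = conn.execute(
        "INSERT INTO documents (file_name, file_ext, parsed_text, raw_file) "
        "VALUES (?, ?, ?, ?)",
        (file_name, ext, parsed_text, raw_bytes),
    )
    return cur.lastrowid


def load_raw(conn, doc_id):
    # Return the stored bytes unchanged, or None if the row does not exist.
    row = conn.execute(
        "SELECT raw_file FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return bytes(row[0]) if row else None
```

Usage would be along the lines of opening a connection, calling
create_schema(conn) once, then store_document(conn, "report.doc", data, text)
for each file, and load_raw(conn, doc_id) when the file is needed for further
processing. Keeping the extension in its own column is a design choice of this
sketch, so later processing can tell what kind of file each row holds.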

Even if you don't use FTS, the above recommendations are good for all
readers of this fulltext newsgroup.
Regards,
John




 
 
 
