On Fri, 2006-06-16 at 10:10 -0400, A LeDonne wrote:
> > > --- Craig White wrote:
> > >
> > > > I have a text file which I exported in tab delimited format from
> > > > Filemaker Pro on Windows, cleaned up in openoffice.org and want to
> > > > import into postgres.
> > > >
> > > > the first few characters in the file are killing me and I haven't a clue
> > > > on how to rid the file of them...
> > > >
> > > > 0000000 357 273 277 1 \t B l o o d B o r n e
> > > >
> > > > it's the 357 273 277 that don't belong...the data should start with "1"
> > > >
> > > > where did they come from and how do I get rid of them?
> > > >
> > > > Craig
>
> To answer part one of your question, those three bytes (Hex: EF BB BF)
> are a UTF-8 encoded Byte Order Mark. (
> http://www.unicode.org/faq/utf_bom.html#BOM ). They're an indicator
> that the file you're looking at is, in fact, UTF-8-encoded Unicode
> text, rather than something in some other local codepage. Notepad.exe
> adds them as a matter of course when saving as Unicode text; perhaps
> OO.o is adding them when it exports to UTF-8 text as well.
>
> Unicode-compliant text processors will ignore the BOM when considering
> text. If there's a way to tell the Postgres import process that the
> file is UTF-8, the import *should* ignore those bytes completely.
>
> Or you can safely remove them any time they appear in a text stream,
> if you no longer need signalling in the stream that it is UTF-8
> encoded. (The BOM is "default ignorable", and should never appear in
> the midst of Unicode text.)
----

thanks for the info. I was able to remove them (the UTF-8 BOM) with vi
whereas kate/emacs/etc. simply gave no indication that they were there
and when postgres gagged on the start of the file, 'od' was a good
viewer to tell me what I was dealing with.

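
For anyone who hits the same thing without vi handy, here is a minimal
Python sketch of the same cleanup. It is not what I actually ran: the
function name strip_utf8_bom and the sample bytes are made up for
illustration, and the sample assumes the dumped text reads "Blood Borne".

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample mirroring the od dump: BOM, then "1", a tab, then text.
sample = b"\xef\xbb\xbf1\tBlood Borne\n"

# The octal values od printed (357 273 277) are the same three BOM bytes.
od_bytes = bytes([0o357, 0o273, 0o277])

cleaned = strip_utf8_bom(sample)

# Alternative when decoding to text anyway: the "utf-8-sig" codec
# drops a leading BOM while decoding.
text = sample.decode("utf-8-sig")
```

The function only touches a BOM at the very start of the data, and it
leaves input without a BOM unchanged, so it is safe to run on files that
were never near Notepad.exe.
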
I have changed the methodology of cleaning up the exported text and,
thankfully, Notepad.exe is no longer part of the process ;-)

I only brought in Notepad.exe because of something that I can't explain
within openoffice.org... In OOo's 'Replace' I could use "\n" in a
regular expression as a [return/linefeed], but I couldn't figure out how
to 'Find' "\n" - I finally gave up. It does have a really nice feature,
'^$', to find blank lines though, so I had to shift my thinking and now
I have it working.

Craig
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss