Author |
Message
|
giorginus80 |
Posted: Tue Feb 03, 2009 10:27 am Post subject: JCN Unzip Change the encoding? |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
Hello,
I have a Java Compute Node that unzips files. The file arrives in the BLOB domain and I unzip it with the standard Java API. The archive contains a lot of TIF images and one fixed-length txt file. I created the mxsd for this fixed-length format, but the txt file seems to come out differently when I unzip it in the broker. I don't know why. I run the broker under Linux: if I unzip the archive with the JCN I get one file, and if I unzip it on Windows (on my machine) I get a file with a different encoding, I think. Something is different, but I can't see it when I open both files, and when I parse the file in the MRM I get a ParseException. |
|
|
 |
smdavies99 |
Posted: Tue Feb 03, 2009 11:03 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Perhaps it is because the unzip of a .txt file within Broker is created in Unicode. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
kimbert |
Posted: Tue Feb 03, 2009 12:18 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I don't know what the problem is, but I can help you to think clearly about it.
There are only two variables here:
- the bytes which emerge from the Java unzip routine
- the code page which you tell the MRM parser to use
If those two are the same on Linux/other then the result of the parse will be the same. |
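A minimal sketch of how the first variable could be inspected, using only the standard `java.util.zip` API. The class name `ZipEntryDump`, its method names, and the entry name passed in are hypothetical; this is not the original poster's Java Compute Node code. It extracts one entry from a zip held in memory and formats its bytes as hex, so the output from the broker on Linux can be compared byte for byte with the output of another unzip tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any broker API.
final class ZipEntryDump {

    /** Returns the raw bytes of the named entry, or null if no such entry exists. */
    static byte[] extract(byte[] zipBytes, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    // Copy the bytes untouched: no character decoding happens here.
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

If the hex output is identical on both platforms, the bytes are not the variable that changed, and the code page given to the parser is the next thing to check.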
|
|
 |
giorginus80 |
Posted: Wed Feb 04, 2009 1:39 am Post subject: |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
kimbert wrote: |
I don't know what the problem is, but I can help you to think clearly about it.
There are only two variables here:
- the bytes which emerge from the Java unzip routine
- the code page which you tell the MRM parser to use
If those two are the same on Linux/other then the result of the parse will be the same. |
Maybe it isn't a broker problem, because if I unzip it on Windows (with unzip software, I mean) and if I unzip it on Linux (NOT with the broker or Java, but with the system unzip), the files are different. Maybe it is the CCSID of the OS? |
|
|
 |
kimbert |
Posted: Wed Feb 04, 2009 2:08 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
if I unzip it with windows (unzipped with Unzip software I mean) and if I unzip it with linux (NOT with broker or java, but with system unzip) the files are different |
How are you viewing/comparing them? What are the differences? |
|
|
 |
giorginus80 |
Posted: Wed Feb 04, 2009 2:12 am Post subject: |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
kimbert wrote: |
Quote: |
if I unzip it with windows (unzipped with Unzip software I mean) and if I unzip it with linux (NOT with broker or java, but with system unzip) the files are different |
How are you viewing/comparing them? What are the differences? |
They look like the same files at first glance, but I got a parsing error, so I started wondering about it. I looked at them in Notepad, and I inspected the individual bytes with a Java routine... I don't know how to show it here, because I can't add attachments. But maybe there is no <cr><lf> at the end of the file. In WordPad the files look the same, but in Notepad they look different, and if you get the bytes with a Java routine, they are different. |
|
|
 |
giorginus80 |
Posted: Wed Feb 04, 2009 2:30 am Post subject: |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
|
|
 |
giorginus80 |
Posted: Wed Feb 04, 2009 2:31 am Post subject: |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
|
|
 |
fjb_saper |
Posted: Wed Feb 04, 2009 3:14 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
It looks like what you are displaying is hex and I did not see any difference.
Are you using something like <lf>, or <cr><lf> as a delimiter?
Unix/Linux and Windows treat carriage return/linefeed differently. This is why it will be important for you to specify the CCSID of the BLOB before translating it into text...  _________________ MQ & Broker admin |
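To check which line terminators a file really contains, something like the following could be run over the unzipped bytes. It is only a sketch: the class `LineEndings` is hypothetical, and it assumes a single-byte, ASCII-compatible code page in which `0x0D` is <cr> and `0x0A` is <lf> (it would not be meaningful for UTF-16 or EBCDIC data).

```java
// Hypothetical helper, assuming an ASCII-compatible single-byte code page.
final class LineEndings {

    /** Returns {crlfCount, bareLfCount, bareCrCount} for the given bytes. */
    static int[] count(byte[] data) {
        int crlf = 0;
        int lf = 0;
        int cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0D) {
                if (i + 1 < data.length && data[i + 1] == 0x0A) {
                    crlf++;
                    i++; // skip the <lf> that belongs to this <cr><lf> pair
                } else {
                    cr++;
                }
            } else if (data[i] == 0x0A) {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }
}
```

A file whose records are meant to end in <cr><lf> but which reports only bare <lf> terminators would explain why Notepad shows it differently from WordPad, and why a model expecting <cr><lf> might fail to parse it.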
|
|
 |
giorginus80 |
Posted: Wed Feb 04, 2009 3:19 am Post subject: |
|
|
 Centurion
Joined: 08 Jul 2008 Posts: 124 Location: Rome, Italy
|
fjb_saper wrote: |
It looks like what you are displaying is hex and I did not see any difference.
Are you using something like <lf>, or <cr><lf> as a delimiter?
Unix/Linux and Windows treat cursor/linefeed differently. This is why it will be important for you to specify the CCSID of the BLOB before translating it into text...  |
Yes, I use <cr><lf>. How can I get around the problem? |
|
|
 |
fjb_saper |
Posted: Wed Feb 04, 2009 3:28 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
fjb_saper wrote: |
This is why it will be important for you to specify the CCSID of the BLOB before translating it into text...  |
or before parsing...
You need to specify the CCSID of your content / bitstream before parsing...
It's a shame you zipped it because that way there is no automatic CCSID transformation for text... but ...
I suggest you extract it as a BLOB, and then parse it using:
CREATE FIELD ... PARSE(blob, CCSID 437, ...)... or your Windows CCSID...
This might just work...  _________________ MQ & Broker admin |
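The point about stating the code page explicitly can be illustrated in plain Java. This is a sketch only, not broker code: `ExplicitDecode` is a hypothetical class, and it uses ISO-8859-1 and UTF-8 from `StandardCharsets` as stand-ins, on the assumption that the charset matching CCSID 437 (commonly named `IBM437` in Java) may not be present in every JRE. It shows that the same bytes can turn into different text depending on the charset named at decode time.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of any broker API.
final class ExplicitDecode {

    /** Decodes the bytes with the charset the caller names, never the platform default. */
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    /** True if the bytes produce the same text under both charsets. */
    static boolean decodesIdentically(byte[] data, Charset a, Charset b) {
        return decode(data, a).equals(decode(data, b));
    }
}
```

Pure ASCII content decodes identically under both of these charsets, so a difference of this kind only appears once the file contains bytes above `0x7F`, such as accented characters.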
|
|
 |
|