mqbrks |
Posted: Wed Oct 24, 2018 10:56 am Post subject: |
|
|
Voyager
Joined: 17 Jan 2012 Posts: 75
|
Vitor wrote: |
Connect Direct doesn't do a binary to UTF-8 conversion; it either does an EBCDIC to ASCII conversion or no conversion. If it's converting a file with binary data as if the binary is a set of EBCDIC characters, all sorts of weirdness will occur. Like strange EOF marks turning up in the middle of a file.
|
This is the process that CD is using. It converts IBM-037 to UTF-8, because we had several data problems when the file was sent as binary: the mainframe sent a lot of invalid characters.
COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF
Vitor wrote: |
Get whoever owns the mainframe JCL that's doing the Connect Direct transfer to add a Sort jobstep before the Connect Direct jobstep. It's 5 lines of JCL and half a dozen sort control cards. The Connect Direct step is probably double that. |
Yeah, this requires a change on the mainframe, and resources aren't available to make any changes. |
|
Vitor |
Posted: Wed Oct 24, 2018 11:15 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqbrks wrote: |
This is the process that CD is using. It converts IBM-037 to UTF-8, because we had several data problems when the file was sent as binary: the mainframe sent a lot of invalid characters.
COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF
|
Exactly my point. That 'datatype=text' is telling Connect Direct that the file is composed of nothing but EBCDIC characters and it should move them all from CCSID 037 to UTF-8. If the mainframe file is not in fact all text then this conversion will result in spurious character sequences and the results you are seeing.
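A minimal Python sketch of that effect, run outside Connect Direct purely to illustrate the idea. The record layout (four text bytes followed by a two-byte binary number) is an assumption made up for the example, and Python's `cp037` codec stands in for the IBM-037 to UTF-8 translation:

```python
# Hypothetical record: 4 bytes of EBCDIC text followed by a 2-byte binary value.
# The layout is an assumption for illustration only.
record = "ABCD".encode("cp037") + (74).to_bytes(2, "big")


def translate_as_text(data: bytes) -> bytes:
    """Treat every byte as an IBM-037 character and re-encode it as UTF-8."""
    return data.decode("cp037").encode("utf-8")


converted = translate_as_text(record)

# The text part now reads as ASCII, but the binary part has been remapped
# as if it were characters, so it no longer holds the value 74.
print(record.hex(), "->", converted.hex())
print(len(record), "bytes in,", len(converted), "bytes out")
```

The text bytes come out readable, while the binary field is silently rewritten, which is the kind of damage a text-mode transfer does to non-text data.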
mqbrks wrote: |
It converts IBM-037 to UTF-8, because we had several data problems when the file was sent as binary: the mainframe sent a lot of invalid characters. |
You mean all of the characters?
EBCDIC (IBM-037) has nothing in common with ASCII (and UTF-8 is just a superset of ASCII). If you move an entirely text file from the mainframe, as binary, onto a Unix or any other distributed box, it will appear to be gibberish with a limited number of printable characters and almost no alphanumerics. This is simply because alphanumerics are represented by different hex values in EBCDIC.
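A short Python sketch of the difference, assuming Python's built-in `cp037` codec as a stand-in for IBM-037 and Latin-1 as a stand-in for "some ASCII-based code page" on the distributed side:

```python
import string

text = "HELLO 123"
ebcdic_bytes = text.encode("cp037")  # what the mainframe file actually contains

# What a distributed box shows if it reads the untranslated bytes
# with an ASCII-based code page (Latin-1 here).
misread = ebcdic_bytes.decode("latin-1")

# What you get when the reader is told the bytes are IBM-037.
correct = ebcdic_bytes.decode("cp037")

ascii_alnum = set(string.ascii_letters + string.digits)
print(ebcdic_bytes.hex())
print(repr(misread), "ASCII alphanumerics:", sum(ch in ascii_alnum for ch in misread))
print(repr(correct))
```

The bytes are identical in both cases; only the code page used to interpret them changes.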
mqbrks wrote: |
Vitor wrote: |
Get whoever owns the mainframe JCL that's doing the Connect Direct transfer to add a Sort jobstep before the Connect Direct jobstep. It's 5 lines of JCL and half a dozen sort control cards. The Connect Direct step is probably double that. |
Yeah, this requires a change on the mainframe, and resources aren't available to make any changes. |
Then you're doomed. The fix to your XML truncation problem is to change the Connect Direct datatype to binary and remove that codepage clause (which if memory serves me causes a syntax error if the datatype isn't text). You can then read the file through the File Input node by setting the File Input node code page to '037' not 'Broker Default'.
If you can't make that 2 line change because there are no resources then learn to live with this truncation problem.
If you find someone, get them to cut and paste this above the Connect Direct step (the EXEC card above where those parameters go into SYSIN):
Code: |
//SIMPLE EXEC PGM=SORT
//*
//* THIS IS MUCH MORE EFFICIENT THAN DOING A SORT IN IIB
//*
//SORTIN DD DSN=&DSN1,DISP=(MOD,KEEP,KEEP)
//SORTOUT DD DSN=&DSN1,DISP=(MOD,KEEP,KEEP)
//SYSOUT DD SYSOUT=*
//SYSIN DD *
however the file needs to be sorted
/*
|
Normally I charge $$ for coding. You're welcome. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
fjb_saper |
Posted: Wed Oct 24, 2018 12:53 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
mqbrks wrote: |
This is the process that CD is using. It converts IBM-037 to UTF-8, because we had several data problems when the file was sent as binary: the mainframe sent a lot of invalid characters.
COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF
|
I beg you to notice that in the SYSOPTS for the copy process you have specified XLATE=NO.
So why would you expect to see the text in UTF-8 at the other end if you specifically told Connect Direct not to translate it???  _________________ MQ & Broker admin |
|
Vitor |
Posted: Thu Oct 25, 2018 5:07 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Specifying two code pages causes code page transformation, irrespective of the xlate setting.
Don't ask - apparently it's a "feature".
We send everything through Connect Direct as binary and figure it out someplace else, and use Connect Direct for the management & auditing capabilities. |
|
mqbrks |
Posted: Thu Oct 25, 2018 6:09 pm Post subject: |
|
|
Voyager
Joined: 17 Jan 2012 Posts: 75
|
Vitor wrote: |
Specifying two code pages causes code page transformation, irrespective of the xlate setting.
Don't ask - apparently it's a "feature".
We send everything through Connect Direct as binary and figure it out someplace else, and use Connect Direct for the management & auditing capabilities. |
Really appreciate your knowledge on this, Vitor! I am still investigating between deadlines for other projects. It seems the file does have invalid characters, so I need to explore more.
Question: previously we tried to receive the file as binary, but the mainframe app was sending many invalid characters such as || (something like a pipe), which threw parsing errors while the data was being converted. We went with the CD solution because the CD experts advised using code page conversion in CD, rather than IIB DFDL, as that would filter most of the invalid or gibberish characters. How can IIB filter the gibberish characters? |
|
Vitor |
Posted: Fri Oct 26, 2018 5:03 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqbrks wrote: |
Previously we tried to receive the file as binary, but the mainframe app was sending many invalid characters such as || (something like a pipe), which threw parsing errors while the data was being converted. |
Vitor wrote: |
EBCDIC (IBM-037) has nothing in common with ASCII (and UTF-8 is just a superset of ASCII). If you move an entirely text file from the mainframe, as binary, onto a Unix or any other distributed box, it will appear to be gibberish |
Notice anything? Like the use of the word "gibberish"?
mqbrks wrote: |
How can IIB filter the gibberish characters? |
It doesn't need to filter them, it needs to correctly interpret them. Like I said:
Vitor wrote: |
You can then read the file through the File Input node by setting the File Input node code page to '037' not 'Broker Default'. |
If you tell the FileInput node to read a file and use an ASCII code page to read it (and the 'BrokerDefault' code page on any distributed platform is an ASCII one) then by the time it gets to the DFDL the message tree is hosed. Internally IIB uses UTF-16 so to build the message tree it's converting what it thinks is some kind of ASCII file into that and you'll get gibberish.
If you tell the FileInput node to use CCSID 037 (which I'm assuming is the code page the file was written in, as it's the source for the Connect Direct translation), the FileInput node will correctly interpret the byte stream and you'll get a clean message tree in UTF-16. You can then feed this to the DFDL model (which should be based on the actual record layout and identify which fields are text and which fields are binary). |
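To make the "which fields are text and which fields are binary" point concrete, here is a rough Python sketch of field-by-field interpretation. It is not IIB or DFDL code. The record layout (a 10-byte IBM-037 name followed by a 3-byte packed-decimal amount) and the function names are hypothetical, chosen only for illustration:

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal field: two digits per byte, last nibble is the sign.

    Sketch only: no validation of digit or sign nibbles.
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # 0xD means negative; 0xC or 0xF means positive
    value = int("".join(str(n) for n in nibbles))
    return -value if sign == 0x0D else value


def parse_record(raw: bytes) -> dict:
    """Apply the code page only to the text field; read the binary field as-is."""
    name = raw[0:10].decode("cp037").rstrip()  # text field: interpret as IBM-037
    amount = unpack_comp3(raw[10:13])          # binary field: never code-page converted
    return {"name": name, "amount": amount}


# Hypothetical sample record built for the example.
sample = "SMITH".ljust(10).encode("cp037") + bytes.fromhex("12345C")
print(parse_record(sample))
```

The design point is that the code page applies per text field, not to the whole byte stream, so the file has to arrive as untouched binary for this to work.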
|
timber |
Posted: Sat Oct 27, 2018 1:12 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Just to clarify something...and please ignore if you know this already.
The DFDL parser has access to all of IIB's extensive range of character encodings. It is no less (and no more) capable than any other IIB parser in this respect. It has full access to all ICU character tables, and can therefore read and write characters in any encoding (and it will not mind reading in one encoding and writing in a different one, like any IIB parser).
I suspect that the sender is sending an *invalid* EBCDIC character stream. It sounds as if there are UTF-8 characters mixed into the EBCDIC character stream. In which case, there is no tool in the world that can handle such a stream - not Java, not CD. Only custom code that is aware of the sender's format can deal with invalid character streams, and it will require a lot of care. Usually it's a lot simpler and cheaper to send a valid character stream in the first place. |
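A small Python sketch of why a generic converter cannot spot this kind of mixing, assuming Python's `cp037` codec as the IBM-037 table and a made-up sample stream. Every one of the 256 byte values is a legal IBM-037 character, so stray UTF-8 bytes decode without error and simply come out as the wrong characters:

```python
ebcdic_part = "CAF".encode("cp037")  # valid IBM-037 bytes
stray_utf8 = "é".encode("utf-8")     # two UTF-8 bytes, as if pasted in from a UTF-8 source
mixed = ebcdic_part + stray_utf8     # hypothetical mixed stream

# No exception is raised: the decoder has no way to know that the
# last two bytes were never meant to be IBM-037.
decoded = mixed.decode("cp037")
print(repr(decoded))

# Every possible byte value decodes to some character in IBM-037.
print(len(bytes(range(256)).decode("cp037")))
```

Only code that knows where the sender switches encoding could split the stream and decode each part correctly.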
|
fjb_saper |
Posted: Sat Oct 27, 2018 8:33 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I also saw two SYSOPTS statements where I would have expected only one, and some very bizarre formatting of the SYSOPTS field.
It looks like the poster did not translate it to text and copy the full text.
The error could also have to do with form.
Did the OP right-click and validate the process? |
|
|