XML Message Truncated while routing through HTTP Request
mqbrks
Posted: Wed Oct 24, 2018 10:56 am

Acolyte

Joined: 17 Jan 2012
Posts: 74

Vitor wrote:

Connect Direct doesn't do a binary to UTF-8 conversion; it either does an EBCDIC to ASCII conversion or no conversion. If it's converting a file with binary data as if the binary is a set of EBCDIC characters, all sorts of weirdness will occur. Like strange EOF marks turning up in the middle of a file.


This is the process that CD is using. It converts IBM-037 to UTF-8 because we had several data problems when the file was sent as binary, with a lot of invalid characters coming from the mainframe.

COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF

Vitor wrote:

Get whoever owns the mainframe JCL that's doing the Connect Direct transfer to add a Sort jobstep before the Connect Direct jobstep. It's 5 lines of JCL and half a dozen sort control cards. The Connect Direct step is probably double that.


Yeah, this requires a change on the mainframe, and no resources are available to make any changes.
Vitor
Posted: Wed Oct 24, 2018 11:15 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 25310
Location: Ohio, USA

mqbrks wrote:

This is the process that CD is using. It converts IBM-037 to UTF-8 because we had several data problems when the file was sent as binary, with a lot of invalid characters coming from the mainframe.

COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF


Exactly my point. That 'datatype=text' is telling Connect Direct that the file is composed of nothing but EBCDIC characters and it should move them all from CCSID 037 to UTF-8. If the mainframe file is not in fact all text then this conversion will result in spurious character sequences and the results you are seeing.
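
To make this concrete, here is a minimal Python sketch of what a whole-file text conversion does to a record that mixes text and binary. The record layout (three text bytes followed by a two-byte packed-decimal field) is invented for illustration, and Python's cp037 codec stands in for Connect Direct's IBM-037 table, which may differ in detail.

```python
# Hypothetical record: 3 EBCDIC text bytes + a 2-byte packed-decimal field (+123).
text_part = "ABC".encode("cp037")    # cp037 is Python's codec for IBM-037
packed_part = bytes([0x12, 0x3F])    # binary data, not characters
record = text_part + packed_part

# A text-mode transfer treats every byte as an EBCDIC character and
# re-encodes it, including the bytes that were never characters.
converted = record.decode("cp037").encode("utf-8")

print("original :", record.hex())
print("converted:", converted.hex())

# The text survives, but the packed field no longer holds its original bytes.
text_ok = converted[:3] == b"ABC"
binary_damaged = converted[3:] != packed_part
# EBCDIC 0x3F is SUB, which becomes 0x1A (Ctrl-Z) on the ASCII side.
has_ctrl_z = b"\x1a" in converted
print(text_ok, binary_damaged, has_ctrl_z)
```

Byte 0x1A is the character many tools treat as an end-of-file mark, which is one plausible way a stray EOF can turn up in the middle of a converted file.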

mqbrks wrote:
It converts IBM-037 to UTF-8 because we had several data problems when the file was sent as binary, with a lot of invalid characters coming from the mainframe.


You mean all of the characters?

EBCDIC (IBM-037) has nothing in common with ASCII (and UTF-8 is just a superset of ASCII). If you move an entirely text file from the mainframe, as binary, onto a Unix or any other distributed box, it will appear to be gibberish with a limited number of printable characters and almost no alphanumerics. This is simply because alphanumerics are represented by different hex values in EBCDIC.
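
A quick way to see the difference in hex values is a few lines of Python. This is only a sketch: the payload is made up, and latin-1 is used as a stand-in for "some ASCII-family code page" on the distributed side.

```python
sample = "<order>42</order>"      # made-up payload

ebcdic = sample.encode("cp037")   # bytes as they sit on the mainframe
ascii_ = sample.encode("ascii")   # bytes a Unix box expects

# Same characters, different hex values: 'A' is 0xC1 in EBCDIC, 0x41 in ASCII.
a_ebcdic = "A".encode("cp037")
a_ascii = "A".encode("ascii")
print(a_ebcdic.hex(), a_ascii.hex())

# Reading the untranslated EBCDIC bytes with an ASCII-family code page
# (latin-1 here) produces gibberish; reading them as cp037 restores the text.
misread = ebcdic.decode("latin-1")
correct = ebcdic.decode("cp037")
print(repr(misread))
print(correct)
```

Nothing is wrong with the bytes in the misread case; they are simply being interpreted with the wrong code page.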

mqbrks wrote:
Vitor wrote:

Get whoever owns the mainframe JCL that's doing the Connect Direct transfer to add a Sort jobstep before the Connect Direct jobstep. It's 5 lines of JCL and half a dozen sort control cards. The Connect Direct step is probably double that.


Yeah, this requires a change on the mainframe, and no resources are available to make any changes.


Then you're doomed. The fix to your XML truncation problem is to change the Connect Direct datatype to binary and remove that codepage clause (which if memory serves me causes a syntax error if the datatype isn't text). You can then read the file through the File Input node by setting the File Input node code page to '037' not 'Broker Default'.

If you can't make that 2 line change because there are no resources then learn to live with this truncation problem.

If you find someone, get them to cut and paste this above the Connect Direct step (the EXEC card above where those parameters go into SYSIN):

Code:

//SIMPLE   EXEC PGM=SORT
//*
//* THIS IS MUCH MORE EFFICIENT THAN DOING A SORT IN IIB
//*
//SORTIN   DD DSN=&DSN1,DISP=(MOD,KEEP,KEEP)
//SORTOUT  DD DSN=&DSN1,DISP=(MOD,KEEP,KEEP)
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
    however the file needs to be sorted
/*


Normally I charge $$ for coding. You're welcome.
_________________
Honesty is the best policy.
Insanity is the best defence.
fjb_saper
Posted: Wed Oct 24, 2018 12:53 pm

Grand Poobah

Joined: 18 Nov 2003
Posts: 19770
Location: LI,NY

mqbrks wrote:

This is the process that CD is using. It converts IBM-037 to UTF-8 because we had several data problems when the file was sent as binary, with a lot of invalid characters coming from the mainframe.

COPYCONR PROCESS
STEP1 COPY FROM(&NODE DSN=&DSN1 DISP=SHR -
SYSOPTS="CODEPAGE=(IBM-037,UTF-8)") -
TO ( DSN=&DSN2 DISP=(&DISP1,&DISP2) -
SYSOPTS=":datatype=text:xlate=no:strip.blanks=no:")
IF (STEP1=0) THEN
STEP2 RUN TASK SNODE (PGM=UNIX) SYSOPTS="mv &DSN2 &DSN3"
EIF


I beg you to notice that in the SYSOPTS for the copy process you have specified XLATE=NO.
So why would you expect to see the text in UTF-8 at the other end if you specifically told Connect Direct not to translate it???
_________________
MQ & Broker admin
Vitor
Posted: Thu Oct 25, 2018 5:07 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 25310
Location: Ohio, USA

Specifying 2 code pages causes code page transformation. Irrespective of xlate setting.

Don't ask - apparently it's a "feature".

We send everything through Connect Direct as binary and figure it out someplace else, and use Connect Direct for the management & auditing capabilities.
mqbrks
Posted: Thu Oct 25, 2018 6:09 pm

Acolyte

Joined: 17 Jan 2012
Posts: 74

Vitor wrote:
Specifying 2 code pages causes code page transformation. Irrespective of xlate setting.

Don't ask - apparently it's a "feature".

We send everything through Connect Direct as binary and figure it out someplace else, and use Connect Direct for the management & auditing capabilities.


Really appreciate your knowledge on this, Vitor! I am still investigating between deadlines for other projects. It seems the file does have invalid characters; I need to explore more.

Question: Previously we tried to receive the file as binary, but the mainframe app was sending many invalid characters such as || (something like a pipe), which threw parsing errors while the data was being converted. We went with the CD solution because the CD experts advised doing the code page conversion in CD rather than in IIB DFDL, saying it would filter out most of the invalid or gibberish characters. How can IIB filter the gibberish characters?
Vitor
Posted: Fri Oct 26, 2018 5:03 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 25310
Location: Ohio, USA

mqbrks wrote:
Previously we tried to receive the file as binary, but the mainframe app was sending many invalid characters such as || (something like a pipe), which threw parsing errors while the data was being converted.


Vitor wrote:
EBCDIC (IBM-037) has nothing in common with ASCII (and UTF-8 is just a superset of ASCII). If you move an entirely text file from the mainframe, as binary, onto a Unix or any other distributed box, it will appear to be gibberish


Notice anything? Like the use of the word "gibberish"?

mqbrks wrote:
How can IIB filter the gibberish characters?


It doesn't need to filter them, it needs to correctly interpret them. Like I said:

Vitor wrote:
You can then read the file through the File Input node by setting the File Input node code page to '037' not 'Broker Default'.


If you tell the FileInput node to read a file and use an ASCII code page to read it (and the 'BrokerDefault' code page on any distributed platform is an ASCII one) then by the time it gets to the DFDL the message tree is hosed. Internally IIB uses UTF-16 so to build the message tree it's converting what it thinks is some kind of ASCII file into that and you'll get gibberish.

If you tell the FileInput node to use CCSID 037 (which I'm assuming is the code page the file was written in as it's the source for the Connect Direct translation), the FileInput node will correctly interpret the byte stream and you'll get a clean message tree in UTF-16. You can then feed this to the DFDL model (which should be based on the actual record layout and identify which fields are text and which fields are binary).
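
The idea of interpreting rather than filtering can be sketched in Python. This is not IIB or DFDL code; the layout (a 5-byte text field followed by a 2-byte packed-decimal field) and the names `parse_record` and `unpack_packed_decimal` are hypothetical. The point is that text fields are decoded with code page 037 and binary fields are read as binary, each according to the record layout.

```python
def unpack_packed_decimal(data: bytes) -> int:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for b in data[:-1]:
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    digits.append(data[-1] >> 4)          # last byte: one digit + sign nibble
    sign_nibble = data[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    # 0xB and 0xD mark negative values; other sign nibbles are treated as positive.
    return -value if sign_nibble in (0x0B, 0x0D) else value


def parse_record(record: bytes) -> tuple[str, int]:
    """Split a record by its (assumed) layout and interpret each field by type."""
    name = record[:5].decode("cp037").rstrip()     # text field: EBCDIC -> str
    amount = unpack_packed_decimal(record[5:7])    # binary field: no code page
    return name, amount


# Hypothetical record as it would arrive from a binary transfer.
raw = "SMITH".encode("cp037") + bytes([0x12, 0x3F])
parsed = parse_record(raw)
print(parsed)
```

No byte is thrown away here. The same bytes that look like gibberish under an ASCII code page become clean values once each field is read the way it was written.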
timber
Posted: Sat Oct 27, 2018 1:12 pm

Sentinel

Joined: 25 Aug 2015
Posts: 868

Just to clarify something...and please ignore if you know this already.

The DFDL parser has access to all of IIB's extensive range of character encodings. It is no less (and no more) capable than any other IIB parser in this respect. It has full access to all ICU character tables, and can therefore read and write characters in any encoding (and it will not mind reading in one encoding and writing in a different one, like any IIB parser).

I suspect that the sender is sending an *invalid* EBCDIC character stream. It sounds as if there are UTF-8 characters mixed into the EBCDIC character stream. In which case, there is no tool in the world that can handle such a stream - not Java, not CD. Only custom code that is aware of the sender's format can deal with invalid character streams, and it will require a lot of care. Usually it's a lot simpler and cheaper to send a valid character stream in the first place.
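
One way to check this suspicion before writing any custom handling is to look at the raw bytes. The Python sketch below is an assumption-laden diagnostic, not a fix: it decodes the bytes as IBM-037 (Python's cp037 codec) and lists the offsets of bytes that turn out to be control characters, which rarely belong in text. The function name and the set of allowed control characters are illustrative, and a file that legitimately contains binary fields will be flagged too.

```python
import unicodedata


def suspicious_offsets(data: bytes, codec: str = "cp037",
                       allowed: str = "\n\r\t\x85") -> list[tuple[int, int]]:
    """Return (offset, byte value) for bytes that decode to control characters.

    Assumes a single-byte codec, so character index equals byte offset.
    """
    text = data.decode(codec)
    return [
        (i, data[i])
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch not in allowed
    ]


# Made-up sample: valid EBCDIC text with one stray 0x3F byte appended.
sample = "<a>ok</a>".encode("cp037") + b"\x3f"
for offset, value in suspicious_offsets(sample):
    print(f"offset {offset}: 0x{value:02x}")
```

A list of offsets like this gives the sending team something specific to look at, which is usually more productive than trying to repair the stream downstream.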
fjb_saper
Posted: Sat Oct 27, 2018 8:33 pm

Grand Poobah

Joined: 18 Nov 2003
Posts: 19770
Location: LI,NY

I also saw 2 sysopts statements where I would have expected only one and some very bizarre formatting of the sysopts field:
Code:
sysopts="XXX")

It looks like the poster did not translate it to text and copy the full text.
The error could also have to do with form.
Did the OP right click and validate the process?
Copyright MQSeries.net. All rights reserved.