
MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Issue modeling DIME format in DFDL

Be2Be
Posted: Sun Feb 11, 2018 4:55 am    Post subject: Issue modeling DIME format in DFDL

Newbie

Joined: 11 Feb 2018
Posts: 4

My question relates to the DIME draft specification (a developerWorks link to this doc: cn/webservices/ws-dime/ws-dime.pdf). There is also an IETF link somewhere.

I'm able to parse an input DIME message with my DFDL model up to the DATA_LENGTH field. However, when it comes to parsing the TYPE field (the SOAP v1.1 absolute URI, 41 bytes long), I can't get the entire value into the TYPE field, because the DFDL parser counts the three leading NUL padding bytes as part of the length. So the TYPE field is missing its last three bytes, 'pe/', which are then parsed into the next field, DATA. Changing the 'Truncate string of specified length' property of the TYPE field from 'yes' to 'no' makes no difference.

It is evident that when DFDL trims x pad bytes from a field, it does not extend that field's length by x bytes to compensate. This seems intentional, as DFDL does not assume knowledge of the quirks of any particular specification.

The DIME spec states that the value in a length field (such as TYPE_LENGTH) excludes any padding of the related field. The padding consists of zero octets, at most 3 of them, added so that the field occupies a multiple of 4 octets.
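Under that rule, the padding after a field depends only on the value in its length field. A minimal Python sketch of the arithmetic (the function names are illustrative, not from any library); for the 41-octet TYPE URI in this thread it gives 3 padding octets, 44 octets in total:

```python
def pad_bytes(length: int) -> int:
    """Zero octets appended so that a field of `length` octets fills a multiple of 4."""
    return (4 - length % 4) % 4


def padded_length(length: int) -> int:
    """Octets the field occupies on the wire: content plus padding."""
    return length + pad_bytes(length)


for n in (41, 42, 43, 44):
    print(n, pad_bytes(n), padded_length(n))
# 41 3 44
# 42 2 44
# 43 1 44
# 44 0 44
```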

In the sample message I have been given by the legacy application team, I can see that some fields, such as ID, are not padded to a multiple of 4 octets. So the Axis library they are using may be fairly lenient when parsing such fields. It looks like I have to do something similar:

TYPE field length property: IF exists (leading three|two|one NUL bytes in current field) then TYPE_LENGTH + 3|2|1 else TYPE_LENGTH

I'm not planning to use an expression that takes TYPE_LENGTH mod 4 and then adds 3 if the result is 1, 2 if it is 2, and so on, because I suspect that, as with the sample message, the data I receive will most likely not conform to the padding rules.
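The lenient rule in the pseudo-expression above can be sketched in Python: at the start of each field, skip up to three NUL octets left over from the previous field, then read the declared length. `read_lenient` is a hypothetical helper and the record bodies are made-up data; the sketch assumes a field's content never legitimately begins with a NUL octet.

```python
MAX_PAD = 3  # DIME pads with at most 3 zero octets


def read_lenient(buf: bytes, offset: int, length: int):
    """Skip up to MAX_PAD leading NULs, then read `length` octets.

    Returns (field_bytes, next_offset).
    """
    skipped = 0
    while (skipped < MAX_PAD and offset + skipped < len(buf)
           and buf[offset + skipped] == 0):
        skipped += 1
    start = offset + skipped
    return buf[start:start + length], start + length


# Hypothetical bodies: a 5-octet ID followed by a 3-octet TYPE,
# once with the ID padded to 8 octets and once without padding.
for body in (b"abcde\x00\x00\x00xyz", b"abcdexyz"):
    ident, pos = read_lenient(body, 0, 5)
    type_value, pos = read_lenient(body, pos, 3)
    print(ident, type_value, pos)
# b'abcde' b'xyz' 11
# b'abcde' b'xyz' 8
```

Both the padded and the unpadded variants parse the same way, which is the leniency the Axis library appears to offer.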

Can the experts please confirm whether there is a better way to achieve my goal (I'm assuming my expression above is doable in DFDL)?

My apologies for not posting my sample data and DFDL model snippets; I don't have access to my workstation right now.
timber
Posted: Mon Feb 12, 2018 1:56 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Good problem description.

It would certainly help if you posted some examples. Probably best to replace nulls with *NULL* in order to make them visible. And please do use [c o d e] tags when posting example messages and/or schema snippets.

My guess is that DFDL can probably handle this, but not using the standard lengthKind=prefixed. You will probably need to parse the length field as a separate field, and then set the length property to a DFDL expression using {some xpath that returns an integer length}.

If the sender is not strictly following the specification then you may have further problems...or it may be easy to be lenient. Hard to say without seeing examples.
Be2Be
Posted: Mon Feb 12, 2018 5:57 pm

Newbie

Joined: 11 Feb 2018
Posts: 4

timber, I had modelled it as you suggested (based on IBMdefined/GeneralPurposeFormat.xsd).

TYPE_LENGTH:
Code:
<xsd:element dfdl:alignment="2" dfdl:length="2" dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:representation="binary" name="TYPE_LENGTH" type="xsd:unsignedShort"/>


TYPE:
Code:
<xsd:element dfdl:alignment="1" dfdl:length="{xs:int( /DIMEmessage/DimeRec/DIMEHdr/HdrRow1/Type_Length)}" dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:textPadKind="padChar" dfdl:textStringJustification="right" dfdl:textStringPadCharacter="%#r00;" dfdl:textTrimKind="padChar" dfdl:truncateSpecifiedLengthString="yes" name="Type" type="xsd:string"/>


Parser output (truncated to just the part that illustrates the problem):
Code:
<?xml version="1.0" encoding="UTF-8" ?>
  <DIMEmessage xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DimeRec>
  <DIMEHdr>
  <HdrRow1>
  <version xsi:type="xs:unsignedByte">1</version>
  <MB xsi:type="xs:boolean">true</MB>
  <ME xsi:type="xs:boolean">false</ME>
  <CF xsi:type="xs:boolean">false</CF>
  <Type_T xsi:type="xs:unsignedByte">2</Type_T>
  <Resrvd xsi:type="xs:unsignedByte">0</Resrvd>
  <Opt_Length xsi:type="xs:unsignedShort">0</Opt_Length>
  <ID_Length xsi:type="xs:unsignedShort">41</ID_Length>
  <Type_Length xsi:type="xs:unsignedShort">41</Type_Length>
  </HdrRow1>
  <DataLength xsi:type="xs:unsignedInt">1884</DataLength>
  <ID xsi:type="xs:string">uuid:714C6C40-4531-442E-A498-3AC614200295</ID>
  <Type xsi:type="xs:string">http://schemas.xmlsoap.org/soap/envelo</Type>
  </DIMEHdr>
  <payload xsi:type="xs:string">pe/<![CDATA[*NULL**NULL**NULL*]]><?xml versio


where
Code:
*NULL* = & # 0 ;


Input that gave Output above (Hex):
Code:

         0C 20 00 00 00 29 00 29 00 00 07 5C 75 75 69 64
         3A 37 31 34 43 36 43 34 30 2D 34 35 33 31 2D 34
         34 32 45 2D 41 34 39 38 2D 33 41 43 36 31 34 32
         30 30 32 39 35 00 00 00 68 74 74 70 3A 2F 2F 73
         63 68 65 6D 61 73 2E 78 6D 6C 73 6F 61 70 2E 6F
         72 67 2F 73 6F 61 70 2F 65 6E 76 65 6C 6F 70 65
         2F 00 00 00 3C 3F 78 6D 6C 20 76 65 72 73 69 6F


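To see why TYPE loses 'pe/', here is a minimal Python sketch that replays the naive reading on the first seven rows of the hex dump above: ID and TYPE are read back to back with their raw 41-byte lengths (OPTIONS_LENGTH is 0 in this sample), and leading NULs are stripped from TYPE to approximate the right-justified %#r00; trimming. The constant names are illustrative and the lengths are hard-coded rather than decoded.

```python
# First seven rows of the hex dump above.
ROWS = [
    "0C 20 00 00 00 29 00 29 00 00 07 5C 75 75 69 64",
    "3A 37 31 34 43 36 43 34 30 2D 34 35 33 31 2D 34",
    "34 32 45 2D 41 34 39 38 2D 33 41 43 36 31 34 32",
    "30 30 32 39 35 00 00 00 68 74 74 70 3A 2F 2F 73",
    "63 68 65 6D 61 73 2E 78 6D 6C 73 6F 61 70 2E 6F",
    "72 67 2F 73 6F 61 70 2F 65 6E 76 65 6C 6F 70 65",
    "2F 00 00 00 3C 3F 78 6D 6C 20 76 65 72 73 69 6F",
]
RECORD = bytes.fromhex(" ".join(ROWS))

HEADER_LEN = 12   # fixed DIME header
ID_LENGTH = 41    # 0x0029 in octets 4-5
TYPE_LENGTH = 41  # 0x0029 in octets 6-7


def naive_parse(buf: bytes):
    """Read ID and TYPE back to back using the raw length fields (no padding)."""
    offset = HEADER_LEN
    ident = buf[offset:offset + ID_LENGTH].decode("ascii")
    offset += ID_LENGTH
    # TYPE's 41-octet window starts on the ID's 3 padding NULs,
    # so it holds only the first 38 octets of the URI.
    type_raw = buf[offset:offset + TYPE_LENGTH]
    offset += TYPE_LENGTH
    # Approximates right justification with pad char %#r00;: strip leading NULs.
    type_value = type_raw.lstrip(b"\x00").decode("ascii")
    return ident, type_value, buf[offset:]


ident, type_value, rest = naive_parse(RECORD)
print(type_value)  # http://schemas.xmlsoap.org/soap/envelo
print(rest[:6])    # b'pe/\x00\x00\x00'
```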
DIME format (from http://xml.coverpages.org/draft-nielsen-dime-02.txt):
Code:

  0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         |M|M|C|       |       |                               |
     | VERSION |B|E|F| TYPE_T| RESRVD|         OPTIONS_LENGTH        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |            ID_LENGTH          |           TYPE_LENGTH         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          DATA_LENGTH                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               /
     /                     OPTIONS + PADDING                         /
     /                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               /
     /                          ID + PADDING                         /
     /                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               /
     /                        TYPE + PADDING                         /
     /                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               /
     /                        DATA + PADDING                         /
     /                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


        Figure 3: DIME Record Layout. The use of "/" indicates a
        field length which is a multiple of 4 octets.
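The fixed part of this layout is 12 octets. A minimal Python sketch of decoding it, assuming network (big-endian) byte order and MSB-first bit packing as drawn in the figure; `parse_fixed_header` is an illustrative name. Fed the first 12 bytes of the hex dump above, it agrees with the parser output (ID_Length 41, Type_Length 41, DataLength 1884).

```python
import struct


def parse_fixed_header(buf: bytes) -> dict:
    """Decode the fixed 12-octet DIME record header (network byte order)."""
    # B,B = the two bit-packed octets; H,H,H = OPTIONS/ID/TYPE lengths; I = DATA_LENGTH
    b0, b1, opt_len, id_len, type_len, data_len = struct.unpack(">BBHHHI", buf[:12])
    return {
        "version": b0 >> 3,          # top 5 bits
        "mb": bool(b0 & 0x04),       # message begin
        "me": bool(b0 & 0x02),       # message end
        "cf": bool(b0 & 0x01),       # chunk flag
        "type_t": b1 >> 4,           # top 4 bits of the second octet
        "resrvd": b1 & 0x0F,         # low 4 bits
        "options_length": opt_len,
        "id_length": id_len,
        "type_length": type_len,
        "data_length": data_len,
    }


header = parse_fixed_header(bytes.fromhex("0C 20 00 00 00 29 00 29 00 00 07 5C"))
print(header["id_length"], header["type_length"], header["data_length"])  # 41 41 1884
```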


DFDL clearly considers the length units to be the absolute length, not relative to any padding, regardless of whether the pad char is requested to be trimmed.

So my question is: to make my DFDL model treat the length in bytes as the field length excluding any padding, is the only option to write an XPath expression for the dfdl:length property of the TYPE field (and of other similarly padded fields) that adds the padding byte count?
timber
Posted: Tue Feb 13, 2018 4:29 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Quote:
DFDL clearly considers the length units to be the absolute length, not relative to any padding, regardless of whether the pad char is requested to be trimmed.
That is always true, regardless of how the length is determined. The length is the length of the 'content region', which always includes padding characters.

However...the null bytes in your message format are not padding. They are alignment bytes. DFDL has specific facilities for controlling alignment and if you set the alignment properties correctly on the OPTIONS, ID, TYPE and DATA elements then it should 'just work'.
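The alignment view can be sketched in Python as rounding the current offset up to a multiple of 4 before each variable field, instead of adding padding to the previous field's length. This assumes the record starts on a 4-byte boundary (offset 0); `align_up` and `field_offsets` are illustrative names. With the lengths from the sample message, the computed offsets match the hex dump posted earlier: ID at 12, TYPE at 56, DATA at 100.

```python
def align_up(offset: int, boundary: int = 4) -> int:
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def field_offsets(opt_len: int, id_len: int, type_len: int, header_len: int = 12) -> dict:
    """Start offset of each variable field when every field is 4-byte aligned."""
    offsets = {}
    pos = header_len
    for name, length in (("OPTIONS", opt_len), ("ID", id_len), ("TYPE", type_len)):
        pos = align_up(pos)  # skip 0-3 fill bytes left by the previous field
        offsets[name] = pos
        pos += length
    offsets["DATA"] = align_up(pos)
    return offsets


print(field_offsets(0, 41, 41))
# {'OPTIONS': 12, 'ID': 12, 'TYPE': 56, 'DATA': 100}
```

Here the NULs are never part of any field's content, so no trimming is needed; they are simply skipped.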

As always, the DFDL trace view is invaluable in diagnosing what's going on inside the parser. I recommend using it, if you're not already doing so.
Be2Be
Posted: Sun Feb 25, 2018 4:38 pm

Newbie

Joined: 11 Feb 2018
Posts: 4

Thanks timber. I looked at the Alignment properties and they don't seem to align with my requirements or understanding.

I consider the null bytes to be padding instead of alignment (fill) bytes as these are added at the end of content if the content length is not a multiple of 4 bytes.
I have used Alignment properties to strain out the individual bits from the first byte of data where the first 5 bits indicate the "Version", the 6th bit indicates "Message Begin" flag, the 7th bit indicates "Message End" flag and the last bit indicates the "Chunk Flag".

My main motive was to get the content without the null bytes so that I don't have to do additional massaging of the data in ESQL. This has been achieved for the ID and Type fields only as these are character data and so are modelled as "String". I have used the length of these fields parsed earlier on and used a mod calculation to infer the presence of null bytes. Then set the String Pad character as %#r00; and specify it as "Trim Kind". Works beautifully.
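That approach (a length derived from the earlier length field with a mod calculation, then %#r00; trimmed) can be mirrored in a short Python sketch. It replays the first seven rows of the hex dump posted earlier in the thread; `padded_length` and `read_padded_string` are illustrative names, and trimming trailing NULs is only safe here because ID and TYPE are text values that never end in a NUL.

```python
ROWS = [
    "0C 20 00 00 00 29 00 29 00 00 07 5C 75 75 69 64",
    "3A 37 31 34 43 36 43 34 30 2D 34 35 33 31 2D 34",
    "34 32 45 2D 41 34 39 38 2D 33 41 43 36 31 34 32",
    "30 30 32 39 35 00 00 00 68 74 74 70 3A 2F 2F 73",
    "63 68 65 6D 61 73 2E 78 6D 6C 73 6F 61 70 2E 6F",
    "72 67 2F 73 6F 61 70 2F 65 6E 76 65 6C 6F 70 65",
    "2F 00 00 00 3C 3F 78 6D 6C 20 76 65 72 73 69 6F",
]
RECORD = bytes.fromhex(" ".join(ROWS))


def padded_length(length: int) -> int:
    """Length rounded up to a multiple of 4: the 'mod' calculation."""
    return length + (4 - length % 4) % 4


def read_padded_string(buf: bytes, offset: int, length: int):
    """Read content plus padding, then trim trailing NUL pad characters."""
    end = offset + padded_length(length)
    return buf[offset:end].rstrip(b"\x00").decode("ascii"), end


options_length = int.from_bytes(RECORD[2:4], "big")  # 0 in this sample
id_length = int.from_bytes(RECORD[4:6], "big")       # 41
type_length = int.from_bytes(RECORD[6:8], "big")     # 41

pos = 12 + padded_length(options_length)  # ID starts after the header and OPTIONS
ident, pos = read_padded_string(RECORD, pos, id_length)
type_value, pos = read_padded_string(RECORD, pos, type_length)
print(ident)       # uuid:714C6C40-4531-442E-A498-3AC614200295
print(type_value)  # http://schemas.xmlsoap.org/soap/envelope/
print(pos)         # 100
```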

However, this is not doable for the DATA field, as it could have any content (text, image), which is not always "String" data. So I modelled DATA as "hexBinary", with its length again derived using a mod of the DATA_LENGTH field. The downside is that I get the null bytes in the logical tree for the payload, which I have to truncate if the payload is XML so that it can be validated against a schema.

Alignment also does not work for my purpose, as the DATA field is of variable length, so there is no straightforward value to specify for the Alignment (bytes) property.
timber
Posted: Mon Feb 26, 2018 1:27 pm

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Quote:
I have used Alignment properties to strain out the individual bits from the first byte of data where the first 5 bits indicate the "Version", the 6th bit indicates "Message Begin" flag, the 7th bit indicates "Message End" flag and the last bit indicates the "Chunk Flag".
Any reason why you did not simply model this byte using 4 separate elements, each with LengthUnits='bits'?
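In the same spirit as modelling the first byte as separate elements measured in bits, a Python sketch can read consecutive bit fields most-significant-bit first. The widths 5, 1, 1, 1, 4, 4 are VERSION, MB, ME, CF, TYPE_T and RESRVD from the DIME layout; `read_bit_fields` is a hypothetical helper, and MSB-first bit order is an assumption consistent with the figure. The input is the first two bytes of the sample (0x0C 0x20).

```python
def read_bit_fields(buf: bytes, widths: list) -> list:
    """Read consecutive unsigned fields of the given bit widths, MSB first."""
    value = int.from_bytes(buf, "big")
    total = len(buf) * 8
    fields = []
    pos = 0
    for width in widths:
        if pos + width > total:
            raise ValueError("not enough bits for the requested widths")
        shift = total - pos - width
        fields.append((value >> shift) & ((1 << width) - 1))
        pos += width
    return fields


# VERSION(5) MB(1) ME(1) CF(1) TYPE_T(4) RESRVD(4)
print(read_bit_fields(bytes([0x0C, 0x20]), [5, 1, 1, 1, 4, 4]))
# [1, 1, 0, 0, 2, 0]
```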
Quote:
My main motive was to get the content without the null bytes so that I don't have to do additional massaging of the data in ESQL. This has been achieved for the ID and Type fields only as these are character data and so are modelled as "String". I have used the length of these fields parsed earlier on and used a mod calculation to infer the presence of null bytes. Then set the String Pad character as %#r00; and specify it as "Trim Kind". Works beautifully.
That's exactly what I would have done. I think that's the best use of DFDL facilities.
Quote:
However this is not doable for DATA field as it could have any content (text, image) which is not always "String" data. So I modelled the DATA as "hexBinary"
I agree with the choice of hexBinary as the data type - that's pretty much forced on you if you don't know what the type is. Watch out for multi-byte character encodings, though.
Quote:
...with its length again derived using a mod of the DATA_LENGTH field. The downside is that I get the null bytes in the logical tree for the payload, which I have to truncate if the payload is XML so that it can be validated against a schema.
Have you tried modelling the data and the padding using separate elements? That would avoid the hand-coded removal of the null bytes.
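Modelling the payload and the padding separately can be sketched in Python as slicing exactly DATA_LENGTH octets for the payload and the computed remainder for the padding. `split_data` is a hypothetical helper and the payload is made-up binary data. Unlike stripping trailing NULs, this keeps a payload that legitimately ends in 0x00 intact.

```python
def split_data(buf: bytes, offset: int, data_length: int):
    """Return (payload, padding, next_offset) for a DATA field of data_length octets."""
    pad = (4 - data_length % 4) % 4
    payload = buf[offset:offset + data_length]
    padding = buf[offset + data_length:offset + data_length + pad]
    if any(padding):
        raise ValueError("DIME padding octets should be zero")
    return payload, padding, offset + data_length + pad


# Hypothetical 5-octet binary payload that really ends in 0x00, plus 3 pad octets.
record_data = b"\x89PNG\x00" + b"\x00\x00\x00"
payload, padding, nxt = split_data(record_data, 0, 5)
print(payload, len(padding), nxt)   # b'\x89PNG\x00' 3 8
print(record_data.rstrip(b"\x00"))  # b'\x89PNG'  (trimming would lose a real byte)
```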
Quote:
I consider the null bytes to be padding instead of alignment (fill) bytes as these are added at the end of content if the content length is not a multiple of 4 bytes.
That only matters if the final element in the message must be a multiple of 4 bytes. Otherwise, it's a matter of opinion whether you choose to think about the null bytes as 'leading alignment' or 'trailing padding'.