MQSeries.net :: View topic - XPath syntax in DFDL expression


XPath syntax in DFDL expression


petervh1

Posted: Wed Dec 19, 2018 7:04 am Post subject: XPath syntax in DFDL expression

Centurion

Joined: 19 Apr 2010
Posts: 135

I'm trying to code a dynamic length value of element "Field4" as follows, but I'm unsure of the correct syntax. I can't find enough detailed info on the syntax.

Record layout:

Code:

Field1 name = recordlength type = nonNegativeInteger etc.
Field2 name = xyz type = string
Field3 name = abc type = string
Field4 name = payload type = hexBinary

The entire record length is given in Field1. I need the length for Field4 only, so I need to set up a dfdl:length XPath statement for Field4 that has this type of pseudocode:

<xsd:element dfdl:initiator="nnn:" dfdl:length="{(../recordlength - (length(recordlength) + length(xyz)
+length(abc) ))}"

I've got the entire length returned correctly in ../recordlength.
Can someone tell me how to code the XPath for the

Code:

(length(recordlength) + length(xyz) + length(abc))

part?

Thanks

timber

Posted: Wed Dec 19, 2018 8:51 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

You have the right idea, and the DFDL specification does allow that kind of thing to be done. However, according to https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/df00150_.htm the DFDL functions dfdl:contentLength() and dfdl:valueLength() are not supported by IBM DFDL.

I think your best bet is to precalculate (if possible) the lengths of the 'extra' fields and subtract that constant value from the value of recordLength.

petervh1

Posted: Thu Dec 20, 2018 3:03 am Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

I've discovered that unfortunately the 'extra' fields are not always the same length.

timber

Posted: Thu Dec 20, 2018 9:18 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

I was afraid you might say that. The next thing that I would try is:
- create a sub-element 'payloadContainer' after recordLength with lengthKind=explicit, length=./recordLength
- on element payloadContainer/payload, set lengthKind=endOfParent
Worth a try, anyway.

petervh1

Posted: Thu Dec 20, 2018 11:33 pm Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

I'm a little confused - did you mean modify Field4 to include

lengthKind=explicit, length=./recordLength?

Where do I add

lengthKind=endOfParent?

As I understand it, IBM DFDL does not support lengthKind=endOfParent.

Thanks again for your assistance

petervh1

Posted: Tue Jan 08, 2019 12:09 am Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

I'm still unable to parse this.

I saw timber's suggestion in

Quote:

http://www.mqseries.net/phpBB2/viewtopic.php?t=74722

and have tried using

Code:

dfdl:lengthKind="pattern" dfdl:lengthPattern="\x1C"

to find the last occurrence of hex 1C (i.e. ASCII FS) in the string.

This gives me an error:

Quote:

Element xxx with lengthKind='pattern' could not be found using the pattern '\x1C'.

Am I missing something in my regex here?

timber

Posted: Tue Jan 08, 2019 11:23 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

Am I missing something in my regex here?

Yes - your regex only describes the terminator. It should match *all* of the content of the element. You need something like this (not tested):

Code:

dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x1C]*\x1C"
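To illustrate why the pattern has to describe the content as well as the terminator, here is a small Python sketch. It assumes that a length pattern is applied starting at the first character of the element's content, which Python's `re.match` approximates. It uses Python's regex engine, not the one inside IBM DFDL, and the sample data is invented, so treat it as an analogy only.

```python
import re

# Invented sample: element content followed by the FS (0x1C) byte,
# then the start of whatever comes next in the data.
data = "binarystuff\x1c14.001:next"

# Applied at the start of the content, a pattern that describes only the
# terminator fails, because the first character is not 0x1C.
terminator_only = re.match(r"\x1c", data)

# A pattern that describes the content and then the terminator matches.
content_and_terminator = re.match(r"[^\x1c]*\x1c", data)

# The length of the match is what would determine the element length.
matched_length = (
    content_and_terminator.end() if content_and_terminator else None
)
```

Note that in this sketch the matched span includes the 0x1C character itself.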

petervh1

Posted: Tue Jan 08, 2019 10:10 pm Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

Thanks for the update.

When I use the following element definition

Code:

<xsd:element dfdl:initiator="14.999:" dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x1C]*\x1C" name="T14.DAT" type="xsd:string"/>

I get the same error:

Quote:

Element T14.DAT with lengthKind='pattern' could not be found using the pattern '[^\x1C]*\x1C'.

The data I am trying to model has multiple records, each starting with an initiator = 14.001 and terminated with x1C. The element T14.DAT described above is the last one in the record. The element contains binary data.

timber

Posted: Wed Jan 09, 2019 2:15 pm Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

You have not said so (yet), but I assume that
a) the 0x1C cannot appear in the binary value
b) all of the other elements have dfdl:representation="text", and so you are able to define a DFDL terminator to represent the 0x1C byte.

You might want to consider this option...
- define the binary element with representation="text" and
- set dfdl:encoding to ISO8859-1 for this element only (unless the entire file happens to use that encoding, of course)
- set the terminator to %#x1C; (as I assume you have done for the preceding elements)

This will parse the binary as a string of characters. Normally, this would not be safe because random binary data does not translate safely into most encodings. But ISO8859-1 is a single-byte encoding with 256 defined characters. So you can never get an 'illegal character' error. Just a rather unreadable string value for your element.
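The claim that ISO8859-1 can never produce an 'illegal character' error can be checked outside DFDL. The following Python sketch is only an illustration of the encoding's behaviour, not of the DFDL parser itself: it decodes every possible byte value as ISO-8859-1 and confirms that the bytes survive a round trip, then shows that UTF-8 rejects the same input.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character, so decoding never
# fails and encoding the result gives back the original bytes.
as_text = all_bytes.decode("iso-8859-1")
round_tripped = as_text.encode("iso-8859-1")

# A multi-byte encoding such as UTF-8 rejects some byte sequences.
try:
    all_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```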

petervh1

Posted: Wed Jan 09, 2019 11:35 pm Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

Looking through more data samples, I've established that there are 0x1C sequences appearing in the final, binary field. This means that the parser stops when it thinks it recognises this sequence as the end of the record.

Yes, all of the other fields have dfdl:representation="text".

Before I try and code this, would this work:

1 Define a separator for the end of the binary field as "0x1C" followed by "14.001" in hex (this is the initiator for a subsequent record as stated in my earlier post)

The problem that I see with this is - how do I parse the data if there is no second 14.001 record, i.e. the file contains:

Code:

14.001abcdef14.999binarystuff[0x1C] - end of file

as opposed to:

Code:

14.001abcdef14.999binarystuff[0x1C]14.001abcdef14.999binarystuff[0x1C] - end of file

Once again, your assistance is much appreciated.

petervh1

Posted: Thu Jan 10, 2019 12:00 am Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

Update:

There is another record in the file that contains a field as follows:

1.003nnn14n01n14n02 etc. This gives a count of the number of type 14 records appearing later in the file. Can I use this count somehow to determine how to parse for the end of 14.999 as stated earlier?

timber

Posted: Thu Jan 10, 2019 1:04 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Some of your problems are not DFDL problems, they are problems with your understanding of the data format. You will struggle to parse a complex format like this one without understanding exactly what the format specification says.

Quote:

Define a separator for the end of the binary field as "0x1C" followed by "14.001"

That will not work. The binary could still contain your separator value (it is less likely but still possible). Is there a name for this format that you are parsing? What does the format specification say about the length of these binary fields?
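The failure mode described here can be shown with a few lines of Python. This is a sketch with invented sample bytes, not real NIST data: the payload happens to contain the 0x1C byte, so a scan for the first 0x1C stops inside the payload instead of at its true end. The same thing happens with a longer delimiter if the payload happens to contain that longer byte sequence.

```python
FS = b"\x1c"
TAG = b"14.999:"

# Invented payload that happens to contain the FS byte, followed by the
# real record terminator and the start of the next record.
payload = b"\x00\x1c\xff\x10"
data = TAG + payload + FS + b"14.001:"

start = len(TAG)

# Scanning for the first FS stops inside the payload, not at its true end.
scanned_end = data.index(FS, start)
scanned_payload = data[start:scanned_end]
```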

petervh1

Posted: Thu Jan 10, 2019 2:37 am Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

The format of the data supplies the following, amongst others:

Record type 1.003 - this indicates the number of records of other types, e.g. 1.003nnn03n01n14n01n14n02 shows that there is 1 type 3 record and 2 type 14 records in the file.

Question: Can I use an fn:count or XPath statement to count the number of type 14 records as indicated by this type 1.003 record?

As I said earlier in this post, the record format contains the length of the binary field (Field4) in Field1. The problem with using this is that dfdl:contentLength() and dfdl:valueLength()
are not supported by IBM DFDL as you've already said.

This means that I can't use the record length field to know where to delimit the binary field.

The binary field is terminated by 0x1C but also can contain values that equate to 0x1C (in binary), so I can't use a terminator of 0x1C.

The name of the format is NIST (Data Format for the Interchange of Fingerprint, Facial & Other Biometric Information).

Any assistance would be appreciated.

timber

Posted: Fri Jan 11, 2019 1:22 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Thanks - that's very helpful. We can forget about using separators/terminators to find the end of these binary fields. I think this will probably work:

Code:

message
  complexField
    lengthField
    complexElement length={../lengthField - 4}
      otherField1
      otherField2
      binaryField lengthKind='delimited'

The use of lengthKind='delimited' on the binary element is deliberate. If the binary field is the final field in the complex element, then it works in the same way as lengthKind='endOfParent' (which is one reason why IBM DFDL does not yet support lengthKind='endOfParent').
Do take care to avoid defining any separators/terminators except for members of complexField. Otherwise the lengthKind='delimited' on binaryField will attempt to scan for them!

Please give it a try and let me know.
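The idea behind this layout (bound the record by its length field, then let the final binary field take whatever is left) can be sketched in Python. This is not DFDL and not a NIST parser. The function names `build_type14` and `parse_type14` are hypothetical, the field tags are borrowed from the schema discussed in this thread, and the sketch assumes that the length field counts every byte of the record, from the first tag through the trailing 0x1C.

```python
GS = b"\x1d"  # terminator of the text fields
FS = b"\x1c"  # record terminator


def build_type14(idc: bytes, payload: bytes) -> bytes:
    """Build an invented sample record with a self-consistent length."""
    body = b"14.002:" + idc + GS + b"14.999:" + payload + FS
    total = len(body)
    while True:
        # The length field counts its own digits, so iterate until stable.
        candidate = len(b"14.001:") + len(str(total)) + len(GS) + len(body)
        if candidate == total:
            break
        total = candidate
    return b"14.001:" + str(total).encode("ascii") + GS + body


def parse_type14(data: bytes, pos: int = 0):
    """Parse one record at pos; return (fields, position after the record)."""
    if not data.startswith(b"14.001:", pos):
        raise ValueError("record does not start with 14.001:")
    len_end = data.index(GS, pos)
    # Assumption: the length covers the whole record, including the final FS.
    record_len = int(data[pos + 7:len_end])
    record = data[pos:pos + record_len]
    if len(record) != record_len or not record.endswith(FS):
        raise ValueError("length field does not match the data")

    idc_start = len_end - pos + 1
    if not record.startswith(b"14.002:", idc_start):
        raise ValueError("expected 14.002:")
    idc_end = record.index(GS, idc_start)

    dat_start = idc_end + 1
    if not record.startswith(b"14.999:", dat_start):
        raise ValueError("expected 14.999:")
    # The binary field is everything up to the end of the record, minus the
    # final FS. Nothing is scanned, so embedded 0x1C/0x1D bytes are harmless.
    fields = {
        "len": record_len,
        "idc": record[idc_start + 7:idc_end],
        "dat": record[dat_start + 7:-1],
    }
    return fields, pos + record_len


# Two invented records whose payloads contain delimiter-like bytes.
sample = build_type14(b"01", b"\x00\x1c\x1d\xff") + build_type14(b"02", b"\x1c14.001:")
first, next_pos = parse_type14(sample)
second, end_pos = parse_type14(sample, next_pos)
```

Because the end of each record comes from the length field, the second record is found at the right offset even though the first payload contains 0x1C.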

petervh1

Posted: Sun Jan 13, 2019 11:31 pm Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 135

I have tried what I think is what you are suggesting. This is what I coded:

Code:

Type 1 records (successfully parsed)
Type 2 records (successfully parsed)
.
.
<xsd:element dfdl:initiator="" maxOccurs="unbounded" name="Type14" dfdl:length="{../Type14-4}">
<xsd:complexType>
<xsd:sequence dfdl:separator="" >
<xsd:element dfdl:initiator="14.001:" dfdl:terminator="%#x1D;" name="T14.LEN" type="xsd:string"/>
<xsd:element dfdl:initiator="14.002:" dfdl:terminator="%#x1D;" name="T14.IDC" type="xsd:string"/>
<xsd:element dfdl:initiator="14.999:" dfdl:lengthKind="delimited" name="T14.DAT" type="xsd:hexBinary"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

The result of this is that a message with 2 type 14 records parses successfully according to the DFDL Test Parse Model. However, it appears that only the first of the 2 type 14 records is parsed.

Questions:

1) I assume I have correctly followed your instruction "avoid defining any separators/terminators except for members of complexField"

2) I'm a bit confused about the placement of the "dfdl:length="{../Type14-4}" - it appears to make no difference whether this is coded or not (only the first type 14 record is parsed in both cases). Is this in the right place, as the DFDL trace does not show this calculation being executed?
