Author |
Message
|
petervh1 |
Posted: Wed Dec 19, 2018 7:04 am Post subject: XPath syntax in DFDL expression |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
I'm trying to code a dynamic length value of element "Field4" as follows, but I'm unsure of the correct syntax. I can't find enough detailed info on the syntax.
Record layout:
Code: |
Field1 name = recordlength type = nonNegativeInteger etc.
Field2 name = xyz type = string
Field3 name = abc type = string
Field4 name = payload type = hexBinary
|
The entire record length is given in Field1. I need the length for Field4 only, so I need to set up a dfdl:length XPath statement for Field4 that has this type of pseudocode:
<xsd:element dfdl:initiator="nnn:" dfdl:length="{(../recordlength - (length(recordlength) + length(xyz)
+length(abc) ))}"
I've got the entire length returned correctly in ../recordlength
Can someone tell me how to code the XPath for the
Code: |
(length(recordlength) + length(xyz)
+length(abc) |
part?
Thanks |
|
timber |
Posted: Wed Dec 19, 2018 8:51 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
|
petervh1 |
Posted: Thu Dec 20, 2018 3:03 am Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
I've discovered that unfortunately the 'extra' fields are not always the same length. |
|
timber |
Posted: Thu Dec 20, 2018 9:18 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
I was afraid you might say that. The next thing that I would try is:
- create a sub-element 'payloadContainer' after recordLength with lengthKind=explicit, length=./recordLength
- on element payloadContainer/payload, set lengthKind=endOfParent
Worth a try, anyway. |
|
petervh1 |
Posted: Thu Dec 20, 2018 11:33 pm Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
I'm a little confused - did you mean modify Field4 to include
lengthKind=explicit, length=./recordLength?
Where do I add
lengthKind=endOfParent?
As I understand it, IBM DFDL does not support lengthKind=endOfParent
Thanks again for your assistance |
|
petervh1 |
Posted: Tue Jan 08, 2019 12:09 am Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
I'm still unable to parse this.
I saw timber's suggestion in
Quote: |
http://www.mqseries.net/phpBB2/viewtopic.php?t=74722 |
and have tried using
Code: |
dfdl:lengthKind="pattern" dfdl:lengthPattern="\x1C" |
to find the last occurrence of hex 1C (ie ASCII FS) in the string.
This gives me an error:
Quote: |
Element xxx with lengthKind='pattern' could not be found using the pattern '\x1C'. |
Am I missing something in my regex here? |
|
timber |
Posted: Tue Jan 08, 2019 11:23 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
Am I missing something in my regex here? |
Yes - your regex only describes the terminator. It should match *all* of the content of the element. You need something like this (not tested):
Code: |
dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x1C]*\x1C" |
|
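To see why a terminator-only pattern cannot work, here is a minimal Python sketch. It assumes the DFDL parser applies the length pattern starting at the first byte of the element's content, which is how Python's `re.match` behaves. The sample bytes are made up, and Python's regex engine is only a stand-in for the one the DFDL parser uses.

```python
import re

# Hypothetical sample: some content followed by the 0x1C terminator.
data = b"14.999 payload\x1c"

# A pattern that describes only the terminator cannot match at the
# start of the content, because the content does not begin with 0x1C.
terminator_only = re.match(rb"\x1C", data)

# A pattern that describes the content and then the terminator
# matches the whole element, terminator included.
content_and_terminator = re.match(rb"[^\x1C]*\x1C", data)

print(terminator_only)                # None: no match at the start
print(content_and_terminator.end())   # number of bytes covered by the match
```

Note that `[^\x1C]*` stops at the first 0x1C, so this pattern only finds the right length when the content itself never contains that byte.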
|
petervh1 |
Posted: Tue Jan 08, 2019 10:10 pm Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
Thanks for the update.
When I use the following element definition
Code: |
<xsd:element dfdl:initiator="14.999:" dfdl:lengthKind="pattern" dfdl:lengthPattern="[^\x1C]*\x1C" name="T14.DAT" type="xsd:string"/>
|
I get the same error:
Quote: |
Element T14.DAT with lengthKind='pattern' could not be found using the pattern '[^\x1C]*\x1C'. |
The data I am trying to model has multiple records, each starting with an initiator = 14.001 and terminated with x1C. The element T14.DAT described above is the last one in the record. The element contains binary data. |
|
timber |
Posted: Wed Jan 09, 2019 2:15 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
The data I am trying to model has multiple records, each starting with an initiator = 14.001 and terminated with x1C. The element T14.DAT described above is the last one in the record. The element contains binary data.
|
You have not said so (yet), but I assume that
a) the 0x1C cannot appear in the binary value
b) all of the other elements have dfdl:representation="text", and so you are able to define a DFDL terminator to represent the 0x1C byte.
You might want to consider this option...
- define the binary element with representation="text" and
- set dfdl:encoding to ISO8859-1 for this element only (unless the entire file happens to use that encoding, of course)
- set the terminator to %#x1C; (as I assume you have done for the preceding elements)
This will parse the binary as a string of characters. Normally, this would not be safe because random binary data does not translate safely into most encodings. But ISO8859-1 is a single-byte encoding with 256 defined characters. So you can never get an 'illegal character' error. Just a rather unreadable string value for your element. |
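The claim about ISO8859-1 can be checked outside DFDL. The following Python sketch is only an illustration of the encoding property, not of the parser: it decodes every possible byte value as ISO8859-1 and encodes it back, then tries the same bytes as UTF-8 for contrast. The helper names are hypothetical.

```python
def roundtrip_latin1(data: bytes) -> bytes:
    # Every byte 0x00..0xFF maps to exactly one ISO8859-1 character,
    # so decoding never fails and encoding restores the original bytes.
    return data.decode("iso8859-1").encode("iso8859-1")


def decodes_as_utf8(data: bytes) -> bool:
    # UTF-8 is a multi-byte encoding, so arbitrary binary is usually invalid.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


all_bytes = bytes(range(256))
print(roundtrip_latin1(all_bytes) == all_bytes)  # lossless round trip
print(decodes_as_utf8(all_bytes))                # arbitrary binary is not valid UTF-8
```

The string produced this way is unreadable, but the original bytes can always be recovered by encoding it back with the same encoding.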
|
petervh1 |
Posted: Wed Jan 09, 2019 11:35 pm Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
Looking through more data samples, I've established that there are 0x1C sequences appearing in the final, binary field. This means that the parser stops when it thinks it recognises this sequence as the end of the record.
Yes, all of the other fields have dfdl:representation="text".
Before I try and code this, would this work:
1 Define a separator for the end of the binary field as "0x1C" followed by "14.001" in hex (this is the initiator for a subsequent record as stated in my earlier post)
The problem that I see with this is - how do I parse the data if there is no second 14.001 record, i.e. the file contains:
Code: |
14.001abcdef14.999binarystuff[0x1C] - end of file |
as opposed to:
Code: |
14.001abcdef14.999binarystuff[0x1C]14.001abcdef14.999binarystuff[0x1C] - end of file |
Once again, your assistance is much appreciated. |
|
petervh1 |
Posted: Thu Jan 10, 2019 12:00 am Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
Update:
There is another record in the file that contains a field as follows:
1.003nnn14n01n14n02 etc. This gives a count of the number of type 14 records appearing later in the file. Can I use this count somehow to determine how to parse for the end of 14.999 as stated earlier? |
|
timber |
Posted: Thu Jan 10, 2019 1:04 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Some of your problems are not DFDL problems, they are problems with your understanding of the data format. You will struggle to parse a complex format like this one without understanding exactly what the format specification says.
Quote: |
Define a separator for the end of the binary field as "0x1C" followed by "14.001" |
That will not work. The binary could still contain your separator value (it is less likely but still possible). Is there a name for this format that you are parsing? What does the format specification say about the length of these binary fields? |
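A small Python sketch of the collision, using made-up records: the first record's binary payload happens to contain 0x1C followed by "14.001", so splitting on that longer separator still cuts the record in the wrong place. The record contents and the helper name are hypothetical.

```python
FS = b"\x1c"
NEXT_RECORD = FS + b"14.001"  # the proposed longer separator


def split_on_separator(data: bytes) -> list[bytes]:
    # Naive delimiter scan: cut wherever the separator bytes appear.
    return data.split(NEXT_RECORD)


# Two hypothetical records; the payload of the first one contains
# the separator bytes by coincidence.
first = b"14.001:x14.999:\x00" + NEXT_RECORD + b"\x00" + FS
second = b"14.001:y14.999:\x01" + FS

pieces = split_on_separator(first + second)
print(len(pieces))  # more pieces than the two records that were written
```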
|
petervh1 |
Posted: Thu Jan 10, 2019 2:37 am Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
The format of the data supplies the following, amongst others:
Record type 1.003 - this indicates the number of records of other types, e.g. 1.003nnn03n01n14n01n14n02 shows that there is 1 type 3 record and 2 type 14 records in the file.
Question: Can I use an fn:count or XPath statement to count the number of type 14 records as indicated by this type 1.003 record?
As I said earlier in this post, the record format contains the length of the binary field (Field4) in Field1. The problem with using this is that dfdl:contentLength() and dfdl:valueLength()
are not supported by IBM DFDL as you've already said.
This means that I can't use the record length field to know where to delimit the binary field.
The binary field is terminated by 0x1C but also can contain values that equate to 0x1C (in binary), so I can't use a terminator of 0x1C.
The name of the format is NIST (Data Format for the Interchange of Fingerprint, Facial & Other Biometric Information).
Any assistance would be appreciated. |
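For reference, the counting logic behind the 1.003 field can be sketched in Python. This is not a DFDL expression. It assumes subfields are separated by RS (0x1E) and the items inside a subfield by US (0x1F), and that each subfield starts with a record type. The sample value is built from the description above and the function name is hypothetical.

```python
RS = "\x1e"  # assumed separator between subfields
US = "\x1f"  # assumed separator between the items of one subfield


def count_records_of_type(cnt_value: str, record_type: int) -> int:
    """Count the subfields whose first item equals record_type."""
    count = 0
    for subfield in cnt_value.split(RS):
        items = subfield.split(US)
        if items[0].isdigit() and int(items[0]) == record_type:
            count += 1
    return count


# Hypothetical field value: a leading subfield for the type 1 record,
# then one type 3 record and two type 14 records.
sample = RS.join(
    US.join(pair)
    for pair in [("1", "3"), ("03", "01"), ("14", "01"), ("14", "02")]
)

print(count_records_of_type(sample, 14))
print(count_records_of_type(sample, 3))
```

Knowing how many type 14 records to expect does not say where each one ends, so the count alone does not locate the end of 14.999.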
|
timber |
Posted: Fri Jan 11, 2019 1:22 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Thanks - that's very helpful. We can forget about using separators/terminators to find the end of these binary fields. I think this will probably work:
Code: |
message
  complexField
    lengthField
    complexElement length={../lengthField - 4}
      otherField1
      otherField2
      binaryField lengthKind='delimited'
|
The use of lengthKind='delimited' on the binary element is deliberate. If the binary field is the final field in the complex element, then it works in the same way as lengthKind='endOfParent' (which is one reason why IBM DFDL does not yet support lengthKind='endOfParent').
Do take care to avoid defining any separators/terminators except for members of complexField. Otherwise the lengthKind='delimited' on binaryField will attempt to scan for them!
Please give it a try and let me know. |
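The idea of letting the length field bound the record, rather than scanning for a terminator, can be sketched in Python. This is an illustration of the principle, not of the DFDL model. It assumes each record starts with "14.001:" followed by an ASCII decimal length that ends at a 0x1D byte, and that the length counts every byte of the record including the closing 0x1C. `make_record` and `split_records` are hypothetical helpers.

```python
GS = b"\x1d"  # field terminator used in the thread's schema
FS = b"\x1c"  # record terminator


def make_record(payload: bytes) -> bytes:
    """Build a hypothetical type 14 record whose length field counts the whole record."""
    rest = GS + b"14.002:01" + GS + b"14.999:" + payload + FS
    base = len(b"14.001:") + len(rest)
    digits = 1
    # The length field counts its own digits, so find a consistent width.
    while len(str(base + digits)) != digits:
        digits += 1
    return b"14.001:" + str(base + digits).encode("ascii") + rest


def split_records(data: bytes) -> list[bytes]:
    """Cut the data into records using only the length field of each record."""
    records = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)   # end of the "14.001" tag
        gs = data.index(GS, colon)      # end of the length digits
        length = int(data[colon + 1:gs])
        records.append(data[pos:pos + length])
        pos += length
    return records


# Both payloads contain 0x1C; the second also contains "14.001:".
stream = make_record(b"\x01\x1c\x02") + make_record(b"\xff\x1c14.001:")
records = split_records(stream)
print(len(records))
```

Because the payload is never scanned, it makes no difference which byte values it contains.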
|
petervh1 |
Posted: Sun Jan 13, 2019 11:31 pm Post subject: |
|
|
Centurion
Joined: 19 Apr 2010 Posts: 135
|
I have tried what I think is what you are suggesting. This is what I coded:
Code: |
Type 1 records (successfully parsed)
Type 2 records (successfully parsed)
.
.
<xsd:element dfdl:initiator="" maxOccurs="unbounded" name="Type14" dfdl:length="{../Type14-4}">
<xsd:complexType>
<xsd:sequence dfdl:separator="" >
<xsd:element dfdl:initiator="14.001:" dfdl:terminator="%#x1D;" name="T14.LEN" type="xsd:string"/>
<xsd:element dfdl:initiator="14.002:" dfdl:terminator="%#x1D;" name="T14.IDC" type="xsd:string"/>
<xsd:element dfdl:initiator="14.999:" dfdl:lengthKind="delimited" name="T14.DAT" type="xsd:hexBinary"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
|
The result of this is that a message with 2 type 14 records parses successfully according to the DFDL Test Parse Model. However, it appears that only the first of the 2 type 14 records is parsed.
Questions:
1) I assume I have correctly followed your instruction "avoid defining any separators/terminators except for members of complexField"
2) I'm a bit confused about the placement of dfdl:length="{../Type14-4}": it appears to make no difference whether this is coded or not (only the first type 14 record is parsed in both cases). Is this in the right place? The DFDL trace does not show this calculation being executed. |
|
|