Author |
Message
|
TKIY |
Posted: Tue Aug 23, 2016 8:18 am Post subject: Healthcare Pack - Truncating fields with subfields |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
I can figure out how to brute-force this by setting up a counter and doing it manually, but I'm sure there must be a more elegant solution to this particular issue. Basically, I need to be able to truncate a DFDL element in ESQL by its total length, including any sub-elements, but I'm not sure how to do this cleanly.
Using the Healthcare Pack, HL7 data is being parsed into an XML-like tree structure, but field length limits in HL7 are set for an entire field, not for individual subfields.
For instance in HL7, a patient identifier field would be represented as a field with subfields being separated by the '^' character:
PID|||123456789^^MOD 10^MRN-HOSP||LAST^FIRST^MIDDLE^^MRS|...
The downstream system in this example only accepts 20 characters so I need to truncate that field to '123456789^^MOD 10^MR' but since the structure is XML in WMB I'm not sure how to do this without copying each subfield individually and adding up their length to truncate it manually. In this case, the field is referenced as InputRoot.DFDL.ADT_A01.PID."PID.3.PatientIdentifierList" and contains up to 12 subfields, some of which contain three or four subfields of their own.
Hopefully this makes sense, but I'm not sure how to proceed from here. |
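To make the target behaviour concrete, here is a minimal sketch in Python (not ESQL) of what the question asks for: join the subfields of one field with the '^' separator, then cut the serialized text to the field's total limit. The function name `truncate_field` and the list representation of the subfields are assumptions made purely for illustration; they are not part of the Healthcare Pack or the broker API.

```python
def truncate_field(components, max_len, sep="^"):
    """Serialize the subfields of one HL7 field and cut to max_len characters.

    The limit applies to the whole field, separators included,
    not to each subfield individually.
    """
    serialized = sep.join(components)
    return serialized[:max_len]


# Subfields of the PID.3 example from the post above (empty second subfield).
pid3 = ["123456789", "", "MOD 10", "MRN-HOSP"]
print(truncate_field(pid3, 20))  # 123456789^^MOD 10^MR
```

The sketch ignores nested sub-subfields, repetitions and HL7 escape sequences, all of which a real implementation would have to serialize before cutting.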
|
|
 |
shanson |
Posted: Thu Aug 25, 2016 2:24 am Post subject: |
|
|
 Partisan
Joined: 17 Oct 2003 Posts: 344 Location: IBM Hursley
|
Is the length limit of 20 for the downstream system applicable to every field, or does the length limit vary per field? |
|
|
 |
TKIY |
Posted: Thu Aug 25, 2016 4:25 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
shanson wrote: |
Is the length limit of 20 for the downstream system applicable to every field, or does the length limit vary per field? |
It's on a field-by-field basis; for instance, Address is 40 and Phone Number is 20. |
|
|
 |
shanson |
Posted: Thu Aug 25, 2016 5:43 am Post subject: |
|
|
 Partisan
Joined: 17 Oct 2003 Posts: 344 Location: IBM Hursley
|
Truncating structures is not supported by DFDL; it's an inherently dangerous thing to do, as you can render the message impossible to re-parse. Truncation of string elements that are defined with a specified length is possible, though. I think you will need 3 variants of the DFDL HL7 schemas to make this work.
a) The schemas as shipped. These are used to serialize the message tree and create the HL7 message in un-truncated form.
b) The schemas modified to remove components and sub-components, so that all fields in all segments are defined as strings. The dfdl:lengthKind of all fields is still 'delimited'. These are used to re-parse the message from a) creating strings in the message tree (eg, "123456789^^MOD 10^MRN-HOSP").
c) The schemas from b) modified so that the dfdl:lengthKind of all fields is 'explicit', with dfdl:length set to the desired length and dfdl:truncateSpecifiedLengthString set to 'yes', for each field. These are used to re-serialize the message tree from b) and will cause DFDL to truncate oversize fields (eg, "123456789^^MOD 10^MR"). |
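The effect of steps b) and c) can be sketched outside DFDL as follows. This is only a Python illustration of the idea, assuming a single already-serialized segment: treat every field as an opaque string, then cut each one to its own limit. The `LIMITS` table, its (segment, position) keys and the function name `truncate_segment` are hypothetical, and the code does not use or emulate the DFDL parser itself.

```python
FIELD_SEP = "|"

# Hypothetical per-field limits, keyed by (segment name, field position).
LIMITS = {("PID", 3): 20}


def truncate_segment(segment, limits):
    """Cut each field of one serialized segment to its configured limit."""
    # Step b): split on the field separator only, so components and
    # sub-components stay inside each field as plain text.
    fields = segment.split(FIELD_SEP)
    name = fields[0]
    out = [name]
    # Step c): re-serialize, truncating any field that has a limit.
    for pos, value in enumerate(fields[1:], start=1):
        max_len = limits.get((name, pos))
        out.append(value if max_len is None else value[:max_len])
    return FIELD_SEP.join(out)


segment = "PID|||123456789^^MOD 10^MRN-HOSP||LAST^FIRST^MIDDLE^^MRS"
print(truncate_segment(segment, LIMITS))
# PID|||123456789^^MOD 10^MR||LAST^FIRST^MIDDLE^^MRS
```

The position arithmetic here does not hold for MSH, where the field separator itself counts as MSH.1, and the sketch makes no attempt to avoid cutting through an escape sequence or a repetition.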
|
|
 |
TKIY |
Posted: Thu Aug 25, 2016 5:48 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
Okay, thank you for the advice. I was thinking there would be a simple way to just put a maximum length on the structure, but I understand why that's not possible. |
|
|
 |
shanson |
Posted: Thu Aug 25, 2016 7:07 am Post subject: |
|
|
 Partisan
Joined: 17 Oct 2003 Posts: 344 Location: IBM Hursley
|
You can make a structure fixed length using dfdl:lengthKind 'explicit', but an attempt to serialize a structure that exceeds the length will give a processing error. You can only truncate fixed-length strings. |
|
|
 |
|
 |
timber |
Posted: Tue Aug 30, 2016 2:03 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
I'm with shanson; it is dangerous to blindly truncate data. It may render the message unparseable. You might be throwing away clinically important information. You might make two identical records look subtly different, thus causing a data mining program to miss the fact that they are identical...
I understand that it's the downstream application that is imposing this constraint. I also understand that the HL7 standard imposes this length limit. But HL7 2.x standards are rarely followed to the letter by any organisation. I think this length-limit rule is frankly bizarre, and the ideal solution would be to remove all length constraints on complex elements that contain delimited data. |
|
|
 |
TKIY |
Posted: Tue Aug 30, 2016 4:52 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
timber wrote: |
I'm with shanson; it is dangerous to blindly truncate data. It may render the message unparseable. You might be throwing away clinically important information. You might make two identical records look subtly different, thus causing a data mining program to miss the fact that they are identical...
I understand that it's the downstream application that is imposing this constraint. I also understand that the HL7 standard imposes this length limit. But HL7 2.x standards are rarely followed to the letter by any organisation. I think this length-limit rule is frankly bizarre, and the ideal solution would be to remove all length constraints on complex elements that contain delimited data. |
I get the risks, but as you said it's the downstream system that is setting the limit.
I've been doing HL7 integration for the better part of 15 years now and this is hardly the first time I've seen a completely arbitrary data requirement imposed by a vendor, but we have only recently migrated to WMB, so our team here is currently learning about its capabilities and limitations. I'm sure I'll have to accommodate some other odd vendor request in the very near future. |
|
|
 |
|