akil |
Posted: Mon Jun 30, 2014 2:14 am Post subject: Rookie Question: Modelling System Generated Reports |
|
|
 Partisan
Joined: 27 May 2014 Posts: 338 Location: Mumbai
|
Hi
I know how to model a simple pipe-separated file with one header and one footer.
I now have to model a system-generated report. It has 10 header lines (report name, generation date, and so on), followed by a couple of blank lines, then the column headers (separated by pipes), then the detail lines, followed by a couple of blank lines and a single total summary at the end.
How do I do that? (1) Is there a way to model this in DFDL, or (2) do I remove some lines via ESQL and model just the actual report?
Version : 9.0.0.0.
Sample:
Bank Name MYBANK
Report Name SYSTEM REPORT
Branch Name MUMBAI
Zone Set Id 1
Zone Date 26-06-2014
Zone Code CODE1
Zone Description MUMBAI ZONE
Generation date 26-06-2014 08:44:00
Account Sol ID|Account Number|Account Name|Scheme code|Segment|Sub Segment|Cheque number|Cheque amount|Exception code and Description|RM Code|
0012|1231312313 |QUANTIGUOUS SOLUTIONS |QG|CORP |SME |3 | 18,00,000.00|Error : Insufficient Avail Bal Excp | 1233|
**************** SUMMARY ***************
The Total No. of Instruments : 1
The Total Instrument Amount : 18,00,000.00
*********** END OF SUMMARY ***************
Signature _________________ Regards |
kimbert |
Posted: Mon Jun 30, 2014 5:54 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Yes, DFDL can parse almost any format. You will need to model this manually - there is no wizard to help you with a custom format.
Should be easy enough - just treat them as normal delimited records. Use %NL; to model the newline character(s) unless you are certain about which style of newline you will get.
Quote: |
followed by a couple of blank lines |
That's what %WSP*; is for.
If some of the lines are optional then you may need to guide the DFDL parser ( using initiators / discriminators ) in ways that you did not initially expect. That is normal and expected. When in doubt, be very explicit about what is expected. You will be rewarded with better error messages from the DFDL parser. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
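To make the record structure concrete, here is a minimal Python sketch (not DFDL, and not part of the original advice) that classifies the lines of such a report into the sections a model would need to describe. The function name `split_report` and the rule "the first line containing a pipe is the column header" are assumptions made for illustration only.

```python
def split_report(lines):
    """Classify report lines into header, column names, details and summary.

    Assumption: the first line containing '|' is the column-header row,
    later '|' lines are detail records, and any non-blank line after the
    details belongs to the summary.
    """
    sections = {"header": [], "columns": [], "details": [], "summary": []}
    state = "header"
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate the sections
        if "|" in line:
            # Drop the trailing pipe, then trim the padding around each field.
            fields = [f.strip() for f in line.rstrip("|").split("|")]
            if state == "header":
                sections["columns"] = fields
                state = "details"
            else:
                sections["details"].append(fields)
        elif state in ("details", "summary"):
            state = "summary"
            sections["summary"].append(line)
        else:
            sections["header"].append(line)
    return sections
```

In a DFDL model the same decisions would be expressed declaratively, with initiators or discriminators taking the place of the `if` tests.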
akil |
Posted: Tue Jul 01, 2014 1:01 am Post subject: |
|
|
 Partisan
Joined: 27 May 2014 Posts: 338 Location: Mumbai
|
Hi
I am building a DFDL model using choice/discriminators as you suggested. I am stuck at one point: the report contains form feed characters (x12) after every 22-odd lines. This does not stop the parsing, but I get an error that the logical instance cannot be displayed, because 'character reference &x12 in an invalid XML character'.
The part that triggers this is the following (see the first character of the second line):
..
..
0091|1111 |CUST 1 |SBPSA|MASSMKT |ASPIRINGAFFLUE|9 | 55,650.00|Error : Insufficient Avail Bal Excp | 3464|
^L0108|22222 |CUST 2 |CAGEN|AFFLUENT |ASPIRINGAFFLUE|1204 | 55,422.00|Error : Insufficient Avail Bal Excp | 2917|
0108|33333 |CUST 3 |CAGEN|AFFLUENT |ASPIRINGAFFLUE|1181 | 55,266.00|Error : Insufficient Avail Bal Excp | 2917|
...
..
Is there a way to ignore or skip certain characters during the parse stage in DFDL?
What else can I do?
kimbert |
Posted: Tue Jul 01, 2014 2:08 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
the logical instance cannot be displayed, because 'character reference &x12 in an invalid XML character' |
Sounds as if you have encountered a defect in the logical instance view. It is true that the character 0x12 is not allowed in an XML document. But the DFDL specification allows all Unicode characters. So it should be safe to ignore the error...
...unless your message flow is going to create an XML document. You should not create an XML document containing the 0x12 character. IIB will not prevent you from doing it but the downstream application should, and almost certainly will, reject it as badly-formed.
Having said all of that, I think you need to be clearer about the purpose of these 0x12 characters. Why are they present in the file? Can you model them as initiators (which would mean that they would not appear in the data)? Are they line-continuation characters for extra-long records?
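The point about XML can be checked mechanically. The sketch below encodes the `Char` production from the XML 1.0 specification and reports any character that an XML 1.0 document cannot contain. The helper names are hypothetical and it is only an illustration, not something IIB provides.

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if the code point matches the XML 1.0 `Char` production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """List (offset, hex code) for every character XML 1.0 would reject."""
    return [
        (offset, f"0x{ord(ch):02X}")
        for offset, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]
```

Both 0x12 and form feed (0x0C) fail this check, so either one would have to be removed or modelled away before the data is serialised as XML.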
akil |
Posted: Tue Jul 01, 2014 11:06 am Post subject: |
|
|
 Partisan
Joined: 27 May 2014 Posts: 338 Location: Mumbai
|
Hi
The file that I am reading is a report generated by Finacle, a core banking application from Infosys. My guess is that these form feed characters are inserted by the application to help align paper pages on legacy printers. The characters repeat after every 22 lines.
I was not able to use them as initiators since not every line contains one. (I suppose initiators aren't optional, and if specified need to be there for all records.)
Once the rows are parsed, I will be using the Graphical Data Mapper to enrich each record by appending additional data from database selects, and the output message (which is a CSV modelled in DFDL) will then be saved as a file for further processing by another system (IBM BPM in this case). I have to remove these characters from the output message to avoid downstream errors.
Should I use a Compute node to remove these characters, or should I use the GDM XPath functions to remove them?
mqjeff |
Posted: Tue Jul 01, 2014 11:10 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Make them an optional part of the line terminator?
kimbert |
Posted: Tue Jul 01, 2014 12:27 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I suppose initiators aren't optional, and if specified need to be there for all records |
Yes...sort of. But there is a lot of flexibility built into the DFDL language. For instance, you can specify more than one initiator, and it can contain optional white space.
Quote: |
Once the rows are parsed, I will be using the Graphical Data Mapper to enrich this record by appending additional data from database selects, and the output message (which is a CSV modelled in DFDL) will then be saved as a file for further processing by another system |
So far, so good. The message flow is using each component for its assigned task. The flow does not know or care about the physical format of the data before it entered the flow.
Quote: |
Should I use an Compute node to remove these characters or should I use the GDM Xpath functions to remove these characters? |
No. Using a Compute node/GDM to remove the offending characters is effectively putting knowledge about the physical format into the message flow. So best avoided if possible.
The solution is similar to mqjeff's suggestion, and it exploits the flexibility of DFDL. Some lines end with a newline. Other lines end with a newline and 0x12. So model it that way. The DFDL 'terminator' property is a list of terminators, and DFDL will always take the longest match. Please refer to the DFDL specification for syntax details
https://redmine.ogf.org/dmsf_files/13115?download=
You will find that section 6.3.1 is good background reading, and 6.3.1.2 contains the information that you need. Looks as if your character is %DC2; in DFDL.
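The longest-match rule can be illustrated outside DFDL with a short Python sketch. It assumes the two terminators discussed here (a newline, and a newline followed by 0x12). The function name `split_records` is hypothetical, and a real DFDL parser does considerably more than this.

```python
def split_records(data: str, terminators: list[str]) -> list[str]:
    """Split data into records, preferring the longest matching terminator."""
    # Try longer terminators first so "\r\n\x12" wins over plain "\r\n".
    ordered = sorted((t for t in terminators if t), key=len, reverse=True)
    records = []
    start = 0
    pos = 0
    while pos < len(data):
        match = next((t for t in ordered if data.startswith(t, pos)), None)
        if match is None:
            pos += 1
            continue
        records.append(data[start:pos])
        pos += len(match)
        start = pos
    if start < len(data):
        records.append(data[start:])  # trailing record without a terminator
    return records
```

Because the longer alternative is consumed as part of the terminator, the control character never reaches the logical data, which is why the message flow does not need to know about it.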
akil |
Posted: Wed Jul 02, 2014 2:02 am Post subject: |
|
|
 Partisan
Joined: 27 May 2014 Posts: 338 Location: Mumbai
|
Thank you.
I followed your advice, and kept the message flow clean of the physical representation and modelled this in DFDL.
While I was not able to make the terminator work (I'll figure that out soon), I converted the first element from a simpleType to a two-element choice. The first element has the form feed character as its initiator, its length is set to zero, and it is optional. The second element is a simple string. When the form feed is present, the parser matches it to the optional element, and the zero length ensures that it is not part of the logical view. This works pretty well.
kimbert |
Posted: Wed Jul 02, 2014 2:34 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
The terminator property is a space-separated list of alternatives. So you need to set it to
Code: |
%CR;%LF; %CR;%LF;%DC2; |
or the equivalent:
Code: |
%CR;%LF; %CR;%LF;%#x12; |
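As a rough illustration of how such a value reads, the Python sketch below splits a terminator list on whitespace and expands a small, hand-picked subset of DFDL character entities. The names `expand` and `parse_terminator_list` are hypothetical, and character class entities such as %NL; or %WSP*; are deliberately left untouched. One thing worth checking against the actual file: the `^L` shown earlier in the thread is the usual caret notation for form feed, which is U+000C (`%FF;` in DFDL), whereas `%DC2;` and `%#x12;` both denote U+0012. Decimal 12 and hexadecimal 0x12 are different characters.

```python
import re

# Hand-picked subset of DFDL named character entities (assumption: only
# the ones needed for this thread are listed).
NAMED = {"CR": "\r", "LF": "\n", "FF": "\x0c", "DC2": "\x12"}

# Matches %#xHH; (hex code), %#DDD; (decimal code) or %NAME; (named entity).
_ENTITY = re.compile(r"%(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Z0-9]+));")


def expand(literal: str) -> str:
    """Replace the supported entities in one literal with real characters."""
    def repl(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            return chr(int(hex_digits, 16))
        if dec_digits:
            return chr(int(dec_digits))
        # Unknown names (for example NL) are left as written.
        return NAMED.get(name, match.group(0))
    return _ENTITY.sub(repl, literal)


def parse_terminator_list(value: str) -> list[str]:
    """Split a space-separated terminator list and expand each alternative."""
    return [expand(item) for item in value.split()]
```

Both spellings quoted above expand to the same two alternatives, a bare CR LF and CR LF followed by U+0012. If the file really contains form feeds, the second alternative would be written with `%FF;` instead.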
akil |
Posted: Wed Jul 02, 2014 9:41 am Post subject: |
|
|
 Partisan
Joined: 27 May 2014 Posts: 338 Location: Mumbai
|
Hi
Got it, thank you.
I missed it in the documentation. I have to focus harder and read it more than once. Thank you for your help.
Regards