GARNOLD5551212
Posted: Tue Jul 30, 2013 7:06 am    Post subject: DFDL Graphical Mapping multi Records to same XML subsegment
Novice
Joined: 23 Jul 2013    Posts: 13

I have a large file of records as input, defined with DFDL, and I want to convert it to a single XML output document. What I need help understanding is how to roll up several records into one sub-loop of the XML schema. I am mapping the key to the Message Assembly->LocalEnvironment->Variables section, so I know when I hit the point where I need to start a new higher-level section, but I don't know how to set the cardinality so that I stay at the same level and don't close out my tags when writing to the file.
Example:
Input.
[Group 1][IDField=1][DATA-1]
[Group 1][IDField=1][DATA-2]
[Group 1][IDField=1][DATA-3]
[Group 1][IDField=2][DATA-1]
[Group 1][IDField=2][DATA-2]
Desired Output
<Group ID=11>
<ID Value=1>
<Data Value=1/>
<Data Value=2/>
<Data Value=3/>
</ID>
<ID Value=2>
<Data Value=1/>
<Data Value=2/>
</ID>
</Group>
I'm coming from a TIBCO background, went through the one-week IBM training, and have been creating flows with the IBM graphical maps for about 8 weeks. I was able to solve this problem for one flow by mapping just the ID section and appending it to a file, then creating a map that wrote a hard-coded XML prolog/declaration and opening <Group> tag, appended the sub-section file, and finally appended a hard-coded closing </Group> string.
In this instance I need to change values in the opening <Group> tag's attributes, and keep a pointer into my output so that I don't close out the segments until I detect a change in the ID key. I hope that is clear. This seems like it would be a common pattern, and I'm hoping it is as simple as mapping to the Message Assembly, the way you would to create multiple XML documents out of one, but this is the opposite.

mqjeff
Posted: Tue Jul 30, 2013 7:27 am    Post subject:
Grand Master
Joined: 25 Jun 2008    Posts: 17447

Create a for loop over the IDField structure.
In the submap that loop calls, create a for loop over the Data fields.

GARNOLD5551212
Posted: Tue Jul 30, 2013 10:10 am    Post subject:
Novice
Joined: 23 Jul 2013    Posts: 13

You cannot put a For Each at the Data level, because on the left side of the map the DFDL Data element is [1..1]. Each record in the file is processed one at a time by the flow that reads the file, so I only have the current record to interrogate, plus the last IDField processed, to give me something to create a break on. Since this is all done in the context of one flow, how can I direct the output to close the current Data loop and ID, and start a new ID, when I find a new value in IDField?
One record per line = ([Group 1][IDField=1][DATA-1])
The right side XMLNSC Data is defined as [1..*].
The file is so large that I cannot read the entire file into the tree, do a For Each on the ID as the primary input, and then loop through the matching Data as supplementary mapped data.
So the question is: when reading the file one record at a time, how do I combine records under the matching ID level on the right-side XML map?

dogorsy
Posted: Tue Jul 30, 2013 10:30 am    Post subject:
Knight
Joined: 13 Mar 2013    Posts: 553    Location: Home Office

Are you trying to do it using a Mapping node? If so, that is the wrong approach. Use a Compute node.

GARNOLD5551212
Posted: Tue Jul 30, 2013 11:23 am    Post subject:
Novice
Joined: 23 Jul 2013    Posts: 13

That was what I was trying to avoid. I was pretty sure it could be coded in ESQL or a Java Compute node, but I wanted to stay with a purely graphical mapping if possible.

mqjeff
Posted: Tue Jul 30, 2013 11:40 am    Post subject:
Grand Master
Joined: 25 Jun 2008    Posts: 17447

If each record is a new invocation of the flow, you need to store the "current" group id in the global cache, or stick it in a database or similar where you can read it back, so you can check whether the next record belongs to the same group or not.
You can't access the global cache from a Mapping node without writing ESQL or a Java Compute node.
This is really a collection pattern. You should consider using a Collector node to assemble the records that belong to each group and then output each group at once.
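
As a sketch of the database option only (the one-row table GROUP_STATE(LAST_ID) and the DFDL element names Record and IDField are made up for illustration; the Compute node's Compute mode would need to include LocalEnvironment):

Code:
CREATE COMPUTE MODULE CheckGroupBreak_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        DECLARE thisId CHARACTER InputRoot.DFDL.Record.IDField;
        -- Read back whatever group id the previous invocation stored
        DECLARE lastId CHARACTER;
        SET lastId = THE(SELECT ITEM S.LAST_ID FROM Database.GROUP_STATE AS S);

        IF lastId IS NULL THEN
            INSERT INTO Database.GROUP_STATE (LAST_ID) VALUES (thisId);
        ELSEIF thisId <> lastId THEN
            UPDATE Database.GROUP_STATE AS S SET LAST_ID = thisId;
            -- Flag the break so downstream nodes can close the previous group first
            SET OutputLocalEnvironment.Variables.NewGroup = TRUE;
        END IF;

        SET OutputRoot = InputRoot;
        RETURN TRUE;
    END;
END MODULE;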

dogorsy
Posted: Tue Jul 30, 2013 9:44 pm    Post subject:
Knight
Joined: 13 Mar 2013    Posts: 553    Location: Home Office

mqjeff wrote:
If each record is a new invocation of the flow, you need to store the "current" group id in the global cache, or stick it in a database or similar where you can read it back, so you can check whether the next record belongs to the same group or not.
You can't access the global cache from a Mapping node without writing ESQL or a Java Compute node.
This is really a collection pattern. You should consider using a Collector node to assemble the records that belong to each group and then output each group at once.

Agree, but why use a Collector node when the whole file can be read (rather than a record at a time)? Then use a Compute node to loop through the records and create the XML output. While looping, the consumed input records can be deleted to free up memory. But you may be right if several output records are required (i.e. one per group).
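
A rough ESQL sketch of that whole-file approach, assuming a message root named File with repeating Record elements carrying GroupId, IDField and Data fields (names made up for illustration); the input is copied to Environment so that each record can be deleted once it has been consumed:

Code:
CREATE COMPUTE MODULE BuildGroups_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        -- Work on a modifiable copy so consumed records can be deleted as we go
        SET Environment.Variables.File = InputRoot.DFDL.File;

        DECLARE rec    REFERENCE TO Environment.Variables.File.Record[1];
        DECLARE done   REFERENCE TO rec;
        DECLARE outId  REFERENCE TO OutputRoot;
        DECLARE lastId CHARACTER '';

        SET OutputRoot.XMLNSC.Group.(XMLNSC.Attribute)ID = rec.GroupId;

        WHILE LASTMOVE(rec) DO
            IF rec.IDField <> lastId THEN
                -- This record starts a new group: open a new <ID> element
                CREATE LASTCHILD OF OutputRoot.XMLNSC.Group AS outId NAME 'ID';
                SET outId.(XMLNSC.Attribute)Value = rec.IDField;
                SET lastId = rec.IDField;
            END IF;
            CREATE LASTCHILD OF outId NAME 'Data';
            SET outId.Data[<].(XMLNSC.Attribute)Value = rec.Data;

            -- Delete the consumed record before moving on, to cap memory use
            MOVE done TO rec;
            MOVE rec NEXTSIBLING REPEAT TYPE NAME;
            DELETE FIELD done;
        END WHILE;
        RETURN TRUE;
    END;
END MODULE;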

kimbert
Posted: Wed Jul 31, 2013 12:58 am    Post subject:
Jedi Council
Joined: 29 Jul 2003    Posts: 5542    Location: Southampton

Quote:
why use a collector node when the whole file can be read (rather than a record at a time).

Because the file might get very large?
If it were me, I would write a few lines of ESQL to do this. The current group id could be stored in a SHARED ROW variable. It's not graphical mapping, but it's not many lines of code either.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
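
A minimal ESQL sketch of that idea for a record-at-a-time flow, with assumed element names (Record, IDField); a scalar SHARED variable is enough to hold just the group id, and the next reply explains where the ROW comes in:

Code:
CREATE COMPUTE MODULE TrackGroup_Compute
    -- Module-level SHARED variables keep their value across invocations of the flow
    -- (per execution group; wrap access in BEGIN ATOMIC ... END if additional instances are used)
    DECLARE lastGroupId SHARED CHARACTER;

    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        DECLARE thisId CHARACTER InputRoot.DFDL.Record.IDField;
        IF lastGroupId IS NULL OR thisId <> lastGroupId THEN
            -- New group detected: flag it for the rest of the flow
            SET OutputLocalEnvironment.Variables.NewGroup = TRUE;
            SET lastGroupId = thisId;
        END IF;
        SET OutputRoot = InputRoot;
        RETURN TRUE;
    END;
END MODULE;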

dogorsy
Posted: Wed Jul 31, 2013 1:31 am    Post subject:
Knight
Joined: 13 Mar 2013    Posts: 553    Location: Home Office

kimbert wrote:
Quote:
why use a collector node when the whole file can be read (rather than a record at a time).

Because the file might get very large?
If it were me, I would write a few lines of ESQL to do this. The current group id could be stored in a SHARED ROW variable. It's not graphical mapping, but it's not many lines of code either.

Yes, agree. But it is not only the current group id that needs to be stored in a shared variable; the output XML needs to be as well, until the group is complete. (I know that is what you meant, Tim; that's why you said ROW, but just clarifying.)
Having said that, if it is known that the file size will not be large, then life would be a lot easier by reading the whole file.
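
Building on the sketch above, the group's content can be collected in a SHARED ROW and only propagated when the id changes. Element names are still assumptions, and the final group would still have to be flushed at end of file (for example off the FileInput node's End of Data terminal):

Code:
CREATE COMPUTE MODULE AccumulateGroup_Compute
    DECLARE lastGroupId SHARED CHARACTER;
    DECLARE pending     SHARED ROW;

    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        DECLARE thisId CHARACTER InputRoot.DFDL.Record.IDField;
        DECLARE i INTEGER 1;

        IF lastGroupId IS NOT NULL AND thisId <> lastGroupId THEN
            -- Group boundary: turn everything collected so far into one <ID> element
            SET OutputRoot.Properties = InputRoot.Properties;
            SET OutputRoot.XMLNSC.ID.(XMLNSC.Attribute)Value = lastGroupId;
            WHILE i <= CARDINALITY(pending.Items.Item[]) DO
                SET OutputRoot.XMLNSC.ID.Data[i].(XMLNSC.Attribute)Value = pending.Items.Item[i];
                SET i = i + 1;
            END WHILE;
            PROPAGATE TO TERMINAL 'out';
            DELETE FIELD pending.Items;
        END IF;

        SET lastGroupId = thisId;
        SET pending.Items.Item[CARDINALITY(pending.Items.Item[]) + 1] = InputRoot.DFDL.Record.Data;

        -- Nothing is emitted until a boundary is seen, so suppress the automatic propagate
        RETURN FALSE;
    END;
END MODULE;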

Vitor
Posted: Wed Jul 31, 2013 4:45 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA

dogorsy wrote:
Having said that, if it is known that the file size will not be large, then life would be a lot easier by reading the whole file.

I have a great suspicion of things which are "known". In a lot of instances what is "known" is not in fact true a few years later (a file which is "never" more than 10MB grows to 100MB a year later as the business evolves), or is not in fact known at all.
Case in point: one of my customers developed a flow which read the whole file for exactly the reason given here; they needed to group records & it was "known" that the file in question never contained more than 30 or so business accounts, with never more than 50 transactions per account per day. They got through 3 months of QA & business testing & went live with much fanfare, then blew out my broker processing a file the day after. Investigation quickly revealed the broker had run out of memory trying to swallow their file whole; the file was in fact huge & one account in it had 7,500 transactions. When the business area was queried, and reminded that they'd been certain about the 50-transaction limit, they replied:
"Oh yes, never more than 50. Apart from 5 or 6 accounts we have to handle manually because they have thousands of transactions. No, the people you were speaking to probably didn't know about them because they always go through the exception process. We're really glad your new system is in place because those accounts are a real pain to deal with."
_________________
Honesty is the best policy.
Insanity is the best defence.

dogorsy
Posted: Wed Jul 31, 2013 7:32 am    Post subject:
Knight
Joined: 13 Mar 2013    Posts: 553    Location: Home Office

Vitor wrote:
dogorsy wrote:
Having said that, if it is known that the file size will not be large, then life would be a lot easier by reading the whole file.

I have a great suspicion of things which are "known". [...]
Nice one!

mqjeff
Posted: Wed Jul 31, 2013 7:35 am    Post subject:
Grand Master
Joined: 25 Jun 2008    Posts: 17447

dogorsy wrote:
Vitor wrote:
I have a great suspicion of things which are "known". [...]

Nice one!
It's not actually nice. It happens *all of the time* out in the real world, rather than sitting in a lab somewhere in the south of Blighty.
"Requirements? Yes, we should have some of those. We need you to go live in two weeks, so keep building the software."

dogorsy
Posted: Wed Jul 31, 2013 7:45 am    Post subject:
Knight
Joined: 13 Mar 2013    Posts: 553    Location: Home Office

mqjeff wrote:
[...]
It's not actually nice. It happens *all of the time* out in the real world, rather than sitting in a lab somewhere in the south of Blighty.
"Requirements? Yes, we should have some of those. We need you to go live in two weeks, so keep building the software."
Sorry, I was being sarcastic.

Vitor
Posted: Wed Jul 31, 2013 8:00 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA

dogorsy wrote:
Sorry, I was being sarcastic.

We need a better emoticon for that. A lot of us could make use of it.
_________________
Honesty is the best policy.
Insanity is the best defence.

GARNOLD5551212
Posted: Thu Oct 17, 2013 12:18 pm    Post subject:
Novice
Joined: 23 Jul 2013    Posts: 13

Just to close the loop: my solution for this large-file processing was a mix of a Java Compute node and maps to temporary files. I mapped the ID field from each record, as it was read, to LocalEnvironment->Variables.
I then used a Java Compute node to store the current ID in a class static variable and detect when it changed. That let me use a Route node to decide whether I needed to close the current sub-group and append it to my final output file. This let me group the IDs and only write about 10-15 lines of Java. The flow used a total of 20 out-of-the-box nodes and one Java Compute node. Memory use has stayed very low.
Thanks to all who commented.