anild (Novice, Joined: 19 Sep 2007, Posts: 13)
Posted: Tue Apr 07, 2009 11:14 pm
Post subject: creating xml messages with data from multiple files
I have data in two different files and need to construct XML messages from them. The scenario is similar to constructing an XML message from two database tables. We can't use buffers because the files are large (e.g. 50MB each).
Can anyone suggest how to handle this kind of scenario?
Thanks in advance.
Vitor (Grand High Poobah, Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Tue Apr 07, 2009 11:54 pm
Post subject: Re: creating xml messages with data from multiple files
anild wrote:
    we can't use buffers because the file sizes are large (ex: files size 50MB).

That's not a big file.
_________________
Honesty is the best policy.
Insanity is the best defence.
anild (Novice, Joined: 19 Sep 2007, Posts: 13)
Posted: Wed Apr 08, 2009 1:23 am
Post subject: Re: creating xml messages with data from multiple files
Vitor wrote:
    anild wrote:
        we can't use buffers because the file sizes are large (ex: files size 50MB).
    That's not a big file.

Invoking the MRM parser will take a very long time, because each file contains around 900,000+ records; storing them in system memory (setting them into an environment variable) consumes a great deal of memory.
The scenario is like this:
we receive 2 .txt files via FTP, each around 50MB with 900,000+ rows per file.
We need to construct XML messages from the 2 files, combining m*n data rows into individual XML messages.
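[The m*n requirement described above is a cartesian product of the two files' rows. A minimal sketch (Python used purely as illustration; the actual platform is WMB, and record formats are hypothetical one-line-per-record text) shows how the pairs can be generated one at a time instead of being built up in memory:]

```python
def stream_pairs(path_a, path_b):
    """Yield one (row_a, row_b) pair at a time instead of
    materialising all m*n combinations in memory."""
    with open(path_a, encoding="utf-8") as fa:
        for row_a in fa:
            # Re-scan file B for each row of A, so only two records
            # are ever held in memory at once.
            with open(path_b, encoding="utf-8") as fb:
                for row_b in fb:
                    yield row_a.rstrip("\n"), row_b.rstrip("\n")
```

With two 900,000-row inputs this still yields an enormous number of pairs, which is why the replies below question the cartesian product itself; the generator only avoids holding them all at once.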
WMBDEV1 (Sentinel, Joined: 05 Mar 2009, Posts: 888, Location: UK)
Posted: Wed Apr 08, 2009 1:25 am
Post subject: Re: creating xml messages with data from multiple files
Vitor (Grand High Poobah, Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Wed Apr 08, 2009 1:33 am
Post subject: Re: creating xml messages with data from multiple files
anild wrote:
    storing into system memory (setting into environment variable) its very system memory consuming.

Is there no possibility that you could construct an interim message from File A (parsing MRM -> XML), then augment that message with the contents of File B?
WMBDEV1 (Sentinel, Joined: 05 Mar 2009, Posts: 888, Location: UK)
Posted: Wed Apr 08, 2009 1:33 am
Post subject: Re: creating xml messages with data from multiple files
anild wrote:
    The scenario will be like this:
    we are receiving 2 .txt files from FTP, each file size around 50MB and around 900,000+ rows in each file.
    We need to construct xml message using 2 files with m*n data rows into individual xml message.

Saw this after my other post.
This will be difficult. A quick sum of 50 * 50 (although, as the input is not XML, this number could be even bigger) gives you an output message/file of 2.5GB, and that doesn't include the memory required by the parsers. I think you're going to need some sort of streaming solution, especially on the output of the data, and use the link before for help.
Why does the output require a cartesian product? That sounds inefficient to me. Can it be changed or broken down into smaller chunks?
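[The quick sum above can be written out explicitly, along with the raw pair count it glosses over. Figures are taken from the thread and are purely illustrative back-of-envelope arithmetic:]

```python
# The post's quick sum: two 50MB inputs combined m*n-wise give an
# output on the order of 50 * 50 = 2500MB.
quick_estimate_gb = 50 * 50 / 1000  # 2.5

# The pair count itself is far larger: with ~900,000 rows per file,
# a full cartesian product yields 8.1e11 row pairs, which is why the
# post warns the real number "could be even bigger".
rows_per_file = 900_000
pair_count = rows_per_file * rows_per_file  # 810,000,000,000
```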
Vitor (Grand High Poobah, Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Wed Apr 08, 2009 1:36 am
You could also cheat a little: use a feeder flow to put both files on queues, read (and parse) File A from its queue, use an MQGet to read (and parse) File B from its queue, then build the output message from the 2 trees.
This would leave the door open for the files to be processed on a record-by-record basis later. Which the users may one day want, and you might actually need!
Just a thought, untested, willing to be shot at by anyone on this; no liability accepted for loss, damage or headaches resulting.
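[The feeder-flow idea above can be sketched as plain code. This is a Python stand-in for the WMB plumbing (the real flow would use FileInput, MQOutput and MQGet nodes), pairing one record from each queue per output message so the record-by-record option stays open; the record and field names are invented for the sketch:]

```python
from queue import Queue

def merge_from_queues(queue_a, queue_b):
    """Stand-in for the feeder-flow pattern: File A's records arrive
    on one queue, File B's on another (the MQGet step), and the flow
    combines one record from each into an output message."""
    merged = []
    while not queue_a.empty() and not queue_b.empty():
        rec_a = queue_a.get()  # "read (and parse) File A from its queue"
        rec_b = queue_b.get()  # "use an MQGet to read File B from its queue"
        merged.append({"a": rec_a, "b": rec_b})  # manipulate the 2 trees
    return merged
```

The 1:1 pairing here is just the simplest case; the same shape extends to whatever combination logic the flow actually needs.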
WMBDEV1 (Sentinel, Joined: 05 Mar 2009, Posts: 888, Location: UK)
Posted: Wed Apr 08, 2009 1:44 am
Vitor wrote:
    You could also cheat a little, use a feeder flow to put both files in queues, read (and parse) File A from its queue, use an MQGet to read (and parse) File B from its queue then manipulate the output message from the 2 trees.
    This would leave the door open for the files to be processed on a record by record basis later.

Sounds doable, but you're still going to need to propagate chunks of the output a bit at a time, or else you will exhaust the heap.
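[The point about propagating chunks of output can be illustrated with a streaming writer: the document goes to disk one record element at a time, so the full multi-gigabyte output never sits in memory. Python again stands in for the WMB flow, and the element names are hypothetical placeholders:]

```python
from xml.sax.saxutils import escape

def write_pairs_as_xml(pairs, out_path, root="records"):
    """Stream the output XML document to disk one record element at a
    time, so the complete file never has to be held in memory."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f"<{root}>\n")
        for row_a, row_b in pairs:
            # Escape each field and flush one small element per pair.
            out.write(
                f"  <record><a>{escape(row_a)}</a>"
                f"<b>{escape(row_b)}</b></record>\n"
            )
        out.write(f"</{root}>\n")
```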
Vitor (Grand High Poobah, Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Wed Apr 08, 2009 1:52 am
WMBDEV1 wrote:
    Vitor wrote:
        You could also cheat a little, use a feeder flow to put both files in queues, read (and parse) File A from its queue, use an MQGet to read (and parse) File B from its queue then manipulate the output message from the 2 trees.
        This would leave the door open for the files to be processed on a record by record basis later.
    Sounds doable but you're still going to need to propagate chunks of the output a bit at a time else you will exhaust the heap.

Even if you're not using Java nodes?
There's a central design issue here in that there are 2 files that apparently need to be processed simultaneously in memory, hence my question about augmentation.
There's an even more central question around why a site with WMB/WMQ is processing files, not messages, and in large chunks to boot.
This ties back to your point that, irrespective of memory, you'd be "better" to take the files and propagate them into individual messages for downstream processing. I sense the spectre of affinity rising up, where the records need to be combined and processed in order...
WMBDEV1 (Sentinel, Joined: 05 Mar 2009, Posts: 888, Location: UK)
Posted: Wed Apr 08, 2009 2:01 am
Vitor wrote:
    Even if you're not using Java nodes?

Absolutely; you're really going to struggle to allocate the 2.5GB (estimated) output without streaming bits of it out.

Vitor wrote:
    There's a central design issue here in that there are 2 files that apparently need to be processed simultaneously in memory, hence my question about augmentation.

Sure, it's not nice, but this is probably not the hardest thing to overcome in this case.

Quote:
    There's an even more central question around why a site with WMB/WMQ is processing files not messages, and in large chunks to boot.

Not one I can answer.

Quote:
    This ties back to your point that irrespective of memory you'd be "better" to take the files and propagate them into individual messages for downstream processing. I sense the spectre of affinity rising up, where the records need to be combined and processed in order...

Agree again; message segmentation or grouping may help.
The big issue for me remains the large size of the output, though.
Vitor (Grand High Poobah, Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Wed Apr 08, 2009 2:07 am
WMBDEV1 wrote:
    Vitor wrote:
        Even if you're not using Java nodes?
    Absolutely, you're really going to struggle to allocate the 2.5GB (estimated) output without streaming bits of it out.

I didn't think so, and would welcome input from others here.

WMBDEV1 wrote:
    Vitor wrote:
        There's a central design issue here in that there are 2 files that apparently need to be processed simultaneously in memory, hence my question about augmentation.
    Sure, it's not nice but this is probably not the hardest thing to overcome in this case.

No, but it's a question the poster would find value in thinking about.

WMBDEV1 wrote:
    Quote:
        There's an even more central question around why a site with WMB/WMQ is processing files not messages, and in large chunks to boot.
    Not one I can answer.

Nor me - I was being rhetorical! Also trying to provoke thought in the poster.

WMBDEV1 wrote:
    Quote:
        This ties back to your point that irrespective of memory you'd be "better" to take the files and propagate them into individual messages for downstream processing. I sense the spectre of affinity rising up, where the records need to be combined and processed in order...
    Agree again, message segmentation may help.
    The big issue for me remains the large size of the output though.

I remain unconvinced it's that much of an issue, but I still agree with your point that propagation is the better way forward. Again, the poster needs to think about this, especially how to break the affinity I suspect exists.
mqpaul (Acolyte, Joined: 14 Jan 2008, Posts: 66, Location: Hursley, UK)
Posted: Wed Apr 08, 2009 5:03 am
Post subject: How about a good old two-file match?
This may be well off the mark, so my apologies in advance if it's useless. Also very sorry if this is teaching grandmother to suck eggs.
I'm assuming you're not using the Broker V6.1 file nodes. You might be able to use one file node to start your flow and process one record at a time, but I can't think of a way you could have two file nodes in the same flow, as Broker only provides a FileInput node, not a FileRead node. So I guess you're using Java.
One approach already mentioned is to copy one or both files' records to queues and then merge them, but for the purposes of this response that's just changing the plumbing for reading records.
I presume the application is similar in structure to a ledger update, with a master file holding account records and a transaction file holding 0 or more records for each account, possibly including transactions for new accounts. (Substitute other key fields for "account" if your application is not financial.)
The trick is always to sort the input (I don't have suggestions on how to do that from Broker) into account number/transaction date sequence. Then you read one record from each file. While both records have the same account number, apply the update (e.g. increment the ledger balance by the transaction amount) and read the next transaction record. When the ledger account number is lower, emit the ledger record you just processed and read the next ledger record. If the transaction account number is lower, you have a new account, so build a new ledger record and repeat the matching process for that account's transactions. The nice bit is you only need storage for three records: the ledger record you're working on, the next one you've read, and the transaction record you're working on.
You can find lots of stuff about this in old Data Processing textbooks - in particular Michael Jackson (not the moonwalker) and Structured Programming.
_________________
Paul
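[The classic two-file match described above can be sketched directly. This is a minimal Python illustration (not Broker code) assuming both inputs are already sorted by account key and records are hypothetical (account, amount) tuples; only the current record from each file is ever needed:]

```python
def two_file_match(ledger_rows, txn_rows):
    """Sequential master/transaction match over two inputs sorted by
    account key. Ledger rows are (account, balance); transaction rows
    are (account, amount)."""
    out = []
    ledgers = iter(ledger_rows)
    txns = iter(txn_rows)
    led = next(ledgers, None)
    txn = next(txns, None)
    while led is not None or txn is not None:
        if txn is None or (led is not None and led[0] < txn[0]):
            # No (more) transactions for this account: emit it as-is.
            out.append(led)
            led = next(ledgers, None)
        else:
            if led is not None and led[0] == txn[0]:
                account, balance = led        # matching ledger record
                led = next(ledgers, None)
            else:
                account, balance = txn[0], 0  # new account from a transaction
            while txn is not None and txn[0] == account:
                balance += txn[1]             # apply each transaction
                txn = next(txns, None)
            out.append((account, balance))    # emit the updated record
    return out
```

For example, a ledger of [("A", 100), ("C", 50)] matched against transactions [("A", 10), ("B", 5), ("C", -2)] emits A updated, B created, and C updated, in key order.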