yaakovd
Posted: Thu Feb 11, 2010 9:16 am
Post subject: Urgent! Very big XML processing

Partisan
Joined: 20 Jan 2003 Posts: 319 Location: Israel

Hi,
I have a few scenarios that require processing or generating huge XML files (300-700 MB). In my experience, even a 4 MB XML message requires a huge memory allocation.
I would appreciate best practices and patterns for handling:
1. Reading huge XML (in portions?)
2. Generating big XML (e.g. from a flat file)
3. Sorting within the XML or the generated output
An additional fact: the client is Windows-oriented and would prefer to use the Starter Edition (limited to 2 CPUs and a single execution group).
_________________
Best regards.
Yaakov
SWG, IBM Commerce, Israel

Gaya3
Posted: Thu Feb 11, 2010 9:28 am
Post subject: Re: Urgent! Very big XML processing

Jedi
Joined: 12 Sep 2006 Posts: 2493 Location: Boston, US

yaakovd wrote:
Hi,
I have a few scenarios that require processing or generating huge XML files (300-700 MB). In my experience, even a 4 MB XML message requires a huge memory allocation.
I would appreciate best practices and patterns for handling:
1. Reading huge XML (in portions?)
2. Generating big XML (e.g. from a flat file)
3. Sorting within the XML or the generated output
An additional fact: the client is Windows-oriented and would prefer to use the Starter Edition (limited to 2 CPUs and a single execution group).

You could read the XML in portions, or split the same XML into a number of portions, but first we have to understand the business data in the XML.
Say you are getting a number of records in a single XML: then we could think of dividing those.
_________________
Regards
Gayathri
-----------------------------------------------
Do Something Before you Die
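A minimal sketch of dividing the records into portions, done outside the broker with Python's streaming `xml.etree.ElementTree.iterparse`. It assumes every `<record>` is an independent business record sitting directly under the root; the tag name, the `<records>` wrapper, `split_records` and `chunk_size` are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag="record", chunk_size=1000):
    """Yield standalone XML documents holding at most chunk_size records each.

    Assumes (hypothetically) that every <record_tag> element is an independent
    record directly under the document root.
    """
    chunk = []
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the document root
            continue
        if elem.tag == record_tag:
            chunk.append(ET.tostring(elem, encoding="unicode"))
            root.remove(elem)  # prune the copied record so the tree stays small
            if len(chunk) == chunk_size:
                yield "<records>" + "".join(chunk) + "</records>"
                chunk = []
    if chunk:  # flush the last, possibly partial, portion
        yield "<records>" + "".join(chunk) + "</records>"


if __name__ == "__main__":
    data = b"<batch>" + b"".join(b"<record id='%d'/>" % i for i in range(5)) + b"</batch>"
    for doc in split_records(io.BytesIO(data), chunk_size=2):
        print(doc)
```

Each yielded string is a well-formed document that could be handed on as its own message; the chunk size trades the number of messages against the memory needed per message.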

Vitor
Posted: Thu Feb 11, 2010 10:28 am
Post subject: Re: Urgent! Very big XML processing

Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

yaakovd wrote:
I have a few scenarios that require processing or generating huge XML files (300-700 MB). In my experience, even a 4 MB XML message requires a huge memory allocation.

This has been discussed a few times in here (the Search facility is your friend), and there's a developerWorks article somewhere that talks about this.
In summary: make sure you have parsing set to on demand, don't use [index] to access the XML (which you shouldn't really be doing anyway), and prune the tree once you've processed a given section.
Have fun.
_________________
Honesty is the best policy.
Insanity is the best defence.
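The same advice can be illustrated outside the broker as a rough Python analogue, not ESQL. `iterparse` builds the tree incrementally, which is loosely comparable to on-demand parsing. Each record is reached by iterating over parse events rather than by position, and each finished record is pruned with `clear()`. The `<order>`/`<amount>` layout and the name `total_amount` are assumptions, and records are assumed to be direct children of the root.

```python
import io
import xml.etree.ElementTree as ET


def total_amount(source, record_tag="order"):
    """Sum the <amount> of every <order> record in a single streaming pass.

    Assumes (hypothetically) that records are direct children of the root.
    """
    total = 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first start event carries the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Reach the field through the current record, never by position.
            total += float(elem.findtext("amount", default="0"))
            elem.clear()  # prune the finished record's subtree
            root.clear()  # detach child records from the root so they can be freed
    return total


if __name__ == "__main__":
    data = b"<orders><order><amount>1.5</amount></order><order><amount>2.5</amount></order></orders>"
    print(total_amount(io.BytesIO(data)))
```

Memory use then depends on the size of one record rather than the size of the whole file, which is the point of pruning.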

Vitor
Posted: Thu Feb 11, 2010 10:30 am
Post subject: Re: Urgent! Very big XML processing

Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

Gaya3 wrote:
Say you are getting a number of records in a single XML: then we could think of dividing those.

You'd still need to bring the entire message in so that you could PROPAGATE the individual records. But yes, this is a good way of handling the situation if there's no affinity between XML stanzas, and it doesn't contradict what I said above (in this example you'd remove each record once it was propagated).
_________________
Honesty is the best policy.
Insanity is the best defence.
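A sketch of the propagate-then-delete pattern in Python, as an analogy only. The whole message is parsed first, since it all has to come in. Then each top-level record is serialized, handed to `send`, and removed from the tree. `send` is a hypothetical stand-in for PROPAGATE. `propagate_records` and `send` are illustrative names, not broker APIs.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def propagate_records(message: bytes, send: Callable[[bytes], None]) -> int:
    """Send each top-level record downstream and delete it once sent.

    `send` is a hypothetical stand-in for PROPAGATE. Returns the number of
    records sent.
    """
    root = ET.fromstring(message)  # the whole message has to come in first
    count = 0
    # Iterate over a snapshot of references so removing children is safe.
    for record in list(root):
        send(ET.tostring(record))
        root.remove(record)  # prune the record once it has been propagated
        count += 1
    return count


if __name__ == "__main__":
    sent = []
    propagate_records(b"<batch><r>1</r><r>2</r></batch>", sent.append)
    print(sent)
```

Iterating over `list(root)` avoids mutating the children while walking them. The snapshot holds only references, so it adds little memory.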

kimbert
Posted: Thu Feb 11, 2010 1:45 pm

Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton


Vitor
Posted: Thu Feb 11, 2010 1:54 pm

Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

kimbert wrote:
http://www-128.ibm.com/developerworks/websphere/library/techarticles/0505_storey/0505_storey.html

This time I must remember to bookmark this!
_________________
Honesty is the best policy.
Insanity is the best defence.

yaakovd
Posted: Thu Feb 11, 2010 3:50 pm

Partisan
Joined: 20 Jan 2003 Posts: 319 Location: Israel

Hi all,
Thanks for the replies and for the basics of working with the message tree.
They really help with 5 MB messages.
Of course, I tried to find something helpful with Search first.
My question is whether anybody has experience working with 500 MB messages.
Any idea how long it might take on a 2-CPU / 8 GB Windows machine, if it is feasible at all?
I am also thinking about a SAX-based input plugin...
_________________
Best regards.
Yaakov
SWG, IBM Commerce, Israel
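For the SAX idea, a minimal Python `xml.sax` handler shows the principle: only the record currently being read is held in memory. This is a standalone sketch, not the broker's input-plugin interface. The `<record>` element name, the flat record layout (one level of child fields) and the `on_record` callback are assumptions.

```python
import xml.sax
from typing import Callable, Dict, List, Optional


class RecordHandler(xml.sax.ContentHandler):
    """Collect the child fields of each <record> and hand them off as a dict.

    Assumes (hypothetically) flat records: <record><field>text</field>...</record>.
    Only the record currently being read is kept in memory.
    """

    def __init__(self, on_record: Callable[[Dict[str, str]], None], record_tag: str = "record"):
        super().__init__()
        self.on_record = on_record
        self.record_tag = record_tag
        self.current: Optional[Dict[str, str]] = None  # record being read
        self.field: Optional[str] = None  # field being read inside it
        self.text: List[str] = []  # character chunks of that field

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)  # SAX may deliver text in several pieces

    def endElement(self, name):
        if name == self.record_tag and self.current is not None:
            self.on_record(self.current)  # hand off, then forget the record
            self.current = None
        elif name == self.field and self.current is not None:
            self.current[name] = "".join(self.text)
            self.field = None


if __name__ == "__main__":
    records = []
    xml.sax.parseString(b"<batch><record><id>1</id></record></batch>", RecordHandler(records.append))
    print(records)
```

The trade-off compared with a tree is that the handler must track its own state, so anything that needs to look back across records (sorting, for example) has to be handled separately.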

fjb_saper
Posted: Thu Feb 11, 2010 7:16 pm

Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

yaakovd wrote:
Hi all,
Thanks for the replies and for the basics of working with the message tree.
They really help with 5 MB messages.
Of course, I tried to find something helpful with Search first.
My question is whether anybody has experience working with 500 MB messages.
Any idea how long it might take on a 2-CPU / 8 GB Windows machine, if it is feasible at all?
I am also thinking about a SAX-based input plugin...

In my experience, a 500 MB message seldom contains a single atomic transaction. Cut your message down to the size of a single atomic transaction and put those pieces onto the input queue of the real flow...
If you cannot use a FileInput node, do as Jeff & Vitor said (see their link): parse on demand only, use references, and prune each parsed node from the tree after propagation.
Have fun
_________________
MQ & Broker admin
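A sketch of cutting the file down to transaction-size messages makes two hypothetical assumptions. First, records are direct children of the root. Second, consecutive `<record>` elements that share a `txn` attribute form one atomic transaction. A `queue.Queue` stands in for the real flow's MQ input queue. `enqueue_transactions`, `txn` and the `<transaction>` wrapper are illustrative names only.

```python
import io
import queue
import xml.etree.ElementTree as ET


def enqueue_transactions(source, out_q, record_tag="record", key_attr="txn"):
    """Stream records and put one message per atomic transaction on out_q.

    Assumes (hypothetically) that records are direct children of the root and
    that consecutive records with the same key_attr value form one transaction.
    Returns the number of messages put on out_q.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first start event carries the root element
    current_key, batch, sent = None, [], 0
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        key = elem.get(key_attr)
        if batch and key != current_key:  # a new transaction starts here
            out_q.put("<transaction>" + "".join(batch) + "</transaction>")
            sent += 1
            batch = []
        current_key = key
        batch.append(ET.tostring(elem, encoding="unicode"))
        root.remove(elem)  # prune the record once it has been copied out
    if batch:  # flush the final transaction
        out_q.put("<transaction>" + "".join(batch) + "</transaction>")
        sent += 1
    return sent


if __name__ == "__main__":
    q = queue.Queue()
    data = b"<batch><record txn='A'/><record txn='A'/><record txn='B'/></batch>"
    print(enqueue_transactions(io.BytesIO(data), q))
```

Grouping on a key rather than a fixed record count keeps each unit of work atomic. The cost is the assumption that all records of one transaction arrive together.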

jzhang2009
Posted: Fri Feb 12, 2010 5:05 pm
Post subject: re: large XML

Newbie
Joined: 12 Feb 2010 Posts: 1

Have you looked at vtd-xml? It sounds like something you should definitely check out.

Amitha
Posted: Sat Feb 13, 2010 6:01 am

Voyager
Joined: 20 Nov 2009 Posts: 80 Location: Newyork

VTD-XML seems to improve XML parsing performance and memory usage compared to DOM or SAX. However, I think the WMB XMLNSC parser already performs very well, and it is a C++ engine. In my view, VTD-XML is something WESB could make use of, not WMB.

newtobroker
Posted: Sat Feb 13, 2010 10:23 am

Novice
Joined: 04 Feb 2010 Posts: 23

One option that we are trying is to dynamically delete the tags of huge XMLs as we finish processing them... not sure whether it applies to your business requirement.
Thanks,
c*