cwazpitt3
Posted: Wed Mar 21, 2012 4:22 am Post subject: Other ideas for counting lines of large file
Acolyte
Joined: 31 Aug 2011 Posts: 61
Hey gurus,
I have a requirement (which I don't really think should be handled by MB, but I digress) to perform auditing ONLY on large (~75MB) inbound files and then route them to an outbound folder only if the auditing passes. I have a working solution that looks like this:
FileInput (whole file) --> ESQL that calls an external Java function (passes in the BLOB, reads/counts the lines, and passes back the count and the last line, i.e. the trailer, for auditing) --> FileOutput (whole file)
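A minimal sketch of what an external Java function like the one described might look like. The class and method names (LineAudit, countLinesAndTrailer) are hypothetical, and it assumes the ESQL-to-Java mapping where a BLOB arrives as byte[] and OUT parameters are passed as one-element arrays (Long[], String[]); check the mapping table for your broker version. Note that the whole file must already be in memory as a byte array for this to work, which is where the memory goes.

```java
import java.nio.charset.StandardCharsets;

public class LineAudit {

    // Hypothetical external function: counts the lines in the BLOB and returns
    // the last line (the trailer) through one-element OUT arrays.
    public static void countLinesAndTrailer(byte[] blob, Long[] lineCount, String[] trailer) {
        long lines = 0;
        for (byte b : blob) {
            if (b == '\n') {
                lines++;
            }
        }
        // A final line without a terminating newline still counts as a line.
        if (blob.length > 0 && blob[blob.length - 1] != '\n') {
            lines++;
        }

        // Strip the final line end (LF or CRLF), then walk back to the start of the last line.
        int end = blob.length;
        if (end > 0 && blob[end - 1] == '\n') end--;
        if (end > 0 && blob[end - 1] == '\r') end--;
        int start = end;
        while (start > 0 && blob[start - 1] != '\n') start--;

        lineCount[0] = lines;
        // Assumes the trailer is ASCII/UTF-8 text.
        trailer[0] = new String(blob, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] file = "HDR\r\nrec1\r\nrec2\r\nTRL|2\r\n".getBytes(StandardCharsets.UTF_8);
        Long[] count = new Long[1];
        String[] last = new String[1];
        countLinesAndTrailer(file, count, last);
        System.out.println(count[0] + " " + last[0]); // prints "4 TRL|2"
    }
}
```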
This solution works, but it eats up a fair amount of memory while processing, presumably because the large file is read and written in full. The longest delay seems to be writing the outbound file (which is just a copyEntireMessage pass-through). Most of the memory seems to be released after processing is complete, so maybe this solution is OK, but I was wondering if anyone had any other clever ideas for solving this problem.
I had a couple of other ideas, such as different Java processing that is passed just the filename (not the BLOB), or a shell script that runs wc -l [filename]. However, when the FileInput node picks up a file, it moves it into the mqsitransitin folder and renames it to a pattern that I don't think I can obtain at runtime (it looks like GUID-originalfilename). So I cannot access the original file to run either of these techniques against it, and I have no other way of triggering the flow.
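If the file path were reachable from the flow (which, as explained above, it currently isn't), the "pass just the filename" idea could stream the file instead of loading it whole. A minimal sketch with hypothetical names (FileLineAudit, audit): it reads one line at a time through a BufferedReader, so memory stays roughly flat regardless of file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileLineAudit {

    // Audit inputs: total line count and the final line (trailer).
    public static final class Result {
        public final long lineCount;
        public final String trailer;

        Result(long lineCount, String trailer) {
            this.lineCount = lineCount;
            this.trailer = trailer;
        }
    }

    // Streams the file one line at a time; only the current line is held in memory.
    // ISO-8859-1 decodes any byte without error, and CR/LF are the same bytes in
    // any ASCII-compatible encoding, so the count is unaffected by the real charset.
    public static Result audit(Path file) throws IOException {
        long count = 0;
        String last = null;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                count++;
                last = line;
            }
        }
        return new Result(count, last);
    }

    public static void main(String[] args) throws IOException {
        Result r = audit(Paths.get(args[0]));
        System.out.println(r.lineCount + " lines, trailer: " + r.trailer);
    }
}
```

One difference from wc -l: wc counts newline characters, so a final line with no trailing newline is not counted, whereas readLine() returns it as a line.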
Any other thoughts or ideas? Thanks in advance!
Vitor
Posted: Wed Mar 21, 2012 4:45 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
The memory's being chewed up because you're using BLOB, hence the entire file's being loaded. As the Java's counting lines, these files must have a line-based structure (I imagine <CR> delimited).
Hence you can build a message set that reads lines with no account of their structure, then read and copy the file a line at a time (counting as you go) until you reach the last line (end of file on input). Use this trailer and the count for whatever the audit is, and finish the file if it's correct or roll back if it's not.
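A rough Java analogue of this read-a-line, copy, count, then finish-or-roll-back approach is sketched below. It is not how the broker implements FileOutput; it simply writes to a temporary .part file and either moves it into place or deletes it. The trailer format TRL|<n> and all names (StreamingAuditCopy, copyIfAuditPasses, trailerMatches) are hypothetical stand-ins for whatever the real audit checks.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingAuditCopy {

    // Copies 'in' byte-for-byte to a temporary ".part" file next to 'out',
    // counting lines as it goes and remembering the last one (the trailer).
    // Only the current line is buffered, so memory does not grow with file size.
    // Audit passes: move the temp file into place ("finish the file").
    // Audit fails: delete it ("roll back").
    public static boolean copyIfAuditPasses(Path in, Path out) throws IOException {
        Path tmp = out.resolveSibling(out.getFileName() + ".part");
        long count = 0;
        String last = null;
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try (InputStream is = new BufferedInputStream(Files.newInputStream(in));
             OutputStream os = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            int b;
            while ((b = is.read()) != -1) {
                os.write(b); // original line ends (LF or CRLF) are preserved
                if (b == '\n') {
                    count++;
                    last = lineText(current);
                    current.reset();
                } else {
                    current.write(b);
                }
            }
            if (current.size() > 0) { // final line without a terminating newline
                count++;
                last = lineText(current);
            }
        }
        if (trailerMatches(last, count - 1)) {
            Files.move(tmp, out, StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
        Files.delete(tmp);
        return false;
    }

    // HYPOTHETICAL trailer format "TRL|<n>", where n is the number of lines
    // before the trailer. Replace with the real audit rule.
    public static boolean trailerMatches(String trailer, long records) {
        if (trailer == null || !trailer.startsWith("TRL|")) {
            return false;
        }
        try {
            return Long.parseLong(trailer.substring(4).trim()) == records;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Decodes one buffered line, dropping the CR of a DOS (CRLF) line end.
    private static String lineText(ByteArrayOutputStream buf) {
        String s = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Writing to a side file and renaming only on success mirrors the "finish the file or roll back" idea: nothing appears in the output location unless the audit passes.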
Other, more elegant solutions are undoubtedly available, but I've only had 1 so far this morning.
_________________
Honesty is the best policy.
Insanity is the best defence.
cwazpitt3
Posted: Wed Mar 21, 2012 4:48 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
Vitor wrote: |
The memory's being chewed up because you're using BLOB, hence the entire file's being loaded. As the Java's counting lines, these files must have a line-based structure (I imagine <CR> delimited).
Hence you can build a message set that reads lines with no account of their structure, then read and copy the file a line at a time (counting as you go) until you reach the last line (end of file on input). Use this trailer and the count for whatever the audit is, and finish the file if it's correct or roll back if it's not.
Other, more elegant solutions are undoubtedly available, but I've only had 1 so far this morning. |
I'm on my second cup, so life is a little better here.
I have already tried the approach you suggested. It was VERY slow...like it took about 12 mins to get through 100k of the 250k lines. While this is better on memory consumption, I hate to eat CPU and spend all that time on such a silly requirement. It just bugs me.
Vitor
Posted: Wed Mar 21, 2012 5:01 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
cwazpitt3 wrote: |
I have already tried the approach you suggested. It was VERY slow...like it took about 12 mins to get through 100k of the 250k lines. |
Really? Where did the user trace say the time was going? Unless you're running this on a server you have to wind with a key every few hours, I'd expect better than that. That works out to around 140 TPS, where each transaction is "oh look, a record", which isn't that sparkly. How long does the Java take?
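For reference, the ~140 TPS figure follows from 100k records in 12 minutes: 100,000 / (12 × 60 s) ≈ 139 records per second. At that rate the full 250k-line file would take about 30 minutes. A trivial check of the arithmetic (class and method names are just for illustration):

```java
public class ThroughputCheck {

    // Records processed per second, given a record count and elapsed minutes.
    public static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    // Minutes needed to process totalRecords at the given rate.
    public static double projectedMinutes(long totalRecords, double recordsPerSecond) {
        return totalRecords / recordsPerSecond / 60.0;
    }

    public static void main(String[] args) {
        double rate = recordsPerSecond(100_000, 12);    // roughly 139 records per second
        double total = projectedMinutes(250_000, rate); // roughly 30 minutes
        System.out.printf("%.1f records/s, about %.0f minutes for 250k lines%n", rate, total);
    }
}
```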
I do agree that it's a silly requirement for WMB; if there were some kind of transformational value-add, maybe, but still, soon someone will send a significantly larger file and you'll be able to justify more memory.
(Don't give me "the files are always 75MB or so and not expected to grow significantly"; I've heard that one. Either they grow and the users act all surprised, or there's a problem upstream and they combine 1-n files together.)
Esa
Posted: Wed Mar 21, 2012 5:06 am Post subject:
Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
cwazpitt3 wrote: |
I have already tried the approach you suggested. It was VERY slow...like it took about 12 mins to get through 100k of the 250k lines. While this is better on memory consumption, I hate to eat CPU and spend all that time on such a silly requirement. It just bugs me. |
It will certainly be very slow if you don't apply the techniques presented in the large message processing sample...
cwazpitt3
Posted: Wed Mar 21, 2012 5:09 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
Vitor wrote: |
Really? Where did the user trace say the time was going? Unless you're running this on a server you have to wind with a key every few hours, I'd expect better than that. That works out to around 140 TPS, where each transaction is "oh look, a record", which isn't that sparkly. How long does the Java take? |
I never took a user trace. This is a DEV server, but it shouldn't be terrible. I could run it again. Do you have any good docs or anything on User Trace? I feel like I don't really understand its full capabilities and therefore don't utilize it enough. As for the Java, it takes about 6 minutes in total to do the process I outlined...not bad IMO.
Vitor wrote: |
I do agree that it's a silly requirement for WMB; if there were some kind of transformational value-add, maybe, but still, soon someone will send a significantly larger file and you'll be able to justify more memory. |
Nope, no transformation, just audit...stupid!
Vitor wrote: |
(Don't give me "the files are always 75MB or so and not expected to grow significantly"; I've heard that one. Either they grow and the users act all surprised, or there's a problem upstream and they combine 1-n files together.) |
I completely agree, and I have warned that once we get to 100MB it's a deal breaker to run it as a whole file, am I right? I never trust file size estimates.
mqjeff
Posted: Wed Mar 21, 2012 5:14 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
cwazpitt3 wrote: |
Vitor wrote: |
Really? Where did the user trace say the time was going? Unless you're running this on a server you have to wind with a key every few hours, I'd expect better than that. That works out to around 140 TPS, where each transaction is "oh look, a record", which isn't that sparkly. How long does the Java take? |
I never took a user trace. This is a DEV server, but it shouldn't be terrible. I could run it again. Do you have any good docs or anything on User Trace? I feel like I don't really understand its full capabilities and therefore don't utilize it enough. |
http://www-01.ibm.com/support/docview.wss?&uid=swg21177321
cwazpitt3
Posted: Wed Mar 21, 2012 5:15 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
Esa wrote: |
It will certainly be very slow if you don't apply the techniques presented in the large message processing sample... |
@Esa, what sample are you referring to? The batch processing?
cwazpitt3
Posted: Wed Mar 21, 2012 5:17 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
mqjeff wrote: |
cwazpitt3 wrote: |
Vitor wrote: |
Really? Where did the user trace say the time was going? Unless you're running this on a server you have to wind with a key every few hours, I'd expect better than that. That works out to around 140 TPS, where each transaction is "oh look, a record", which isn't that sparkly. How long does the Java take? |
I never took a user trace. This is a DEV server, but it shouldn't be terrible. I could run it again. Do you have any good docs or anything on User Trace? I feel like I don't really understand its full capabilities and therefore don't utilize it enough. |
http://www-01.ibm.com/support/docview.wss?&uid=swg21177321 |
Thanks @mqjeff. Now I have to see if I can actually run this command, since where I am we have a shared dev environment and I cannot run mqsi commands on my own. What a pain!
mqjeff
Posted: Wed Mar 21, 2012 5:20 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
cwazpitt3 wrote: |
Thanks @mqjeff. Now I have to see if I can actually run this command, since where I am we have a shared dev environment and I cannot run mqsi commands on my own. What a pain! |
Do you have access to the Broker Explorer?
It will enable trace for you, even if it won't run mqsireadlog/mqsiformatlog.
If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own...
cwazpitt3
Posted: Wed Mar 21, 2012 5:22 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
mqjeff wrote: |
If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own... |
No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan.
Esa
Posted: Wed Mar 21, 2012 5:24 am Post subject:
Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
cwazpitt3 wrote: |
Esa wrote: |
It will certainly be very slow if you don't apply the techniques presented in the large message processing sample... |
@Esa, what sample are you referring to? The batch processing? |
The sample called Large Messaging.
cwazpitt3
Posted: Wed Mar 21, 2012 5:28 am Post subject:
Acolyte
Joined: 31 Aug 2011 Posts: 61
cwazpitt3 wrote: |
mqjeff wrote: |
If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own... |
No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan. |
I do have the ability to add Trace nodes with the user trace setting, and they have exposed a way to format the logs so I can access them. This is different from the overall flow trace, though, right? If I used this approach, any suggestions on where I might put the Trace nodes and what I might put in them to see where the time is being spent? Too much trace can slow it down, right?
mqjeff
Posted: Wed Mar 21, 2012 5:34 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
cwazpitt3 wrote: |
cwazpitt3 wrote: |
mqjeff wrote: |
If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own... |
No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan. |
I do have the ability to add Trace nodes with the user trace setting, and they have exposed a way to format the logs so I can access them. This is different from the overall flow trace, though, right? If I used this approach, any suggestions on where I might put the Trace nodes and what I might put in them to see where the time is being spent? Too much trace can slow it down, right? |
No? I don't think it's different? Trace nodes with 'user trace' output write into the same logs that mqsireadlog reads and mqsiformatlog formats, and requesting those logs at the user trace level (rather than the service trace level) reports all the rest of the flow-level information as well as the data you have specifically added to the user trace with your Trace nodes.
So I think you're good to use this to see the full flow execution.
marko.pitkanen
Posted: Wed Mar 21, 2012 6:08 am Post subject:
Chevalier
Joined: 23 Jul 2008 Posts: 440 Location: Jamsa, Finland
Hi cwazpitt3,
Just to check: did you set up your FileInput node to read the file line by line?
Code: |
Records and Elements
Record detection = Delimited
Delimiter = DOS or UNIX Line End |
--
Marko