MQSeries.net :: View topic - Other ideas for counting lines of large file

Other ideas for counting lines of large file

cwazpitt3

Posted: Wed Mar 21, 2012 4:22 am Post subject: Other ideas for counting lines of large file

Acolyte

Joined: 31 Aug 2011
Posts: 61

Hey gurus,

I have a requirement (which I don't really think should be handled by MB, but I digress) to perform auditing ONLY on large (~75MB) inbound files and then route over to an outbound folder only if the auditing passes. I have a working solution that looks like this:

FileInput (whole file) --> ESQL which calls External Java Function (pass BLOB read/count lines and pass back count and last line i.e. trailer for auditing) --> FileOutput (whole file)

This solution works, but it eats up a fair amount of memory while processing because the large file is being read and written (I assume). The longest delay seems to be writing the outbound file (which is just a copyEntireMessage pass-through). Most of the memory seems to be released after processing is complete, so maybe this solution is OK, but I was wondering if anyone had any other clever ideas for solving this problem.

I had a couple of other ideas, such as different Java processing that is passed just the filename (not the BLOB), or a shell script that runs wc -l [filename]. However, when the FileInput node picks up the file, it moves it to the mqsitransitin folder and renames it to a pattern that I don't think I can obtain at runtime (seems like GUID-ogfilename). So I cannot actually access the original file to run either of those techniques against it, and I have no other way of triggering the flow.

Any other thoughts or ideas? Thanks in advance!
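For readers following along, here is a minimal sketch of the kind of Java helper described above: it takes the whole file as a byte array, counts the lines, and returns the last line (the trailer). The class and method names are hypothetical, the ESQL external-function wiring is omitted, and ISO-8859-1 is an assumed encoding. Because it works on the full byte array, it has the same memory cost as the BLOB approach in the post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the poster's actual code.
final class BlobLineAudit {

    /** Line count plus the text of the last line (the trailer). */
    static final class Result {
        final long lineCount;
        final String lastLine;

        Result(long lineCount, String lastLine) {
            this.lineCount = lineCount;
            this.lastLine = lastLine;
        }
    }

    static Result audit(byte[] blob) {
        if (blob.length == 0) {
            return new Result(0, "");
        }
        int end = blob.length;
        // Ignore one trailing line terminator so "a\nb\n" and "a\nb" both count as 2 lines.
        if (end > 0 && blob[end - 1] == '\n') end--;
        if (end > 0 && blob[end - 1] == '\r') end--;

        long lines = 1;
        int lastStart = 0;
        for (int i = 0; i < end; i++) {
            if (blob[i] == '\n') {
                lines++;
                lastStart = i + 1; // the next line starts after this terminator
            }
        }
        // Assumed single-byte encoding; adjust to the real file encoding.
        String last = new String(blob, lastStart, end - lastStart, StandardCharsets.ISO_8859_1);
        return new Result(lines, last);
    }
}
```

Calling BlobLineAudit.audit(bytes) on a file with a header, two detail lines and a trailer returns a count of 4 and the trailer text, whether the file uses DOS or UNIX line ends.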

Vitor

Posted: Wed Mar 21, 2012 4:45 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

The memory's being chewed up because you're using BLOB, hence the entire file's being loaded. As the Java's counting lines, these files must have a line-based structure (I imagine <CR> delimited).

Hence you can build a message set that reads lines with no account of their structure, read and copy the file a line at a time (counting as you go) until you reach the last line (end of file on input), use this trailer and the count for whatever the audit is, then finish the file if it's correct or roll back if it's not.
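The copy-and-count logic described here can be sketched in plain Java, outside the broker. This is only an illustration of the logic, not a message flow or message set. The ".part" temporary file name, the ISO-8859-1 encoding and the trailer layout "T|<total line count>" are all assumptions made for the example; the real audit rule is not given in the thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: stream the file one line at a time, then commit or discard.
final class StreamingAudit {

    static boolean copyIfAuditPasses(Path in, Path out) throws IOException {
        // Write to a temporary sibling first so a failed audit leaves no output file.
        Path tmp = out.resolveSibling(out.getFileName() + ".part");
        long count = 0;
        String last = null;
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(line);
                w.newLine(); // note: writes the platform line separator
                count++;
                last = line; // only the current line is held in memory
            }
        }
        boolean ok = last != null && trailerMatches(last, count);
        if (ok) {
            Files.move(tmp, out, StandardCopyOption.REPLACE_EXISTING); // "finish the file"
        } else {
            Files.delete(tmp); // "roll back"
        }
        return ok;
    }

    // Assumed trailer layout: "T|<total line count>". Purely illustrative.
    static boolean trailerMatches(String trailer, long count) {
        String[] parts = trailer.split("\\|");
        if (parts.length != 2 || !parts[0].equals("T")) {
            return false;
        }
        try {
            return Long.parseLong(parts[1].trim()) == count;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Memory use stays flat regardless of file size because only one line is buffered at a time. The trade-off is that every record is touched individually, which is the cost discussed in the replies below.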

Other, more elegant solutions are undoubtedly available, but I've only had 1 cup so far this morning.
_________________
Honesty is the best policy.
Insanity is the best defence.

cwazpitt3

Posted: Wed Mar 21, 2012 4:48 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

Vitor wrote:

so far this morning.

I'm on my second cup, so life is a little better here.

I have already tried the approach you suggested. It was VERY slow...like it took about 12 mins to get through 100k of the 250k lines. While this is better on memory consumption, for such a silly requirement I hate to eat CPU and spend all that time. It just bugs me.

Vitor

Posted: Wed Mar 21, 2012 5:01 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

cwazpitt3 wrote:

I have already tried the approach you suggested. It was VERY slow...like it took about 12 mins to get through 100k of the 250k lines.

Really? Where did the user trace say the time was going? Unless you're running this on a server you have to wind with a key every few hours I'd expect better than that. Works out to around 140 TPS where each transaction is "oh look, a record" which isn't that sparkly. How long does the Java take?
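The rate quoted here can be checked with a few lines of arithmetic. The sketch below uses only the figures from the thread (100k lines in 12 minutes, 250k lines in total); the class and method names are made up for the example.

```java
// Hypothetical helper for the throughput arithmetic in this post.
final class Throughput {

    /** Records processed per second, given a record count and elapsed minutes. */
    static double linesPerSecond(long lines, double minutes) {
        return lines / (minutes * 60.0);
    }

    /** Minutes needed to process the given number of lines at a fixed rate. */
    static double minutesFor(long lines, double linesPerSecond) {
        return lines / linesPerSecond / 60.0;
    }
}
```

100,000 lines in 12 minutes is about 139 lines per second, which matches the "around 140 TPS" figure. At that rate the full 250k-line file would take about 30 minutes.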

I do agree that it's a silly requirement for WMB; if there was some kind of transformational value add maybe but still, soon someone will send a significantly larger file and you'll be able to justify more memory.

(Don't give me the "the files are always 75Mb or so and not expected to grow significantly"; I've heard that one. They either grow and the users act all surprised, or there's a problem upstream and they combine 1-n files together)

Esa

Posted: Wed Mar 21, 2012 5:06 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

cwazpitt3 wrote:

It will certainly be very slow if you don't apply techniques presented in the large message processing sample...

cwazpitt3

Posted: Wed Mar 21, 2012 5:09 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

Vitor wrote:

I never took a user trace. This is a DEV server, but it shouldn't be terrible. I could run again. Do you have any good docs or anything on User Trace? I feel like I don't really understand its full capabilities and therefore don't utilize it enough. As for the Java, it takes about 6 minutes total to do the process I outlined...not bad IMO.

Vitor wrote:

I do agree that it's a silly requirement for WMB; if there was some kind of transformational value add maybe but still, soon someone will send a significantly larger file and you'll be able to justify more memory.

Nope, no transformation just audit...stupid!

Vitor wrote:

(Don't give me the "the files are always 75Mb or so and not expected to grow significantly"; I've heard that one. They either grow and the users act all surprised, or there's a problem upstream and they combine 1-n files together)

I completely agree, and I have warned that once we get to 100MB it's a deal breaker if we run it as a whole file, am I right? I don't ever trust file size estimates.

mqjeff

Posted: Wed Mar 21, 2012 5:14 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

cwazpitt3 wrote:

Vitor wrote:

http://www-01.ibm.com/support/docview.wss?&uid=swg21177321

cwazpitt3

Posted: Wed Mar 21, 2012 5:15 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

Esa wrote:

It will certainly be very slow if you don't apply techniques presented in the large message processing sample...

@Esa, what sample are you referring to? The batch processing?

cwazpitt3

Posted: Wed Mar 21, 2012 5:17 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

mqjeff wrote:

cwazpitt3 wrote:

Vitor wrote:

http://www-01.ibm.com/support/docview.wss?&uid=swg21177321

Thanks @mqjeff. Now I have to see if I can actually run this command, since where I am we have a shared dev environment and I cannot run mqsi commands on my own.

What a pain!

mqjeff

Posted: Wed Mar 21, 2012 5:20 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

cwazpitt3 wrote:

Thanks @mqjeff. Now I have to see if I can actually run this command since where I am we have a shared dev environment and I cannot run mqsi commands on my own

What a pain!

Do you have access to the Broker Explorer?
It will enable trace for you, even if it won't run mqsireadlog/mqsiformatlog.

If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own...

cwazpitt3

Posted: Wed Mar 21, 2012 5:22 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

mqjeff wrote:

If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own...

No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan.

Esa

Posted: Wed Mar 21, 2012 5:24 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

cwazpitt3 wrote:

Esa wrote:

It will certainly be very slow if you don't apply techniques presented in the large message processing sample...

@Esa, what sample are you referring to? The batch processing?

The sample called Large Messaging.

cwazpitt3

Posted: Wed Mar 21, 2012 5:28 am Post subject:

Acolyte

Joined: 31 Aug 2011
Posts: 61

cwazpitt3 wrote:

mqjeff wrote:

If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own...

No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan.

I do have the ability to add trace nodes with the user trace setting, and they have exposed a way to format the logs so I can access them. This is different from the overall flow trace though, right? If I used this approach, any suggestions on where I might put the trace and what I might put in it to see where the time is being spent? Too much trace can slow it down, right?

mqjeff

Posted: Wed Mar 21, 2012 5:34 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

cwazpitt3 wrote:

mqjeff wrote:

If you spend enough time asking your dev server admin to run these commands, like five or six times a day, I'm sure they will find a way to allow you to run them on your own...

No, we don't have access to Broker Explorer. I completely agree with your solution of bugging the admin...that's my plan.

No? I don't think it's different? Trace nodes with 'user trace' output write into the same logs that mqsireadlog accesses and mqsiformatlog formats, and the request to read those logs at the user trace level (rather than service trace level) will report all of the rest of the flow level information as well as the data you have specifically added to the user trace with your trace nodes.

So I think you're good to use this to see the full flow execution.

marko.pitkanen

Posted: Wed Mar 21, 2012 6:08 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi cwazpitt3,

Just to check: did you set up your FileInput node to read the file line by line?

Code:

Records and Elements
Record detection = Delimited
Delimiter = DOS or UNIX Line End

--
Marko
