MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Other ideas for counting lines of large file

Other ideas for counting lines of large file
cwazpitt3
Posted: Wed Mar 21, 2012 6:44 am

Acolyte

Joined: 31 Aug 2011
Posts: 61

marko.pitkanen wrote:
Hi cwazpitt3,

Just to check did you set up your FileInput node to read file line by line?

Code:
Records and Elements
  Record detection = Delimited
  Delimiter = DOS or UNIX Line End


--
Marko


Yes
marko.pitkanen
Posted: Wed Mar 21, 2012 6:45 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

I ran a quick test:
FileInput -> Trace node


FileInput
Code:
Records and Elements
  Record detection = Delimited
  Delimiter = DOS or UNIX Line End
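With Delimited record detection and the DOS or UNIX Line End delimiter, each line of the file becomes its own record, and therefore its own trip through the flow. The Python sketch below only illustrates the splitting rule as it is assumed to work (a record ends at LF, and a CR directly before that LF is dropped); it is not broker code, `split_records` is a hypothetical name, and how the broker treats empty lines or a final unterminated line is an assumption here.

```python
def split_records(data: bytes) -> list[bytes]:
    """Split raw file bytes into records on DOS (CR LF) or UNIX (LF) line ends.

    Assumption: mirrors the FileInput 'DOS or UNIX Line End' delimiter; empty
    lines become empty records and trailing bytes form a final record.
    """
    records = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        record = data[start:end]
        if record.endswith(b"\r"):  # DOS line end: drop the CR before the LF
            record = record[:-1]
        records.append(record)
        start = end + 1
    if start < len(data):  # last line without a line end (assumed to be a record)
        records.append(data[start:])
    return records

records = split_records(b"first\r\nsecond\nthird")
print(records)  # [b'first', b'second', b'third']
```

Every element of that list would be propagated as a separate message, so any fixed per-message cost is paid once per line.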


Trace node
Code:
Destination = File
Pattern = ${CAST(CURRENT_TIMESTAMP AS CHARACTER FORMAT 'yyyyMMdd-HHmmss')}
${CAST(Root.BLOB.BLOB AS CHAR CCSID Root.Properties.CodedCharSetId)}


It took 1163 seconds to process 1,485,301 lines in this scenario, so about 1277 lines per second.
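The rate is lines divided by elapsed seconds; inverting it gives the average cost per record, which is easier to compare with the microsecond gaps in a user trace. A small Python sketch of that arithmetic, using the figures above (the helper name `throughput` is hypothetical):

```python
def throughput(lines: int, seconds: float) -> tuple[float, float]:
    """Return (lines per second, microseconds spent per line)."""
    lines_per_second = lines / seconds
    micros_per_line = seconds * 1_000_000 / lines
    return lines_per_second, micros_per_line

# Figures from the FileInput -> Trace node test
rate, per_line = throughput(1_485_301, 1163)
print(f"{rate:.0f} lines/s, {per_line:.0f} microseconds per line")
# prints "1277 lines/s, 783 microseconds per line"
```

Roughly 0.8 ms per line is the number to explain when reading the user trace.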


--
Marko
mqsiuser
Posted: Wed Mar 21, 2012 7:10 am

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

marko.pitkanen wrote:
Trace node


How fast is it if you use the local environment (or a shared row) to keep a single variable for counting the lines, with no Trace node (no trace enabled at all)?

Wikipedia says: "Some published tests demonstrate message rates in excess of 10,000 per second in particular configurations."

We could measure against that.
_________________
Just use REFERENCEs
Vitor
Posted: Wed Mar 21, 2012 7:36 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

marko.pitkanen wrote:
So 1277 lines per second.


OS, WMB version and platform details?

I did functionally the same thing on my sad little Solaris sandbox (2 GB, 1 core of a Blade T6320 as a whole-root zone) and got 850 lines per second.
_________________
Honesty is the best policy.
Insanity is the best defence.
Vitor
Posted: Wed Mar 21, 2012 7:39 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqsiuser wrote:
Wikipedia says: "Some published tests demonstrate message rates in excess of 10,000 per second in particular configurations."


Yes. If the OP had the kind of configuration that could manage that, he wouldn't be posting about memory usage....
_________________
Honesty is the best policy.
Insanity is the best defence.
cwazpitt3
Posted: Wed Mar 21, 2012 7:51 am

Acolyte

Joined: 31 Aug 2011
Posts: 61

But I also have to write the contents to an outbound file (essentially copying the file from one location to another). Is that where my flow is spending all its time?

I did FileInput --> FileOutput, and 39,974 lines took 504 seconds, a whopping 79 lines per second! At that rate, my 250,000-line file would take a really long time. Am I doing something wrong?

If I could get to 800 lines per second or so, it would be comparable to the 6 minutes the Java approach takes. I might just stick with what I have.
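Projecting these rates onto the real 250,000-line file shows how far 79 lines per second is from the goal, and what rate is actually needed to match the 6-minute Java run. A Python sketch of the arithmetic, assuming throughput stays constant over the whole file (helper and variable names are hypothetical; the inputs are the figures quoted in this post):

```python
SECONDS_PER_MINUTE = 60

def minutes_for(lines: int, lines_per_second: float) -> float:
    """Projected run time in minutes at a constant throughput."""
    return lines / lines_per_second / SECONDS_PER_MINUTE

observed_rate = 39_974 / 504   # FileInput -> FileOutput test, about 79.3 lines/s
file_lines = 250_000           # size of the real file
java_minutes = 6               # current Java approach

at_observed = minutes_for(file_lines, observed_rate)
at_800 = minutes_for(file_lines, 800)
break_even = file_lines / (java_minutes * SECONDS_PER_MINUTE)

print(f"observed rate: {at_observed:.1f} min")                  # 52.5 min
print(f"800 lines/s:   {at_800:.1f} min")                       # 5.2 min
print(f"rate needed to match Java: {break_even:.0f} lines/s")   # 694 lines/s
```

So a little under 700 lines per second would already break even with the Java approach.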
mqjeff
Posted: Wed Mar 21, 2012 8:01 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

cwazpitt3 wrote:
is that where my flow is spending all its time?

The user trace will show you where your flow is spending all its time.

Did you deploy any additional instances.... ?
cwazpitt3
Posted: Wed Mar 21, 2012 8:03 am

Acolyte

Joined: 31 Aug 2011
Posts: 61

mqjeff wrote:
the user trace will show you where your flow is spending all its time.

Did you deploy any additional instances.... ?


I am working on the user trace. I don't think just putting a Trace node in gives me any other information about the flow.

No additional instances.
marko.pitkanen
Posted: Wed Mar 21, 2012 8:20 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

I did a quick review of the user trace and found that most of the time was spent in the Trace node. I changed the test flow to use Compute and FileOutput nodes to produce the same functionality, and I am now running the same test.

--
Marko
marko.pitkanen
Posted: Wed Mar 21, 2012 11:22 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

Somehow I messed up the Compute and FileOutput nodes and made the test flow a lot slower; I need to find out why. But with a Compute node and a SHARED INT counter I managed to count 1,485,300 lines in 979 seconds (about 1517 lines per second).

--
Marko
marko.pitkanen
Posted: Wed Mar 21, 2012 12:22 pm

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

From my quick test it seems that, when reading a local file line by line and counting the lines, most of the time for every line is spent between creating the new parser and reading the record.

Code:

2012-03-21 21:34:51.216648       12   UserTrace   BIP6064I: A parser of type ''BLOB'' was created on behalf of node 'readFileLineByLine.File Input' to handle the input stream, beginning at offset '0'. The parser type was selected based on value ''NONE'' from the previous parser.
2012-03-21 21:34:51.217308 =  660    12   UserTrace   BIP3352I: ''FileInput'' node ''File Input'' in message flow ''readFileLineByLine'' is propagating record ''5'' obtained from file ''/home/xxxxx/mqsitransitin/yyy-readFileLineByLine.in'' at offset ''132'' to terminal ''out''.
                                       The FileInput node read a record from the file, and will propagate it to the named terminal.
                                       No action is required.
2012-03-21 21:34:51.217356 =  708     12   UserTrace   BIP3907I: Message received and propagated to 'out' terminal of input node 'readFileLineByLine.File Input'.
2012-03-21 21:34:51.217404 =  756   12   UserTrace   BIP6063I: A parser of type ''Properties'' was created on behalf of node 'readFileLineByLine.File Input' to handle the input stream, beginning at offset '0'.
2012-03-21 21:34:51.217444 =  796   12   UserTrace   BIP6069W: The broker is not capable of handling a message of data type ''BLOB''.
                                       The message broker received a message that requires the handling of data of type ''BLOB'', but the broker does not have the capability to handle data of this type.
                                       Check both the message being sent to the message broker and the configuration data for the node. References to the unsupported data type must be removed if the message is to be processed by the broker.
2012-03-21 21:34:51.217520   =  872  12   UserTrace   BIP6064I: A parser of type ''BLOB'' was created on behalf of node 'readFileLineByLine.File Input' to handle the input stream, beginning at offset '0'. The
2012-03-21 21:34:51.218148   =  628  12   UserTrace   BIP3352I: ''FileInput'' node ''File Input'' in message flow ''readFileLineByLine'' is propagating record ''6'' obtained from file ''/home/xxxxx/mqsitransitin/
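The gaps between consecutive BIP3352I ("propagating record") entries give the per-record cost directly. A Python sketch that pulls those gaps out of a formatted user trace; the regular expression is an assumption based only on the line layout shown above (timestamp first, annotations optional), other broker versions may format the trace differently, and `record_gaps` is a hypothetical name:

```python
import re
from datetime import datetime, timedelta

# Assumed layout: timestamp, then anything (e.g. the "= 660" annotations),
# then a BIP3352I message containing: propagating record ''<n>''
RECORD_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\b.*?BIP3352I.*?propagating record ''(\d+)''"
)
ONE_MICROSECOND = timedelta(microseconds=1)

def record_gaps(trace_lines):
    """Yield (record number, microseconds since the previous record was propagated)."""
    previous = None
    for line in trace_lines:
        match = RECORD_LINE.match(line)
        if match is None:
            continue  # not a "propagating record" line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None:
            yield int(match.group(2)), (stamp - previous) // ONE_MICROSECOND
        previous = stamp

sample = [
    "2012-03-21 21:34:51.217308 =  660    12   UserTrace   BIP3352I: ''FileInput'' node "
    "''File Input'' in message flow ''readFileLineByLine'' is propagating record ''5''",
    "2012-03-21 21:34:51.218148   =  628  12   UserTrace   BIP3352I: ''FileInput'' node "
    "''File Input'' in message flow ''readFileLineByLine'' is propagating record ''6''",
]
gaps = list(record_gaps(sample))
print(gaps)  # [(6, 840)]
```

840 microseconds between records 5 and 6 works out to roughly 1,190 records per second, in line with the rate measured for the whole file.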


--
Marko
marko.pitkanen
Posted: Wed Mar 21, 2012 11:08 pm

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

From the debug trace I can see that it takes some time to finalise the current flow execution after the "RETURN FALSE" statement, and perhaps a little longer to initialise the FileInput node to read the next record.

Code:
2012-03-22 08:46:18.180320       12   UserTrace   BIP2539I: Node 'readFileLineByLine.Compute': Evaluating expression ''iRows = 1 OR MOD(iRows, 100000) = 0'' at ('.readFileLineByLine_Compute.Main', '6.18'). This
resolved to ''FALSE OR FALSE''. The result was ''FALSE''.
2012-03-22 08:46:18.180342       12   UserTrace   BIP2537I: Node 'readFileLineByLine.Compute': Executing statement   ''RETURN FALSE;'' at ('.readFileLineByLine_Compute.Main', '16.3').
2012-03-22 08:46:18.180744       12   UserTrace   BIP4144I: Entered function 'cniCreateElementAsLastChild'(42396e5c, acbc160, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A').
.
.
.
2012-03-22 08:46:18.181174       12   UserTrace   BIP4142I: Evaluating cniElementSet'Name'. Changing value from '''' to ''Wildcard''.
                                       Element ''Name'' has been changed to ''Wildcard''.
                                       No user action required.
2012-03-22 08:46:18.181304       12   UserTrace   BIP3352I: ''FileInput'' node ''File Input'' in message flow ''readFileLineByLine'' is propagating record ''4'' obtained from file ''/home/xxx/mqsitransitin/yyy-readFileLineByLine.in'' at offset ''131'' to terminal ''out''.
                                       The FileInput node read a record from the file, and will propagate it to the named terminal.
                                       No action is required.
2012-03-22 08:46:18.181336       12   UserTrace   BIP3907I: Message received and propagated to 'out' terminal of input node 'readFileLineByLine.File Input'.
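The counting logic visible in this trace (increment a counter, presumably propagate only on the first and every 100,000th record, otherwise RETURN FALSE) can be sketched outside the broker as below. This is a Python analogue, not the ESQL in the Compute node: `LineCounter` and `on_record` are hypothetical names, and the lock stands in for whatever serialisation a shared counter needs when several flow threads update it (in ESQL, typically a BEGIN ATOMIC block).

```python
import threading

class LineCounter:
    """Hypothetical stand-in for a SHARED INTEGER counter that outlives each
    FileInput record; on_record() is called once per propagated record."""

    def __init__(self, report_every: int = 100_000) -> None:
        self._count = 0
        self._report_every = report_every
        # Shared state may be updated from several flow threads, so serialise updates.
        self._lock = threading.Lock()

    def on_record(self) -> bool:
        """Count one record; True means 'propagate a progress message'."""
        with self._lock:
            self._count += 1
            n = self._count
        # Mirrors the traced test: iRows = 1 OR MOD(iRows, 100000) = 0
        return n == 1 or n % self._report_every == 0

    @property
    def count(self) -> int:
        with self._lock:
            return self._count

counter = LineCounter()
reports = sum(counter.on_record() for _ in range(250_000))
print(counter.count, reports)  # 250000 3
```

The counting itself is trivial; as the trace shows, the time goes into finishing each flow invocation and starting the next record.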


--
Marko
mqsiuser
Posted: Thu Mar 22, 2012 12:35 am

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Code:
The parser type was selected based on value ''NONE'' from the previous parser.


I'd try and change the "NONE".

Currently I assume you are creating 1.4853 million parsers (parser objects)... probably that is where your performance goes down the drain?!
_________________
Just use REFERENCEs
Esa
Posted: Thu Mar 22, 2012 12:55 am

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

I guess you are using 'Parsed Record Sequence' or 'Delimited' for record detection in the FileInput node. That will guarantee huge overhead.

I tested counting records for a whole file using the techniques from the Large Messaging sample. Processing an 18 MB file of 198,313 records took 10.2 seconds. The actual counting took less than 1 ms; the rest went to allocating memory for the BLOB and to read/write operations. I must admit that I used MQInput and MQOutput, because I happened to have a test setup at hand.

That makes almost 20,000 records per second.

So I would say that you get the best performance by using Record detection settings 'Whole File' or 'Fixed Length' (of 5-10 MB perhaps), a simple message set and large message processing techniques.
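The point here is that counting is cheap once the data arrives in a few large pieces instead of one message per line. The sketch below shows only that counting step, in Python rather than the ESQL of the Large Messaging sample: it reads a file in fixed-size chunks (standing in for Fixed Length records) and counts line-feed bytes. `count_lines` and `CHUNK_SIZE` are hypothetical names, and the file must use DOS or UNIX line ends.

```python
import os
import tempfile

# Hypothetical chunk size in the 5-10 MB range suggested above.
CHUNK_SIZE = 5 * 1024 * 1024

def count_lines(path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Count lines by scanning fixed-size chunks of raw bytes for LF (0x0A).

    DOS (CR LF) and UNIX (LF) line ends both finish with LF, so one count covers
    both, and a single-byte LF can never be split across a chunk boundary.
    A last line without a trailing line end is counted too.
    """
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    if last_byte and last_byte != b"\n":
        count += 1
    return count

# Usage: mixed line ends, no line end after the final line.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\r\ntwo\nthree")
total = count_lines(tmp.name)
tiny_chunks = count_lines(tmp.name, chunk_size=2)  # exercises chunk boundaries
os.remove(tmp.name)
print(total, tiny_chunks)  # 3 3
```

A file with bare CR line ends would need a different rule, since it contains no LF bytes at all.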
rramasu
Posted: Tue May 13, 2014 10:20 am

Newbie

Joined: 23 Sep 2007
Posts: 9
Location: india

Hi Esa,

Could you provide the pieces of code to count the lines from the BLOB/CHAR data?

Thanks,
_________________
Rajamani
Page 2 of 3

Copyright © MQSeries.net. All rights reserved.