Author | Message

prabhuoist
Posted: Tue Oct 10, 2017 3:10 am    Post subject: File output node slow performance while writing files
Apprentice
Joined: 10 Oct 2017    Posts: 39
Dear All,
I have a flow where messages arrive on an input MQ queue and, after some logic, are written to files on a GPFS file system. The logic behind writing the files is that the FileOutput node has to cut and close each file within 15 seconds (12 seconds to write records to the file, plus 3 seconds of sleep so that the file closes properly).
Files are first created in the mqsitransit folder and then moved out. I have used the TimeoutControl and TimeoutNotification nodes to achieve this.
The issue is that the application works properly for some days, but then file writing suddenly becomes slow and messages pile up on the input queue. It resolves by itself once the load comes down.
We have checked almost everything at the IIB level, i.e. memory usage and CPU usage. What else could we check at the IIB level?
Note: this application handles a load of around 400 TPS.
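For reference, the timed cut-and-close idea can be sketched in plain Java. This is a minimal, hypothetical sketch: the class name and the in-memory "closed files" list are illustrative stand-ins for the FileOutput node behaviour, not the actual flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "cut and close the file every N seconds" pattern.
// Records accumulate in the current batch; once the cut interval has elapsed,
// the batch is finalised (here: added to a list standing in for a closed file).
class TimedFileCutter {
    private final long cutIntervalMillis;
    private long batchStartMillis;
    private StringBuilder current = new StringBuilder();
    final List<String> closedFiles = new ArrayList<>();

    TimedFileCutter(long cutIntervalMillis, long nowMillis) {
        this.cutIntervalMillis = cutIntervalMillis;
        this.batchStartMillis = nowMillis;
    }

    // Append one pipe-separated record; cut the file first if the interval elapsed.
    void write(String record, long nowMillis) {
        if (nowMillis - batchStartMillis >= cutIntervalMillis) {
            cut(nowMillis);
        }
        current.append(record).append('\n');
    }

    // Finalise the current batch and start a new one.
    void cut(long nowMillis) {
        if (current.length() > 0) {
            closedFiles.add(current.toString());
            current = new StringBuilder();
        }
        batchStartMillis = nowMillis;
    }
}
```

Time is passed in explicitly so the rollover decision is deterministic; the real flow drives the cut from the TimeoutNotification node instead.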

zpat
Posted: Tue Oct 10, 2017 4:47 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
Sleeping in the flow should not be necessary. It will cause the flow to block the thread.
How many flow instances do you have? With higher volumes, most of your flows are going to be blocked.
12 seconds is a long time, maybe your disk i/o rates are very high or the disk subsystem is very slow.
Using files for integration always causes compromises and/or risks - use messages end to end if you want things to work properly and quickly.
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.

mqjeff
Posted: Tue Oct 10, 2017 5:03 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447
What happens if the data that needs to go to the file is too big to write in 12 seconds?
Why do you have to wait the full 15 seconds to close the file?
_________________
chmod -R ugo-wx /

prabhuoist
Posted: Tue Oct 10, 2017 10:23 pm
Apprentice
Joined: 10 Oct 2017    Posts: 39
What happens if the data that needs to go to the file is too big to write in 12 seconds?
Answer: The data is only '|'-separated records, and each record is about 1 KB. So in 12 seconds the file size would only be around 400-500 KB.
Why do you have to wait the full 15 seconds to close the file?
Answer: If we reduce the time, more files would be created than the underlying system (which picks up the files) is able to process.
Sleeping in the flow should not be necessary. It will cause the flow to block the thread.
Answer: We used the sleep to avoid a race on the file-close event: when the close event fires while a thread is still writing, the old file is closed but the new data goes into a file in the mqsitransit folder that is never closed. To prevent that thread from writing into the file, we put in a timer to sleep the thread for some time.
How many flow instances do you have? With higher volumes, most of your flows are going to be blocked.
Answer: 1 instance per EG. We actually have 10 EGs, across which 22 circles have been deployed. We also create the files per circle, with the naming convention below, to avoid duplicates:
circlecode_ApplicationName_brokerthreadId_brokerName_timestamp_fileExtension
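Assuming those components, the filename construction can be sketched as follows. This is a hypothetical helper: the method name, sample values, and timestamp format are illustrative, not taken from the actual flow.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: assemble the per-circle filename described above.
// Component values are illustrative; the real flow supplies them at runtime.
class CircleFileNamer {
    static String fileName(String circleCode, String appName, long threadId,
                           String brokerName, LocalDateTime ts, String ext) {
        // Timestamp format is an assumption; the flow may use something else.
        String stamp = ts.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
        return String.join("_", circleCode, appName, Long.toString(threadId),
                           brokerName, stamp, ext);
    }
}
```

Because the broker thread ID and timestamp are both in the name, two threads (or two cuts by the same thread) cannot produce the same filename, which is the stated goal of the convention.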

zpat
Posted: Tue Oct 10, 2017 11:01 pm
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
You are attempting to overcome the inherent unsuitability of using files for transaction processing.
This is simply wrong and will lead to complications, limitations, failures and all the things you really don't want.
Use a proper transactional, atomic protocol like MQ. Your business is worth it.

prabhuoist
Posted: Tue Oct 10, 2017 11:07 pm
Apprentice
Joined: 10 Oct 2017    Posts: 39
I absolutely agree with you, but this was the business requirement, as the underlying system was not able to process messages from MQ.
When this application went live in production the file cut time was 25 sec and the sleep time was 5 sec (and there were only 3 EGs), and it worked fine for around 3-4 months, but the business then gradually decreased the times to 12 and 3 sec respectively.
Since then, over the last few months, we have been facing this issue. Hence we had to create multiple EGs and divide the circles between them.

zpat
Posted: Tue Oct 10, 2017 11:11 pm
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
No, the business requirement was a solution fit for purpose.
Failure to apply this principle is a mistake that I see time and time again.

prabhuoist
Posted: Tue Oct 10, 2017 11:25 pm
Apprentice
Joined: 10 Oct 2017    Posts: 39
But as this is in production, is there any suggestion we can implement here to resolve this issue?

Vitor
Posted: Wed Oct 11, 2017 4:37 am
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
prabhuoist wrote:
But as this is in production, is there any suggestion we can implement here to resolve this issue?

No.
You (as many of my associates have pointed out) have a bad design, which seems to have been driven by a limitation of (or pandering to) the underlying system. I mean:

prabhuoist wrote:
more files would be created than the underlying system (which picks up the files) is able to process.

A file-based system that can't handle a lot of files is not a very good system. A very poor method has been used to deal with this, and one of the problems (as you've discovered) is that it doesn't scale.
You're in a trap of your own making.
_________________
Honesty is the best policy.
Insanity is the best defence.

prabhuoist
Posted: Thu Oct 12, 2017 1:07 am
Apprentice
Joined: 10 Oct 2017    Posts: 39
Thank you for your replies so far.
Please can you advise whether decreasing the file cut time from 12 seconds would improve performance?

zpat
Posted: Thu Oct 12, 2017 3:39 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
Because file handling is not suited to processing atomic (single) transactions, you have decided to batch them up every 12 seconds.
If you decrease the time interval, you will increase the number of files but may decrease the transaction latency, assuming they are processed promptly.
But if the file system can't handle more files then it will just bottleneck. As we have said, there is no good way to solve a bad design.
To determine what time interval gives the best transaction throughput you would have to experiment with different values. If the batch delay is not critical then increasing the delay value would reduce the load on the system.
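As a back-of-envelope illustration of that trade-off (assuming a steady per-circle message rate, which real traffic won't have), the cut interval fixes both how many files are produced and how large each one is:

```java
// Back-of-envelope model of the cut-interval trade-off discussed above.
// Assumes a steady per-circle rate; real traffic is burstier than this.
class CutIntervalModel {
    // Files produced per hour per circle for a given cut interval (seconds).
    static long filesPerHour(int cutIntervalSeconds) {
        return 3600L / cutIntervalSeconds;
    }

    // Records accumulated in one file at the given rate (records/second).
    static long recordsPerFile(int cutIntervalSeconds, int recordsPerSecond) {
        return (long) cutIntervalSeconds * recordsPerSecond;
    }
}
```

At roughly 400 TPS spread over 22 circles (about 18 records/sec per circle), a 12-second cut gives about 216 records (~216 KB at 1 KB/record) per file and 300 files per hour per circle; halving the interval doubles the file count the downstream system must absorb.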

prabhuoist
Posted: Thu Oct 12, 2017 4:21 am
Apprentice
Joined: 10 Oct 2017    Posts: 39
OK.
I am using a single thread for each circle, so I suspect that the thread is being blocked, and that the thread which is supposed to close the file (and get back to picking up queue messages) within the total 15 seconds is not closing it.
I have also used the TimeoutControl and TimeoutNotification nodes to close the files on time.
Can the timer nodes cause any issue here?

zpat
Posted: Thu Oct 12, 2017 4:37 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
I coded a flow to do something similar, in that it read a queue of SWIFT messages and put them to files.
The way I batched it was to use the MQGet node to get subsequent messages, with a get wait limit of "n" seconds set in the node.
When no messages had arrived for "n" seconds, the timeout terminal was driven from the MQGet node, which finalised that batch of records to the file.
I also invoked the finalise processing when the message count exceeded "x" messages, to avoid over-large files if the queue was busy.
That was simple and worked fine. No need to sleep or use timer nodes.
It just read the queue and wrote each file after "x" messages had been added, or if none had arrived for "n" seconds, whichever happened first.
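That pattern can be sketched outside IIB as well. In this hypothetical Java sketch, BlockingQueue.poll with a timeout stands in for the MQGet node's get-wait, and each finished batch stands in for one file; totalExpected exists only to make the example terminate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of "write a file after x messages, or after n seconds of silence,
// whichever comes first". poll(timeout) plays the role of the MQGet node's
// wait interval; a null result is the equivalent of the timeout terminal.
class BatchWriter {
    static List<List<String>> drain(BlockingQueue<String> queue,
                                    int maxBatchSize, long quietMillis,
                                    int totalExpected) throws InterruptedException {
        List<List<String>> files = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        int seen = 0;
        while (seen < totalExpected) {
            String msg = queue.poll(quietMillis, TimeUnit.MILLISECONDS);
            if (msg == null) {                  // quiet period: finalise the batch
                if (!batch.isEmpty()) {
                    files.add(batch);
                    batch = new ArrayList<>();
                }
                continue;
            }
            batch.add(msg);
            seen++;
            if (batch.size() >= maxBatchSize) { // size limit reached: cut the file
                files.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) files.add(batch);
        return files;
    }
}
```

The key property is the same one zpat describes: the batch boundary is driven by the arrival pattern itself, so no sleep or separate timer is needed.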

Vitor
Posted: Thu Oct 12, 2017 5:01 am
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
prabhuoist wrote:
Can timer nodes cause any issue here?

The timer nodes are not your problem.
The design is your problem.

prabhuoist
Posted: Thu Oct 12, 2017 5:05 am
Apprentice
Joined: 10 Oct 2017    Posts: 39
OK, I may try this, but as the application is already running in production the business may be rigid about changing it. It would also require a good deal of testing effort, which may take some time.
Another query: can I see how many transactions (connections) are open from the broker to the file system? Is it possible that connections are kept open for a long time and cause the issue, given that the issue occurs every 2-3 days (or sometimes longer)?