RangaKovela
Posted: Sun Dec 13, 2015 9:39 pm    Post subject: Sequential Processing using fileInput node
Apprentice
Joined: 10 May 2011    Posts: 38
Hi Team,
Environment: IIB9 broker on Windows.
SFTP server is on Windows.
We have a requirement to process a batch of files generated by a backend system in sequential order (i.e. FIFO). A batch can have multiple files. All the files are placed in the IIB source directory, from which the FileInput node is polling using the move command.
I want to know whether the FileInput node is capable of picking up files in the order they were created by the backend system.
Thanks,
zpat
Posted: Sun Dec 13, 2015 11:04 pm
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
Use MQ, not files, if data integrity or sequence are important.
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
RangaKovela
Posted: Sun Dec 13, 2015 11:35 pm
Apprentice
Joined: 10 May 2011    Posts: 38
Thanks for your response. The IIB source is a FileServer in this case. Can't change this now.
zpat
Posted: Mon Dec 14, 2015 3:47 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
That's just an excuse. Get the design right and these issues do not arise.
Vitor
Posted: Mon Dec 14, 2015 5:18 am
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
RangaKovela wrote:
IIB source is FileServer in this case. Can't change this now.
Well, that's an awesome design decision by someone.
The FileInput node matches files based on what the underlying OS (in this case Windoze) tells it is available. This means:
a) the files could be picked up in any order
b) you can never scale the solution to run additional instances
c) you can't be sure (especially for larger files) that IIB will process the entire file, because you can't be sure the file was completely moved into the target directory before it showed up as available to the FileInput node
_________________
Honesty is the best policy.
Insanity is the best defence.
timber
Posted: Mon Dec 14, 2015 5:34 am
Grand Master
Joined: 25 Aug 2015    Posts: 1292
You could write a script that copies/moves the files into the IIB source directory one at a time, in the correct order. That might just about work if this is a daily batch job and performance is not important.
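A minimal sketch of such a mover script, assuming a staging directory feeding the IIB input directory; the directory names, the pacing delay, and ordering by last-modified time are illustrative assumptions, not details from this thread:

```python
import os
import shutil
import time

def move_in_order(staging_dir: str, iib_in_dir: str, delay_s: float = 1.0) -> list:
    """Move files from staging into the IIB input directory one at a time,
    oldest first, pausing between files so the poller sees them in order."""
    names = [n for n in os.listdir(staging_dir)
             if os.path.isfile(os.path.join(staging_dir, n))]
    # Sort by last-modified time: the oldest file is moved (and processed) first.
    names.sort(key=lambda n: os.path.getmtime(os.path.join(staging_dir, n)))
    moved = []
    for name in names:
        shutil.move(os.path.join(staging_dir, name),
                    os.path.join(iib_in_dir, name))
        moved.append(name)
        time.sleep(delay_s)  # crude pacing; real code would wait for pickup
    return moved
```

Note this still inherits every weakness of file transfer discussed below; it only addresses ordering.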
But I agree with zpat and Vitor; the design is not good.
zpat
Posted: Mon Dec 14, 2015 8:33 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
Because many of the new people in IT seem to be untrained and/or inexperienced, combined with a desire to get something that appears to work done quickly rather than get it right, I spend increasing amounts of my time on this subject.
What you need to understand about using files is this.
1. On Unix there is no locking done by most applications (including FTP and SFTP), so files may appear in a directory (and start to get processed) before they have been completely written. This applies to files both coming into and going out of the broker. Result: partial data and no warning of it.
2. There is nothing inherent to prevent a file being read more than once (unlike MQ, where messages are consumed as they are accessed). So inadvertent duplication can (and does) often occur when scripts are run more than once, or restarted after failure.
3. All file transfers are synchronous, so both ends must be active for the transfer to work. When the destination is down, the transfer fails and there is no automatic retry or queuing, which makes management a nightmare. Files can therefore be missed entirely and no warning is given.
4. Failure during file processing causes great uncertainty because there is no transactional control. At best the entire file is re-processed, leading to duplication of data, but it can easily be only partially processed, and no warning of this is apparent to the target application.
5. Files can be processed out of sequence and again no warning is given to the application. Also, since no locking exists, a file can be corrupted by multiple applications opening it for write at the same time.
The only way to avoid this (with files) is to use a numbering convention for the files and a header and trailer record in each file. Then the receiving application must check that it has processed each file in order, not missing any or processing any twice. It must also check that the file is complete and has a trailer record matching the header record.
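The receiving application's checks under such a convention could be sketched like this; the HDR/TRL record layout and field order are assumptions for illustration, not any IIB or MQ standard:

```python
def validate_batch_file(lines: list, expected_seq: int) -> list:
    """Check one batch file whose header carries a sequence number and
    record count, and whose trailer must repeat both. Returns the data
    records, or raises ValueError for any of the failure modes above."""
    if not lines or not lines[0].startswith("HDR|"):
        raise ValueError("missing header record")
    if not lines[-1].startswith("TRL|"):
        raise ValueError("missing trailer record - file may be truncated")
    _, h_seq, h_count = lines[0].split("|")
    _, t_seq, t_count = lines[-1].split("|")
    if int(h_seq) != expected_seq:
        raise ValueError("file out of sequence: got %s, expected %d"
                         % (h_seq, expected_seq))
    if (h_seq, h_count) != (t_seq, t_count):
        raise ValueError("trailer does not match header")
    data = lines[1:-1]
    if len(data) != int(h_count):
        raise ValueError("record count mismatch - file incomplete")
    return data
```

The caller would track `expected_seq` persistently and refuse to advance past a missing or duplicated file.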
In other words, the end application has to be extremely resistant to the unreliable delivery inherent in file usage. Since NO-ONE ever does this adequately, you are designing a system that WILL FAIL at some point. The impact of the loss or duplication of data can be immense. Companies can go out of business very quickly these days if they lose credibility.
Or you can just use something fit for purpose - MQ, which is very easy to code for (or just use JMS). This can even do XA two-phase commit, which gives you bulletproof delivery to a database. Nothing else does this.
Remember: using files is NEVER a requirement; it is only a possible (and very poor) solution when anything transactionally important is being sent.
smdavies99
Posted: Mon Dec 14, 2015 9:10 am
Jedi Council
Joined: 10 Feb 2003    Posts: 6076    Location: Somewhere over the Rainbow this side of Never-never land.
In addition to what my esteemed colleague has said so eloquently...
Files are often used because they cost nothing. Systems have filesystems and files; using them costs nothing.
Files are often chosen because the people doing the choosing really know nothing else. FTP is free, so that is what is used.
As has been said, once you start speeding things up (from about 1-2 files/second) the problems start. Until you have been bitten (badly) by them, you don't really understand how bad a choice files can be.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
zpat
Posted: Mon Dec 14, 2015 10:02 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
MQ client (or JMS) is also free.
But what is the cost of losing or duplicating production data?
PeterPotkay
Posted: Mon Dec 14, 2015 3:43 pm
Poobah
Joined: 15 May 2001    Posts: 7722
If the original data is in a file, and the data needs to get into WMB, what approach do you guys take with the architects? Do you just dig in your heels and say that unless the data arrives at WMB as MQ messages, don't bother with WMB?
Whether it's WMB converting the file into MQ messages, or some home-grown mess, SOMETHING needs to absorb the pain and risk of getting that data out of a file and into MQ messages. As unappetizing as it is to do it in WMB, is it the least worst choice?
_________________
Peter Potkay
Keep Calm and MQ On
Simbu
Posted: Tue Dec 15, 2015 2:51 am    Post subject: Re: Sequential Processing using fileInput node
Master
Joined: 17 Jun 2011    Posts: 289    Location: Tamil Nadu, India
RangaKovela wrote:
Hi Team,
Environment: IIB9 broker on Windows.
SFTP server is on Windows.
We have a requirement to process a batch of files generated by a backend system in sequential order (i.e. FIFO). A batch can have multiple files. All the files are placed in the IIB source directory, from which the FileInput node is polling using the move command.
I want to know whether the FileInput node is capable of picking up files in the order they were created by the backend system.
Thanks,
Hi. Worst case, sequence can be maintained by last updated timestamp.
The IBM documentation says:
Quote:
Files to be processed by the FileInput node are prioritized as per the last updated timestamp; that is, the oldest files are processed first.
But if there is any failure while processing any file, the sequence will be broken.
zpat
Posted: Tue Dec 15, 2015 3:17 am
Jedi Council
Joined: 19 May 2001    Posts: 5866    Location: UK
To avoid partial files being processed prematurely, use the put-and-rename approach in SFTP (or FTP).
I.e. send the file as filename pattern 1, then after the put, change (rename/move) the file name to filename pattern 2.
In the FileInput node, look only for filename pattern 2. Also increase the polling interval to a sensible value.
This way, partial files will at least not be processed by the broker.
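A local sketch of the pattern; the `.part` suffix and function names are assumptions, and with real SFTP the same two steps would be a `put` followed by a `rename`:

```python
import os
import shutil

def send_file(src_path: str, target_dir: str) -> str:
    """Write under a temporary name first, then rename, so the poller can
    never see a half-written file under the final name pattern."""
    name = os.path.basename(src_path)
    tmp = os.path.join(target_dir, name + ".part")   # filename pattern 1
    final = os.path.join(target_dir, name)           # filename pattern 2
    shutil.copyfile(src_path, tmp)                   # the 'put' step
    os.replace(tmp, final)                           # the 'rename' step
    return final

def poll_ready(target_dir: str) -> list:
    """The FileInput node's filter should match only the final names."""
    return sorted(n for n in os.listdir(target_dir)
                  if not n.endswith(".part"))
```

On a single filesystem the rename is atomic, which is exactly why the poller cannot catch the file half-written under its final name.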
---------------
However, you really need to get the data issued as messages in the first place; long experience tells me that not the slightest effort is ever made to do this in certain locations, due to "cultural" issues.
---------------
In terms of timestamps, take care: some file transfer tools preserve the original file's date/time, and some set the date/time of the transfer. Usually this can be controlled with an option.
Also, if using WinSCP you must disable sending files into temporary filenames (see Preferences - Endurance).
---------------
If you work for a financial institution and are using files to carry transactional data, please let us know the company (so we can avoid using it).
RangaKovela
Posted: Tue Dec 15, 2015 10:18 pm
Apprentice
Joined: 10 May 2011    Posts: 38
Thank you all for your responses.
We did some more research on this and found this on the IBM site: http://www-01.ibm.com/support/docview.wss?uid=swg1IC91632
We are moving files from the backend directory to the IIB directory (the directory from which the FileInput node picks up the files) using a batch script on an hourly basis.
We have designed two flows here:
Flow 1 - picks up files using the FileInput node and posts them to MQ.
Flow 2 - processes the MQ messages received from Flow 1 and posts them to a cloud-based web application using HTTPS.
We have implemented a backout mechanism in Flow 2 to handle intermittent connectivity failures and ensure sequential processing.
We are planning to disable the main queue using PCF messages from Flow 2 if the connectivity issues persist after 5 retries. An e-mail notification/text message will be sent out to the support team. The support team will have to retry the messages in the backout queue and then re-enable the main queue once the backend system is up and running. This will ensure sequential processing. Any suggestions?
timber
Posted: Wed Dec 16, 2015 1:35 am
Grand Master
Joined: 25 Aug 2015    Posts: 1292
Quote:
We are planning to disable the main queue using PCF messages from Flow 2 if the connectivity issues persist after 5 retries. An e-mail notification/text message will be sent out to the support team. The support team will have to retry the messages in the backout queue and then re-enable the main queue once the backend system is up and running. This will ensure sequential processing. Any suggestions?
I am suspicious of solutions that involve sending PCF messages from a message flow. I understand why people do it, but I am fairly sure that IBM never intended it to be done. I reckon there is always a better solution. So how about this...
Can you set the retry count to 5 and remove the backout queue? That way, after 5 retries the message will sit on the input queue and block the remaining messages (it will act like a 'poison message'). That sounds like the behaviour you need, and it would be a lot easier than using PCF messages.
RangaKovela
Posted: Wed Dec 16, 2015 1:47 am
Apprentice
Joined: 10 May 2011    Posts: 38
If a backout queue is not specified, the MQInput node would move the messages to the DLQ assigned to the queue manager.