MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Creating XML message from PDF file input

Creating XML message from PDF file input
venkat_chekka
Posted: Tue Dec 14, 2010 12:13 pm    Post subject: Creating XML message from PDF file input

Apprentice

Joined: 14 Apr 2006
Posts: 37

Has anyone implemented converting a PDF message to XML using Message Broker? Is it possible?

I am trying to do a POC. Any input will be greatly appreciated.
bsiggers
Posted: Tue Dec 14, 2010 12:21 pm    Post subject: What format?

Acolyte

Joined: 09 Dec 2010
Posts: 53
Location: Vancouver, BC

Anything is possible - but it is not clear what you are trying to accomplish.

You could just encode the PDF file using Base64 and stick it in some XML field, for example - done, your PDF file is now in an XML message.
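
As a rough illustration of the Base64 suggestion above, here is a minimal Java sketch using only the standard library. The class name `PdfEnvelope` and the element names `Document`, `FileName` and `Content` are made up for this example; they are not anything Message Broker defines.

```java
import java.util.Base64;

// Hypothetical helper: wraps raw PDF bytes in a small XML envelope.
final class PdfEnvelope {

    private static final String OPEN = "<Content encoding=\"base64\">";
    private static final String CLOSE = "</Content>";

    // Element names are illustrative only; use whatever your target schema requires.
    static String wrap(byte[] pdfBytes, String fileName) {
        // The standard Base64 alphabet contains no XML special characters,
        // so the encoded text can be placed in an element without escaping.
        String encoded = Base64.getEncoder().encodeToString(pdfBytes);
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<Document>\n"
             + "  <FileName>" + escape(fileName) + "</FileName>\n"
             + "  " + OPEN + encoded + CLOSE + "\n"
             + "</Document>\n";
    }

    // Reverses wrap(): pulls the Base64 text back out and decodes it.
    static byte[] unwrap(String xml) {
        int start = xml.indexOf(OPEN);
        int end = xml.indexOf(CLOSE, start);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("No Base64 <Content> element found");
        }
        return Base64.getDecoder().decode(xml.substring(start + OPEN.length(), end));
    }

    // Minimal escaping for text placed inside an element.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

This only carries the PDF inside XML as an opaque payload. It does not extract anything from the PDF itself.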
smdavies99
Posted: Tue Dec 14, 2010 12:35 pm

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Slightly OT but multi-part MIME Messages are ideal for this sort of thing. The properties of the elements describe the type of data in the BLOB part.
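
To make the multipart MIME idea concrete, here is a hedged Java sketch that assembles a two-part `multipart/mixed` message by hand: one XML part and one Base64-encoded PDF part. The class name `MimePdfMessage` and the choice of headers are assumptions for illustration. A real flow would more likely rely on the broker's own MIME handling or a mail library than on string concatenation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds a minimal multipart/mixed message.
final class MimePdfMessage {

    // The boundary must be a string that does not occur inside either part.
    static byte[] build(String xmlPart, byte[] pdfBytes, String boundary) {
        StringBuilder sb = new StringBuilder();
        sb.append("MIME-Version: 1.0\r\n");
        sb.append("Content-Type: multipart/mixed; boundary=\"")
          .append(boundary).append("\"\r\n\r\n");

        // Part 1: the XML document, described by its own Content-Type header.
        sb.append("--").append(boundary).append("\r\n");
        sb.append("Content-Type: text/xml; charset=UTF-8\r\n\r\n");
        sb.append(xmlPart).append("\r\n");

        // Part 2: the PDF as a Base64 body; the headers tell the receiver
        // what kind of data the binary part holds.
        sb.append("--").append(boundary).append("\r\n");
        sb.append("Content-Type: application/pdf\r\n");
        sb.append("Content-Transfer-Encoding: base64\r\n\r\n");
        // The MIME encoder wraps lines at 76 characters with CRLF.
        sb.append(Base64.getMimeEncoder().encodeToString(pdfBytes)).append("\r\n");

        // Closing delimiter ends the multipart body.
        sb.append("--").append(boundary).append("--\r\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```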
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
venkat_chekka
Posted: Tue Dec 14, 2010 12:35 pm

Apprentice

Joined: 14 Apr 2006
Posts: 37

I want to read some particular data from a PDF message and populate an XML message using that information.

Is it possible in Message Broker?
smdavies99
Posted: Tue Dec 14, 2010 12:42 pm

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Get yourself a PostScript interpreter and away you go.
You need to convert the PDF to text. Then you can manipulate the text within.

Look at the features of Ghostscript. It might do what you want. I know it can combine PDFs.
Vitor
Posted: Tue Dec 14, 2010 12:42 pm

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

venkat_chekka wrote:
Is it possible in Message Broker?


How would you achieve this using another application language (like C#)? Code the same thing in WMB.
_________________
Honesty is the best policy.
Insanity is the best defence.
bsiggers
Posted: Tue Dec 14, 2010 12:50 pm

Acolyte

Joined: 09 Dec 2010
Posts: 53
Location: Vancouver, BC

Google is your friend, as usual. Searching for 'Java PDF' came up with this as the first hit:

http://pdfbox.apache.org/

Java, open source - sounds like it would be worth a try at least, it has the ability to extract stuff from PDF files.
venkat_chekka
Posted: Tue Dec 14, 2010 1:14 pm

Apprentice

Joined: 14 Apr 2006
Posts: 37

Can we implement this using the Message Broker ESQL language?
mqjeff
Posted: Tue Dec 14, 2010 1:34 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Implement WHAT?

What do you *need* to DO with the PDF document?

Do you need to parse, extract, and understand it?

Or do you just need to deal with it as a chunk?

Regardless, you should strongly consider asking for help locally, for example by discussing this with your team lead.
Vitor
Posted: Tue Dec 14, 2010 1:44 pm

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqjeff wrote:
What do you *need* to DO with the PDF document?


venkat_chekka wrote:
I want to read some particular data from a PDF message and populate an XML message using that information.

kimbert
Posted: Tue Dec 14, 2010 2:06 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

There is no built-in parser for PDF documents in WMB. However, you can parse a PDF document using a third-party Java library and build a message tree from the extracted information.
Quote:
Can we implement this using the Message Broker ESQL language?
Technically, ESQL can do anything you like, so the answer is 'yes'. But why would you bother, when there is at least one ready-made solution available in Java?
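
The "build a message tree from the extracted information" step might look roughly like the sketch below. It assumes the text extraction has already been done by some PDF library (not shown) and uses the JDK's DOM API as a stand-in for the broker's message tree. The class name `ExtractedFieldsToXml` and the element names are hypothetical.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: turns already-extracted name/value pairs into XML.
final class ExtractedFieldsToXml {

    // Map keys become child element names, so they must be valid XML names.
    static String toXml(String rootName, Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (Map.Entry<String, String> e : fields.entrySet()) {
                Element child = doc.createElement(e.getKey());
                child.setTextContent(e.getValue()); // escaped on serialization
                root.appendChild(child);
            }
            // Serialize the tree to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not build XML", ex);
        }
    }
}
```

Inside a broker flow the same shape of logic would populate the output message tree rather than a DOM document; that part is left out here because it depends on the node type and broker version.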
venkat_chekka
Posted: Tue Dec 14, 2010 8:16 pm

Apprentice

Joined: 14 Apr 2006
Posts: 37

Actually, I cannot use any third-party Java libraries and am looking for ESQL-only code to implement this.

Here is the complete information about my POC.

The PDF file contains various information, but I want to extract only one text value from it.

Example: the PDF message contains the following information.

Policy Number:xxxxx1234

I want to take only the Policy Number from the PDF file.

Is there any way to extract this information using ESQL code?
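
Once the PDF content is available as plain text, picking out the value is the easy part. The sketch below shows that step in Java with a regular expression, purely to illustrate the logic. It assumes the label reads exactly `Policy Number:` and that the value is a run of letters and digits, as in the example above; the class name `PolicyNumberFinder` is made up. The same idea could be expressed with ESQL string functions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the policy number in already-extracted text.
final class PolicyNumberFinder {

    // Label, optional whitespace, then a run of letters and digits (assumed format).
    private static final Pattern POLICY =
            Pattern.compile("Policy Number:\\s*([A-Za-z0-9]+)");

    // Returns the first policy number found, or empty if the label is absent.
    static Optional<String> find(String text) {
        Matcher m = POLICY.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Getting from the PDF bytes to that plain text is the hard part, and this sketch does not address it.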
smdavies99
Posted: Tue Dec 14, 2010 10:29 pm

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

venkat_chekka wrote:
Actually, I cannot use any third-party Java libraries and am looking for ESQL-only code to implement this.


Ah, the good old 'Not invented here, and whose backside are we going to kick if it goes wrong' excuse.

Then get hold of as many examples of the PDF as you can and scan them, looking for the particular bits of PostScript that hold the data you are looking for.
If there is a clear pattern, then just extract that and pull out the bit of data you want. Remember that the PDF will be in BLOB form, so your substring search will need to look for the hex representation of a series of characters.
Then convert the extracted part to a char. Then, finally, parse it to get the data you need.

If there is NO CLEAR PATTERN in the PostScript then you will need to use a 3rd party Java library, unless you really, really want to put yourself through pain & torture and write your own parser.

The deep joys of Systems Integration and the real world of PHBs and their bright ideas...
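
The pattern-scanning approach described above can be sketched in Java as a plain byte search. This is a sketch under strong assumptions: the text must sit uncompressed in the file as a single string literal such as `(Policy Number:xxxxx1234) Tj`, in a single-byte encoding. Many PDFs compress their content streams, in which case a search like this finds nothing. The class name `BlobFieldScanner` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: looks for a label inside a BLOB and reads what follows.
final class BlobFieldScanner {

    // Index of the first occurrence of needle at or after 'from', or -1.
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Bytes after the label up to (not including) the terminator, as characters.
    // Returns null if the label does not occur in the blob.
    static String valueAfter(byte[] blob, String label, byte terminator) {
        byte[] needle = label.getBytes(StandardCharsets.ISO_8859_1);
        int start = indexOf(blob, needle, 0);
        if (start < 0) {
            return null;
        }
        int begin = start + needle.length;
        int end = begin;
        while (end < blob.length && blob[end] != terminator) {
            end++;
        }
        return new String(blob, begin, end - begin, StandardCharsets.ISO_8859_1);
    }

    // Hex form of a byte sequence, i.e. what a BLOB search pattern would contain.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

For the assumed string literal form, the terminator would be the closing parenthesis, for example `BlobFieldScanner.valueAfter(blob, "Policy Number:", (byte) ')')`.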

venkat_chekka
Posted: Thu Dec 16, 2010 1:29 pm

Apprentice

Joined: 14 Apr 2006
Posts: 37

My data needs optical character recognition; that is, it is part of an image in the PDF message.

So can I access this kind of image data from a PDF message in Message Broker?
mqjeff
Posted: Thu Dec 16, 2010 2:08 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Yes, you can access this kind of data from Message Broker.

You can read the PDF document as a string of bytes, and then write any code you want to process those bytes and extract meaning.

I expect that it would take a competent and well-trained programmer at least six months, more likely a year, to complete a meaningful and robust OCR system purely in ESQL.

Your management is asking you to do something VERY VERY HARD. You need to tell them that you need to use either a third-party Java library or a third-party PHP library to process and parse the PDF document and then a third-party image library to perform OCR on the data.

If you are a strong C programmer, you can create your own User Defined Node to do this, as well - probably again using a third-party library.

Or you need to quit your job and get a new one with better or smarter management.
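
As a small illustration of the "read the PDF document as a string of bytes" starting point mentioned above, the Java sketch below checks that a byte array begins with the `%PDF-` header and returns the version that follows it. It deliberately goes no further: it does no parsing and no OCR. The class name `PdfHeader` is made up, and the sketch assumes the usual three-character `x.y` version form.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sanity-checks that a BLOB looks like a PDF at all.
final class PdfHeader {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns the version text after "%PDF-" (for example "1.4"),
    // or null if the data does not start with the PDF header.
    static String version(byte[] data) {
        if (data.length < MAGIC.length + 3) {
            return null;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return null;
            }
        }
        return new String(data, MAGIC.length, 3, StandardCharsets.US_ASCII);
    }
}
```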