I posted this question on Stack Overflow but did not receive any usable answers.
I have just joined the Adobe forums in the hope that someone knowledgeable here can answer this specific Adobe Acrobat SDK automation question.
*******************************************************************
I am parsing a .pdf file using the acrobat.tlb library.
Hyphenated strings are being split into separate words, and the hyphens are removed.
For example, ABC-123-XXX-987
is parsed as:
ABC
123
XXX
987
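Because the four pieces still arrive as consecutive words, a serial number can in principle be rebuilt by rejoining them. Below is a minimal sketch in Python (not Acrobat code) of that idea. It assumes the page's words have already been read into a list in reading order, as the getPageNthWord loop produces them, and that every serial number follows the format of the single example above. The pattern SERIAL_RE, the constant GROUPS and the function find_serials are hypothetical names for illustration.

```python
import re

# Hypothetical serial format inferred from the single example ABC-123-XXX-987:
# three capital letters, three digits, three capital letters, three digits.
SERIAL_RE = re.compile(r"[A-Z]{3}-\d{3}-[A-Z]{3}-\d{3}")
GROUPS = 4  # assumed number of hyphen-separated parts in one serial


def find_serials(words):
    """Return (start_index, serial) for each run of GROUPS consecutive
    words that, rejoined with hyphens, forms a complete serial number."""
    found = []
    for start in range(len(words) - GROUPS + 1):
        # Drop surrounding whitespace and any hyphen the extractor may have
        # left attached, then rejoin the pieces with single hyphens.
        parts = [w.strip().strip("-") for w in words[start:start + GROUPS]]
        candidate = "-".join(parts)
        if SERIAL_RE.fullmatch(candidate):
            found.append((start, candidate))
    return found


# Words for one page, mocked as the extraction loop would return them.
page_words = ["Serial", "ABC", "123", "XXX", "987", "shipped"]
print(find_serials(page_words))  # [(1, 'ABC-123-XXX-987')]
```

The start index tells you which word the serial begins at, so the same GROUPS words can be located again for highlighting.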
If I parse the text using iTextSharp, it returns the whole string as displayed in the file, which is the behaviour I want.
However, I need to highlight these strings (serial numbers) in the .pdf, and iTextSharp does not place the highlight in the correct location. Hence acrobat.tlb.
I am using this code, taken from: http://www.vbforums.com/showthread.php?561501-RESOLVED-2003-How-to-highlight-text-in-pdf
filey = "*your full file name including directory here*" ' replace with the full path of your .pdf
AcroExchApp = CreateObject("AcroExch.App")
AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
' Open the pdf file named in filey
AcroExchAVDoc.Open(filey, "")
' Get the PDDoc associated with the open AVDoc
AcroExchPDDoc = AcroExchAVDoc.GetPDDoc
sustext = "accessorizes"
suktext = "accessorises"
' Get the JavaScript object
' (note: jso is tied to the PDDoc of the PDF)
jso = AcroExchPDDoc.GetJSObject
' Counters
nCount = 0
nCount1 = 0
gbStop = False
bUSCnt = False
bUKCnt = False
' search for the text
If Not jso Is Nothing Then
    ' total number of pages
    nPages = jso.numpages
    ' Go through pages
    For i = 0 To nPages - 1
        ' check each word in a page
        nWords = jso.getPageNumWords(i)
        For j = 0 To nWords - 1
            ' get a word
            word = Trim(CStr(jso.getPageNthWord(i, j)))
            'If VarType(word) = VariantType.String Then
            If word <> "" Then
                ' compare the word with what the user wants
                If Trim(sustext) <> "" Then
                    result = StrComp(word, sustext, vbTextCompare)
                    ' if same
                    If result = 0 Then
                        nCount = nCount + 1
                        If bUSCnt = False Then
                            iUSCnt = iUSCnt + 1
                            bUSCnt = True
                        End If
                    End If
                End If
                If Trim(suktext) <> "" Then
                    result1 = StrComp(word, suktext, vbTextCompare)
                    ' if same
                    If result1 = 0 Then
                        nCount1 = nCount1 + 1
                        If bUKCnt = False Then
                            iUKCnt = iUKCnt + 1
                            bUKCnt = True
                        End If
                    End If
                End If
            End If
        Next j
    Next i
    jso = Nothing
End If
The code does the job of highlighting the text, but the For loop that fills the word variable splits the hyphenated string into its component parts, which prevents me from highlighting the complete string:
For i = 0 To nPages - 1
    ' check each word in a page
    nWords = jso.getPageNumWords(i)
    For j = 0 To nWords - 1
        ' get a word
        word = Trim(CStr(jso.getPageNthWord(i, j)))
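The loop above compares one word at a time, so a string that Acrobat reports as several words can never equal the search text. A minimal sketch of one possible workaround follows, again in Python for illustration only, with the page's words mocked as a list. It keeps extending a run of consecutive words until their text, with hyphens removed, equals the target, then returns the first and last word index of each match. The function name find_target_spans is hypothetical. The case-insensitive comparison is meant to mirror the vbTextCompare used in the code above. In Acrobat, each index in a returned span could then, presumably, be passed to the JavaScript method getPageNthWordQuads to collect the quads for a single highlight. This has not been tested against Acrobat itself.

```python
def find_target_spans(words, target):
    """Return (first, last) word indices for each run of consecutive words
    whose text, with hyphens and whitespace removed, equals target
    (case-insensitive, in the spirit of vbTextCompare)."""
    key = target.replace("-", "").upper()
    if not key:
        return []
    # Normalise each word the same way as the target.
    norm = [w.replace("-", "").strip().upper() for w in words]
    spans = []
    for first in range(len(norm)):
        if not norm[first]:
            continue  # never start a match on an empty word
        joined = ""
        for last in range(first, len(norm)):
            joined += norm[last]
            if joined == key:
                spans.append((first, last))
                break
            # Stop extending once the run can no longer become the target.
            if not key.startswith(joined):
                break
    return spans


page_words = ["Serial", "ABC", "123", "XXX", "987", "shipped"]
print(find_target_spans(page_words, "ABC-123-XXX-987"))  # [(1, 4)]
```

Matching on the stripped text means the search works whether the extractor drops the hyphens or leaves them attached to each piece.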
Does anyone know how to keep the whole string intact using acrobat.tlb? My fairly extensive searches have drawn a blank.
Many thanks...