We humans read text almost every minute of our lives. Wouldn't it be great if our machines could read text the way we do? The bigger question is, "How do we make our machines read?" This is where Optical Character Recognition (OCR) comes into the picture.

Optical Character Recognition (OCR) is a technique for reading text from printed or scanned photos and handwritten images and converting it into a digital format that is editable and searchable.

OCR has plenty of applications in today's business:

- Extracting business card information into a contact list.
- Converting handwritten documents into electronic images.

Some of the open-source OCR tools are Tesseract and OCRopus. In this article, we will focus on Tesseract OCR.

Installation of Tesseract OCR:

Download the latest installer for Windows 10 and run it.

Note: Don't forget to copy the installation path. We will need it later, because the path of the tesseract executable has to be set in the code if the installation directory is different from the default. The typical installation path on Windows systems is C:\Program Files. So, in my case, it is "C:\Program Files\Tesseract-OCR\tesseract.exe".

Next, to install the Python wrapper for Tesseract, open the command prompt and execute the command "pip install pytesseract".

OpenCV (Open Source Computer Vision) is an open-source library for computer vision, machine learning, and image processing applications. OpenCV-Python is the Python API for OpenCV. To install it, open the command prompt and execute the command "pip install opencv-python".

Build sample OCR Script

1. Read the image using the cv2.imread() method and store it in a variable "img".

If needed, resize the image using the cv2.resize() method:
    img = cv2.resize(img, (400, 400))
Display the image using the cv2.imshow() method:
    cv2.imshow("Image", img)
Keep the window open until a key is pressed (to prevent the kernel from crashing):
    cv2.waitKey(0)
Close all open windows:
    cv2.destroyAllWindows()

2. Converting Image to String

    import pytesseract

Set the tesseract path in the code:
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

If we do not set the path and the tesseract executable is not on the system PATH, pytesseract raises a TesseractNotFoundError.

Remove stopwords from the "Cleaned Review" and append all the remaining words to a list variable "final_list".

Step 6: Build positive, negative word clouds

Install the word cloud library using the command "pip install wordcloud".

In the English language, we have predefined sets of positive and negative words called Opinion Lexicons. These files can be downloaded from the link or directly from my GitHub repo. Once the files are downloaded, read them in the code and create lists of positive and negative words.

    with open(r"opinion-lexicon-English\positive-words.txt", "r") as pos:
        poswords = pos.read().split("\n")
    with open(r"opinion-lexicon-English\negative-words.txt", "r") as neg:
        negwords = neg.read().split("\n")  # list name assumed, mirroring poswords

Import the libraries to generate and show word clouds.

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

Positive Word Cloud

Choose only the words from final_list that are present in poswords.

The words expensive, stuck, struck, and disappoint stood out in the negative word cloud. If we look at the context of the word stuck, the review says "Though it has just 3 GB RAM, it never gets stuck", which is a positive thing about the device. So it is good to build bigram/trigram word clouds so that the context is not lost.

gTTS is a Python library that interfaces with Google Translate's text-to-speech API. To install it, execute the command "pip install gtts" in the command prompt.

Set the tesseract path:
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Read the image using cv2.imread(), grab the text from the image using pytesseract, and store it in a variable.

    # display the image using cv2.imshow() method
    cv2.imshow("Image", img)
    # grab the text from image using pytesseract
    txt = pytesseract.image_to_string(img)

Set the language and convert the text to audio using gTTS by passing it the text and the language:
    from gtts import gTTS
    language = 'en'
    outObj = gTTS(text=txt, lang=language, slow=False)

Save the audio file as "rev.mp3":
    outObj.save("rev.mp3")

By the end of this article, we have understood the concept of Optical Character Recognition (OCR) and are familiar with reading images using OpenCV and grabbing the text from images using pytesseract. We have seen two basic applications of OCR: building word clouds, and creating audio files by converting text to speech using gTTS. Check out the complete Jupyter Notebook from my GitHub repo.
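The stopword removal, lexicon filtering, and bigram/trigram ideas described above can be sketched in plain Python. This is only an illustration under stated assumptions: the tiny STOPWORDS, POSWORDS, and NEGWORDS sets are hypothetical stand-ins for a real stopword list and the opinion lexicon files, the review string is a made-up sample rather than OCR output, and the helper function names are invented for this sketch.

```python
# Hypothetical stand-ins: the article loads the lexicons from files and the
# review text from OCR; these small sets exist only to make the sketch runnable.
STOPWORDS = {"it", "has", "just", "the", "is", "a", "though", "but", "and"}
POSWORDS = {"good", "great", "fast"}
NEGWORDS = {"expensive", "stuck", "disappoint"}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split on whitespace, strip edge punctuation,
    and drop any word found in the stopword set."""
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    return [w for w in words if w and w not in stopwords]


def keep_lexicon_words(words, lexicon):
    """Keep only the words that appear in the given lexicon."""
    return [w for w in words if w in lexicon]


def ngrams(words, n):
    """Return consecutive n-word phrases (n=2 for bigrams, n=3 for trigrams)."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


review = ("Though it has just 3 GB RAM, it never gets stuck. "
          "Great phone but expensive.")

final_list = remove_stopwords(review, STOPWORDS)
pos_list = keep_lexicon_words(final_list, POSWORDS)
neg_list = keep_lexicon_words(final_list, NEGWORDS)
trigrams = ngrams(final_list, 3)

print(pos_list)
print(neg_list)
print(trigrams)
```

On its own, "stuck" lands in the negative list, but the trigram "never gets stuck" keeps the context that makes it a positive remark. To draw a cloud from these results, one option is to join a list into a single string and pass it to WordCloud().generate().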