Optical Character Recognition
Recognizing Text to Labels on an Android Platform
By Sourav Soni, Prolatech
OCR is not a new technology; it has been around for years.
But it is now trending because of how it is being combined with other technologies.
From scanner apps to live language translation apps, OCR has become a way for people, and especially developers, to build new use cases by integrating it.
As MNCs and the world’s top companies invest time and effort in making OCR more effective, start-ups have joined the race too. They have come to market with strong use cases to showcase their OCR capabilities.
My OCR journey started with a scanner app. From the very first use I saw that Optical Character Recognition is capable enough to detect multiple languages, and I decided to take a deep dive into the OCR world.
Optical Character Recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream.
Analyse images to detect embedded text, generate character streams and enable searching.
Take photos of text instead of copying to save time and effort.
I explored OCR offerings from a range of MNCs, start-ups and projects, including Google, Microsoft, Tesseract and many more. I built an Android app on each OCR engine to check its feasibility and accuracy, and I tested the APIs in different lighting conditions. Many OCRs failed when the lighting and environment were poor; sometimes they even returned text in a different language altogether while scanning English. In the end I found one OCR that works in dim light and still extracts all the content from the image: the Microsoft Computer Vision API.
Microsoft provides an OCR API, the Microsoft Computer Vision API, that works accurately for OCR. The API is free up to a point: it provides 5,000 transactions, at 20 per minute, for 30 days, which I think is enough to develop and test a use case. After that we can move to the paid version, which is also quite cheap.
OCR technology detects text content in an image and extracts the identified text into a machine-readable character stream. You can use the result for search and numerous other purposes like medical records, security, and banking. It automatically detects the language. OCR saves time and provides convenience for users by allowing them to take photos of text instead of transcribing the text.
OCR supports 25 languages. These languages are: Arabic, Chinese Simplified, Chinese Traditional, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Cyrillic and Latin), Slovak, Spanish, Swedish, and Turkish.
If needed, OCR corrects the rotation of the recognized text, in degrees, around the horizontal image axis. OCR also provides the frame coordinates of each word, as seen in the illustration below.
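To make the frame coordinates a little more concrete, here is a minimal Java sketch. It assumes each word’s frame arrives as a comma-separated string of the form "left,top,width,height"; that is how I remember the response encoding it, so check it against the response you actually receive. The `WordBox` class is a hypothetical helper of my own, not part of the SDK.

```java
// Hypothetical helper, not part of the SDK: holds one word's frame in pixels.
final class WordBox {
    final int left;
    final int top;
    final int width;
    final int height;

    WordBox(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    // Parses a frame given as "left,top,width,height" (assumed format).
    static WordBox parse(String boundingBox) {
        String[] parts = boundingBox.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 values: " + boundingBox);
        }
        return new WordBox(
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim()),
                Integer.parseInt(parts[2].trim()),
                Integer.parseInt(parts[3].trim()));
    }

    // Right and bottom edges, handy when drawing a rectangle over the word.
    int right() {
        return left + width;
    }

    int bottom() {
        return top + height;
    }
}
```

With the four edges in hand, an app could draw a rectangle over each recognized word on top of the original photo.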
• The size of the input image must be between 40 x 40 and 3200 x 3200 pixels
• The image cannot be bigger than 10 megapixels
• The input image can be rotated by any multiple of 90 degrees plus a small angle of up to 40 degrees
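The size limits above are easy to check before uploading an image. The following is only a sketch: `OcrImageLimits` and `isAcceptable` are hypothetical names of mine, and the numbers are taken straight from the list above.

```java
// Hypothetical pre-flight check; the limits come from the requirements above.
final class OcrImageLimits {
    static final int MIN_SIDE = 40;
    static final int MAX_SIDE = 3200;
    static final long MAX_PIXELS = 10_000_000L; // 10 megapixels

    // Returns true when both sides and the total pixel count are within limits.
    static boolean isAcceptable(int width, int height) {
        boolean sidesOk = width >= MIN_SIDE && width <= MAX_SIDE
                && height >= MIN_SIDE && height <= MAX_SIDE;
        // Use long arithmetic so large dimensions cannot overflow an int.
        long pixels = (long) width * height;
        return sidesOk && pixels <= MAX_PIXELS;
    }
}
```

Note that the two rules interact: a 3200 x 3200 image satisfies the side limit but is about 10.2 megapixels, so this check rejects it. On Android, the width and height could come from `Bitmap.getWidth()` and `Bitmap.getHeight()`.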
The accuracy of text recognition depends on the quality of the image. An inaccurate reading may be caused by the following situations:
• Blurry images
• Handwritten or cursive text
• Artistic font styles
• Small text size
• Complex backgrounds, shadows, or glare over text or perspective distortion
• Over-sized or missing capital letters at the beginnings of words
• Subscript, superscript, or strike-through text
On photos where text is dominant, false positives may come from partially recognized words.
On some photos, especially photos without any text, precision can vary a lot depending on the type of image.
1. Go to the following link to get a key for the Microsoft Computer Vision API.
Microsoft computer vision API
Log in with your GitHub, Microsoft or Facebook credentials (any one of them) and get the key.
Save both the API key and the endpoint you will send requests to.
2. Go to the following link to download the Computer Vision API SDK for Android.
Microsoft computer vision API SDK
Click on Android; a GitHub page will open.
Download the code and open it in Android Studio.
3. Replace the subscription key placeholder in the app’s string resources with the key you got earlier:
<string name="subscription_key">YOUR API KEY</string>
4. This step is very important, so please pay attention to it.
The endpoint that you got along with the API key needs to be passed to the VisionServiceRestClient constructor.
client = new VisionServiceRestClient(getString(R.string.subscription_key), "YOUR API END POINT");
In your case it may be one of the following.
That’s it, you are done. You can use the existing source code as it is, or play with the code and modify it for your use case.
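If you are curious why the endpoint matters so much, it helps to see how it turns into a request URL. As far as I know, the OCR call in the v1.0 REST API is a POST to the endpoint followed by `/ocr`, with `language` and `detectOrientation` query parameters and the key sent in an `Ocp-Apim-Subscription-Key` header. Treat those details as assumptions and confirm them against the current API reference. `OcrRequestUrl` is a hypothetical helper of mine, not something from the SDK.

```java
// Hypothetical helper: shows how the saved endpoint becomes an OCR request URL.
// The "/ocr" path and both query parameter names are assumptions to verify.
final class OcrRequestUrl {
    // Pass "unk" as the language to ask the service to detect it automatically.
    static String build(String endpoint, String language, boolean detectOrientation) {
        // Drop a trailing slash so the result never contains "//ocr".
        String base = endpoint.endsWith("/")
                ? endpoint.substring(0, endpoint.length() - 1)
                : endpoint;
        // Language codes such as "en" or "unk" need no URL encoding.
        return base + "/ocr?language=" + language
                + "&detectOrientation=" + detectOrientation;
    }
}
```

A wrong endpoint produces a URL for a region where your key is not valid, which is why the call fails even though the key itself is correct.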
Optical character recognition (OCR) software provides the ability to convert scanned documents and images into editable and searchable documents in a variety of output formats.
1. Accounts payable invoice processing
It’s always hard to justify paying an employee for something that can be done just as well automatically.
Manually entering data from electronic or paper invoices into accounts payable records is one of those repetitive tasks that’s ripe for an efficiency improvement. If you are only creating payable records for one or two invoices a day, it’s probably not taking too much time. But if you are closer to double digits in terms of invoice processing, that amount of data entry is creating a significant overhead labor expense that’s unnecessarily cutting into your profitability.
OCR can help streamline the processing of invoices for payables record creation. A successful OCR-enabled invoice processing system requires a couple of key conditions. First, there needs to be a significant volume of invoices from repeated vendors or suppliers, as the invoice processing software needs to be configured to interpret specific invoice formats. Second, if your accounts payable module doesn’t natively support OCR document management, you’ll need to use your AP system’s API to import data from the OCR application.
If these conditions apply, and OCR makes sense for your business, you likely have an opportunity to realize significant labor-based cost savings through a more automated and efficient accounts payable process.
2. Expense reporting and auditing
The people who study this sort of thing (the financial advisory firm Stout Risius Ross, in this case) say that fraudulent expense reimbursement costs businesses over $1B annually. Yikes.
A CNN article cataloged a number of the expense reporting fraud schemes that employees can use:
…Getting the cab drivers to give them blank receipts, asking for double receipts at hotels and restaurants, masking one transaction as another, using cash to buy something and getting a blank receipt and putting in for more than the transaction it was. There are an endless number of possibilities here.
An OCR-based expense management reporting system can help defeat this type of expense management fraud in a couple of key ways. First, mobile OCR client software can allow for policies that require real-time capture of receipts and other expense documentation. Adding this layer of transparency can dissuade potential fraud attempts. Second, auditing expense reports is a critical step in detecting and preventing fraud. OCR increases the ease and depth with which audits can be conducted, allowing auditors to easily search expense documents for particular transaction details.
3. Business card recognition
Anyone who has ever attended a conference or trade show knows the pain of entering all those new contacts into a CRM or contact management system.
A capable OCR business card recognition app can eliminate this pain point, while helping to make you the first to follow up with all those new contacts.
Optical character recognition software often struggles with assigning semantic meaning to data — especially when the data originates in previously unseen 3rd party documents. This challenge is much easier to overcome in the case of business cards, though. The reason is that the possible range of semantic meanings from business cards is so much more limited than other types of business documentation.
Because of the easy time savings provided by OCR business card recognition, apps that support it are becoming increasingly common. LinkedIn CardMunch and Evernote Hello are two of the more popular choices.
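Because a business card only carries a handful of field types, even simple pattern matching over the OCR text can go a long way. The sketch below is illustrative only: `BusinessCardFields` is a hypothetical helper, and the two patterns are deliberately simple assumptions rather than rules any of the apps above are known to use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls common business card fields out of OCR text.
final class BusinessCardFields {
    // Deliberately simple patterns; real cards will need more tolerant rules.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PHONE =
            Pattern.compile("\\+?[0-9][0-9 ()-]{6,}[0-9]");

    static String firstEmail(String ocrText) {
        return firstMatch(EMAIL, ocrText);
    }

    static String firstPhone(String ocrText) {
        return firstMatch(PHONE, ocrText);
    }

    // Returns the first match, or null when the pattern does not occur.
    private static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

Names and job titles are harder to pick out this way, which is where dedicated card-reading apps earn their keep.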
4. Preserving meeting notes
You know you’re studiously taking notes on their unique requirements, but for all your client knows, you’re text messaging co-workers an invitation to meet later for happy hour.
Using a stylus to hand write notes or simply relying on good old pen and paper solves that problem, but you lose the search-ability of your notes. OCR apps for handwriting provide a way to get it back.
Recognizing characters in handwritten notes is considerably more complex than recognizing standard printed text. In fact, it has even earned its own term: intelligent character recognition (ICR). ICR software providers still tend to talk about the accuracy of their technology in more glowing terms than customers do, but the end users are starting to come around. A recent Reddit thread discussing handwriting-capable OCR tools included detailed descriptions of a variety of technical solutions with reports of varying levels of accuracy. One popular app was described as handling “handwritten text very nicely for me, even when the handwriting is pretty bad.”
The reality is that working with ICR solutions will require some manual review of notes if 100% accuracy is your goal. But for many note-taking tasks 100% accuracy isn’t a necessity. At minimum, handwriting ICR can provide the benefit of allowing search functions to find the note you are looking for, while allowing you to access the original document to get the full details you need.
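Search can still work when a few characters are misread, as long as it tolerates small differences. The sketch below shows one way to do that, using edit distance between the query and each recognized word. `NoteSearch` is a hypothetical helper of mine, not something a particular ICR product is known to use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: finds notes whose recognized text contains a word
// close to the query, tolerating a few misread characters.
final class NoteSearch {
    // Classic two-row Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the indexes of notes containing a word within maxEdits of query.
    static List<Integer> search(List<String> noteTexts, String query, int maxEdits) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < noteTexts.size(); i++) {
            String[] words = noteTexts.get(i).toLowerCase(Locale.ROOT).split("\\W+");
            for (String word : words) {
                if (!word.isEmpty() && editDistance(word, needle) <= maxEdits) {
                    hits.add(i);
                    break; // one matching word is enough for this note
                }
            }
        }
        return hits;
    }
}
```

The returned indexes point back to the original notes, so you can open the handwritten page itself for the full details.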
5. Importing application and form submissions
These days a lot of customer generated data is collected online and in the form of natively searchable documents. But there’s still quite a bit of data that’s collected that isn’t immediately searchable.
If you are collecting information from customers in a highly structured format and have a large number of transactions associated with the data collection document, you likely can benefit from OCR. Some common examples include:
• Service sign-up forms,
• Loyalty program enrollments,
• Rental agreements, and
Manually re-entering data is time-consuming administrative work that isn’t really generating any direct value to your business. Data re-entry also provides an opportunity to introduce data errors and slow business process cycle times.
Replacing your manual processes can create an easy source of savings and free up employees for more productive work. It’s not necessarily easy though. Generally, you’ll need to integrate the OCR program with the relevant business software system that supports the application. In many cases that will require working with your business software support provider to create the custom integration. You’ll need to weigh the investment versus the likely returns, but if the activity is repetitive enough you may find yourself able to generate significant savings.
I have created an app with the Microsoft Computer Vision API; below are the UI screens.
We can use this app as a WiFi password scanner, because passwords are hard to remember and typing a long one can be irritating. You can also use it as a business card reader, or for any of the other purposes I mentioned earlier.
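For the WiFi password idea, the recognized text still has to be searched for the password itself. The sketch below is not the code from my app. It is a hypothetical `WifiPasswordFinder` that assumes the password is printed on a line with a label such as "Password:" or "Key=".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: looks for a "Password: ..." style line in OCR output.
final class WifiPasswordFinder {
    // Accepts labels such as "Password", "Pass", "PWD" or "Key", optionally
    // preceded by "WiFi" or "Wi-Fi", then ':' or '=' and the value.
    private static final Pattern LABELLED = Pattern.compile(
            "(?im)^\\s*(?:wi-?fi\\s+)?(?:password|pass|pwd|key)\\s*[:=]\\s*(\\S+)\\s*$");

    // Returns the text after the label, or null if no labelled line is found.
    static String find(String ocrText) {
        Matcher m = LABELLED.matcher(ocrText);
        return m.find() ? m.group(1) : null;
    }
}
```

Passwords containing spaces, or printed without any label, would need a different rule, so treat this as a starting point only.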
The source code of this app is available on my GitHub page; the link is below. Go and check it out and enjoy the OCR. If the key in the code is no longer alive, replace it with your own by following the earlier instructions.
BestOCR Code on GitHub
OCR is here to stay. I hope this was helpful in getting you started with the concept of OCR. Let me know if you have any questions or suggestions. I am a comment away.