Optical Character Recognition
Recognizing Text to Labels on an Android Platform
By Sourav Soni, Prolatech
OCR is not a new technology; it has been around for years.
It has recently become popular again because it combines well with other technologies.
From scanner apps to live language translation apps,
OCR gives developers a way to build new use cases by integrating it.
						
As multinational corporations (MNCs) and other leading companies invest time and effort in making OCR more
effective, start-ups have joined the race.
They have entered the market with strong use cases to showcase their own OCR capabilities.
						
My OCR journey started with a scanner app.
From the very first use I saw that Optical Character Recognition was capable enough to
detect multiple languages,
and I decided to take a deep dive into the OCR world.
					
Optical Character Recognition (OCR)
detects text in an image and extracts the recognized words into a machine-readable
character stream.
It lets you analyse images to detect embedded text, generate character streams and enable
searching.
You can take photos of text instead of copying it, which saves time and effort.
									
I explored OCR offerings from various MNCs and start-ups, such as Google and Microsoft,
as well as Tesseract and many more.
I built an Android app on each OCR engine to check its feasibility
and accuracy.
I tested these APIs in different lighting conditions and found that
many of them failed without proper lighting and surroundings;
sometimes they even returned a different language
altogether while scanning English text.
In the end I found one OCR that works in dim light and still
extracts all the content from the image:
the Microsoft Computer Vision API.
								
 
Microsoft provides an OCR API, the Microsoft Computer Vision API, that recognizes text accurately. The API is free up to a limit: the trial provides 5,000 transactions, at up to 20 per minute, for 30 days, which I think is enough to develop and test a use case. After that we can move to the paid version, which is also quite cheap.
OCR technology detects text content in an image and extracts the
								identified text into a
								machine-readable character stream. You can use the result for search and numerous other
								purposes like medical records,
								security, and banking. It automatically detects the language. OCR saves time and
								provides convenience for users by
								allowing them to take photos of text instead of transcribing the text.
								
								OCR supports 25 languages. These languages are: Arabic, Chinese Simplified, Chinese
								Traditional, Czech, Danish,
								Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean,
								Norwegian, Polish,
								Portuguese, Romanian, Russian, Serbian (Cyrillic and Latin), Slovak, Spanish, Swedish,
								and Turkish.
								
If needed, OCR corrects the rotation of the recognized text, in degrees,
around the horizontal image axis. OCR also provides the frame coordinates of each word, as
seen in the illustration below.
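
To make the frame coordinates concrete, here is a minimal Java sketch. It assumes that the v1.0 OCR response reports each word's frame as a comma-separated string of left, top, width and height in pixels (for example "462,379,41,73"). The class WordFrame and its methods are hypothetical helpers for illustration and are not part of the Microsoft SDK.

```java
// Hypothetical helper, not part of the Microsoft SDK.
// Assumption: a frame arrives as the string "left,top,width,height" in pixels.
final class WordFrame {
    final int left;
    final int top;
    final int width;
    final int height;

    WordFrame(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    // Parses "left,top,width,height".
    // Throws IllegalArgumentException if the string is malformed.
    static WordFrame parse(String boundingBox) {
        String[] parts = boundingBox.trim().split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 values: " + boundingBox);
        }
        int[] v = new int[4];
        for (int i = 0; i < 4; i++) {
            // NumberFormatException is a subclass of IllegalArgumentException.
            v[i] = Integer.parseInt(parts[i].trim());
        }
        return new WordFrame(v[0], v[1], v[2], v[3]);
    }

    // Right and bottom edges, handy when drawing a rectangle over the photo.
    int right() { return left + width; }
    int bottom() { return top + height; }
}
```

With a frame parsed this way, an app could draw a rectangle over each recognized word on top of the captured photo.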
							
 
							
								• The size of the input image must be between 40 x 40 and 3200 x 3200 pixels
								• The image cannot be bigger than 10 megapixels
								The input image can be rotated by any multiple of 90 degrees plus a small angle of up to 40
								degrees.
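
The size limits above can be checked on the device before an image is uploaded. The following is a minimal Java sketch; the class name OcrImageLimits is hypothetical, and it assumes that "10 megapixels" means 10,000,000 pixels. Note that a 3200 x 3200 image has 10,240,000 pixels, so under that assumption it passes the side-length rule but fails the megapixel rule.

```java
// Hypothetical pre-upload check for the documented input limits.
final class OcrImageLimits {
    static final int MIN_SIDE = 40;
    static final int MAX_SIDE = 3200;
    // Assumption: "10 megapixels" is read as 10 million pixels.
    static final long MAX_PIXELS = 10_000_000L;

    // Returns true when both the side-length and the total-pixel limits hold.
    static boolean isAcceptable(int width, int height) {
        boolean sidesOk = width >= MIN_SIDE && width <= MAX_SIDE
                && height >= MIN_SIDE && height <= MAX_SIDE;
        long pixels = (long) width * height; // long avoids int overflow
        return sidesOk && pixels <= MAX_PIXELS;
    }
}
```

On Android, the width and height can be read without loading the whole bitmap by setting inJustDecodeBounds on BitmapFactory.Options and reading outWidth and outHeight after decoding.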
								
								
								The accuracy of text recognition depends on the quality of the image. An inaccurate
									reading may be caused by the following situations:
								• Blurry images
								• Handwritten or cursive text
								• Artistic font styles
								• Small text size
								• Complex backgrounds, shadows, or glare over text or perspective distortion
								• Over-sized or missing capital letters at the beginnings of words
								• Subscript, superscript, or strike-through text
							
								On photos where text is dominant, false positives may come from partially recognized
									words.
									On some photos, especially photos without any text, precision can vary a lot
									depending on the type of image.
							
1. Go to the following link to get a key for the Microsoft Computer Vision API.
Microsoft computer vision API
Log in with your GitHub, Microsoft, or Facebook credentials and get the
key.
Save both the API key and the endpoint you will call.
								
2. Go to the following link to download the Computer Vision API SDK for Android.
Microsoft computer vision API SDK
Click on Android; a GitHub page will open.
Download the code and open it in Android Studio.
								
3. Put the subscription key you got earlier into
res/values/strings.xml:

<string name="subscription_key">YOUR API KEY</string>
									
4. This step is very important, so please pay attention to it.

Pass the endpoint that you got with the API key to the
VisionServiceRestClient constructor.

client = new VisionServiceRestClient(getString(R.string.subscription_key), "YOUR API END POINT");
									Example:
									https://westcentralus.api.cognitive.microsoft.com/vision/v1.0
									In your case it may be one of the following.
									https://westus.api.cognitive.microsoft.com/vision/v1.0
									https://westus2.api.cognitive.microsoft.com/vision/v1.0
									https://eastus.api.cognitive.microsoft.com/vision/v1.0
									https://eastus2.api.cognitive.microsoft.com/vision/v1.0
									https://southcentralus.api.cognitive.microsoft.com/vision/v1.0
									https://westeurope.api.cognitive.microsoft.com/vision/v1.0
									https://northeurope.api.cognitive.microsoft.com/vision/v1.0
									https://southeastasia.api.cognitive.microsoft.com/vision/v1.0
									https://eastasia.api.cognitive.microsoft.com/vision/v1.0
									https://australiaeast.api.cognitive.microsoft.com/vision/v1.0
									https://brazilsouth.api.cognitive.microsoft.com/vision/v1.0
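
The endpoints listed above all follow the same pattern, so the URL can be derived from the region name. The Java sketch below shows this; the class OcrEndpoint and both methods are hypothetical. The ocrUrl method additionally assumes that the v1.0 REST API exposes OCR at the path /ocr with language and detectOrientation query parameters, which only matters if you call the REST API directly instead of going through the SDK.

```java
// Hypothetical helper for building endpoint URLs; not part of the SDK.
final class OcrEndpoint {
    // Builds the base endpoint for a region such as "westcentralus",
    // following the pattern of the endpoints listed in this article.
    static String forRegion(String region) {
        String r = region.trim().toLowerCase(java.util.Locale.ROOT);
        if (!r.matches("[a-z0-9]+")) {
            throw new IllegalArgumentException("Unexpected region name: " + region);
        }
        return "https://" + r + ".api.cognitive.microsoft.com/vision/v1.0";
    }

    // Assumption: the OCR operation lives at {endpoint}/ocr and accepts
    // "language" (for example "unk" for auto-detect) and "detectOrientation".
    static String ocrUrl(String endpoint, String language, boolean detectOrientation) {
        String base = endpoint.endsWith("/")
                ? endpoint.substring(0, endpoint.length() - 1)
                : endpoint;
        return base + "/ocr?language=" + language
                + "&detectOrientation=" + detectOrientation;
    }
}
```

When using the SDK, only the result of forRegion would be passed as the second argument to the VisionServiceRestClient constructor. Always use the region that was issued with your key, because a key is tied to its region.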
									
                                
That’s it, you are done. You can use the existing source code as is, or play
with it and modify it for your use case.

Optical character recognition (OCR) software provides the ability to
convert scanned documents and
images into editable and searchable documents in a variety of output
formats.
								
1. Accounts payable invoice processing
 
								
									It’s always hard to justify paying an employee for something that can be done just
									as well automatically.
									Manually entering data from electronic or paper invoices into accounts payable
									records is one of those
									repetitive tasks that’s ripe for an efficiency improvement. If you are only creating
									payable records for one or two
									invoices a day, it’s probably not taking too much time. But if you are closer to
									double digits in terms of invoice processing,
									that amount of data entry is creating a significant overhead labor expense that’s
									unnecessarily cutting into your profitability.
									
									OCR can help streamline the processing of invoices for payables record creation.
									A successful OCR-enabled invoice processing system requires a couple of key conditions.
									First, there needs to be a significant volume of invoices from repeated vendors or
									suppliers,
									as the invoice processing software needs to be configured to interpret specific
									invoice formats.
									Second, if your accounts payable module doesn’t natively support OCR document
									management, you’ll
									need to use your AP system’s API to import data from the OCR application.
									
									If these conditions apply, and OCR makes sense for your business,
									you likely have an opportunity to realize significant labor-based cost savings
									through a more automated and efficient accounts payable process.
								
2. Expense reporting and auditing
 
								
									The people who study this sort of thing (the financial advisory firm,
									Stout Risius Ross, in this case)
									say that fraudulent expense reimbursement costs businesses over $1B annually. Yikes.
									
									A CNN article cataloged a number of the expense reporting
									fraud schemes that employees can use:
									
									…Getting the cab drivers to give them blank receipts,
									asking for double receipts at hotels and restaurants, masking one transaction as
									another,
									using cash to buy something and getting a blank receipt and putting in for more than
									the transaction it was.
									There are an endless number of possibilities here.
									
									An OCR based expense management reporting system can help defeat this type of
									expense management
									fraud in a couple of key ways. First, mobile-device based OCR client software can
									allow for policies that
									require real-time capture of receipts and other expense documentation.
									Adding this layer of transparency can dissuade potential fraud attempts.
									Second, auditing expense reports is a critical step in detecting and preventing
									fraud.
									OCR increases the ease and depth with which audits can be conducted, allowing
									auditors to easily
									search expense documents for particular transaction details.
								
3. Business card recognition
 
								
									Anyone who has ever attended a conference or trade show knows the pain of entering
									all those new contacts into a CRM or contact management system.
									
									A capable OCR business card recognition app can eliminate this pain point, while
									helping to make you the first to follow up with all those new contacts.
									
									Optical character recognition software often struggles with assigning semantic
									meaning to data —
									especially when the data originates in previously unseen 3rd party documents.
									This challenge is much easier to overcome in the case of business cards, though.
									The reason is that the possible range of semantic meanings from business cards is so
									much more limited than other types of business documentation.
									
									Because of the easy time saving provided by OCR business card recognition,
									apps that support it are becoming increasingly common. LinkedIn CardMunch and
									Evernote Hello are two of the more popular choices.
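
Because a business card holds only a few kinds of fields, even simple pattern matching on each recognized line goes a long way. The following Java sketch illustrates the idea; the class CardFieldGuesser, its Field categories and its regular expressions are illustrative assumptions, not how any of the apps mentioned above work. It only matches lines that consist entirely of one field, so a line such as "Tel: 555 0100" would fall through to OTHER.

```java
// Hypothetical line classifier for OCR output from a business card.
final class CardFieldGuesser {
    enum Field { EMAIL, PHONE, WEBSITE, OTHER }

    // Guesses what a single recognized line contains.
    static Field guess(String line) {
        String s = line.trim();
        // name@domain.tld
        if (s.matches("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+")) {
            return Field.EMAIL;
        }
        // Starts with http://, https:// or www.
        if (s.matches("(?i)(https?://|www\\.)\\S+")) {
            return Field.WEBSITE;
        }
        // Optional "+", then digits and common separators, at least 7 digits.
        if (s.matches("\\+?[0-9 ().-]{7,}")
                && s.replaceAll("\\D", "").length() >= 7) {
            return Field.PHONE;
        }
        return Field.OTHER;
    }
}
```

Lines classified as OTHER still need further handling, for example to tell a person's name from a company name, which is where the harder semantic work remains.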
								
4. Preserving meeting notes
 
								
									You know you’re studiously taking notes on their unique requirements,
									but for all your client knows, you’re text messaging co-workers an invitation to
									meet later for happy hour.
									Using a stylus to hand write notes or simply relying on good old pen and paper
									solves that problem —
									but you lose the search-ability of your notes. OCR apps for handwriting provide a
									solution.
									
									Recognizing characters in handwritten notes is considerably more complex than
									standard printed text.
									In fact, it’s even garnered its own term: intelligent character recognition (ICR).
									ICR software providers still tend to talk about the accuracy of their technology in
									more glowing terms than customers do,
									but the end users are starting to come around.
									A recent reddit thread discussing handwriting capable OCR tools included detailed
									descriptions of a variety of technical
									solutions with reports of varying levels of accuracy. One popular app was described
									as handling “handwritten text
									very nicely for me, even when the handwriting is pretty bad.”
									
									The reality is that working with ICR solutions will require some manual review of
									notes if 100%
									accuracy is your goal. But for many note-taking tasks 100% accuracy isn’t a
									necessity.
									At minimum a handwriting ICR can provide the benefit of allowing search functions to
									find the note you are looking for,
									while allowing you to access the original document to get the full details you need.
								
5. Importing application and form submissions
 
								
									These days a lot of customer-generated data is collected online and in the form of
									other
									natively searchable documents. But there’s still quite a bit of data that’s
									collected that isn’t immediately searchable.
									
									If you are collecting information from customers in a highly structured format and
									have a large number
									of transactions associated with the data collection document, you likely can benefit
									from OCR. Some common examples include:
									
									• Service sign-up forms,
									• Loyalty program enrollments,
									• Waivers,
									• Rental agreements, and
									• Applications.
									Manually re-entering data is time-consuming administrative work that isn’t really
									generating any direct
									value to your business. Data re-entry also provides an opportunity to introduce data
									errors and slow business process cycle times.
									
									Replacing your manual processes can create an easy source of savings and free up
									employees for more productive work. It’s not necessarily easy though.
									Generally, you’ll need to integrate the OCR program with the relevant business
									software system that supports the application.
									In many cases that will require working with your business software support provider
									to create the custom integration.
									You’ll need to weigh the investment versus the likely returns, but if the activity
									is repetitive enough you may find yourself able to generate significant savings.
									
I have created an app with the Microsoft Computer Vision API; the UI screens are
shown below.
								
 
								 
								
We can use this app as a Wi-Fi password scanner, because passwords are hard
to remember and typing a long one can be irritating.
You can also use it as a business card reader or for any of the other purposes
I mentioned earlier.
The source code of this app is available on my GitHub page; the link is below.
Go check it out and enjoy the OCR,
provided the key is still alive. If it is not, replace it by following the earlier
instructions.
									
									BestOCR Code on
										GitHub
									
OCR is here to stay. I hope this helps you get started with
OCR.
Let me know if you have any questions or suggestions. I am just a comment away.
									
									Happy Coding.