OCR is a mechanism for reading data from identity documents.
ID cards are essential to validate our data and our own identity. If they did not exist, we would not have the right to identity, nor would we be able to prove who we are or identify ourselves when carrying out some procedures.
Their importance is considerable in such procedures as they are public, official, personal, and non-transferable papers. Despite their relevance, these documents have not always been the same, and like many others, Spanish ID has undergone different changes to adapt to the new era. Among these changes, we find the introduction of the OCR system.
You may know almost nothing about OCR technology. That is why we have gathered everything you need to know about it in this post: what OCR is, how it works, its applications, and how Mobbeel makes use of it through MobbScan.
What is OCR technology and what does its acronym mean?
The acronym OCR stands for Optical Character Recognition. Before looking at what optical character recognition is, it helps to break the acronym down into its three words:
- Optical: people use their eyes and brain connections to recognise images and read documents. Computers, by contrast, use a scanner or camera to capture documents and images, which they treat as a simple set of pixels.
- Character: characters are units of information that correspond to symbols or graphemes. In other words, they are the compositions of pixels, curves, and lines that form the written digits and the letters of the alphabet.
- Recognition: character recognition takes place after the optical scanner digitises the image. Once the characters have been scanned, the OCR software identifies the letters and digits in the image and converts them into words.
You may already have an idea of what OCR is, but it is worth knowing a little more about its meaning. OCR is a technique that recognises characters in written texts and images and transcribes them into digital format.
However, the system must have learned and internalised the characters that it has to recognise in advance for this recognition to occur. In other words, this system analyses documents and images in different media and formats and recognises characters in them that match the information it has stored.
How does the OCR system work?
Once you know what OCR software is, it is time to learn how it works. The character recognition system has two stages:
- Digital image processing modifies the input image to remove every element that can affect character recognition. It involves a thresholding process (to convert the image to binary), cleaning and noise removal, and morphological transformations to improve the layout, especially in the case of handwriting recognition.
- Classification consists of applying recognition techniques. There are different approaches to character classification: some are very simple and rely on comparison using geometric or statistical methods, while others are more complex and use the latest machine learning techniques.
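The two stages above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how MobbScan works: the image is a small grid of grey levels, the threshold is fixed at 128, noise removal only drops isolated pixels, and classification is the simple comparison approach (counting differing pixels against stored templates). All names, including `TEMPLATES`, are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]   # grey levels, 0 (black) to 255 (white)
Bitmap = List[List[int]]  # 1 = ink, 0 = background

# Hypothetical 4x4 reference shapes the system has "learned" in advance.
TEMPLATES: Dict[str, Bitmap] = {
    "I": [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]],
}


def binarise(image: Image, threshold: int = 128) -> Bitmap:
    """Thresholding: pixels darker than the threshold count as ink."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def remove_isolated_pixels(bitmap: Bitmap) -> Bitmap:
    """Noise removal: drop ink pixels that have no ink neighbour."""
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            neighbours = sum(
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


def classify(bitmap: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Classification by comparison: choose the template with the fewest differing pixels."""
    def distance(a: Bitmap, b: Bitmap) -> int:
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

    return min(templates, key=lambda label: distance(bitmap, templates[label]))


# A scanned "L" with one dark speck of noise in the top-right corner.
scan = [
    [20, 240, 240, 30],
    [20, 240, 240, 240],
    [20, 240, 240, 240],
    [20, 20, 20, 240],
]
clean = remove_isolated_pixels(binarise(scan))
print(classify(clean, TEMPLATES))  # L
```

A production system would normally replace the fixed threshold with an adaptive one and the template comparison with a trained classifier, but the order of the steps stays the same.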
Advantages of using OCR technology for identity verification
OCR software provides several benefits to identity verification. The most important are the following:
- Error reduction: by automating data entry, OCR reduces the errors of manual entry, ensuring a high level of reliability in identity verification.
- Efficiency: the technology speeds up the verification process, allowing IDs to be processed quickly and efficiently. This enables companies to verify more identities in less time.
- Better user experience: the system only requires users to take a photo of their ID with their mobile device and send it for verification. Therefore, there is no need to complete never-ending forms, improving the UX significantly.
- Higher level of security: it verifies the authenticity of the identity document, reducing the risk of fraud and protecting organisations and users from security threats.
OCR applications to digitalise identity documents
Your company can use OCR technology for different purposes, chiefly data extraction and verification. Here are the most representative use cases when it comes to identity documents:
Digitisation of identity documents
Many companies carry out campaigns to update the ID cards of their clients. The OCR system facilitates the digitisation process: documents are scanned through the web and validated, and their information is extracted by OCR quickly and efficiently, saving time and effort.
Age verification
Underage people cannot access online betting sites. Online gaming vendors have to check that participants are over eighteen years old by verifying and validating the identity of users during registration. To carry out this process, the user’s ID card is scanned and its data is extracted using the OCR system.
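Once the date of birth has been extracted from the document, the age check itself is simple arithmetic. The sketch below assumes the date has already been parsed into a `date` object; the function names and the default minimum age of eighteen are illustrative and do not come from any MobbScan API.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Return the number of full years completed on the given day."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_of_age(birth_date: date, today: date, minimum_age: int = 18) -> bool:
    """Check whether the holder has reached the minimum age."""
    return age_on(birth_date, today) >= minimum_age


print(is_of_age(date(2006, 5, 20), date(2024, 5, 19)))  # False
print(is_of_age(date(2006, 5, 20), date(2024, 5, 20)))  # True
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.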
Extraction of meta-information from an ID card automatically
Given a scanned document or an image of a valid identity document, OCR can extract all of its information fields, along with the photo it contains.
Some of our clients send scanned identity documents to the MobbScan API, which crops the image of the ID card and extracts the information from the document by OCR.
What kinds of documents are verifiable through OCR automatically?
According to ICAO Doc 9303, there are three standard document sizes in which data is encoded to be read by an OCR system.
Size 1 Travel Document (TD1)
TD1 is used mainly for identity cards. Space on this document is limited, so the MRZ is moved to the back. For this reason, both the front and the back of the document must be captured in order to extract the important information and validate it. The MRZ in the TD1 has three lines of thirty characters each. In addition, each country can add custom content to this area.
Size 2 Travel Document (TD2)
The TD2 document is larger than TD1 (105 × 74 mm against 85.6 × 54 mm). The MRZ can be found on the front, so scanning that side is enough to verify the document. The MRZ in the TD2 spans two lines of thirty-six characters each.
Size 3 Travel Document (TD3)
The TD3 document is the format used in most passports. Its MRZ sits at the bottom of the passport data page. This enables solutions such as MobbScan to speed up passport control and data extraction, as only one page needs to be processed. The MRZ has two lines of forty-four characters each.
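Because each format has a distinct MRZ shape in ICAO Doc 9303 (TD1: 3 lines of 30 characters, TD2: 2 lines of 36, TD3: 2 lines of 44), the document type can be inferred from the recognised lines alone. The following is a minimal sketch; `MRZ_LAYOUTS` and `detect_mrz_format` are hypothetical names chosen for illustration.

```python
from typing import List, Optional

# (number of lines, characters per line) for each ICAO Doc 9303 format.
MRZ_LAYOUTS = {
    "TD1": (3, 30),
    "TD2": (2, 36),
    "TD3": (2, 44),
}


def detect_mrz_format(lines: List[str]) -> Optional[str]:
    """Return "TD1", "TD2" or "TD3", or None if the shape matches no format."""
    cleaned = [line.strip() for line in lines if line.strip()]
    for name, (line_count, line_length) in MRZ_LAYOUTS.items():
        if len(cleaned) == line_count and all(
            len(line) == line_length for line in cleaned
        ):
            return name
    return None


# Specimen-style passport lines, padded with the MRZ filler character "<".
passport = [
    "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<"),
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
]
print(detect_mrz_format(passport))  # TD3
```

A shape check like this is only a first filter: OCR can drop or duplicate a character, so a real pipeline would combine it with the check digits before trusting the result.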
How does MobbScan extract personal information through OCR?
MobbScan uses optical character scanning to extract all the data that an identification document contains, in order to optimise identity validation.
Mobbeel’s advanced technology scans the document, detecting and reading the information in the machine-readable zone or MRZ. After this, it is decoded and converted into user-readable information.
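To illustrate what decoding the MRZ into user-readable information involves, here is a minimal sketch for the first line of a TD3 (passport) MRZ. It follows the field positions in ICAO Doc 9303 (document code, issuing state, then the name with `<<` separating surname from given names). The function name and the returned keys are assumptions made for this example, not MobbScan output.

```python
from typing import Dict


def parse_td3_name_line(line: str) -> Dict[str, str]:
    """Decode the first line of a TD3 MRZ into readable fields."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line must contain 44 characters")
    # "<<" separates the surname from the given names; "<" stands for a space.
    surname_part, _, given_part = line[5:].partition("<<")
    return {
        "document_type": line[0:2].replace("<", ""),
        "issuing_state": line[2:5].replace("<", ""),
        "surname": " ".join(surname_part.replace("<", " ").split()),
        "given_names": " ".join(given_part.replace("<", " ").split()),
    }


line = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
print(parse_td3_name_line(line))
# {'document_type': 'P', 'issuing_state': 'UTO', 'surname': 'ERIKSSON', 'given_names': 'ANNA MARIA'}
```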
The digital scanning can be carried out in two ways depending on the needs and demands of the client:
- Exclusive MRZ scanning extracts only the information included in the machine-readable zone of the ID card or passport being processed. The MRZ contains all the basic data of a person (name, date of birth, expiration date, issuing country, document number, etc.) and also includes several check digits to validate that the extracted data is correct and has not been manipulated. For this to work, the document must comply with the international standard ICAO 9303.
- The full scan of the official identification document allows additional information to be extracted, such as the address and the issuing equipment. This type of scanning also makes it possible to verify that the data on both sides of the document match.
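The check digits mentioned above follow the scheme in ICAO Doc 9303: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` is 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below validates three fields on the second line of a TD3 MRZ. The function names are illustrative, and the composite check digit at the end of the line is left out for brevity.

```python
from typing import Dict


def char_value(char: str) -> int:
    """Map an MRZ character to its numeric value."""
    if "0" <= char <= "9":
        return int(char)
    if "A" <= char <= "Z":
        return ord(char) - ord("A") + 10
    if char == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {char!r}")


def check_digit(field: str) -> int:
    """Compute the check digit with the repeating weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


def validate_td3_line2(line: str) -> Dict[str, bool]:
    """Compare the stored check digits of a TD3 second line with computed ones."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line must contain 44 characters")
    # (start, end) of each field; its check digit sits at index `end`.
    fields = {
        "document_number": (0, 9),
        "date_of_birth": (13, 19),
        "expiry_date": (21, 27),
    }
    return {
        name: line[end].isdigit() and check_digit(line[start:end]) == int(line[end])
        for name, (start, end) in fields.items()
    }


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(validate_td3_line2(specimen))
# {'document_number': True, 'date_of_birth': True, 'expiry_date': True}
```

If a single character of a protected field is misread or altered, the recomputed digit will normally no longer match, which is how a mismatch flags either an OCR error or a manipulated document.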
Mobbeel’s OCR technology works reliably and accurately to comply with the Know Your Customer regulations both with documents that comply with ICAO 9303 (travel documents) and with other documents that do not comply with this standard, such as the EU driving license.
If you want to know more about our OCR technology (MobbScan), feel free to contact us through our contact form.
I am a Computer Engineer who loves Marketing, Communication and companies’ internationalization, tasks I’m developing as CMO at Mobbeel. I am loads of things, some good, many bad… I’m perfectly imperfect.