OCR service on AWS | Textract
Recently I was involved in a project that used OCR, and the discussion was around offering OCR as a service. During that time I thought of giving Amazon Textract a try. Amazon Textract is a service that automatically extracts text and data from scanned documents. So I picked one of the handwritten slides on Aurora Serverless and ran it through the service. To be honest, the results were impressive.
Image I used to test the service:
Output from Textract:
It's not perfect, but who is!! This was pretty easy using the console: simply upload the document and run the analysis. But I was more interested in doing it in a serverless way, so I used Lambda to call the Textract API and S3 to store the document. I wrote the Python code below for the Lambda function.
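For reference, here is a minimal sketch of what such a Lambda function could look like. It is not necessarily the exact code I ran: it assumes the function is triggered by an S3 upload event and uses Textract's synchronous `detect_document_text` call, and the helper names `extract_lines` and `detect_text` are made up for illustration.

```python
import json
import urllib.parse


def extract_lines(response):
    """Return the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text(textract_client, bucket, key):
    """Ask Textract to read a document stored in S3 and return its lines."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised
    # locally with a stub client and no AWS credentials.
    import boto3

    # Assumption: the function is wired to an S3 "object created" event.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])

    lines = detect_text(boto3.client("textract"), bucket, key)
    return {"statusCode": 200, "body": json.dumps({"lines": lines})}
```

The function's execution role needs permission to read the object from S3 and to call Textract. The response also contains PAGE and WORD blocks; the sketch keeps only the LINE blocks, which is usually the most readable view of the text.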
But I was not able to run this with the default boto3 that ships with Lambda, since that version did not yet support Textract. To overcome this, I created a Lambda layer with the latest boto3 and attached it to the function. Here is the output of the Lambda function.
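If you want to confirm which boto3 your function is actually picking up (the bundled one or the one from the layer), a small check like the following may help. It is only a sketch: `supports_textract` and `describe_runtime` are hypothetical helper names, and the check relies on boto3's `Session.get_available_services()`, which lists the services the installed SDK knows about.

```python
def supports_textract(session):
    """True if the given boto3 session knows about the Textract service."""
    return "textract" in session.get_available_services()


def describe_runtime():
    """Report the boto3 version in use and whether it can talk to Textract."""
    import boto3  # imported lazily; resolves to the layer's copy if one is attached

    return {
        "boto3_version": boto3.__version__,
        "textract_available": supports_textract(boto3.session.Session()),
    }
```

Calling `describe_runtime()` from the handler and logging the result shows whether the layer is taking effect. For a Python layer, the packages need to sit under a top-level `python/` directory inside the layer zip so that Lambda adds them to the import path.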
Hope it was helpful.