Working with Azure Computer Vision APIs to read from Image Invoice/bills

Subhad365


Hey guys, wassup?!!

As we delve deeper into Azure AI services, more avenues for integration and automation come into the picture. Computer Vision APIs are, no wonder, among the most talked-about and celebrated Azure AI offerings: they can read the text embedded in an image, and they are a very efficient way of extracting text from images. You call the API with an image, and it returns all the detected text as JSON.
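For anyone curious what such a call looks like outside the portal, here is a minimal Python sketch. It assumes the v3.2 OCR REST route (`/vision/v3.2/ocr`) and key-based authentication; the endpoint and key shown are placeholders, and `build_ocr_call` and `read_image` are names made up for this illustration.

```python
import json
import urllib.parse
import urllib.request


def build_ocr_call(endpoint, api_key, language="unk"):
    """Return the URL and headers for an OCR call. Nothing is sent here."""
    query = urllib.parse.urlencode(
        {"language": language, "detectOrientation": "true"}
    )
    # Assumption: the v3.2 OCR route; adjust if your resource uses another version.
    url = endpoint.rstrip("/") + "/vision/v3.2/ocr?" + query
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",
    }
    return url, headers


def read_image(endpoint, api_key, image_path):
    """POST the raw image bytes and return the parsed JSON response."""
    url, headers = build_ocr_call(endpoint, api_key)
    with open(image_path, "rb") as image_file:
        body = image_file.read()
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and key, only to show the shape of the call.
url, headers = build_ocr_call(
    "https://example.cognitiveservices.azure.com/", "<your-key>"
)
```

Only `read_image` touches the network; `build_ocr_call` is kept separate so you can inspect the URL and headers before sending anything.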

Here comes the problem statement: picking one particular piece of text out of the image (for example, the total amount of an invoice or the salesperson on the transaction) can be tricky. You can write an Azure Function or a Python app to parse the JSON, but extracting specific values without writing a single line of code is harder. This article shows how to do that parsing with Azure Logic Apps.

Step 1: Create a Computer Vision resource: go to the Azure Portal, create a new resource >> Computer Vision, and fill out all the necessary details. I am naming the resource 'r-sales-invoice-rd':

I am also keeping the pricing tier as Free, as this is just for demo purposes. Click on Review + Create >> Create to complete the resource creation.

Step 2: Once the resource is created, jump into Vision Studio by clicking on 'Vision Studio':

This will open the Vision Studio page. Go to the 'Optical Character Recognition' tab:

Click on Try it out:

This lets you choose the file you want to read from. Remember to tick the highlighted checkbox and select the newly created resource:

This gives me the reading from a very simple page containing a quote from youth hero Swami Vivekananda:

Well, this was a very simple example. I will try to upload a proper invoice:

and let us see the response:

Whew, it detected all the text properly. Click on the JSON tab and you will see the coordinates of the individual pieces of text:

and a Confidence score:

indicating how close the detection algorithm thinks its reading is to the actual text.
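One practical use of that score is to flag words that deserve a second look. The sketch below assumes a simplified list of words, each with a `text` and a `confidence` between 0 and 1; the real JSON in Vision Studio is more deeply nested, and both the sample values and the 0.9 threshold are made up for illustration.

```python
def low_confidence_words(words, threshold=0.9):
    """Return the text of every word whose confidence is below the threshold."""
    return [word["text"] for word in words if word["confidence"] < threshold]


# Made-up sample: a simplified stand-in for the words in the JSON tab.
sample_words = [
    {"text": "Invoice", "confidence": 0.99},
    {"text": "T0tal", "confidence": 0.62},
    {"text": "450.00", "confidence": 0.97},
]

flagged = low_confidence_words(sample_words)
print(flagged)
```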

Alright, all set then. Now imagine this invoice image comes to me from my supplier and I have to get at its contents: I need to parse this JSON (which is really a very complex structure of objects nested within objects) and get details like the total invoice amount, currency, item details, etc. The file arrives at certain intervals in a storage container folder, and my Logic App is supposed to read it from there.

Step 3: For this, I am designing a Logic App that picks the file up from the container like this:

Next, let us call the OCR API of the resource we created above:

Fill out the small window that opens with details like:

a. Give a proper name: SalesInvoiceReader

b. Authentication Type: Leave it as it is.

c. Account Key: you can get the API key by going back to your Computer Vision resource and clicking on 'Manage Keys':

Copy the 'Key-1' value and paste it here.

d. Site URL: you can get this from the Endpoint value in the highlighted text:

Click on 'Create now' to finish.

Step 4: Coming back to your Logic App, set 'Image content' to the file content from the above step:

Step 5: The above step gives us a JSON response. Look at it very carefully. I use 'https://jsoneditoronline.org/#left=local.gofove&right=local.joqebe' to visualize my JSON responses:

Essentially, there is a 'Region' node with many lines under it, and our total value sits at the end of the last line:

We need to get this total value from here.
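To make that nesting concrete, here is a small Python sketch that walks it. The sample payload is made up and heavily trimmed; it assumes the OCR response shape of `regions`, each holding `lines`, each holding `words` with a `text` field.

```python
# Made-up, trimmed sample in the assumed regions -> lines -> words shape.
sample_response = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "20,20,400,300",
            "lines": [
                {"words": [{"text": "Invoice"}, {"text": "#1001"}]},
                {"words": [{"text": "Widget"}, {"text": "A"}, {"text": "150.00"}]},
                {"words": [{"text": "Total"}, {"text": "USD"}, {"text": "450.00"}]},
            ],
        }
    ],
}


def lines_of_text(response):
    """Flatten the response into one string per detected line."""
    result = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            result.append(" ".join(words))
    return result


text_lines = lines_of_text(sample_response)
print(text_lines[-1])  # the last line is where the total sits in this sample
```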

Step 6: Go back to your Logic App and add a 'Parse JSON' action:

Choose the output of the above step as the content, and for the input schema, you can copy the JSON output of any of the region nodes and paste it:
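If you wonder what a schema built from a sample looks like, the Python sketch below infers a very rough one from a pasted region. `infer_schema` is a hypothetical helper written only for this illustration; it looks only at the first element of each array, and the schema Logic Apps actually generates may differ in detail.

```python
def infer_schema(value):
    """Build a rough JSON-schema-like description of a sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: assume every element looks like the first one.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


# Made-up sample region, trimmed to one line with one word.
sample_region = {
    "boundingBox": "20,20,400,300",
    "lines": [{"boundingBox": "20,20,400,30", "words": [{"text": "Total"}]}],
}

region_schema = infer_schema(sample_region)
```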

Step 7: Next we need to get the number of nodes. We need this because we have to go to the last node of the JSON output:

We are using a variable called 'TotalNodes' and assigning it in the above step. Note that we reduce the node count by 1, because JSON array indexes start from 0.

Step 8: And you guessed it right, we need to get to the last node by using this variable:

This puts the last node into a sub-JSON, which you can capture with another Parse JSON action or simply keep in a variable.
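Steps 7 and 8 boil down to simple zero-based indexing. Here is the same idea as a Python sketch; the sample lines are made up, and `last_node` is a name chosen for this illustration. In Logic Apps the count and the subtraction would typically come from the `length` and `sub` expression functions.

```python
def last_node(lines):
    """Return the last element using an explicit zero-based index."""
    if not lines:
        raise ValueError("the OCR response contained no lines")
    total_nodes = len(lines) - 1  # count minus 1, because indexes start at 0
    return lines[total_nodes]


# Made-up sample: three lines, so the valid indexes are 0, 1 and 2.
sample_lines = [
    {"words": [{"text": "Invoice"}, {"text": "#1001"}]},
    {"words": [{"text": "Widget"}, {"text": "A"}, {"text": "150.00"}]},
    {"words": [{"text": "Total"}, {"text": "USD"}, {"text": "450.00"}]},
]

last_line = last_node(sample_lines)
```

Without the minus 1, the index would be 3 here and point past the end of the array.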

Step 9: The rest is very simple. You can add a 'Select' action and select the 'text' JSON node:

That's it. Run it now. It will give me the desired result, as expected:
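In code terms, the 'Select' step is just a mapping from each word object to its 'text' value. A minimal Python sketch, with a made-up last line and a hypothetical `select_text` helper:

```python
def select_text(line):
    """Map each word object in a line to its 'text' value."""
    return [word["text"] for word in line["words"]]


# Made-up last line of an invoice, in the assumed 'words' shape.
sample_last_line = {
    "boundingBox": "20,280,400,300",
    "words": [{"text": "Total"}, {"text": "USD"}, {"text": "450.00"}],
}

texts = select_text(sample_last_line)
total_value = texts[-1]  # the total sits at the end of the last line
print(total_value)
```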

We can try with another image file, like this (with different items in it, different total amount):

And the result matches, as expected:

So the idea is: we can get various details out of an image file using Logic Apps expression syntax too, just in case we want to avoid coding. You can also extract the item details, individual prices, etc. of each line using a for-each loop.

Alright, let me call it a day, guys. I will come back soon with more such cool features of Azure AI services.
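Here is how such a for-each loop could look as a Python sketch. It uses a simple heuristic that is my own assumption, not something the OCR service provides: a line counts as an amount row when its last word parses as a number. The sample lines are made up.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Return the text as a Decimal, or None if it is not a plain number."""
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return None


def amount_rows(lines):
    """Loop over every line and keep (label, amount) for lines ending in a number."""
    rows = []
    for line in lines:
        texts = [word["text"] for word in line["words"]]
        if len(texts) < 2:
            continue
        amount = parse_amount(texts[-1])
        if amount is not None:
            rows.append((" ".join(texts[:-1]), amount))
    return rows


# Made-up sample lines in the assumed 'words' shape.
sample_lines = [
    {"words": [{"text": "Invoice"}, {"text": "#1001"}]},
    {"words": [{"text": "Widget"}, {"text": "A"}, {"text": "150.00"}]},
    {"words": [{"text": "Widget"}, {"text": "B"}, {"text": "300.00"}]},
    {"words": [{"text": "Total"}, {"text": "450.00"}]},
]

rows = amount_rows(sample_lines)
items, total = rows[:-1], rows[-1]
# Sanity check: the item amounts should add up to the total line.
items_match_total = sum(amount for _, amount in items) == total[1]
```

A real invoice will need a smarter rule than this (a bare invoice number would also pass the check), so treat it as a starting point.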

Much love and namaste.....
