The idea for this app is simple. You take a photo of a list of ingredients, say, on a box of packaged food. You get back that list as buttons, where clicking on each item will pull up information about that ingredient.
Because I wasn’t sure how accurate OCR (optical character recognition) would be, I went right into prototyping mode to see if I could find a library that works well. I tried a couple, tesseract and ocrad.js, and after struggling to get them working, I was a little disappointed with the results. Ocrad.js tended to spit back a lot of nonsense characters, and while tesseract was much better, it still missed a lot of small details (misinterpreted punctuation, erroneous characters here and there) that would make it very difficult to feel confident about using the data it returned.
Then, as if I had been handed a present directly from Santa, Google opened up its Cloud Vision API. It was exactly what I needed, and with the power of Google behind it I felt like it would be the most accurate OCR I could find. The problem is that it’s not free. Well, it’s free for small-scale testing, but once requests pass a certain limit, they bill per request. I’ll figure that out later.
So the challenges laid out in front of me were now clear. To build a working prototype of this app I would need to:
- Interface with the camera
- Crop the resulting image to ensure no extra characters get processed
- Send the image to Cloud Vision
- Display results as a list
- Store data (not necessarily persistent data) for retrieval to be displayed when a particular item is tapped
After about a day and a half, I had a working prototype:
After a picture is taken, it’s saved to a server so I can access it later. I haven’t implemented it yet, but ideally I’d save all the data, including the Cloud Vision results, so I can start to build a database of these ingredients. Saving the photo to the server took some messing around to get a base64 string for the image, which requires first drawing the image into a canvas, then calling the built-in canvas method .toDataURL(). Once I had the string, I was able to POST it to my server and save it with a unique filename (for now just a date string, which isn’t sustainable across multiple time zones, I know, but it works). The issue was that the node library I’m using on the server side to send API requests to Cloud Vision requires an actual image file, not a base64 string. So I had to figure out how to decode it, which wasn’t that difficult. The part that tripped me up was realizing I had to remove the metadata from the base64 string before I could save it as an actual image file.
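On the filename point, here is a rough sketch of one way to make date-based names safer, assuming Node on the server. `uniqueImageName` is a made-up helper for illustration, not something from the app. `toISOString()` always reports UTC, which sidesteps the time zone issue, and a short random suffix guards against two uploads landing in the same millisecond.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds a filename from a UTC timestamp plus a
// random suffix, e.g. "2020-01-02T03-04-05-000Z-9f3a1c2b.png".
function uniqueImageName(extension = 'png', now = new Date()) {
  // toISOString() is always UTC; swap ":" and "." for "-" so the
  // name is safe on every common filesystem.
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  // 4 random bytes become 8 hex characters.
  const suffix = crypto.randomBytes(4).toString('hex');
  return `${stamp}-${suffix}.${extension}`;
}

console.log(uniqueImageName());
```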
A base64 string looks like this:
The red part, the metadata prefix at the start of the string (everything up to and including “base64,”), needs to be removed before writing to disk. A simple string replace took care of that. But man, that was a few hours of headache.
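A minimal sketch of that decoding step in Node, not the app’s exact code. The names `decodeDataUrl` and `saveDataUrl` are hypothetical. It assumes the client sends the full string produced by .toDataURL(), which starts with a prefix like data:image/png;base64, followed by the encoded bytes.

```javascript
const fs = require('fs');

// Split a data URL such as "data:image/png;base64,iVBOR..." into its
// MIME type and decoded bytes. Returns null if the input is not a
// base64 data URL.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) return null;
  return {
    mimeType: match[1],
    bytes: Buffer.from(match[2], 'base64'), // only the part after the prefix
  };
}

// Hypothetical helper: writes the decoded image to disk and returns
// the MIME type so the caller can pick a file extension.
function saveDataUrl(dataUrl, filePath) {
  const decoded = decodeDataUrl(dataUrl);
  if (!decoded) throw new Error('Not a base64 data URL');
  fs.writeFileSync(filePath, decoded.bytes);
  return decoded.mimeType;
}
```

Matching the prefix with a regular expression instead of a fixed string replace means the same code copes with image/jpeg as well as image/png.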
Cloud Vision API
Luckily I found a node library to handle the API calls, which made my life loads easier. The finicky problem I ran into here was parsing the data as it came back. I’m passing objects around pretty frequently, and somehow I lost track of when data was coming back as an already-parsed object and when it was just a JSON string. At one point I was sending a string back to the client thinking I was sending an object, and when I tried to read the data I would get nonsense, of course. After another hour or so of being really confused, I realized I was sending a string and needed to parse it into an object. Tsk tsk.
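One way to avoid that kind of mix-up is to normalize the payload in a single place. The sketch below is an illustration, not the app’s code: `toObject` is a made-up helper, and the response shape is invented for the example (real Cloud Vision responses are more detailed).

```javascript
// Accept either an already-parsed object or a JSON string and always
// return an object, so callers never have to remember which one they hold.
function toObject(payload) {
  if (typeof payload === 'string') {
    return JSON.parse(payload); // throws SyntaxError on invalid JSON
  }
  return payload;
}

// Invented response shape, for illustration only.
const fromServer = '{"text":"sugar, salt"}';
const result = toObject(fromServer);
console.log(result.text.split(', '));
```

Indexing into an unparsed string returns single characters or undefined rather than the fields you expect, which is where the nonsense comes from.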
I’m using jQuery to bind the ‘load’ event on the HTML image element that I render the camera photo into. I had to wait for this load event because otherwise the img element doesn’t give me the information I need: trying to get the width of an img element that doesn’t have a fully loaded image in it just doesn’t work. After understanding that, I rashly plowed ahead and put an AJAX POST in the body of the load callback. This worked fine on the first picture, but every successive photo would post the new photo and also send as many POST requests as photos I had taken during that session. What was happening was that every time I took a new photo, I was adding another function to the ‘load’ event, which would fire along with all the earlier ones, each sending a new POST request. I needed to unbind the ‘load’ event (I really don’t like jQuery’s naming for attaching event listeners) so that ‘load’ callbacks would be killed once they had executed.
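The pile-up is easy to reproduce outside the app. The sketch below uses plain DOM-style events rather than jQuery, with a bare EventTarget standing in for the img element so it runs in Node as well as the browser. `countPosts` is a made-up helper, and a counter stands in for the AJAX POST. Passing { once: true } removes each listener after its first run, which is the same idea as jQuery’s .one(), or calling .off('load') before binding again.

```javascript
// Simulate taking three photos and count how many "POSTs" are sent.
function countPosts(listenerOptions) {
  const img = new EventTarget(); // stand-in for the <img> element
  let posts = 0;                 // stand-in for AJAX POST requests

  for (let photo = 0; photo < 3; photo++) {
    // A fresh callback is attached for every photo, as in the buggy version.
    img.addEventListener('load', () => { posts += 1; }, listenerOptions);
    // Simulate the image finishing loading.
    img.dispatchEvent(new Event('load'));
  }
  return posts;
}

console.log(countPosts({}));             // 6: listeners accumulate (1 + 2 + 3)
console.log(countPosts({ once: true })); // 3: one POST per photo
```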
- UI Development (UX, UI, and coding it up)
- Database construction (gonna take FOREVER)