AuthorDomain
A project that combines AWS with ChatGPT's API
Background
​
​
There are interesting divides of the domains of knowledge that are becomingly increasingly partitioned in our accelerated technological times, where our confirmation bias allows us to immerse ourselves in information that sits comfortably with our present views. We can choose the articles we read, the books we digest, and the online conversations we sit in to align with ourselves, our world view, and our beliefs, and our deeply-held preconceptions about the way the world works.
One way in which knowledge becomes organized and separated is through the gender of those that create knowledge. It is no secret that until very recently, mostly male authors have been recognized as the primary bearers of intelligence, respected knowledge, and intellectual authority.
​
​
Project Objective:
​
My ambition is to create a tool that will allow individuals to understand the gender distribution of their personal physical book collections. This will allow individuals to understand gender as a selection process for the knowledge they consume.
​
Important Limitations:
​
This will be a highly imperfect tool. It's important to remember that there is a variance in gender identity, which this project will not address. This project uses a combination of AWS's Rekognition imaging processing tool, and ChatGPT's 3.5's NLP processing. While this combination is effective, it is far from perfect, and is likely to misgender some authors, particularly authors who are non-binary or who's self-described gender does not match the algorithm's processing of their names. It is an experiment, an art-piece, and a tool for further thought.
​
​
Initial Stages:​
​
The first phase of the project's process was to wrap my head around AWS services, and playing with OpenAI's Python API. After some research, I settled on AWS's Rekognition processing software S3 storage and retrieval service, and Elastic Beanstalk for the deployment of the final mini-app.
​
I created a Python program that can upload an image of the spines of books to AWS's S3, send those images to Rekognition, return the extracted text to ChatGPT3.5, and display a list of male and female authors, as well as the percentage of the books that are by male and female authors. While I initially played with other Python-compatible NLP tools, such as spaCY, I found that most of them were highly case-sensitive and struggled heavily with the often capitalized text that Rekognition pulled from the book spines. So I settled on 3.5 as a solution, as OpenAI's tool proved highly effective at processing the variety of text styles that are used on book spines.
​
​
​

Image of Books

Python Code


Output

This is a good start! There are a few problems that need to be worked out, and the results are not perfectly consistent for each run, but I'll put this in the good enough for a silly experiment category for now.
Later on, I'll add a feature that will allow multiple images to concatenate (statistically, 30 seems to be a pretty good representative sample of a book collection, if a user is willing/able to submit 3-4 images), and begin the front-end web-development for a simple submission form.
​
​
The next step will be to format the code using Flask to handle the server-side logic of the web-based application. I modified the code to work with Flask to test locally; I will be using Postman to upload a test image.
​
​

Sweet! Rekognition is able to process the image I submitted through Postman and return the text from the image.

Next, I added the ChatGPT API integration, and was pleased to see that ChatGPT returned the results! ​
​

After adding some HTML, I can upload the file locally on my browser. There is some inconsistency in ChatGPT's performance as can be seen in the screenshot below, but I'm focused on basic functionality for now.

After consulting with some developer ninjas, I decided to pivot from using Flask and AWS Beanstalk to using AWS Lambda. This will involve making some important fundamental shifts in my code. Rather than handling HTTP requests directly through routes, I will adapt the code to Lambda's event-driven approach. I will also need to use base64 to transfer the image, and host the HTML and JS code on S3.
​
​
​
​
