top of page

AuthorDomain

A project that combines AWS with ChatGPT's API
Background

​

​

Who writes what we read shapes how we see the world. And, for a lot of us, that world view is shaped by male authors.​

​

Project Objective:

 

My goal is to build a tool that will allow users to understand the gender distribution of books in their personal libraries.​
 

​

Important Limitations:

​

This will be a highly imperfect tool. It's important to remember that there is a variance in gender identity, which this project will not address.  This project uses a combination of AWS's Rekognition imaging processing tool, and ChatGPT's NLP processing.  While this combination is effective, it is far from perfect, and is likely to misgender some authors, particularly authors who are non-binary or who's self-described gender does not match the algorithm's processing of their names. It is an experiment, an art-piece, and a tool for further thought.

​

​

Initial Stages:​

​

The first phase of the project's process was to wrap my head around AWS services, and playing with OpenAI's Python API.  After some research, I settled on AWS's Rekognition processing software S3 storage and retrieval service, and Elastic Beanstalk for the deployment of the final mini-app.

​

I created a Python program that can upload an image of the spines of books to AWS's S3, send those images to Rekognition, return the extracted text to ChatGPT3.5, and display a list of male and female authors, as well as the percentage of the books that are by male and female authors. While I initially played with other Python-compatible NLP tools, such as spaCY, I found that most of them were highly case-sensitive and struggled heavily with the often capitalized text that Rekognition pulled from the book spines.  So I settled on 3.5 as a solution, as OpenAI's tool proved highly effective at processing the variety of text styles that are used on book spines.

​

​

​

booksdan5.jpeg
Image of Books
Screen Shot 2023-12-30 at 5.11.38 PM.png
Python Code
Output
Screen Shot 2023-12-30 at 5.11.28 PM.png

This is a good start! There are a few problems that need to be worked out, and the results are not perfectly consistent for each run, but I'll put this in the good enough for a silly experiment category for now.

 

Later on, I'll add a feature that will allow multiple images to concatenate (statistically, 30 seems to be a pretty good representative sample of a book collection, if a user is willing/able to submit 3-4 images), and begin the front-end web-development for a simple submission form. 

​

​

The next step will be to format the code using Flask to handle the server-side logic of the web-based application. I modified the code to work with Flask to test locally; I will be using Postman to upload a test image.

​

​

Screen Shot 2023-12-31 at 4.35.46 PM.png

Sweet! Rekognition is able to process the image I submitted through Postman and return the text from the image.

Screen Shot 2023-12-31 at 5.26.15 PM.png

Next, I added the ChatGPT API integration, and was pleased to see that ChatGPT returned the results! ​

​

Screen Shot 2023-12-31 at 5.42.15 PM.png

After adding some HTML, I can upload the file locally on my browser. There is some inconsistency in ChatGPT's performance as can be seen in the screenshot below, but I'm focused on basic functionality for now.

Screen Shot 2024-01-01 at 1.16.48 PM.png

After consulting with some developer ninjas, I decided to pivot from using Flask and AWS Beanstalk to using AWS Lambda. This will involve making some important fundamental shifts in my code. Rather than handling HTTP requests directly through routes, I will adapt the code to Lambda's event-driven approach. I will also need to use base64 to transfer the image, and host the HTML and JS code on S3.

​

​

​

​

Screen Shot 2024-01-08 at 9.29.22 PM.png
Last update 01.08.24
bottom of page