Justin Seitz is a Canadian security consultant and the author of two computer hacking books from No Starch Press. He blogs at AutomatingOSINT.com and can be found on Twitter @jms_dot_py.
This article was originally published on the AutomatingOSINT.com blog.
As part of my previous post on gangs in Detroit, one thing struck me: there are an awful lot of guns being waved around on social media. Shocker, I know. More importantly, I began to wonder whether there was a way to automatically identify when a social media post contains guns or other weapons. This post will cover how to send images to the Imagga API, which automatically tags each picture with keywords that it feels accurately describe the objects in it. I will also teach you some slicing and dicing techniques in Python that help increase the accuracy of the tagging. Keep in mind that I am specifically looking for guns or firearm-related keywords, but you can easily change the list of keywords and hunt for other things of interest, like tanks or rockets.
This blog post will cover the image tagging portion of this task. In a follow-up post I will cover how to pull down all Tweets from an account and extract all the images that the user has posted (something my students do all the time!).
The Python module that I use for this post has some specific installation instructions that you should follow here. Make sure you click the “Python (v2)” tab in the instructions and that you follow them carefully. We are also going to use an image manipulation library called Pillow, which you can install with pip (pip install Pillow).
My first step was to push some pictures that contained only weapons to the Imagga website. For example, you could do a Google Image search for “AK-47” or “pistol” and download a few images. Then submit them to the Imagga tagging demo here and see what the tagging results are:
Make a note of which keywords come back across a number of different test images. Of course you can adapt my methodology or the script to suit any particular use case that you might have, but this is precisely what I did to start out.
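The note-taking step can also be done in a few lines of Python. This is only a minimal sketch: the function name common_tags and the sample tag lists are made up for illustration and are not results from Imagga. It keeps the tags that show up in at least two of the test images, which is one reasonable way to pick candidate keywords.

```python
from collections import Counter


def common_tags(tag_lists, min_images=2):
    """Return the tags that appear in at least `min_images` of the tag lists."""
    counts = Counter()
    for tags in tag_lists:
        # Count each tag once per image, ignoring case and duplicates.
        counts.update({tag.lower() for tag in tags})
    return sorted(tag for tag, seen in counts.items() if seen >= min_images)


# Made-up tag lists standing in for results copied from the tagging demo.
sample_results = [
    ["weapon", "rifle", "gun", "metal"],
    ["pistol", "gun", "weapon", "hand"],
    ["Rifle", "weapon", "wood"],
]

print(common_tags(sample_results))
```

Raising min_images makes the keyword list stricter; lowering it to 1 simply lists every tag seen.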
This works fine and dandy for a positive sample that shows only the weapon, with no background imagery and nobody holding it. This gentleman was found during my gangs of Detroit research:
Now I can appreciate that this image tagging technology is really doing its best when there is a pile of things going on in the photograph (look closely). However, you will notice that it did not return any firearm- or weapon-related tags. One theory I had (along with David Benford from Blackstage Forensics) was to chop the image up so that you have a higher chance of isolating the weapon in the picture:
This appeared to work for this image. However, I also found that chopping images horizontally (depending on the orientation of the weapon) was useful in some cases. Of course we are not going to manually chop up each image before testing it. That’s what code is for, my friends.
So we have a couple of things we want to accomplish. We want to be able to feed in an image to our script and have the script chop it up into thirds both vertically and horizontally. Next we want to submit the image to Imagga using their API and then test the results against our list of tags. Let’s open up a new Python script called gunhunter.py and start punching in the following code:
Most of this code is just imports and setting our username and password for Imagga API access. The only thing to take note of is on line 18, where we set the list of tags that we will use to determine whether we have a successful hit or not. With the Imagga API we first have to upload an image, which returns a content_id that we have to use in subsequent API calls. Let’s add this function now:
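The hit test against the tag list can be sketched on its own, independent of any API call. The keyword list and the name find_match below are illustrative assumptions, not the list or function from gunhunter.py; only "weapon" is known to be a matching tag, from the sample output later in this post.

```python
def find_match(returned_tags, wanted_tags):
    """Return the first returned tag that is in wanted_tags, or None."""
    wanted = {tag.lower() for tag in wanted_tags}
    for tag in returned_tags:
        # Compare case-insensitively so "Weapon" and "weapon" both count.
        if tag.lower() in wanted:
            return tag
    return None


# Illustrative keyword list (an assumption); swap in whatever you are hunting for.
tags = ["weapon", "gun", "rifle", "pistol", "firearm"]

print(find_match(["person", "street", "weapon"], tags))
```

Changing the hunt to tanks or rockets only means changing the contents of the list.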
Let’s examine this little snippet of code:
Perfect, this will take care of getting the file up to Imagga. Now that we have a content ID we need to do a second call to get Imagga’s tagging engine to actually analyze the image and give us the results back. Go forth and implement this function now:
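The two-step flow (upload first, then tag by content ID) can be sketched without committing to any particular client library. Everything here is an assumption for illustration: upload and tag are stand-in callables for whatever your Imagga client exposes, and the fake versions exist only so the sketch runs offline.

```python
def tags_for_image(path, upload, tag):
    """Upload the file at `path`, then request tags using the returned content ID."""
    content_id = upload(path)   # step 1: the upload call returns a content ID
    return tag(content_id)      # step 2: the tagging call takes that content ID


# Fake stand-ins so the flow can be exercised without network access.
def fake_upload(path):
    return "content-123"


def fake_tag(content_id):
    return ["weapon", "person"] if content_id == "content-123" else []


print(tags_for_image("test.jpg", fake_upload, fake_tag))
```

Keeping the two calls behind plain callables also makes it easy to test the rest of the script without burning API requests.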
Let’s break this down:
That concludes the work required to deal with Imagga. Now let’s start working on our function that will split the image both horizontally and vertically.
Let’s break this down a little bit:
Now let’s implement the logic that will extract each chunk of the image and get it ready for submission to Imagga. Add the following code, being mindful of the indentation of course!
Let’s take a look at this chunk of code we have just written. This is the piece that will chop the image into three chunks vertically:
Whew! OK, we are nearly done. We have dealt with chopping the image vertically, so now let’s pound out the required code to chop the file horizontally. This is much the same as the code before, except we are now stepping DOWN the image instead of stepping to the right in order to calculate our cropping box. Get typing!
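The cropping-box arithmetic for both directions can be sketched as a small pure function. This is a minimal sketch and not the listing from gunhunter.py; the name third_boxes is hypothetical. Boxes use the (left, upper, right, lower) order that Pillow’s Image.crop expects, and integer boundaries are used so the last strip always reaches the edge of the image.

```python
def third_boxes(width, height):
    """Return (vertical_strips, horizontal_strips) as lists of crop boxes."""
    # Boundaries at 0, 1/3, 2/3 and the full size, rounded down to whole pixels.
    xs = [i * width // 3 for i in range(4)]
    ys = [i * height // 3 for i in range(4)]

    # Vertical strips: step to the right, keep the full height.
    vertical = [(xs[i], 0, xs[i + 1], height) for i in range(3)]
    # Horizontal strips: step down, keep the full width.
    horizontal = [(0, ys[i], width, ys[i + 1]) for i in range(3)]
    return vertical, horizontal


vertical, horizontal = third_boxes(900, 600)
print(vertical)
print(horizontal)
```

With Pillow installed, each box can be passed straight to the crop method of an opened image, and the resulting chunk saved to a temporary file for upload.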
No need to review this code as it is nearly identical to our previous section. Now for our final function to put in place which will be responsible for kicking off the entire process.
The last line just calls the function passing in an image on your hard drive. Now go out and do some searches on Google for guns or related terms including military terms. Download a test image and..
If you want to use the image from above, you can grab it (while the Tweet still lives that is) from here:
— #⃣BadNews (@SikDrive_CORN) August 19, 2015
Run that through your script and you should see some output like so:
[*] Trying image /Users/justin/Desktop/testimages/CMyN3KZU8AAB9ZZ.jpg
[*] Image matches! => weapon
Perfect! Although this script does not do the work of downloading the images from Twitter, this technique can still be useful if you have a folder full of images that you want to run through the system, or if you are looking to train it on other interesting things such as military equipment in war zones. In a future post we will cover how to use the Twitter API (adapted from my course) to pick an account, download all available images and automatically submit them to Imagga. Keep in mind that for each image you will have up to 7 Imagga API requests.
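The figure of seven presumably comes from one request for the whole image plus three vertical and three horizontal strips. A minimal sketch of that budget, assuming the script stops at the first match: scan, the candidate labels and fake_get_tags are all hypothetical names used only for illustration.

```python
def scan(candidates, get_tags, wanted_tags):
    """Tag candidates in order and stop at the first match.

    Returns (matching_tag_or_None, number_of_tagging_requests_made).
    """
    wanted = {tag.lower() for tag in wanted_tags}
    requests_made = 0
    for candidate in candidates:
        requests_made += 1  # one tagging request per candidate image
        for tag in get_tags(candidate):
            if tag.lower() in wanted:
                return tag, requests_made
    return None, requests_made


# Whole image first, then three vertical and three horizontal strips.
candidates = ["full", "v1", "v2", "v3", "h1", "h2", "h3"]


def fake_get_tags(candidate):
    # Stand-in for the real tagging call: only the second vertical strip hits.
    return ["weapon"] if candidate == "v2" else ["person"]


print(scan(candidates, fake_get_tags, ["weapon", "gun"]))
```

An image with no match uses the full budget of seven requests, so a folder of clean images is the expensive case.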