A Guide to Using Reverse Image Search for Investigations - Bellingcat

2021-11-24 02:47:07 By : Ms. Kate Lau

Aric Toler started volunteering for Bellingcat in 2014 and has been employed there since 2015. He currently leads Bellingcat's training and research efforts, with a focus on Eurasia/Eastern Europe.

Reverse image search is one of the best-known and simplest digital investigation techniques; many web browsers offer it as a two-click option, "Search Google for image". The method is also widely used in popular culture, perhaps most notably in the MTV show Catfish, which exposes people in online relationships who use stolen photos on social media.

However, if you only use Google for reverse image search, you will often be disappointed. Limiting your search process to uploading a photo in its original form to images.google.com may give you useful results for the most obviously stolen or popular images, but for most complex research projects you will need additional sites at your disposal, as well as a lot of creativity.

This guide walks through detailed strategies for using reverse image search in digital investigations, with a focus on identifying people and locations and on determining an image's progeny. After detailing the core differences between the search engines, it tests Yandex, Bing, and Google on five test images showing different objects from different parts of the world.

The first and most important piece of advice on this topic cannot be emphasized enough: Google reverse image search is not very good.

As of the date of this guide, the undisputed leader in reverse image search is the Russian website Yandex. After Yandex, the runners-up are Microsoft's Bing and Google. A fourth service that can also be used for investigations is TinEye, but this site specializes in intellectual property violations and looks for exact duplicates of images.

Yandex is by far the best reverse image search engine, with a powerful ability to recognize faces, landscapes and objects. The Russian site draws heavily on user-generated content, such as travel review sites (FourSquare and TripAdvisor, for example) and social networks (dating sites, for example), to return very accurate results for facial and landscape recognition queries.

Its strengths lie in photos taken in a European or former Soviet context. Although photos from North America, Africa, and other places may still return useful results on Yandex, you may find yourself frustrated at scrolling through results mainly from Russia, Ukraine, and Eastern Europe rather than from the country of your target image.

To use Yandex, visit images.yandex.com and select the camera icon on the right.

From there, you can either upload a saved image or enter the URL of an image hosted online.
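For an image that is already hosted online, the search-by-URL step can also be scripted. The following is a minimal sketch, not something taken from this guide: the query-string patterns for each engine are assumptions based on commonly seen links, they are not documented APIs, and any of them may change or stop working. The function name `reverse_search_urls` is hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTION: these URL patterns are not from the guide and are not official,
# documented APIs. They may change or stop working at any time.
ENGINE_PATTERNS = {
    "yandex": ("https://yandex.com/images/search", {"rpt": "imageview"}, "url", ""),
    "bing": ("https://www.bing.com/images/search", {"view": "detailv2", "iss": "sbi"}, "q", "imgurl:"),
    "google": ("https://lens.google.com/uploadbyurl", {}, "url", ""),
    "tineye": ("https://tineye.com/search", {}, "url", ""),
}


def reverse_search_urls(image_url: str) -> dict:
    """Build one search-by-URL link per engine for an image hosted online."""
    links = {}
    for engine, (base, fixed, key, prefix) in ENGINE_PATTERNS.items():
        params = dict(fixed)
        params[key] = prefix + image_url  # urlencode percent-escapes the value
        links[engine] = base + "?" + urlencode(params)
    return links


if __name__ == "__main__":
    for engine, link in reverse_search_urls("https://example.com/photo.jpg").items():
        print(engine, link)
```

Opening all four links side by side fits the guide's advice not to rely on a single engine.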

If the Russian user interface is confusing, look for Выберите файл (select file), Введите адрес картинки (enter image address) and Найти (search). After searching, look for Похожие картинки (similar images) and Ещё похожие (more similar).

The facial recognition algorithm used by Yandex is very good. Yandex will not only look for photos that resemble the one containing a face, but also for other photos of the same person (determined by matching facial similarities) taken with completely different lighting, background colors, and positions. While Google and Bing may only find other photos showing people with similar clothes and general facial features, Yandex will return those matches as well as other photos in which the face matches. Below, you can see how the three services handled a search for the face of Sergey Dubinsky, a Russian suspect in the downing of MH17. Yandex found a large number of photos of Dubinsky from various sources (only two of the top results show unrelated people); the results differ from the original image but show the same person. Google had no luck at all, and Bing had a single result (fifth image, second row) that also showed Dubinsky.

Yandex is, obviously, a Russian service, and there are worries and suspicions about its ties (or potential future ties) to the Kremlin. Although we at Bellingcat use Yandex's search function constantly, you may be more paranoid than we are. Use Yandex at your own risk, especially if you are also worried about using VK and other Russian services. If you are not particularly paranoid, try searching for an unindexed photo of yourself or someone you know in Yandex to see whether it can find you or your lookalikes online.

In the past few years, Bing has caught up with Google in reverse image search capabilities, but it is still limited. Bing's "Visual Search", found at images.bing.com, is very easy to use and offers a few interesting features not found elsewhere.

In an image search, Bing allows you to crop the photo (using the button below the source image) to focus on a specific element, as shown below. Results for the cropped image will exclude irrelevant elements and focus on the user-defined box. However, if the selected part of the image is small, it is worth cropping the photo manually yourself and increasing its resolution: low-resolution images (below 200×200) will give poor results.
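The manual crop-and-enlarge step comes down to simple arithmetic. As a rough sketch (the helper name `upscale_plan` is hypothetical, and the 200-pixel floor is just the rule of thumb mentioned above), the following works out how much a cropped region would need to be enlarged so that its shorter side reaches 200 pixels:

```python
import math

MIN_SIDE = 200  # rule of thumb from the guide: below 200x200, results are poor


def upscale_plan(box, min_side=MIN_SIDE):
    """Return (factor, (new_width, new_height)) for a crop box.

    box is (left, top, right, bottom) in pixels, with right and bottom exclusive.
    """
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        raise ValueError("crop box must have positive width and height")
    # Smallest whole-number factor that lifts the shorter side to min_side.
    factor = max(1, math.ceil(min_side / min(width, height)))
    return factor, (width * factor, height * factor)


if __name__ == "__main__":
    # An 80x60 crop needs a 4x enlargement to clear the 200-pixel floor.
    print(upscale_plan((100, 50, 180, 110)))
```

The actual cropping and resizing can then be done in any image editor or image library. Note that enlarging adds no new detail; it only keeps the upload from falling below the size at which the engines appear to struggle.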

Below, a Google Street View image of a man walking a couple of pugs was cropped to focus only on the dogs, leading Bing to suggest the breed of dog visible in the photo (the "looks like" feature) alongside visually similar results. These results mostly show pairs of dogs being walked, matching the source image, but they are not always pugs: French bulldogs, English bulldogs, mastiffs, and others are mixed in.

Google, at images.google.com, is by far the most popular reverse image search engine and is fine for most rudimentary reverse image searches. Relatively simple queries include identifying well-known people in photos, finding the source of images that have been widely shared online, and determining the name and creator of a piece of art. However, if you want to locate images that are not close to an exact copy of the one you are researching, you may be disappointed.

For example, when searching for the face of a man who tried to attack a BBC journalist at a Trump rally, Google could find the source of the cropped image, but could not find any other images of him, or even of someone who resembled him.

Although Google was not very strong at finding other instances of this man's face or similar-looking people, it still found the uncropped version of the photograph from which the screenshot was taken, showing some utility.

To test different reverse image search techniques and engines, a handful of images representing different types of investigations were used, including both original photos (not previously uploaded online) and recycled ones. Because these photos are included in this guide, the test cases may not work as intended in the future, as search engines will index them and integrate them into their results. For that reason, screenshots of the results as they appeared when this guide was written are included.

The test photos cover a number of different geographic regions to test the strength of the search engines for source material in Western Europe, Eastern Europe, South America, Southeast Asia, and the United States. For each photo, I have also highlighted discrete objects within the image to test the strengths and weaknesses of each search engine.

Feel free to download these photos (each image in this guide is hyperlinked directly to a JPEG file) and run them through the search engines yourself to test your skills.

Isolated: White SUV in Nizhny Novgorod

Isolated: Trailer in Nizhny Novgorod

Isolated: Apartment complex, "Padgett Place"

Isolated: Toca do Açaí

Isolated: Dutch flag (also rotated 90 degrees clockwise)

Each of these photos was chosen to demonstrate the capabilities and limitations of the three search engines. While Yandex in particular may sometimes seem to be practicing digital black magic, it is far from infallible and can struggle with some types of searches. Some creative search strategies that may overcome these limitations are detailed at the end of this guide.

Predictably, Yandex had no trouble identifying this Russian building. Along with photos taken from a similar angle to our source photo, Yandex also found images from other perspectives, including one 90 degrees counterclockwise from the vantage point of the source image (see the first two images in the third row).

Yandex also effortlessly identified the white SUV in the foreground as a Nissan Juke.

Finally, in the most challenging isolated search for this image, Yandex failed to identify the gray trailer in front of the building. A number of the results look like the trailer in the source image, but none is an actual match.

Bing had no success in identifying this structure. Nearly all of its results were from the United States and Western Europe, showing houses with white/gray masonry or siding and brown roofs.

Likewise, Bing could not determine that the white SUV was a Nissan Juke, instead focusing on an array of other white SUVs and cars.

Finally, Bing failed to identify the gray trailer, focusing more on RVs and larger gray campers.

Google's results for the full photo were very poor: it pointed to the television show House and returned images with almost no visual similarity to the source.

Google successfully identified the white SUV as a Nissan Juke, even noting it in the text field of the search. As with Yandex, feeding the search engine an image taken from a perspective similar to popular reference material (here, a side view of a car resembling those in most advertisements) gives the reverse image algorithm the best chance to work its magic.

Lastly, Google recognized what the gray trailer was (a travel trailer/camper), but its "visually similar images" were far from it.

Yandex was technically able to identify the cityscape as that of Cebu in the Philippines, but perhaps only by accident. The fourth result in the first row and the fourth result in the second row are of Cebu, but only the second of these shows any of the same buildings as the source image. Many of the results were also from Southeast Asia (especially Thailand, a popular destination for Russian tourists), picking up on similar architectural styles, but none was taken from the same perspective as the source.

Of the two buildings isolated from the search (Padgett Place and the Waterfront Hotel), Yandex was able to identify the latter but not the former. The Padgett Place building is a relatively inconspicuous high-rise filled with apartments, while the Waterfront Hotel also houses a casino, which has led to a large number of tourist photos showing its more distinctive architecture.

Bing returned no results from Cebu, or even from Southeast Asia, when searching the cityscape, which suggests a severe geographic limitation in its indexed results.

Like Yandex, Bing was unable to identify the building on the left side of the source image.

Bing was unable to find the Waterfront Hotel, both when using Bing's cropping function (which brought back only low-resolution photos) and when manually cropping the building from the source image and increasing its resolution. It is worth noting that these two versions of the image, identical apart from resolution, brought back completely different results.

Like Yandex, Google brought back a photo of Cebu in its results, though one with little resemblance to the source image. Cebu did not appear in the thumbnails of the initial results, but following through to "visually similar images" turns up an image of the Cebu skyline as the eleventh result (the third image in the second row below).

As with Yandex and Bing, Google was unable to identify the high-rise apartment building on the left side of the source image. Google also had no success with the image of the Waterfront Hotel.

Yandex found the source image of this Bloomberg campaign ad: a Getty Images stock photo. Yandex also found versions of the photo with filters applied (second result, first row) and other photos from the same stock photo series. It also, for some reason, returned pornography, as seen in the blurred results below.

When isolating only the face of the stock photo model, Yandex brought back a few other photos of the same man (see the last image in the first row), along with images from the same stock photo set in the classroom (see the fourth image in the first row).

Bing had an interesting search result: it found an exact match of the stock photo, and then brought back "similar images" of other men in blue shirts. The "pages with this content" tab of the results provides a handy list of duplicate versions of the same image across the web.

Focusing on only the face of the stock photo model did not bring back any useful results, nor did it return the source image.

Google recognized that the image used in the Bloomberg campaign ad was a stock photo and brought back an exact result. Google also returned other stock photos of people wearing blue shirts in a classroom.

When isolating the student, Google again returned the source of the stock photo, but its visually similar images did not show the stock photo model, only an array of other men with similar facial hair. We count this as half a win: Google found the original image but, unlike Yandex, showed no information about the specific model.

Yandex was unable to determine that this photo was taken in Brazil, focusing instead on urban landscapes in Russia.

For the Toca do Açaí logo, Yandex for some reason mostly brought back pornography, and those results have been blurred. However, among the blurred results, two did correctly identify the logo.

For the parking sign [Estacionamento], Yandex did not even come close.

Bing did not recognize that this Street View image was taken in Brazil.

...nor did Bing recognize the parking sign...

...or the Toca do Açaí logo.

Although the image was taken directly from Google's Street View, Google reverse image search did not recognize a photo uploaded to its own service.

Just like Bing and Yandex, Google could not recognize the Portuguese parking sign.

Finally, Google did not come close to recognizing the Toca do Açaí logo, focusing instead on various types of wood paneling, which shows that it concentrated on the background of the image rather than the logo and text.

Yandex knew exactly where in Amsterdam this photo was taken, and found other photos taken in central Amsterdam, including some featuring various birds.

Yandex correctly identified the bird in the foreground of the photo as a gray heron (серая цапля), and brought back a series of gray heron images similar in position and posture to the source image.

However, Yandex failed the test of identifying the Dutch flag flying in the background of the photo. When the image was rotated 90 degrees clockwise to present the flag in its normal orientation, Yandex was able to work out that it was a flag, but did not return any Dutch flags in its results.
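Rotating an image before searching is easy to do in any image editor. For readers who want to see what the operation does to the pixels, here is a minimal, library-free sketch on a small grid; the function names and the toy letter grid are illustrative assumptions, not part of the original tests.

```python
def rotate_clockwise(pixels):
    """Rotate a row-major pixel grid by 90 degrees clockwise."""
    # Reverse the row order, then transpose: each source column,
    # read bottom to top, becomes a row of the result.
    return [list(row) for row in zip(*pixels[::-1])]


def rotate_counterclockwise(pixels):
    """Rotate a row-major pixel grid by 90 degrees counterclockwise."""
    # Transpose first, then reverse the row order.
    return [list(row) for row in zip(*pixels)][::-1]


if __name__ == "__main__":
    # A toy sideways tricolor: stripes run as columns (R = red, W = white, B = blue).
    sideways = [["R", "W", "B"],
                ["R", "W", "B"]]
    for row in rotate_clockwise(sideways):
        print(row)  # the stripes now run as rows, red on top
```

Trying both directions costs little, since it is not always obvious in advance which orientation a search engine's reference images use.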

Bing only recognized that the image showed a city landscape with water, and returned no results from Amsterdam.

Although Bing struggled to identify the urban landscape, it correctly identified the bird as a gray heron, including a special "looks like" result that links to a page describing the bird.

However, as with Yandex, the Dutch flag was too confusing for Bing, both in its original and in its rotated form.

Google noticed that there was a reflection in the canal in the image, but went no further than that, focusing on various paved paths in cities and returning nothing from Amsterdam.

Google was close in the bird identification exercise, but just barely missed: the bird is a gray heron, not a blue one.

Google was also unable to identify the Dutch flag. Although Yandex seemed to recognize that the image showed a flag, Google's algorithm focused on the window sill framing the image and mistook the flag for curtains.

Final scorecard: Yandex 9/14; Bing 2/14; Google 3.5/14
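To make the scorecard easier to compare at a glance, the scores can be converted to success rates. This is only a small arithmetic sketch using the three totals stated above; the helper name `success_rates` is hypothetical.

```python
TOTAL_TESTS = 14
SCORES = {"Yandex": 9, "Bing": 2, "Google": 3.5}  # totals from the scorecard


def success_rates(scores, total):
    """Convert raw scores to percentages, rounded to one decimal place."""
    return {engine: round(100 * points / total, 1) for engine, points in scores.items()}


if __name__ == "__main__":
    for engine, rate in success_rates(SCORES, TOTAL_TESTS).items():
        print(f"{engine}: {rate}%")
```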

Even with the shortcomings described in this guide, there are a handful of ways to get the most out of your search process and take advantage of the search algorithms.

First, you can use some more specialized search tools besides the three detailed in this guide. For example, the Cornell Lab's Merlin Bird ID app is very accurate at identifying the type of bird in a photo, or at offering possible options. In addition, although it is not an app and does not let you reverse search a photo, FlagID.org lets you manually enter information about a flag to find out where it is from. For example, the Dutch flag that even Yandex struggled with poses no problem for FlagID. After choosing a horizontal tricolor flag, we entered the colors visible in the image and received a series of options that included the Netherlands (along with other similar-looking flags, such as the flag of Luxembourg).

If you are looking at a foreign language written in a script you do not recognize, try using OCR or Google Translate to make your life easier. You can use Google Translate's handwriting tool to detect the language of a letter that you draw by hand*, or choose a language (if you already know it) and write the text out yourself. Below, the name of a cafe ("Hedgehog in the Mist") is written out with Google Translate's handwriting tool, giving a typed, searchable version of the word (Ёжик).

*Please note that Google Translate is not very good at recognizing letters if you do not already know the language, but if you scroll through enough results, you can eventually find your handwritten letter.

As detailed in a brief Twitter thread, you can pixelate or blur elements of a photo to trick search engines into focusing on the background. In this photo of Rudy Giuliani's spokeswoman, uploading the exact image does not bring back results showing where it was taken.

However, if we blur or pixelate the woman in the middle of the image, Yandex (and other search engines) can work their magic on all the other elements of the image: the chairs, paintings, chandelier, carpet, wall patterns, and so on.

After this pixelation was carried out, Yandex knew exactly where the image was taken: a popular hotel in Vienna.
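Any image editor can do this, but the idea behind pixelation is simple enough to sketch directly. The following library-free example replaces a rectangular region of a grayscale pixel grid with averaged blocks, leaving the background untouched. The function name, the grid representation, and the block size are illustrative assumptions, not the method used for the photo above.

```python
def pixelate_region(pixels, box, block=8):
    """Return a copy of a grayscale grid with `box` replaced by averaged blocks.

    pixels: list of rows of ints (0-255).
    box: (left, top, right, bottom) in pixels, with right and bottom exclusive.
    block: side length of each averaged square, in pixels.
    """
    left, top, right, bottom = box
    out = [row[:] for row in pixels]  # copy so the source grid is not modified
    for block_y in range(top, bottom, block):
        for block_x in range(left, right, block):
            # Clip the block to the box so edge blocks may be smaller.
            ys = range(block_y, min(block_y + block, bottom))
            xs = range(block_x, min(block_x + block, right))
            mean = sum(pixels[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


if __name__ == "__main__":
    image = [[0, 10, 20, 30],
             [40, 50, 60, 70],
             [80, 90, 100, 110],
             [120, 130, 140, 150]]
    # Hide the central 2x2 "subject"; everything outside the box is left as-is.
    for row in pixelate_region(image, (1, 1, 3, 3), block=2):
        print(row)
```

The design point is that only the distracting subject is degraded, so the search engine has nothing to match except the surrounding scene.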

Reverse image search engines have made tremendous progress over the past decade, with no end in sight. Along with the ever-growing amount of indexed material, many search giants have enticed their users to sign up for image hosting services, such as Google Photos, giving these search algorithms an endless supply of material for machine learning. On top of that, facial recognition AI is entering the consumer space through products such as FindClone, and may already be used in some search algorithms, namely Yandex. There is no publicly available facial recognition program that works with any Western social network, such as Facebook or Instagram, but it may only be a matter of time before one emerges, dealing a major blow to online privacy while also (at that great cost) increasing digital research capabilities.

If you have skipped most of the article and are just looking for the bottom line, here are some easy-to-understand reverse image search techniques:
