How to hunt somebody down on the internet – Part 2.
In a previous article, I covered the two other methods of attempting to determine the identity of an anonymous individual on the internet; namely using their IP address or applying an appropriate technique of social engineering. Here, I’ll be covering the third; what I call the fist analysis approach.
The first part of it will be giving you a necessary framework and some background for the technique, and the second will move onto the actual mechanics of conducting the search.
Back at the dawn of the electronic communications era, the best the equipment could do was send an analogue electrical pulse down a wire or later on wirelessly through what was then called the ether. Voice transmission was some way off but a system called Morse code was devised using agreed combinations of short or long pulses, otherwise known as dots and dashes, to spell words and therefore messages. Essentially we’re talking ones and zeroes – binary. Morse was just the grandaddy that spawned all those new message encoding upstarts such as ASCII or EBCDIC.
As often happens when people interact with a new technology, an unexpected thing emerged – telegraph operators found they could recognise the individual sending to them by nuances in their transmitting style, which they called their “fist”.
In a strange way, this seemingly useless feature actually saved a few individuals’ lives in WWII. Radio operators dropped into enemy territory still used Morse, because the equipment required could be squeezed into a small suitcase but above all because the messages could be encoded. If captured by either side, they were always given the same stark choice; continue transmitting for us or face immediate execution as a spy. You see, because the recipients knew their fist, there was no point in substituting one of your own people to send their messages. The fist would be wrong and they’d probably be detected straight away, although even then a few radio games were played.
The theory underlying fist analysis is predicated on an exactly analogous phenomenon. People express themselves on the internet in their own unique way. They have favourite words or expressions, spelling or grammar mistakes they always make, curiosities of punctuation, wrong or missed capitalisations, an habitual phrasing structure, ideas they frequently mention, strange little tics of their keyboard. If you have enough written material by them, it’s possible to develop a very distinctive lexicological profile composed of all those mannerisms which you can then go on to use in identifying them.
To establish a common vocabulary and borrowing a term from poker, I’ll refer to these items from now on as tells.
There are no deep psychological insights about such tells. What we’re after here is the involuntary Galvanic reflexes, the ridges, whorls and swirls of their fingerprints and nothing more. We’re looking for the nucleotides, the AGCT building blocks of their DNA, the elemental markers rather than the complete human being they’ll express. Concentrate on that level of minute detail and you’ll be a lot more successful with this technique.
They have an identity under some anonymous name and they also have another identity under their real name. The only common denominator between those two personas will be the equivalence in their fist, therefore that’s what you use to link those two identities. However, if they’ve never left any text on the internet under their real name, no amount of fist analysis will find them. Even then, such a dead-end search can occasionally yield interesting gems such as the same person operating under several different personas, which is quite common for paid trolling.
Again for clarity and from now on, I’ll use the term handle to denote the anonymous name a person uses.
There are some interesting things to be noted about the handle a person chooses to operate under. For starters, once having established that handle, they become invested in it. They’ll tend to use the same one across multiple sites. That habit relates to something called the persistence of identity; people are unconsciously very reluctant to renounce their sense of themselves as a distinct individual. They can get comfortable with maintaining one alter ego, but running several with ease tends to be for the very disciplined or those suffering from some variety of dissociative identity disorder.
This unconscious refusal to abandon their true identity manifests itself both in the real and cyber world with about 70% of handles adopted using at least one initial or a variation of an element of a person’s true name. For instance, Jesse James established a new life for a time under the handle James Howerd. Peruse your local wanted posters and you’ll often see that phenomenon once removed when aliases are listed. So often, they’ll invent a new surname and keep using it, varying nothing but the forename.
If you’re dealing with a new handle which you can’t find anywhere else around the blogosphere, it usually means it’s a one-off, ad hoc handle being employed for whatever reason by someone who usually operates under another one. Where it’s a fairly random jumble of letters and numbers, it tends to confirm it’s a throwaway they don’t envisage using again. Put simply, by not personalising it they’re making no investment in it. Hopefully there are enough tells to make the link to their usual handle and from there to their real name.
As with the social engineering approach, you have to scour the internet for everything they’ve ever written under their handle. Read everything you can find, start building your list of tells and noting any information they’re leaking. There’s a subtle distinction I make between an information leak and an information declaration. A leak is something you can reasonably deduce from something they wrote, while a declaration is something they said explicitly about themselves. The former, because they very often didn’t realise they were doing it, tend to be true, while the veracity of the latter in my experience can be highly dubious.
It’s worth noting that information leaks tend to occur more frequently when they first start using the blogosphere or they’re commenting on what they consider to be friendly turf.
The objectives of the research phase are two-fold. The first of course is to pick out their tells. I demonstrated an example of doing this in a previous article a few years back, and it gives an idea of how detailed an inspection you have to make. This can be difficult because to some extent, you need to suppress a key element of your basic reading skills. When people read, they don’t actually spell out each word, but rather interpret the distinctive shape a sequence of letters makes as a certain word. The same is true of common phrases. That’s one of the reasons why typos tend to escape detection; they’re nearly the right shape. It can help to read it extremely slowly or even backwards, word by word.
The second objective is to get an impression of the person; how they think, what their fixations are, what they approve of and what they hate, and to note information leaks. When you find a possibly strong candidate, it’s this profile information which usually confirms if it’s actually them or not.
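As a rough aid to that slow reading, and certainly no substitute for it, a few lines of Python can list the phrases that keep turning up across someone’s comments. Treat this as a sketch only: the function name, the three-word window and the sample comments are all invented for illustration, and because it lowercases everything, tells of capitalisation and punctuation still need picking out by eye.

```python
import re
from collections import Counter


def recurring_phrases(comments, n=3, min_comments=2):
    """List n-word phrases that appear in at least min_comments separate comments."""
    seen_in = Counter()
    for text in comments:
        # Lowercase and keep only runs of letters and apostrophes. This
        # deliberately ignores capitalisation and punctuation tells.
        words = re.findall(r"[a-z']+", text.lower())
        # A set, so a phrase repeated inside one comment is counted once.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        seen_in.update(phrases)
    return sorted(p for p, count in seen_in.items() if count >= min_comments)


# Invented sample comments, purely for illustration.
sample = [
    "At the end of the day it is junk.",
    "Well, at the end of it all.",
    "Nothing to see here.",
]
print(recurring_phrases(sample))  # ['at the end', 'the end of']
```

Anything it throws up is only a candidate; whether a phrase is distinctive enough to count as a tell is still a judgement call.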
You have to be aware that quite frequently the handle personality is a release valve, a cathartic outpouring and a spiteful expression of frustrations in other areas of their life. Operating under an anonymous handle allows that inner ugliness out. It can be pretty stark but you’ve got to look through it to find their public persona. Very often, that feeling of achievement in finding them is balanced out by discovering nothing more than a nasty but very sad individual working out some deep personal issues in totally the wrong fashion.
Okay, now you’ve got a framework for a search, let’s look at the mechanics of how it is conducted. Before we take the plunge, let’s agree a few conventions.
When I ask you to try typing something in, I’ll enclose it between square brackets – which means [type everything in between except the square brackets].
The second one is that when I give you a link and you click on it, it’ll open in a new tab or window depending on how you’ve set up whatever browser you’re using. Click it or not (no compulsion, we’re all consenting adults here), look at it, read it and close it and you should by default drop back to your place in this article. That way you’ll never be in danger of losing your place in such a bodice-ripper of an article.
The third and final one is that in the places where I need to show you a URL, I’ve substituted “hzzp” for the “http” part in order to stop the WordPress editor turning it into a real link.
Google is of course your prime attack weapon, simply because it indexes the largest amount of internet data. At face value, its search facilities don’t look too sophisticated, but they’re not as basic as most people think. What we will be doing is feeding it words, phrases, fragments of language and certain magic keywords in order to produce fewer and more accurate search results. Again, to establish a consistent vocabulary, I’ll refer to all those things we type into Google as search tokens, or simply tokens.
You may occasionally get a suspicious activity query from Google, because you’ll be doing a few things only a fraction of a percentage of Google users get up to. The message looks like this – “Our systems have detected unusual traffic from your computer network. This page checks to see if it’s really you sending the requests, and not a robot”. Don’t panic! You’ll be asked to enter a captcha code to prove you’re not a robotic entity and then you can continue as usual. If you do happen to be a cybernetic life form, I’d suggest phoning a friend.
There’s a bit of knowledge that I’m sure less than a tenth of 1% of the world know that I’m going to trust you with, but only on condition you never tell anyone else. This is serious; if you don’t swear to keep it under your hat, you’ll find a man with a gun on your doorstep. Break the Omerta rule, I send Luca Brasi after you.
There are actually three ways to enter search tokens into Google; three Input Areas, which I’m going to abbreviate to IAs in order to prevent wear and tear on my poor little wiggly but manicured phalanges. The Basic IA, the Advanced IAs and the URL IA.
Taking them in turn, the Basic IA, as its name suggests, is in the centre of the basic Google screen and looks like this. The button men use it.
The Advanced IAs, of which there are several, are all contained in the advanced search screen and the screen itself looks like this. The caporegimes use it.
The URL IA is located at the top of your browser and for this article will contain its internet address, which will contain something like “hzzps://thepointman.wordpress.com/2044/08/01/how-to-hunt-somebody-down-on-the-internet-part-2”, where http should be substituted for hzzp. This input method is for the exclusive use of the consigliere and I won’t be talking about it much, because that’d be a book-length article.
The easiest introduction to the advanced search facilities is to start by using the advanced search screen. This screen is accessed by clicking on the “Settings” link located on the bottom of the basic Google screen, second in from the right. Click it and then select from the menu which pops up the item second from the top – “Advanced search”. The advanced search page is then presented.
It consists of several rows of different search criteria each with their own IA. The general format is that each row has a title, an input area and optionally some help on how it’s used. For instance, the first one has a title of “all these words: “, followed by an input area and the help text “Type the important words: tri-colour rat terrier”.
Take a few minutes now to study and understand what each of the search criteria is intended to do. I won’t be doing an exhaustive discussion of each one, though for various reasons, I’ve used most of them at one time or another. There are a few things to note about the advanced search screen. The first is that it does not include all the search operators you can use, such as searching for a range of numbers. You can find a complete list of all operators here. Again, take the time to study them. There aren’t that many and the explanations of what they do are good.
The second thing is that it doesn’t always work the way you’d expect it to do – which is a diplomatic way of saying it’s buggy, very buggy. In point of fact, once you get past searching for simple things on the basic screen, Google is the sort of bug-ridden substandard product you’d expect from a self-congratulatory corporate culture that is more focused on really neat cutting-edge half-finished things than getting the basics right. Anyway, it may be shite but it’s the only game in town at the moment, so I’ll leave it to you to find the particular workarounds one is obliged to discover for oneself to get useful results out of it.
Let’s start doing the research phase by looking for that Pointman guy. Kick up the advanced search screen, type in [pointman] without the square brackets into the first Advanced IA, scroll down and press the “Advanced Search” button. The normal Google search results page is presented, but notice that [pointman] is in the Basic IA of it. You should get about 1.5 million results (the actual number of hits varies from day-to-day depending on whatever I’ve been up to or who’s been saying terrible things about me) but if you glance through them, you’ll quickly notice that Google has helpfully included results containing the two words point and man separated by a space.
We need to suppress this unhelpful behaviour because when searching for tells, we need to force Google to look for exactly what we type, and not what it thinks we intended to look for. The way to do this is by enclosing whatever we’re looking for, a word, a phrase or a fragment of language, in quotation marks, aka double quotation marks in these post-grammar Visigoth days, since nobody appears to know what speech marks are called, or even why there’s a difference. It must be nice to be double quoted.
Rather than starting afresh with a blank advanced search screen, just press the back icon on the previous advanced search screen and you should again be looking at it with [pointman] already entered. Add quotes to either end of it, turning it into [“pointman”], click the Advanced Search button and you should have eliminated any point-space-man results and whittled the hits down to about 900,000 results. Again, notice that the Basic IA of the results page now contains [“pointman”] with the quotes you added.
By habit I enclose nearly all search tokens for words and phrases in quotes. If the tell you’re looking for is a variation of a phrase, the wildcard or asterisk operator can be very useful, e.g. “a * saved is a * earned”.
In passing and only if you’re a connoisseur of crapware, try [“ pointman”] or [“pointman ”] or [“ pointman ”] and notice the difference in the results – pointman with an intervening space has crept back into the results. The buggy wuggie bugle boy of company G strikes again. I have to say though, that rather fetching girl in the middle of the trio is carrying the other two, but then again she’s got all the moves and a great set of pipes. Notice that momentary growl she does; she’d a sadly underused vocal range.
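If you end up assembling tokens in a script rather than by hand, the quoting and the wildcard are easy to get consistently right. The two helper names below are hypothetical, and note that the code emits plain straight double quotes, which is what you’d actually type into Google, rather than the curly ones the blog editor displays.

```python
def exact(token):
    """Wrap a word or phrase in straight double quotes so it is searched as typed."""
    return '"' + token + '"'


def with_gaps(*fragments):
    """Join the fixed parts of a phrase with the * wildcard, then quote the lot."""
    return exact(" * ".join(fragments))


print(exact("pointman"))                       # "pointman"
print(with_gaps("a", "saved is a", "earned"))  # "a * saved is a * earned"
```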
Okay, we’ve lost a half million or so hits but it isn’t actually much of a help since we’ve got nearly a million left.
Scanning through the results, you’ll notice some hits for a TV series called Pointman, which I’ve never heard of. Since I’m not in any TV series that I’m aware of, let’s eliminate those hits. Go back into the advanced search and enter [tv] into the IA labelled “none of these words: “, click the Advanced Search button and you should be down to about 750,000 results. Notice that the Basic IA of the result page now contains [“pointman” -tv]. When you want Google to exclude results for a particular search token, you put a minus in front of it if you’re using the Basic IA.
Let’s use another magic field in the advanced search dialog to slim the search results down to the bugger’s comments at a certain Anthony Watts’ site. Get down to the IA labelled “site or domain:” and key in [hzzp://wattsupwiththat.com]. Whoopidy doop! Only a few hundred results, so it’s looking a bit more manageable. What you’re looking at on the results screen is supposedly every occurrence of the word pointman at WUWT – that’s about the right size to start reading through picking up the tells and building the profile. Again, notice that the Basic IA of the results page now contains [“pointman” -tv site:hzzp://wattsupwiththat.com].
At this point, we’re going to stop using the advanced search screen because it’s quite simply too unpredictable and bug-infested for our purposes. To illustrate, enter the three search tokens [“pointman”], [tv] and then [-site:hzzp://wattsupwiththat.com] into the appropriate IAs. Notice that I’ve put a minus in front of the site: token. We’re looking for the non-televisual me on any site other than WUWT. The results page looks like this, but notice the site: token has mysteriously disappeared from the Basic IA of the results screen.
There are a lot worse bugs than that. It’s actually easier and more predictable to go directly to the basic Google screen and type into the Basic IA [“pointman” -tv -site:hzzp://wattsupwiththat.com] and you get this screen with the site: token left in the IA results screen. From now on, you can just edit whatever is in the Basic IA of the results page.
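Since everything from here on is just a string of tokens typed into the Basic IA, that string can equally be put together by a small function. This is a minimal sketch with made-up function and parameter names; it also passes the bare domain to the site: token, which is the form Google’s own help uses, rather than the full address shown above.

```python
def build_query(phrase, exclude_words=(), site=None, exclude_site=None):
    """Assemble a Basic IA search string from an exact phrase and optional filters."""
    tokens = ['"' + phrase + '"']                # exact match on the phrase
    tokens += ["-" + word for word in exclude_words]  # minus excludes a token
    if site:
        tokens.append("site:" + site)            # restrict results to one site
    if exclude_site:
        tokens.append("-site:" + exclude_site)   # everything except that site
    return " ".join(tokens)


print(build_query("pointman", ["tv"], site="wattsupwiththat.com"))
print(build_query("pointman", ["tv"], exclude_site="wattsupwiththat.com"))
```

The first call reproduces the WUWT-only search and the second the everywhere-but-WUWT one, ready to paste into the Basic IA.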
We won’t lose any functionality since whatever you type into the advanced search screen appears (a lot of the time) as one long string of tokens in the basic screen and then as something called parameters in the URL IA. We’re just bypassing a bug-infested layer of the Google interface.
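To make the link between the token string and the URL IA a little more concrete, here’s a sketch of how a string of tokens ends up encoded as a parameter. I’m assuming the usual search address and the q parameter you can see in your own browser after running a search; the function name is invented, and nothing here actually contacts Google.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed address of the results page; check what your own browser shows.
SEARCH_URL = "https://www.google.com/search"


def search_url(query):
    """Encode a token string as the q parameter of a search URL."""
    return SEARCH_URL + "?" + urlencode({"q": query})


url = search_url('"pointman" -tv')
print(url)
# Decoding the parameter again gives back exactly what was typed.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The quotes and spaces come out percent-encoded in the address, which is why the URL IA looks so much uglier than the Basic IA for the same search.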
Okay, at this point you know how you can narrow down your search to a site they frequent. Read everything by them, compile your list of tells and let’s start using them to establish what I call the baseline.
Let’s assume you’re hunting someone operating under the handle mysterio and you’ve picked out three of their tells; tell01, tell02 and tell03. Go to the basic Google search and input [“mysterio” site:hzzp://WhateverTheSiteIs.com] and run the search. This’ll give you every comment they’ve ever made there but more importantly, a count of their comments.
Now, let’s find out how good the tells you’ve identified are.
Go to the basic Google search and input [“tell01” “tell02” “tell03” site:hzzp://WhateverTheSiteIs.com] and run the search. Notice we’ve dropped “mysterio” from the search tokens. If the count of search results matches the search under their handle of mysterio, I’d be very surprised, since you’re only looking for entries containing all of those tells.
Let’s amend the Basic IA to [“tell01” OR “tell02” OR “tell03” site:hzzp://WhateverTheSiteIs.com]. With the use of the OR keyword, we’re now searching the site for any entries containing at least one of their tells. If you’ve identified really good discriminating tells, you should be looking at a hits count similar to the search under their handle of mysterio.
You can narrow the search criteria further though. Amend the Basic IA of the results screen to [“tell01” “tell02” OR “tell03” site:hzzp://WhateverTheSiteIs.com] and run the search. What we’re now looking for is all entries that have tell01 and either tell02 or tell03.
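The three shapes of tell query used so far, all of them, any of them, and one required plus a choice of the others, can be generated mechanically. A minimal sketch follows; the helper names and the example.com domain are placeholders of my own, and straight quotes are used in place of the curly ones displayed here.

```python
def quoted(tells):
    """Put each tell in straight double quotes for an exact match."""
    return ['"' + tell + '"' for tell in tells]


def all_of(tells):
    """Entries must contain every tell (tokens separated by spaces)."""
    return " ".join(quoted(tells))


def any_of(tells):
    """Entries must contain at least one tell (tokens joined by OR)."""
    return " OR ".join(quoted(tells))


def required_plus_any(required, alternatives):
    """Entries must contain every required tell and at least one alternative."""
    return all_of(required) + " " + any_of(alternatives)


def on_site(query, site):
    """Restrict a query to one site."""
    return query + " site:" + site


tells = ["tell01", "tell02", "tell03"]
print(on_site(all_of(tells), "example.com"))
print(on_site(any_of(tells), "example.com"))
print(on_site(required_plus_any(tells[:1], tells[1:]), "example.com"))
```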
If you’re starting to pick out only their comments, you’re well on track. You can keep on playing with the query and save it at any point by simply doing a save to favourites of the results page. You can then just reload the page from your favourites to continue playing with the search tokens.
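If you’ve saved a site’s comments locally, the same baseline check can be rehearsed offline before you lean on Google’s hit counts. The sketch below is only that: the data layout (pairs of author and text), the function names and the sample comments are all assumptions, and it does a simple case-sensitive substring match, which is stricter than Google’s own matching.

```python
def matches(text, required=(), alternatives=()):
    """True if text holds every required tell and, if any alternatives are given, at least one of them."""
    has_required = all(tell in text for tell in required)
    has_alternative = not alternatives or any(tell in text for tell in alternatives)
    return has_required and has_alternative


def baseline(comments, handle, required=(), alternatives=()):
    """Compare what the tells pick out against what the handle actually wrote."""
    hits = [(author, text) for author, text in comments
            if matches(text, required, alternatives)]
    return {
        "comments_by_handle": sum(1 for author, _ in comments if author == handle),
        "hits": len(hits),
        "hits_by_handle": sum(1 for author, _ in hits if author == handle),
    }


# Invented comments, purely for illustration.
saved = [
    ("mysterio", "Its a travesty, plain and simple"),
    ("mysterio", "A travesty , as usual"),
    ("mysterio", "Nothing to add"),
    ("someone_else", "What a travesty that was"),
]
print(baseline(saved, "mysterio", required=["travesty"], alternatives=["Its a", " ,"]))
```

When hits and hits_by_handle are equal and close to comments_by_handle, the tells are picking out their comments and little else.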
At this point, I’d move to other sites they frequent under their handle and run the same tells. Amend the Basic IA of the results screen to [“tell01” “tell02” OR “tell03” site:hzzp://AnotherSite.com]. All that’s changed is the site: token and if the results are solid, you’re now locked and loaded to find them anywhere on the internet.
Let’s assume your query that’s picking them out looks something like [“tell01” “tell02” OR “tell03”].
You now run your finely honed query against the whole of the internet. Key into the Basic IA [-“mysterio” “tell01” “tell02” OR “tell03”]. Notice we’ve excluded mysterio from the search by using the minus operator. We’re now looking across the whole of the internet for someone with those tells who’s not using the handle mysterio. If you’re looking at millions of results, you need to go back and refine your search criteria. If you’ve got a manageable number of hits, start wading through them because they’re most probably there and if you’ve developed a good profile of their real persona, you’ll be able to pick them out.
There’s a search token I haven’t mentioned yet that I find useful on the internet-wide search. The filetype: token restricts the search to a particular type of document, for instance a document (DOC), spreadsheet (XLS) or text file (TXT). Any file extension will work. For instance running [“tell01” “tell02” OR “tell03” filetype:pdf] restricts the search to files with an extension of PDF which contain those tells.
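For completeness, here’s how the internet-wide form of the query, with the handle excluded and an optional filetype: restriction, might be put together in code. It’s a sketch with invented names, not a finished tool, and it simply wraps whatever honed tell query you’ve already arrived at.

```python
def internet_wide(handle, tell_query, filetype=None):
    """Exclude the known handle and optionally restrict results to one file type."""
    tokens = ['-"' + handle + '"', tell_query]   # minus in front excludes the handle
    if filetype:
        tokens.append("filetype:" + filetype)    # e.g. pdf, doc, xls, txt
    return " ".join(tokens)


honed = '"tell01" "tell02" OR "tell03"'
print(internet_wide("mysterio", honed))
print(internet_wide("mysterio", honed, filetype="pdf"))
```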
Also, if you look at the search results, you’ll see a “Search tools” button in the menu under the Basic IA which you can use to further refine your results by country, time and various other criteria. As with all things in this area, try it out for yourself.
Let’s recap the methodology. Find what sites they comment on. For each site, search it using their handle to pick out all their comments there. Read all of them and pick out what you consider to be their tells. Then search the sites using only their tells to verify your search criteria. When you’ve developed what looks like a good set of search criteria, run it against the whole of the internet. If you’re looking at a few hundred results, you’ve probably found them.
Given any set of tells, there are a number of permutations of them you can search on. You may also have developed more than one set of search criteria. Google’s treatment of the ordering of search tokens is interesting, to say the least. I’ve automated that combinatorial legwork to some extent with a variety of homebrew wing and a prayer scripts. The search criteria are input nodes and the results set are the output nodes, needing nothing but a bit of annealing to connect the two. I don’t share scripts, because I know I’d end up running support of them, so don’t even ask. Also, I don’t bounty hunt either. If you fancy writing something yourself, you’ve just read a program spec for it.
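For anyone taking up that invitation, a bare starting point for the combinatorial part might look like the following. To be clear, this is not one of my scripts, just a minimal sketch with invented names that lists every ordering of every subset of a set of tells; running the searches and comparing the results is left entirely to you.

```python
from itertools import permutations


def query_variants(tells, min_size=2):
    """Yield a quoted query for every ordering of every subset of at least min_size tells."""
    for size in range(min_size, len(tells) + 1):
        for ordering in permutations(tells, size):
            yield " ".join('"' + tell + '"' for tell in ordering)


variants = list(query_variants(["tell01", "tell02", "tell03"]))
print(len(variants))  # 12: six ordered pairs plus six orderings of all three
for query in variants[:3]:
    print(query)
```

The count grows very quickly with the number of tells, so it pays to prune the list to your strongest few before generating anything.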
Having found out who’s behind a handle, the question arises as to what to do with that information. It does come down to why you were looking for them. If they’re just an obnoxious troll, outing them will just make them reappear under a different handle. A hint that you know a lot of stuff about them can work wonders. For instance, a few years back a person, for their own reasons, started stalking me for a time. Eventually, I left them a comment leaving them in no doubt I knew exactly who they were and all about their real circumstances, as opposed to their information declarations. They’ve never bothered me since.
I tend to look for people for my own purposes and have never yet outed anyone, for a number of reasons. The big one is that the day a person who’s obliged, for security, personal or professional reasons, to use an anonymous title to express their genuine viewpoint is forced to use their real name is the day free comment dies on the blogosphere. Anonymity is abused I know, but I’m a big boy who’ll take the rough with the smooth. Anyway, information you know that they don’t know you know tends to be more useful.
This particular method I’ve outlined is not the only way of doing this sort of search, but it’s the most direct and easily explainable to the general internet user. Try it, have fun and develop your own techniques. It’s a blue skies area so feel free to innovate. Let me know how you get on and any search tips you invent.
Related articles by Pointman: