So that thing about filthy-mouthed liberals? Something seemed—I don't know—off about those numbers. Quick exercise: I'm from the South. I know what patriotic Americans sound like after church on Sundays when, for example, someone scores on 'dem Saints. I also know many people in the service. I've driven hours into the desert to hang out with them as they passed through Fort Irwin. And you know what? Those guys who used "fucking" as an adverb in high school? They're still fucking awesome.
So I re-ran Patrick Ishmael's experiment using his instructions:
How did I get this result? I searched Google using the following format and recorded the page results that were returned:
site:xyz.com "search term 1" OR "search term 2" OR "search term 3"...
Nine search terms total—the seven profanities as single words, and two of those as their own two-word variations. I then added the individual site results together and compared them.
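For anyone who wants to rebuild these searches by hand, here is a minimal sketch of how a query in that format could be assembled and the per-site results summed. The function names are hypothetical, and quoting only the multi-word terms is an assumption about how the original searches were typed; nothing here talks to Google.

```python
def build_site_query(site, terms):
    """Build a site-restricted query string that ORs all the terms together."""
    def quote(term):
        # Assumption: only multi-word terms need quotes to match as a phrase.
        return f'"{term}"' if " " in term else term

    return f"site:{site} " + " OR ".join(quote(t) for t in terms)


def total_hits(hits_by_site):
    """Add the individual site results together (hits_by_site maps site -> count)."""
    return sum(hits_by_site.values())


# Placeholder terms, mirroring the "search term N" format quoted above.
example_query = build_site_query("xyz.com", ["term1", "search term 2"])
print(example_query)
```

The counts themselves still have to be read off the results page and typed in by hand, which is exactly where a stray digit can creep in.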
Some of his results are, shall we say, a little misleading. Just so you don't think I'm inventing this, I'm going to link to the very searches he claimed to do. Here's his entry for Daily Kos:
Daily Kos 146,000
I'm not sure where that number came from, because when you do the search he claimed he had, these are the results:
Daily Kos 9,960
Don't believe me? Click on the link. Where did those other 136,000 or so instances come from? I'm not sure. But I think I've figured out why the Huffington Post came across so salty. His number's on top; mine, complete with link to his search, on the bottom:
Huffington Post 109,000
Huffington Post 10,600
Throw in a little Google variability, and what you have here is a simple transcription error. Wonkette, on the other hand:
Wonkette 78,200
Wonkette 3,960
The only major error of this sort I found on his list of conservative sites was Ace of Spades, who ain't quite as profane as Ishmael estimated:
Ace of Spades 9,730
Ace of Spades 7,480
Now, I don't necessarily believe my numbers are correct. But I do know that his numbers should be similar (considering the inevitable Google waggle) to mine, since I used his methodology. But some of them aren't even close. I wonder why that is?
[Those interested in witnessing counter-counter-methodology slams are encouraged to check the comments. Also, I'm more than willing to be wrong about this, as Ishmael's numbers jibe with my intuitive sense on the matter.]
Actually, you forgot "cock sucker" or "mother fucker." If you include those, you get 145,000 hits for DailyKos. If you're going to reconduct the search, you have to do it right.
Posted by: | Friday, 02 March 2007 at 04:30 PM
Sorry, your error wasn't in the names. It was that you used "www." before dailykos, which is extraneous.
Posted by: | Friday, 02 March 2007 at 04:36 PM
Nope. Here it is without the "www": 10,500.
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 04:39 PM
Your searches come up with the "Search English pages" option marked. The numbers are much larger when you search the entire web. I'll leave the methodology slam to those who know what they're talking about.
Posted by: eb | Friday, 02 March 2007 at 04:41 PM
Also, "cock sucker" and "mother fucker" are both in there. Scroll over the links, you'll see them. Of course, they're in quotation marks, so we don't get every mention of the word "mother" popping up. (Presumably "mother" isn't a dirty word unless you're a liberal.)
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 04:41 PM
Take it off English, eb, and you get 337,000, not 146,000, as Ishmael claims. So it remains strange.
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 04:44 PM
I get 165,000 from the link in 337,000.
Posted by: eb | Friday, 02 March 2007 at 04:47 PM
Do you have "safe search" turned on?
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 04:50 PM
I get 337,000 from the link in 10,500. Google is reformulating your queries: things in quotes are being connected by dashes(or vice versa depending on which search is first, maybe). Anyway, who's posting English curse words in foreign language posts on English language blogs?
Seems like the search results might be influenced just a bit by things like google's algorithm, the structure of various software programs (an instance on an individual post, same instance on an archive page, same instance on a category pages, etc.) and other factors that make this sort of a pointless exercise without help from those people who know what they're talking about who could conduct a methodology slam (as I said).
Posted by: eb | Friday, 02 March 2007 at 04:53 PM
Let the record show that I had the chance to correct grammar and spelling in the last comment I posted and I chose not to do so.
I don't think safe search was on, but something odd is going on.
Posted by: eb | Friday, 02 March 2007 at 04:59 PM
You're right: Google's converting things between quotation marks into hyphenated words in a really random fashion. (I was going to make the same complaint about the English/non-English results, since it's not like someone typing along in Farsi's suddenly going to switch keyboards just to type "mother fucker," "motherfucker" or "mother-fucker." We need a methodology slam here, people, and by someone more capable, nay qualified than I.)
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 05:01 PM
Actually, now I'm imagining exactly that. Off in Iran, there's some Farsi-speaking Bush-hater blithely typing in her DailyKos diary when, suddenly, it occurs to her to speak to this "Resident" Bush in a language he can understand. She reaches for her copy of 100 Dirty English Phrases (American Edition), borrows a keyboard from her English-speaking lover, and proceeds to insert "mother fucker" and "cock sucker" into her screed...
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 05:06 PM
You're right: Google's converting things between quotation marks into hyphenated words in a really random fashion.
That's some fucked-up shit, right there.
(Or something...) Thanks for the attempt at replicating results.
Posted by: Andrew Haggerty | Friday, 02 March 2007 at 05:23 PM
No matter which "methodology" is used, it will still all add up to: Who the fuck cares?
This is just another instance of the Right throwing shit at the wall and the reactive Left running to clean it off.
The next time the Right tries something like this, the Left might try ignoring it.
Posted by: marc page | Friday, 02 March 2007 at 06:05 PM
But Marc, baseball doesn't officially start for another month, and I need to crunch some sort of numbers. Eventually it'll be, you know, Beltran's line-drive percentage or Glavine's first-pitch strikes, but in the meantime, I jones for numbers ... but lack the talent to do much more than count.
Posted by: Scott Eric Kaufman | Friday, 02 March 2007 at 06:08 PM
Well, alright then. Addictions I understand.
Posted by: marc page | Friday, 02 March 2007 at 06:34 PM
i also tried to re-create ishmael's experiment, with similarly differing results.
i found only 1/3 as many swear words for my blog skippy as ishmael found, and i found twice as many swear words for little green footballs as ishmael found.
so by my anecdotal calculations, ishmael over-counted liberals by a factor of 3, and undercounted conservatives by half.
plus, my blog, while cute, is certainly not among the 18 "biggest liberal blogs," so i don't know why i was included (must have been my placing as a finalist for the 2005 weblog awards thanks for noticing).
and, ishmael conspicuously left out protein wisdom starring jeff "slap your face with my cock" goldstein from the conservative side.
skewed? ishmael makes john lott look like tim lambert.
Posted by: skippy | Friday, 02 March 2007 at 08:44 PM
The bigger problem--aside from the fact that the results are of no interest whatsoever--is that the "methodology" compares top liberal sites, which mostly have unmoderated comments, to major conservative sites, which usually don't, and so egregiously rigs the comparison.
Posted by: Scott Lemieux | Saturday, 03 March 2007 at 12:35 AM
Clearly, the counts were being done under the auspices of Diebold Election Systems.
Posted by: Daniel Kim | Saturday, 03 March 2007 at 06:10 AM
Something is wrong with Google.
First, you have to be careful to reformat the search string so it doesn't misinterpret it. This means replacing "quoted phrases" with quoted-phrases, because otherwise, one of the two gets treated as required, while the other is not.
Secondly, there's the language issue. I checked not just the English-only results but also the results of restricting the search to each of the other languages in turn. I found that there were only 10 posts on DailyKos that were not categorized by Google as English:
French: 2
German: 1
Japanese: 6
Spanish: 1
Looking at the actual posts, the French ones really were in French, and the ones flagged as Japanese did have at least one small passage in Japanese. But I couldn't find any German or Spanish in the ones flagged as those two languages (but they were long threads).
So, according to Google, 9,850 English posts plus 10 non-English posts adds up to 154,000 posts in total.
So, it's quite clear that something is wrong with the Google algorithm.
However, using the DailyKos website's own search functionality suggests that the English-only Google results are the really correct ones. This is what I found:
Stories: 1,261
Diaries: 343
Stories & Diaries: 28
Comments: 8,845
If you consider that what Google indexes is the archives, which put the story and comments all on one page, you can see why the comments number is very close to the number returned by Google (8,845 vs. Google's 9,850). If you add in the diaries, you get 9,188, which is reasonably close to the Google number. Even if you ignore the duplication and just add Stories + Comments, you get 10,106, which is only 256 more than Google. That gap could simply be the difference between a search of the live site and Google's index, which is always going to be somewhat behind (by a day or two). Indeed, a quick check of the number of comments with the 7 dirty words in them posted in the last 12 hours comes up with 224, which is pretty close to the difference between the DailyKos search and Google.
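The sums in that comparison are easy to recheck. A quick sketch using only the counts quoted in this comment (the variable names are mine):

```python
# Counts from the DailyKos site search, as reported above.
stories = 1261
diaries = 343
comments = 8845

# Google's English-only count and the last-12-hours comment count, also from above.
google_english = 9850
recent_comments = 224

comments_plus_diaries = comments + diaries
stories_plus_comments = stories + comments

# Positive gap means the live site search returned more than Google's index.
gap = stories_plus_comments - google_english

print(comments_plus_diaries)  # 9188
print(stories_plus_comments)  # 10106
print(gap)                    # 256
print(gap - recent_comments)  # 32
```

So the live-site total sits 256 above Google's figure, and all but 32 of that gap is covered by the 224 comments from the last 12 hours.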
So, it's quite clear that the ALL LANGUAGE Google results are completely bogus.
Now, someone who really cares needs to do this same kind of search for the right-wing sites.
And a real comparison would be to throw out all the comments so you can compare the left with the right (where most sites don't have comments).
Another interesting comparison would be to weight the 7 dirty words posts as a portion of posts on the website in general. This could only be done with websites that have good search functions, but given that there's much more posting activity on the big liberal blogs than on the right, you've got a much bigger pool in the first place.
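A rough sketch of what that weighting would look like, assuming you could get both a matching-post count and a total-post count from each site's own search. The function name and the two example sites with their numbers are made up purely for illustration; they are not measurements of any real blog.

```python
def profanity_rate(matching_posts, total_posts):
    """Share of a site's posts that contain at least one of the search terms."""
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return matching_posts / total_posts


# Hypothetical numbers, for illustration only.
big_site = profanity_rate(1000, 50000)    # more raw hits, much bigger pool
small_site = profanity_rate(500, 10000)   # fewer raw hits, smaller pool

print(f"big site:   {big_site:.1%}")
print(f"small site: {small_site:.1%}")
```

With these invented figures, the site with twice the raw hits comes out with the lower rate, which is the whole point of weighting by the size of the pool.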
But it won't be *me* who does all of this! I think that job is best left up to the people with something to prove, which is the hysterical right.
Posted by: David W. Fenton | Saturday, 03 March 2007 at 09:10 PM