Friday, 02 March 2007

"Show Me Your Work," Filthy-Mouthed Liberals Politely Demand

So that thing about filthy-mouthed liberals?  Something seemed—I don't know—off about those numbers.  Quick exercise: I'm from the South.  I know what patriotic Americans sound like after church on Sundays when, for example, someone scores on 'dem Saints.  I also know many people in the service.  I've driven hours into the desert to hang out with them as they passed through Fort Irwin.  And you know what?  Those guys who used "fucking" as an adverb in high school?  They're still fucking awesome. 

So I re-ran Patrick Ishmael's experiment using his instructions:

How did I get this result? I searched Google using the following format and recorded the page results that were returned:

site:xyz.com "search term 1" OR "search term 2" OR "search term 3"...

Nine search terms total—the seven profanities as single words, and two of those as their own two-word variations. I then added the individual site results together and compared them.
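For anyone who wants to replicate this, the query format above can be sketched in a few lines of Python. This is only an illustration: the function name `build_query` and the placeholder terms are my own assumptions, not anything from Ishmael's post, and the code only assembles the query string (it does not contact Google).

```python
def build_query(site, terms):
    """Build a site-restricted query that ORs together quoted search terms.

    `site` and `terms` are illustrative inputs; the real experiment used
    nine profanity terms that are not reproduced here.
    """
    quoted_terms = " OR ".join(f'"{term}"' for term in terms)
    return f"site:{site} {quoted_terms}"


# Placeholder values taken from the format shown above.
terms = ["search term 1", "search term 2", "search term 3"]
query = build_query("xyz.com", terms)
print(query)
```

Swapping in a real domain and the full list of terms would give the string to paste into the search box; the per-site result counts would then be recorded by hand and added together.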

Some of his results are, shall we say, a little misleading.  Just so you don't think I'm inventing this, I'm going to link to the very searches he claimed to do.  Here's his entry for Daily Kos:

Daily Kos 146,000

I'm not sure where that number came from, because when you run the search he claimed to have run, these are the results:

Daily Kos 9,960

Don't believe me?  Click on the link.  Where did those other 136,000 or so instances come from?  I'm not sure.  But I think I've figured out why the Huffington Post came across so salty.  His number's on top; mine, complete with link to his search, on the bottom:

Huffington Post 109,000
Huffington Post 10,600

Throw in a little Google variability, and what you have here is a simple transcription error.  Wonkette, on the other hand:

Wonkette 78,200
Wonkette 3,960

The only major error of this sort I found on his list of conservative sites was Ace of Spades, who ain't quite as profane as Ishmael estimated:

Ace of Spades 9,730
Ace of Spades 7,480

Now, I don't necessarily believe my numbers are correct.  But I do know that his numbers should be similar (considering the inevitable Google waggle) to mine, since I used his methodology.  But some of them aren't even close.  I wonder why that is?
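To put a number on "not even close," here is a rough Python sketch that divides each of his counts by the count from my re-run. The figures are the ones quoted in this post; the variable names and the `ratio` helper are just my own scaffolding.

```python
# (reported by Ishmael, found on re-running the same search)
counts = {
    "Daily Kos": (146_000, 9_960),
    "Huffington Post": (109_000, 10_600),
    "Wonkette": (78_200, 3_960),
    "Ace of Spades": (9_730, 7_480),
}


def ratio(reported, rerun):
    """How many times larger the reported count is than the re-run count."""
    return reported / rerun


for site, (reported, rerun) in counts.items():
    print(f"{site}: reported is {ratio(reported, rerun):.1f}x the re-run count")
```

On these figures the three liberal sites come out at roughly ten to twenty times my counts, while Ace of Spades is within a factor of about 1.3, which is the sort of gap ordinary Google waggle might plausibly explain.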

[Those interested in witnessing counter-counter-methodology slams are encouraged to check the comments.  Also, I'm more than willing to be wrong about this, as Ishmael's numbers jibe with my intuitive sense on the matter.]


Comments

Actually, you forgot "cock sucker" or "mother fucker." If you include those, you get 145,000 hits for DailyKos. If you're going to reconduct the search, you have to do it right.

Sorry, your error wasn't in the names. It was that you used "www." before dailykos, which is extraneous.

Nope. Here it is without the "www": 10,500.

Your searches come up with the "Search English pages" option marked. The numbers are much larger when you search the entire web. I'll leave the methodology slam to those who know what they're talking about.

Also, "cock sucker" and "mother fucker" are both in there. Scroll over the links, you'll see them. Of course, they're in parentheses, so we don't get every mention of the word "mother" popping up. (Presumably "mother" isn't a dirty word unless you're a liberal.)

Take it off English, eb, and you get 337,000, not 146,000, as Ishmael claims. So it remains strange.

I get 165,000 from the link in 337,000.

Do you have "safe search" turned on?

I get 337,000 from the link in 10,500. Google is reformulating your queries: things in quotes are being connected by dashes(or vice versa depending on which search is first, maybe). Anyway, who's posting English curse words in foreign language posts on English language blogs?

Seems like the search results might be influenced just a bit by things like google's algorithm, the structure of various software programs (an instance on an individual post, same instance on an archive page, same instance on a category pages, etc.) and other factors that make this sort of a pointless exercise without help from those people who know what they're talking about who could conduct a methodology slam (as I said).

Let the record show that I had the chance to correct grammar and spelling in the last comment I posted and I chose not to do so.

I don't think safe search was on, but something odd is going on.

You're right: Google's converting things between quotation marks into hyphenated words in a really random fashion. (I was going to make the same complaint about the English/non-English results, since it's not like someone typing along in Farsi's suddenly going to switch keyboards just to type "mother fucker," "motherfucker" or "mother-fucker." We need a methodology slam here, people, and by someone more capable, nay qualified than I.)

Actually, now I'm imagining exactly that. Off in Iran, there's some Farsi-speaking Bush-hater blithely typing in her DailyKos diary when, suddenly, it occurs to her to speak to this "Resident" Bush in a language he can understand. She reaches for her copy of 100 Dirty English Phrases (American Edition), borrows a keyboard from her English-speaking lover, and proceeds to insert "mother fucker" and "cock sucker" into her screed...

You're right: Google's converting things between quotation marks into hyphenated words in a really random fashion.

That's some fucked-up shit, right there.

(Or something...) Thanks for the attempt at replicating results.

No matter which "methodology" is used, it will still all add up to: Who the fuck cares?

This is just another instance of the Right throwing shit at the wall and the reactive Left running to clean it off.

The next time the Right tries something like this, the Left might try ignoring it.

But Marc, baseball doesn't officially start for another month, and I need to crunch some sort of numbers. Eventually it'll be, you know, Beltran's line-drive percentage or Glavine's first-pitch strikes, but in the meantime, I jones for numbers ... but lack the talent to do much more than count.

Well, alright then. Addictions I understand.

i also tried to re-create ishmael's experiment, with similarly differing results.

i found only 1/3 as many swear words for my blog skippy as ishmael found, and i found twice as many swear words for little green footballs as ishmael found.

so by my anecdotal calculations, ishmael over-counted liberals by a factor of 300%, and undercounted conservatives by a factor of 50%.

plus, my blog, while cute, is certainly not among the 18 "biggest liberal blogs," so i don't know why i was included (must have been my placing as a finalist for the 2005 weblog awards thanks for noticing).

and, ishmael conspicuously left out protein wisdom starring jeff "slap your face with my cock" goldstein from the conservative side.

skewed? ishmael makes john lott look like tim lambert.

The bigger problem--aside from the fact that the results are of no interest whatsoever--is that the "methodology" is carefully rigged: it compares top liberal sites, which mostly have unmoderated comments, to major conservative sites, which usually don't.

Clearly, the counts were being done under the auspices of Diebold Election Systems.

Something is wrong with Google.

First, you have to be careful to reformat the search string so it doesn't misinterpret it. This means replacing "quoted phrases" with quoted-phrases, because otherwise, one of the two gets treated as required, while the other is not.

Secondly, there's the language issue. I checked not just the English-only results, but also compared them with the results of restricting the search to each of the languages in turn. I found that there were only 10 posts on DailyKos that were not categorized by Google as English:

French: 2
German: 1
Japanese: 6
Spanish: 1

Looking at the actual posts, the French ones really were in French, and the ones flagged as Japanese did have at least one small passage in Japanese. But I couldn't find any German or Spanish in the ones flagged as those two languages (but they were long threads).

So, according to Google 9,850 English posts plus 10 non-English posts adds up to 154,000 posts in total.

So, it's quite clear that something is wrong with the Google algorithm.

However, using the DailyKos website's own search functionality suggests that the English-only Google results are the really correct ones. This is what I found:

Stories: 1,261
Diaries: 343
Stories & Diaries: 28
Comments: 8,845

If you consider that what Google indexes is the archives, which puts the story and comments all on one page, you can see why the comments number is very close to the number returned by Google (8,845 vs. Google's 9,850). If you add in the diaries, you get 9,188, which is reasonably close to the Google number. Even if you ignore the duplication, if you just add Stories + Comments, you get 10,106, which is only 256 more than Google, which could simply be the difference between a search of the live site and Google's index, which is always going to be somewhat behind (by a day or two). Indeed, a quick check of the number of comments with the 7 dirty words in them posted in the last 12 hours comes up with 224, which is pretty close to the difference between the DailyKos search and Google.
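For anyone who wants to check that arithmetic, here is a quick sketch in Python. The numbers are the ones given in this comment; the variable names are only illustrative.

```python
# Counts from the DailyKos site search, as listed above.
stories = 1_261
diaries = 343
comments = 8_845

# Google's English-only count, and the comments from the last 12 hours.
google_english = 9_850
recent_comments = 224

comments_plus_diaries = comments + diaries
stories_plus_comments = stories + comments

# Positive gap: the live-site total is larger than Google's count.
gap = stories_plus_comments - google_english

print(comments_plus_diaries)  # 9188
print(stories_plus_comments)  # 10106
print(gap)                    # 256
print(gap - recent_comments)  # 32
```

The site-search total is the larger of the two, and the 224 recent comments account for all but 32 of the difference, which fits the idea that Google's index is simply lagging the live site.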

So, it's quite clear that the ALL LANGUAGE Google results are completely bogus.

Now, someone who really cares needs to do this same kind of search for the right-wing sites.

And a real comparison would be to throw out all the comments so you can compare the left with the right (where most sites don't have comments).

Another interesting comparison would be to weight the 7 dirty words posts as a portion of posts on the website in general. This could only be done with websites that have good search functions, but given that there's much more posting activity on the big liberal blogs than on the right, you've got a much bigger pool in the first place.

But it won't be *me* who does all of this! I think that job is best left up to the people with something to prove, which is the hysterical right.
