The Digital Media Machine: Aggregators


Saturday, March 9, 2013

Bad data, and a pool problem

We put so much trust into online databases. Oftentimes, the trust is misplaced. I learned this the hard way when I used some expiring American Airlines miles to book a vacation in New York. I trusted the online database used by aavacations.com to display accurate data about the New York Marriott Downtown. Instead, I found out that some crucial information about the hotel (see the circled part below) was incorrect.

I contacted the hotel and American Airlines, but aside from cancelling the reservation, nothing could be done. My family was out of luck.

In the grand scheme of things, whether or not a hotel database displays accurate information about a certain hotel in a certain city matters little. But when you consider that hundreds of millions of people make decisions based on what they see in online databases every day, the scale of the problem becomes apparent. Even if only 10% of listings on a travel or ecommerce website have incorrect data, that represents a lot of frustration, misspent dollars, and lost business.

Monday, January 23, 2012

Why Ancestry.com is not enough

"The real killer for me was that every search gave me zillions of irrelevant hits. That, together with a paucity of Scottish data, means that it is never going to be worth my while trying to use it."

This quote comes from an email sent to me by Ray Hennessy, an experienced genealogist in Scotland who maintains the What's In A Name website. He made this comment after reading my Ancestry.com review, and discussing with me some of the difficulties I have encountered building my own Lamont family tree (I am using his comment with permission).

When I heard this, I was not at all surprised. I've been working on my family tree off and on for the past 15 years, and am very familiar with the limitations of using databases (online and offline) to build out a family tree.

Lamont is a Scottish surname, and we confirmed the Scottish connection through family sources and public documents, including obituaries and birth records. I began using online sources for research in the 1990s, and tried Ancestry.com for the first time in 2008. I revisited Ancestry again late last year after receiving a trial offer that urged me to explore a new batch of U.S. military records that they had posted. The online military records were a huge waste of time. Despite easy access, I've had much more productive sessions in phone or face-to-face interviews with relatives, examining letters and photographs from family members, and visiting local government offices.

Ancestry.com and other online databases such as Rootsweb have been of limited use. The "zillions of irrelevant hits" problem is only part of the story. Other problems include an emphasis on censuses and other lists (ship registers, military rosters, etc.) as opposed to photographs, local histories, and trend data relating to health/immigration/economy. The databases also fail to provide mechanisms to connect or correlate names, even though statistical and algorithm-driven methods exist that could piece together likely connections. Sometimes, family trees submitted by other people help connect the dots, but often the trees are poorly documented or contain significant errors.

But there's another issue at work here, too: A database is only as good as the data that's put into it. If you can exclude the irrelevant hits, an 1870 census record might be a starting point, but the best stuff is to be found in church registers, county historians' offices and the humble town clerk -- sources which are almost never digitized or shared online.

Image: Portion of a handwritten family tree kept on file in the Clinton County, NY, Historian's Office.

Thursday, July 14, 2011

Google News Badges: Mixed messages for users, potential threat to MSM

Spotted on Hacker News this evening, a link to a new Google feature: Google News Badges. The following video explains how it works:

My reaction on Hacker News:

It's being marketed as something that helps people track articles and find new content, share content with friends, and spark conversations, but it's also framed in terms of earning/leveling up. I think there is a bit of disconnect there. People who appreciate the game-like elements may not get so much value from the discovery/social features (and vice versa).

If the badges result in significantly higher usage of Google News, some large news organizations will feel threatened. Making an aggregator more attractive means fewer people starting their search for news on CNN, BBC, Fox, nytimes.com and large local news sites like Boston.com. On the other hand, smaller/niche publishers that are featured on Google News will welcome more traffic.

I've been using Google News for many years (see "Google News: Biased or broken?" and "Xinhua finally becoming a 'world news agency'?") so I might derive some value from the badges, but I think for many new users they will be a confusing distraction. There are already enough pictures, links and boxes; dropping cute chiclets on the page isn't going to make it easier for new users to quickly grok what's going on.

In addition, people come to Google News and other "headline aggregators" like Techmeme to find content and leave. Encouraging them to stay on the page and start conversations goes against the spirit of quick content discovery.


Monday, May 24, 2010

Associated Content: Buyer beware!

So Yahoo is buying Associated Content. The price tag? Rumors say it's between $90 million and $100 million.

A strategy based on low-quality, commodity content is bad for any brand, but because Associated Content is tied so closely to SEO, Yahoo is really putting itself at the mercy of Google -- its classic foe in the search arena. A fundamental change in Google's search algorithm or SERP design could really hurt Associated Content.

Saturday, April 24, 2010

Google News: Still flawed

Spotted a few minutes ago on Google News, in the "Spotlight" section:

Pathetic, isn't it? I'm not just talking about low-quality "birther" nonsense showing up on the same page as legitimate news and analysis aggregated from sources all over the world. I'm also referring to the fact that the algorithm(s) Google uses to pick and highlight "news" for inclusion on its popular aggregator has been flawed for years, and the company is apparently unwilling or unable to fix it.

In 2005, I documented how Xinhua, China's state-run news and propaganda agency, was one of the main sources for Google News. Last year, a made-up Onion story was listed in the top block of Google News between real breaking news about swine flu vaccines and Canadian politics. Now it's bottom-feeding political muckraking from a conspiracy theorist. What's next?

Is Gabe Rivera's human-assisted news aggregator, Techmeme, a superior model? I don't like the fact that stories and blog posts about Apple tend to stay up at the top of the list for long periods of time, but I never see wing-nut bloggers (believe it or not, they exist in the tech news world, too) showing up, either.


Wednesday, December 9, 2009

Reuters' syndication network

A senior Reuters executive made some comments about the future of news to an FTC workshop recently, which were published on Reuters.com yesterday. It was interesting to see the company downplay the role of aggregators -- one of Rupert Murdoch's favorite bugaboos -- but I was far more intrigued by the outline of a "new network of syndication" that Reuters is proposing to create new revenue streams for content creators and eliminate redundancy.

Sounds like a super solution to save journalism and find workable business models for news, right? Well, I thought it through and came back with the following comment:

To “stop wasting resources on writing the umpteenth undifferentiated story that is available elsewhere” sounds great in theory, but there are a few formidable issues to realizing that vision:

1) Audiences overlap, and the same story may have to be tailored in minor ways to appeal to different audiences, based on local issues, the “tone” or expertise of the publication, and other factors. Two stories that may appear “undifferentiated” to you may in fact have different angles, emphasis, or additional facts that make them more suitable for the audiences they are aimed at. Publications want to be differentiated in some way, and using the same outsourced copy does not help them achieve that goal.

2) There needs to be a system of trust and baseline quality in place, but also great flexibility considering the types of content providers and multitude of publications using it.

3) Making Reuters and a few other specialist players the powerbrokers will lead to news oligopolies, kind of like we had before the advent of the Internet, except on a global scale. That doesn’t sound like progress to me.

Ian Lamont

Managing Editor

The Industry Standard


Sunday, November 22, 2009

The journalism crisis: Short-term technological hope, long-term business uncertainty

It's a tough time to be a journalist, but at the same time I see some reasons to be optimistic. I thought I would share some of my thoughts about what's going on, and offer some insights into where the news industry is headed.

First, the state of the industry today:

A seemingly never-ending cascade of layoffs and restructuring, especially at older publishers which used to enjoy monopoly or oligopoly status.
An inability on the part of publishers and advertisers to find an online business model that works. News Corp. CEO Rupert Murdoch has some bizarre concepts about the way the online news ecosystem should operate, but when it comes to online advertising he is right on target: "There is an almost infinite increase in inventory for websites and for display," he noted earlier this year. This results in a great deal of "downward pressure" on CPMs.
A broken system of journalism education: Too many students with unrealistic expectations, too many programs that are preparing too many people for careers that don't exist (this fall, 49 percent of students in the Columbia journalism master's program are in the print track), and too many teachers with impeccable 20th-century credentials but little online media experience.
Widespread unwillingness in sales and editorial departments to let go of old ways of doing things and experiment with/expand new initiatives.

So who will save journalism? I'm encouraged by inroads made by blog-based news sites and new news technologies. TechCrunch's "Scamville" exposé of Facebook gaming ripoffs put the New York Times' lapdog coverage of the industry to shame. I regularly use Techmeme and Twitter to keep abreast of what's going on in the world, and to monitor interesting online discussions.

But these services can't replace quality reporting and investigations, which are simply too expensive for current online business models to support. In the short to medium term, I see an opportunity for local and national television news websites to pick up some of the slack -- they have far more robust advertising-based revenue streams than newspapers and magazines, and some broadcast outlets (at least in Boston) have proven to be avid users of new technology and shown a willingness to experiment with ways in which online and video content can be integrated. They are taking some important steps toward reinventing themselves online, and that's a good thing.

Long-term, however, broadcasters' rich revenue model will fade, as advertisers move away from expensive 30-second video clips and demand more interactivity and engagement from their campaigns. What will replace it? That's the billion-dollar question that publishers are trying to figure out. Until they do, many sectors of the news industry will continue to downsize and disintegrate, as fast-moving broadcasters and technology-driven upstarts pick up some of the pieces.

Sources and research: Twitter, WFXT Facebook page and website, WBZ website, New York Times website, TechCrunch, Fake Steve Jobs blog, IAB website, Poynter Online

Sunday, November 15, 2009

Google News Bias

Google News is the gold standard when it comes to algorithm-driven news aggregators. It's updated every few minutes, tracks breaking news across multiple topic areas, and is reasonably well integrated with Google's regular search engine.

It's also seriously biased ... or broken. I usually check it several times per day, and imagine my surprise when I saw this:

A made-up Onion story from last month, listed between real breaking news about swine flu vaccines and Canadian politics?

And a Wikipedia article, listed prominently as the background source for the Jaycee Lee Dugard kidnapping? I thought the Ivy League whiz kids at Google realized Wikipedia is a notoriously unreliable source for information about famous people and many other topics, and is frequently manipulated by spin doctors, SEO consultants, and vandals. I mean, that's one of the reasons Google created Knol, right?

But there's another problem with the results, one that's been nagging me for the past few months. Every single one of the top headlines for each of the topics comes from an old-school, traditional mainstream media organization -- 19th- and 20th-century print or broadcast news outfits, most backed by giant media conglomerates or billionaire founders. CNN, Reuters and the New York Times Co. don't have a monopoly on information or opinions, but with this kind of help from Google, they continue to preserve their dominant market positions in the new online playing field.

It's disappointing, but not entirely unexpected. Google News has had problems with its algorithm for years. For instance, unreliable foreign news sources used to be featured prominently on the site. Part of this related to an apparent desire to diversify sources of information to include non-Anglo-American media outlets (see my 2005 post about Xinhua's unlikely inclusion as a primary news source), and part to some foreign publications taking advantage of or manipulating Google News to get more traffic to their sites. Google has since tweaked the algorithm, but I fear it has swung too far in the opposite direction. The same, tired old voices that one sees on newsstands and TV screens -- as well as a few quirky sources that should not be considered sources of "news" -- now take center stage on Google's automated news platform.

