Google’s public image disconnect: Smart engineers and dumb algorithms

Google’s search technologies struggle to identify original news stories.

Google looks smart and its people are smart, but its algorithms are not. Machine learning works well on images, but not on language. Google's dirty little secret is that its algorithms are fairly stupid and struggle to understand what they see and read.

Take Google's recent example of how its search algorithm will be trained to highlight original news stories such as scoops and investigative pieces…

“After weeks of reporting, a journalist breaks a story. Moments after it goes online, another media organization posts an imitative article recycling the scoop that often grabs as much web traffic as the original. Publishers have complained about this dynamic for years…”

This has been a problem since Google News launched in September 2002. Finally, the head of Google News, Richard Gingras, has responded.

“An important element of the coverage we want to provide is original reporting, an endeavor which requires significant time, effort, and resources by the publisher. Some stories can also be both critically important in the impact they can have on our world and difficult to put together, requiring reporters to engage in deep investigative pursuits to dig up facts and sources.”

Foremski’s Take:

Why has it taken Google more than 17 years to deal with this? And why does Google's algorithm need thousands of human "raters" to help it recognize original news?

Gingras said Google has updated the guidelines it issues to more than 10,000 external contractors who work as "raters." Software engineers will use their ratings to make adjustments to the search algorithm.

Many of these raters are outside the United States. Google expects them to understand how news stories are produced and what makes one story more original than another, and to fill out an extensive web form covering hundreds of content-quality features outlined in a 168-page document. And they are given only a few minutes per assignment.

Gingras argues that original news stories are hard to define, which is true if you are trying to teach a machine. But anyone who compares several news stories can quickly tell who broke the story and who is merely recycling it with nothing new.

This is an example of the disconcerting reality that Google's algorithms are not that intelligent and remain inadequate despite decades of machine learning research.

Publishers must mark up their pages with special tags telling Googlebot how to index the content: what is an ad, what the main content is, which links should not be followed, what is spam, and so on. Even with that help, Googlebot has serious trouble judging content quality.
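To illustrate, here is a minimal sketch of the kind of markup publishers add for crawlers; the tags and `rel` values are standard ones documented by Google, while the URLs are placeholders:

```html
<!-- Tell crawlers whether to index this page and follow its links -->
<meta name="robots" content="index, follow">

<!-- Flag a paid link so it passes no endorsement to the advertiser -->
<a href="https://example.com/offer" rel="nofollow sponsored">Advertiser</a>

<!-- Point crawlers at the canonical version of a story -->
<link rel="canonical" href="https://example.com/original-story">
```

In other words, the site has to spell out explicitly what a genuinely smart crawler would be expected to work out for itself.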

That is why Google Search needs help from thousands of raters for fundamental tasks, such as identifying original news stories. Google will not let raters edit search results directly, because that would make it an editor and publisher, legally liable for its content in the way a newspaper is.

After more than two decades, the search algorithm remains a slow learner.

That means we are stuck with Google's poor algorithms. It also means continuing problems for democracy, as Google struggles to detect fake news, hate speech, and toxic content.

We never had this problem when people were responsible for publishing the news.
