AI & Investigative Journalism: The New York Times Case Study

by Archynetys World Desk


AI Powers Investigative Reporting: The New York Times' Toolkit

The New York Times is leveraging artificial intelligence to enhance investigative journalism, developing tools that enable reporters to analyze vast datasets and uncover hidden connections.

The New York Times (NYT) is at the forefront of integrating artificial intelligence (AI) into its newsroom operations. Zach Seward, the editorial director of artificial intelligence initiatives, leads a multidisciplinary team called AI Issues, which focuses on applying AI tools across various aspects of news production. According to Seward, AI is being used not only for internal workflows but also for exploring how news might be consumed in the future.

At the recent WAN-IFRA Congress in Krakow, Seward shared insights into how the NYT’s AI applications have evolved from experimental to institutionalized, notably in the realm of investigative reporting. The AI Issues team identified repeatable patterns in how journalists tackle the challenge of sifting through massive amounts of data, leading to the development of an internal toolkit to support their work.

Seward emphasized the value of Large Language Models (LLMs) in analyzing extensive document and video collections that would be impossible for humans to review comprehensively. “We now, as journalists, have a capability to search through data sets in ways that previously were not possible and really give our journalists a whole new superpower,” he stated.

The AI Toolkit for Investigations

The AI Toolkit for Investigations is built upon four repeatable patterns that emerged from the NYT’s experience using AI in investigations:

  1. Vibes-based search
  2. Diving for pearls
  3. Augmenting datasets
  4. End-to-end verification

Vibes-Based Search: Uncovering Semantic Connections

Vibes-based search, also known as semantic or vector search, uses vector embeddings to identify semantically similar content beyond exact keyword matches. This lets journalists uncover connections and patterns that conventional keyword search might miss. Seward noted that this approach is particularly valuable for identifying variations in terminology.

“It’s not as simple as simply looking up one specific word in a data set… we would have found maybe one, two or three examples… by using semantic search, we were able to find a much, much wider swath of examples,” Seward explained.
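As a rough illustration of the difference Seward describes, the sketch below compares an exact keyword match with a similarity ranking over vectors. It is not the NYT's tooling: the three-number "embeddings", the sample headlines, and the function names are all made up for the example. A real system would get its vectors from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written 3-dimensional vectors for illustration only.
# Real embeddings have hundreds or thousands of dimensions.
DOCS = {
    "Automobile dealership opens downtown": [0.9, 0.1, 0.0],
    "Used vehicles for sale this weekend": [0.8, 0.2, 0.1],
    "Bakery wins regional pastry award": [0.0, 0.1, 0.9],
}
QUERY_TERM = "car"
QUERY_VECTOR = [0.85, 0.15, 0.05]  # pretend embedding of the query term

def keyword_search(term, docs):
    """Exact substring match: misses synonyms and paraphrases."""
    return [text for text in docs if term.lower() in text.lower()]

def semantic_search(query_vec, docs, top_k=2):
    """Rank documents by vector similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

keyword_hits = keyword_search(QUERY_TERM, DOCS)
semantic_hits = semantic_search(QUERY_VECTOR, DOCS)
```

With these toy vectors, the keyword search for "car" returns nothing, while the similarity ranking surfaces both vehicle-related headlines even though neither contains the word.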

The Math Behind Semantic Search

Semantic search functions by encoding text as numerical vectors in multi-dimensional space:

  • Text is converted into numerical representations (embeddings).
  • Similar concepts cluster together in this mathematical space.
  • Distance calculations reveal semantic relationships between terms.
  • This enables “equations with text” – for example: [king] – [man] + [woman] ≈ [queen]

Seward elaborated, “What they’re doing, in essence, is encoding text or other types of media as huge arrays of numbers. And as you are creating numbers, you can start to do math with them.”
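The "equations with text" idea can be tried out with a few lines of arithmetic. The sketch below uses hand-picked three-number vectors (roughly: royalty, masculinity, femininity) chosen so the analogy works; they are assumptions for illustration, not output from any real embedding model, and the function names are hypothetical.

```python
import math

# Hypothetical toy vectors: [royalty, masculinity, femininity].
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three input words, as is common practice for analogy queries.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

nearest = analogy("king", "man", "woman", VECTORS)
```

Subtracting the "man" vector from "king" and adding "woman" yields a point that, in this toy space, lies closest to "queen". Real models show the same tendency only approximately, which is why the relation is written with ≈ rather than =.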

Diving for Pearls: Extracting Insights from Overwhelming Content

The “Diving for Pearls” tool applies AI to extract insights from vast amounts of content, using journalists’ expertise to guide the AI through carefully crafted prompts. It structures findings in spreadsheets organized by topics of interest. For example, the NYT used this tool to analyze over 500 hours of video leaked from an election interference group.

“The first step that the AI helped with was just transcribing the videos into text, which was still 5 million spoken words – far more than we could deal with in the time allotted. But crucially, we didn’t just have the source material; we had two reporters who’ve been covering democracy and threats to elections in the US for collectively more than 18 years. So that proved to be a powerful combination,” Seward said.

Seward added, “The problem that our reporters typically have when they call in members of my team is… too much data. They’re sitting on tens of thousands of documents or hundreds of hours of video that are truly impossible for any journalist to go through themselves.”
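The general shape of such a pipeline (split a long transcript into pieces, examine each piece against reporter-defined topics, collect matches in a spreadsheet) might be sketched as follows. This is an assumption-laden outline, not the NYT's tool: the chunk sizes, topic names, and function names are invented, and a plain keyword check stands in for the LLM prompt step so the example runs offline.

```python
import csv
import io

def chunk_words(text, size=200, overlap=20):
    """Split text into overlapping word windows small enough to review one at a time."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def stand_in_classifier(chunk, topics):
    """Placeholder for the LLM step: a real workflow would send the chunk and a
    reporter-written prompt to a model. Here, a simple keyword check stands in."""
    lowered = chunk.lower()
    return [topic for topic, keywords in topics.items()
            if any(keyword in lowered for keyword in keywords)]

def findings_to_csv(chunks, topics):
    """Write one spreadsheet row per (topic, chunk) match and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["topic", "chunk_index", "excerpt"])
    for index, chunk in enumerate(chunks):
        for topic in stand_in_classifier(chunk, topics):
            writer.writerow([topic, index, chunk])
    return buffer.getvalue()

# Hypothetical topics of interest and a tiny made-up transcript.
TOPICS = {"voting": ["ballot"], "staffing": ["volunteer"]}
sample_chunks = ["talk about ballots", "weather chat", "we need volunteers"]
sheet = findings_to_csv(sample_chunks, TOPICS)
```

The overlap between windows is there so that a remark straddling a chunk boundary still appears whole in at least one chunk. The reporters' subject knowledge enters through the topic definitions and, in a real setup, through the prompt wording.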

Augmenting Datasets: Enhancing Analysis with AI

The NYT utilizes optical character recognition (OCR) to analyze complex document sets, including handwritten notes.

“The newest foundational models from all of the major LLM developers have really taken OCR to the next level, and it’s now possible to do all sorts of really complex and messy analysis on all sorts of messy data sets,” Seward explained.

The NYT has also developed tools for monitoring “manosphere” content creators and generating daily summaries, and also screening
