I got back to DC today after a week in Chicago. Thanks to all the folks who made my trip out (Mid)west worth it. It was my first time returning to the Windy City since moving in 2022, and you all still make the city feel like home.
My bandwidth has been limited, but I’ll have fresh observations from around the creator world next Sunday. For now, as the end of the year approaches, I’ve been thinking more about the next stage of AI.
I dug into the archives and listened back to an interview I conducted several months ago with Proof News journalist Annie Gilbertson. She’s done incredible work uncovering exactly how the biggest tech companies in the world have been scraping creator content to train their AI models. So I thought this would be the perfect opportunity to both highlight Annie’s work and share my first pod on this blog.
More on that below. If you have a chance to listen, let me know what you think—I plan on sharing more interviews in this space moving forward.
— NGL
P.S. Last week, I wrote about swimming in circles and the necessity of telling “the story behind the story.” If you missed it, check it out here.
It feels like we’re past the endless AI hype cycle of 2023 and the race to scale in 2024. The average working professional I talk to is either actively figuring out which tools and best practices are actually useful day-to-day, or tuning out AI outright. Tech companies, meanwhile, have realized they can avoid copyright-related PR headaches (at best) or lawsuits (at worst) by pulling from their overflowing pool of investor cash and cutting big checks to individual publishers.
All of this is to say that conversations around the ethical concerns of AI proliferation have been put on the back burner. Maybe it’s a culturally induced feeling of inevitability; more likely, the lobbyists here on K Street are simply doing their jobs effectively.
Which brings me back to my interview with Annie. For context: Though I haven’t shared this full conversation previously, we originally chatted for a story that appeared in the Publish Press (you can read it here).
Here’s what I wrote then about her reporting:
Videos from MrBeast, Marques Brownlee, and 48,000 other YouTube channels are part of a dataset that several large tech companies (including Apple and Nvidia) used to train their artificial intelligence models—without the original creators’ permission, nonprofit news studio Proof News reported yesterday.
The dataset, which does not include images or audio, is a collection of subtitles and transcripts from over 173,000 YouTube videos. It was created by a “non-profit AI research lab” called EleutherAI.
Conversations around AI can get confusing. But I think this one is evergreen because it’s about so much more than who’s getting screwed by who, as billion-dollar companies race to build the best tool for your college-aged sibling to plagiarize their final paper.
It’s about the very fabric of what we consider “private” and “public” life—and who gets to claim control over our data and likeness. As Annie puts it, “the table stakes of the Internet have changed.”
You can watch or listen to the interview by clicking the embedded video above, or find it on powderblue.substack.com.*
Thanks for reading! Shoot me a reply, comment, or DM if anything resonated with you in particular—I respond to them all.
* If you’re watching the interview, you might notice a black screen from 2:36-2:44. Apologies for the rendering error here!