
Reddit's restrictive move: is blocking archived posts protecting users or censoring history?
A few days ago, I stumbled upon a statistic that Reddit is widely used as a source of information for AI answers.
Reddit's answer: they want to block the Internet Archive's access (a kind of protection against scraping).
Is restricting archive access the right way to safeguard data, or does it risk losing valuable online history, which is also useful for journalists, media, etc.?
(TBH, knowing what kind of discussions are on Reddit, I am surprised it is one of the dominant sources) :D

Replies
No, I don’t think so. At the end of the day, online discussions, on Twitter/Facebook/Reddit, reflect current conversations. When we go back to topics from 10–15 years ago, we often laugh at what we wrote. How wrong we were. Or how someone actually guessed something right. So for me, they have huge value.
If someone is old enough, they’ll remember that 20 years ago we used to communicate in forums. And I find it very funny when I come across a thread from that time in which I participated. I see nuances about events that I had forgotten. I see certain trends that could already be felt back then. And that is a treasure. A history. Which, to me, is quite unnecessary to lose.
After all, AI already has enough information. But for users, it’s a loss.
@byalexai That's kinda true. We sometimes overestimate what we contributed. :D
Actually, Reddit users contribute surprisingly helpful and well-thought-out opinions, sometimes just to gain karma before they start self-promotion, but it still benefits the community. In many niche subreddits, you can find in-depth discussions and expert insights that aren’t easily available elsewhere. That might be why Reddit ends up as one of the top sources for AI training, since it’s a goldmine of perspectives.
@banglinhpham Yep, they simply do not like the idea / fact that their rich source of information will be scraped without any reward (money)...
Cool topic, Nika (as always 😁)!
To be honest, it would be cool if AI mentioned the source it used to generate its answers (which I think it already does?) so end users can evaluate them and decide whether they want to trust it. I'm not sure it's the right way to just delete/archive tons of data trying to safeguard the AI performance. We might end up archiving the entire internet at this rate 😅
@helga_impalpable I think the source is included. But tbh, it is less visible because they show the logo of one source and add "and 15+ other sources", and then you need to expand the list. Honestly, human beings (like 90% of the population) are too lazy to take another step and look :D
@busmark_w_nika Fair enough 😄 Anyway, I don't see blocking and archiving old data as a solution. I agree with Aleksandar that those old threads are a treasure trove of the wisdom of their time. It could be really great to look through such stuff and draw specific parallels or compare those approaches and ideas with today's trends, etc.
History must be respected :)
@helga_impalpable Yep, this will also put users (who do not rely on AI) at a disadvantage. But I also understand Reddit's point, because they get nothing from being scraped. 😭
I hate the idea that one day, years of community contributions could just disappear.
@malka_parveen Reading some comments here, I am starting to sympathise with this.
I get the safety angle, but history matters too, both the good and the bad.
@dinda_nancy Wow, this statement hit me. First, I was okay with Reddit's approach until reading this comment. Slowly changing perspective :)
Scade.pro
People really trust info published on Reddit, and as a redditor myself I consider it a great source.
About 80% of redditors don't use any other social media, which makes this platform unique.
I still see quite old threads with huge traffic. So I'm really surprised by their decision.
@nastassia_k + I would say that Redditors have well-formed opinions. At least in my country, it is widely used by a highly educated group of people, so I rely on it as the "wiser" social media.
Nika, you’re right, I completely agree with your last words. There are so many discussions there that honestly don’t even have much meaning. And you know, every community has its own strict set of rules. I think not just for me but for everyone, it’s not that easy to follow those rules perfectly; people are bound to make some mistakes.
@sania_khan10 Probably it is seen as the main source because of hard mod restrictions, so conversations are more filtered.
@busmark_w_nika Yes, that makes sense. The stricter moderation probably makes the content more reliable, but it can also limit open discussion.
@sania_khan10 Very thin line between censorship and making quality content. 🙈🥶
@busmark_w_nika Honestly surprised, but I guess it's because of Google's AI Mode and Google's AI Overview... if you remove them, the data will change dramatically. I've personally got very few Reddit threads in my Perplexity/GPT searches.
@shashwat_ghosh_gtm Perplexity doesn't use Reddit as the source? (I am asking because I do not use that one)
I wouldn't hate it if this meant I didn't run into 8-year-old threads at the top of my Google searches. E.g. I asked Google today "why is mac's spellcheck so bad" (I use Google for my sarcastic searches / Claude for my real ones). The first results were Reddit threads from 2018 about how to change spell check in macOS Mojave... not helpful.
(Spell check fail of the day was indepentely -> indecently 🙄).
@ben_mulch I am so happy that someone sarcastic is in my social bubble :D
ZapDigits
Why would you feed data to AI for free so that all the AI companies can then charge users? I think Reddit is doing the right thing.
@malithmcrdev At least, there should be some % (revenue system) for them. But this has already happened, e.g. with books, blogs, etc., and now websites have low traffic.
ZapDigits
It would be nice if that % went to the contributors of the discussions. But it would be super complex.
@malithmcrdev Good point as well, because at the end of the day, social media does the same thing as AI... exploiting someone else's ideas.