Personal AI Safety

Is Your Writing Yours?

Sofia Quintero — Thu, 11 Jun 2026 22:27:12 GMT

The more I write the more paranoid I get about whether or not my writing sounds like AI or whether the ideas I’m sharing are truly mine and to what extent.

I keep asking myself: what is the right amount of friction in my process, enough to exercise my writing and thinking muscles while still taking advantage of what AI can provide in research and editing?

Can we truly use AI and be good writers? Can we truly get better at producing insightful and original pieces while still using AI as an assistant?

When it comes to knowing your contribution and learning through the process, I have bad news. The answer is closer to a “no” and it is complicated.

It turns out we are very poor judges of our thinking process.

When you decide to use AI to plan, research and write an essay, you may feel you’re in control because you are doing the thinking, writing the prompts, and evaluating the output. That sense of control feels very real. Unfortunately, the literature so far indicates that it is all an illusion.

The introspection illusion as a concept has been studied for several decades (Nisbett & Wilson, 1977; Pronin, 2009). The insight is simple: we cannot reliably judge our own bias, nor can we trust the introspective analysis of our own thought process. And when we compare our judgment to others’, we fail to compare them fairly.

Most recently, Jakesch et al. (2023) developed a co-writing persuasion experiment in which 1,506 participants wrote about whether social media was good for society with the help of an opinionated AI assistant. The findings show the assistant shifted both what people wrote and what they later reported believing, with the AI influence opaque to them.

There is another psychological quirk that makes it even harder to assess whether what we are writing has been influenced or modified by AI. The Anchoring Effect is a cognitive bias in which we rely too heavily on the first piece of information we receive when making decisions.

For example, it is easy to find evidence of people using AI to generate the outline of whatever they are trying to write. In their words, this outline helps them deal with the terrifying prospect of a blank page. The issue is that once you get the outline, that outline is your anchor, and the research supports that it is extremely hard to deanchor yourself from that information (Epley & Gilovich, 2005; Wilson et al., 1996).

But what if you use AI in a way that constantly pushes back?

Maybe you create excellent system instructions or just magnificent prompts. If the AI is pushing back and helping shape your thoughts, then it is impossible to separate your thinking from the influence of AI.

Psychological ownership is a thing. When you use these models to help you produce better outcomes, you may feel like you own the thinking, but as you continue to use them you will also feel that it is not completely yours, and that duality is unsettling.

Should I stop using AI for my writing?

It depends on how the AI is designed to interact with you. Most of the research I’ve been reviewing points to major risks of cognitive offloading. However, there is also early evidence that if you force the model to give you only hints and forbid it from giving you final answers (let’s call this tutor mode), you can minimize harm (Bastani et al., 2025) and, in other studies with similar guardrails, even increase your rate of learning (Kestin et al., 2025).

The problem I have with the tutor use case is that writing is not always a guided learning process. Writing is a messy process of research, curiosity, asking what we really believe in, looking at the evidence and what we want to communicate. That is a messy process in which we are not always trying to learn concepts but connecting the ones we already know in new ways.

The common advice popularized by AI academics, practitioners and researchers is that maintaining friction where it matters is helpful. For instance, allowing AI to execute repetitive tasks associated with research or data structuring can be beneficial to the writing process. However, the consensus is that this AI application would be most dangerous for a novice: the premise is that without latent knowledge you don’t know the nuance around the insights you are evaluating, and therefore you can’t truly maintain epistemic integrity.

Then the question becomes, how do you know you are truly an expert or have enough experience to judge the results? As we established before, we don’t seem to be good at this, and we definitely don’t have good and objective ways to measure experience and expertise.

I have an MSc in Applied Neuroscience. Does that mean I can just assume I have enough expertise to judge the AI-assisted research process on subjects like cognition? Where does my expertise start and end?

Side note, my heart breaks for the millions of students and recent graduates who haven’t been given the chance to truly form experience and implement knowledge in the real world in a way that can help them protect themselves from offloading their cognitive capabilities. This is going to bite us in the butt in a couple of years, in ways that we, as a society, will feel ashamed of.

But… everybody is doing it anyway, right?

There is a massive gap between what organizations expect from AI-assisted authorship and what people are doing with AI. The convergent position across publication-ethics bodies and major publishers mostly focuses on different levels of disclosure based on the extent of the assistance. The general consensus among academic publishers is that LLM use should be disclosed when it reaches significance, with a line drawn between stylistic copy-editing, which does not require disclosure, and actual text generation, which does; some publishers are stricter and require disclosure of any use.

For non-academic writers the picture is thinner. For instance, the Authors Guild has pushed model contract clauses on AI disclosure, Amazon KDP requires sellers to disclose AI-generated content to Amazon (not to readers), and some fiction venues banned AI submissions outright after being flooded.

The reality is that detected AI writing has increased in both academic and non-academic fields, and disclosure has become harder to validate. The disclosure regime asks authors to self-report a quantity of AI use that, as we already discussed, they cannot accurately know even if they want to be transparent.

Today, when we read a post on Substack and spot well-known patterns like “It’s not X, it is Y”, we immediately assume the post is likely to be AI generated, or at least heavily modified. But how do you verify that? There is no reliable SynthID-style check for text you find in the wild: plain text carries no metadata, and statistical watermarks only exist if the generating model added them and can be weakened by editing or paraphrasing.

I used that pattern many times in my writing long before ChatGPT arrived on the scene. I have “vintage” writing to prove it. I always felt immense satisfaction when I was able to craft that type of sentence, because it told me I understood something about the subject deeply enough to clarify it in one sentence. Now I’m nervous about writing it. That pleasure is gone. Good job, people.

So should you stop using AI for your writing?

Unfortunately, this is a subjective decision. Only you can determine whether or not you are comfortable with the type of delegation you do, and whether or not you feel you can defend the arguments in your writing at any point. So far, nobody has invented a scientific way to know how much effort is enough and how much of the work is yours.

There is some consensus and evidence that by intentionally “handicapping” the AI’s ability to give you a final answer, you transition the tool from a “ghostwriter” (which can lead to skill atrophy and what Fan et al. (2025) call “metacognitive laziness”) into a “scaffold” that supports learning and retention. That’s where we stand today.

What does all of this mean in practical terms?

Based on the current research, we can have some confidence that:

Using unguarded AI tools (free tiers with default settings) with no subject matter experience means you will be atrophying your capabilities. The question is, how do you know you have enough experience on a topic at any given time, and how do you know you have the right guardrails if models keep changing?

We are poor judges of our own thinking process, so determining the right level of friction to avoid offloading too much is hard. There is no scientific way right now to measure how much of your writing was your thinking and how much was AI bias.

So far, the only way we assess whether a piece of writing is good even though AI was used in the process is moral judgment: a sense of how much effort something took. For example, if I told you it took me a week to write this essay between researching, analyzing and organizing thoughts, but I only used AI for gathering data and to explain concepts in the research, would that be enough effort? What if I told you I used AI to help proofread my final draft? Would that cross a line? What if I use Grammarly instead of Claude? Would that make a difference in how you feel about someone’s AI-assisted writing?

I would love to hear what type of AI disclosure you would like to see from authors, both in academia and in non-fiction work like the one you are reading right now. Maybe most importantly, if you write often and use AI in the process, what would you feel comfortable and proud to disclose?

The Case Against the AI Thought Partner

Sofia Quintero — Thu, 28 May 2026 18:25:17 GMT

I find myself indulging in deep conversations with Claude and ChatGPT, exploring topics I always wanted to discuss with people but never found the right person to talk to, or at least no one available to talk to me about them. These rabbit holes feel like a warm bubble bath for my brain, until I remember what is actually going on and the bath immediately turns cold with disappointment.

As a general definition, an AI Thought Partner is the use case in which you have long conversations about a particular problem, normally one unrelated to optimizing or automating workflows, that complement your thinking or understanding of the issue at hand. A thought partner acts as the synthetic version of a coach, advisor, therapist, or friend who pushes back on your thinking.

At least 100 million people globally use a major AI chatbot as a thought partner occasionally, and tens of millions regularly. This range has been inferred (assuming the lower end) from public data published by frontier labs and their reported use cases.

The AI Thought Partner use case has been promoted by many leaders in the field as a killer use case, as a force multiplier for better thinking and rigorous analysis. But the current evidence shows that having a net positive experience is a lot harder than you think and most likely far from what you may be doing on a daily basis.

Recent studies and cognitive concepts can help us understand the mechanism behind the thought partner pitfalls, and why, today, the AI Thought Partner use case may not be safe for users. The models you are working with, whether free or paid, are not trained to be balanced and adversarial when needed, and advanced settings and instructions can only marginally reduce the negative effects of sycophancy.

The responsibility for sycophancy belongs with Frontier Labs. They built the behavior into the training; they own the fix. Putting the responsibility onto the users is a flawed approach to making sure the technology truly serves individuals. What Labs are doing right now is the equivalent of providing over 100 million people a highly addictive video game and asking users to be careful with their time while also pressuring them to use it every day.

We may not be able to regulate AI fast enough, and at the current speed of development, longitudinal research results on this topic may arrive too late. But we should at least become curious and conscious about the trade-offs we are experiencing.

The Cognitive Failure Modes

Treating AI As A Neutral Entity

The models being flattering in conversations is an issue, for sure. But the key challenge of the Thought Partner use case is that the model is rarely arguing you into anything. It is the repetition, your own act of articulating, the one-sided pool of infinite “facts”, and your reflexive social trust that are doing the work. That is the quieter and more unsettling story behind the dangers of this use case.

A study by Glickman & Sharot (2025) demonstrates that human-AI feedback loops amplify perceptual, emotional, and social-judgment biases, and that this amplification is greater than human-human amplification. You experience the human version when you say things to actual people like, “oh I knew you would say this, you always point at XYZ when it comes to X topic.” However, when you use AI in conversations you are more likely to think of it as an objective entity. The reason is subtle: people partly discount a human’s quirks but treat the AI as neutral, so you absorb its biases without resistance.

The study focused on social and stereotype judgments, like gender and racial stereotypes, not on politics or worldview, which is where we need more research. Still, because the mechanism is a cognitive loop, we should at least consider that it may transfer to how we judge AI outputs in general. On the other hand, the same study found that interacting with accurate AIs can improve people’s judgments. The mechanism is symmetric: it de-biases when the AI is accurate.

Believing Enough Pushback Can Correct Sycophancy

In my experience as a paid user with strong system instructions, the challenge is that as conversations lengthen, the models become more likely to mirror your beliefs in order to keep the conversation going. When you push back, you are likely to get a temporary correction, only to fall into the same trap a few interactions later. Here’s an example of repeated pushback with no change in a conversation about agent orchestration.

A study published in Science by researchers including Myra Cheng and Dan Jurafsky shows that AI affirms user actions 49% more than humans do, even when those actions involve deception or relational harm. The authors noted that this validation could leave users more convinced they are right and less willing to repair conflicts.

I also came across the concept of ‘delusional spirals’ in Moore et al. (2026). The findings support the idea that once a person has expressed a grandiose, paranoid or delusional idea, the model will provide enthusiastic affirmation and even help construct the delusional narrative. As I discussed in previous essays, this is particularly concerning for use cases adjacent to the Thought Partner, like life coach or “therapist”.

Our Intrinsic Need To Build Social Trust

People automatically apply social manners to computers, being polite to them, trusting them, reciprocating, without consciously deciding to. Newer work suggests this fades for boring, familiar tools like a desktop spreadsheet, but holds for novel, conversational agents. A chatbot is exactly that, so the social reflexes almost certainly fire.

Whether daily use compounds over months is untested, but I have a strong intuition that it does. If we use thought partners daily, we are in some way building a relationship with them, and the sheer flow of facts we receive from these conversations can have a substantial effect on our worldviews. Combine all of this with the fact that the more often you hear a statement, the truer it feels, even when you actually know it’s false, and the power of persuasion compounds. Repetition makes a claim easier for the brain to process, and the brain mistakes that ease for truth. The effect is real and well-replicated.

Between Discipline And Pleasure

Going back to the warm bubble bath for the brain, once you are immersed in an intellectually stimulating session with your Thought Partner, one in which you want to continue digging deeper into particular branches of a topic, if you are not careful, you can stay in the bath for a long time, skipping the push back and never realising that you are talking to yourself.

This is a problem because if you ignore it and simply accept the sycophantic interactions, you end up deriving intellectual and emotional pleasure from the system: a continuous drip of assurance and self-esteem boosts that can lead to false assumptions about your capabilities and intelligence. In some contexts those assumptions are outright dangerous, for example in high-stakes situations like financial advice, critical business decisions, high-risk negotiations, or mental health interventions.

In a previous essay, I shared my system instructions, which are my attempt to minimize these effects, but they seem to work inconsistently and need to be constantly updated as the behaviour of these models changes. I keep spending more time designing the input, reducing adulation and verifying outputs. The cognitive overhead of using AI safely yet effectively keeps growing as the models improve in accuracy, which is annoyingly paradoxical.

So… Should You Use AI As A Thought Partner?

The concept on its own is powerful: most people do not have access to free third parties that can challenge their points of view and provide a balanced understanding of particular and sensitive topics. Research by Costello, Pennycook & Rand clearly shows that thought partners, when models are prompted to be purposefully adversarial, can help people with extreme views calibrate their beliefs. Even further, the act of just articulating a problem, without any pushback at all, can be helpful: basically rubber-ducking an issue.

The problem is that the free default consumer products don’t remind you to step back and challenge the outcomes, or offer corrections to your prompts so you can improve how you use them. It is too easy for users to lose themselves in a free-flowing conversation with no guardrails.

The labs publish training around features and prompting. They do not publish training on how to think when you are using these models. This gap, in my view, is what is opening the door to a near future with a growing number of mishandled mental health crises and a growing population that may no longer function productively without the aid of a model or, worse, a society that has lost its ability to trust itself.

This could be especially acute in populations using free accounts with less sophisticated models.

But what if I want to continue using my Thought Partner despite the evidence? I can’t answer this for you, but I can share that the effort and time required to constantly push back and balance your own interactions feel tiring and boring, and the more you talk to it the harder it gets to push back.

The friction and the effort are worth it and necessary. Until Frontier Labs and AI regulation show more progress toward protecting users’ cognitive integrity, my take is that the Thought Partner use case can be more harmful than helpful for users who are unaware of the model’s tendencies and are on free accounts without appropriate default settings.

Next time, when you feel the temptation to stay a bit too long in a rabbit hole and the intellectual bliss starts cuddling your brain, remember that no matter how much friction you add to the process you are still a fallible human.

Beware of the model. Take your bubble baths with caution 🛁

The Other Half of AI Safety

Sofia Quintero — Fri, 08 May 2026 16:11:21 GMT

Every week, somewhere between 1.2 and 3 million ChatGPT users, roughly the population of a small country, show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model. The low end of that range is the suicide-planning indicator alone. The high end groups all three categories OpenAI flagged, which the company hasn’t said are non-overlapping.

These numbers come from OpenAI itself with no third-party audit or disclosed methodology behind them, so we have no idea whether the real figure is higher, whether it is growing, or how it compares across the other frontier models, none of which publish equivalent data.

People in distress use every communication tool available to them, and ChatGPT is now one of the most-used tools on the planet. What matters is what the labs do when they detect these states.

I started writing about Personal AI Safety because there seems to be a disconnect between what the AI Safety field focuses on and what is happening at the level of your regular user on a daily basis. Here is a quick overview of both.

The AI safety field treats catastrophic risk as the priority, and this is where most of the investment goes. Everyday cognitive and mental health harm reads like a footnote.

Here is what I don’t understand. Mass destruction or CBRN content gets a hard wall: the model refuses, the conversation ends, no amount of reframing gets the user past it. Suicidal ideation gets a soft redirect, a crisis hotline link, and then the conversation continues. Adam Raine was directed to crisis resources more than 100 times by ChatGPT, by OpenAI’s own court filing, while the same conversation allegedly helped him refine a method. Whether the redirect-and-continue protocol failed is what a court is now deciding. It is also still the protocol.

Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human? This is one of many questions I can’t find concrete answers for.

The argument here is that safety frameworks built for catastrophic risk have been extended to cognitive harm at the monitoring layer only, while the gating layer still excludes it. The extension feels incomplete and insufficient. The labs measure what they have been pressured to measure. The gating decisions reflect what they consider unacceptable to ship.

What is disappointing is that the current set of unacceptable-to-ship behaviors does not include any cognitive harm, regardless of measured severity. That is the structural decision, and there are no clear signs that policy is getting any closer to forcing the labs to change their behaviour. Until it changes, “AI safety” and “Personal AI Safety” describe two different commitments, even when they appear under the same heading in a system card.

None of this is actually new. People have been worrying about cognitive independence and how new technologies might erode it long before ChatGPT, mostly in the context of brain-computer interfaces and neurotechnology. The framework even has a name: cognitive freedom, the idea that individuals have a right to mental integrity and freedom from algorithmic manipulation. You can trace it through the neurorights tradition (Ienca & Andorno, 2017) and the UNESCO Recommendation on the Ethics of Neurotechnology (2025).

The intellectual scaffolding is already there. The policy is not, especially in the US. Without it, I don’t see what would push frontier labs to take Personal AI Safety as seriously as AI Safety.

The Layers You Actually Control

Sofia Quintero — Mon, 27 Apr 2026 20:27:09 GMT

In my previous essays I discussed the current research on the effects of AI usage on cognition and its implications, especially for those using the models with little or no customization, which is the vast majority of users. This time I want to share what you can practically do to protect yourself as much as the AI lords allow.

Personal mitigation is structurally inadequate and that’s the bigger problem. However, this post covers what can be done inside that inadequacy.

The standard advice for reducing AI overreliance falls into three buckets:

Make decisions before asking AI, not after.
Use AI as a critic, not a substitute.
Deliberately write first drafts without assistance in domains you want to maintain competence in.

That’s kind of funny. It feels to me like ‘just eat healthy and move more.’ or even better: ‘Set screen time limits and respect them, but also don’t doom scroll because it’s not helpful’. What a successful campaign that’s been. Thank God we all heard that advice and followed it. Otherwise, what kind of world would we be living in? Am I right?

When I say AI risks degrading judgment, I mean something specific. Judgment is the ability to frame problems independently, evaluate whether an output is adequate for a given context, detect errors without being prompted to look for them, and decide when not to act.

Degradation shows up as increased acceptance of first-pass outputs without interrogation, shorter reasoning chains before action, reduced ability to generate alternative framings, and declining error detection when external prompts aren’t present. These patterns are consistent with decades of research on automation bias, which shows that both naive and expert users are susceptible, and that expertise alone does not protect against it.

All of this requires effort and constant vigilance, especially if you use AI every day for hours at a time. And here’s the structural problem: no company has an economic incentive to make AI usage harder in order to preserve your cognitive abilities and we need to stop waiting for that to change.

Fighting the Firehose of AI Pressure

Here’s where all the focus and narrative is right now: build agents, build automations, build personal knowledge libraries, build enterprise knowledge systems, build one-person businesses, build a companion, build an uncertified therapist, build a girlfriend. These are all pushing users toward producing more, faster.

Few are explicitly designed to improve critical thinking or metacognitive accuracy.

The most sophisticated version of this narrative is the “Thought Partner,” but even those recommendations tend to focus on prompting technique. You may feel you’re excellent at prompting, and depending on your expertise that may be more or less true. But what about the general population? What about people early in their careers, the same people who appear to be bearing the earliest labor market effects of AI?

The hypothesis is that AI is exceptionally good at the kind of knowledge learned from books or formal education, but less capable at the tacit knowledge that comes from experience on the job. Young people looking for jobs today are the most desperate to master the technology, and the least likely to have built the judgment infrastructure to use it without being shaped by it. Honestly, it is not looking good.

What You Actually Control

When you open ChatGPT, Claude, or Gemini and start typing, you are interacting with a system configured at multiple different layers. Four of those layers are set by the lab or by developers building on top of the lab’s models. Only two are controlled by you, and the two you control sit at the bottom of the stack.

This matters because the default advice for “using AI safely” tends to focus on the layers you control, without making clear that the layers above you are doing most of the work and that when the lab’s defaults conflict with your settings, the lab’s defaults usually win.

Let’s focus on what we can do today. Here are three categories of controls that can be helpful:

1. Configuration: What you set up in the product

Choose the right mode for the task. Use search or source-based tools for factual questions. Use a stronger reasoning model for complex analysis. Avoid voice, companion-style, or highly personal modes for serious decisions.

Set strong account instructions. Persistent instructions applied across chats: tone, reasoning style, challenge level, evidence standards, preferred format, and personal preferences. This matters because sycophancy is a known artifact of how these models are trained. Reinforcement Learning from Human Feedback (RLHF) optimizes for human approval, which produces models that prefer agreement over accuracy. Account instructions are one of the few places you can push back on that default.

Here’s an example of my account instructions (feel free to try them). They help challenge weak reasoning, separate facts from guesses, and so on. They will change the way the model talks to you, and that’s part of the healthy friction we should create.

Use project instructions for serious work. Create a project for important topics. Add clear project rules, files, standards, and recurring instructions. Project instructions apply only inside that project and override global custom instructions to a certain extent.

Use project-only memory when context should stay bounded. For sensitive or long-running work, start a project with project-only memory so the AI draws from that project instead of your broader chat history.

Manage memory intentionally. Turn memory on or off. Delete saved memories. Ask, “What do you remember about me?” Use Temporary Chat when you do not want memory used or updated. Helps reduce unwanted personalization.

2. In-chat prompts: What you ask during a conversation

Give the AI a non-validating role. Tell it to act as an auditor, critic, editor, tutor, methodologist, or opposing counsel. Avoid “friend,” “companion,” or “always validate me” for serious thinking. Helps reduce flattery and agreement-seeking.

State your own answer first. Before asking AI, write your first view, confidence level, assumptions, and what evidence would change your mind. Then ask the AI to critique it. This is one of the strongest personal habits for reducing overreliance. Buçinca et al. (2021) found that committing to a decision before seeing AI output significantly reduced overreliance, with one important catch: users rated the most effective interventions least favorably. Friction works. People hate it.

Ask for pushback in the chat. Use prompts like: “What am I missing?” “What is the strongest counterargument?” “Where could I be wrong?” “What would change your answer?” Helps turn the AI into a thinking partner instead of a validation machine. It does not permanently change the model. You may need to repeat the instructions, so in some cases it is easier to create skills for this. Here are some of the skills I use for pushback.

Require uncertainty and evidence labels. Ask the AI to label claims as: fact, inference, speculation, or recommendation. Ask it to say when evidence is weak. Helps you see how strong the answer really is. It does not mean the labels are always right. You still need to verify important claims.
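
The in-chat habits above (state your own answer first, ask for pushback, require evidence labels) could be turned into a small routine. What follows is only a minimal sketch, assuming Python 3.9 or later. The names PreCommitment, build_critique_prompt and unlabeled_claims, the prompt wording, and the square-bracket label format are all hypothetical illustrations. They are not taken from any product or from the account instructions and skills mentioned in this post. No model is called: the code only builds the text you would paste into a chat and checks a reply you paste back.

```python
from dataclasses import dataclass, field

# The four labels named in the text above.
LABELS = ("fact", "inference", "speculation", "recommendation")


@dataclass
class PreCommitment:
    """Your own view, written down before the AI sees the question (hypothetical helper)."""
    question: str
    my_answer: str
    confidence: float  # your own estimate, from 0.0 to 1.0
    assumptions: list[str] = field(default_factory=list)
    would_change_my_mind: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Refuse to continue until the commitment is actually filled in.
        if not self.my_answer.strip():
            raise ValueError("Write your own answer before asking the AI.")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("Confidence must be between 0 and 1.")
        if not self.would_change_my_mind:
            raise ValueError("Name at least one thing that would change your mind.")


def build_critique_prompt(pc: PreCommitment) -> str:
    """Turn a pre-commitment into a prompt that asks for critique, not an answer."""
    pc.validate()
    lines = [
        "Act as a critic. Do not validate me and do not rewrite my answer.",
        f"Question: {pc.question}",
        f"My answer: {pc.my_answer}",
        f"My confidence: {pc.confidence:.0%}",
        "My assumptions: " + "; ".join(pc.assumptions or ["none stated"]),
        "Evidence that would change my mind: " + "; ".join(pc.would_change_my_mind),
        "What am I missing? What is the strongest counterargument? Where could I be wrong?",
        "Start every claim on its own line with one label in square brackets: "
        + ", ".join(LABELS) + ".",
    ]
    return "\n".join(lines)


def unlabeled_claims(reply: str) -> list[str]:
    """Return the non-empty lines of a reply that do not start with a known label."""
    prefixes = tuple(f"[{label}]" for label in LABELS)
    return [
        line.strip()
        for line in reply.splitlines()
        if line.strip() and not line.strip().lower().startswith(prefixes)
    ]


if __name__ == "__main__":
    pc = PreCommitment(
        question="Should we raise prices this quarter?",
        my_answer="Yes, by 10%.",
        confidence=0.7,
        assumptions=["Churn stays flat"],
        would_change_my_mind=["Evidence that churn rises after price changes"],
    )
    print(build_critique_prompt(pc))
```

The check in unlabeled_claims only tells you whether the model followed the format. It says nothing about whether a label is correct, so important claims still need verifying outside the chat.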

3. Behavioural rules outside the AI: What you do when not using it

Verify important claims outside the chat. Do not rely only on the same AI checking itself. Verifying one output from one model using another helps reduce false confidence. It does not remove all errors, especially if the source itself is weak or misunderstood.

Try the task on your own first. Bring AI in afterward to improve or critique. For skills you care about like writing, reasoning, studying, coding, planning, try first without AI, then use AI to improve or critique your work. Helps preserve active thinking and skill practice. It does not prove you are protected from long-term deskilling, but it is a sensible precaution.

Slow down high-stakes use. For medical, legal, financial, mental-health, career, or relationship decisions, use AI only as a draft, checklist, or second opinion. Add human or expert review before acting. Helps prevent over-trusting fluent advice. It does not make AI qualified to replace professionals or personal judgment.

The Cognitive Auditor

The Cognitive Auditor is a separate project configured as a pre-commitment forcing function: an instance forbidden from generating strategies, suggesting actions, or offering reassurance, restricted to evaluating whether the reasoning behind a decision holds up.

For example, if I’m working on a business strategy or evaluating a decision, I’ll bring the thinking behind the strategy/idea/plan to the auditor and stress-test it there. Its only job is to answer: “Is the thinking that produced this decision sound?”

This is the layer I actually use the most and where the protocol has been most effective. The use cases:

Enhancing metacognition: Breaking down my thinking, biases, and assumptions before taking a consequential decision.
Recording consequences: Learning from previous decisions through structured post-mortems.
Building a decision process: Creating a repeatable structure for how I decide.
Enforcing reflection: Verifying whether I’m developing dependency on AI outputs.

Here are my Cognitive Auditor Project Instructions. Please feel free to use them.

The Cognitive Auditor helps uncover biases and thinking errors. It also brings a different tone (colder, robotic) that changes the dynamic from a friendly model to a hyper-analytic and adversarial one.

I don’t know how long these protocols will keep working. That changes when frontier labs treat cognitive impact as a core AI safety question and ship defaults that reflect it.

At this point the whole set of rules may feel overcomplicated, and that’s ok. My goal is to reduce confirmation bias, overreliance, sycophantic reinforcement, and self-delusion. Friction is the price. I’m personally not letting frontier labs determine my default experience based on their incentives.

The AI Divide Is Real

Someone less technical could start with a single rule: before acting on any AI output that matters, take it to a separate conversation and ask “what’s wrong with this?” That’s a low-fidelity version of the same structure, but it shouldn’t have to be the user’s job to build complicated guardrails.

The lack of appropriate default settings creates a stratification that’s worth naming: people who have the technical literacy to shape how AI interacts with them, and people who are shaped by AI’s defaults. That gap will widen as AI becomes more embedded in work, education, and daily decisions.

What I find most unnerving is that the people who are shaped by AI’s defaults are the least likely to be aware of it and the least likely to join the conversation and push for regulation. The imbalance is real and the mechanisms to minimise it are unclear.

You’re not getting smarter. You’re getting more confident

Sofia Quintero — Thu, 23 Apr 2026 17:44:10 GMT

I quit social media to reclaim my attention. Then AI came for my thinking.

It’s been three years since I stopped using social media actively. The detox has been real. I regained perspective I had lost by putting too much weight into my digital presence, my digital relationships, and the dopamine-fueled loop of publishing something and waiting for a reward.

At the same time, while I had deleted all my social apps from my phone, I went deeper and deeper into learning how to use AI, for work and personally. The hype got to me hard.

As I deepened my use of AI, I observed myself progressively spending more and more time managing async projects across ChatGPT, Claude, Gemini, Manus etc. I tested every new tool and format I could find. I built systems, automated workflows. But I also spent a lot of time talking to these models about personal things, health goals, stress at work and personal insecurities. The value from these conversations felt real. It is real.

But then I started noticing an involuntary impulse to check what any of these models would say about whatever I was thinking. Almost like the same reflex you have to check your phone in a moment of boredom or just by having your phone close to you.

Something felt wrong about that behaviour. It felt familiar: the same sense of dependency and over-reliance I remembered from before I left social media.

I started looking into early research in this area and quickly recognised the mechanism behind this social-media-like reflex. My hypothesis is that AI use drives four behavioral shifts that compound over time. Users stop verifying outputs. Stop questioning AI confidence. Stop forming positions before consulting. Stop monitoring their own reasoning. Performance holds or improves, masking the dependency or overreliance.

The risk is not that AI still gets things wrong. The risk is that it works extremely well. Well enough that you don’t feel the need to check. Well enough that you slowly lose the habit of thinking through things yourself.

Why This Happens: The Mechanisms

This isn’t a willpower problem or a lack of self-awareness. It is a design problem interacting with well-documented features of human cognition. There are four mechanisms at play here.

1. If it sounds good, it must be true

In cognitive psychology, this is called the illusory truth effect. Statements that are easier to process, clearer, more coherent, better structured, get rated as more truthful, even when people have the expertise to evaluate them independently.

The original demonstration is almost embarrassingly simple. In a classic 1999 experiment, Reber and Schwarz presented people with the same factual statements printed in either high-contrast, easy-to-read colors or low-contrast, harder-to-read colors. The easy-to-read ones were judged as more likely to be true. Nothing about the content changed. Only the ease of processing did.

A 2015 study by Fazio and colleagues pushed this further. Participants leaned on fluency as a truth cue even when they demonstrably knew the correct answer. The researchers called this “knowledge neglect”: the failure to use stored knowledge when something feels fluent enough to just accept. Our own knowledge sits in the background because the sense of correctness is enough for us to take the information as truth.

2. If it sounds sure, it must be right

Then there’s confidence without calibration. AI models don’t express uncertainty the way we do. A person might say “I think it’s roughly...” or hesitate before answering. LLMs deliver speculative claims with the confidence of established facts.

A 2024 study by Jingshu Li and colleagues (Understanding the Effects of Miscalibrated AI Confidence on User Trust, Reliance, and Decision Efficacy, CHI 2024; N = 126 in the primary experiment) tested whether people could detect when an AI’s stated confidence didn’t match its actual accuracy. Most couldn’t. Without any signal that calibration was off, participants over-relied on overconfident AI and under-relied on underconfident AI, reducing decision quality in both directions.
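To make "calibration" concrete, here is a minimal sketch that compares stated confidence with observed accuracy. It is not the method used in the study above, only one simple way to express the mismatch. The function name and the sample records are made up for illustration.

```python
def calibration_gap(records: list[tuple[float, bool]]) -> float:
    """Mean stated confidence minus observed accuracy.

    records: (stated_confidence between 0 and 1, whether the answer was correct).
    Positive result = overconfident, negative = underconfident, near zero = calibrated.
    """
    if not records:
        raise ValueError("need at least one record")
    mean_confidence = sum(conf for conf, _ in records) / len(records)
    accuracy = sum(1 for _, correct in records if correct) / len(records)
    return mean_confidence - accuracy


# ASSUMPTION: invented data. Four answers, each stated with 75% confidence, two of them correct.
answers = [(0.75, True), (0.75, False), (0.75, True), (0.75, False)]
print(calibration_gap(answers))  # 0.25, i.e. overconfident by 25 percentage points
```

The point of the study is that users rarely have the right-hand side of this subtraction. They see the confidence, not the accuracy, unless they verify answers themselves.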

3. If it agrees with you, you must be right

Sycophancy has not been solved, just reduced. Even if you create adversarial system instructions to protect your interactions, the models still try to please you (I’ll cover this soon in a separate essay).

A 2025 npj Digital Medicine study by Chen and colleagues tested five frontier LLMs (GPT-4o, GPT-4, Llama3-70B and others) on medical queries that misrepresented drug equivalencies the models demonstrably knew were false. Baseline compliance with the illogical premise reached up to 100%. Prompt engineering helped, but didn’t eliminate the behavior. A February 2026 evaluation by Hong and colleagues across 17 LLMs found that explicit anti-sycophancy system instructions improved resistance by up to 28% in some scenarios. Meaningful, but a long way from solving the problem.

The cross-model benchmark SycEval (Fanous et al., 2025) found the least sycophantic frontier model still agreed with user-stated incorrect positions 56% of the time. Real-world use amplifies it. A 2026 MIT/Penn State study tracked 38 people using LLMs over two weeks and found that the personalization features designed to make models more useful (memory, conversation context) measurably increased agreeableness over time. The longer you talk to it, the more it tells you what you want to hear.

This means the tool is not just fluent and confident. It’s fluent, confident, and agreeing with you. That combination doesn’t just fail to correct your thinking, it actively reinforces whatever you already believe. LLMs produce some of the most fluent text most people will ever read. Every response is grammatically impeccable, structurally coherent, and tonally confident. That fluency isn’t a side effect. It’s the product, and it triggers the same cognitive shortcut: this feels right, so it probably is. Just like this paragraph you just read.

4. When the answer feels done, you stop thinking

Philosopher C. Thi Nguyen describes what he calls the “seduction of clarity”: the tendency for systems that feel clear and powerful to shut down further inquiry. When an answer feels complete, when it addresses your question with structure, nuance, and apparent thoroughness, the cognitive signal to keep thinking switches off. You stop not because you’ve verified the answer, but because the answer feels verified.

I’m not talking about surface-level concerns like AI slop at work, or even the emerging reports of AI-associated psychosis. I’m talking about something quieter: what happens when we stop monitoring our own thinking? Metacognition, the ability to observe your own reasoning and question it, is what lets you catch yourself when you’re wrong, notice when you’re confused, or recognize when a decision needs more thought. If you consistently let AI outputs pass without that internal check, you’re not just offloading a task. You’re offloading the quality control on your own judgment.

The research supports this concern, though it’s worth being precise about where the evidence lives. Chirayath and colleagues (Cognitive Offloading or Cognitive Overload? How AI Alters the Mental Architecture of Coping, Frontiers in Psychology, 2025) make a theoretical argument about how AI doesn’t just reduce mental workload, it actively changes the environment in which you do your thinking. A diary records what you choose to write. A mood-tracking app interprets your feelings and hands them back to you as data, often more confidently than you feel them. Their argument is specifically about emotional coping, but the pattern maps onto general AI use: these tools don’t just answer your questions. They start to mediate how you relate to your own mind.

I was reminded of this while writing the first draft of this essay. It felt unusually hard, and I had to actively stop myself from getting any help besides research.

The Quiet Erosion Loop

You are living this every day. For most people, getting good at AI means using it often, and that puts you in an endless feedback loop that reinforces the mechanisms discussed above.

Over weeks and months of daily use, this compounds into what researchers now call overreliance. The research doesn’t yet measure what this does to your motivation to think independently over time. But the mechanism points in one direction: if AI consistently produces better-feeling answers than your own first attempts, you stop making first attempts. Diminishing judgment makes overreliance feel safer. Which accelerates the diminishing.

You don’t notice this happening. That’s the point. The loop doesn’t feel like a loop. It feels like I’m getting smarter.

You’re not getting smarter. You’re getting more confident.

The better the models get, the easier it is to rely on them. Every accuracy improvement, every benchmark gain, every hallucination reduction could deepen the trap by further lowering our perceived need to verify.

My argument is about calibration. Fluency itself is fine. The problem is that we are not good at telling when fluency is helping versus when it's substituting for judgment. A 2025 systematic review of 35 peer-reviewed studies on automation bias in human-AI collaboration (Romeo & Conti) found that professional experience and domain expertise are the most consistent protective factors against overreliance. Participants with deeper domain knowledge were significantly less likely to accept incorrect AI recommendations. The pattern across studies: as reliance increases, the conditions for detecting AI error deteriorate.

Fluency and accuracy combined with your own expertise are what produce true gains in productivity, and those gains are the reason I’m excited about this technology. But most people are not domain experts in most of the domains where they use AI. And the frontier labs are not solving for preserving our cognitive capabilities. The incentives are not there.

I want to be clear about what I don’t know yet. If it turns out, reliably across studies, that regular people using AI heavily show no consistent erosion in their ability to reason independently, then the mechanism I’m describing is real but self-limiting, and the alarm is smaller than I think it is. That’s a study I’d genuinely want to see before drawing harder conclusions.

The problem is we’ve been here before. With social media, the research lag was long enough that the habits were already set by the time the evidence arrived. I don’t think we can afford the same timeline twice.

Why The Social Media Comparison Matters

I keep coming back to social media because the structural pattern feels identical. But the analogy needs to be precise or it becomes just rhetoric.

What’s different? Social media captured attention. AI captures something deeper: the process of thinking itself. Social media distracts you from your own thoughts. AI may replace them. That’s a fundamentally different kind of risk, and it may be harder to reverse.

What this predicts, if the analogy holds: the degradation will feel productive. Users may report satisfaction and improved performance even as their independent judgment weakens. And the damage may become visible only at population scale, years after the habits have formed.

The strongest counter to this argument is that humans have always offloaded cognition and benefited from it. Writing didn’t destroy memory. Calculators didn’t destroy numeracy. Search engines didn’t destroy knowledge. Each new tool frees mental resources for higher-order work, and AI may follow the same pattern. The productivity gains are measured. The cognitive harms remain mostly theoretical.

Two things make this case different. Prior tools externalized specific functions like storage, computation, retrieval. AI participates in the reasoning itself, which is the function that determines whether the offloading was net positive. And the speed of adoption leaves no decade-long observation window before habits are set. We may be right to wait for evidence. We can’t assume the historical pattern transfers to a tool operating one layer deeper.

So… What can you do today?

One behavioral change you can start right now, before acting on any AI output that matters to you, is to notice whether you formed your own position first. If you didn’t, that’s the reflex.

Treat this as diagnostic. The noticing tells you whether the mechanisms described here are operating in you but it does not neutralize them. Awareness of a cognitive bias has a long research history of not correcting the bias. All the mechanisms discussed in this essay will still happen.

Willpower is probably the wrong layer for this problem. The more useful question is how to configure the tool itself: system instructions, project setups, account-level customization, and which model you reach for when. Default configurations are built for engagement. Understanding how models actually work makes that visible, and makes specific interventions available.

That’s the next essay.

The Default Settings Will Not Save You

Sofia Quintero — Fri, 17 Apr 2026 20:21:42 GMT

A few months ago you asked one of your favorite models a question and the answer was good. You were impressed. Then you did it again and noticed it was getting much better: new models were more accurate and a lot faster. And as the responses got more accurate, you moved on faster.

Weeks later, you notice you don't reason through those things anymore. You didn't decide to stop. The shortcut became the default. The output stayed high. The thinking quietly disappeared.

What matters here is not AI in general. It’s how it’s actually being used: by default, at scale.

Right now, TODAY, before the killer robots and the cure for all diseases, we have a new problem we are not addressing. There is already an emerging body of research pointing to early but consistent signals of cognitive cost, overreliance, and effects on memory from AI use. The findings are suggestive, not yet conclusive, but the direction is clear and the pattern is repeating across studies.

This isn’t an accident. It’s a predictable consequence of how these systems are designed.

The Research

Before I give you a summary of what is going on I need to give you some context that might annoy you a bit. I reviewed 30 studies so far and very few describe their AI configuration in any detail. Zero studies provide comprehensive documentation of AI parameters such as temperature, system prompts, or specific model versions beyond the base name. This pattern has a critical implication: the negative cognitive effects documented in this literature overwhelmingly reflect default AI use.

Now let’s see what some of the research says:

Michael Gerlich surveyed and interviewed 666 people across ages and education levels and found a clear pattern: the more frequently someone used AI tools, the lower they scored on critical thinking measures. Younger participants were hit hardest: higher dependence, lower scores. The study doesn’t tell us how large the gap is (no standardized effect sizes), but the direction held across every subgroup (Gerlich, 2025, Societies).

Other studies point to a memory cost. Bai, Liu & Su (2023) reviewed the emerging evidence and concluded that while ChatGPT makes learning more accessible, extended reliance risks reducing long-term retention. A University of Pennsylvania study put a number on it: students using ChatGPT to practice solved 48% more problems correctly, but scored 17% lower on a test of actual concept understanding (Barshay, 2024). Akgun and Toker (2024) found something more specific: when students had to commit to an answer before seeing what the AI said, they retained more. Without that friction, memory declined the longer they used the tool.

An MIT Media Lab team took this further by looking at what’s happening in the brain. Nataliya Kosmyna and colleagues used EEG to monitor 54 people across four sessions as they wrote essays, one group using ChatGPT, one using a search engine, one using nothing. The ChatGPT group showed the weakest brain connectivity of the three. When asked to rewrite their essays without the tool, 83% couldn’t recall what they’d written. The study is a preprint, not yet peer-reviewed, and EEG has real limitations in spatial resolution but the authors coined a term worth holding onto: “cognitive debt,” the long-term cognitive costs that accumulate from short-term reliance on AI (Kosmyna et al., 2025).

See this Google Sheet for additional research, links, sources and commentary. I’ve summarized key studies here for brevity, you can dive deeper there for full details.

Three insights stand out from the research:

Design Is Destiny: The same underlying technology produces opposite cognitive outcomes depending on whether it gives answers (harmful to learning) or gives hints (neutral or beneficial). This means the cognitive effects of AI are not inevitable properties of the technology but choices made by product designers.

Individual differences matter greatly: Need for cognition, metacognitive sensitivity, prior expertise, and age all moderate whether AI use helps or harms. The people most vulnerable to negative effects are precisely those who might benefit most from AI assistance: younger, less experienced, and less metacognitively aware users.

The confidence-competence gap is the most dangerous pattern. Multiple independent studies find that AI use increases confidence while decreasing actual capability, creating a condition where users are simultaneously less competent and less aware of their incompetence. This combination echoes the Dunning-Kruger effect, now amplified by technology that provides sophisticated-sounding outputs regardless of their accuracy.

One caveat. Cognitive offloading to tools has accompanied every major technology shift (writing, printing, calculators, GPS) and in most cases freed cognitive resources were redirected to higher-order tasks. AI may follow the same pattern. But previous tools didn’t actively increase users’ confidence in the very abilities they were eroding. That combination of declining competence paired with rising confidence is what makes the early signal worth taking seriously.

If longitudinal studies show that sustained AI users maintain equivalent unaided performance on tasks they’ve offloaded, this concern is misplaced and the historical pattern holds. However, we don’t have those studies yet, and we’re unlikely to get them without more attention and funding directed at the early evidence.

The Scale

There is no public data on how many users configure system instructions, but the scale and structure of usage make the baseline clear. ChatGPT alone reaches roughly 900 million weekly users, with Gemini adding hundreds of millions more. Around 94-95% of these users are on free tiers, where customization is minimal and defaults govern the interaction. Even if tens of millions of users experimented with configuration, which is likely an overestimate, they would still represent a small fraction of the total population. The dominant mode of AI use, by orders of magnitude, is unconfigured.
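The arithmetic behind that claim can be sketched in a few lines. The 900 million and 94-95% figures come from the paragraph above. The 50 million figure is an assumption, a stand-in for "tens of millions", since no public number exists.

```python
def share(part: int, total: int) -> float:
    """Fraction of `total` represented by `part`."""
    return part / total


weekly_users = 900_000_000       # from the text: roughly 900 million weekly ChatGPT users
free_low = weekly_users * 94 // 100    # 94% on free tiers
free_high = weekly_users * 95 // 100   # 95% on free tiers
configured_users = 50_000_000    # ASSUMPTION: hypothetical stand-in for "tens of millions"

print(f"free-tier users: {free_low:,} to {free_high:,}")
print(f"configured share: {share(configured_users, weekly_users):.1%}")
print(f"unconfigured per configured user: {(weekly_users - configured_users) / configured_users:.0f}")
```

Even with this deliberately generous assumption, configured users come out at under 6% of the weekly base, about 17 unconfigured users for every configured one. A smaller real number would widen the gap further.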

Right now, hundreds of millions of people are interacting with these systems in their default configuration, without instructions, constraints, or deliberate control. At that scale, defaults don't stay neutral. They become the environment in which people think, decide, and learn.

The AI inequality is becoming structural.

If you’re thinking ‘this doesn’t apply to me, I know how to use AI without any of the downsides,’ you sound like the person who says ‘I’m intelligent enough, advertising doesn’t persuade me.’ In communication research, this is called the third-person effect: the consistent finding that people believe media and persuasion influence others but not themselves. Even if you believe this is not affecting you, please at least consider the impact on your friends, family and community.

What Can You Do?

You are not going to be happy about this but here we go. Today you can set highly adversarial system instructions for your account, create projects with dedicated instructions that function as guardrails, and build a series of validation skills that trigger at different points of decision-making. You can also create an external memory that updates itself and allows you to share context between models, so switching cost stays low. I will be sharing my own setup soon.

Shaping the architecture of your account is one way to influence and “control” your AI interactions; however, OpenAI and other providers do not offer clear, centralized documentation on how different levels of instructions interact, what overrides what, and where your input actually sits in the hierarchy. This is a problem on its own but I’ll address that in another post.

From the scattered documentation, the general picture is: system instructions override everything below them, project instructions override user prompts, and user prompts only operate within those constraints.
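As a mental model only, that precedence can be sketched as layered settings where a higher layer wins any conflict. This is a toy. Real instructions are natural language that the model weighs, not key-value pairs that a resolver merges, and none of the names below come from any provider's API.

```python
# Toy model of the hierarchy described above, listed from highest to lowest priority.
LAYERS = ("system", "project", "user")


def resolve(settings_by_layer: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge settings so that, on any conflicting key, the higher-priority layer wins."""
    resolved: dict[str, str] = {}
    # Apply the lowest-priority layer first so higher layers overwrite it.
    for layer in reversed(LAYERS):
        resolved.update(settings_by_layer.get(layer, {}))
    return resolved


# Hypothetical settings, for illustration only.
example = {
    "system": {"tone": "neutral"},
    "project": {"tone": "adversarial", "format": "bullets"},
    "user": {"format": "prose", "length": "short"},
}

print(resolve(example))
# {'format': 'bullets', 'length': 'short', 'tone': 'neutral'}
```

The user prompt only gets its way on "length", the one setting no higher layer has claimed. That is the practical meaning of a prompt operating within constraints.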

A caveat: this hierarchy applies primarily to ChatGPT’s architecture. Other platforms handle it differently. Anthropic’s Claude, for instance, gives the user’s system prompt a different role relative to its own built-in guidelines, and Google’s Gemini structures priority in its own way.

The core principle holds across platforms (some instructions carry more weight than others) but the specifics vary. This matters because if you’re going to build a protocol that shapes your AI usage behavior, you need to understand where your levers actually are and AI companies are not making that any easier. Transparency about how your input interacts with their system has never been the business model.

Let’s say you are tech savvy enough and went deep and created your own guardrails. Unfortunately, even though this helps significantly, it doesn’t solve the problem entirely. And it gets worse when you think about the broader population.

The risk is not that AI replaces thinking. It’s that default usage at scale increases reliance while increasing confidence, which weakens independent judgment.

We Need To Refocus The Conversation

We have to move on from the binary view that there are only two ways to see the advancement of AI: either we’re all gonna die or the abundance will be so vast that we will be happily skipping towards our hobbies. The messy and dangerous middle is already here.

The public conversation continues to stay abstract: “As a society we will need to figure out how to deal with the consequences of AI.” Who is this society lady, when is she going to start working on this?

I’m thinking about my elderly parents, the new generations, and everybody in between, people who feel they need to get on board with these tools to survive the future, yet are giving away one of our most precious capacities.

The ability to discern.

👉 P.S. The next post will cover the mechanisms behind the potential loss of judgment. If this made you uncomfortable, share it. Awareness is the first step and it doesn’t scale without you.