Reinforcement Learning with Human Feedback (RLHF) isn’t enough: why economists use instrumental variables to fix bias

KeithLee · August 29, 2025, 12:16am

Source: Not Your Therapist: Why AI Companions Need Statistical Guardrails Before They Enter the Classroom | Swiss Institute of Artificial Intelligence

There is a term called ‘economist by training’. The term is used for researchers who can understand math and stat in economics that can be used to analyze the world.

And, for the growing number of people relying on ChatGPT (or sorts) for their mental therapy, I think the chatbot dev teams now need to be ‘economist by training’, at least they have to learn key statistical techniques from the discipline.

Then, what’s the academic topic that they missed from computer science department’s machine learning classes?

It’s called ‘endogeneity’.

I have never seen any field but econometrics (and related statistics) deal with this topic, so I can safely say that engineers have hardly heard of the topic, let alone how to deal with it.

In fact, it is no surprise they don’t deal with this topic at all, because in real world, hardly in lab-like experimental settings we can observe such cases for ‘endogenous’. In real world, what you did yesterday is the best indicator of what you do today. After all, this is why we learn history. Learn from ancestors and do not make the same mistake.

Case A - No endogeneity case

In econometrics, the situation is a bit more grave than simple history, though. Say, the US Fed announced rate cut by 0.25%p yesterday. The stock market went crazy over the day. On the following day, all Asian stocks also went red numbers and upper arrows. Because of the rebounding momentum in economic activities across the world, the Dubai Crude index also went up. And, in the Eurozone trading time, given the high oil price, traders in London feel that inflationary pressure may deter economic rebound in short term. This further retracts the earlier trading day’s gain in the US stock market. On the following day, the Dubai Crude index retreats, because the global stock markets have retreated all together.

I can go on and on, but I think you are smart enough to see the feedback loop making ups and downs. Eventually the situation will be contained. i.e. the economy goes to its equilibrium state.

But, what if the belief does not autocorect? What if it reinforces on and on?

Case B - Some endogeneity case

Now, let’s think about another stock that a ‘strategy team’ is clandestinely pushing the price up. The same team has hired a few ‘recruiters’ in a secret chat. They claim the stock will be jumped by $5, then reach $20 by the end of today. Later the day, it reached to $20.45. The chat visitors can’t believe that the prediction was so close. On the next day, the price goes up again, up to $23. After a few days of this momentum episode, you don’t need the ‘strategy team’ to manipulate the price anymore. People will just shout “Shut up and take my money”.

By then, the strategy team has spread the rumor that the stock worths $1 million dollar. If you have a time machine, you would be happy to pay $100,000 and go back to your younger-self to make him buy the stock.

In a few weeks, the stock prices reaches to $600, like 40-fold of the original price. It’s the time the ‘strategy team’ to sell off the position and walk away.

During the weeks, if you just rely on previous day’s stock price, there is no path that it will fall down, It will continue going up. Is it really a self-fulfilling prophecy?

Self-fulfilling prophecy vs. auto-correction

Then, how can you make a right model to break that ‘fake’ prophecy?

It’s simple. You stop relying on the past data. Because it is made by the ‘strategy team’. It’s the company’s earnings that should be the right indicator of the stock price, not the fake news articles made by the fraud team.

So, there can be two ways.

Use entirely different set of data - like earnings per share
Remove the fraud team’s effect in the past weeks’ stock price

Often, the 2nd method is not easy to implement.

But, if you have full set the fraud team’s trading data, you can almost exactly recover how much manipulation they have created.

Instrumental variable regressions (IVR)

Econometricians use the 2nd method for social data (together with the 1st method).

The 2nd method, in econometrics, is called Instrumental variable regressions (IVR), and this is one of the key components of SIAI GSB’s teaching. In fact, from 2024, we have made the basic stat course public on SIAI Square, so that outsiders can understand how we approach AI-based data scientific system.

If you make an AI system without this kind of statistical tools, people just fall into the system like they do to religion. This is precisely what we see these days.

Given my experience with engineers so far, I think they don’t apply IVR on their AI Chatbots because they just don’t know it, but even if the management knows that, I still think they wouldn’t apply that to the system.

Why? Because then, there will be less user.

Not necessarily ‘paid’ user, but certinaly largely portion of users will just look for other chatbots telling them what they want to hear. After all, this is how YouTube algorithm works. Leftwing always consume leftwing agenda, the same as rightwing expose themselves only for the rightwing perspective. For other side, they go angry and ignore eventually.

Less popular, but necessary adjustment

I know it won’t be poular, but it’s necesary adjustment.

This is why at SIAI GSB, we integrate econometric guardrails into AI education — because without them, systems are bound to mislead. After all, this is what we do at SIAI Research.