AI | Trust & Safety

Behind the Curtain

I consider myself a discerning user of technology. I have an Echo (aka Alexa) but not a smart refrigerator. The first makes my life easier; the second would tip me over the edge. So although I’m not in the field, I recognize that we’re living in “interesting times,” technologically speaking.

In my lifetime we have seen the advent of the PC, ATMs, MRIs, cell phones, the internet, DNA testing, barcodes, GPS, Bluetooth, hybrid cars, iPhones, and the one I most love to hate, social media, to name just a few of the most prominent innovations that have absolutely and irrevocably changed the way we live our lives.

I’m also an avid reader, so when AI was thrust into the spotlight so dramatically over the last year, it was no surprise to me. The dumb money (to borrow a financial term from another life) was finally catching on.

AI was first developed in the ’50s and is already being used more extensively than the average person realizes or thinks about (at least until they need customer support for, well, anything really). For decades, the main thing holding AI back was computational power. But it was turbocharged around 2020 by breakthroughs in neural network design and sheer scale. Welcome to the age of big data.

So yes, I agree with those who call AI our age’s industrial revolution. There’s no going back; it will change everything, and there will be growing pains and a necessary realignment of the workforce. It will bring positive advances and negative, sometimes unintended and unforeseen, consequences.

So, in thinking about AI, two interrelated issues concern me the most as an average American: What data, exactly, is being used to train these AIs, and who’s monitoring their output to at least sound the alarm when, inevitably, something goes wrong?

Right now, we’re at the frontier of large language models, the type of AI that mimics human intelligence. I’ll explain more about how these AIs are trained in the second part of this post, but basically vast amounts of data are fed into the models so they can learn the patterns and connections between words and phrases. The theory is that the more data used, the better the software will be at generating new content.

But common sense, and a bit of life experience, tells me: garbage in, garbage out. Computer scientists and mathematicians even have an acronym for it, GIGO. But I don’t see much discussion of it in mainstream media.

Who’s making sure the companies developing these LLMs are actually using high-quality data as opposed to feeding the models every bit of garbage that comprises the modern-day internet? I know who’s not doing it…the government. Which leads to my second, interrelated concern.

Specifics vary across fields and companies, but basically trust and safety departments exist on online platforms to protect their users, whether from fraud, harassment, offensive content, spam, or misinformation and disinformation.

As someone concerned about the state of American democracy, that last one concerns me the most, specifically on social media sites. A democracy cannot survive in the absence of a shared reality based in a factual world. Maybe the only thing we can all agree on today is that our society is fractured. IMHO, misinformation, and even more so disinformation, will only widen those fissures.

Recently, the head of Twitter’s trust and safety team resigned, and pretty much every major news outlet has reported on how tech layoffs have affected trust and safety departments, leading to fears of backsliding on efforts to curb online abuse. (Google it.) So who’s watching out for that stealth bot quietly spreading the next conspiracy theory on social media?

QAnon started on 4chan and in no time exploded onto mainstream social media sites, and now it’s as popular as some major religions. And its believers didn’t even have the easy access to chatbots that’s available right now. (More on chatbots below.) The left is not exempt from its own share of conspiracy enthusiasts either, with sites like the Palmer Report, Shareblue, and Patribotics. The danger has disinformation researchers in a tizzy and led the man commonly called the “Godfather of AI” to resign from Google.

So right when chatbots are poised to become not only much smarter but also better at mimicking human speech patterns, social media sites are downsizing the very people best placed to keep them in check, or at least let us know when something’s amiss? Even in the era of “move fast and break things,” this seems like a surefire recipe for disaster.

Clichés and truisms are oft-repeated because sometimes they just say it best, and right now one comes to mind: bells cannot be unrung. At this point no one is going to convince QAnon believers, election deniers, or even moon-landing skeptics that their beliefs are factually incorrect. I’d rather not see what kind of conspiracy theory an intelligent AI could quickly proliferate at the behest of a bad actor, foreign or domestic.

The Deep Dive

Before I get into the next part of my post, I want to explicitly state something that’s probably obvious if you’ve made it this far: I’m a bit of a geek. But if you’re still reading, you’re probably a bit of a geek too. I have read many scholarly articles on AI, chatbots, and large language models from sources I consider credible, mostly peer-reviewed scientific journals and niche publications. Whenever I’m interested in a topic, I tend to do a deep dive and learn more than any of my friends and family are willing to listen to. You can find some, though not all, of them here.

Chatbots

Chatbots have been around for a long time but really came into popular use in the late ’90s. The first one, ELIZA, was created in 1966 by an MIT computer scientist. ChatGPT and Bard are the two most talked about right now, but generic chatbots, essentially computer programs that can interact and respond to humans in real-time “conversations,” are literally everywhere in customer service departments, gathering data and routing calls.

Early on, it was easy to tell that you were “speaking” to a computer. Something I think gets lost in a lot of the noise over the very real capabilities of modern AIs is that what we’re really talking about is computers that can imitate humans convincingly. I’m simplifying, but if you’re interested in the underlying philosophy and scientific principles, see here.
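If you’re curious what that early keyword-matching approach looked like, here’s a toy sketch in Python. The patterns and canned responses are ones I made up for illustration; the original ELIZA script was bigger and cleverer, but the basic trick, matching a keyword and reflecting your own words back at you, was the same.

```python
# A toy, ELIZA-flavored chatbot: match a keyword, then echo part of the
# user's input back inside a canned template. The rules below are invented
# for illustration; the real ELIZA script worked on the same principle.
import re

RULES = [
    (r"\bi need (.+)", "Why do you need {0}?"),
    (r"\bi am (.+)",   "How long have you been {0}?"),
    (r"\bmy (\w+)",    "Tell me more about your {0}."),
    (r"\bbecause\b",   "Is that the real reason?"),
]
FALLBACK = "Please, go on."

def reply(user_input):
    text = user_input.lower()
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            # Plug whatever the user said into the canned response.
            return template.format(*match.groups())
    return FALLBACK

while True:
    line = input("> ")
    if line.strip().lower() in ("quit", "bye"):
        break
    print(reply(line))
```

That’s all the early “intelligence” was: pattern matching and a mirror.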

Large Language Models (LLMs)

Neural networks are a type of machine learning model loosely based on the human brain, and they’re at the heart of deep learning algorithms. These algorithms are how AIs are trained. Large language models are basically computers that understand and generate text in a human-like fashion.

They are trained using a technique called “deep learning”. This involves feeding massive amounts of text data into a neural network with multiple layers, allowing the network to learn the patterns and structures of language. Language models are trained on vast amounts of text data because, unlike humans, they don’t have innate knowledge of language and must learn from scratch. The training data provides the model with a vast range of linguistic patterns to learn.

Jetpack AI

The quote above was generated by the Jetpack AI available right now on WordPress to subscribers. I asked it how large language models are trained, and that was the initial part of its response, which I edited so as not to put anyone to sleep.
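Jetpack’s answer is accurate but abstract, so here’s a toy sketch of what “learning the patterns and structures of language” looks like in code. It assumes the PyTorch library, and the corpus, model size, and training loop are laughably small; real large language models apply the same next-word-prediction idea with billions of parameters and trillions of words.

```python
# A toy sketch of next-word prediction, the core training objective behind
# large language models. The corpus, model, and settings are invented and
# absurdly small; only the basic idea matches the real thing.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat . the dog slept on the rug .".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in corpus])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # word -> vector
        self.hidden = nn.Linear(dim, dim)            # one layer of the network
        self.out = nn.Linear(dim, vocab_size)        # vector -> next-word scores

    def forward(self, x):
        h = torch.tanh(self.hidden(self.embed(x)))
        return self.out(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.05)

# Training: for every word in the corpus, predict the word that follows it.
inputs, targets = ids[:-1], ids[1:]
for step in range(300):
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Generation: start with a word and repeatedly sample the predicted next word.
word_id = stoi["the"]
generated = ["the"]
for _ in range(6):
    probs = torch.softmax(model(torch.tensor([word_id])), dim=-1)
    word_id = torch.multinomial(probs, num_samples=1).item()
    generated.append(vocab[word_id])
print(" ".join(generated))
```

Run it and you get back vaguely sentence-shaped output, which is the point: the model only knows whatever patterns were in the text it was fed.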

When I asked it where the data comes from, it responded as follows:

The text data used in deep learning is often obtained from a variety of sources such as publicly available datasets, web crawls, or data partnerships with companies. While some large language model training datasets are curated by academic or industry organizations, in many cases the data is sourced or purchased from a variety of sources. In general, no one entity or organization controls all the text data used in deep learning.

Jetpack AI

When I asked it if hate speech was used in the data fed into the large language models, it told me this:

The use of hate speech in deep learning is a controversial topic, and it is up to individual data providers and researchers to decide whether or not to include it in their training data. Some argue that limiting the training data to exclude hate speech will result in more ethical and responsible language models, while others argue that omitting such data will lead to models that are less generalizable to real-world scenarios.

However, it is important to note that hate speech is not condoned and has no place in society. Large language models can help in identifying and mitigating hate speech by using natural language processing techniques to better understand the context in which hate speech occurs and develop more accurate detection algorithms.

Jetpack AI

Of course, if no one is spending the money to employ AIs to monitor hate speech or disinformation, the point is moot. But as you can see, the capabilities of the technology are quite remarkable and already easily accessible to anyone.
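To give a flavor of what those “detection algorithms” amount to, here’s a toy sketch of the kind of text classifier a trust and safety team might run over posts. The example posts, labels, and scoring are all invented for illustration; real moderation systems are trained on enormous labeled datasets and far more sophisticated models, but the principle, learn from examples and flag what looks similar, is the same.

```python
# A toy content-moderation classifier (a tiny Naive Bayes over word counts).
# The labeled examples below are invented and deliberately bland; real
# systems learn from large sets of actual policy-violating content.
from collections import Counter
import math

labeled_posts = [
    ("win free money click this link now", "flag"),
    ("urgent your account will be deleted verify now", "flag"),
    ("the election was secretly decided by lizard people", "flag"),
    ("had a great hike this weekend with the dog", "ok"),
    ("anyone have a good recipe for banana bread", "ok"),
    ("excited for the game tonight go team", "ok"),
]

# Count how often each word appears under each label.
word_counts = {"flag": Counter(), "ok": Counter()}
label_counts = Counter()
for text, label in labeled_posts:
    label_counts[label] += 1
    word_counts[label].update(text.split())

def score(text, label):
    """Log-probability of the label given the words, with add-one smoothing."""
    vocab = set(word_counts["flag"]) | set(word_counts["ok"])
    total = sum(word_counts[label].values()) + len(vocab)
    logp = math.log(label_counts[label] / sum(label_counts.values()))
    for word in text.split():
        logp += math.log((word_counts[label][word] + 1) / total)
    return logp

def moderate(text):
    return "flag" if score(text, "flag") > score(text, "ok") else "ok"

print(moderate("click now to win free money"))   # likely "flag"
print(moderate("great weekend for a hike"))      # likely "ok"
```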

Now, I’m not a coder or a hacker or a programmer, but it’s entirely possible and plausible that anyone with those skill sets could program and deploy a chatbot to spread hate or disinformation on social platforms. It’s already happened. It’s why Microsoft had to shut down its Tay chatbot after only a few hours on Twitter in 2016, when it started spewing offensive and hate-filled tweets it had “learned” from trolls coordinating on 4chan.

Hence my concern about the use and proliferation of these chatbots and the lack of oversight. In my opinion, it’s already impossible to tell whether you’re actually chatting with a person or a computer online.

Just in the last year, a fake Twitter account caused Eli Lilly’s stock to plummet when it announced free insulin; a lawyer used ChatGPT to write a brief, only to discover after he’d filed it in federal court that the AI had made everything up; and Bing, Microsoft’s AI chatbot, threatened users.

I think it’s important that we’re informed so we can govern ourselves accordingly.