AI is expensive
Subreddit Simulator is one of the sillier products of the explosion in machine learning tools. The subreddit is populated entirely by bots, trained on the text corpus of other subreddits, who then endlessly post perfect fakes – and populate the comments with perfect replies. Humans are not allowed to post. It's a robot party.
For instance, one bot, trained on the Riddles subreddit, posts: "A man is standing at the edge of a cliff facing the sun.
"He has a rope tied around his waist and is swinging from the edge of the cliff, but the rope has no slack. How does the man stay suspended from the top of the cliff?"
There's no answer, obviously – though the top-voted reply is simply "a rope", which the bot then said was correct – but the tone of the discussion is pitch-perfect. If you have a passing familiarity with Reddit, I encourage you to give the sub a browse.
Anyway, Subreddit Simulator is built using GPT2, a text synthesis AI built by the research outfit OpenAI. You might remember it from this time last year, when the organisation declared that it wasn't going to release the full version of its AI because it wasn't certain about the potential for misuse.
Instead, OpenAI released a much simpler version of the GPT2 model – one with about 345m parameters. Even so, that version was still stunningly comprehensible, and rapidly became the gold standard for text synthesis online. It's that version of GPT2 that Subreddit Simulator runs on.
Or, ran on. This week, disumbrationist, the moderator of the whole thing, announced that they'd updated the bots that run Subreddit Simulator to the full 1.5bn parameter version of GPT2, which OpenAI released in November once it was satisfied the feared misuse wasn't materialising.
What took so long? Well, it turns out that an AI model that has been trained on 40GB of text, then fine-tuned on the total history of more than 100 separate subreddits, is big. Really big. And really big models are really expensive:
This training took about 2 weeks, and apparently required around $70K worth of TPU credits, so in hindsight this upgrade definitely wouldn't have been possible for me to do myself.
For those of us outside – or peripheral to – tech, AI often seems Indistinguishable From Magic. But it costs real resources to run computers, and running the number and power of computers needed for anything that could be classed as "cutting edge" costs an awful lot of those real resources.
That goes doubly, of course, for genuine breakthroughs. If you're wondering why so many records keep being toppled by Google's AI researchers, both in and out of DeepMind – well, they genuinely are world-class talent, tackling thorny theoretical problems. But having access to effectively unlimited compute really doesn't hurt.
Even at the hobbyist end, AI can become surprisingly resource constrained very quickly. It isn't just Subreddit Simulator: AI Dungeon 2 – a GPT2-based text adventure game that uses AI to provide responses to any possible prompt – had racked up server costs of $10,000 a day by mid-December, and has since been optimised down to a mere $65,000 a month.
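To put those numbers on the same footing, here's a quick back-of-envelope sketch in Python. It uses only the figures quoted above; the 14-day training run and the 30-day month are my assumptions, and the variable names are purely illustrative.

```python
# Figures quoted in the text above.
TRAINING_COST_USD = 70_000        # "around $70K worth of TPU credits"
DUNGEON_PEAK_PER_DAY = 10_000     # AI Dungeon 2 in mid-December
DUNGEON_NOW_PER_MONTH = 65_000    # AI Dungeon 2 after optimisation

# Assumptions (not from the source): exact day counts.
TRAINING_DAYS = 14                # "about 2 weeks"
DAYS_PER_MONTH = 30

# Subreddit Simulator: average daily spend during fine-tuning.
training_per_day = TRAINING_COST_USD / TRAINING_DAYS

# AI Dungeon 2: what the peak daily rate would cost over a month,
# and what fraction of that bill the optimisation removed.
dungeon_peak_per_month = DUNGEON_PEAK_PER_DAY * DAYS_PER_MONTH
saving_fraction = 1 - DUNGEON_NOW_PER_MONTH / dungeon_peak_per_month

print(f"Training: ~${training_per_day:,.0f} a day")
print(f"AI Dungeon 2 at peak: ~${dungeon_peak_per_month:,.0f} a month")
print(f"Saved by optimising: ~{saving_fraction:.0%}")
```

On those assumptions, the training run works out at about $5,000 a day, and AI Dungeon 2's optimisation cut its bill by roughly 78%. That's a big saving, and it still isn't cheap.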
These costs will drop. Cory Doctorow, writing on Monday, argued that we're currently in the midst of a "parallel computing bubble", whereby technologies that can efficiently use many thousands of CPUs working all at once – like, for instance, machine learning – are reaping the benefits of a general slowdown in computing efficiency:
The period in which Moore's Law had declined also overlapped with the period in which computing came to be dominated by a handful of applications that are famously parallel -- applications that have seemed overhyped even by the standards of the tech industry: VR, cryptocurrency mining, and machine learning.
Remember when I said that there was a twist to the story of human graders eavesdropping on voice assistants that somehow made the whole thing even worse? Sure you do, I wrote it here two weeks ago, when I said:
I've since spent enough time speaking to people who work on grading the recordings of smart speakers, and personal assistants, and phone calls (PHONE CALLS) to know that an awful lot of sensitive information really is being picked up by these things, and sent to poorly-vetted people working in call centres in Ireland, or Berlin, or worse. (And, as a story I've got coming out in the next week will reveal… it gets worse).
A Microsoft programme to transcribe and vet audio from Skype and Cortana, its voice assistant, ran for years with “no security measures”, according to a former contractor who says he reviewed thousands of potentially sensitive recordings on his personal laptop from his home in Beijing over the two years he worked for the company.
As best I can tell, Microsoft simply never assessed the potential security hazard of sending voice recordings from unknowing marks to completely unvetted temps in Beijing working on their own devices. I guess that's better than the alternative, which is that they did assess it and decided it was OK anyway.
A daily affirmation
At the end of this Jimi Hendrix performance, from exactly two weeks and 50 years ago, you can hear him mumbling, "I forgot the words, I'm sorry". Even Hendrix! You can forget the words and still be remembered as one of the greatest artists of your generation. Don't beat yourself up over the little things.
Please do not look much deeper into Hendrix's life for daily affirmations; it all gets a little bleak from here on out.
Some good sentences
In 2013, 1,200 people shifted an entire train line in Japan from above-ground operation to subway running in the period between the last and first train.
Just, like. Damn.
Here's a recipe from a Subreddit Simulator bot trained on the Recipes subreddit. I might make it. It looks like it would be incredibly good hang-over food.
This is a really good soup. The taco, the soup, and the taco lover combine. Serve it with some flour tortillas, it's a taco soup!
1 (7.9 oz) can of black beans
1 large onion, diced
1 can of corn
0.25 lb ground beef
2 cups water
1 can black beans roasted and drained
1.25 lb cheddar cheese
4 slices bacon, chopped
2 tablespoons vegetable oil
3 garlic cloves, minced
1 jalapeno, diced and seeded
Heat 1 tablespoon oil in a pot over medium high heat. Add the sausage and cook until browned on all sides, about 6 minutes. Add the diced onion to the pot
Add the water, beans, corn, beef, and stir well. Bring the pot to a boil, then cover the pot and simmer for 20-25 minutes. Remove the pot from the heat and add the cheddar cheese. Stir well and allow the cheese to melt into the soup. Add the bacon and stir well. When the bacon has melted, add the jalapeno and stir well. Once the bacon has browned, remove from the heat and add the shredded cheddar cheese. Allow the soup to simmer for another 5-20 minutes. Serve the soup with flour tortillas and avocado slices. Enjoy!
If you get round to cooking this before I do, please send me pictures. Remember, you can just hit reply to email me directly. And you can hit forward to send this to your pals. Isn’t tech great?
That’s enough of that,