Pre-Project

Once upon a time there was a wee little boy who picked up a science fiction novel. In it, cars could fly and robots could talk. The internet, also, never cut out. The boy kept these ideals at heart until one day he could bring them to life, in three weeks, for a Hawken Project.


Radical, I know. Society has seen the news. Anyone can talk to a robot from anywhere at any time. All they have to do is visit https://openai.com/chatgpt, click Start now, and they're off to the races. When they do, a multimillion-dollar computer running a Large Language Model spools up and they can have a near real-time conversation with a computer, just like the movies. Many consider it Silicon Valley's biggest debut of the past 10 years, and for good reason.

My experience with this technology has been a little more 'down to earth' than the rest of society's, and I'm not calling everyone crazy. Before they were large, language models were just, ya know, language models. You took a sentence or a whole bunch of them and fed them into the model, much like plugging x into y = mx + b, and out came basic stuff. They could perform sentiment analysis (Does this sentence carry a cynical or a positive tone?), detect hate speech, filter spam emails, translate between languages, or fill in the blank.

Here is a basic demo from BERT, a language model released by Google in 2018. I've set it up to do a basic fill in the blank.

Input sentence:
"Owen is doing a Hawken Project. BLANK is going to be really fun."
Output: 

It 66%
This 28%
That 3%
Everything 1%
He 1%

Next to each word is the probability BERT assigns to it belonging in the blank. The strongest guess is "It" at 66%. Really cool, right? I'll get into more details later in the blog, but the takeaway is that language models were once a feeble technology that only a researcher could appreciate.
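For the curious, here is a rough sketch of how a fill-in-the-blank demo like this could be wired up in Python. It is not necessarily the exact setup behind the numbers above: the Hugging Face `transformers` library, the "bert-base-cased" checkpoint, and the helper names `fill_blank` and `format_guesses` are all assumptions made for illustration.

```python
# Assumptions (not from the demo above): the Hugging Face `transformers`
# library and the "bert-base-cased" checkpoint. Helper names are hypothetical.


def format_guesses(guesses):
    """Turn (word, probability) pairs into lines like 'It 66%', best guess first."""
    ranked = sorted(guesses, key=lambda pair: pair[1], reverse=True)
    return [f"{word} {round(prob * 100)}%" for word, prob in ranked]


def fill_blank(sentence, top_k=5):
    """Ask BERT for its top guesses for the word BLANK in `sentence`."""
    # Imported lazily; the model is downloaded the first time this runs.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-cased")
    # BERT expects its own mask token in place of the blank.
    masked = sentence.replace("BLANK", unmasker.tokenizer.mask_token)
    results = unmasker(masked, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

Calling `format_guesses(fill_blank("Owen is doing a Hawken Project. BLANK is going to be really fun."))` should produce a ranked list in the same shape as the output above, though the exact words and percentages depend on which model is loaded.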

Since then, they've acquired a face in popular culture. OpenAI's ChatGPT set the record for the shortest time to reach 100 million users, at only 2 months, a surprising fact considering how basic the interface is. It is a chat app with an input bar at the bottom and a list of previous conversations on the left. That's it. OpenAI took the most conservative approach possible, which is great for initial adoption but is far from an idealized form.

For my project, I will create a full-stack application where a person carries out a verbal conversation with a robot. 'Full stack' refers to two components: a phone or computer app that the user interacts with, and a server that the app talks to. These components are the frontend and backend, respectively.

The frontend will be an iOS application written in the Swift programming language. This is where the user interacts with the services provided by the backend, mainly the chat. Their voice will be transcribed and sent to the backend.

The backend is a separate computer (at home) where the heavy lifting takes place. It will take in requests from the iOS application and respond with a computer-synthesized voice.
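To make the round trip concrete, here is a minimal sketch of the frontend-to-backend exchange, written in Python just for illustration. Everything in it is an assumption rather than the final design: the JSON request format, the "transcript" field, and the function names are hypothetical, and the language model and voice synthesis steps are stand-in placeholders.

```python
import json


def build_chat_request(transcript):
    """Frontend side: wrap the transcribed speech in a JSON request body."""
    return json.dumps({"transcript": transcript}).encode("utf-8")


def generate_reply(transcript):
    """Placeholder for the language model; a real backend would run the model here."""
    return f"You said: {transcript}"


def synthesize_speech(text):
    """Placeholder for text-to-speech; a real backend would return audio bytes."""
    return text.encode("utf-8")


def handle_chat_request(body):
    """Backend side: read the app's JSON request and return the bytes to play back."""
    request = json.loads(body)
    reply_text = generate_reply(request["transcript"])
    return synthesize_speech(reply_text)
```

For example, `handle_chat_request(build_chat_request("Hello, robot"))` walks one message through the whole loop: transcribed text goes in, and the bytes the app would play back come out.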

These components will combine to create an experience akin to Jarvis. He... He...