Posts

Sal's Signature

The first Macintosh said hello - so must Sal. Whenever the user opens the app, Sal's signature will animate across the screen. The mechanics end up being pretty similar to the FFT wave. Bezier curves come in many degrees; the two relevant here are quadratic and quintic. The wave uses a quintic curve because it's a little easier to enforce the zero-slope property at the peaks and valleys. The signature uses a quadratic curve because it more easily captures the smooth quality of handwriting. Rendering a curve requires a series of points for the curve to pass through. Because it's a signature (surprise, surprise), I needed a signature. I took inspiration from famous people, of course, making the 'S' massive and the 'a' small. It had to be drawn in one continuous line, which makes the animation prettier. In addition, it needed to start at mid height on the left and finish at mid height on the right. After an hour in Desmos, each of the 32 points (below) was loaded into a…
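As a minimal sketch of the idea (not Sal's actual code), here's one way to thread a single continuous stroke through sample points with quadratic Bezier segments in SwiftUI and animate it on launch. The points below are hypothetical stand-ins for the real 32, and the midpoint trick is a common way to keep adjacent segments smooth:

```swift
import SwiftUI

// Thread one continuous path through sample points using quadratic Bezier
// segments. Curving toward the midpoint between neighbors, with each point
// acting as the control point, keeps the stroke smooth like handwriting.
struct SignatureShape: Shape {
    // Hypothetical stand-ins for the 32 Desmos points (normalized 0...1).
    let points: [CGPoint] = [
        CGPoint(x: 0.00, y: 0.50), CGPoint(x: 0.10, y: 0.20),
        CGPoint(x: 0.25, y: 0.70), CGPoint(x: 0.45, y: 0.35),
        CGPoint(x: 0.65, y: 0.65), CGPoint(x: 0.85, y: 0.30),
        CGPoint(x: 1.00, y: 0.50),
    ]

    func path(in rect: CGRect) -> Path {
        var path = Path()
        guard points.count > 1 else { return path }
        let scaled = points.map { CGPoint(x: $0.x * rect.width, y: $0.y * rect.height) }
        path.move(to: scaled[0])
        for i in 1..<scaled.count - 1 {
            // Segment ends at the midpoint between this point and the next;
            // the point itself is the quadratic control point.
            let mid = CGPoint(x: (scaled[i].x + scaled[i + 1].x) / 2,
                              y: (scaled[i].y + scaled[i + 1].y) / 2)
            path.addQuadCurve(to: mid, control: scaled[i])
        }
        path.addLine(to: scaled[scaled.count - 1])
        return path
    }
}

// Animating trim from 0 to 1 "writes" the signature when the view appears.
struct SignatureView: View {
    @State private var progress: CGFloat = 0
    var body: some View {
        SignatureShape()
            .trim(from: 0, to: progress)
            .stroke(lineWidth: 3)
            .onAppear { withAnimation(.easeInOut(duration: 2)) { progress = 1 } }
    }
}
```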

Don't follow the sheep

As fixated as Amazon is on shipping times or Google on its search results, there exists a market that most people don't talk about: Developers! Developers! Developers! Developers, developers, developers... When most people think of tech products, they'll bring up watches, laptops, phones, and the like. To most of the outside world, these products keep companies afloat. When it comes to software, however, it's a different story. All the apps on the App Store and programs on Windows are made by developers. Developers, like other primates, have only a limited amount of time and skill, which dictates how much work can be accomplished. Crazy, I know. Thankfully, Apple, Microsoft, and friends can give it back in exchange for a developer's soul. Such companies host annual developer conferences where they pitch to shareholders, "look how well we're doing," and to developers, "look how cool our platform is," through their soft…

The Voice: Part 1

Image: Michael B. Paulson, a.k.a. the Primeagen
A man once said, "if you want a robot to talk to you, it should have a sassy voice." Above is Michael B. Paulson, also known as the Primeagen on Twitch. He was just born with one of those voices that is easy to listen to because it matches his personality. Both are whiney, kinetic, funny, overcooked, intelligent, and something you just want to witness. It would be great if Sal used his voice. (A CLIP OF HIS VOICE - INSERT HERE) Thankfully, we have the technology! Though I don't have TikTok myself, I have seen a couple of videos featuring AI-generated content. It is quite easy for anyone to generate the voice of some celebrity using publicly available tools. Though said tools produce accurate voices, they are quite slow, taking 30 seconds to 2 minutes for just one sentence. Because of Sal's time and compute budgets (sub-1-second response time), I sought out open source models in the ~100 million parameter range (fewer parameters means faster generation - Sal's LLM is 8 billion…

This project is not a scam

OpenAI peed on my rug - it's tough being me. OK, so... I'm not the only one in the industry who wants an iPhone app they can talk to. The component technologies (speech generation, text generation, voice transcription) have been of high enough caliber that some company could have packaged them up and shipped them in a product. Of the companies that could release such a product, it was going to be OpenAI, who could've come up with 'something' a year ago. GPT-4o brings a fundamental breakthrough to the scene that explains away a lot of the 'packaging.' GPT-4o - the 'o' is for omni - accepts a plethora of inputs, which earns it the 'multimodal' designation. Multimodal models have existed previously, but we have yet to see one that encapsulates all the conceivable input types (audio, text, images, video) at the same time. For Sal, user input is handled in a three-stage pipeline, with one model handling each task. At the beginning, its…
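To make the three-stage idea concrete, here's a minimal sketch (hypothetical names, not Sal's actual code) of audio flowing through one model per stage:

```swift
import Foundation

// One model per task: transcription, text generation, speech synthesis.
// The protocols are placeholders for whatever concrete models back them.
protocol TranscriptionModel { func transcribe(_ audio: Data) async throws -> String }
protocol LanguageModel      { func respond(to prompt: String) async throws -> String }
protocol SpeechModel        { func synthesize(_ text: String) async throws -> Data }

struct VoicePipeline {
    let transcriber: TranscriptionModel
    let llm: LanguageModel
    let voice: SpeechModel

    // Audio in, audio out: each stage feeds the next.
    func run(userAudio: Data) async throws -> Data {
        let text  = try await transcriber.transcribe(userAudio) // stage 1: speech -> text
        let reply = try await llm.respond(to: text)             // stage 2: text -> text
        return try await voice.synthesize(reply)                // stage 3: text -> speech
    }
}
```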

The Transformer: Part 2

Think chatting with robots is crazy, out-of-this-world technology? Think again... To get the veneer of a conversation, the vanilla LLM needs some guidance. Remember, the transformers that power chat applications just predict the next word (they're really sub-words called tokens - doesn't matter). In a deployed application, outputs aren't compared to any 'ground truth,' as there isn't any; the model is just inferring. Context matters in this process. To get the model to continuously synthesize, the newly extended sentence must be fed back into the transformer, over and over again. Going back to 2019's BERT...

Example 1: "I have a cool BLANK." - mind 6%, head 5%, attitude 4%, personality 4%, life 3%

Example 2: "I have a cool BLANK. He sounds great." - boyfriend 20%, friend 12%, brother 9%, dad 7%, guy 4%

By adding subsequent words to the sentence, BERT's predictions not only change their meaning but become more confident. An example…
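As a tiny sketch of that feedback loop (with a hypothetical nextToken closure standing in for a real model):

```swift
// The model predicts one token, the token is appended to the context, and
// the grown context is fed back in - over and over until a stop condition.
func generate(prompt: [String],
              nextToken: ([String]) -> String,
              maxTokens: Int = 50) -> [String] {
    var context = prompt
    for _ in 0..<maxTokens {
        let token = nextToken(context)   // predict a single next token
        if token == "<eos>" { break }    // stop at end-of-sequence
        context.append(token)            // feed the extended sequence back in
    }
    return context
}
```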

Are you a Swifty?

To prove I'm a real boy, Sal is built in Swift. Oh my gosh, just like... shut up. AAAAhhhh.. Yes. Cats versus dogs, pineapple on pizza, toilet paper orientation, vi versus emacs (search one on Google and it will suggest the other): the age-old debates... What programming language is best? Well, of course, you know, it depends on the project and the programmer and their experience and their preferences and the weather and the answers to some of those other debates. Why choose Swift? It's not the common choice; that's for sure. Swift is ranked 20th on the Stack Overflow developer survey. Of these Swifties, a non-trivially large number are iOS developers. There is no mention of AI, data science, or the like. The top three were JS, HTML, and Python. Because Sal was never going to be a website, I'm ignoring JS and HTML. Python fits the mold for my application perfectly, as it is the language synonymous with deep learning. Its popularity gives me access to tons of people…

The Transformer: Part 1

Image: the Transformer architecture diagram from "Attention Is All You Need"
In 2017, engineers at Google invented the transformer and then did nothing with it. Arguably the greatest paper of the past 10 years, "Attention Is All You Need" introduced the model architecture that became the standard for the entire field. The paper wasn't radical. Nerds at Google made some incremental changes on preexisting work, but whatever... At the time, there was almost no indication that it was an invention worth a million bucks. Above lies the 'graph' of the network. Liken it to an assembly line of operations that are processed one after the other. At the bottom, the 'Input Embeddings' are an altered representation of one language, say English, and the 'Output Embeddings' are a similarly altered representation of the second language, say Spanish. Both of these are fed in at the bottom, work their way through the 'nodes' of the graph, and out the top arrives the result. There are many details on what each node does, what arrows represent…
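As a rough sketch of that assembly line (stubbed types, not a real implementation - the attention and feed-forward nodes are elided), the encoder-decoder flow looks something like:

```swift
// The point here is the data flow: embed both sequences, encode the source,
// and let the decoder attend to the encoder's output to produce the result.
typealias Embedding = [Float]

// Placeholder: a real embedding layer maps each token to a learned vector.
func embed(_ tokens: [String]) -> [Embedding] {
    tokens.map { _ in Embedding(repeating: 0, count: 512) }
}

// Stub encoder/decoder stacks; the real nodes transform these vectors repeatedly.
func encode(_ source: [Embedding]) -> [Embedding] { source }
func decode(_ target: [Embedding], attendingTo memory: [Embedding]) -> [Embedding] { target }

// English enters as 'Input Embeddings', the Spanish produced so far enters
// as 'Output Embeddings', and the prediction comes out the top.
let memory = encode(embed(["the", "cat", "sat"]))
let output = decode(embed(["el", "gato"]), attendingTo: memory)
```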