The General Layout and Design

A famous man once said, "Never trust a computer you can't throw out a window." It explains the handle on the first iMac.

The app has to be simple. So, I decided to travel back in time. 




Layout:

In the top half of the screen, a large, CRT-style monitor displays one of two items: a live waveform of speech or the text transcription of the current conversation (the user gets to pick). On first load, the app defaults to the waveform, which serves no functional purpose. It warps and bends with the speech of Sal or the user. That's it.
Just like the red eye of HAL 9000 or the blue eyes of EVE from WALL-E, this waveform provides the visual iconography of the app. In many ways, it becomes the 'face' of the interface, as it draws the most visual attention. I've decided to completely copy the design of Karen from SpongeBob SquarePants. (More details on this in another article.)

As with the rest of the app, anything overtly new would evoke a dystopian novelty that must be avoided. Sal's wave morphs based only on the audio characteristics of speech. It doesn't reflect any underlying emotional or cognitive... whatever. This choice is intended to suggest the simplicity of Sal's underlying mechanics.
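To make "morphs based only on the audio characteristics" concrete, here is a minimal sketch of one way a wave could be driven purely by loudness. The function names (`rms`, `wavePoints`), the sine shape, and the parameters are illustrative assumptions, not the app's actual rendering code.

```typescript
// Hypothetical helper: root-mean-square loudness of one audio frame.
// Samples are assumed to be normalized to the range [-1, 1].
function rms(samples: number[]): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((sum, s) => sum + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper: vertical offsets for `points` positions along the wave.
// The only input is the audio frame; nothing about mood or "thinking" is involved.
function wavePoints(samples: number[], points: number, maxHeight: number): number[] {
  const loudness = Math.min(1, rms(samples)); // clamp so the wave stays on screen
  const offsets: number[] = [];
  for (let i = 0; i < points; i++) {
    const phase = (i / points) * 2 * Math.PI; // one full sine cycle across the screen
    offsets.push(Math.sin(phase) * loudness * maxHeight);
  }
  return offsets;
}
```

With a silent frame every offset is zero and the line sits flat; louder frames stretch the same shape taller.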

Moving down the interface, there are two utilities of equal size: a red microphone button and a 2x3 grid of controls. 

The red button serves two purposes: "Start" and record. When the app first loads, pressing the button initializes the connection to Sal and loads the voice transcription model, Whisper. Afterwards, it changes into microphone mode. While the button is held, the app records and transcribes incoming speech. When the button is released, recording stops and the transcription is sent to Sal. These functions make the microphone button the primary means of interaction. Its size, color, and placement near the right thumb suggest to users that it is the interactive landmark of the whole app.
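The button's two roles can be read as a small state machine. The sketch below is one possible way to model it; the state names, event names, and the `sendToSal` flag are assumptions made for illustration, and the real app may wire this up differently.

```typescript
// Hypothetical states: "start" before setup, "loading" while Whisper loads,
// "ready" in microphone mode, "recording" while the button is held.
type ButtonState = "start" | "loading" | "ready" | "recording";
type ButtonEvent = "press" | "loaded" | "release";

interface Transition {
  state: ButtonState;
  sendToSal: boolean; // true only when a finished transcription should be sent
}

function nextState(state: ButtonState, event: ButtonEvent): Transition {
  switch (state) {
    case "start":
      // The first press kicks off the connection and the model load.
      return { state: event === "press" ? "loading" : "start", sendToSal: false };
    case "loading":
      // Presses are ignored until loading has finished.
      return { state: event === "loaded" ? "ready" : "loading", sendToSal: false };
    case "ready":
      // Holding the button starts recording.
      return { state: event === "press" ? "recording" : "ready", sendToSal: false };
    case "recording":
      // Releasing the button ends the recording and sends the transcription.
      return event === "release"
        ? { state: "ready", sendToSal: true }
        : { state: "recording", sendToSal: false };
  }
}
```

Keeping the transitions in one pure function means a press during loading simply does nothing, without any special handling.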

The 2x3 grid is laid out as follows:

1 - A toggle between the waveform and the chat views.
2 - A toggle for constant recording. Instead of holding the red button, the user can toggle constant recording. Their request will be sent automatically. (More details on this in another article.)
4 - A new conversation button.
6 - A toggle for Google-enhanced conversation. Ooo, fancy.
3, 5 - Controls for microphone sensitivity.
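The slot numbers in the list above can be captured in a small lookup table. This is only a sketch: the control identifiers are made up, and it assumes the grid is two columns wide and numbered left to right, top to bottom. The article does not state this, but it would stack the two sensitivity controls (3 and 5) in one column.

```typescript
// Hypothetical identifiers for the five kinds of control in the grid.
type ControlId =
  | "toggleView"
  | "constantRecording"
  | "micSensitivity"
  | "newConversation"
  | "googleEnhanced";

// Assumption: 2 columns by 3 rows, numbered row by row starting at 1.
const GRID_COLUMNS = 2;

const gridSlots: Record<number, ControlId> = {
  1: "toggleView",
  2: "constantRecording",
  3: "micSensitivity", // which of 3 and 5 raises or lowers sensitivity is not specified
  4: "newConversation",
  5: "micSensitivity",
  6: "googleEnhanced",
};

// Convert a 1-based slot number into zero-based row and column indices.
function slotPosition(slot: number): { row: number; column: number } {
  if (Math.floor(slot) !== slot || slot < 1 || slot > 6) {
    throw new RangeError("slot must be an integer from 1 to 6");
  }
  const index = slot - 1;
  return { row: Math.floor(index / GRID_COLUMNS), column: index % GRID_COLUMNS };
}
```

Under that assumption, `slotPosition(3)` and `slotPosition(5)` land in the same column on consecutive rows.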

Rather than scattering these controls throughout the interface, I've confined them to a single box. To me, at least, this limits the stagnation that chaotic interfaces tend to cause.

The aforementioned items (the screen, the 2x3 grid, and the microphone button) can all be swiped to the top of the screen, like so:

This zone is under construction.

At the very bottom of the screen resides a small indicator styled like a digital clock face. Any status messages will appear here. These include "load model to start usage", "connecting...", "Error, no internet", etc.

Design:

The general design is meant to emulate old computers, calculators, and oscilloscopes. 

Before the touch screen, these devices maintained a clear separation of user input and computer output. The screens were basic. The buttons were labeled (coming soon). Everything was two-tone beige. I started with:


To evoke a sense of modernity, the entire app follows the tenets of neumorphism. All buttons and icons are shaded using a subtle shadow and a subtle highlight from a shared 45-degree light source, adding a 3D softness to everything.
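As a rough illustration of the shared light source, the sketch below derives a CSS `box-shadow` value from a single angle, so every element's shadow and highlight point the same way. The function name, default colors, and rounding are assumptions for the example, not the app's actual stylesheet.

```typescript
// Hypothetical helper: build a two-layer box-shadow for a light shining
// from the top-left at `angleDeg` degrees. Colors are placeholder values.
function neumorphicShadow(
  distance: number,
  blur: number,
  angleDeg: number = 45,
  shadowColor: string = "rgba(0, 0, 0, 0.2)",
  highlightColor: string = "rgba(255, 255, 255, 0.7)"
): string {
  const angle = (angleDeg * Math.PI) / 180;
  // The dark shadow falls away from the light (down and to the right) ...
  const dx = Math.round(distance * Math.cos(angle));
  const dy = Math.round(distance * Math.sin(angle));
  // ... and the highlight mirrors it on the side facing the light.
  return `${dx}px ${dy}px ${blur}px ${shadowColor}, ${-dx}px ${-dy}px ${blur}px ${highlightColor}`;
}
```

For example, `neumorphicShadow(8, 16)` yields `6px 6px 16px rgba(0, 0, 0, 0.2), -6px -6px 16px rgba(255, 255, 255, 0.7)`, which could be assigned to an element's `box-shadow` property.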



These principles combine to suggest the simplicity and early-stage nature of the underlying technology. LLMs are pretty simple, after all.
