Key Takeaways
- Apple is actively working on developing an LLM that can run locally on your iPhone for increased privacy.
- Siri may become far more accurate at understanding user commands through context and the ability to reference text on the screen.
- Siri’s future may include features like advanced image recognition, triggerless activation, and better conversational abilities.
Some Apple products fail to live up to the hype (the Apple Newton, anyone?), and arguably the biggest disappointment of all has been Siri. When Siri arrived, it heralded a new dawn in how we would interact with our smart devices.
Yet since 2011, when Siri first appeared on the iPhone 4S, the virtual assistant has changed very little, becoming a feature that many iPhone owners rarely use. The rise of LLMs such as ChatGPT has also shown Siri to be horribly outdated.
All is not lost, however. Apple is finally showing some signs of getting on the AI bandwagon, and it’s highly likely that we’ll see a vastly improved version of Siri (Super-Siri?) in the not-too-distant future. Several publicly available Apple research papers hint at some of the areas that Apple has been focusing on. Here’s what might become part of a much, much better version of Siri.
AI-powered Siri that runs on your device
Use Super-Siri privately
LLMs such as ChatGPT are available as apps on your phone, but all the magic happens elsewhere. Your prompts are sent to ChatGPT’s servers, which process them using large amounts of computational power and models with billions of parameters that take up a lot of storage space. The response is then sent back to your phone.
Apple is actively working on creating an LLM that can run locally on your iPhone. This is a significant challenge, since computational power and storage are limited on a smartphone. However, Apple has published a research paper about its endeavors to create an efficient LLM with limited memory and has produced some impressive results. This means that we could see a vastly improved version of Siri that can run completely offline, offering increased privacy, which is very Apple indeed.
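To make the idea concrete, here is a minimal Swift sketch of what an offline prompt flow could look like. The OnDeviceLLM protocol and OfflineAssistant type are hypothetical illustrations, not a real Apple API; the point is simply that the prompt never leaves the phone.

```swift
import Foundation

// Hypothetical stand-in for an on-device language model; Apple has not
// published an API for this.
protocol OnDeviceLLM {
    func respond(to prompt: String) -> String
}

struct OfflineAssistant {
    let model: OnDeviceLLM

    func handle(_ prompt: String) -> String {
        // Everything happens locally, so this works with no network
        // connection and nothing is logged on a server.
        model.respond(to: prompt)
    }
}
```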
Siri that understands you better
No more ‘calling Heather’ when you ask for the weather
All voice assistants can suffer from misinterpreting your commands. We’ve all had situations where we’ve asked a voice assistant for one thing, only for it to mishear us and give us something completely different. You say to Siri ‘wake me up at three’ and get the response ‘playing Wake Me Up by Avicii’.
The good news is that another Apple research paper has focused on ranking intents for voice assistant commands. This paper discusses a method of choosing from multiple potential intents by considering contextual information to reduce ambiguity. Another paper discusses the use of LLMs to select several likely intents, rather than just one, and using these multiple intents to provide an answer that is likely to be more useful. The upshot is that Siri may become far more accurate at understanding what you mean.
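As a rough illustration of the idea (and nothing like Apple’s actual models), here’s a toy Swift sketch in which several candidate intents are scored and contextual signals nudge the ranking, rather than committing to a single guess. The intent names, scores, and weights are invented for the example.

```swift
import Foundation

// Toy illustration only: candidate intents for "wake me up at three".
struct CandidateIntent {
    let name: String          // e.g. "set_alarm" or "play_music"
    let acousticScore: Double // how well it matches what was heard
}

// Contextual signals nudge the ranking, e.g. it's 11 pm and the
// Clock app was recently used, so an alarm is more plausible.
func rank(_ candidates: [CandidateIntent],
          contextBoost: [String: Double]) -> [CandidateIntent] {
    candidates.sorted {
        ($0.acousticScore + (contextBoost[$0.name] ?? 0)) >
        ($1.acousticScore + (contextBoost[$1.name] ?? 0))
    }
}

let ranked = rank(
    [CandidateIntent(name: "play_music", acousticScore: 0.55),
     CandidateIntent(name: "set_alarm",  acousticScore: 0.50)],
    contextBoost: ["set_alarm": 0.2]   // late evening, Clock app open
)
// ranked.first?.name == "set_alarm"
```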
Siri that understands what you’re looking at
Ask Siri to use content on your screen
One issue that Siri has always had is that she can’t see what you can see. And while it may not yet be possible to ask ‘Siri, what’s that bird in the tree over there?’, it may soon be possible to refer to content on your screen.
Another Apple research paper proposes a model that can refer to text from the screen when dealing with user prompts. For example, if you’re on the contact page of a website, you could say ‘Siri, send this number to Alan’ and Siri would understand that you are referring to the phone number, would extract that number, and send it to Alan. The paper mentions the possibility of referencing different types of text, such as email addresses, URLs, and dates. In other words, Siri may soon be able to read what’s on your screen alongside the things she is already capable of.
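How a future Siri would resolve ‘this number’ to what’s on screen is the subject of Apple’s research, but the extraction half already has a public building block: Foundation’s NSDataDetector can pull phone numbers, dates, and links out of plain text. A small sketch (the screen text is made up):

```swift
import Foundation

// Text that happens to be visible on screen, e.g. a contact page.
let screenText = "Visit us at 1 Example Road or call (555) 010-2234 today."

// NSDataDetector is an existing Foundation API for spotting phone
// numbers, dates, links, and addresses in plain text.
if let detector = try? NSDataDetector(
       types: NSTextCheckingResult.CheckingType.phoneNumber.rawValue),
   let match = detector.firstMatch(
       in: screenText,
       options: [],
       range: NSRange(screenText.startIndex..., in: screenText)),
   let number = match.phoneNumber {
    // A future Siri could then hand this off to Messages:
    // "send this number to Alan".
    print("Detected number:", number)   // (555) 010-2234
}
```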
Ferret out parts of an image
You may be able to tell Siri where to look
Accessing text is fine, but Siri still can’t see the image you’re looking at. Or can it? Apple has developed Ferret, a multimodal large language model (MLLM) that can understand spatial references. This would vastly increase the types of things you can ask Siri, and allow you to draw a circle around part of an image, for example, and ask ‘Siri, what make of car is this?’ or ‘Siri, where can I buy these shoes?’
Ferret also offers grounding capabilities. This is where the model will identify regions of an image based on a prompt, such as ‘Siri, where are all the monkeys in this image?’ The power of Ferret means that Siri could be able to identify objects that you draw around with your finger, or even solve Where’s Waldo for you. This has potentially huge implications; Siri could finally live up to the initial promise and change the way we interact with our phones.
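Ferret is a research model with no public Swift API, so the types below are purely illustrative. They just sketch the two request shapes described above: referring (answer a question about a region you point at) and grounding (find the regions that match a description).

```swift
import CoreGraphics

// Purely illustrative types; Ferret has no public Swift interface.
struct RegionReference {
    let imageName: String
    let region: CGRect   // the area the user circled with a finger
}

protocol FerretStyleModel {
    // Referring: answer a question about a specific region.
    func describe(_ reference: RegionReference, question: String) -> String
    // Grounding: return the regions that match a description.
    func locate(_ description: String, inImage imageName: String) -> [CGRect]
}

// Usage a future Siri might wrap:
// model.describe(RegionReference(imageName: "street.jpg",
//                                region: CGRect(x: 120, y: 80, width: 200, height: 140)),
//                question: "What make of car is this?")
// model.locate("all the monkeys", inImage: "zoo.jpg")
```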
A Siri without a wake word?
Say goodbye to Hey Siri
If you own an Amazon Echo device and live with someone named Alexa, life must be hard. It is possible to change the wake word, but all of the major voice assistants require either a wake word (such as Hey Siri, or just Siri) or a gesture (such as raising your wrist with your Apple Watch) to get the voice assistant to start listening.
One Apple research paper indicates that a triggerless voice assistant may be on the way. It explores using a multimodal system to identify when someone is speaking to the virtual assistant, including analyzing the words spoken using an LLM as well as the audio waveform (commands might be louder in volume than background speech, for example). The study found that it was possible to accurately gauge when a voice command is being uttered. This could mean that you never need to say ‘Siri’ ever again.
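As a toy illustration of that fusion idea (the scores, weights, and threshold here are invented for the example), deciding whether speech is directed at the device might combine a text-based score with an acoustic one:

```swift
import Foundation

// Toy sketch: combine more than one signal before deciding the user
// is addressing the assistant.
struct UtteranceSignals {
    let textScore: Double   // LLM-style estimate that the words form a command
    let audioScore: Double  // acoustic cues, e.g. loudness relative to background
}

func isDeviceDirected(_ signals: UtteranceSignals,
                      threshold: Double = 0.7) -> Bool {
    // A simple weighted fusion; a real system would learn this jointly.
    let combined = 0.6 * signals.textScore + 0.4 * signals.audioScore
    return combined >= threshold
}

// "Set a timer for ten minutes", spoken clearly toward the phone:
let directed = isDeviceDirected(UtteranceSignals(textScore: 0.9, audioScore: 0.8))  // true
// Background chat about dinner plans:
let ignored = isDeviceDirected(UtteranceSignals(textScore: 0.2, audioScore: 0.3))   // false
```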
Or maybe a wake word after all?
Hey Siri, but better
Or maybe you might. Another Apple research paper took a different approach, focusing not on removing the wake word but on making the response to it more accurate. The paper proposes that a multichannel acoustic model might be better at recognizing a wake word than a single-channel model, and this was found to be the case, with the multichannel model performing better in both quiet and noisy conditions.
If ‘Hey Siri’ does stick around, it may become far more accurate than it is currently.
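For a sense of why extra microphones help (this is a stand-in, not the paper’s acoustic model), a detector could score the wake word on each channel and pool the results so that one noisy channel is less decisive:

```swift
import Foundation

// Illustration only: the per-channel score is a placeholder so the
// pooling step has something to combine.
func wakeWordScore(channel: [Float]) -> Float {
    channel.map(abs).reduce(0, +) / Float(max(channel.count, 1))
}

func detectWakeWord(channels: [[Float]], threshold: Float = 0.5) -> Bool {
    // Pooling across channels means a single noisy microphone
    // doesn't decide the outcome on its own.
    let scores = channels.map(wakeWordScore)
    let pooled = scores.reduce(0, +) / Float(max(scores.count, 1))
    return pooled >= threshold
}
```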
No more repeating yourself
Siri could know when it’s a new question
Siri is terrible at holding a conversation. If you try to reference something from a previous response, the chances are that Siri won’t know what you’re talking about. LLM chatbots are far better at this and can hold a conversation by referencing previous questions and responses.
A Super-Siri would be able to do this much better than it can currently, but the challenge is knowing when a question is referring to the current conversation and when it’s a brand new line of inquiry. Another research paper indicates that Apple has been working on something called STEER, which is a steering detection model that predicts whether a follow-up is an attempt to refine (or steer) a previous command, or a brand new command. This should make having multi-stage conversations with Siri much more effective.
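The routing decision itself is simple to picture. Here’s an illustrative Swift sketch in which a placeholder classifier (standing in for STEER, which Apple has not published an API for) decides whether a follow-up refines the previous command or starts a new one:

```swift
import Foundation

// Illustrative sketch of the routing decision; the classifier is a
// placeholder, not Apple's model.
enum FollowUpKind {
    case steer      // refines the previous request
    case newCommand // starts a fresh request
}

protocol SteeringDetector {
    func classify(previous: String, followUp: String) -> FollowUpKind
}

func handle(followUp: String, previous: String,
            detector: SteeringDetector) -> String {
    switch detector.classify(previous: previous, followUp: followUp) {
    case .steer:
        // Merge with the earlier command, e.g.
        // "set a timer for ten minutes" + "actually make it fifteen".
        return previous + " -> revised by: " + followUp
    case .newCommand:
        return followUp
    }
}
```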
Tell Siri how to edit your images
AI has a lot to offer when it comes to image editing. The Google Pixel 8 Pro leaned heavily on AI editing capabilities, with the ability to fix group photos by selecting the best shot of each person’s face from multiple images, or to remove objects from photos.
Apple has been working on what it calls instruction-based image editing. This would allow you to edit a photo just by asking Siri. For example, you could say ‘Siri, remove the person in the background’ or ‘Siri, add more contrast’ and the appropriate edits would be applied. This would make editing your images as simple as saying what you want.
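Apple’s research applies learned edits end to end, but for a rough flavour of mapping a spoken instruction onto an edit, here’s a naive Swift sketch that wires one phrase to a stock Core Image filter (the phrase matching is deliberately simplistic):

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Naive illustration: recognise one instruction and apply a matching
// Core Image adjustment; anything else leaves the image untouched.
func apply(instruction: String, to image: CIImage) -> CIImage {
    if instruction.lowercased().contains("more contrast") {
        let filter = CIFilter.colorControls()
        filter.inputImage = image
        filter.contrast = 1.2   // nudge contrast up
        return filter.outputImage ?? image
    }
    return image
}
```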
Edit animations with Siri
Use prompts and editing together
Apple isn’t just focusing on instruction-based editing for images, either. A paper presenting an LLM-powered animation tool called Keyframer explains how a combination of prompts and manual edits can be used to build and refine an animation from static images.
For example, you could iterate on your 70s disco animation by asking to ‘make it even more disco’ and then manually remove the 38th disco ball because it’s just one too many. There’s the potential to create bespoke animations with Siri’s help that go far beyond Apple’s Animoji.
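Keyframer is a research prototype, so the types below are hypothetical; they only sketch the workflow the paper describes, which is generating from a prompt and then tweaking the result by hand:

```swift
import Foundation

// Hypothetical types illustrating the prompt-then-edit workflow.
struct Keyframe {
    var time: Double
    var properties: [String: String]   // e.g. ["transform": "rotate(45deg)"]
}

protocol AnimationGenerator {
    // Prompt-driven step, e.g. "make it even more disco".
    func animate(prompt: String, startingFrom frames: [Keyframe]) -> [Keyframe]
}

func refine(frames: [Keyframe],
            prompt: String,
            using generator: AnimationGenerator,
            manualEdit: (inout [Keyframe]) -> Void) -> [Keyframe] {
    var result = generator.animate(prompt: prompt, startingFrom: frames)
    // Manual step, e.g. delete the keyframe with one disco ball too many.
    manualEdit(&result)
    return result
}
```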