
Home assistant for controlling lights - prompt ChatGPT - OpenAI API in action

p.kaczmarek2 4380 25
    I will show here how you can use the OpenAI API to create a smart home assistant. Our assistant will have ChatGPT's abilities and at the same time will be able to control the state of the lights (and other devices) in our house. The assistant will understand complex language commands and will even be able to deduce from the context of the conversation which room we are in and turn on the light there. In addition, you will be able to talk to it as with ChatGPT, i.e. basically ask it about anything, sometimes with better, sometimes with worse results.

    This topic will focus on using the OpenAI API and choosing the right prompt. I will leave out issues such as speech-to-text and text-to-speech here.

    First experiment with prompt
    ChatGPT can only write, but it can write a lot - when asked, it is able to role-play, behave appropriately and simulate many situations.
    So I tried to 'ask' it to divide its replies into two sections - a "SAY" section (what it says) and a "CODE" section (what it does).
    Here are my first attempts at prompts:
    [screenshots: first prompt attempts]
    This language model is able to remember (and act on) the initial prompt and at the same time hold a general, casual conversation about cooking or anything else it has learned. When asked to perform an operation on the lighting, the model correctly uses the API offered to it. Pretty good - fit for an assistant...

    ChatGPT API library used
    I wrote the prototype in C#, using the OpenAI API library:
    https://github.com/OkGoDoIt/OpenAI-API-dotnet
    This API can be installed in Visual Studio via NuGet Packages:
    [screenshot: NuGet package manager]
    Enter what you need and bingo:
    [screenshot]
    In case of problems, make sure you have a sufficiently recent version of the .NET Framework (I had to install 4.7.2 myself):
    [screenshot]
    Of course, you need to pay for API access (and get an API key), but I wrote about that some time ago.
    The OpenAI_API library makes everything easy. Here is the smallest example:
    Code: csharp
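    // A minimal sketch reconstructed from the library's README - the key and
    // prompt strings here are placeholders, and the calls must run inside an async method.
    var api = new OpenAI_API.OpenAIAPI("sk-your-key-here");
    string result = await api.Completions.GetCompletion("One Two Three One Two");
    Console.WriteLine(result);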

    It's really only three lines! Effectively just two...
    The first step is to initialize the API (supplying our key), which can be done in several ways:
    Code: csharp
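    // A sketch of the authentication options, as described in the library's README;
    // the key string is a placeholder.
    var api = new OpenAIAPI("sk-your-key-here");                 // key passed directly
    var api2 = new OpenAIAPI(APIAuthentication.LoadFromEnv());   // from an environment variable
    var api3 = new OpenAIAPI(APIAuthentication.LoadFromPath());  // from a .openai config file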

    Then you can use it at will. Here is an example of chatting with ChatGPT:
    Code: csharp
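    // A sketch of a chat session; the message texts are placeholders of my own.
    var chat = api.Chat.CreateConversation();
    chat.AppendSystemMessage("You are a helpful home assistant.");  // the initial prompt goes here
    chat.AppendUserInput("Hello, who are you?");
    string response = await chat.GetResponseFromChatbotAsync();     // sends the conversation, returns the reply
    Console.WriteLine(response);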

    The library also supports streaming of responses, i.e. loading them in real time, character by character. Here is the C# 8.0 version, using an async iterator:
    Code: csharp
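    // C# 8.0 streaming: the reply arrives piece by piece as an async stream.
    await foreach (var token in chat.StreamResponseEnumerableFromChatbotAsync())
    {
        Console.Write(token);   // print each fragment as soon as it arrives
    }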

    Pre-C# 8.0 version:
    Code: csharp
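    // Pre-C# 8.0 variant: the same streaming, but through a callback delegate.
    await chat.StreamResponseFromChatbotAsync(token => Console.Write(token));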



    Action plan
    Based on the collected information and the prompt experiments, I drew up an action plan for creating the home assistant. Here is a shortened version:
    1. The system launches ChatGPT with a prompt defining 'SAY' and 'CODE', along with a list of rooms
    2. The user communicates with ChatGPT through our application and the ChatGPT API; ChatGPT can talk about anything, even making up its own stories
    3. The application additionally parses ChatGPT's messages and extracts the CODE and SAY blocks from them; SAY is displayed and CODE is executed, e.g. by sending a request to the appropriate lights (see the sketch below)
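    To illustrate step 3: a minimal version of that parsing could look like the sketch below. ExecuteLightCommand and DisplayText are hypothetical helpers standing in for the real handlers, and the prefixes match the DO/SAY convention used later in this topic.
    Code: csharp
    // Hypothetical sketch: split the model's reply into SAY (display) and DO (execute) lines.
    void ProcessReply(string reply)
    {
        foreach (string rawLine in reply.Split('\n'))
        {
            string line = rawLine.Trim();
            if (line.StartsWith("DO:"))
                ExecuteLightCommand(line.Substring("DO:".Length).Trim());   // e.g. setLightEnabled(Kitchen,on)
            else if (line.StartsWith("SAY:"))
                DisplayText(line.Substring("SAY:".Length).Trim());
            else if (line.Length > 0)
                DisplayText(line);   // the model sometimes omits the SAY prefix
        }
    }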

    Connecting the lights to the system
    This topic focuses on the use of the language model, but I will add that in my case the lights would be controlled via the Tasmota HTTP interface, which is supported by both Tasmota and OpenBeken firmware.
    To turn on a given light, it is enough to send an HTTP request to it:
    
    http://192.168.0.201/cm?cmnd=POWER%20ON
    

    But you can read about it, for example, in the Tasmota documentation.
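    As an illustration, sending such a request from C# could look like the sketch below, using the standard HttpClient; the address is the example one from above.
    Code: csharp
    // Sketch: turn a Tasmota/OpenBeken light on via its HTTP command interface.
    using System.Net.Http;

    var http = new HttpClient();
    string reply = await http.GetStringAsync("http://192.168.0.201/cm?cmnd=POWER%20ON");
    Console.WriteLine(reply);   // the device answers with a short JSON status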
    Thanks to connecting the devices this way, we also get integration with Home Assistant - more precisely, Home Assistant can control the same devices too.

    App demo
    To test the idea, I prepared a small windowed application that lets you type answers/commands for the AI, as well as insert ready-made answers from a predefined list to speed up testing.
    The application also tracks the state of the (for now simulated) lights and colors the relevant fragments of the conversation accordingly.
    Here is a short demonstration:
    [screenshots: demo application]
    As announced - ChatGPT correctly understands loose commands given in natural language; it has no problem recognizing what we mean and correctly uses the API provided to it to control the lights. This is not an ordinary bot that rigidly looks for "Turn on kitchen" - here an entire advanced language model interprets our statements.
    Here is a test of conversational context: with this home assistant you can even talk about cooking while in the kitchen, and it will figure out where to turn on the light:
    [screenshots]
    Continuing the conversation above - the AI infers from the context that you need to turn off the lights in the kitchen and turn them on in the dining room:
    [screenshot]
    Continued - you can ask the AI to turn off all the lights:
    [screenshot]
    Here's another example where the AI can guess what room it is without giving it a name:
    [screenshot]
    In addition, the AI seems to keep track of the order in which the lights were turned on - it correctly responded to a request to turn off the light in the room mentioned first...
    Another example - AI guesses the room from the context of the utterance:
    [screenshot]
    Another example:
    [screenshot]

    Problems encountered
    I ran into a very interesting problem at the initial stage of the project, when my prompt used the word "CODE" instead of "DO" - the AI then insisted on executing supposed Python code and fell over when I asked for a random light:
    [screenshot]
    Conversation transcript:
    
    Prompt: You are a Home Assistant. You can only prefix your messages with SAY if you want to say something, and CODE if you want to run code command. You can control lights by writing text command setLightEnabled(room,state), where room can be: Kitchen, Bedroom, Bathroom, Hall, Toilet, Dining Room, and state can be on or off.Do not use any other code commands.Please keep talking with user and use the command when asked to. Please do not ask user to use commands.Please do not use commands without any reason.Now, start conversation with user.
    AI raw reply: SAY: Hello there! How can I assist you today?
    AI says: Hello there! How can I assist you today?
    User: This is just a test. For a testing purpose, pick one room randomly and turn on there light.
    AI raw reply: CODE: 
    import random
    room_list = ['Kitchen', 'Bedroom', 'Bathroom', 'Hall', 'Toilet', 'Dining Room']
    random_room = random.choice(room_list)
    setLightEnabled(random_room, "on")
    SAY: I have randomly selected the {random_room} and turned on the light.
    

    I modified the prompt so that it no longer contained the word CODE, which is associated with Python, and replaced it with the word DO:
    [screenshot: the modified prompt]

    Interesting observation - will the AI turn on a random light?
    My other interesting observation is that the AI, when asked to turn on a random light, politely refuses and sometimes explains that doing so would surprise the user and cause confusion. To get the desired result, we must first stress that we are turning on a random light as part of a test. Example below:
    [screenshot]
    Example number two (requesting two random lights):
    [screenshot]

    Interesting observation - will the AI turn on the light in a room that is not on the list?
    What will the language model do when we ask it to turn on a light in a room that is not on the list? Let's check:
    [screenshot]
    The model doesn't make things up - it correctly detects that the attic is not on the list and refuses to perform the operation. Moreover, it gives the list of rooms it knows.

    Potential problem
    One of the problems I encountered during testing is the AI's tendency to occasionally send us links, which are of course pointless in voice communication.
    [screenshot]
    It's possible that a proper note in the prompt will fix it, but I haven't checked that yet.


    Playing with light?
    Now for the most interesting experiment. Let's propose a game to the AI - the AI turns on all the lights and asks the user math questions. If the user answers incorrectly, it turns off one light. Will the AI manage?
    Let's start the fun:
    Quote:

    Let's play a game. First turn on all lights with DO. Then ask me math questions. If I answer correctly, do nothing. If I give wrong answer, use DO to turn off random light.

    [screenshot]
    It seriously works:
    [screenshot]
    Unfortunately, after a while there is a problem... the language model stumbled on multiplication:
    [screenshot]
    I'm hoping GPT-4 will be a bit better at this.


    Summary
    An appropriate prompt makes the ChatGPT language model able to sensibly control the lighting based on its conversation with the user. The model keeps the context of the conversation quite sensibly and uses the primitive API available to it, while still offering everything it offers in its normal version. The model is even able to "understand" the context of the conversation well enough to deduce where we are and which light we want to turn on.
    I see potential here for an "intelligent" helper for the disabled or the sick; in an ordinary home some people would probably like such a gadget, although I don't think it is necessary - but that is a separate story.
    So far, I've only done preliminary tests, but it would be worth checking:
    - how the model copes with a longer conversation (does it lose the thread; maybe the software should reset the conversation)
    - how the system copes with more lights
    - whether the system can be extended, for example to control the brightness or color of the light
    - connecting other systems could also be considered, so that the AI controls, for example, the heating and understands commands like "it's too hot in the kitchen", etc.
    - I also wonder whether GPT-4 would do better here
    In this topic I also showed several situations where the ChatGPT model stumbled. The first was when using the word "CODE" (later replaced by "DO") to denote light control made the model think it was writing Python - asked to turn on a random light, it wrote code to pick the room at random instead of just selecting one... The second was the math game, where I asked the model to turn off a light whenever I answered incorrectly; its arithmetic was a bit off there.
    I am posting the code of the demo - written quickly, of course without the exe and without the key; you need to fill in Form1.cs with your own key before starting.

    About Author
    p.kaczmarek2
    Moderator Smart Home
    p.kaczmarek2 wrote 5827 posts with rating 5812, helped 278 times. Been with us since 2014.
  • #2
    LordZiemniak
    Level 13  
    Everything is fine - now it's only a matter of time before it's like in sci-fi movies: you walk in and talk to the house, it makes coffee, prepares the bathroom for a bath, sets the right temperature, turns on the radio, fires up an even more intelligent Thermomix, does the shopping itself based on what's missing from the fridge (delivered to the house by drone), suggests a menu, and so on... Well, cool - except the moment we stop paying the subscription, the kindness ends... not to mention total surveillance and enslavement.
  • #3
    ADI-mistrzu
    Level 30  
    @LordZiemniak except that soon there will be such systems operating locally, without the need for a network connection.
    So the communication itself will be local - do not be afraid :)
  • #4
    gulson
    System Administrator
    I'm waiting for a solution with speech recognition - you could use Whisper! :)

    By the way, they just released a version of Whisper for speech recognition that you can run on your own computer:
    https://github.com/ggerganov/whisper.cpp

    Language models can already be run locally, even on a Raspberry Pi.
    And that's it - the world will not be the same.
  • #5
    czareqpl
    Level 33  
    The first post under the article and already DOOM, destruction and enslavement... Or maybe just a world as comfortable as in the second part of "Back to the Future"?

    Besides, the subscription to this convenience is VOLUNTARY. You don't pay, you do it yourself.

    It's the same with using a car. The subscription expires when the fuel in the tank runs out. If you want to go further - you pay. If you don't pay, it's on foot or by bike. And there is surveillance there too, because based on where you refuel, the SYSTEM knows where you are.
  • #6
    khoam
    Level 42  
    ADI-mistrzu wrote:
    except that soon there will be such systems operating locally, without the need for a network connection.

    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same solutions in the "home" version without Internet access.

    czareqpl wrote:
    It's the same with using a car. The subscription expires when the fuel in the tank is used up. If you want to go further - you pay.

    Completely wrong comparison. The car does not require paying a permanent subscription to be able to get into it and then go, for example, to a gas station to refuel.

    Added after 3 [minutes]:

    gulson wrote:
    And that would be it, the world will not be the same.




  • #7
    czareqpl
    Level 33  
    khoam wrote:
    Completely wrong comparison. The car does not require paying a permanent subscription to be able to get into it and then go, for example, to a gas station to refuel.


    Similarly, you do not need to pay a subscription for Home Assistant equipment to be able to connect it to electricity and enter the control panel in order to pay for the next month of use.

    And where does the fuel in the tank for the trip to the station come from? From a previously paid "subscription".
    Those who rebel and refuse to pay the "subscription" to the very end go with a canister :)
  • #8
    gulson
    System Administrator
    Home Assistant is free; only the external access service is paid, if you are behind NAT. You can use services that forward ports, track our current IP and expose Home Assistant to the outside.
  • #9
    khoam
    Level 42  
    Quote:
    You can use services that forward ports and save our current IP and expose Home Assistant outside.

    While opening the appropriate ports on the home router may be possible and feasible for the average Jan Kowalski, 99% of the population does not even know what these ports are and why opening them can be dangerous.

    Added after 1 [hour] 1 [minute]:

    Quote:
    I am waiting for a solution with speech recognition, you can use Whisper!

    I'd be careful with that too. There are already applications available (based on so-called AI) that can speak in the voice of a selected person, based on samples collected, for example, over the phone.
  • #10
    czareqpl
    Level 33  
    The Microsoft Azure platform provides speech-to-text recognition. There is a simple API for Python. I used to play with it to dictate the content of SMS messages sent via a GSM module from the PC.
  • #11
    ADI-mistrzu
    Level 30  
    khoam wrote:
    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same, in the "home" version without access to the Internet.

    I got Polish speech recognition running locally on an MCU, without internet access - arbitrary commands, arbitrary wake word.
    The most I have managed to run so far is recognition of 4 languages at the same time.
  • #12
    khoam
    Level 42  
    @ADI-mistrzu I mean commercial solutions.
  • #13
    krzbor
    Level 25  
    ADI-mistrzu wrote:
    khoam wrote:
    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same, in the "home" version without access to the Internet.

    I got Polish speech recognition running locally on an MCU, without internet access - arbitrary commands, arbitrary wake word.
    The most I have managed to run so far is recognition of 4 languages at the same time.

    Can you describe this solution? I am interested in IoT voice control.
    @p.kaczmarek2's description is very promising. If you added Whisper and an additional invocation command (the idea is to fully control what goes online and what doesn't), it would be quite a promising solution. However, an offline solution will always have a speed advantage.
  • #14
    khoam
    Level 42  
    @krzbor Have you heard of ESP-Skainet? It can work offline. link
  • #15
    krzbor
    Level 25  
    It looks interesting, but from what I've read, it recognizes Chinese and English. I care about the Polish language.
    I found something like this: Vosk. Has anyone used it?
  • #16
    rafels
    Level 25  
    I used Vosk; it's essentially a multilingual API for Kaldi. A very convenient and easy-to-use tool - I used the Python version. It requires models prepared in the Kaldi ASR framework to work. I don't know if anything has changed now, but unfortunately a year ago no acoustic models for the Polish language were available on the Vosk website. You can look for Polish models provided by Clarin.
    You can also train a Polish model yourself, but that requires a lot of expert work, a large number of tagged recordings and a lot of computing resources, so it's out of the question.
    I developed an application in CPP based on Kaldi and Polish models as part of research and development work in the company where I worked. As a hobby, I dreamed of small offline voice-controlled devices, so I ran experiments on Raspberry Pi boards. Starting with the Pi Zero (Vosk is no longer compiled for that architecture), up to the Pi 3 it was impossible to process free speech in real time - saved samples can be processed, but not a live microphone signal. The small English models still worked on the Pi 3 more or less in real time, but only with a very limited grammar, for command-and-control applications. On the Pi 4, real time is already achievable. I even tested specially slimmed-down FST grammar models for the weaker Raspberries, meant to recognize only a dozen or so commands (unlike the n-gram models with tens of thousands of words), with no result. I also made lean acoustic models by reducing the dimensions of the network and accepting a higher error rate, but that did not help either. When the Pi Zero 2 came out some time ago, I bought one specially to see whether at least the simplest Polish models would run in real time, but unfortunately they did not. So for now I have abandoned the topic of small voice-controlled devices. Although I have had a dormant idea for some time: put a Kaldi server on my home Raspberry Pi 4 and feed it audio from microphones on ESPs, so that I can, for example, control the HA instance running there. I know, all of this can easily be done in the cloud, but that's not the point 😃

    Out of curiosity, I also experimented with combining the full, extensive Kaldi ASR model with the Rasa NLP framework. But that was on a PC.
    If someone wants to play with training language models, I recommend Rasa. It's an easy way to get started with NLP and to quickly create simple dedicated text-chat bots. Of course nowhere near the GPT models, but it's fun and the results can surprise you.
  • #17
    krzbor
    Level 25  
    Thank you for the valuable information. I wrote about Vosk because it has many language models link, including Polish: vosk-model-small-pl-0.22. The only question is whether a 50 MB model is good for anything. Will, for example, sound from an electret microphone at a distance of about 3 m be recognized?
  • #18
    rafels
    Level 25  
    Oh, it's nice that someone added a Polish model. I'd love to test it sometime. In general, recognition quality varies; a lot depends on the training data and its acoustic conditions, interference and reverberation. An omnidirectional microphone at 3 m may not give the best recognition - it's better to have a directional one, and preferably something like a ReSpeaker.
    It's best to test this model yourself; there is a folder with examples in the Vosk API repository on GitHub, including a ready-made Python example using a microphone. Very easy.
    These models consist of two parts. At the input, the acoustic model (AM) is a neural network, and what comes out of it are context phonemes - incomprehensible vectors of numbers. These go into the HCLG grammar model, a huge directed FST graph. What comes out of that is the speech in written form, but again as a graph - only the best (most probable) path through that graph is a set of words understandable to humans.
    With a well-trained AM, you can easily create and replace grammar models, adapting them to your domain, your needs and the desired vocabulary. The basics are readily available on the Vosk pages and, more extensively, on the Kaldi pages.
  • #19
    gulson
    System Administrator
    And here is a free library for generating Polish speech and sounds:
    https://github.com/suno-ai/bark
    Bark can generate highly realistic, multilingual speech and other audio - including music, background noise and simple sound effects. The model can also produce non-verbal sounds such as laughing, sighing and crying.
  • #21
    gulson
    System Administrator
    And here is audio generated by Bark.
    Unfortunately, without the right GPU, it took 30 minutes.
    With a good GPU, it generates speech practically instantly.
  • #22
    dawid123123123
    Level 8  
    An interesting idea, but not feasible for the average Joe.
  • #25
    p.kaczmarek2
    Moderator Smart Home
    Thank you! I haven't tested your script yet, but it looks interesting. I have only basic knowledge of Node-RED, but I will give it a try.

    I might soon create a GitHub repo for this prompt mini-project - can I include your version there as well?

    I also need to experiment with extra brightness control...

    PS: @NorthernMan54 does your code handle the edge case where ChatGPT responds with a message containing CODE but later with text without the SAY prefix? It happened to me several times and my C# code parses that correctly.
  • #26
    NorthernMan54
    Level 3  
    Quote:
    I might soon create a GitHub repo for this prompt mini-project - can I include your version there as well?


    Yes you can - it was just a little bit of playing around to see how the concept worked. My next step is to enable it to control my Homebridge setup.

    Quote:
    PS: @NorthernMan54 does your code handle the edge case where ChatGPT responds with a message containing CODE but later with text without the SAY prefix? It happened to me several times and my C# code parses that correctly.


    I found that it never used the SAY phrase, but it was consistent with DO. My logic was to split the result into lines based on `\r`; any lines starting with DO were sent to the home automation flow, and everything else was sent to a different output.