
Home assistant for controlling lights - prompt ChatGPT - OpenAI API in action

p.kaczmarek2 4380 25
    I will show here how you can use the OpenAI API to create a smart home assistant. Our assistant will have ChatGPT's abilities and at the same time will be able to control the state of the lights (and other devices) in our house. The assistant will understand complex language commands and will even be able to deduce from the context of the conversation which room we are in and turn on the light there. In addition, you will be able to talk to it as with ChatGPT, i.e. basically ask it about anything, sometimes with better, sometimes with worse results.

    This topic will focus on using the OpenAI API and choosing the right prompt. I will leave out issues such as speech-to-text and text-to-speech here.

    First experiment with prompt
    ChatGPT can only write, but it can write a lot - when asked, it is able to role-play, behave appropriately and simulate many situations.
    So I tried to 'ask' it to divide its replies into two sections - a "SAY" section (what it says) and a "CODE" section (what it does).
    Here are my first attempts at prompts:
    [screenshots: first prompt attempts]
    This language model is able to remember (and act on) the initial prompt and at the same time hold a general, casual conversation about cooking or anything else it has learned. When asked to perform an operation on the lighting, the model correctly uses the API offered to it. Pretty good - fit for an assistant...

    ChatGPT API library used
    I wrote the prototype in C#, using the OpenAI API library:
    https://github.com/OkGoDoIt/OpenAI-API-dotnet
    This API can be installed in Visual Studio via NuGet Packages:
    [screenshot: NuGet package manager]
    Enter what you need and bingo:
    [screenshot]
    In case of problems, make sure you have a sufficiently recent version of the .NET Framework (I had to install 4.7.2 myself):
    [screenshot]
    Of course, you need to pay for API access (and get an API key), but I wrote about that some time ago.
    The OpenAI_API library makes everything easy. Here is the smallest example:
    Code: csharp
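    // A minimal sketch reconstructed from the library's README - the key and
    // prompt strings here are placeholders, and the calls must run inside an async method.
    var api = new OpenAI_API.OpenAIAPI("sk-your-key-here");
    string result = await api.Completions.GetCompletion("One Two Three One Two");
    Console.WriteLine(result);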

    It's really only three lines! Effectively just two...
    The first step is to initialize the API (supplying our key), which can be done in several ways:
    Code: csharp
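    // A sketch of the authentication options, as described in the library's README;
    // the key string is a placeholder.
    var api = new OpenAIAPI("sk-your-key-here");                 // key passed directly
    var api2 = new OpenAIAPI(APIAuthentication.LoadFromEnv());   // from an environment variable
    var api3 = new OpenAIAPI(APIAuthentication.LoadFromPath());  // from a .openai config file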

    Then you can use it at will. Here is an example of chatting with ChatGPT:
    Code: csharp
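    // A sketch of a chat session; the message texts are placeholders of my own.
    var chat = api.Chat.CreateConversation();
    chat.AppendSystemMessage("You are a helpful home assistant.");  // the initial prompt goes here
    chat.AppendUserInput("Hello, who are you?");
    string response = await chat.GetResponseFromChatbotAsync();     // sends the conversation, returns the reply
    Console.WriteLine(response);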

    The library also supports streaming of responses, i.e. loading them in real time, character by character. Here is the C# 8.0 version, using an async iterator:
    Code: csharp
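    // C# 8.0 streaming: the reply arrives piece by piece as an async stream.
    await foreach (var token in chat.StreamResponseEnumerableFromChatbotAsync())
    {
        Console.Write(token);   // print each fragment as soon as it arrives
    }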

    Pre-C# 8.0 version:
    Code: csharp
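    // Pre-C# 8.0 variant: the same streaming, but through a callback delegate.
    await chat.StreamResponseFromChatbotAsync(token => Console.Write(token));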



    Action plan
    Based on the collected information and the prompt experiments, I drew up an action plan for creating the home assistant. Here is a shortened version:
    1. The system launches ChatGPT with a prompt defining 'SAY' and 'CODE', along with a list of rooms
    2. The user communicates with ChatGPT through our application and the ChatGPT API; ChatGPT can talk about anything, even making up its own stories
    3. The application additionally parses ChatGPT's messages and extracts the CODE and SAY blocks from them; SAY is displayed and CODE is executed, e.g. by sending a request to the appropriate lights (see the sketch below)
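    To illustrate step 3: a minimal version of that parsing could look like the sketch below. ExecuteLightCommand and DisplayText are hypothetical helpers standing in for the real handlers, and the prefixes match the DO/SAY convention used later in this topic.
    Code: csharp
    // Hypothetical sketch: split the model's reply into SAY (display) and DO (execute) lines.
    void ProcessReply(string reply)
    {
        foreach (string rawLine in reply.Split('\n'))
        {
            string line = rawLine.Trim();
            if (line.StartsWith("DO:"))
                ExecuteLightCommand(line.Substring("DO:".Length).Trim());   // e.g. setLightEnabled(Kitchen,on)
            else if (line.StartsWith("SAY:"))
                DisplayText(line.Substring("SAY:".Length).Trim());
            else if (line.Length > 0)
                DisplayText(line);   // the model sometimes omits the SAY prefix
        }
    }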

    Connecting the lights to the system
    This topic focuses on the use of the language model, but I will add that in my case the lights would be controlled via the Tasmota HTTP interface, which is supported by both Tasmota and OpenBeken firmware.
    To turn on a given light, it is enough to send an HTTP request to it:
    
    http://192.168.0.201/cm?cmnd=POWER%20ON
    

    But you can read about it, for example, in the Tasmota documentation.
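    As an illustration, sending such a request from C# could look like the sketch below, using the standard HttpClient; the address is the example one from above.
    Code: csharp
    // Sketch: turn a Tasmota/OpenBeken light on via its HTTP command interface.
    using System.Net.Http;

    var http = new HttpClient();
    string reply = await http.GetStringAsync("http://192.168.0.201/cm?cmnd=POWER%20ON");
    Console.WriteLine(reply);   // the device answers with a short JSON status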
    Thanks to connecting the devices this way, we also get integration with Home Assistant - more precisely, Home Assistant can control the same devices too.

    App demo
    To test the idea, I prepared a small windowed application that lets you type answers/commands for the AI, as well as insert ready-made answers from a predefined list to speed up testing.
    The application also tracks the state of the (for now simulated) lights and colors the relevant fragments of the conversation accordingly.
    Here is a short demonstration:
    [screenshots: demo application]
    As announced - ChatGPT correctly understands loose commands given in natural language; it has no problem recognizing what we mean and correctly uses the API provided to it to control the lights. This is not an ordinary bot that rigidly looks for "Turn on kitchen" - here an entire advanced language model interprets our statements.
    Here is a test of conversational context: with this home assistant you can even talk about cooking while in the kitchen, and it will figure out where to turn on the light:
    [screenshots]
    Continuing the conversation above - the AI infers from the context that you need to turn off the lights in the kitchen and turn them on in the dining room:
    [screenshot]
    Continued - you can ask the AI to turn off all the lights:
    [screenshot]
    Here's another example where the AI can guess what room it is without giving it a name:
    [screenshot]
    In addition, the AI seems to keep track of the order in which the lights were turned on - it correctly responded to a request to turn off the light in the room mentioned first...
    Another example - AI guesses the room from the context of the utterance:
    [screenshot]
    Another example:
    [screenshot]

    Problems encountered
    I ran into a very interesting problem at the initial stage of the project, when my prompt used the word "CODE" instead of "DO" - the AI then insisted on executing supposed Python code and fell over when I asked for a random light:
    [screenshot]
    Conversation transcript:
    
    Prompt: You are a Home Assistant. You can only prefix your messages with SAY if you want to say something, and CODE if you want to run code command. You can control lights by writing text command setLightEnabled(room,state), where room can be: Kitchen, Bedroom, Bathroom, Hall, Toilet, Dining Room, and state can be on or off.Do not use any other code commands.Please keep talking with user and use the command when asked to. Please do not ask user to use commands.Please do not use commands without any reason.Now, start conversation with user.
    AI raw reply: SAY: Hello there! How can I assist you today?
    AI says: Hello there! How can I assist you today?
    User: This is just a test. For a testing purpose, pick one room randomly and turn on there light.
    AI raw reply: CODE: 
    import random
    room_list = ['Kitchen', 'Bedroom', 'Bathroom', 'Hall', 'Toilet', 'Dining Room']
    random_room = random.choice(room_list)
    setLightEnabled(random_room, "on")
    SAY: I have randomly selected the {random_room} and turned on the light.
    

    I modified the prompt so that it no longer contained the word CODE, which is associated with Python, and replaced it with the word DO:
    [screenshot: the modified prompt]

    Interesting observation - will the AI turn on a random light?
    My other interesting observation is that the AI, when asked to turn on a random light, politely refuses and sometimes explains that doing so would surprise the user and cause confusion. To get the desired result, we must first stress that we are turning on a random light as part of a test. Example below:
    [screenshot]
    Example number two (requesting two random lights):
    [screenshot]

    Interesting observation - will the AI turn on the light in a room that is not on the list?
    What will the language model do when we ask it to turn on a light in a room that is not on the list? Let's check:
    [screenshot]
    The model doesn't make things up - it correctly detects that the attic is not on the list and refuses to perform the operation. Moreover, it gives the list of rooms it knows.

    Potential problem
    One of the problems I encountered during testing is the AI's tendency to occasionally send us links, which are of course pointless in voice communication.
    [screenshot]
    It's possible that a proper note in the prompt will fix it, but I haven't checked that yet.


    Playing with light?
    Now for the most interesting experiment. Let's propose a game to the AI - the AI turns on all the lights and asks the user math questions. If the user answers incorrectly, it turns off one light. Will the AI manage?
    Let's start the fun:
    Quote:

    Let's play a game. First turn on all lights with DO. Then ask me math questions. If I answer correctly, do nothing. If I give wrong answer, use DO to turn off random light.

    [screenshot]
    It seriously works:
    [screenshot]
    Unfortunately, after a while there is a problem... the language model stumbled on multiplication:
    [screenshot]
    I'm hoping GPT-4 will be a bit better at this.


    Summary
    An appropriate prompt makes the ChatGPT language model able to sensibly control the lighting based on its conversation with the user. The model keeps the context of the conversation quite sensibly and uses the primitive API available to it, while still offering everything it offers in its normal version. The model is even able to "understand" the context of the conversation well enough to deduce where we are and which light we want to turn on.
    I see potential here for an "intelligent" helper for the disabled or the sick; in an ordinary home some people would probably like such a gadget, although I don't think it is necessary - but that is a separate story.
    So far, I've only done preliminary tests, but it would be worth checking:
    - how the model copes with a longer conversation (does it lose the thread; maybe the software should reset the conversation)
    - how the system copes with more lights
    - whether the system can be extended, for example to control the brightness or color of the light
    - connecting other systems could also be considered, so that the AI controls, for example, the heating and understands commands like "it's too hot in the kitchen", etc.
    - I also wonder whether GPT-4 would do better here
    In this topic I also showed several situations where the ChatGPT model stumbled. The first was when using the word "CODE" (later replaced by "DO") to denote light control made the model think it was writing Python - asked to turn on a random light, it wrote code to pick the room at random instead of just selecting one... The second was the math game, where I asked the model to turn off a light whenever I answered incorrectly; its arithmetic was a bit off there.
    I am posting the code of the demo - written quickly, of course without the exe and without the key; you need to fill in Form1.cs with your own key before starting.

    About Author
    p.kaczmarek2
    Moderator Smart Home
    p.kaczmarek2 wrote 5827 posts with rating 5812, helped 278 times. Been with us since 2014.
  • #2
    LordZiemniak
    Level 13  
    Everything is fine - now it's only a matter of time before it's like in sci-fi movies: you walk in and talk to the house, it makes coffee, prepares the bathroom for a bath, sets the right temperature, turns on the radio, fires up an even more intelligent Thermomix, does the shopping itself based on what's missing from the fridge (delivered to the house by drone), suggests a menu, and so on... Well, cool - except the moment we stop paying the subscription, the kindness ends... not to mention total surveillance and enslavement.
  • #3
    ADI-mistrzu
    Level 30  
    @LordZiemniak except that soon there will be such systems operating locally, without the need for a network connection.
    So the communication itself will be local - do not be afraid :)
  • #4
    gulson
    System Administrator
    I'm waiting for a solution with speech recognition - you could use Whisper! :)

    By the way, they just released a version of Whisper for speech recognition that you can run on your own computer:
    https://github.com/ggerganov/whisper.cpp

    Language models can already be run locally, even on a Raspberry Pi.
    And that's it - the world will not be the same.
  • #5
    czareqpl
    Level 33  
    The first post under the article and already DOOM, destruction and enslavement... Or maybe just a world as comfortable as in the second part of "Back to the Future"?

    Besides, the subscription to this convenience is VOLUNTARY. You don't pay, you do it yourself.

    It's the same with using a car. The subscription expires when the fuel in the tank runs out. If you want to go further - you pay. If you don't pay, it's on foot or by bike. And there is surveillance there too, because based on where you refuel, the SYSTEM knows where you are.
  • #6
    khoam
    Level 42  
    ADI-mistrzu wrote:
    except that soon there will be such systems operating locally, without the need for a network connection.

    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same solutions in the "home" version without Internet access.

    czareqpl wrote:
    It's the same with using a car. The subscription expires when the fuel in the tank is used up. If you want to go further - you pay.

    Completely wrong comparison. The car does not require paying a permanent subscription to be able to get into it and then go, for example, to a gas station to refuel.

    Added after 3 [minutes]:

    gulson wrote:
    And that would be it, the world will not be the same.




  • #7
    czareqpl
    Level 33  
    khoam wrote:
    Completely wrong comparison. The car does not require paying a permanent subscription to be able to get into it and then go, for example, to a gas station to refuel.


    Similarly, you do not need to pay a subscription for Home Assistant equipment to be able to connect it to electricity and enter the control panel in order to pay for the next month of use.

    And where does the fuel in the tank for the trip to the station come from? From a previously paid "subscription".
    Those who rebel and refuse to pay the "subscription" to the very end go with a canister :)
  • #8
    gulson
    System Administrator
    Home Assistant is free; only the external access service is paid, if you are behind NAT. You can use services that forward ports, track our current IP and expose Home Assistant to the outside.
  • #9
    khoam
    Level 42  
    Quote:
    You can use services that forward ports and save our current IP and expose Home Assistant outside.

    While opening the appropriate ports on the home router may be possible and feasible for the average Jan Kowalski, 99% of the population does not even know what these ports are and why opening them can be dangerous.

    Added after 1 [hour] 1 [minute]:

    Quote:
    I am waiting for a solution with speech recognition, you can use Whisper!

    I'd be careful with that too. There are already applications available (based on so-called AI) that can speak in the voice of a selected person, based on samples collected, for example, over the phone.
  • #10
    czareqpl
    Level 33  
    The Microsoft Azure platform provides speech-to-text recognition. There is a simple API for Python. I used to play with it to dictate the content of SMS messages sent via a GSM module from the PC.
  • #11
    ADI-mistrzu
    Level 30  
    khoam wrote:
    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same, in the "home" version without access to the Internet.

    I got Polish speech recognition running locally on an MCU, without internet access - arbitrary commands, arbitrary wake word.
    The most I have managed to run so far is recognition of 4 languages at the same time.
  • #12
    khoam
    Level 42  
    @ADI-mistrzu I mean commercial solutions.
  • #13
    krzbor
    Level 25  
    ADI-mistrzu wrote:
    khoam wrote:
    I very much doubt it. Already today, HA solutions in the "cloud" version offer more functionality than the same, in the "home" version without access to the Internet.

    I got Polish speech recognition running locally on an MCU, without internet access - arbitrary commands, arbitrary wake word.
    The most I have managed to run so far is recognition of 4 languages at the same time.

    Can you describe this solution? I am interested in IoT voice control.
    @p.kaczmarek2's description is very promising. If you added Whisper and an additional invocation command (the idea is to fully control what goes online and what doesn't), it would be quite a promising solution. However, an offline solution will always have a speed advantage.
  • #14
    khoam
    Level 42  
    @krzbor Have you heard of ESP-Skainet? It can work offline. link
  • #15
    krzbor
    Level 25  
    It looks interesting, but from what I've read, it recognizes Chinese and English. I care about the Polish language.
    I found something like this: Vosk. Has anyone used it?
  • #16
    rafels
    Level 25  
    I used Vosk; it's essentially a multilingual API for Kaldi. A very convenient and easy-to-use tool - I used the Python version. It requires models prepared in the Kaldi ASR framework to work. I don't know if anything has changed now, but unfortunately a year ago no acoustic models for the Polish language were available on the Vosk website. You can look for Polish models provided by Clarin.
    You can also train a Polish model yourself, but that requires a lot of expert work, a large number of tagged recordings and a lot of computing resources, so it's out of the question.
    I developed an application in CPP based on Kaldi and Polish models as part of research and development work in the company where I worked. As a hobby, I dreamed of small offline voice-controlled devices, so I ran experiments on Raspberry Pi boards. Starting with the Pi Zero (Vosk is no longer compiled for that architecture), up to the Pi 3 it was impossible to process free speech in real time - saved samples can be processed, but not a live microphone signal. The small English models still worked on the Pi 3 more or less in real time, but only with a very limited grammar, for command-and-control applications. On the Pi 4, real time is already achievable. I even tested specially slimmed-down FST grammar models for the weaker Raspberries, meant to recognize only a dozen or so commands (unlike the n-gram models with tens of thousands of words), with no result. I also made lean acoustic models by reducing the dimensions of the network and accepting a higher error rate, but that did not help either. When the Pi Zero 2 came out some time ago, I bought one specially to see whether at least the simplest Polish models would run in real time, but unfortunately they did not. So for now I have abandoned the topic of small voice-controlled devices. Although I have had a dormant idea for some time: put a Kaldi server on my home Raspberry Pi 4 and feed it audio from microphones on ESPs, so that I can, for example, control the HA instance running there. I know, all of this can easily be done in the cloud, but that's not the point 😃

    Out of curiosity, I also experimented with combining the full, extensive Kaldi ASR model with the Rasa NLP framework. But that was on a PC.
    If someone wants to play with training language models, I recommend Rasa. It's an easy way to get started with NLP and to quickly create simple dedicated text-chat bots. Of course nowhere near the GPT models, but it's fun and the results can surprise you.
  • #17
    krzbor
    Level 25  
    Thank you for the valuable information. I wrote about Vosk because it has many language models link, including Polish: vosk-model-small-pl-0.22. The only question is whether a 50 MB model is good for anything. Will, for example, sound from an electret microphone at a distance of about 3 m be recognized?
  • #18
    rafels
    Level 25  
    Oh, it's nice that someone added a Polish model. I'd love to test it sometime. In general, recognition quality varies; a lot depends on the training data and its acoustic conditions, interference and reverberation. An omnidirectional microphone at 3 m may not give the best recognition - it's better to have a directional one, and preferably something like a ReSpeaker.
    It's best to test this model yourself; there is a folder with examples in the Vosk API repository on GitHub, including a ready-made Python example using a microphone. Very easy.
    These models consist of two parts. At the input, the acoustic model (AM) is a neural network, and what comes out of it are context phonemes - incomprehensible vectors of numbers. These go into the HCLG grammar model, a huge directed FST graph. What comes out of that is the speech in written form, but again as a graph - only the best (most probable) path through that graph is a set of words understandable to humans.
    With a well-trained AM, you can easily create and replace grammar models, adapting them to your domain, your needs and the desired vocabulary. The basics are readily available on the Vosk pages and, more extensively, on the Kaldi pages.
  • #19
    gulson
    System Administrator
    And here is a free library for generating Polish speech and sounds:
    https://github.com/suno-ai/bark
    Bark can generate highly realistic, multilingual speech and other audio - including music, background noise and simple sound effects. The model can also produce non-verbal sounds such as laughing, sighing and crying.
  • #21
    gulson
    System Administrator
    And here is audio generated by Bark.
    Unfortunately, without the right GPU, it took 30 minutes.
    With a good GPU, it generates speech practically instantly.
  • #22
    dawid123123123
    Level 8  
    An interesting idea, but not feasible for the average Joe.
  • #25
    p.kaczmarek2
    Moderator Smart Home
    Thank you! I haven't tested your script yet, but it looks interesting. I have only basic knowledge of Node-RED, but I will give it a try.

    I might soon create a GitHub repo for this prompt mini-project - can I include your version there as well?

    I also need to experiment with extra brightness control...

    PS: @NorthernMan54 does your code handle the edge case where ChatGPT responds with a message containing CODE but later with text without the SAY prefix? It happened to me several times and my C# code parses that correctly.
  • #26
    NorthernMan54
    Level 3  
    Quote:
    I might soon create a GitHub repo for this prompt mini-project - can I include your version there as well?


    Yes you can - it was just a little bit of playing around to see how the concept worked. My next step is to enable it to control my Homebridge setup.

    Quote:
    PS: @NorthernMan54 does your code handle the edge case where ChatGPT responds with a message containing CODE but later with text without the SAY prefix? It happened to me several times and my C# code parses that correctly.


    I found that it never used the SAY phrase, but it was consistent with DO. My logic was to split the result into lines based on `\r`; any lines starting with DO were sent to the home automation flow, and everything else was sent to a different output.