BK7231N Door Sensor Bricking After Battery Drain: Flash Write Issue and Solutions

  • #1 21819205
    protectivedad
    Level 9  
    Has anyone tracked down the problem with battery devices that become bricked after the batteries drain? I've seen it mentioned but I can't remember where I saw the information and I'm not using the right search to find it.

    I just had it happen to my non-TuyaMCU door sensor. Unfortunately, I didn't save a backup of the FUBAR'd firmware before erasing it. I did hook it up to a serial log and saw:
    
    V:BK7231N_1.0.1
    REG:cpsr     spsr     r13      r14
    SVC:000000D3          00401C1C 000033AC
    IRQ:000000d2 00000010 00401e0c 3ffff7f5
    FIR:000000d1 00000010 00401ffc 77ff856f
    SYS:000000df          0040192c 00000158
    ST:00000003
    J 0x10000
    bk_misc_init_start_type 3 3
    prvHeapInit-start addr:0x412bb8, size:119880
    [Flash]id:0x1c7015
    sctrl_sta_ps_init
    cset:0 0 0 0
    OpenBK7231N, version DoorSensors_88af71253b87_doorsensor
    Entering initLog()...
    Commands registered!
    initLog() done!
    Info:GEN:0 - Main_Init_Before_Delay
    


    From this I assumed either the flash vars or the flash CFG had become corrupted. For my device I rewrote the flash vars code to be a little better in how it stores data and finds the latest values on boot. During this I thought of two solutions that people might benefit from.

    The first is a simple one: after the device registers enough failed boots to trigger RESTARTS_REQUIRED_FOR_SAFE_MODE, stop writing to the flash. I did a quick test with some "dead" AAA batteries (~1 V each). The device boots and reboots hundreds of times a minute. When I hooked it back up to a power supply, I noticed the device was writing to the flash on each boot. With the current implementation, that means the flash is being hammered with updates when the device has low batteries: thousands of writes an hour.

    The second is to NOT write to the flash until "####### Set Boot Complete #######" has been reached. This is a little more complicated, and of course there is an exception for the boot counter (which is capped at RESTARTS_REQUIRED_FOR_SAFE_MODE), but it works well for me. For the CFG it doesn't require a lot of extra coding. I added a safe-to-write flag that is set after the boot is marked complete or when you exit safe mode; either event triggers CFG_Save_IfThereArePendingChanges, so any stored changes are written. The vars were a bit more complicated. The firmware potentially writes when reading, reads (with potential writes), and explicitly writes all over the place. I suggest a simple rule: never write on a read. Still, it's not too bad, because channel changes can be kept in memory, and when the boot is marked complete all the channel information is written to flash anyway. So I just stop writing (except the boot count) while in the "boot" phase.
  • #3 21819583
    miegapele
    Level 16  
    Yes, not writing excessively is probably a good idea. However, it's interesting what could cause this crash. A broken config should not be an issue, because the CRC is checked. Maybe the easyflash data is corrupted and the act of trying to read it somehow crashes.
  • #4 21819865
    protectivedad
    Level 9  
    I didn't like the flash vars code; it was very hard to follow. I just redid it so it's more efficient and logical. It now reads and ONLY reads; if it needs to write, it either appends to a good flash vars partition or resets the partition to hold good information. As for the config, from a logical point of view, I don't want my device changing itself if it isn't booted to a useful state. So it just makes good sense to have it wait for boot complete; plus, that reduces some wear on the flash. Win-win.

    I'll see if I get a "bricked" sensor problem anymore. If I do, I'll update this.
  • #5 21836185
    protectivedad
    Level 9  
    So, this device on the current OBK is a battery killer. I found out that every 50 boots or so (it's a random amount), the device will hang on boot with the LED on, and will stay there until the batteries drain. The hang happens VERY early in the boot. I got lucky and it happened in my test environment. The last known point was bk_misc_init_start_type; the device hangs sometime before or during prvHeapInit.

    It happened again on the front door sensor; this time new batteries brought it back to life, no soft brick.

    I compiled OBK using the _ALT SDK (after updating it to 3.0.78). I have loaded it on the front door sensor. If it hangs again, I'm giving up. I'll switch to TuyaMCU hardware versions. If they hang at the start, the TuyaMCU will at least cycle the module after a second.
  • #6 21836634
    DeDaMrAz
    Level 22  
    Battery drain will corrupt flash after it drops under a certain voltage; if memory serves, 1.8 V is the threshold where I observed random corruption in flash (consult the datasheet for the given flash chip).

    It would be interesting to debug the problem you are facing; it sounds like a DoorSensor driver issue. I have a long-running test device on a deepsleep 900 (15-minute wake) cycle, and it's been going non-stop for 3+ months without problems. Granted, it's on an 18650 battery, but it has been powering on 96 times a day. DoorSensor and deepsleep share the same general operating idea.

    Are you using HA for control, or something else? My solution was to create automations in HA to alert me if a device battery is low or if the device becomes unavailable after a certain amount of time.
  • #7 21836731
    protectivedad
    Level 9  
    DeDaMrAz wrote:
    Battery drain will corrupt flash after it drops under a certain voltage; if memory serves, 1.8 V is the threshold where I observed random corruption in flash (consult the datasheet for the given flash chip)

    Even if there are no writes? In my OBK version I have a safety mechanism that doesn't allow any writes if the battery is below ~2.5 V (except boot-failure records, and then just the first 5). The idea is that by the time the batteries are drained to the "corrupt flash" level, the device has already failed to boot 5 times, so there are no more writes to the flash and the flash won't get corrupted. There was no corruption last time, so I hope I've solved that problem. The last "freeze" occurred a few days after putting new batteries in a new test device.

    DeDaMrAz wrote:
    It would be interesting to debug the problem you are facing; it sounds like a DoorSensor driver issue. I have a long-running test device on a deepsleep 900 (15-minute wake) cycle, and it's been going non-stop for about 3 months without problems. Granted, it's on an 18650 battery, but it has been powering on 96 times a day. DoorSensor and deepsleep share the same general operating idea.

    It's not the door sensor. It happens early in the boot process, before even the "early boot" code, before the first malloc call. I was thinking it was a low-battery problem, but it happened on my test bench hooked up to a power supply. I thought maybe I had fried a device, either taking it apart the first time or from the stress of writing new firmware and testing, but I've now had it happen on three different XTREME door sensors. On the last one, the only mods I made were soldering two lines to flash the new firmware.

    DeDaMrAz wrote:
    Are you using HA for control or something else? My solution was to create automations in HA to alert me if device battery is low or if it becomes unavailable after certain amount of time.

    I use OpenHAB, but I have the same type of setup. It's not the battery that causes the initial "freeze"; it's the initial freeze that causes the battery to drain, which then causes the flash corruption, etc. I have alerts coming to my phone every time the door opens and closes. When I know the door has opened or closed but I don't get an alert, I take it down and test the batteries.

    What are the details of the device you have running, rebooting every 15 minutes? Specific hardware details and the modifications you've made. I'd like to find the differences from my device to see how device-specific the problem is. Right now I think it is a model-specific problem; it might be triggered by the SDK boot code. I am hoping the newer SDK (which has different boot code) might solve it. Of course, there could have been multiple causes, and I've been slowly going through them; it's hard to tell, since I can't see the serial log of the device on my door. The only hint I have is the one time it froze on my test bench with the serial hooked up.
  • #8 21836769
    DeDaMrAz
    Level 22  
    protectivedad wrote:
    Even if there are no writes?


    On first write attempt I would say.

    protectivedad wrote:
    What are the details on your device that you have running rebooting every 15 minutes?


    It's a regular PR release of OBK, the one that introduced quick connect and flashvars; nothing special. I just wanted to make sure it was working before release, and it has stayed on for almost 6 months now. It is running on a BK7231N - that's it.

    I'd go one item change at a time to try to replicate or eliminate the behavior on the bench... I never got the time to attach one to OpenOCD to test; that would be the ultimate answer.

    Reference - https://www.elektroda.com/rtvforum/topic3989434.html#20654215
    original post - https://www.elektroda.com/rtvforum/topic3866123-600.html#20028605
  • #9 21836778
    protectivedad
    Level 9  
    DeDaMrAz wrote:
    On first write attempt I would say.

    I'm sure I've solved that problem then. By the time the battery gets that low, there are no longer any writes occurring to the flash.

    DeDaMrAz wrote:
    It's a regular PR release of OBK that introduced quick connect and flashvars, nothing special just wanted to make sure it is working before release and it stayed on for almost 6 months now and it is running on BN7231N - that's it.

    I was hoping for a bit more specifics. For example, mine is using the CBU module.

    Does your test setup go into full deep sleep? How do you have it wake up? I could set up a test device, but I can't think of how to get it to wake on the GPIO change without physically moving the mechanism.

    A true test would require the deep sleep, a pause, and a wake up by GPIO over and over. Any idea how I could accomplish this so I can see for sure if my SDK changes solve the problem?
  • #10 21836784
    DeDaMrAz
    Level 22  
    protectivedad wrote:
    I was hoping for a bit more specifics. For example mine is using the CBU module.


    Well, I'd be more specific, but it's a toss-up on these chips. Mine is using a CBU module, which is BK7231N based. Yes, it goes into deep sleep and is woken via a timer (quicktick, I believe), reports data on MQTT connect, and goes back to sleep. I didn't go into deeper tests; as you can see, we have been releasing like crazy, and there is no way I can test all of that.

    protectivedad wrote:
    ...but I can't think of how to get it to wake on the gpio change without physically moving the mechanism.


    Are you talking about the code mechanism you've implemented, or something else?

    protectivedad wrote:
    A true test would require the deep sleep a pause and a wake up by GPIO over and over. Any idea how I could accomplish this so I can see for sure if my SDK changes solve the problem?


    Do you have DeepSleep in your implementation? If so, use that; it's the easiest way to test on a bench.

Topic summary

The discussion addresses the issue of BK7231N-based door sensors becoming bricked after battery depletion, focusing on firmware corruption and flash memory write problems. The original poster experienced a bricked non-TuyaMCU door sensor after battery drain and observed serial log outputs indicating system initialization and flash ID recognition. Responses suggest that excessive flash writes may contribute to device crashes, though CRC checks should prevent configuration corruption from causing failures. It is hypothesized that corruption of EasyFlash data could lead to crashes during read operations. Attempts to replicate the low-battery bricking behavior have been inconclusive, and the exact cause remains undetermined.
Summary generated by the language model.