T3-TB and TSTAT-10 with WiFi, dropped connections

I believe there is a bug in the firmware (currently running latest) of the WiFi devices. After ending the TCP session with these devices, sometimes they will no longer allow new connections to port 502 through WiFi until they’re power cycled. As such any external modbus polling (or T3000) will cease to function if it is routed through WiFi. However, the wired PHY on the T3-TB (on port 502) and the RS-485 PHY on the TSTAT-10 remain fully functional and are capable of communicating with T3000 and external modbus masters.

I first noticed this issue while I was having packet loss problems with the new wireless APs we installed in our new office; the packet loss has since been largely corrected. The lockout appears to occur only after a TCP session is terminated (through either RST or FIN, ACK). Below is a snippet from a Wireshark capture from yesterday. As you can see, I closed my Modbus master (FIN, ACK sequence, capture 322). About twenty minutes later I started it again and it sent out SYN packets as normal, but the controllers and thermostat responded with RST, ACK. No amount of retries will successfully create a new TCP session to the devices until they’re power cycled or rebooted through T3000 (over the hardwired connection).

TemcoTCPLockoutFiltered.pcapng (46.2 KB)

This is quite the problem for me because there are 26 controllers mounted inside of enclosures 10+ feet off of the ground and 26 thermostats that all need to be power cycled periodically when this issue occurs.

Any assistance in resolving this would be most appreciated.

Edit:
I will also mention that polling is set to a 10s interval with block reads and block writes through modbus.

Lijun will check and report back.

There’s a known issue with the WiFi connections: they cannot handle a large number of packets in a short time. If you are running into the issues described, make sure you do ALL polling using the block read and write commands. And as always, be sure to update the device firmware.

Maurice

We are having a similar problem. Our configuration is a Tstat10_wifi which is being read from and written to via MODBUS TCP by a third-party PLC. After a random period of time the Tstat10 MODBUS interface fails. The Tstat10 does respond to a ping and can be found on the network at the IP address it was given; however, a power-down restart is required to access it via WiFi again.

This is a known issue. Please update the Tstat firmware and switch all reads and writes to block mode.

Or reduce the polling rate. The WiFi module does work, but at a certain point it cannot keep up and will lock up.

Maurice

The T3000 software indicated that the firmware was updated successfully. Now the Tstat10 flashes the screen continually but will not complete the boot cycle. Is there a way to recover?

Please search this forum for the keyword “unbrick”. Basically, you cycle power and flash again in the first few seconds after power up.

Maurice

An update.

I am able to replicate this issue somewhat reliably. If the TCP session ends after several hours of polling, the devices become unresponsive to new TCP sessions. I also noticed that a few devices randomly came back; some were dead for a few hours, others for days.
They then died again after I closed the session to change some configuration in my polling.

Currently I am polling a block of 20 registers at 30s intervals but the devices will still stop listening for TCP connections after a session is ended. The devices are running the latest firmware. The polling rate is extremely slow with very few registers actually being requested and yet these devices are still failing…

Fandu will study this and report back ASAP.

WiFi connectivity is a known issue under heavy polling with the current WiFi module. We have a new module making its way into production now.

If you have a wireshark log of your network traffic perhaps we can spot something there.

Maurice

We have set up several wifi devices and are trying to replicate this phenomenon, and we will report back in time.

The maximum number of TCP connections supported by our module is 3, so please confirm that you have disconnected the old connection before opening a new one. We did not find any problems with the Modbus software in routine testing.
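
For anyone checking the point about disconnecting the old connection, here is a minimal sketch in Python of a client that always closes its session explicitly, so that a FIN is sent even when the polling code raises an exception. The helper name `modbus_session` and the default timeout are assumptions for illustration only; this is not Temco's code and it does not show how the device firmware counts connections.

```python
import socket
from contextlib import contextmanager


@contextmanager
def modbus_session(host, port=502, timeout=5.0):
    """Open a TCP connection and guarantee an orderly close on exit.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        yield sock
    finally:
        try:
            # Signal end of traffic in both directions (sends FIN).
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            # The peer may already have reset the connection.
            pass
        sock.close()
```

Used as `with modbus_session("192.168.1.50") as sock: ...` (the address is a placeholder), the socket is released as soon as the block exits, which keeps a single client from holding more than one of the three available connections.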

We are now shipping products with the ESP32 cpu which can handle a lot more wifi traffic. If you have an earlier version device with the ST processor and aux wifi module you need to reduce polling and make use of the block read & write functions.

Sorry for the delay in responding.

The only ways the TCP session is being closed are either a reset packet sent from the T3 or TStat in the case of a network hiccup, or a proper FIN, ACK termination handshake initiated from my software. I currently have the reconnect interval in my software set to 60s, so when I lose a connection I will attempt to reconnect on that interval. During the period that they are down I receive only a reset packet from them any time I try to initiate a new connection with a SYN packet (aka connection refused).
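
For anyone wanting to check their own devices for the same symptom, a minimal sketch of a probe in Python follows. It tells apart a port that accepts the connection, one that answers the SYN with a reset (connection refused), and one that does not answer at all. The function name `probe_tcp` and the 3 second timeout are assumptions for illustration; this is not the polling software described above.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns 'open', 'refused' (peer replied to the SYN with RST),
    'timeout' (no reply in time), or 'error' for anything else.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

A device in the locked-out state described here would be expected to return `"refused"` on port 502 over WiFi, while the same call against the wired interface returns `"open"`. The same function can be pointed at any other TCP port on the device.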

I’ve spent the last while looking for any signs that anything else is connecting to these devices, but I’ve seen nothing over days of sniffing packets with a wireless adapter in monitor mode. I also set up a network bridge (and firewall rules to forward) between the wired interfaces on my Linux workstation and connected a T3-TB. There was no communication to the T3/TStat devices from anything other than my workstation, where I’ve been running the testing and development of our HVAC control system. I’ve monitored with netstat what the Linux kernel is reporting for TCP sessions between my workstation and these devices, and it shows only one session established to each device. If I then terminate my polling software, the session is no longer listed. I’ve also tried other third-party Modbus masters (modpoll on my Linux workstation, mdbus on my Windows laptop) to see if I could replicate the effects, and I have, so it’s not limited to my software implementation.

I haven’t been able to find the Temco source code for the devices that utilize the ESP-07, but I am assuming it uses the Espressif AT command set. If this were a case of half-closed connections, I would expect them to terminate after at most 7200 seconds, which is the maximum timeout supported by the AT+CIPSTO command, unless the time parameter is set to 0, which means an infinite timeout (I am assuming it’s not set to 0, since that would be terrible practice). So I really don’t think it’s half-closed or erroneous connections, since there is zero evidence of either occurring.

Polling rates and poll sizes really seem to play no part in the behavior I am experiencing. I am polling only the registers I need, which honestly isn’t much (for the Tstats, for example, only input 9 for the temperature, the related calibration/filter registers, and the temperature setpoint), and the reads are always grouped as tightly as I can get them. My software only ever initiates a write when it receives an event from MQTT (which hasn’t been happening, since I am not running my control application yet). I’ve tried 5s, 10s, 15s, 30s, 60s, 2m, and 5m poll rates and they all lead to the same problem.

–Important—
I’ve also done some port scanning on the devices, and it seems that all ports are closed when a device dies, not just port 502. For example, BACnet on 47808 is also non-functional, and port 1234, which is open on my hardwired test device, is also dead on the wireless device.
–Important—

It also seems like I am not the only one with this issue: @christopher.cook has mentioned it, and @jason.kania, from this thread last year, exhibited the exact same problem. We really need a solution to this problem.

Yes, there are known issues with the wifi adapter we have been using on the Tstat10 and other wifi products on our site. The modules cannot handle much network traffic before they choke up and need a reset. The solution, and we do have many clients successfully using our wifi products, is to poll slower and/or use the block read and write commands. Each block transaction uses about the same resources as a single modbus register/bacnet object read or write.
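
To illustrate why a block read costs about the same on the wire as a single-register read, here is a rough sketch in Python that builds a standard Modbus TCP "Read Holding Registers" (function 0x03) request. The function name and the example unit ID and start address are assumptions for illustration; the frame layout follows the public Modbus TCP specification, not any Temco-specific code.

```python
import struct


def read_holding_registers_request(transaction_id, unit_id, start_address, count):
    """Build a Modbus TCP request frame for function 0x03.

    The request has the same size whether it asks for 1 register
    or for a whole block (the protocol allows up to 125 per request).
    """
    if not 1 <= count <= 125:
        raise ValueError("count must be between 1 and 125")
    # PDU: function code, starting address, quantity of registers.
    pdu = struct.pack(">BHH", 0x03, start_address, count)
    # MBAP header: transaction id, protocol id (always 0),
    # length of what follows (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu
```

A request for 20 registers, such as `read_holding_registers_request(1, 1, 0, 20)`, is a single 12-byte frame. Reading the same 20 registers one at a time would take 20 separate 12-byte requests and 20 separate responses, which is the extra packet load the WiFi module has to absorb.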

If you have many devices installed in the field already, we can work with you to settle the projects down. If you are evaluating our products for a new project, then we should swap them for new ones using the new ESP32 CPU. The Tstat10 with ESP32 is in production now and other items are ramping up slowly.

Here’s the repo for the new ESP32 series products. All our ESP based products use this same repository.

Maurice

A post was split to a new topic: Polling rate