Difficulty with Threaded TCP

Peter Castine

I am writing an mxj external that needs to communicate with a TCP Server, sending and receiving data. The server is a custom hardware system. It is *not* something from the Max/MSP world, it knows nothing about atoms, it uses a binary format that I have to adhere to. So no [mxj net.send], no [mxj net.receive], and I cannot use the TcpReceiver/TcpSender classes.

Sending data is OK. I have written test code that simply blocks while waiting to receive a reply in response to the sent packet and that works. But I cannot afford to block and I need to be well-behaved if the host dies or a LAN cable is broken. So I am trying to put the wait-for-response code into a Java Thread using the runnable Interface. This is where problems begin.

What the attached code is supposed to do:

Initialization: allocate a buffer as a template for data to send over TCP. Some other standard stuff.

bang():
- Check if we're connected. If necessary, build a connection.
- Send data to server
- Start thread to listen for a response

run():
- allocate buffer to store data once it's come in over the network.
- If the data has arrived:
...then parse it
...else increment a counter and sleep for a while; punt if the counter indicates a time out.

Does this make sense? What is the code doing wrong? The attachment has been stripped of the gory details of the data formats, otherwise the real thing. In my real life example the code hits the timeout at the outlet(0, -1) statement (line 171).

I greatly appreciate the help.

Owen Green

Hi Peter,

I'll try and have a proper look tomorrow (it's late now, and I'm en route to bed :) , but in the meantime...

You're starting a new thread, based on the same instance of runnable, each time you bang, which may not break your code but is expensive and idiomatically unusual, as well as increasing the scope for deadlock weirdness appreciably.

A more standard idiom for this sort of thing is to have a single worker thread that calls wait() within its while loop, and is woken up by the main thread calling notify() when there's something to be done. Examples abound on the web, if you're in a hurry, but I'll try and post a modified version of your code tomorrow demonstrating. You also seem to be over-synchronizing by calling your synchronized method in a synchronized block.

Also, you can defer timeout handling to the underlying socket, which may make life simpler.

That notwithstanding, at first glance, nothing would appear to be
outstandingly wrong with your run() method. Obviously, gotData is never getting set to true. Have you tried checking to see that data is being read at all, and if so, how much? Have you tried locking on a different object, or getting rid of that superfluous synchronized(this) block?

More useful things tomorrow, hopefully.

--
O

Peter Castine

Thanks -- I'll make that a single thread for listening to replies.

I'm sure there must be examples on the web, but my Google queries have been looking in the wrong place.-( Actually, *after* I put the code together I found that my son had brought his Java programming book home from university. The final chapter is a sample Chat client, which is structured similarly to what I need to do and would have saved me quite a bit of uncertainty. And shown me that the way to do this is with a single worker thread.

I'm probably being over-paranoid with the synchronize keyword, particularly while I was moving chunks of code around. I had read that the double synchronization was supposed to be safe. Is the second lock expensive? (A lot of tutorials and sample code will show how to use a feature without getting into relative expense of different approaches.)

-- P.

Owen Green

Peter Castine wrote on Sun, 22 February 2009 15:16shown me that the way to do this is with a single worker thread.

Which probably provides you with enough detail to refactor - don't hesitate to shout if that proves gnarly.

Btw - In the absence of the forums yesterday, I sent you a message off-list, the gist of which was that I couldn't get your code not to work when testing against a simple server that belched out 1089 byte packets in response to anything. Made me wonder if the problem was in the thread code after all. Didn't test in max though, and I'm not sure of the effect of synchronizing on a maxobject instance would be (maybe mxj also makes synchronized calls on the instance?)

Quote: Is the second lock expensive? (A lot of tutorials and sample code will show how to use a feature without getting into relative expense of different approaches.)

AFAIK, the second lock is as expensive as the first (quite). In this case, the second lock is on the method, not the object or its members, so it offers no additional protection if that data is modified elsewhere.

If you only have a single worker thread now, this at least eliminates one source of contention. If the members you're changing are primitives, and you're relatively sure about there being clear circumstances about when they're read and written, you could make them volatile, which would guarantee atomicity. If there's likely to be contention over write access, you're best off using synchronised setter methods for all writes (hang the expense), which will synchronise (effectively) on the data in question.

--
O

maxmspjit

I tried to send this to the list on Friday, but I guess it got eaten by the forum switchover.

---

I haven't tried to run your code, but generally the point of moving I/O to a thread is so you don't need to sleep. You can just let your read() block that thread, and it will return whenever there is data, at which point you process it, and immediately read (and block) again.

Also my understanding is that since read is blocked, replyWaitCount will only increment when you do a read, so I'm not sure that construct is useful. If you want to know if you're timed out, you'll need a separate timer, or you'll need to set the socket to timeout whether it's received data or not (read up on SET_SO_TIMEOUT).

In your real code you may not ever be getting to execute IsValidResponse(buf) because TCP is a streaming protocol not a packet-based protocol. Your construct on lines 150-151 checks if there is at least packetsize data available, but only processes it if there is _exactly_ packetsize data available, which may not be the case. You will likely have to implement your own buffering to read a bunch of data, and once you have enough, pull a packet from it, saving any potential extra data to be used for the next packet. Actually you may be ok since you're specifying a buffer of the exact size of your packet, but I think you could still receive less. My point is that I don't think there's any requirement for read() to return a specific amount of data, so if you were hitting packetSize on each read() that would be more luck than anything.

When in doubt, break the conditions of your test statements out into their constituent parts and throw some traces in between. Stuff like if(dataIn.read(buf) == kPacketSize) is nice and compact, but it makes it difficult to debug. try something like

int dataRead = dataIn.read(buf);
post("read " + dataRead + " bytes from the socket");
if (dataRead == kCmd20PacketSize) {
// ...
}

etc.

Let me know if that gets you any closer, it's just a shot in the dark.

Peter Castine

Status update: over the weekend I implemented the single-thread approach, peppered it with post()s, and sent the updated external to the developer who builds a Max/MSP/Jitter standalone with it and sends that to the hardware manufacturer in Switzerland who tests it with the one existing model of the hardware in the world.

I will probably not get feedback until Tuesday at the earliest. (There are more reasons for the very long delay, but that's an even longer story. It's not the way anyone really likes to test code.)

In the meantime I'll go through the logic with the replyWaitCount member. When I started working on this I was trying to avoid reading from a socket when I knew the read() method would block... there will be a pretty significant latency between sending a request and getting a response. I thought it would be more CPU-friendly to sleep() for a bit.