fixed terminating stale threads on trap/proc_exit #1929

wenyongh merged 1 commit into bytecodealliance:main from
Conversation
```c
/**
 * create a copy of wait_map with no lock
 * this allows invoking methods in the callback
 * which require acquiring lock on the original map
```
I don't see any lock in the notify_stale_threads callback. Should we rather take the shared_memory_list_lock mutex at the beginning of the function and release it at the end? Otherwise the map could be modified by another thread, corrupting the traversal.
the wasm_runtime_atomic_notify step in the callback calls acquire_wait_info, which tries to acquire shared_memory_list_lock, so I guess wrapping the function with that lock would cause issues?
HashMap doesn't duplicate the key/value pairs, so wait_map_without_lock will point to the same elements as wait_map
I think you at least want to lock while you copy hashmaps. Also, what happens if after making a copy a new waiter is added to the map?
@loganek While copying the original map, the original map is already locked as it is traversed, and every insert into the copy locks the copy map, so I think the copy should be consistent. If you meant wrapping the copy with shared_memory_list_lock -> addressed that.

What could be concerning here is that during the traversal of the copy, the original map could be updated. If I hold a lock on the original map during the traversal, execution gets stuck, because the callback steps in notify_stale_threads try to take that same lock on the original map (this is exactly why I had to create a copy). This might leave some newly added entries in the original map unhandled at that point in time, but they would eventually be handled, right?
either way I'm still not sure it would be safe since you're not cloning the map but copying the pointers iiuc
closing and reopening to reinitiate the build
```c
{
    AtomicWaitAddressArgs *data = (AtomicWaitAddressArgs *)user_data;
    if (data->len > 0) {
        if (!(data->addr = wasm_runtime_realloc(data->addr,
```
When is this memory freed? And why do nothing except LOG_ERROR when malloc/realloc fails?
```c
memset(args, 0, sizeof(*args));
os_mutex_lock(&shared_memory_list_lock);
// create list of addresses
bh_hash_map_traverse(wait_map, create_list_of_waiter_addresses, args);
```
How about traversing two times: the first time to get the total element count of wait_map, use it to allocate memory for AtomicWaitAddressArgs *args with size == offsetof(AtomicWaitAddressArgs, addr) + sizeof(void *) * total_elem_count, and then traverse the second time without allocating memory?

```c
static void
xxx_callback(void *key, void *value, void *p_total_elem_count)
{
    *(uint32 *)p_total_elem_count = *(uint32 *)p_total_elem_count + 1;
}

os_mutex_lock(&shared_memory_list_lock);
total_elem_count = 0;
bh_hash_map_traverse(wait_map, xxx_callback, (void *)&total_elem_count);
/* allocate memory */
bh_hash_map_traverse(wait_map, ..., args); /* set each data->addr[i] */
os_mutex_unlock(...);
```
Just wondering, is there any specific advantage to traversing two times? Is it to ensure the final traversal of data->addr is safe, i.e. we don't go out of bounds?
The purpose of that is to avoid multiple reallocations - knowing the size in advance will let you make only one allocation.
Does this PR work in AOT mode? Also, can you try it with the sample thread_termination.c after replacing [...] with a wait operation? And try with [...], just to make sure that the base cases are covered.
This is to terminate suspended threads in case an atomic wait occurs with a huge or indefinite (-1) timeout, followed by a proc exit or trap.