Skip to content

[lte] Handling Implicit detach timer while mme restart #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

rsarwad
Copy link
Owner

@rsarwad rsarwad commented Jul 17, 2020

Summary
Added code to handle Implicit detach timer when MME restarts
Test Plan
Executed s1ap tester sanity test suite

Tested by executing new test case, test_implcit_detach_timer_with_mme_restsart.py

rsarwad added 26 commits April 20, 2020 20:25
Fixed the memory leaks for mme_app timers
…enb_s1ap_id_same_ue.py

2. With this fix basic attach-detach test case is passing on enabling stateles MME
…fferent_enb_s1ap_id_same_ue.py"

This reverts commit f6658a0.
rsarwad pushed a commit that referenced this pull request Jul 27, 2020
Summary: Pull Request resolved: magma/fbc-js-core#5

Test Plan: I've tested on my fork and it's working.

Reviewed By: naor9991

Differential Revision: D22627395

Pulled By: a8m

fbshipit-source-id: bca548e597cc97c509b43c93d7fc782d1583962c
@rsarwad rsarwad force-pushed the rsarwad_stateless_mme_implicit_detach branch 2 times, most recently from cebcae7 to f17c9bd Compare July 28, 2020 13:32
@rsarwad rsarwad force-pushed the master branch 3 times, most recently from 1bddb53 to aa0ce78 Compare July 30, 2020 07:50
rsarwad pushed a commit that referenced this pull request Jul 30, 2020
Summary:
ASAN error:
```
Jun 16 08:56:03 magma-dev sessiond[9856]: =================================================================
Jun 16 08:56:03 magma-dev sessiond[9856]: ==9856==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x61800001fc80 in thread T14:
Jun 16 08:56:03 magma-dev sessiond[9856]:   object passed to delete has wrong type:
Jun 16 08:56:03 magma-dev sessiond[9856]:   size of the allocated type:   816 bytes;
Jun 16 08:56:03 magma-dev sessiond[9856]:   size of the deallocated type: 808 bytes.
Jun 16 08:56:03 magma-dev sessiond[9856]: I0616 08:56:03.149473  9886 SessionEvents.cpp:53] Could not log session_created event {"session_id":"IMSI001010000000001-120251","imsi":"IMSI001010000000001"}, Error Message: Connect Failed
Jun 16 08:56:03 magma-dev sessiond[9856]:     #0 0x7ff448b0d7f0 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc37f0)
Jun 16 08:56:03 magma-dev sessiond[9856]:     #1 0x559f94cbc742 in magma::AsyncGRPCRequest<magma::lte::LocalSessionManager::WithAsyncMethod_ReportRuleStats<magma::lte::LocalSessionManager::WithAsyncMethod_CreateSession<magma::lte::LocalSessionManager::WithAsyncMethod_EndSession<magma::lte::LocalSessionManager::Service> > >, magma::lte::LocalCreateSessionRequest, magma::lte::LocalCreateSessionResponse>::proceed() /home/vagrant/magma/lte/gateway/c/session_manager/SessionManagerServer.cpp:98
Jun 16 08:56:03 magma-dev sessiond[9856]:     #2 0x559f94ca8f54 in magma::AsyncService::wait_for_requests() /home/vagrant/magma/lte/gateway/c/session_manager/SessionManagerServer.cpp:39
Jun 16 08:56:03 magma-dev sessiond[9856]:     #3 0x559f94c7cae9 in operator() /home/vagrant/magma/lte/gateway/c/session_manager/sessiond_main.cpp:259
Jun 16 08:56:03 magma-dev sessiond[9856]:     #4 0x559f94c81c51 in _M_invoke<> /usr/include/c++/6/functional:1391
Jun 16 08:56:03 magma-dev sessiond[9856]:     #5 0x559f94c81a70 in operator() /usr/include/c++/6/functional:1380
Jun 16 08:56:03 magma-dev sessiond[9856]:     #6 0x559f94c8192b in _M_run /usr/include/c++/6/thread:197
Jun 16 08:56:03 magma-dev sessiond[9856]:     #7 0x7ff446dbde6e  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xb9e6e)
Jun 16 08:56:03 magma-dev sessiond[9856]:     #8 0x7ff447d384a3 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x74a3)
Jun 16 08:56:03 magma-dev sessiond[9856]:     #9 0x7ff446532d0e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe8d0e)
Jun 16 08:56:03 magma-dev sessiond[9856]: 0x61800001fc80 is located 0 bytes inside of 816-byte region [0x61800001fc80,0x61800001ffb0)
Jun 16 08:56:03 magma-dev sessiond[9856]: allocated by thread T14 here:
Jun 16 08:56:03 magma-dev sessiond[9856]:     #0 0x7ff448b0cbf0 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc2bf0)
Jun 16 08:56:03 magma-dev sessiond[9856]:     #1 0x559f94ca91f0 in magma::LocalSessionManagerAsyncService::init_call_data() /home/vagrant/magma/lte/gateway/c/session_manager/SessionManagerServer.cpp:60
```

According to the internet, "Polymorphic base classes should declare virtual destructors. If a class has any virtual functions, it should have a virtual destructor".
Resource : https://stackoverflow.com/questions/41552966/getting-new-delete-type-mismatch-from-asan

Reviewed By: uri200

Differential Revision: D22065108

fbshipit-source-id: ff146a9a92f71408ea25dc86943938ca5afe88a1
rsarwad pushed a commit that referenced this pull request Jul 30, 2020
Summary:
Pull Request resolved: magma#1833

Under heavy load, Sessiond crashes multiple times during memory allocation.

```
(gdb) frame 4
#4 0x00007f857cd2a62f in _int_malloc (av=av@entry=0x7f857d04ab00 <main_arena>, bytes=bytes@entry=40) at malloc.c:3765
3765 malloc.c: No such file or directory.
(gdb) info locals
p = 0x55a9f11e7cc0
iters = <optimized out>
nb = 48
idx = 4
bin = <optimized out>
victim = 0x55a9f11e7cc0
...
remainder_size = 528
block = <optimized out>
...
errstr = 0x0
func = "_int_malloc"
(gdb) p victim
$1 = {prev_size = 13249033401518231684, size = 577, fd = 0x7f857d04ad88 <main_arena+648>, bk = 0x55a9f1184180, fd_nextsize = 0x0, bk_nextsize = 0x55a9f057dae1
<std::_Function_base::Base_manager<magma::AsyncPipelinedClient::deactivate_flows_for_rules(const string&, const std::vector<std::cxx11::basic_string<char> >&, const std::vectormagma::lte::PolicyRule&, magma::lte::RequestOriginType_OriginType)::<lambda(grpc::Status, magma::lte::DeactivateFlowsResult)> >::M_manager(std::Any_data &, const std::Any_data &, std::Manager_operation)>}
```

Crash address hints that the issue is related to aync calls to pipelined/directoryd/AAAclient.

Looking at the code of GRPCReceiver.h, it seems that we could hit a memory corruption if the AsyncLocalResponse object is deleted in handle_response before
moving the reader to the response_reader_ attribute.

```
78  void set_response_reader(
79      std::unique_ptr<grpc::ClientAsyncResponseReader<ResponseType>> reader) {
80    reader->Finish(&response, &status, this);
81    response_reader_ = std::move(reader);
82  }
...
121  void handle_response() {
122    this->callback(this->status, this->response);
123    delete this;
124  }
```

This can be confirmed by printing the address of this and this->response_reader_ just before the delete. the latest would be NULL just before the sigfault.
In addition, the destructor is set to virtual as the class has a virtual method to force calling the correct destructor for sub-classes.

Reviewed By: themarwhal

Differential Revision: D22066268

fbshipit-source-id: c45c32a031e26bbd86c80f1d0c2286f4015abff2
rsarwad pushed a commit that referenced this pull request Nov 30, 2020
* Add T3489 tests

Introduce a new test to validate T3489 expiry.

Credit to ulaskozat for the diff

Testing done:
Verified that an ASAN use after free occurs on timer expiry

=7031==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000093460 at pc 0x555807545462 bp 0x7f87093fd2b0 sp 0x7f87093fd2a8
WRITE of size 8 at 0x603000093460 thread T16
    #0 0x555807545461 in nas_stop_T3489 /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_data_context.c:101
    #1 0x5558075c47c5 in esm_proc_esm_information_response /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:119
    #2 0x55580759339b in esm_recv_information_response /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_recv.c:575
    #3 0x555807551fba in _esm_sap_recv /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_sap.c:679
    #4 0x555807550f33 in esm_sap_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_sap.c:283
    #5 0x5558075195a0 in lowerlayer_data_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/LowerLayer.c:276
    #6 0x55580757848f in _emm_as_data_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_as.c:688
    #7 0x555807574ec4 in emm_as_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_as.c:180
    #8 0x55580753147f in emm_sap_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_sap.c:105
    #9 0x5558074d74fc in nas_proc_ul_transfer_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/nas_proc.c:326
    #10 0x5558071bd634 in handle_message /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:97
    #11 0x7f871bb277bd in zloop_start (/usr/lib/x86_64-linux-gnu/libczmq.so.4+0x287bd)
    #12 0x5558071bf169 in mme_app_thread /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:447
    #13 0x7f871e11f4a3 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x74a3)
    #14 0x7f871a494d0e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe8d0e)

0x603000093460 is located 0 bytes inside of 32-byte region [0x603000093460,0x603000093480)
freed by thread T16 here:
    #0 0x7f871e602a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10)
    #1 0x5558070dc054 in free_wrapper /home/vagrant/magma/lte/gateway/c/oai/common/dynamic_memory_check.c:47
    #2 0x555807545496 in nas_stop_T3489 /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_data_context.c:103
    #3 0x5558075c517a in _esm_information /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:269
    #4 0x5558075c4e15 in _esm_information_t3489_handler /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:199
    #5 0x5558074e2e8a in mme_app_nas_timer_handle_signal_expiry /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/util/nas_timer.c:100
    #6 0x5558071be2d2 in handle_message /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:235
    #7 0x7f871bb277bd in zloop_start (/usr/lib/x86_64-linux-gnu/libczmq.so.4+0x287bd)

Signed-off-by: Amar Padmanabhan <[email protected]>

* Invalidate the T3849 timer id while processing esm information retransmit

The _esm_information function stops the existing T3849 timer as referenced
by the esm_ctxt datastructure timer before rescheduling a new T3849 timer
when it requests for the esm info from a UE.
Stopping the timer has a side effect of freeing up the UE related
retransmission data associated with it. This causes issues during
the T3849 timer expiry handling as the cancelled timer and the rescheduled
one reuse the same retransmission data datastructure.

Fix this by unsetting the T3849 timer in the handling of the timer expiry
as the esm_ctxt is not associated with any valid timers anymore. Further
as the timer is a oneshot timer it will be cleaned up after the processing
of the timer callback.

Signed-off-by: Amar Padmanabhan <[email protected]>
rsarwad pushed a commit that referenced this pull request Apr 19, 2021
* Add T3489 tests

Introduce a new test to validate T3489 expiry.

Credit to ulaskozat for the diff

Testing done:
Verified that an ASAN use after free occurs on timer expiry

=7031==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000093460 at pc 0x555807545462 bp 0x7f87093fd2b0 sp 0x7f87093fd2a8
WRITE of size 8 at 0x603000093460 thread T16
    #0 0x555807545461 in nas_stop_T3489 /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_data_context.c:101
    #1 0x5558075c47c5 in esm_proc_esm_information_response /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:119
    #2 0x55580759339b in esm_recv_information_response /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_recv.c:575
    #3 0x555807551fba in _esm_sap_recv /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_sap.c:679
    #4 0x555807550f33 in esm_sap_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/sap/esm_sap.c:283
    #5 0x5558075195a0 in lowerlayer_data_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/LowerLayer.c:276
    #6 0x55580757848f in _emm_as_data_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_as.c:688
    #7 0x555807574ec4 in emm_as_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_as.c:180
    #8 0x55580753147f in emm_sap_send /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/emm/sap/emm_sap.c:105
    #9 0x5558074d74fc in nas_proc_ul_transfer_ind /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/nas_proc.c:326
    #10 0x5558071bd634 in handle_message /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:97
    #11 0x7f871bb277bd in zloop_start (/usr/lib/x86_64-linux-gnu/libczmq.so.4+0x287bd)
    #12 0x5558071bf169 in mme_app_thread /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:447
    #13 0x7f871e11f4a3 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x74a3)
    #14 0x7f871a494d0e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe8d0e)

0x603000093460 is located 0 bytes inside of 32-byte region [0x603000093460,0x603000093480)
freed by thread T16 here:
    #0 0x7f871e602a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10)
    #1 0x5558070dc054 in free_wrapper /home/vagrant/magma/lte/gateway/c/oai/common/dynamic_memory_check.c:47
    #2 0x555807545496 in nas_stop_T3489 /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_data_context.c:103
    #3 0x5558075c517a in _esm_information /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:269
    #4 0x5558075c4e15 in _esm_information_t3489_handler /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/esm/esm_information.c:199
    #5 0x5558074e2e8a in mme_app_nas_timer_handle_signal_expiry /home/vagrant/magma/lte/gateway/c/oai/tasks/nas/util/nas_timer.c:100
    #6 0x5558071be2d2 in handle_message /home/vagrant/magma/lte/gateway/c/oai/tasks/mme_app/mme_app_main.c:235
    #7 0x7f871bb277bd in zloop_start (/usr/lib/x86_64-linux-gnu/libczmq.so.4+0x287bd)

Signed-off-by: Amar Padmanabhan <[email protected]>

* Invalidate the T3849 timer id while processing esm information retransmit

The _esm_information function stops the existing T3849 timer as referenced
by the esm_ctxt datastructure timer before rescheduling a new T3849 timer
when it requests for the esm info from a UE.
Stopping the timer has a side effect of freeing up the UE related
retransmission data associated with it. This causes issues during
the T3849 timer expiry handling as the cancelled timer and the rescheduled
one reuse the same retransmission data datastructure.

Fix this by unsetting the T3849 timer in the handling of the timer expiry
as the esm_ctxt is not associated with any valid timers anymore. Further
as the timer is a oneshot timer it will be cleaned up after the processing
of the timer callback.

Signed-off-by: Amar Padmanabhan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant