Search code examples
cobalt

Cobalt 23 net fetcher crash


We have several times got crash in the net fetcher part (Cobalt 23.lts.3.310436). Trouble shooting with extra logs in the code led me to the url_fetcher_core.cc file. It seems to be a thread related problem. An object is deleted while it is still in use.

In this log the 2574091304 is okay, delete after use. For the 2574072352, it is deleted approximately when SetLoadTimingInfo is called and leads to a crash.

╭[NetworkModule/21:0306/170243.371985:INFO:url_request.cc(1171)] Call Run
│[NetworkModule/21:0306/170243.372074:INFO:url_fetcher_core.cc(1134)] URLFetcherCore::GetLoadTimingInfo() delegate_=2574091304, this=2573807496, &timing_info=2574194072
│[NetworkModule/21:0306/170243.372140:INFO:url_fetcher_core.cc(1138)] Call ReportLoadTimingInfo, d=2574091304, delegate_=2574091304
│[NetworkModule/21:0306/170243.372233:INFO:net_fetcher.cc(305)] Call SetLoadTimingInfo, h=2574143520, this=2574091296, &timing_info=2574194072
│[NetworkModule/21:0306/170243.372308:INFO:fetcher.h(68)] SetLoadTimingInfo() this=2574143520, load_timing_info_callback_=2574143524
│[NetworkModule/21:0306/170243.372399:INFO:fetcher.h(70)]                     load_timing_info_callback_.is_null()=0
 [MainWebModule/68:0306/170243.372732:INFO:url_fetcher_core.cc(181)] URLFetcherCore::Stop(), delegate_=2574121288, will now set it to NULL.
╰[NetworkModule/21:0306/170243.705613:INFO:fetcher.h(72)]                     Call load_timing_info_callback_.Run, &timing_info=2574194072
 [MainWebModule/68:0306/170243.706208:INFO:url_fetcher_core.cc(181)] URLFetcherCore::Stop(), delegate_=2573905472, will now set it to NULL.
╭[NetworkModule/21:0306/170243.706464:INFO:url_request.cc(1171)] Call Run
 [MainWebModule/68:0306/170243.706509:INFO:url_fetcher_core.cc(181)] URLFetcherCore::Stop(), delegate_=2574091304, will now set it to NULL.
│[NetworkModule/21:0306/170243.706545:INFO:url_fetcher_core.cc(1134)] URLFetcherCore::GetLoadTimingInfo() delegate_=2574072352, this=2573828040, &timing_info=2574229600
│[NetworkModule/21:0306/170243.706622:INFO:url_fetcher_core.cc(1138)] Call ReportLoadTimingInfo, d=2574072352, delegate_=2574072352
![MainWebModule/68:0306/170243.706648:INFO:url_fetcher_core.cc(181)] URLFetcherCore::Stop(), delegate_=2574072352, will now set it to NULL.
│[NetworkModule/21:0306/170243.706679:INFO:net_fetcher.cc(305)] Call SetLoadTimingInfo, h=2573730272, this=2574072344, &timing_info=2574229600
│signal 11

Summary:

This call:            delegate_->ReportLoadTimingInfo(timing_info);

goes to this:         void NetFetcher::ReportLoadTimingInfo(const net::LoadTimingInfo& timing_info) {
                        // About here another thread comes in and set delegate_ to null.
                        handler()->SetLoadTimingInfo(timing_info);
                      }
                      // About here the other thread deletes the URLFetcherDelegate object, that delegate_ points to.

and down into this:   virtual void SetLoadTimingInfo(const net::LoadTimingInfo& timing_info) {
                        if (!load_timing_info_callback_.is_null()) {
                          load_timing_info_callback_.Run(timing_info);
                        }
                      }

It seems like these methods are executed by the same thread: Stop, InformDelegateFetchIsComplete, InformDelegateResponseStartedInDelegateThread, InformDelegateDownloadProgressInDelegateSequence, InformDelegateUploadProgressInDelegateSequence
However this method is executed by another thread: GetLoadTimingInfo

Solution proposal
I believe it is a need to prevent simultaneous execution of Stop and GetLoadTimingInfo. Can be achieved by calling PostTask so the GetLoadTimingInfo function will be executed by the delegate thread. This also makes the check of the delegate_ pointer variable before using it, to function as expected. Since previously the variable could be changed between the check and the use of it. I also did some fine tuning of some code.

--- a/net/url_request/url_fetcher_core.cc
+++ b/net/url_request/url_fetcher_core.cc
@@ -175,8 +175,9 @@ void URLFetcherCore::Start() {
 }
 
 void URLFetcherCore::Stop() {
-  if (delegate_task_runner_)  // May be NULL in tests.
+  if (delegate_task_runner_) {  // May be NULL in tests.
     DCHECK(delegate_task_runner_->RunsTasksInCurrentSequence());
+  }
 
   delegate_ = NULL;
   fetcher_ = NULL;
@@ -779,8 +780,11 @@ void URLFetcherCore::StartURLRequest() {
   if (!extra_request_headers_.IsEmpty())
     request_->SetExtraRequestHeaders(extra_request_headers_);
 
+#if defined(STARBOARD)
   request_->SetLoadTimingInfoCallback(base::Bind(&URLFetcherCore::GetLoadTimingInfo,
       base::Unretained(this)));
+#endif
+
   request_->Start();
 }
 
@@ -1129,6 +1133,14 @@ void URLFetcherCore::AssertHasNoUploadData() const {
 #if defined(STARBOARD)
 void URLFetcherCore::GetLoadTimingInfo(
     const net::LoadTimingInfo& timing_info) {
+  delegate_task_runner_->PostTask(
+      FROM_HERE,
+      base::Bind(&URLFetcherCore::GetLoadTimingInfoInDelegateThread,
+                 this, timing_info));
+}
+
+void URLFetcherCore::GetLoadTimingInfoInDelegateThread(
+    const net::LoadTimingInfo& timing_info) {
   // Check if the URLFetcherCore has been stopped before.
   if (delegate_) {
     delegate_->ReportLoadTimingInfo(timing_info);
diff --git a/net/url_request/url_fetcher_core.h b/net/url_request/url_fetcher_core.h
index 4359ac5495..a06f82f721 100644
--- a/net/url_request/url_fetcher_core.h
+++ b/net/url_request/url_fetcher_core.h
@@ -164,6 +164,7 @@ class URLFetcherCore : public base::RefCountedThreadSafe<URLFetcherCore>,
   static void SetIgnoreCertificateRequests(bool ignored);
 #if defined (STARBOARD)
   void GetLoadTimingInfo(const net::LoadTimingInfo& timing_info);
+  void GetLoadTimingInfoInDelegateThread(const net::LoadTimingInfo& timing_info);
 #endif  // defined(STARBOARD)
  private:
   friend class base::RefCountedThreadSafe<URLFetcherCore>;

Is this a proper solution?

PS. This problem might be related to the fact that some requested files (images) are not available on the server. However I never verified this possible connection.


Solution

  • This looks like it may be an actual bug, and the proposed change could be one possible way to fix it. Would it be possible to make a pull request at https://github.com/youtube/cobalt so we could review and test the proposed fix ?

    Also, in case of crash bugs like this i recommend filing a Issue Tracker bug as suggested here: https://cobalt.dev/communication.html#filing-bugs-and-feature-requests