Skip to content

Download with getData() stalls for large file on slow network #1156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
4 tasks done
vzukanov opened this issue Mar 23, 2022 · 23 comments
Closed
4 tasks done

Download with getData() stalls for large file on slow network #1156

vzukanov opened this issue Mar 23, 2022 · 23 comments
Labels
type:bug Impaired feature or lacking behavior that is likely assumed

Comments

@vzukanov
Copy link

vzukanov commented Mar 23, 2022

New Issue Checklist

Issue Description

Parse Android SDK hangs when issuing getData() call for large file on slow network.

Steps to reproduce

Store a large file in Parse and try to get its data synchronously (on background thread) using getData() call when a mobile device is connected to slow network (or slowed down emulator).

Actual Outcome

getData() call hangs indefinitely.

Expected Outcome

getData() should either succeed, or throw an exception.

Environment

Parse Android SDK

  • SDK version: 3.0.0
  • Operating system version: Android 11

Server

  • Parse Server version: 5.1.1
  • Operating system: Ubuntu 20.04
  • Local or remote host (AWS, Azure, Google Cloud, Heroku, Digital Ocean, etc): Digital Ocean

Database

  • System (MongoDB or Postgres): MongoDB
  • Database version: 3.6.8

Logs

Nginx last entry in access.log when Andorid SDK hangs on slow network:

85.250.100.140 - - [23/Mar/2022:19:43:41 +0000] "GET /parse/classes/Book/Cq1hFjlytH HTTP/2.0" 200 267 "-" "Parse Android SDK API Level 30"

On faster network (LTE), I Nginx's access.log shows the following:

85.250.100.140 - - [23/Mar/2022:19:45:34 +0000] "GET /parse/classes/Book/Cq1hFjlytH HTTP/2.0" 200 267 "-" "Parse Android SDK API Level 30" "-"
85.250.100.140 - - [23/Mar/2022:19:45:37 +0000] "GET /parse/files/appid/67dfb4b64a2682765d9b34f34e5bfe00_shvilim-14.pdf HTTP/2.0" 200 28613718 "-" "Parse Android SDK API Level 30" "-"

Additional info

First, I thought that the problem is client-side timeout. So, I increased it:

        Parse.initialize(new Parse.Configuration.Builder(this)
                .applicationId("<appid>")
                .server("<url>")
                .enableLocalDataStore()
                .allowCustomObjectId()
                .clientBuilder(
                        new OkHttpClient.Builder()
                                .readTimeout(60, TimeUnit.SECONDS)
                                .writeTimeout(60, TimeUnit.SECONDS)
                )
                .build()

Unfortunately, did not work.

Then I increased timeouts for Nginx:

http {                                                                                                                  
                                                                                                                        
    proxy_read_timeout 300;                                                                                             
    proxy_connect_timeout 300;                                                                                          
    proxy_send_timeout 300;                                                                                             
    send_timeout 300;  

    ...
}

Did not work either.

This is a critical problem for us. If Parse Android SDK would just throw an exception instead of hanging it would make the situtation much better.

@parse-github-assistant
Copy link

parse-github-assistant bot commented Mar 23, 2022

Thanks for opening this issue!

  • 🚀 You can help us to fix this issue faster by opening a pull request with a failing test. See our Contribution Guide for how to make a pull request, or read our New Contributor's Guide if this is your first time contributing.

@mtrezza mtrezza changed the title Parse Android SDK hangs when issuing getData() call for large file on slow network. getData() stalls for large file on slow network Mar 23, 2022
@mtrezza mtrezza changed the title getData() stalls for large file on slow network Uploading with getData() stalls for large file on slow network Mar 23, 2022
@mtrezza
Copy link
Member

mtrezza commented Mar 23, 2022

  • It's not clear what you mean by "hanging indefinitely" - do you know whether the file upload is still progressing but maybe very slowly so that it just seems to stall indefinitely?
  • What behavior are you expecting if the upload speed is very slow?

@mtrezza mtrezza added the type:question Support or code-level question label Mar 23, 2022
@vzukanov
Copy link
Author

vzukanov commented Mar 24, 2022

Providing more information below.

First, when I wrote "hanging indefinitely", I meant that literally: the calling thread is blocked and never released. It's not about waiting a bit longer.

That's the state of my prefetch-thread as seen in the thread dump:

"prefetch-thread@17783" prio=5 tid=0x1e7 nid=NA waiting
  java.lang.Thread.State: WAITING
	 blocks prefetch-thread@17783
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:442)
	  at java.lang.Object.wait(Object.java:568)
	  at com.parse.boltsinternal.Task.waitForCompletion(Task.java:583)
	  - locked <0x45bc> (a java.lang.Object)
	  at com.parse.ParseTaskUtils.wait(ParseTaskUtils.java:28)
	  at com.parse.ParseFile.getData(ParseFile.java:374)
	  at com.myapp.initialization.PrefetchDataFlowSync.prefetchMaterial(PrefetchDataFlowSync.java:233)

As a workaround, I refactored the code from calling ParseFile.getData() to using regular OkHttp and downloading the file from a URL obtained with ParseFile.getUrl():

            Request request = new Request.Builder().url(materialFile.getUrl()).build();
            OkHttpClient client = new OkHttpClient.Builder().build();
            try (Response response = client.newCall(request).execute()) {
                fileHelper.writeDataToFile(response.body().byteStream(), localMaterialFilePath);
            } catch (IOException e) {
                MyLogger.e("failed to cache the file: " + fileName, e);
            }

This resulted in this explicit TimeoutException:

2022-03-24 10:48:41.114 6601-6816/com.myapp.android E/PrefetchDataFlowSync: [prefetch-thread] failed to cache the file: f7040f68aa136062a2e9aa9ab832e630_Shvilim_14.pdf
    java.net.SocketTimeoutException: timeout
        at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.kt:677)
        at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.kt:686)
        at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.kt:143)
        at okhttp3.internal.http2.Http2ExchangeCodec.readResponseHeaders(Http2ExchangeCodec.kt:96)
        at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:106)
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:79)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
        at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
        at com.myapp.initialization.PrefetchDataFlowSync.prefetchMaterial(PrefetchDataFlowSync.java:236)
        at com.myapp.initialization.PrefetchDataFlowSync.prefetchMaterials(PrefetchDataFlowSync.java:207)
        at com.myapp.initialization.PrefetchDataFlowSync.prefetchDataSync(PrefetchDataFlowSync.java:108)
        at com.myapp.initialization.PrefetchDataFlow.lambda$prefetchDataAndNotify$0$PrefetchDataFlow(PrefetchDataFlow.java:51)
        at com.myapp.initialization.-$$Lambda$PrefetchDataFlow$qV9NPeJSXMVjJSRt70U9cghW590.run(Unknown Source:2)
        at java.lang.Thread.run(Thread.java:923)

Then I tried to increase the timeouts of this custom OkHttpClient to 60 seconds (just like I tried with Parse's OkHttpClient), but the thread got stuck again. Just to be sure, I let the app run for another 10 minutes. Nothing. Similar picture in the thread dump:

"prefetch-thread@17606" prio=5 tid=0x27b nid=NA waiting
  java.lang.Thread.State: WAITING
	 blocks prefetch-thread@17606
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:442)
	  at java.lang.Object.wait(Object.java:568)
	  at okhttp3.internal.http2.Http2Stream.waitForIo$okhttp(Util.kt:550)
	  at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.kt:378)
	  - locked <0x45a6> (a okhttp3.internal.http2.Http2Stream)
	  at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:276)
	  at okio.RealBufferedSource$inputStream$1.read(RealBufferedSource.kt:158)
	  at java.io.InputStream.read(InputStream.java:101)
	  at java.nio.file.Files.copy(Files.java:2908)
	  at java.nio.file.Files.copy(Files.java:3027)
	  at com.myapp.common.files.FileHelper.writeDataToFile(FileHelper.java:175)

As a last resort I tried to override the version of OkHttp used by Parse (4.9.1) to the latest 4.9.3, but it didn't change the situation.

IMO, all this means that the problem is either in OkHttp, or my backend, or both. Even assuming that the backend misbehaves, OkHttp should throw TimeoutException instead of hanging, right?

Empirically, I found that the threshold between TimeoutException and hanging is around 30 seconds (for timeout setting of 25 seconds I get the exception, but for 30 the thread gets stuck).

I guess the next steps from my side should be:

  1. Bring OkHttp's maintainers' attention to this issue and ask for their input
  2. Open a ticket for Parse Server to debug my backned issue (look like something there, potentially Parse Server itself, fails on long requests)

Any additional ideas?

@vzukanov
Copy link
Author

In addition, @mtrezza, the new title you assigned to this issue can be misleading because I'm having troubles with downloading the data, not uploading.

@swankjesse
Copy link

Is there a URL I can use to reproduce? What's your nginx version?

@vzukanov
Copy link
Author

@swankjesse , thanks for jumping in.

I'm using Ngnix 1.20.2 (LMK if you want to see its config files).

I wouldn't like to share our URLs here, so I'll PM a URL to a test file to you on Twitter.

For your convenience, that's the minimal code sample that reproduces the problem on my end (regardless of Parse):

        String threadName = "test-thread";
        new Thread(() -> {
            Log.d(threadName, "caching the file");
            Request request = new Request.Builder().url("<URL>").build();
            OkHttpClient client = new OkHttpClient.Builder()
                    .readTimeout(40, TimeUnit.SECONDS)
                    .writeTimeout(40, TimeUnit.SECONDS)
                    .build();
            try (Response response = client.newCall(request).execute()) {
                String localFilePath = getContext().getFilesDir().getPath() + File.separator + "test.pdf";
                fileHelper.writeDataToFile(response.body().byteStream(), localFilePath);
                Log.d(threadName, "cached material file at: " + localFilePath);
            } catch (IOException e) {
                Log.e(threadName, "failed to cache the file", e);
            }
        }, threadName).start();

@mtrezza mtrezza changed the title Uploading with getData() stalls for large file on slow network Downloading with getData() stalls for large file on slow network Mar 24, 2022
@mtrezza mtrezza changed the title Downloading with getData() stalls for large file on slow network Download with getData() stalls for large file on slow network Mar 24, 2022
@mtrezza
Copy link
Member

mtrezza commented Mar 24, 2022

I'm having troubles with downloading the data, not uploading.

Thanks for noting and providing the additional details.

Even assuming that the backend misbehaves, OkHttp should throw TimeoutException instead of hanging, right?

Yes I would say so.

@mtrezza mtrezza added type:bug Impaired feature or lacking behavior that is likely assumed and removed type:question Support or code-level question labels Mar 24, 2022
@parse-github-assistant
Copy link

The label type:bug cannot be used in combination with type:question.

@parse-github-assistant parse-github-assistant bot removed the type:bug Impaired feature or lacking behavior that is likely assumed label Mar 24, 2022
@mtrezza mtrezza added type:question Support or code-level question type:bug Impaired feature or lacking behavior that is likely assumed and removed type:question Support or code-level question labels Mar 24, 2022
@vzukanov
Copy link
Author

Ok, something interesting just happened: after I constructed the minimal test case for @swankjesse and sent the previous reply, my test case did throw TimeoutException:

2022-03-24 15:21:51.116 5852-5880/com.myapp.android D/test-thread: caching the file
2022-03-24 15:26:16.176 5852-5880/com.myapp.android E/test-thread: failed to cache the file
    java.net.SocketTimeoutException: timeout

However, as you can see from the timestamps, this happened after more than 4 minutes, even though the timeout is configured to 40 seconds. I wonder if this can be it: OkHttp doesn't hang, but the actual timeout is much longer than the specified configuration.
I'll try to gather metrics with other configs now.

@vzukanov
Copy link
Author

There is definitely something to think about here. The actual timeout seems to be much higher than the configured timeout and this relationship isn't linear:
timeout

At this point, I'm pretty sure that this has little to do with Android SDK. @mtrezza, please consider migrating this ticket to Parse Server (if that's an option).

@mtrezza
Copy link
Member

mtrezza commented Mar 24, 2022

The URL you're downloading from is pointing to Parse Server or to a external source?

please consider migrating this ticket to Parse Server

In which case could that be a Parse Server issue? The Parse Android SDK should ultimately decide for itself when to timeout. Or let the underlying HTTP lib timeout for it.

@vzukanov
Copy link
Author

@mtrezza , it's on Parse Server. I can send it to you via some private channel if you'd like to have a look yourself.

@mtrezza
Copy link
Member

mtrezza commented Mar 24, 2022

Could it be that the timeout you are referring to means "time since last data package received" and on your slow network there keeps data trickling in and therefore the timeout never occurs but also the download completion is not reached yet after a long time? Did you monitor your network to see whether there are packets being transmitted?

Again, I cannot see how it would mKe a difference from where the file is downloaded, the client controls when to timeout, not the server.

@swankjesse
Copy link

Yep, readTimeout is the maximum delay between bytes received; callTimeout is the max duration for the whole call.

@mtrezza
Copy link
Member

mtrezza commented Mar 24, 2022

@vzukanov Your read timeout is 40s in you example. You can test that by capping the connection between client and server during an ongoing data transfer and waiting for 40s.

If you set a call timeout to for example 10s, you can verify that by slowing down the network so that it would take longer than 10s, but after 10s the timeout should already occur.

My suspicion is that what you described as "hanging" is actually an ongoing data transfer at very slow speed.

@vzukanov
Copy link
Author

@mtrezza, I haven't reached down into TCP layer yet. Maybe tomorrow (hopefully we'll have a better hypothesis by then).

Just to give a bit more context: I'm using Android emulator to throttle the network speed. This is the "highest" setting that still suffers from this problem:

image

According to what I saw, HSDPA amounts to ~14000 Kbps. So, given this speed, a ~20MB file should be downloaded in ~12 seconds. That's not what I see.

Now, piggy-backing on @swankjesse comment, we could hypothesize that there is some weird discontinuity in the stream of the data from the server, which leads to very long download times. However, it immediately rises two questions:

  1. Why we don't get the same effect on higher-speed networks?
  2. Since the longer the timeout setting, the longer it takes to see the timeout, it would imply that the rate of "packets transfer" slows down over time. My gut feeling says that's not something we should expect?

Out of curiousity I set the timeouts to 120 seconds and initiated the same flow. It's been downloading this file for about 30 minutes now and still didn't complete.

Taking into account the fact that the same file is downloaded in ~8 seconds if I switch the emulator to LTE (~180000 kbps), I'd say there is something odd going on with either my Parse Server or Nginx or some other component. I just don't see how this can be explained by just "slow speed". We also experienced this problem on real devices connected to WiFi networks.

So, if we rule out OkHttp as a contributor, what could cause something like what I see on Parse Server side?

@mtrezza
Copy link
Member

mtrezza commented Mar 24, 2022

Nothing I can think of. A client has full autonomy over its decision to timeout. If you run your minimal example with a different URL, e.g. directly to where the file is stored, do you observe the same behavior?

@swankjesse
Copy link

I couldn’t reproduce this with OkHttp on my desktop, either at full speed or throttled to 3G. The window update frames are perfect so I think that’s an unlikely culprit.

full_speed_frames.txt
throttled_frames.txt

Next best guess: the Android emulator might be hurting you here.

@mtrezza
Copy link
Member

mtrezza commented Mar 25, 2022

@vzukanov Simulator is a good tip. Maybe try on a real device for comparison? If you enable the Developer menu in the Android settings, you may also be able to simulate a slow network. Otherwise you could connect to wifi and slow the network down for the device on the router side.

@vzukanov
Copy link
Author

vzukanov commented Mar 25, 2022

I'm pretty sure this isn't related to the emulator. I started this investigation after we saw this problem on multiple different real devices located in different cities, connected through either WiFi or mobile networks.
Just to be sure, I also tried to download with curl with speed limit. It gets stuck (download speed drops to 0):
image

Today I woke up and realized that I forgot about another component in my setup: Digital Ocean. So, I'll try to move the backend to a new droplet in another region and report back.

@vzukanov
Copy link
Author

After additional testing, I now understand why Curl output in my previous comment showed "current speed" of 0 for prolonged periods of time. This is to be expected because I set very low speed limit, so Curl throttled itself down to 0 to achieve an average speed of 2k.
I moved our backend to another droplet and performed additional tests. Seems to be working as expected now. DigitalOcean reported some problems and work in FRA1 region (where our old droplet resided), so this might be related:
image

I'll perform thorough testing and comparison between the new droplet and the old one and write a detailed summary of my findings in this thread later.

@azlekov
Copy link
Contributor

azlekov commented Apr 21, 2022

So, @vzukanov to summary - the downloads stalls because of configuration outside the scope of this SDK? Do you think there's something to be adjusted here?

@vzukanov
Copy link
Author

Sorry, forgot to report back. So, that's what I think happened:
This issue began as a real issue that affected our application on real devices and then I used Android emulator to throttle the network speed for testing. After migrating the server to a new droplet in another region, I compared the results of my "test case" on the emualtor with a real device and I think that @swankjesse was right: Android emulator tricked me while I investigated this issue. The timeout issue seems to be the artifact of the emulator (probably related to the internal implementation of its "throttling" feature).
At this point, our app works and is stable. I don't know why it misbehaved in the first place and, most probably, I'll never know. I don't think our troubles had anything to do with either Parse Android SDK or OkHttp. Therefore, I'm closing this issue.
Thank you all for your assistance and sorry for consuming that much of your time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Impaired feature or lacking behavior that is likely assumed
Projects
None yet
Development

No branches or pull requests

4 participants