How to handle java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds errors?

Problem description

We're seeing a number of TimeoutExceptions in GcWatcher.finalize, BinderProxy.finalize, and PlainSocketImpl.finalize. More than 90% of them happen on Android 4.3. The reports come to us through Crittercism from users out in the field.

The error is: "com.android.internal.BinderInternal$GcWatcher.finalize() timed out after 10 seconds"

java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds
at android.os.BinderProxy.destroy(Native Method)
at android.os.BinderProxy.finalize(Binder.java:459)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:187)
at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:170)
at java.lang.Thread.run(Thread.java:841)

So far we haven't had any luck reproducing the problem in-house or figuring out what might have caused it.

Any ideas what can cause this? Any idea how to debug it and find out which part of the app is responsible? Anything that sheds light on the issue helps.

More stack traces:

1   android.os.BinderProxy.destroy  
2   android.os.BinderProxy.finalize Binder.java, line 482
3   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
4   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
5   java.lang.Thread.run    Thread.java, line 841  

2

1   java.lang.Object.wait   
2   java.lang.Object.wait   Object.java, line 401
3   java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 102
4   java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 73
5   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
6   java.lang.Thread.run

3

1   java.util.HashMap.newKeyIterator    HashMap.java, line 907
2   java.util.HashMap$KeySet.iterator   HashMap.java, line 913
3   java.util.HashSet.iterator  HashSet.java, line 161
4   java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers    ThreadPoolExecutor.java, line 755
5   java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers    ThreadPoolExecutor.java, line 778
6   java.util.concurrent.ThreadPoolExecutor.shutdown    ThreadPoolExecutor.java, line 1357
7   java.util.concurrent.ThreadPoolExecutor.finalize    ThreadPoolExecutor.java, line 1443
8   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
9   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
10  java.lang.Thread.run

4

1   com.android.internal.os.BinderInternal$GcWatcher.finalize   BinderInternal.java, line 47
2   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
3   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
4   java.lang.Thread.run

Recommended answer

Full disclosure - I'm the author of the previously mentioned talk at TLV DroidCon.

I had a chance to examine this issue across many Android applications and to discuss it with other developers who encountered it. We all reached the same conclusion: this issue cannot be avoided, only minimized.

I took a closer look at the default implementation of the Android garbage collector code to better understand why this exception is thrown and what the possible causes could be. I even found a possible root cause during experimentation.

The root of the problem is the point at which a device "goes to sleep" for a while. This means the OS has decided to lower battery consumption by stopping most userland processes for a while, turning the screen off, reducing CPU cycles, and so on. This is done at the Linux system level, where processes are paused mid-run. It can happen at any time during normal application execution, but the process will stop at a native system call, because the context switch is done at the kernel level. This is where the Dalvik GC joins the story.

The Dalvik GC code (as implemented in the Dalvik project on the AOSP site) is not a complicated piece of code. The basic way it works is covered in my DroidCon slides. What I did not cover is the basic GC loop, at the point where the collector has a list of objects to finalize (and destroy). The loop logic at its base can be simplified like this:

  1. take starting_timestamp,
  2. remove an object from the list of objects to release,
  3. release the object - finalize() and call native destroy() if required,
  4. take end_timestamp,
  5. calculate (end_timestamp - starting_timestamp) and compare against a hard-coded timeout value of 10 seconds,
  6. if the timeout has been reached - throw the java.util.concurrent.TimeoutException and kill the process.
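
The six steps above can be sketched in plain Java. This is a minimal illustration of the simplified loop as described, not the AOSP implementation. The class name `FinalizeLoopSketch`, the `drain` method, and the injected fake clock are all hypothetical and exist only so the timing behavior can be observed without a real device.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

// Hypothetical sketch of the simplified loop; names are illustrative, not AOSP code.
final class FinalizeLoopSketch {
    // The hard-coded 10 second limit mentioned in the description.
    static final long TIMEOUT_MILLIS = 10_000;

    // Runs every pending finalizer, timing each one with the supplied clock.
    static void drain(Queue<Runnable> pending, LongSupplier clock) throws TimeoutException {
        while (!pending.isEmpty()) {
            long start = clock.getAsLong();          // 1. take starting_timestamp
            Runnable finalizer = pending.remove();   // 2. remove an object from the list
            finalizer.run();                         // 3. finalize() / native destroy()
            long end = clock.getAsLong();            // 4. take end_timestamp
            if (end - start > TIMEOUT_MILLIS) {      // 5. compare against the timeout
                // 6. on a real device the process would be killed at this point
                throw new TimeoutException("finalize() timed out after 10 seconds");
            }
        }
    }

    public static void main(String[] args) {
        long[] now = {0};                            // fake clock, in milliseconds
        Queue<Runnable> pending = new ArrayDeque<>();
        pending.add(() -> now[0] += 5);              // a quick finalizer
        pending.add(() -> now[0] += 5 + 30_000);     // quick work, plus 30 s of device sleep
        try {
            drain(pending, () -> now[0]);
            System.out.println("all finalizers completed in time");
        } catch (TimeoutException e) {
            System.out.println("killed: " + e.getMessage());
        }
    }
}
```

In this sketch the second finalizer does only 5 ms of work, yet the loop still reports a timeout because the fake clock also advanced by the 30 seconds of simulated sleep.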

Now consider the following scenario:

The application runs along doing its thing.

It is not a user-facing application; it runs in the background.

During this background operation, objects are created and used, and they need to be collected to release memory.

The application does not bother with a WakeLock, as this would affect the battery adversely and seems unnecessary.

This means the application will invoke the GC from time to time.

Normally a GC run completes without a hitch.

Sometimes (very rarely) the system will decide to sleep in the middle of a GC run.

You will see this happen if you run your application long enough and monitor the Dalvik memory logs closely.

Now consider the timestamp logic of the basic GC loop. It is possible for the device to start the run, take a start_stamp, and go to sleep at the destroy() native call on a system object.

When it wakes up and resumes the run, destroy() will finish, and the next end_stamp will reflect the time the destroy() call took plus the sleep time.

If the sleep time was long (more than 10 seconds), the java.util.concurrent.TimeoutException will be thrown.
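
The arithmetic behind this can be made concrete with a small model. The sketch below is an assumption-laden illustration, not Android code: `SleepClockSketch` is a hypothetical class that tracks two counters, one that keeps advancing while the device sleeps and one that only advances while it is awake. Android's `SystemClock.elapsedRealtime()` and `SystemClock.uptimeMillis()` have that same split, but which clock the platform's timeout check actually reads is not stated in this answer, so the model only shows why an interval that includes sleep can exceed 10 seconds when the real work took milliseconds.

```java
// Hypothetical model of a device clock; not Android code.
final class SleepClockSketch {
    private long uptimeMillis;   // advances only while the device is awake
    private long elapsedMillis;  // advances while awake and while asleep

    // The process runs for the given time: both counters advance.
    void run(long millis) {
        uptimeMillis += millis;
        elapsedMillis += millis;
    }

    // The device sleeps: only the sleep-inclusive counter advances.
    void sleep(long millis) {
        elapsedMillis += millis;
    }

    long uptime() { return uptimeMillis; }
    long elapsed() { return elapsedMillis; }

    public static void main(String[] args) {
        SleepClockSketch clock = new SleepClockSketch();
        long startElapsed = clock.elapsed();   // start_stamp, sleep-inclusive
        long startUptime = clock.uptime();     // start_stamp, awake time only

        clock.run(20);        // destroy() begins
        clock.sleep(45_000);  // device goes to sleep mid-call
        clock.run(30);        // destroy() finishes after wake-up

        long withSleep = clock.elapsed() - startElapsed;
        long awakeOnly = clock.uptime() - startUptime;
        System.out.println("interval including sleep: " + withSleep + " ms");
        System.out.println("interval awake only:      " + awakeOnly + " ms");
        System.out.println("exceeds 10 s limit: " + (withSleep > 10_000));
    }
}
```

Here destroy() does 50 ms of work in total, but the sleep-inclusive interval is 45,050 ms, well past the 10 second limit.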

I have seen this in the graphs generated by the analysis Python script, for Android system applications as well as for my own monitored apps.

Collect enough logs and you will eventually see it.

The issue cannot be avoided - you will encounter it if your app runs in the background.

You can mitigate it by taking a WakeLock to prevent the device from sleeping, but that is a different story altogether, and a new headache, and maybe another talk at another con.

You can minimize the problem by reducing GC calls, which makes the scenario less likely (tips are in the slides).
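
The slides are not reproduced here, so the following is only a generic illustration of lowering allocation pressure and is not necessarily one of the tips they list. The class `BufferReuseSketch` and its methods are hypothetical. The sketch assumes that fewer short-lived allocations lead to fewer GC runs, and contrasts a loop that allocates a fresh buffer per iteration with one that reuses a single buffer.

```java
// Hypothetical example: same result, fewer short-lived objects for the GC to collect.
final class BufferReuseSketch {
    // Allocates a new buffer for every chunk, producing garbage on each iteration.
    static long checksumAllocating(int chunks, int chunkSize) {
        long sum = 0;
        for (int i = 0; i < chunks; i++) {
            byte[] buffer = new byte[chunkSize];
            fill(buffer, i);
            sum += sumOf(buffer);
        }
        return sum;
    }

    // Allocates one buffer up front and overwrites it for every chunk.
    static long checksumReusing(int chunks, int chunkSize) {
        long sum = 0;
        byte[] buffer = new byte[chunkSize];
        for (int i = 0; i < chunks; i++) {
            fill(buffer, i);
            sum += sumOf(buffer);
        }
        return sum;
    }

    // Stand-in for real work that produces chunk data.
    private static void fill(byte[] buffer, int seed) {
        for (int j = 0; j < buffer.length; j++) {
            buffer[j] = (byte) (seed + j);
        }
    }

    private static long sumOf(byte[] buffer) {
        long s = 0;
        for (byte b : buffer) {
            s += b;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(checksumAllocating(1_000, 4_096) == checksumReusing(1_000, 4_096));
    }
}
```

Both methods compute the same value; the reusing variant simply gives the collector less to do. Whether this measurably changes crash frequency in a given app is something to verify with your own crash reports.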

I have not yet had the chance to go over the Dalvik 2 (a.k.a. ART) GC code, which boasts a new Generational Compacting feature, or to perform any experiments on Android Lollipop.

Added on July 5, 2015:

After reviewing the crash report aggregation for this crash type, it looks like crashes from Android OS version 5.0+ (Lollipop with ART) account for only 0.5% of this crash type. This suggests that the ART GC changes have reduced the frequency of these crashes.

Added on June 1, 2016:

It looks like the Android project has added a lot of information on how the GC works in Dalvik 2.0 (a.k.a. ART).

You can read about it here - Debugging ART Garbage Collection.

It also discusses some tools for getting information on the GC behavior of your app.

Sending a SIGQUIT to your app process will essentially cause an ANR and dump the application state to a log file for analysis.
