Hacker News | dahlia1's comments

Fantastic. I know I can view the source, but is it packaged up somewhere with a readme?


I was about to ditch the page (due to the endless list) just before I hit the search bar with the drop-down categories.

Make it obvious that the items are categorized.


I smell something rotten. Hide a microSD card in a cake? If security and customs would miss a cake going through, do you think they would manage to find the card stuck in your shoe?


You could tape a microSD card to the bottom of your foot, under your sock, and the TSA would never find it. Just opt for the pat-down; they have never checked the bottoms of my feet.


"Near the end insertion rate slowed down thanks to excessive GC activity"

Can someone explain why this happens?


The article doesn't say what -Xms was. If you set it lower than -Xmx, the VM will grow the heap only when it's really necessary, and it will run a GC before each attempt to grow it. A GC over a heap of 100M objects takes a while.

Even when you reach -Xmx the VM will still try to GC, and sometimes it can free just enough memory that it doesn't throw an OOM, inserts another 10 items, runs the GC again, and so on. As long as you allocate small objects, it can take a while until you see the OOM.
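A tiny sketch of the committed-vs-maximum distinction the Runtime API exposes (the exact numbers depend on your -Xms/-Xmx settings):

```java
public class HeapSizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the currently committed heap (it starts near -Xms);
        // maxMemory() is the ceiling set by -Xmx. The VM grows the heap into
        // the gap between the two, typically GC'ing before each expansion.
        System.out.println("committed: " + rt.totalMemory());
        System.out.println("max: " + rt.maxMemory());
        System.out.println(rt.totalMemory() <= rt.maxMemory());
    }
}
```

Run it with, say, `java -Xms64m -Xmx512m HeapSizes` to see the two numbers diverge.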


With each consecutive insert into the collection, the available memory shrinks; the GC detects this and attempts to free some of it. As long as more memory (up to the 15GB) can be allocated, this isn't much of a problem. However, at some point there is hardly any memory left: the 15GB is all but allocated, and very little memory can be freed with each run. So the GC runs ever more often, to the point that it starts to slow down the system.

Before you say "well, that's stupid", realise that the alternative would be throwing an OutOfMemoryError, in which case the JVM would terminate. A slow JVM is better than no JVM.


I disagree. A slow JVM can go on and on while the perception is that the system is up. In this scenario I think failing fast is much more interesting: in a normal system, the JVM going down would trigger the manual intervention that seems to be needed anyway.


I might like the JVM to fail fast rather than plod along in a bogged-down state, but if I did, I wouldn't want it to crash wholesale (throwing an OutOfMemoryError); instead I'd have my monitoring system start to throttle the work it's doing and get other processes to help out (scaling).

Indeed, you can do this: the JVM has simple facilities for reporting memory use and heap status, which you can monitor and use to proactively scale around the bogged-down VM.
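For instance, a minimal check via the standard `MemoryMXBean` (the 90% threshold here is an arbitrary illustrative choice, not a JVM default):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapMonitor {
    public static void main(String[] args) {
        MemoryUsage heap =
            ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double usedFraction = (double) heap.getUsed() / heap.getMax();
        System.out.printf("heap used: %.1f%%%n", usedFraction * 100);
        // A real monitor might start throttling work or scaling out past
        // some threshold; 0.9 is just an assumption for this sketch.
        System.out.println(usedFraction < 0.9 ? "OK" : "THROTTLE");
    }
}
```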


You can get such functionality in Oracle's JVM by using the command line option -XX:+UseGCOverheadLimit.

This causes an OutOfMemoryError to be thrown when at least 98% of the program's time is spent in garbage collection and at most 2% of the heap is reclaimed by those collections.


Java's GC is one of the finest pieces of engineering. It can be configured and tweaked in a million[1] ways to provide the behaviour you want to see.

[1]http://www.oracle.com/technetwork/java/javase/tech/vmoptions...


The behavior you want, I think, kind of depends.

Regardless, since Java 6 (I think), if GC time becomes excessive (where "excessive" means x% of the time is spent in GC and less than y% of memory is reclaimed in each GC cycle), an OutOfMemoryError is thrown. This can be disabled.


Failure as default is usually not a good policy for a runtime system.


Failure as a default is how Erlang manages to keep systems up with 5+ 9's reliability. The difference is that, with Erlang, the entire system and the applications built on it are designed around that.


When there is high memory pressure, the default behavior in Erlang is to fail rather than continue garbage-collection?


The Java GC uses a mark-and-sweep algorithm that has to stop the world to collect garbage. As the number of items on the heap grows, there are more live pointers (less garbage), and the GC takes a long time to find even a little garbage to clean up. So each GC run takes longer, and there are more of them as the heap fills up. Since the world is stopped during each run, everything takes longer, and hence the insertion rate slows.


Java's garbage collector has advanced well beyond pure stop-the-world mark-and-sweep over the years. Java 7 brought official support for the G1 ("garbage-first") collector, which does much of its work concurrently with application threads to minimize GC pauses and collection time on multicore systems. If you're curious, check out these resources about how it works:

* http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-co...

* http://www.oracle.com/technetwork/java/javase/tech/g1-intro-...

Note also that the JRE ships with multiple garbage collectors that you can select manually, and that some of the collectors tune themselves to match the application.


What? In Java there are multiple GCs available: ConcMarkSweep, G1, Parallel, and I believe a couple more. All you have to do is set the right CLI flag.

From the concurrent mark and sweep docs: "The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses. It performs most garbage collection activity concurrently, i.e., while the application threads are running, to keep garbage collection-induced pauses short. The key performance enhancements made to the CMS collector in JDK 6 are outlined below. See the documents referenced below for more detailed information on these changes, the CMS collector, and garbage collection in HotSpot."

It does have to stop the world sometimes but those times should be quite rare!
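You can see which collectors a given VM is actually running via the standard management API; a sketch (collector names vary by JDK and by flags such as `-XX:+UseG1GC`, or `-XX:+UseConcMarkSweepGC` on older JDKs):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichGc {
    public static void main(String[] args) {
        // Each active collector reports its name and how many times it has
        // run so far, e.g. "G1 Young Generation" on a G1-enabled VM.
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount());
        }
    }
}
```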


If the CMS collector can't keep up, meaning it doesn't think it can avoid running out of heap space given its duty cycle and the current memory state, it falls back to a full, stop-the-world GC. Based on the article, I'm guessing this is what is kicking in.


They never mentioned using CMS, and CMS isn't the default GC, so I assume they weren't using it.

CMS actually isn't the highest throughput GC, it's there for low-pause systems where there are extra cores available to perform GC in the background.


My point was that none of the GCs completely avoids full stop-everything, sweep-and-compact GC cycles. They will all do this if they think the heap is about to be exhausted (specifically, when a concurrent mode failure occurs). There really isn't anything else a GC can do in that situation other than throw an OOM error. Thus any time you're pushing the heap to its limits, regardless of which GC you use, you will see complete GC sweeps of the old generation.

If you want stricter failure requirements, use GCTimeLimit and GCHeapFreeLimit. By default the JVM throws an OOM error if it spends 98% of execution time in the GC while freeing only 2% of the memory; the GCTimeLimit and GCHeapFreeLimit switches allow these values to be changed. Also, only the 'stop the world' portions of a collection count against the execution-time limit; concurrent phases do not. So if you managed to keep the collector in concurrent mode (by cranking up the duty cycle, for example), only the two mini-pauses would count, even if the concurrent portion were pegging a CPU core at 100%.
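On HotSpot you can read the current values of these flags at runtime through the diagnostic MXBean (a sketch; the printed defaults should be 98 and 2, matching the thresholds above):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class GcLimits {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Changed at launch with e.g. -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5
        System.out.println(diag.getVMOption("GCTimeLimit").getValue());
        System.out.println(diag.getVMOption("GCHeapFreeLimit").getValue());
    }
}
```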


Exactly. If there are too many live pointers, GC becomes very slow.


It is kind of a dumb test to do 'new HashMap' with the default initialCapacity and then cram stuff into it in a tight loop.

It is basically testing constant rehashes, which is obviously going to generate a ton of garbage.

I suspect the default collection implementations would perform significantly better if they were used correctly.
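For illustration, presizing avoids the rehash churn entirely; a sketch (the capacity arithmetic mirrors HashMap's default 0.75 load factor):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expected = 1_000_000;
        // Capacity chosen so that `expected` entries never trigger a rehash:
        // HashMap resizes once size exceeds capacity * loadFactor (0.75).
        Map<Integer, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put(i, i);
        }
        System.out.println(map.size());
    }
}
```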


They still suck even if you set a large initialCapacity.


I expect that as memory usage increases, the resulting increase in GC pressure causes the GC to run more often. Most GCs that I know of are designed to have some memory free for the young generation (for fast, cheap allocations) and some memory free in the old generation (so objects can be moved from the young generation to the old generation). Maybe the JVM detects that it's approaching the limits of physical memory and starts GCing more often to avoid running out of space?

EDIT: From rereading the original post it seems more likely that it was approaching the heap size limit set when running the JVM and not a physical memory limit. I would expect the same principle applies, though.

