Last week, we had a problem with one of our processes hanging and burning 100% CPU. The first time it happened we chalked it up to mysteries of the universe and restarted the process (a time-honored startup tradition), but the second time, I actually got off my ass and investigated.
Through the miracle of
jstack, I could look at the stack trace of a currently running Java process. This is what I found:"main" prio=10 tid=0x0a030800 nid=0x10a6 runnable [0xb7d7d000..0xb7d82218]
java.lang.Thread.State: RUNNABLE
at java.util.SubList$1.nextIndex(AbstractList.java:713)
at java.util.SubList$1.nextIndex(AbstractList.java:713)
...snip... about 100 lines
at java.util.SubList$1.hasNext(AbstractList.java:691)
at java.util.SubList$1.next(AbstractList.java:695)
at java.util.SubList$1.next(AbstractList.java:696)
...snip... about 100 lines
at com.pressflip.pipeline.standard.deduper.ShingleDupeDetector.
dedupBatch(ShingleDupeDetector.java:139)
at com.pressflip.pipeline.standard.deduper.DeduperPipelineStep.
innerProcess(DeduperPipelineStep.java:115)
... and right down to the main() from here.
The suspect line in all this is ShingleDupeDetector.java:139, which is one of those how-the-hell-are-you-hanging-on-this lines:
for (Integer x : someCollectionOfIntegers) {
So what the shit, right?
I was using this collection as a cache of sorts, where on every run, I chopped some data off the front of it and added some data to the back, keeping the collection size constant. To accomplish this, I used the subList method on java.util.List, something like this:
someCollectionOfIntegers = someCollectionOfIntegers.subList(fromIndex,
someCollectionOfIntegers.size());
someCollectionOfIntegers.addAll(incoming);
Well it turns out that subList didn't do what I thought it did. I assumed that I just got a new List that contained the elements in the given range of the original. Oh no, subList returns a view of the original list where only elements in the given range are addressable. A look at AbstractList.java's source reveals this:
public List<E> subList(int fromIndex, int toIndex) {
return new SubList<E>(this, fromIndex, toIndex);
}
And the SubList object keeps a reference to this, as well as an offset to know where iteration starts, so as I updated the "cache", iterating over it became recursive. Oh, balls. That's why it's running slow.