yogi.jpg
Don't ever try to do anything sneaky when you're programming.  It will always bite you in the ass.  If you still want to be sneaky, read the documentation.

Last week, we had a problem with one of our processes hanging and burning 100% CPU.  The first time it happened we chalked it up to mysteries of the universe and restarted the process (a time-honored startup tradition), but the second time, I actually got off my ass and investigated.

Through the miracle of jstack, I could look at the stack trace of a currently running Java process.  This is what I found:



"main" prio=10 tid=0x0a030800 nid=0x10a6 runnable [0xb7d7d000..0xb7d82218]
   java.lang.Thread.State: RUNNABLE
     at java.util.SubList$1.nextIndex(AbstractList.java:713)
     at java.util.SubList$1.nextIndex(AbstractList.java:713)

...snip... about 100 lines

     at java.util.SubList$1.hasNext(AbstractList.java:691)
     at java.util.SubList$1.next(AbstractList.java:695)
     at java.util.SubList$1.next(AbstractList.java:696)

...snip... about 100 lines

     at com.pressflip.pipeline.standard.deduper.ShingleDupeDetector.
dedupBatch(ShingleDupeDetector.java:139) at com.pressflip.pipeline.standard.deduper.DeduperPipelineStep.
innerProcess(DeduperPipelineStep.java:115) ... and right down to the main() from here.

The suspect line in all this is ShingleDupeDetector.java:139, which is one of those how-the-hell-are-you-hanging-on-this lines:

for (Integer x : someCollectionOfIntegers) {

So what the shit, right?

I was using this collection as a cache of sorts, where on every run, I chopped some data off the front of it and added some data to the back, keeping the collection size constant.  To accomplish this, I used the subList method on java.util.List, something like this:


someCollectionOfIntegers = someCollectionOfIntegers.subList(fromIndex,
                                                    someCollectionOfIntegers.size());
someCollectionOfIntegers.addAll(incoming);

Well it turns out that subList didn't do what I thought it did.  I assumed that I just got a new List that contained the elements in the given range of the original.  Oh no, subList returns a view of the original list where only elements in the given range are addressable.  A look at AbstractList.java's source reveals this:

 
public List<E> subList(int fromIndex, int toIndex) {
        return new SubList<E>(this, fromIndex, toIndex);
}

And the SubList object keeps a reference to this, as well as an offset to know where iteration starts, so as I updated the "cache", iterating over it became recursive.  Oh, balls.  That's why it's running slow.