JProf profile of jrgm's tests, 2001-11-08

By David Baron

This is an analysis of a jprof profile of running jrgm's tests [Netscape internal] on a build from 2001-11-08 (or maybe 11-07, I'm not sure) with my patch for bug 83836 in it, but no other changes. I used the default settings (5 runs through the pages, allowing the browser to cache), except for an interval of 500ms. The profile was done with JPROF_FLAGS set to JP_REALTIME JP_PERIOD=0.002 JP_DEFER. My machine is a single-processor 1GHz Pentium III with an IDE hard disk on the Netscape network (recently upgraded), running Red Hat Linux 7.1. The stack traces on my machine occasionally seem to skip a stack frame here and there, so there are some noticeable artifacts. It's probably because I'm using gcc3.
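As a rough illustration of the profiler settings quoted above, here is a minimal Python sketch that parses the flag string and builds an environment carrying it. The helper names (parse_jprof_flags, profiling_env) are made up for this example, and reading JP_PERIOD as a value in seconds is an assumption on my part.

```python
import os

# Flag string quoted in the text above.
JPROF_FLAGS = "JP_REALTIME JP_PERIOD=0.002 JP_DEFER"


def parse_jprof_flags(flags):
    """Split a JPROF_FLAGS string into {name: value, or True for bare flags}."""
    parsed = {}
    for token in flags.split():
        name, sep, value = token.partition("=")
        parsed[name] = value if sep else True
    return parsed


def profiling_env(flags=JPROF_FLAGS, base=None):
    """Return a copy of the environment with JPROF_FLAGS set (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env["JPROF_FLAGS"] = flags
    return env


settings = parse_jprof_flags(JPROF_FLAGS)
period_s = float(settings["JP_PERIOD"])  # assumption: the period is in seconds
print(settings)
print(f"about {1 / period_s:.0f} timer hits per second of profiled time")
```

The dictionary returned by profiling_env could be passed as the env argument of subprocess.run when launching a jprof-enabled build; the path to that binary depends on the build and is not given here.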

I split up the profile using jprof's -i and -e options. Once a function had been split out with -ifunction, it was excluded with -efunction from every later split (except for two splits provided as additional information). The numbers on the files indicate the order of the splits, and the functions split on were the following:

0. poll (not included in any profiles)
1. RuleProcessorData::RuleProcessorData(nsIPresContext*, nsIContent*, nsRuleWalker*, nsCompatibility*)
2. StyleSetImpl::FileRules(int (*)(nsISupports*, void*), RuleProcessorData*)
3. nsViewManager::Refresh(nsView*, nsIRenderingContext*, nsIRegion*, unsigned)
4. ViewportFrame::Reflow(nsIPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned&)
5. nsGIFDecoder2::ProcessData(unsigned char*, unsigned)
6. DocumentViewerImpl::Destroy()
7. js_GC
8. nsContentTreeOwner::SetStatus(unsigned, unsigned short const*)
9. nsDocShell::PersistLayoutHistoryState()
10. nsFontCache::GetMetricsFor(nsFont const&, nsIAtom*, nsIFontMetrics*&)
11. nsParser::Tokenize(int)
12. nsImageFrame::LoadImage(nsAString const&, nsIPresContext*, imgIRequest*)
13. nsCSSFrameConstructor::InitAndRestoreFrame(nsIPresContext*, nsFrameConstructorState&, nsIContent*, nsIFrame*, nsIStyleContext*, nsIFrame*, nsIFrame*)
14. nsCSSFrameConstructor::ConstructFrame(nsIPresShell*, nsIPresContext*, nsFrameConstructorState&, nsIContent*, nsIFrame*, nsFrameItems&)
15. HTMLStyleSheetImpl::SetAttributeFor(nsIAtom*, nsHTMLValue const&, int, nsIHTMLContent*, nsIHTMLAttributes*&)
16. nsScriptSecurityManager::CheckPropertyAccessImpl(unsigned, nsIXPCNativeCallContext*, JSContext*, JSObject*, nsISupports*, nsIURI*, nsIClassInfo*, long, char const*, char const*, void**)
17. nsGenericElement::HasMutationListeners(nsIContent*, unsigned)
18. nsStyleContext::GetStyleData(nsStyleStructID)
19. HTMLContentSink::AddAttributes(nsIParserNode const&, nsIHTMLContent*, int)
20. HTMLContentSink::CreateContentObject(nsIParserNode const&, nsHTMLTag, nsIDOMHTMLFormElement*, nsIWebShell*, nsIHTMLContent**)
21. CNavDTD::CanContain(int, int) const
22. js_LookupProperty
23. imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned, unsigned)
24. XPTC_InvokeByIndex
25. XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
26. _end
27. nsContentList::ContentAppended(nsIDocument*, nsIContent*, int)
28. PresShell::StyleSheetAdded(nsIDocument*, nsIStyleSheet*)
29. CSSParserImpl::Parse(nsIUnicharInputStream*, nsIURI*, nsICSSStyleSheet*&)
30. nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsAFlatString const&)
31. nsDiskCacheMap::DeleteStorage(nsDiskCacheRecord*, int)
32. nsContainerFrame::ReflowChild(nsIFrame*, nsIPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, int, int, unsigned, unsigned&)
33. nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*)
34. nsParser::BuildModel()
35. DocumentViewerImpl::Close()
36. nsDocShell::CreateContentViewer(char const*, nsIRequest*, nsIStreamListener**)
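
The splitting scheme described above (each split includes its own root function and excludes the roots of all earlier splits) can be sketched in Python as follows. This only builds option lists in the -ifunction / -efunction form mentioned in the text. The helper names are hypothetical, the rest of the jprof command line is deliberately left out, and treating the final catch-all profile as "exclude every root" is my assumption.

```python
def split_options(functions):
    """Return one option list per split, in split order.

    Split k includes functions[k] (-i) and excludes every function
    that an earlier split was rooted at (-e).
    """
    plans = []
    for k, func in enumerate(functions):
        excludes = [f"-e{prev}" for prev in functions[:k]]
        plans.append([f"-i{func}"] + excludes)
    return plans


def remainder_options(functions):
    """Assumed options for a catch-all profile: exclude every split root."""
    return [f"-e{func}" for func in functions]


# A shortened list of split roots, taken from the list above.
roots = ["poll", "js_GC", "nsParser::Tokenize(int)"]
for k, opts in enumerate(split_options(roots)):
    print(k, opts)
print("rest", remainder_options(roots))
```

Since poll is not included in any profile, its own -i entry would simply go unused; it matters only as an exclusion for the later splits. Function names containing spaces or parentheses would also need quoting if these options were passed through a shell.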

The complete list of the resulting profiles is:

00.html
01-StyleRes-RPD.html
02-StyleRes-FileRules.html
03-Paint.html
04-32-All-Reflow-except-GetStyleData.html
04-32-All-Reflow.html
04-Reflow.html
05-GIFDecode.html
06-DocViewer-Destroy.html
07-GC.html
08-SetStatus.html
09-SHistory-State.html
10-FontMetrics-ALL.html
10-FontMetrics-REMAINING.html
11-Tokenize.html
12-LoadImage.html
13-OtherFrameInit.html
14-OtherFrameCtor.html
15-HTMLAttributes.html
16-ScriptSecurity.html
17-HasMutationListeners.html
18-GetStyleData-ALL.html
18-GetStyleData-PARTIAL.html
19-AddAttributes.html
20-CreateContentObject.html
21-CanContain.html
22-js_LookupProperty.html
23-OtherImageDecode.html
24-CallsViaXPC.html
25-XPCInternal.html
26-XServer.html
27-ContentList-ContentAppended.html
28-StyleSheet-Reconstruct.html
29-CSSParse.html
30-EvaluateScript.html
31-DiskCacheMap-DeleteStorage.html
32-DeepReflow.html
33-XPCWrappedJSExecution.html
34-OtherContentSink.html
35-DocViewer-Close.html
36-Other-CreateContentViewer.html
37.html

Although splitting the profiles can help "blame" various modules, the split doesn't necessarily assign blame to the module at the root of the split. However, the splitting is definitely useful to make analysis of the profiles at a larger level more manageable.

Going through this list, the first relevant one is the complete profile. You can always go back to this one if you think something has been split out above a split that you're interested in. It's very difficult to make any sense out of this profile on a large scale. However, it can be quite useful for analyzing low-level problems, such as callers that do too much string appending. The total number of timer hits in this profile was 16651. (Note that a number of the stacks overflowed past jprof's limit of stack depth, 100, which I should really raise one of these days.)
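
To relate the percentages in this write-up back to raw timer hits, a small Python sketch follows. It uses the total of 16651 hits and the JP_PERIOD value quoted earlier; treating the period as seconds, and treating every hit as exactly one period, are assumptions, so the elapsed-time figure is only a rough estimate.

```python
TOTAL_HITS = 16651  # total timer hits in the complete profile (from the text)
PERIOD_S = 0.002    # JP_PERIOD from JPROF_FLAGS, assumed to be in seconds


def hits_for(percent, total=TOTAL_HITS):
    """Approximate number of timer hits behind a quoted percentage."""
    return round(total * percent / 100)


def percent_for(hits, total=TOTAL_HITS):
    """Share of all timer hits, as a percentage rounded to one decimal."""
    return round(100 * hits / total, 1)


sampled_seconds = TOTAL_HITS * PERIOD_S
print(f"roughly {sampled_seconds:.1f} s of sampled time")
print("hits behind an 8.9% split:", hits_for(8.9))
print("hits behind a 0.3% split:", hits_for(0.3))
```

The smallest splits reported here (0.2% to 0.3%) correspond to only a few dozen hits, so their percentages are correspondingly noisy.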

Looking first at style resolution, 2.6% of the total was spent building up the RuleProcessorData for enumeration and another 6.6% of the time was spent going through matching selectors (and all the glue code in between).

8.9% of the total time was spent painting.

The "official" split (11.9%) covering reflow turned out to be inaccurate because many of the hits in reflow overflowed jprof's stack buffer. I took a later split (2.4%) to cover these. However, I had split out GetStyleData in between. This means the best splits to look at for reflow are one covering all reflow (16.0%) and one covering all reflow except GetStyleData (13.6%).
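
The relationship between the four reflow figures can be checked with a little arithmetic. The Python sketch below uses only the numbers quoted above; the interpretation in the comments is one reading of them rather than something read off the profiles, and every figure is rounded to 0.1%.

```python
official_reflow = 11.9    # split 4, rooted at ViewportFrame::Reflow
deep_reflow = 2.4         # later split 32, rooted at nsContainerFrame::ReflowChild
all_reflow = 16.0         # combined profile covering all reflow
all_reflow_no_gsd = 13.6  # the same, with GetStyleData excluded

# Share of total time spent in GetStyleData underneath reflow.
gsd_under_reflow = round(all_reflow - all_reflow_no_gsd, 1)

# Reflow time that neither exclusive split captured, presumably because
# intermediate splits had already claimed those overflowed stacks.
claimed_elsewhere = round(all_reflow - (official_reflow + deep_reflow), 1)

print("GetStyleData under reflow:", gsd_under_reflow)
print("reflow claimed by intermediate splits:", claimed_elsewhere)
```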

GIF decoding was 2.9% and other image decoding was 0.9%.

The document viewer's Destroy method was 2.9%. This was mostly destruction of the frame tree and related objects.

JavaScript garbage collection was 1.9%, and was dominated by marking. We currently force GC on page transitions.

nsContentTreeOwner::SetStatus was 0.3%.

Capturing of state for session history was 0.3%.

The "official" split (0.8%) for font metrics initialization omitted a good bit of it, so it probably makes more sense to look at all of it (2.1%).

Tokenization in the HTML parser was 3.7%.

Handling of image loads was 8.7%, because we kick off image loads from frame construction. This doesn't count the time spent in poll waiting for the images to arrive off the network, although these were mostly cached runs.

Other frame initialization (other than images) was 2.3%.

The remaining time within frame construction was 7.3%.

Script security checks took 2.1% of the time.

Checking for mutation listeners took 0.6% of the time, almost half of it doing a QueryInterface on the window object.

The "official" split (1.0%) for GetStyleData excluded a good bit of it, so it's probably more interesting to look at all the time spent in GetStyleData (4.6%). Bug 109261 only accounted for 0.3% here, although it accounts for more when the style system is used for the UI.

(Excluding things above) Setting attributes in the HTML stylesheet took 0.7% of the time. Other addition of attributes through the content sink took 2.9% of the time.

(Excluding things above) HTMLContentSink::CreateContentObject took 1.4% of the time.

CNavDTD::CanContain took 0.3% of the time.

js_LookupProperty took 2.2% of the time.

Various calls from JS to C++ and the things they called took 3.2% of the time. XPConnect internals involved in making those calls took 1.5% of the time.

Things that connected to the X server (excluding painting above and anything else above) took 2.3% of the time (well, there's a tiny bit of other stuff in this split as well).

nsContentList::ContentAppended took 0.7% of the time.

Frame reconstruction due to stylesheet loads (eek, should this happen?) took 0.3% of the time.

CSS parsing took 0.2% of the time (even though our CSS parser is pretty slow, including its building up of data structures).

Evaluation of webpage JavaScript (inline, not through timeouts) took 1.0% of the time.

nsDiskCacheMap::DeleteStorage took 0.3% of the time.

Execution of JS code called from C++ took 1.9% of the time.

Other parser and content sink activity took 7.0% of the time.

DocumentViewerImpl::Close took 1.0% of the time. Much of it was propagating SetDocument calls. (Do we still need these?)

Other things that happened during nsDocShell::CreateContentViewer took 0.7% of the time.

I could not classify the remaining 4.1% of the time.
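
As a sanity check on the breakdown, the mutually exclusive ("official") split percentages quoted in this write-up can be added up. The Python sketch below does so, keyed by split number; the short labels are my own shorthand, and the combined figures (all reflow, all font metrics, all GetStyleData) are left out on purpose because they overlap other splits.

```python
# (split number, shorthand label, percent of total timer hits as quoted above)
SPLITS = [
    (1, "RuleProcessorData", 2.6), (2, "FileRules", 6.6), (3, "paint", 8.9),
    (4, "reflow (official)", 11.9), (5, "GIF decode", 2.9),
    (6, "DocViewer Destroy", 2.9), (7, "JS GC", 1.9), (8, "SetStatus", 0.3),
    (9, "session history state", 0.3), (10, "font metrics (official)", 0.8),
    (11, "tokenize", 3.7), (12, "image loads", 8.7),
    (13, "other frame init", 2.3), (14, "other frame construction", 7.3),
    (15, "HTML stylesheet attributes", 0.7), (16, "script security", 2.1),
    (17, "HasMutationListeners", 0.6), (18, "GetStyleData (official)", 1.0),
    (19, "AddAttributes", 2.9), (20, "CreateContentObject", 1.4),
    (21, "CanContain", 0.3), (22, "js_LookupProperty", 2.2),
    (23, "other image decode", 0.9), (24, "calls via XPConnect", 3.2),
    (25, "XPConnect internals", 1.5), (26, "X server", 2.3),
    (27, "ContentAppended", 0.7), (28, "stylesheet reconstruct", 0.3),
    (29, "CSS parse", 0.2), (30, "EvaluateScript", 1.0),
    (31, "DeleteStorage", 0.3), (32, "deep reflow", 2.4),
    (33, "JS called from C++", 1.9), (34, "other parser and content sink", 7.0),
    (35, "DocViewer Close", 1.0), (36, "other CreateContentViewer", 0.7),
    (37, "unclassified", 4.1),
]

total = round(sum(percent for _, _, percent in SPLITS), 1)
print(f"{len(SPLITS)} splits account for {total}% of the timer hits")
```

The sum comes to 99.8%, which is what one would expect from exclusive splits whose percentages were each rounded to one decimal place.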