Omit sources from generated Javadoc #7597

Open
cpovirk opened this issue Jan 3, 2025 · 8 comments
Labels
P3 no SLO package=general type=api-docs Change/add API documentation

Comments

@cpovirk
Member

cpovirk commented Jan 3, 2025

When we build Javadoc, we set linksource. That causes the Javadoc tool to generate links from text like public interface Multimap in https://guava.dev/Multimap. Those links go to pages like https://guava.dev/releases/snapshot-jre/api/docs/src-html/com/google/common/collect/Multimap.html#line-164.

I somewhat suspect that no one cares about these pages: If they want source, they'll look in GitHub or in the source jar for a release or snapshot.

Additionally, these pages force the generated Javadoc to change even for implementation changes (example output changes from input changes).

And any additional page presumably slows down downloading history, switching branches, and deploying the guava.dev site.

I'm not sure I'll even bother to remove linksource, but I'm thinking about it, so let me know if that would cause a problem for you. If I do try it, and if it goes well, I might even consider removing some of the existing sources....

@cpovirk cpovirk added P3 no SLO package=general type=api-docs Change/add API documentation labels Jan 3, 2025
@cpovirk
Member Author

cpovirk commented Jan 6, 2025

(If we cared enough, I'm sure we could rewrite the links to point to GitHub.)

@Brijeshthummar02
Contributor

@cpovirk Maybe we can rewrite the links. What are your thoughts on this? Also, if this is something we can fix, please point me to the path of the relevant file.

@cpovirk
Member Author

cpovirk commented Mar 20, 2025

The idea would be that the Javadoc for a release (e.g., https://guava.dev/releases/33.4.5-jre/api/docs/com/google/common/annotations/Beta.html) would link not to https://guava.dev/releases/33.4.5-jre/api/docs/src-html/com/google/common/annotations/Beta.html#line-36 but to https://github.com/google/guava/blob/v33.4.5/guava/src/com/google/common/annotations/Beta.java#L36 (and similarly for the Android link, which would go to https://github.com/google/guava/blob/v33.4.5/android/guava/src/com/google/common/annotations/Beta.java#L36).

Then the snapshot Javadoc (e.g., https://guava.dev/releases/snapshot-jre/api/docs/com/google/common/annotations/Beta.html) would link not to https://guava.dev/releases/snapshot-jre/api/docs/src-html/com/google/common/annotations/Beta.html#line-36 but to https://github.com/google/guava/blob/master/guava/src/com/google/common/annotations/Beta.java#L36 (and again, a similar variant for Android).
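
The mapping described in the two paragraphs above can be sketched as a small shell function. This is only an illustration of the URL shape: the function name `github_source_url` and its argument order are hypothetical, and it assumes the `guava/src` layout shown in the example URLs (very old releases may be laid out differently).

```shell
# Hypothetical helper: build the GitHub URL that would replace a src-html link.
#   $1: Git ref (a release tag such as v33.4.5, or master for snapshots)
#   $2: flavor, either "jre" or "android"
#   $3: source path under src/, e.g. com/google/common/annotations/Beta.java
#   $4: line number
github_source_url() {
  case $2 in
    android) prefix="android/" ;;  # the Android flavor lives under android/
    *)       prefix="" ;;
  esac
  printf 'https://github.com/google/guava/blob/%s/%sguava/src/%s#L%s\n' \
    "$1" "$prefix" "$3" "$4"
}

# Release Javadoc links to the release tag:
github_source_url v33.4.5 jre com/google/common/annotations/Beta.java 36
# Snapshot Javadoc links to master (Android variant shown here):
github_source_url master android com/google/common/annotations/Beta.java 36
```

The first call prints the release URL quoted above; the second prints the Android equivalent on `master`.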

The dream would be that we can then wipe out all the src-html files that slow down everything. To really get the benefits, we might even want to rewrite the entire history of gh-pages so that the files don't exist anywhere in history.

(I just ran into trouble with this yesterday when I was trying to perform multiple Guava releases: I ended up with multiple copies of the Guava repo in my /tmp, and my machine is (for whatever reason) partitioned to put /tmp on a separate (small!) partition. That filled up /tmp, and most of the releases failed. I ended up rerunning with an environment variable set to place the temporary directories elsewhere.)

@Brijeshthummar02
Contributor

Brijeshthummar02 commented Mar 22, 2025

@cpovirk After checking and rebuilding the project, there is now no src-html.

I'd like to know more about how this part could be done:

To really get the benefits, we might even want to rewrite the entire history of gh-pages so that the files don't exist anywhere in history.

Just a general question: what does the label P3 mean on this issue?

copybara-service bot pushed a commit that referenced this issue Mar 24, 2025
See #7597

Fixes #7737

RELNOTES=n/a
PiperOrigin-RevId: 739906243
copybara-service bot pushed a commit that referenced this issue Mar 24, 2025
See #7597

Fixes #7737

RELNOTES=n/a
PiperOrigin-RevId: 739920621
cpovirk pushed a commit that referenced this issue Mar 25, 2025
See #7597

Fixes #7737

RELNOTES=n/a
PiperOrigin-RevId: 739920621
@cpovirk
Member Author

cpovirk commented Mar 28, 2025

As I mentioned last week, I filled up my /tmp partition as a result of a few Guava releases. Today, I discovered that I'd almost filled up my main partition, too. There are much bigger culprits than Guava (e.g., three separate variants of IntelliJ that I still have installed...), but Guava is very much Not Helping.

I ran some n=1 experiments to see how slow Guava currently makes various operations:

  • git clone git@github.com:google/guava.git, SSD filesystem: 27s
  • git clone git@github.com:google/guava.git, tmpfs filesystem: 24s
  • git checkout gh-pages, SSD filesystem: 52s
  • git checkout gh-pages, tmpfs filesystem: 19s

So I am going to see what I can do about that. Over the weekend, I am going to run:

git filter-branch --tree-filter ~/rewrite-gh-pages.sh gh-pages

(Yes, I am aware that filter-branch is heavily discouraged. But filter-repo is more complex (and I don't want to hack up lint-history), rebase --strategy-option=theirs --exec ... may lead to merge trouble [edit: though this workaround looks quite clever], and anything I work out with plumbing commands would amount to reinventing filter-branch. [edit: Maybe I should try git test fix from git-branchless?] I have backups, including both google/guava and cpovirk/guava on GitHub for now [edit: including an additional line of defense, a tag in my fork].)

As I write this, the contents of rewrite-gh-pages.sh are:

#!/bin/bash

set -u

if [[ ! -e releases ]]; then
  exit
fi

if [[ -e releases/snapshot-jre/api/docs/com/google/common/util/concurrent/ForwardingBlockingDeque.html && -e javadocshortcuts/forwardingblockingdeque/index.md ]]; then
  perl -pi -e 's"collect"util/concurrent"g' javadocshortcuts/forwardingblockingdeque/index.md javadocshortcuts/ForwardingBlockingDeque/index.md
fi

find releases -name src-html | xargs -r rm -r

( cd releases; for R in [0-9]*; do ( cd $R && shopt -s extglob && T=v${R%%?(-jre|-android)} && if [[ $R == *-android ]]; then P=android/; else P=""; fi && case $R in 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 ) G="" ;; * ) G=guava/ ;; esac && find -name '*.html' | xargs -r perl -pi -e "s!../../../../src-html/([^#]*)html#line[-.](\d+)!https://github.com/google/guava/blob/$T/${P}${G}src/\${1}java#L\$2!g" ); done )

( for R in releases/[0-9]*; do ( cd $R && for F in $(find api/docs -name '*.html'); do perl -lpi -e 's!^<title>!<link rel="canonical" href="https://guava.dev/releases/snapshot-jre/'$F'"><title>!' $F; done ); done )

if [[ ! -e releases/snapshot ]]; then
  exit
fi

find $(find releases/snapshot* -maxdepth 0 -type d) -name '*.html' | xargs -r perl -pi -e 's!<a href="../../../../src-html/[^>]*>(.*?)</a>!$1!g'

That will:

  • delete src-html
  • replace existing links to src-html with links to GitHub for releases, and delete them entirely from snapshots (since changes in line numbers in the links are a source of churn there even with the actual sources removed)
  • standardize guava.dev/forwardingblockingdeque to consistently point to the util.concurrent copy after that copy was introduced
  • insert a <link> element to attempt to make search engines prefer the head copy of Guava's Javadoc over random old versions (e.g., a Google search for guava ascii currently has release 19.0 as its top result)
    • I should really have a plan to insert this into future releases' Javadoc.
    • Incidentally, if you're reading this and you haven't already heard, note that you can get to Guava's Javadoc by typing "guava.dev/api" and that you can get to the Javadoc for a type like ImmutableList by typing "guava.dev/immutablelist."
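
For readers who want to try the per-page rewrite on a single file before touching any history, here is a minimal sketch of the two release-page transformations (GitHub links and the canonical `<link>`), using `sed` instead of the `perl` one-liners. The function names `release_tag` and `rewrite_release_html` are hypothetical. Unlike the script above, this sketch always assumes the `guava/` directory (so it does not handle the 1.0 through 9.0 special case), and it matches any number of leading `../` segments, which is a choice made here rather than something taken from the script.

```shell
# Hypothetical helper: derive the Git tag from a release directory name by
# dropping an optional -jre or -android suffix and prepending "v".
release_tag() {
  case $1 in
    *-jre)     printf 'v%s\n' "${1%-jre}" ;;
    *-android) printf 'v%s\n' "${1%-android}" ;;
    *)         printf 'v%s\n' "$1" ;;
  esac
}

# Hypothetical helper: filter one HTML page from stdin to stdout.
#   $1: Git tag, e.g. v33.4.5
#   $2: repo prefix: "" for the JRE flavor, "android/" for the Android flavor
#   $3: page path relative to the release directory (used for the canonical URL)
rewrite_release_html() (
  tag=$1
  prefix=$2
  page=$3
  sed -E \
    -e "s!(\.\./)+src-html/([^#\"]*)html#line[-.]([0-9]+)!https://github.com/google/guava/blob/${tag}/${prefix}guava/src/\2java#L\3!g" \
    -e "s!^<title>!<link rel=\"canonical\" href=\"https://guava.dev/releases/snapshot-jre/${page}\"><title>!"
)

# Try it on a two-line stand-in for a generated page.
printf '%s\n' \
  '<title>Beta</title>' \
  '<a href="../../../../src-html/com/google/common/annotations/Beta.html#line-36">Beta</a>' |
  rewrite_release_html "$(release_tag 33.4.5-jre)" "" \
    api/docs/com/google/common/annotations/Beta.html
```

The sample run should emit the title line with a canonical `<link>` prepended and the anchor pointing at `Beta.java#L36` on the `v33.4.5` tag. Running a filter like this over a copy of one release directory and diffing the result is a cheap way to sanity-check the patterns before committing to a multi-day history rewrite.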

@cpovirk
Member Author

cpovirk commented Mar 31, 2025

(I think things are going a little slower than Git projected, perhaps because the size of the files in the branch grows over the life of the repo as we publish more releases? But I had already seen on Friday that "over the weekend" was not going to be enough, so we're probably talking something like the difference between Monday afternoon and Wednesday morning. Anyway, I hope that the results turn out to look as expected, given how slow this is to run :))

@cpovirk
Member Author

cpovirk commented Apr 3, 2025

We're still not there, and at some point tomorrow, my machine will get powered down and moved to my new desk location. We'll see if the rewrite completes in time! (most recently processed: 2024-07-24)

When I checked in on the progress, I was reminded of one more annoying bit of our Git history: commits of the following form:

 releases/snapshot-android/api/docs/member-search-index.zip  | Bin 49810 -> 49810 bytes
 releases/snapshot-android/api/docs/package-search-index.zip | Bin 322 -> 322 bytes
 releases/snapshot-android/api/docs/type-search-index.zip    | Bin 3363 -> 3363 bytes
 releases/snapshot-jre/api/docs/member-search-index.zip      | Bin 50705 -> 50705 bytes
 releases/snapshot-jre/api/docs/package-search-index.zip     | Bin 322 -> 322 bytes
 releases/snapshot-jre/api/docs/type-search-index.zip        | Bin 3363 -> 3363 bytes

Assuming that I pulled a proper example, that's showing no change to the actual files inside the zip, only the timestamps of those files. (Apparently #6322 doesn't help with that, perhaps because it's telling Maven to touch the resulting files, not telling javac to create everything with the desired timestamps in the first place?)

Fortunately, those files went away in b175c48, which was the result of our upgrade to the version of Javadoc from JDK 23, which contains the fix for JDK-8237909. So the problem exists "only" for a number of years back to 2cc42ec / 21d5422, which is when we upgraded our Javadoc version from JDK 8 to JDK 11.

If this current rewrite goes well, or even if it doesn't, I may try another round with the goal of eliminating the timestamp changes from the Git history, too.

And hmm, is there an opportunity to normalize timestamps from before #6322, too? It looks like no: The timestamp affects only the year. Well, it affected more than that back before we added notimestamp in 578a794, but the snapshot Javadoc had barely been around at that point: It was added in 601a4af, churned (largely timestamps, if not entirely for all I know) in 086a718, and churned again (removing timestamps) in fb3fe0d. I could try to clean that up, too, if I really wanted, but I'm not sure that I do :)

@cpovirk
Member Author

cpovirk commented Apr 4, 2025

Well, it completed in time! I haven't pushed it, nor have I updated it with the commits that landed since the process started.

Rewrite 15251cf9c6cafb5a7fc8bdd2078c0751b3ab1a54 (2414/2414) (620813 seconds passed, remaining 0 predicted)

->

real    10355m3.899s
user    6299m58.216s
sys     4408m47.318s
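
To put those numbers in more familiar units, here is a small arithmetic sketch. The helper name `fmt_duration` is hypothetical; the inputs are taken from the progress line and the `time` output above, with the `real` value rounded to the nearest second.

```shell
# Hypothetical helper: format a whole number of seconds as days/hours/minutes/seconds.
fmt_duration() {
  printf '%dd %dh %dm %ds\n' \
    $(($1 / 86400)) $(($1 % 86400 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

fmt_duration 620813               # seconds reported by filter-branch → 7d 4h 26m 53s
fmt_duration $((10355 * 60 + 4))  # "real" (10355m3.899s, rounded)   → 7d 4h 35m 4s

# Whole CPU minutes (user + sys) next to wall-clock minutes.
echo "$((6299 + 4408)) CPU minutes vs 10355 wall-clock minutes"
```

So the rewrite took a little over seven days of wall-clock time. The combined user and sys time (about 10707 minutes) slightly exceeds the real time, which suggests that more than one core was busy at least some of the time.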

I notice that there's yet more that changes from commit to commit, since JDiff includes its timestamp and the directory name, as shown in ddab7ef. And I'm suspicious of some changes to the actual Javadoc's inherited-methods section.

Still, I see some now-empty commits, so that seems like a good sign. (I was expecting to be able to zap all of those with a simple git rebase, but somehow that's causing... merge conflicts? for a linear rebase? Sigh.)
