
www: add log-collect for managing CloudFlare logs #679


Closed

Conversation

rvagg (Member) commented Apr 10, 2017

Use the CloudFlare log-fetching functionality to store download logs locally, so we can have CloudFlare cache everything and stop managing logs on the server itself. Managing logs on the server is the primary cause of our lack of resilience: it gives us a single point of failure (SPOF).

  • Store logs in /home/logs/nodejs.org/
  • One log per hour of traffic, compressed with xz at maximum compression; the log file name format is -.log.xz
  • Use the CloudFlare logshare Go client to fetch logs; CloudFlare only makes the last 72 hours of logs available, so we have a small window to make sure we get what we want
  • Pipe the output through a simple Node.js script that parses the line-delimited JSON and discards entries that don't relate to the downloadable assets we use for metrics generation; this shrinks the compressed files from roughly 20 MB or more down to 3-4 MB
  • Unfortunately it's not a perfect API: it occasionally returns zero-sized or seemingly truncated results, but a retry is usually successful
  • Fetch on the hour, discard any log file that looks suspiciously small, and always re-fetch the last two, because the most recent is always incomplete and the second-most recent could be incomplete depending on the timing of the fetch (just to be sure!)
  • Every 12 hours, run a sanity checker across the recent batch of logs: check the last entry of each file, and if that entry isn't near the top of the hour, discard the log so the fetcher will fetch it again and hopefully get a better version

Yes, it's quite convoluted, but I'm trying to be careful that we get the highest-quality logs we can, since CloudFlare doesn't store them beyond 72 hours.

I have this in place now and will run it for at least a few days to check the quality of what it collects: whether we have a complete set of files, and whether those files look complete in their content. I'll then try generating metrics from them and compare against what we currently generate to make sure the data is good.

@rvagg rvagg mentioned this pull request Apr 10, 2017
jbergstroem's comment was marked as off-topic.

rvagg (Member, Author) commented Nov 8, 2017

moving to #987

@rvagg rvagg closed this Nov 8, 2017
@rvagg rvagg deleted the cloudflare-logs branch November 8, 2017 11:30