AWS Serverless: CloudFront Traffic Analysis

By Norbert, 5 May, 2025

AWS Serverless provides extensive CloudFront logging.  However, the logs differ significantly from standard Apache logs, and they are not easy to work with, either for diagnosing issues or for analysing traffic.  CloudFront log files are typically small and numerous: even a low-activity site may generate ten log files each hour.  The logs are stored in S3 buckets as gzip files.  Each log record contains 33 tab-delimited fields.
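
To see which name belongs to which field number, the "#Fields:" header near the top of each log file can be turned into a numbered list.  The following is a minimal sketch, not part of my original script: list_fields is a hypothetical helper name, and it assumes the header names are separated by single spaces, as in the AWS documentation example.

```shell
# Print a numbered list of the field names found in the "#Fields:" header
# line of a CloudFront log supplied on stdin.
# Assumption: the names in the header are separated by single spaces.
list_fields() {
    grep -m1 '^#Fields:' | cut -d' ' -f2- | tr ' ' '\n' | nl
}

# Demonstration on a shortened, made-up header (a real header names all 33 fields).
printf '%s\n' '#Version: 1.0' '#Fields: date time x-edge-location sc-bytes c-ip' | list_fields
```

Against a real log, the same helper would be fed from gzip, for example gzip --decompress --stdout some-log.gz | list_fields, where some-log.gz stands in for one of the downloaded files.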

The log files can be downloaded via the S3 dashboard, or the AWS CLI can be used to sync the S3 files to a local folder.  The open source package goaccess.io (available on most Linux distributions) can process the files, generating either an HTML file displayable in a browser (the "goaccess.io reports" image accompanying this post) or a .csv file containing the data displayed in the HTML file.

The code to run goaccess.io is straightforward.  The first line selects the .gz files to process, the second decompresses each file to stdout, the third drops records containing any of the text patterns listed in exclude.txt, the fourth keeps only traffic hitting the specific folders I am interested in, the fifth (the awk call) restores the visitor's IP address, and the sixth calls goaccess.io to generate the index.html file based on the last 100 days of data.  To generate the .csv file instead, change the last line to --output stats.csv.

find . -name "*2025*.gz" | \
   xargs gzip --decompress --stdout | \
   grep --invert-match --file=exclude.txt | \
   grep -E '/editions/zq|/letters/|/pdfs/' | \
   awk -F'\t' 'BEGIN{OFS=FS} { if ($20 != "-") $5=$20; print }' | \
   /usr/bin/goaccess \
       --log-format "%d\\t%t\\t%^\\t%b\\t%h\\t%m\\t%^\\t%r\\t%s\\t%R\\t%u\\t%^" \
       --date-format CLOUDFRONT \
       --time-format CLOUDFRONT \
       --keep-last=100 \
       --output index.html
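
The two grep stages can be tried on a few made-up lines before they are run against real logs.  This is only a sketch: the patterns placed in exclude.txt and the sample records are invented for illustration, since the actual contents of my exclude.txt are not shown here.

```shell
# Work in a scratch directory so nothing here touches real logs.
workdir=$(mktemp -d)

# Hypothetical exclude.txt: one pattern per line. A record matching any of
# these patterns is dropped by grep --invert-match --file.
printf '%s\n' 'bot' 'healthcheck' > "$workdir/exclude.txt"

# Three made-up records (real CloudFront records have 33 tab-delimited fields).
printf '%s\n' \
    'GET /letters/2025-01.html Mozilla' \
    'GET /letters/2025-02.html examplebot' \
    'GET /images/logo.png Mozilla' > "$workdir/sample.log"

# Same two filters as in the pipeline: drop excluded text, keep chosen folders.
filtered=$(grep --invert-match --file="$workdir/exclude.txt" "$workdir/sample.log" |
    grep -E '/editions/zq|/letters/|/pdfs/')

echo "$filtered"
rm -r "$workdir"
```

Only the first record should survive: the second matches the "bot" pattern and the third is outside the selected folders.  One thing to watch for is a blank line in exclude.txt, because an empty pattern matches every record and the inverted grep would then pass nothing through.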

Although CloudFront can provide security and content caching, I prefer to use Cloudflare for these.  As a result, field 5 ("c-ip") of each CloudFront log record contains the immediate source of the traffic, which is a Cloudflare proxy server rather than the visitor.  Cloudflare passes along the visitor's IP address, which CloudFront stores in field 20 ("x-forwarded-for").  The awk call in the fifth line of code selectively copies field 20 into field 5.  The -F'\t' option sets FS (the input field separator) to the horizontal tab character, and the BEGIN{OFS=FS} block sets OFS (the output field separator) to match FS.  The block { if ($20 != "-") $5=$20; print } tests whether field 20 is something other than "-" and, if so, copies field 20 into field 5; either way, the record is then written to the output stream.
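
The effect of the awk stage can be checked on made-up records.  In this sketch, make_record and rewrite_client_ip are hypothetical helper names, the IP addresses are documentation-range examples, and the records carry only 20 placeholder fields rather than the full 33.

```shell
# Build a made-up, tab-delimited record with 20 fields. Field 5 plays the
# role of c-ip and field 20 the role of x-forwarded-for; every other field
# is a placeholder (f1, f2, ...).
make_record() {   # $1 = c-ip value, $2 = x-forwarded-for value
    awk -v cip="$1" -v xff="$2" 'BEGIN {
        OFS = "\t"
        for (i = 1; i <= 20; i++) $i = "f" i
        $5 = cip; $20 = xff
        print
    }'
}

# The awk stage from the pipeline, unchanged.
rewrite_client_ip() {
    awk -F'\t' 'BEGIN{OFS=FS} { if ($20 != "-") $5=$20; print }'
}

# A request that arrived through a proxy: field 20 holds the visitor's address.
proxied=$(make_record 198.51.100.7 203.0.113.42 | rewrite_client_ip | cut -f5)

# A request with no forwarded address: field 20 is "-" and field 5 is kept.
direct=$(make_record 198.51.100.7 - | rewrite_client_ip | cut -f5)

echo "proxied request, field 5: $proxied"
echo "direct request,  field 5: $direct"
```

The proxied record should end up with the forwarded address in field 5, while the direct record keeps its original value.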

I initially tested awk without explicitly defining the FS and OFS variables, since the CloudFront documentation was vague about the field delimiter used in the logs.  awk parsed the log records successfully, but for any record it modified, the output used blanks to delimit the fields, causing goaccess.io to fail.  I had incorrectly assumed that awk would reuse the input delimiter as the output field delimiter.  Using the technique described at https://www.kevssite.com/using-vi-as-a-hex-editor/, I determined that the field delimiter was x"09" (the horizontal tab).  With FS and OFS both set to the tab, the goaccess.io report now attributes traffic to the correct originating IP addresses.
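
The delimiter behaviour described above is easy to reproduce on a three-field test record.  This is a sketch using od -c to display the bytes, which is a substitute for the vi hex-editor approach I actually used; the record contents are made up.

```shell
# A made-up record with three tab-delimited fields.
record=$(printf 'a\tb\tc')

# Default FS and OFS: awk splits on tabs and blanks, but once a field is
# assigned it rebuilds the record using the default OFS, a single space.
default_ofs=$(printf '%s\n' "$record" | awk '{ $2 = "X"; print }')

# FS and OFS both set to the tab: the rebuilt record keeps its tabs.
tab_ofs=$(printf '%s\n' "$record" | awk -F'\t' 'BEGIN{OFS=FS} { $2 = "X"; print }')

# od -c prints each byte, so a tab shows up as \t and a blank as a space.
printf '%s\n' "$default_ofs" | od -c
printf '%s\n' "$tab_ofs" | od -c
```

Records that awk does not modify are printed exactly as read, which is why only the modified records lost their tabs in my first attempt.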

 

Blog tags
AWS
serverless
CloudFront
Cloudflare
logging
analysis