About the Author

Nivin Lee

Vice President of Strategy

Nivin is the Vice President of Strategy at VOLTAGE, leading new business for the agency and strategic planning for its new clients. As a performance marketer specializing in paid search and marketing analytics, he has spent his marketing career creating extremely data-driven media plans with ROI as the primary KPI. As a Google Ads, Analytics, and Tag Manager power user, he’s up to date on all the new features, regulations, trends, and oddities that come with being a digital marketer in the Google advertising ecosystem.

If you’re on LinkedIn, you’ll know that everyone is talking about LLMs. Some are focused on using them, some are focused on creating them, and most are focused on how this affects their business’ bottom line. Most notably for SEOs, LLMs are starting to “steal” organic search rankings, which has people pretty concerned.

Should you be worried about losing organic traffic to LLMs? Well, we don’t know. But there might be a way to find out—with data instead of guesses, trends, or more realistically, LinkedIn fearmongering.

In today’s article, we’re going to rehash how “dark traffic” has affected brands for years, how LLMs let you track some of their links to your website, and how you can customize Google Analytics to measure how much LLMs affect your marketing. While we’ve seen a few articles out there walking through how to set up custom reports, we haven’t found one that digs as deep into reporting on LLM activity with each of Google Analytics’ customization tools. In this article, we’re walking you through how to create custom reports, custom audiences, and custom channel groups for a well-rounded picture of LLM activity.

What is dark traffic?

If you’ve used Google Analytics in the past, you’re definitely familiar with the infuriating rows in your reports that say “direct”, “unassigned”, or “not set”.

Where is that traffic coming from? Well, by definition, we don’t know. Dark traffic is traffic that we can see but can’t trace back to where it originated. Still, here are some guesses:

  • Traffic that didn’t have any UTM parameters due to human error
  • Traffic that didn’t have any UTM parameters due to privacy technology
  • Traffic that went through a redirect (obscuring the referrer header)
  • Traffic that went through an SSL upgrade (again, obscuring the referrer header)
  • Traffic that came through a manually shared link
  • Traffic that didn’t have any UTM parameters due to, well, the norm (think links in documents, internal software, client portals, etc.)

That’s… not very helpful. Dark traffic has been the bane of digital marketers’ existence for years, making it difficult to get a complete picture of where our traffic comes from and how we can optimize it. The industry understands that this trend will only continue as platforms and browsers obfuscate attribution, and many are beginning to invest in statistical trends over exact reports. Until that approach becomes more mainstream and accessible, though, dark traffic is still a thorn in our side.

It’s unfortunate that we can now add one more guess to the list:

  • Traffic that came from LLMs (sometimes)

That “sometimes” is what we’re digging into today. By the end of this article, we won’t solve dark traffic, but we’ll be able to see something a bit more helpful when it comes to LLM-originated traffic.

How do LLM links work?

Before digging into LLM links, a quick reminder on how we usually tell where traffic comes from. There are two main ways: UTM parameters and referrer headers.

First, UTM parameters are URL query parameters that follow a commonly agreed specification to tell browsers and tracking scripts where users came from with a high level of specificity. These are the key-value pairs that turn links like

https://voltage.digital

into

https://voltage.digital?utm_source=linkedin&utm_medium=social&utm_campaign=newsletter&utm_content=How+To+Track+LLM+Traffic+in+Google+Analytics+%28as+a+Custom+Report%2C+Audience%2C+and+Channel+Group%29

While LLMs may never give us the same level of detail in UTM parameters like the above (and like what you should be using when possible), it looks like they sometimes use the UTM source parameter to convey where their traffic comes from (e.g. “?utm_source=chatgpt.com”).
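To make the mechanics concrete, here is a minimal Python sketch of how a link like the one above can be tagged and read back. The helper names `add_utm` and `read_utm` are ours, purely for illustration, and the sketch uses only Python's standard `urllib.parse` module; your own link builder may work differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def add_utm(url, source, medium, campaign, content=None):
    """Append UTM parameters to a URL (hypothetical helper)."""
    params = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    }
    if content:
        params["utm_content"] = content
    parts = urlsplit(url)
    # Keep any query string the URL already has, then add the UTM pairs.
    query = "&".join(q for q in (parts.query, urlencode(params)) if q)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


def read_utm(url):
    """Return only the utm_* parameters found in a URL (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    return {k: v[0] for k, v in query.items() if k.startswith("utm_")}


tagged = add_utm("https://voltage.digital", "linkedin", "social", "newsletter")
print(tagged)

# An LLM-style link that only carries a source:
print(read_utm("https://voltage.digital/?utm_source=chatgpt.com"))
```

The second call illustrates the point above: a link tagged by an LLM may carry nothing but `utm_source`, so any report built on it has to work from the source alone.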

Second, in the absence of UTM parameters, browsers and tracking scripts look for a “Referer” header (the HTTP specification’s historical misspelling of “referrer”) in the request a browser makes to a web server for a webpage. While the average user may never see that header, it usually indicates the last URL a user was on before getting to the current one.

There are a few ways LLMs link to websites as supporting context, credit, or credibility for their responses to a user’s prompt. Let’s take a look at the most popular LLM, ChatGPT, and dig into those different methods along with what we know about how they help or hurt attribution:

1. Citations

These are when ChatGPT cites a website directly for information, data, or ideas that it scraped from that website and used in a response. These are listed under the “Citations” header in the “Sources” sidebar when you use the “Search the web” tool in a conversation.

Citations include both a referrer header and a UTM source parameter.

2. Search Results

These are when ChatGPT may not cite a website directly for something it included in a response, but shares it with the user for additional context or reading. Previously, these were listed under the “Search Results” header in the “Sources” sidebar when you use the “Search the web” feature, but more recently they are listed under “More” in the same place.

Search Results may include both a referrer header and a UTM source parameter. They didn’t when the “Search the web” feature was first rolled out, but in our more recent testing these links appear to carry the same attribution data as citations, which is a win for marketers.

3. Inline Links

These are when ChatGPT writes a link directly into the text of a response.

Inline Links don’t include referrer headers or UTM source parameters. They’ll register as “direct” traffic in Google Analytics no matter what.

However, at the time of writing this, the rules around when LLMs use referrers and UTMs are shaky. It’s certainly getting better, but there are edge cases where that data is excluded (such as when a conversation is about sensitive or confidential topics like healthcare or finances). Moreover, as LLMs become more popular and understood, government regulation and social pressure may change how privacy controls work within LLM platforms. Lastly, it’s almost inevitable that LLMs—another way to deliver consumers to advertisers—will influence results for money. This may (or may not) improve attribution, but it will certainly make it more nuanced. 

How do you track LLMs on your website?

Now for the good part: how to track LLM traffic in Google Analytics. Based on the above, here’s the kind of data we can expect most ChatGPT citations to give Google Analytics:

  • Session source: “chatgpt.com”
  • Session medium: “referral”
  • Page referrer: “https://chatgpt.com”

While source and referrer are explicitly set, medium is implicitly inferred. There are a few things we can do with these data points, including (1) making a custom report to see LLM traffic, (2) making a custom audience to see LLM users, and (3) making a custom channel group to see LLM user activity compared to other channels, and where LLMs fit into our attribution paths.
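To show how those three data points relate, here is a minimal Python sketch of how a tracking script might resolve a session's source and medium. The function name `resolve_source`, the fallback order (UTM source first, then referrer, then direct), and the placeholder strings are our simplifying assumptions, not Google Analytics' actual attribution logic.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_source(landing_url, referrer=None):
    """Return (source, medium) under a simplified, assumed model."""
    query = parse_qs(urlsplit(landing_url).query)
    utm_source = query.get("utm_source", [None])[0]
    utm_medium = query.get("utm_medium", [None])[0]
    ref_host = urlsplit(referrer).hostname if referrer else None

    if utm_source:
        # Source is explicit; medium is inferred when the link omits it.
        medium = utm_medium or ("referral" if ref_host else "(not set)")
        return utm_source, medium
    if ref_host:
        # No UTM parameters: fall back to the referring host.
        return ref_host, "referral"
    # Neither signal is present, so the visit looks like direct traffic.
    return "(direct)", "(none)"


# A citation-style link: UTM source plus referrer header.
print(resolve_source("https://voltage.digital/?utm_source=chatgpt.com",
                     "https://chatgpt.com/"))

# An inline-style link: no UTM source and no referrer header.
print(resolve_source("https://voltage.digital/"))
```

Under these assumptions, the citation-style visit resolves to a “chatgpt.com” source with a “referral” medium, while the inline-style visit is indistinguishable from direct traffic.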

How to make a custom report

A lot of digital marketers default to using the “Exploration” tab to make custom reports. That’s completely fine for quick investigations and constantly changing configurations, but we’re interested in a more long-term report that’s easier for all team members to access, so we’re going to make a “Library” report instead.

Start by navigating to your reports library:

Then, create a new report from scratch:

You can choose what dimensions you want to include, but we recommend the following:

The same applies for metrics, which will heavily depend on what kind of website you have (such as one focused on lead generation or ecommerce), but we recommend the following:

Here’s the important part. Create a filter that only includes traffic that comes from LLMs based on a session’s source with a partial regex match:

Here’s that filter for you to copy and paste into Google Analytics:

bard|chatgpt|claude|copilot|gemini|perplexity

This should include traffic from popular LLMs like:

  • ChatGPT
  • Gemini
  • Claude
  • Perplexity
  • …and more

But if you notice other, more industry-specific LLMs sending traffic to your website, add them to the filter by inserting a pipe (“|”) and then the domain name of the LLM (or a distinctive part of it).
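Before pasting the filter into Google Analytics, it can help to test it against a handful of source values. The Python sketch below does that with `re.search`, which we are assuming behaves like a partial regex match for a simple pattern like this one. The sample sources are made up for illustration.

```python
import re

# The same pattern used in the report filter.
LLM_SOURCE_PATTERN = re.compile(r"bard|chatgpt|claude|copilot|gemini|perplexity")


def is_llm_source(source):
    """Partial match: True if any LLM name appears anywhere in the source."""
    return LLM_SOURCE_PATTERN.search(source) is not None


sample_sources = [
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "google",
    "linkedin.com",
    "bombardier.com",  # contains "bard", so it matches by accident
]

for source in sample_sources:
    print(f"{source}: {is_llm_source(source)}")
```

The last sample is a reminder that short terms in a partial match can catch unrelated sources, so it’s worth scanning the report for false positives once data starts coming in.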

Then, you can save and name this report:

And add it to a collection in your library:

You can place this report anywhere in your side navigation:

This lets your entire team see the report without having to go to the “Exploration” tab, make sure they have access to it, etc.

How to make a custom audience

Most marketers will stop at making a custom report, but Google Analytics power users know there’s more work to be done. The newest version of Google Analytics 4 improved the interface for creating custom audiences, which are groups of users that we can scope our reporting to at almost any time. The ones that are built into the platform by default include “Mobile” users, “US” users, and more, but we’re going to make one for “LLM” users to segment all of our reports.

Start by navigating to the audience settings:

Create a new audience:

Choose to create a custom audience:

Name the audience and configure the same filter we made for the custom report:

Except this time, we can’t use a partial regex match, only a full regex match, so our filter value needs to be a little different:

.*bard.*|.*chatgpt.*|.*claude.*|.*copilot.*|.*gemini.*|.*perplexity.*
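The difference between the two match types is easy to check outside of Google Analytics. In the Python sketch below, `re.fullmatch` stands in for a full regex match; this is our assumption about how the audience builder behaves, based on the description above. The `GROUPED` pattern is an equivalent shorter form in standard regex syntax that we haven’t tested in the audience builder itself.

```python
import re

TERMS = ["bard", "chatgpt", "claude", "copilot", "gemini", "perplexity"]

# The partial-match pattern from the custom report.
PARTIAL = "|".join(TERMS)

# The full-match pattern: every term is wrapped in ".*" on both sides.
FULL = "|".join(f".*{term}.*" for term in TERMS)

# An equivalent, shorter way to write the same full-match pattern.
GROUPED = ".*(" + "|".join(TERMS) + ").*"


def full_match(pattern, value):
    """True only if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None


print(full_match(PARTIAL, "chatgpt.com"))  # the unwrapped pattern misses ".com"
print(full_match(FULL, "chatgpt.com"))
print(full_match(GROUPED, "chatgpt.com"))
```

With a full match, the unwrapped pattern only matches a source that is exactly “chatgpt”, which is why the wildcards are needed.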

Then we can save the audience and have it stored in our Google Analytics property for use in comparisons later on. Note that it’ll take a day for users to be added to this audience:

How to make a custom channel group

If you’re serious about wanting to see how LLMs stack up against other channels, whether to see how much they’re cutting into your organic search efforts or to see how your LLM optimization efforts are going, we advise creating a separate channel for LLMs. In Google Analytics, there’s a default channel group that most marketers use without knowing it, but few marketers know that you can customize that grouping to suit your needs.

Until LLMs are added to the default, let’s create a custom channel group.

Start by navigating to your channel groups settings:

Create a copy of the default channel group to edit:

Name the new channel group and create a new channel:

Add the same filter as we added for the custom report:

Note that this time, we can use a partial regex match again, so the filter value can be something like this:

bard|chatgpt|claude|copilot|gemini|perplexity

And now we have a new channel group. But to actually use it, we need to change which channel group Google Analytics will show in reporting by editing the primary channel group:

And selecting our new channel group:

After a day of collecting data, we should now see session-related traffic numbers in an “LLM” row under channel dimensions.
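One thing worth checking when you add the channel is where it sits in the list. As far as we understand, Google Analytics assigns traffic to the first channel whose rules it matches, so an “LLM” channel placed below “Referral” may never receive any traffic. The Python sketch below models that first-match behavior with hypothetical, heavily simplified rules. The real default channel definitions are far more detailed.

```python
import re

LLM_PATTERN = re.compile(r"bard|chatgpt|claude|copilot|gemini|perplexity")


# Simplified stand-ins for channel rules; each takes (source, medium).
def is_llm(source, medium):
    return LLM_PATTERN.search(source) is not None


def is_referral(source, medium):
    return medium == "referral"


def is_direct(source, medium):
    return source == "(direct)"


LLM_FIRST = [("LLM", is_llm), ("Referral", is_referral), ("Direct", is_direct)]
REFERRAL_FIRST = [("Referral", is_referral), ("LLM", is_llm), ("Direct", is_direct)]


def assign_channel(source, medium, channels):
    """Return the name of the first channel whose rule matches."""
    for name, rule in channels:
        if rule(source, medium):
            return name
    return "Unassigned"


# Same session, two different channel orders.
print(assign_channel("chatgpt.com", "referral", LLM_FIRST))
print(assign_channel("chatgpt.com", "referral", REFERRAL_FIRST))
```

In this model, the same ChatGPT session lands in “LLM” only when that channel is evaluated before “Referral”, so if the “LLM” row stays empty, the channel order is the first thing to check.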

Wrapping Up

While tracking some of the murkier channels out there will never be perfect, digital marketers are incentivized to track what they can to understand how much new technology and trends affect their efforts (especially in SEO with the rise of LLMs). These reporting tactics are just a start, and are bound to get an update as LLMs change over time.

For now, use these reporting tools to answer the questions (and address the fears) of how LLMs are affecting your marketing efforts with real data.