AI Citation Ranking Factors with Cyrus Shepard




  • URL accessibility is the top citation factor
  • Ranking for query “fan-outs” (the secondary searches AI runs alongside the main query) matters more than any single keyword ranking
  • Matching your content’s structure to the answer format correlates strongly with getting cited
  • llms.txt has zero measurable correlation with AI citations
  • Domain authority is only weakly correlated with AI citations
  • Unlinked brand mentions and consensus across sites can matter more than backlinks alone
  • Shorter, tightly focused content is outperforming old-school “skyscraper” long-form content
  • The fundamentals still apply: solid SEO, ranking well, and real-world brand visibility

Understanding how to get your brand cited by AI is one of the most confusing threads in SEO and digital PR right now. The research is scattered, inconsistent, or based on a single study rather than the full picture. I’ve done my part by compiling our own research into an AI statistics post.

But Cyrus Shepard, SEO consultant, researcher, and founder at Zyppy (and the author of the Zyppy Signal Substack), set out to do more.

For his latest study, Cyrus combed through roughly 75 published papers and studies on AI citations, narrowed them down to the ones that actually held up, and built out a full set of ranking factors for what drives AI citations today.

Cyrus joined the BuzzStream Podcast for his second appearance (check out his first) to walk through the study, and we get into what’s overhyped, what’s underrated, and where digital PR and link building still fit into an AI-driven search landscape.


Below is a slightly edited transcript:

What is more important, a citation or a mention?

Cyrus: That’s a million-dollar question, so I’ll give a non-intuitive answer.

For the purposes of this conversation, I’m going to assume Google, Google AI Mode, or Google AI Overviews — although we’ve had some good news on the ChatGPT front lately, where traffic through citations and mentions seems to be increasing.

There are different positions you can appear in within a Google answer.

You can appear as the recommendation within the AI answer, you can appear as a citation — usually over to the right, at least for the time being — or you can appear beneath that in the traditional rankings.

We’ve seen click-through rates decrease dramatically for those ranking positions. Still, the top of the organic rankings is going to drive the most traffic and visibility, weirdly, even with AI answers. But as AI answers become more prominent — and who knows where Google is going with all this — generally brands want to be in the answer, in the mention, whatever term you use for it. Citations are still great; we have data showing they drive some traffic and visibility.

And it gets a little messy because these AI engines are constantly changing their interface — sometimes there’s a lot of overlap between “citation” and “recommended source,” and they match each other. But I think currently the best place to be is within the mention or recommendation portion of the AI answer.

Vince: Yeah, and I think one of the things that’s been confusing a lot of people is the through-line between their SEO/GEO output and the outcomes.

With SEO, it was easier to say, “we got X amount of links, therefore we’re ranking better” — a pretty clean one-to-one relationship.

Whereas with your study especially, that through-line gets messier. It’s not just “go build a bunch of citations” or “build a bunch of links and you’ll show up in citations,” at least based on some of your findings. Without putting the cart before the horse — how did you come up with this list?

Give people some background on how you did this study, and then let’s get into the findings.

How did you come up with your list of AI Citation Ranking Factors?

Cyrus: This list came about from me just being an interested SEO. I noticed a lot of the people I follow — Dan Petrovic, Kevin Indig, a bunch of academic researchers — were putting out citation studies, citation visibility studies, and finding some really interesting things that sometimes overlapped and sometimes didn’t.

I thought someone needed to go through every single study and put all the evidence together. All the credit goes to the people doing the original research — I did no original research for this.

I just piggybacked.

I went through and combed dozens and dozens of studies. The discipline is so new that I literally found almost every single study and paper published — something like 75. I whittled it down to close to 50 that actually held up, that were really talking about citations and not mentions.

Then I hand-classified everything — “they found this, they found this, they found this” — into this huge spreadsheet that’s basically a heat map, and used that to classify and name the different factors that seem to contribute to AI citations.

That’s how it was born.

It probably doesn’t capture everything, but it captures the current state of research.

Vince: I love the methodology behind something like this, because there are a lot of nuances across these studies, and I think you did a good job pulling them together.

Curious if there’s one for mentions coming out in the pipeline. But let’s get into some findings. The top one you had was URL accessibility.

What is URL accessibility (as it relates to AI visibility)?

Cyrus: Yeah, to SEOs it seems obvious — you want your URL to be accessible, it’s the number one thing in every crawl report, like “we can’t access it.” I felt embarrassed even mentioning it, but it’s so important with AI, because so many people are blocking AI crawlers without even knowing it.

In their Cloudflare configuration, someone presses a button and, boom, you’ve blocked half the AI crawlers on the internet.

People are adding robots.txt rules, IT departments are blocking AI crawlers because their servers are getting overwhelmed, and they don’t understand what that’s doing to their AI visibility.

So it’s not a huge factor in the end, but it’s pretty fundamental to what we’re doing.
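
One way to spot the accidental blocking described here is to test a robots.txt file against a handful of AI crawler names. The following is a minimal sketch using Python's standard `urllib.robotparser`. The crawler list and the sample robots.txt are illustrative assumptions, not taken from the study, and the check covers robots.txt rules only. It cannot detect blocks applied at the CDN or server level, such as a Cloudflare setting.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user-agent tokens (an assumption;
# confirm the current tokens in each vendor's own documentation).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_ai_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: one AI crawler is blocked, everyone else is allowed.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    # prints ['GPTBot']
    print(blocked_ai_crawlers(SAMPLE_ROBOTS, "https://example.com/blog/post"))
```

To check a live site, the same parser can load a file with `set_url()` and `read()` instead of `parse()`.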

Vince: Yeah, this is a big conversation in the publisher space.

I’ve done some research on our side looking at publishers blocking bots via robots.txt, and it doesn’t really have much impact. People are realizing you have to block at the server level, using something like Cloudflare.

That’s sparked its own conversation.

But this idea of visibility — you have to show up to be part of the game — makes a lot of sense. I’m going to list off the next few: search rank, fan-out rank, preview control, query answer match, intent format match. Excluding preview control, those all seem very SEO-focused to me. Would you agree?

Are the next top five ranking factors SEO-focused?

Cyrus: Yeah, absolutely. It’s worth spending a second on search rank and fan-out rank, because there’s some surprising data here.

The number one correlation with appearing in AI answers is how well your URL ranks for the main query and all the fan-out queries — the secondary searches the AI engines run.

But here’s what was interesting: you might only be cited once as a source within an AI answer, but the number of times you rank across the fan-out queries can determine that one position.

If your page ranks for several fan-out queries, it’s more likely you’ll be cited at least once — it’s not a one-to-one relationship. The whole is greater than the sum of the parts. So optimizing your page, or pages, to rank for as many fan-out queries as possible increases your likelihood of citation. That was kind of surprising to me — they seem to add up how many times you appear.
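
The "add up how many times you appear" idea can be tracked with a simple coverage count. This is a rough sketch under stated assumptions: the queries, URLs, and the `fanout_coverage` helper are all hypothetical, and the ranking data would have to come from whatever rank tracker you already use. It only counts appearances. It does not model how any AI engine actually weighs them.

```python
def fanout_coverage(rankings, url, top_n=10):
    """Count the fan-out queries for which url appears in the top_n results.

    rankings maps each query to its ordered list of ranking URLs.
    Returns (count, list of matching queries).
    """
    hits = [query for query, results in rankings.items() if url in results[:top_n]]
    return len(hits), hits


# Hypothetical ranking data for a main query and two fan-out queries.
SAMPLE_RANKINGS = {
    "best crm": ["https://a.example/crm-list", "https://b.example/top-crms"],
    "best crm for small business": ["https://b.example/top-crms", "https://c.example/smb"],
    "crm pricing comparison": ["https://b.example/top-crms"],
}

if __name__ == "__main__":
    count, queries = fanout_coverage(SAMPLE_RANKINGS, "https://b.example/top-crms")
    print(count, queries)
```

A page with a higher count across the fan-out set is, per the correlation described above, the more likely citation candidate.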

Vince: That is really interesting.

I think more people are finally coming around to the idea that yes, it correlates to search — ranking is still important — but also to understanding the query fan-outs, that one query breaks into maybe twenty different queries now, and ranking for as many of those as you can is what really helps increase citations.

The other two I thought were interesting are query answer match and intent format match. Those seem like basic SEO too, but there’s an extra layer of complexity when you combine them with the fan-out piece. What are your thoughts on those?

What makes query answer match and intent format match more complex?

Cyrus: Query answer match was a huge one across a lot of the studies I looked at. I think it has to do with how AI answers create citations — on one hand they’re scouring the web for information, on the other hand they’re just looking for something to validate the answer they already have.

What the data shows is that when your content closely matches the answer, it’s more likely to get cited. The optimization technique is using your H2 headings to closely match the fan-out query or main query, with your answer right below it closely matching what the AI is outputting. We’ve kind of known this for a while — Google has a patent on how this works, validating the answer — but structuring content that way seems really important.
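
As a crude way to audit the heading technique, you can compare each H2 against a target query with plain word overlap. This is only a sketch: Jaccard similarity on lowercase tokens is a stand-in chosen for simplicity, not the scoring any AI engine is known to use, and the function names and sample headings are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_heading_for_query(query, headings):
    """Return (heading, score) for the heading closest to the query, or None."""
    if not headings:
        return None
    return max(((h, jaccard(query, h)) for h in headings), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical H2 headings from a page about coffee brewing.
    page_headings = ["How to Brew French Press Coffee", "Our Story"]
    print(best_heading_for_query("how to brew french press coffee", page_headings))
```

A low best score for an important fan-out query suggests the page has no heading that addresses it directly.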

Intent format match is a bit less important, but it’s about matching the format to the question.

If someone asks “what are the best things to do in New York City,” the format is probably going to be a list, a listicle — we talk about listicles a lot with AI. A different type of question might require a different format, like “compare Google and Apple,” which is going to want a table.

Using the right format for the question seems to be an important consideration.

Vince: Which, again, in my mind is just an SEO best practice — if people are still writing for people rather than writing for an algorithm, they’re probably going to do a good job anyway. I wanted to jump to a few things at the bottom of your list that surprised me.

Let’s briefly touch on llms.txt, since it’s in the news — Google recently put out documentation saying it’s not necessary, and then Lily Ray and John Mueller went back and forth because some other documentation suggested maybe yes, in some specific scenario. What’s your take on llms.txt?

What’s your take on llms.txt?

Cyrus: The headline is: no study we looked at in our entire database found any relationship between llms.txt and AI citation. There’s just zero evidence. Now, I don’t mean to poo-poo llms.txt — a lot of smart people and AI companies are promoting it as a web standard, and there are a lot of new web standards opening up, especially in e-commerce and various server protocols.

Adoption is new, and it’s the wild west. So I don’t think we should discount llms.txt at this moment, because it’s a new frontier, and if someone says “why not,” I think we should take that seriously, because there are some arguments for doing it. I just wouldn’t expect to see an increase in AI citations because you used llms.txt.

That doesn’t mean it’s a technology we should dismiss outright, and I hate it when people on social media say “this person is recommending llms.txt, they’re a snake oil salesman.”

No — it’s just a new technology with no evidence yet. It could develop into something, or it might not.

We don’t know.

I have to remind everybody — in the early days of SEO there was a meta keywords tag that worked, and then it didn’t work, and then it became “snake oil” after the fact.

But for a while it was a legitimate technique. Maybe llms.txt becomes the robots.txt of this era. I don’t know. Right now there’s just not a lot of evidence for it.

Vince: That brings up a good question about your study in general — should readers be looking at this as “here’s what we found in these studies” versus a strict ranking of importance?

Technically speaking, this is the stuff you have research to back up, based on those 50-ish research papers you dug into?

How strict is a ranking factors study like this?

Cyrus: Absolutely, and that’s a limitation of doing studies like this. I’ve done so many ranking factor studies over the years, at Moz and on my own, and we’re always limited to what we can measure — these studies are always going to be incomplete. The important thing for people to remember is that at the end of the day, Google, traditional SEO, Google AI, ChatGPT, Claude, everything else — they’re meant to be a reflection of the real world.

So the more real-world things you can do — getting out there, being cited by journalists, increasing your brand visibility — those are probably way more important than focusing on individual technical details.

Do real stuff.

We’re SEOs, we’re nerds, we try to capture these things in studies like this, but it’s just a reflection of the real world at the end of the day. The AI systems are imperfect, SEO is imperfect, Google is imperfect. This is just our attempt to move us closer to the goalpost.

Vince: That makes total sense. I wanted to bring up something I’d jotted down beforehand. I have a fond memory of you speaking with Ross Hudgens at a Siege Media event (a webinar, meetup, or podcast), where I asked you a question about content depth, breadth, and length.

The question I asked was about writing for an algorithm versus writing for a person. Let me tie it to your study: one of your findings was “AI-ready structure,” which landed toward the higher end, not the top five, but pretty high.

Reading into this a bit — the question I asked you back then was whether it makes more sense for, say, a coffee brand to write the ultimate guide on brewing coffee, or to write individual blog posts on different methods.

I think you said it probably makes more sense for the user to go in-depth on individual types — if someone wants to know how to do a French press, they don’t want to read a ten-thousand-word document about every brewing method and hunt for the French press section; they want all the nuance about French presses specifically.

This ties into the AI-ready idea — chunking your content, putting the important stuff at the top, which has always been in the SEO air, like the reasonable surfer model with link placement. So my question is: how do you feel about content length in the AI context, and are there mechanisms — we’ve heard about crawl budgets, only a certain amount of words AI pulls out of a page — can you expand on that and how it relates to the AI-ready piece?

Does content length influence AI’s crawl budget?

Cyrus: Absolutely, this is such a great question. In the old days, SEO went through the era of the “skyscraper technique” — Brian Dean coined a version of it — where we’d create these giant articles that ranked for everything. Weirdly, I think around 2023 when Google’s Helpful Content Update hit, we started to see a reversal of that over the last two or three years.

It’s almost an anti-SEO bias in Google, where they stopped rewarding posts that ranked for every keyword possible — almost a negative effect for that approach.

But then we have AI answers, and Dan Petrovic — one of my favorite AI researchers, if you don’t follow him you should — did some great research into Gemini and how it retrieves content for citations.

What he found is that for every page in a list of candidate pages, there’s a budget that Gemini will extract from that page, and the budget depends on how well you rank for that query — the top-ranking page gets the largest budget. But even then, it’s just a budget of how many words they’ll extract, pulled from different sections of the page.

So if you’re ranked further down the list and want your answer to get in, you want to put it in a prominent place near the top, because you only have so many words they’ll extract. There’s also the idea of chunking — making it clear what the AI engine should extract, using an H2 heading with the answer text right underneath it.

If you put the question at the top and the answer way at the bottom, that’s confusing for the AI engine to extract — it has to do extra work to draw that information together. They can do it, but if you make it harder, it’s harder.
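
To make the "put it near the top" advice concrete, here is a small sketch that checks whether a key answer falls inside a word budget counted from the start of the page. Two loud assumptions: the budget numbers are made up, since no figures are given in this conversation, and treating the budget as "the first N words" is a simplification, because the extraction is described above as pulling from different sections of the page.

```python
def first_word_offset(page_text, snippet):
    """Return the word index where snippet first appears in page_text, or None."""
    words = page_text.lower().split()
    target = snippet.lower().split()
    if not target:
        return None
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i
    return None


def fits_in_budget(page_text, snippet, budget_words):
    """True if the whole snippet ends within the first budget_words words."""
    offset = first_word_offset(page_text, snippet)
    if offset is None:
        return False
    return offset + len(snippet.split()) <= budget_words


if __name__ == "__main__":
    # Hypothetical page: 50 words of preamble, then the answer.
    page = "intro " * 50 + "steep for four minutes"
    print(fits_in_budget(page, "steep for four minutes", budget_words=100))
    print(fits_in_budget(page, "steep for four minutes", budget_words=52))
```

Matching is on exact whitespace-separated words, so punctuation differences between the page and the snippet will cause a miss.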

So swinging it all back — we’ve gone the other way from the days of five-thousand, ten-thousand-word skyscraper articles. Shorter, more tightly focused content seems to perform better, both in AI answers and in Google’s traditional ranking algorithms. That seems to be the weather report these days: focused, tightly written articles.

Vince: Yeah, I’d agree. Looking at how AI seems to model as closely as possible how humans actually want to read and search for information — I think as SEOs we sometimes overcomplicate things. Another reason to think about it: if you told somebody “hey, if you pose a question in your post, answer it right away,” they’d say, “yeah, that makes sense, obviously.”

But if you look at a lot of content out there, that’s not always the case.

Cyrus: Yeah, I’m not a fan of the long preamble anymore. Think recipe sites: “my grandmother gave me this recipe, she used to live in the Balkans.”

Can I interject on about pages for a second? The about page, in the age of AI, has become so important, because if you Google your company, the AI engines are looking for authoritative places of information, and that’s usually going to be your own website.

I see companies killing it, and I recommend companies not just have one about page, but like seven, eight, nine — your about page, your team page, your jobs page, your careers page, your press page, your legal page, your privacy page — all these pages talking about yourself and promoting yourself.

Boost your about pages, because they’re awesome. You should talk about yourself more.

Vince: That fits the idea that this is where a user would actually go on a website to find information, and the models are showing that.

In a study I did, looking at how AI handles news citations, I found a lot of it comes down to press pages specifically — a lot of owned content on your own site that’s genuinely low-hanging fruit, as much as I hate that term. Okay, another thing near the bottom of your list was domain authority.

This was something a lot of people latched onto — I think it was Ahrefs that put out an early post in the AI space about what gets the most mentions in AI Overviews, looking at things like unlinked brand mentions, which opened the door for digital PRs and link builders to say “links don’t matter as much, just get your brand mentioned,” making it easier to sell things like expert commentary and HARO-style pickups.

But then, which you also have on your piece, domain authority wasn’t highly correlated with showing up in AI — which a lot of people felt was a knock against the idea that link building and digital PR help with AI, because it used to be: get a bunch of links, raise your domain authority, that heavily correlates with search rankings.

Without something like domain authority mattering as much, it gets muddier for people trying to connect the dots between digital PR and link building.

Can you explain the domain authority finding, and how you see the through-line?

Why do you think Domain Authority is such a low ranking factor?

Cyrus: Domain authority, for listeners who aren’t sure what it means, just references the backlink strength of your website — how many backlinks you have and the trust of those backlinks. Traditionally we see domain authority correlating with higher Google rankings.

But across all the papers I looked at, several academic researchers looked at different aspects of domain authority, and there was usually a correlation, but it wasn’t very strong with AI citations.

I don’t necessarily know what this means. I think it means AI citations draw on a deeper layer of fan-out queries, many of them non-competitive, so there just isn’t as big a correlation.

An example that’s often used: if you Google “best CRM,” HubSpot doesn’t generally appear on the first page of traditional rankings — it’s like page two, number 14 or something.

But in AI answers, it’s almost always mentioned, always one, two, or three.

Because on the first page, what’s ranking are lists of best CRMs, and HubSpot is on every single one of them, whether linked or unlinked.

So if you ask Google “what’s the best CRM,” the rankings give you lists, but the AI answer, having read all those results, just says: HubSpot’s the one you want.

So in the age of AI, if you’re looking to get the recommendation rather than necessarily the citation, those unlinked mentions that Ahrefs was talking about are hugely important.

And we’re on the verge — there was a big controversy recently where one of the AI companies was automating mention acquisition and scaling it easily, and there was some question of whether Google would respond to inauthentic mention abuse.

It’s a new frontier — a year from now we might look back and go, “yeah, scaled mention abuse, that’s a thing, you don’t want to do that.”

Right now, maybe it works, maybe it doesn’t, I don’t know.

I wouldn’t advise it.

I’d go for authentic mentions, genuine outreach, genuine mentions.

I think Google has a huge challenge trying to combat this, but if mention spam becomes a real problem, how does Google fight it?

Authority.

If mention spam becomes a thing, they’re going to ignore mentions they don’t believe are authentic, and high-authority sites are going to win, lower-authority sites are going to struggle — because brand and authority have always been Google’s levers for weeding out spam.

Vince: That’s really interesting and timely. I was literally just writing a LinkedIn post about this. Google’s GEO recommendations piece mentioned scaled brand mentions and unlinked mentions as a potential risk, and usually when they write something like that, either it’s already happening or they’ve got something in the works to weed it out.

I was trying to think through it as if I were Google — how would I go about this?

Two other things I wanted your thoughts on: this idea of consensus, which you touched on with the listicle example — if HubSpot is mentioned on all these other sites as the best CRM, chances are it’s going to show up as an AI answer.

Do you think Google is going to start coming after mentions that happen naturally?

I could see some people getting caught in friendly fire — say you get legitimate press mentions calling your brand the best running shoe, and three of them land on the same day from three different sites — that’s some consensus, but maybe those sites weren’t super high authority.

Do you think Google’s going to start going after that kind of thing? You mentioned brand and authority — can you elaborate?

Will Google try to go after brand mentions?

Cyrus: Yeah, I think we already see evidence of the LLMs, depending on the mode you use — extended thinking, pro, whatever — evaluating sources.

If you spend as much time online using AI as we probably do, you can see the answers essentially saying, “well, this is from a Reddit thread, but I don’t quite trust the source,” versus “this is from an authoritative, trusted source” — and the LLMs will tell you that.

Maybe that’s not so much algorithmic as Google or the AI engines offloading that evaluation to the LLMs.

But I don’t think it’s as gameable as people think it is.

Maybe for a while, I’m not sure.

At the end of the day, you still want authentic mentions and links from the most authoritative, trusted sources you can find.

That doesn’t mean lower-tier mentions aren’t valuable, but you always want to get the most authoritative ones you can find.

That’s just best practice.

Vince: And is that because search still plays a big part of it, and those are great ways to get yourself mentioned in search generally?

How do you get brand mentions?

Cyrus: Yes, but also I think all of these systems inherently are building out trust systems to evaluate sources — a source of truth.

The AI answer just wants a source of truth, and it has to evaluate thousands and thousands of sources.

That’s why your own website is going to be a great source of truth.

I had a question recently: do small businesses even need a website anymore if everything happens through Google My Business or an AI answer?

Yes — you need a website even if it gets no traffic, because that’s your source of truth, and AI is going to cite it above other sources because it’s yours, it’s trusted, it has high authority.

Trust is currency within AI, or Google, or anywhere else, and we always want to build that trust.

Vince: I love that.

And I’m not asking because I disagree — I’m always curious to hear different people’s takes on the value of links and link building today.

We had Metehan Yesilyurt on our podcast talking about Common Crawl, and he mentioned that a lot of these LLMs are built on Common Crawl, which is fundamentally links.

Cyrus: Yeah — by the way, he’s one of the researchers, one of the experimenters, cited frequently in this study. Great mind. Call-out there.

Vince: Yeah, check it out, I’ll link our video with him in the show notes. Well, Cyrus, I want to let you leave people with a final thought — say you get a new client and they ask, “Cyrus, how do I show up in LLMs?” Whether you’re tracking citations, mentions, or both — how do you answer that? Do you have a short elevator pitch on how people can show up?

How do you show up in AI?

Cyrus: There are all these advanced techniques — Michael King, iPullRank, Dan Petrovic again — all these advanced techniques for showing up in AI.

But I think for 80% of sites, the mom-and-pops, the regular websites, the regular marketing departments, it’s just doing regular SEO.

I hope that’s not a cop-out answer, but the correlations between ranking for the main query, fan-out queries, and related results — that basic foundational work — keep doing that, and it’s going to correlate to showing up in AI answers.

For the advanced folks who want to do the extra work, structuring content in a certain way and all that, that’s fine too.

But for most of us, we’re struggling just to rank in the top three anyway, and we should keep doing that.

The game has certainly changed in the last few years with what Google is rewarding, but get the foundations in first, invest in regular SEO, and the benefits are going to overlap with AI citations.

Vince: I love that — thinking about SEO as the core, and then when you get into optimizing specifically for LLMs, because there are so many different LLMs and they work differently, you’re slowly pushing the boundaries, but if you don’t have that core, you’re not going to rank anyway. Well, Cyrus, this has been awesome, thank you so much for your time.

Do you have any other studies coming out you want people to know about?

Cyrus: I’m going back to my SEO roots and doing a traditional ranking factor survey, just to see what’s working in Google right now — kind of like the work I used to do at Moz back in the day. Maybe people aren’t interested in what works for plain SEO anymore, I don’t know.

Vince: Not to put you on the spot, but yes — they should be. I’ve been asking Moz to do that, I’ve been asking Ahrefs, and they’re like, “nobody cares about that stuff anymore.” But I think they do — if the search and fan-out stuff really is the bones of all this, you’re still going to know what to do to show up. So I’m with you, man, excited to see that. Cyrus, where can people connect with you if they have questions about this stuff?

Cyrus: LinkedIn, and I also have a newsletter I write, obviously, where this study was published — Zyppy Signal on Substack. Subscribe, hit that subscribe button.

Vince: Smash it, smash it, love it. I’ll link to all this stuff in the show notes. And a reminder for listeners — if you like what you hear, please subscribe, like, all that good stuff. If you have any questions for Cyrus, leave them in the comments, or direct them to me and I’ll pass them along. Cyrus, thank you so much for your time, this has been awesome. Sorry I’ve been a little tongue-tied and brain-foggy here, but I really appreciate you coming on. I know you’re a busy guy, so thank you.

Cyrus: Thanks, Vince. Thanks, everybody.

Vince: Thanks everyone, and good luck out there.

Vince Nero


Vince is the Director of Content Marketing at BuzzStream. He thinks content marketers should solve for users, not just Google. He also loves finding creative content online. His previous work includes content marketing agency Siege Media for six years, Homebuyer.com, and The Grit Group. Outside of work, you can catch Vince running, playing with his 2 kids, enjoying some video games, or watching Phillies baseball.
Website: https://www.buzzstream.com