Inside the Black Box: What our LLM pilot revealed about brand reputation in the age of AI

Written by Amrita Sidhu and Jacquie Hanna | Nov 27, 2025 12:58:49 AM

Large Language Models (LLMs) are increasingly moving from experimental tools to important sources of information, creating a new dynamic that challenges the traditional influence of media in shaping brand perception. With users growingly turning to generative AI for quick, synthesised answers, LLMs are gaining a significant role in defining an organisation's reputation.

Yet, so much of how LLMs actually work remains a "black box." So, how can communicators measure, interpret, or simply just understand and influence their brand's footprint in these generative environments?

At Medianet, an Australian-owned media intelligence company, we decided to find out.

We conducted a market pilot on Generative Citation Analysis, choosing to focus our analysis on some of Australia’s leading brands. We applied the rigour of our award-winning media measurement and research methodology to LLM outputs, establishing a foundational understanding of how AI tools and large language models influence brand visibility and sentiment in Australia.

We're excited to share the key methodology, challenges, and some crucial takeaways from our research - insights that every communications professional should consider for LLM measurement and evaluation.

Download the full report here

The challenge: Measuring brand sentiment behind AI citations

Our goal was simple in theory, complex in execution - to unpack how organisations can understand, use, or influence LLM outputs. The biggest obstacle was the “black box” itself. Each model draws on different sources, and results can shift by time of day, platform, or even device. Even a single factor like whether the query came from mobile or desktop could alter the outcome, showing how many invisible variables shape LLM behaviour.

Finding a starting point was hard. Our analysts are accustomed to structured datasets with clear parameters. This pilot, however, ventured into uncharted territory. Even defining the scope was challenging: Australian print and online media outlets produce around 150,000 items per week, yet potential LLM outputs across just two major tools could multiply that number many times over.

We treated the pilot like a science experiment. To avoid analysis paralysis, we started small and strategically. We focused on two key sectors, financial services and the automotive insurance/services sector, where we had strong subject matter expertise and existing traditional media insights. This provided a comparative dataset against which we could measure LLM behaviour in Australian contexts.

The scope was narrowed to a 2-week period and 400 strategically selected responses. We leveraged a third-party AI Visibility Tracker, Peec, to automate daily prompt entries across ChatGPT, Perplexity, and AI Google summaries. Our analysts then manually applied Medianet’s established sentiment-analysis framework to every captured response.

This work was guided by two core hypotheses;

Citation sources: Citation sources would differ by industry, but largely favour mainstream editorial media.
Sentiment: LLMs would produce more positive sentiment compared with traditional media.

What are our key findings?

1. LLM responses are more positive than traditional media coverage

Our research strongly proved our sentiment hypothesis: LLM responses tended to be more optimistic than traditional media. For context, Australia’s Big 4 banks typically receive between 30–60% favourable coverage in traditional media. In contrast, within the analysed LLM responses, Westpac and Commbank received 97% and 100% favourable mentions, respectively.

Owned content played a major role. Commbank’s own website appeared as a top citation and was referenced on average 1.5 times per response. Positive messaging from corporate sites was often lifted verbatim into LLM answers, producing a polished, corporate-approved version of brand reputation in generative AI results.

2. Topics in LLMs have a longer lifespan than traditional media

Historic issues persist in LLM responses:

In traditional media: Negative news (e.g., job cuts, regional branch closures) are transient, fading from the news cycle in 2–4 weeks.
In LLMs: We saw citations of historic issues, such as the 2018/2019 Royal Commission into Banking, the HSBC 2024 fine for failing on fraud prevention, and a December 2024 ASIC ruling. Recent regulatory penalties against major banks were notably absent.

This "long memory" means historic regulatory, legal, or ethical issues can remain a foundational part of a brand's LLM reputation. For communicators, it requires a long-term, multi-layered issues management approach; one that goes beyond the standard 48-hour crisis cycle.

3. LLMs take citations straight from the source

While more research needs to be done in this area, our research showed that only 22% of financial-sector LLM responses cited traditional media. Most references came from corporate websites, financial blogs, product review sites like choice.com.au, or overseas outlets. Only three Australian media brands - 7News, news.com.au, and The New Daily - appeared at all.

The nature of the prompts could influence the sources used by LLMs; however, the absence of major outlets such as The Australian or The Sydney Morning Herald was notable, particularly given their prominence in recent banking coverage. This reflects ongoing content-licensing gaps and text/data-mining restrictions still in place across some Australian publishers.

This reliance on owned content reinforces the long memory insight. LLMs consistently draw on corporate or industry sources, and topics can resurface months after initial publication, impacting brand sentiment and Share of Voice.

4. Share of Voice looks very different in LLMs

In the automotive insurance sector, we compared Share of Voice (SoV) between LLMs and traditional media. The divergence was striking. To name a few;

Youi dominated LLM responses with 26% SoV, yet held only 7% in traditional media.
NRMA maintained a consistent share across both (20% and 31% resp.).
RACQ ranked high in traditional media (28%) but barely appeared in LLM outputs (15%).

Interestingly, Youi’s frequency didn’t guarantee positive sentiment. RACQ’s recent ASIC ruling for misleading customers drove heavy unfavourable coverage in traditional media, but did not appear in LLM responses during the same period.

This divergence in SoV highlights that LLM reputation operates by different rules than traditional media.

This divergence also reflects a systemic shift, where LLMs, by leaning on corporate sources and slower-moving narrative cycles, are contributing to changes in how influence, visibility and accountability are expressed. These patterns inform how organisations understand reputation, and how media outlets understand their position and the impact on their business models within emerging generative information environments.

Why does the LLM pilot matter for communications professionals?

Our LLM pilot sentiment analysis highlights that LLMs are an emerging stakeholder in reputation. They recall older narratives, favour corporate sources, and filter information through unseen algorithms. Traditional crisis playbooks and media monitoring tools weren’t designed for that.

Communications leaders will likely need to combine issues management, SEO, and owned content governance to manage how LLMs perceive and retain information about a brand. Owned channels, licensing deals, and even phrasing on corporate websites will influence how brands are represented in AI answers.

This pilot was only a first step. Our next phase will explore how different types of prompts influence both sentiment and citation patterns. For example, would consumer-focused prompts lead LLMs to rely more on forums and product review sites? What drives models to cite an organisation’s own website over earned media sources?

We also plan to refine a framework for LLM measurement and evaluation, aligning with AMEC Barcelona principles to establish for the Australian market meaningful metrics for visibility, sentiment, and influence within generative environments.

The black box won’t stay closed for long. As this pilot shows, it’s not enough to know what LLMs say about your brand. The real opportunity lies in understanding why and how AI models shape your brand reputation.

View full post