Large Language Models (LLMs) are increasingly moving from experimental tools to important sources of information, creating a new dynamic that challenges the traditional influence of media in shaping brand perception. As more users turn to generative AI for quick, synthesised answers, LLMs are playing a growing role in defining an organisation's reputation.
Yet much of how LLMs actually work remains a "black box." How, then, can communicators measure, interpret, and ultimately influence their brand's footprint in these generative environments?
At Medianet, an Australian-owned media intelligence company, we decided to find out.
We conducted a market pilot on Generative Citation Analysis, choosing to focus our analysis on some of Australia’s leading brands. We applied the rigour of our award-winning media measurement and research methodology to LLM outputs, establishing a foundational understanding of how AI tools and large language models influence brand visibility and sentiment in Australia.
We're excited to share our methodology, the challenges we encountered, and some crucial takeaways from our research - insights that every communications professional should consider for LLM measurement and evaluation.
Our goal was simple in theory, complex in execution - to unpack how organisations can understand, use, or influence LLM outputs. The biggest obstacle was the “black box” itself. Each model draws on different sources, and results can shift by time of day, platform, or even device. Even a single factor like whether the query came from mobile or desktop could alter the outcome, showing how many invisible variables shape LLM behaviour.
Finding a starting point was hard. Our analysts are accustomed to structured datasets with clear parameters. This pilot, however, ventured into uncharted territory. Even defining the scope was challenging: Australian print and online media outlets produce around 150,000 items per week, yet potential LLM outputs across just two major tools could multiply that number many times over.
We treated the pilot like a science experiment. To avoid analysis paralysis, we started small and strategically. We focused on two key sectors, financial services and the automotive insurance/services sector, where we had strong subject matter expertise and existing traditional media insights. This provided a comparative dataset against which we could measure LLM behaviour in Australian contexts.
The scope was narrowed to a two-week period and 400 strategically selected responses. We used a third-party AI visibility tracker, Peec, to automate daily prompt entries across ChatGPT, Perplexity, and Google's AI summaries. Our analysts then manually applied Medianet’s established sentiment-analysis framework to every captured response.
This work was guided by two core hypotheses: one on sentiment (that LLM responses would skew more favourable than traditional media coverage) and one on citations (which sources LLMs would draw on when discussing Australian brands).
Our research strongly supported our sentiment hypothesis: LLM responses tended to be more optimistic than traditional media. For context, Australia’s Big 4 banks typically receive between 30–60% favourable coverage in traditional media. In contrast, within the analysed LLM responses, Westpac and Commbank received 97% and 100% favourable mentions, respectively.
Owned content played a major role. Commbank’s own website appeared as a top citation and was referenced on average 1.5 times per response. Positive messaging from corporate sites was often lifted verbatim into LLM answers, producing a polished, corporate-approved version of brand reputation in generative AI results.
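A citation-rate metric like the one above can be computed very simply: count how often cited URLs resolve to the brand's own domain, averaged over all captured responses. The sketch below is a rough illustration only; the URLs and the resulting figure are hypothetical examples, not our pilot data.

```python
from urllib.parse import urlparse

def owned_citation_rate(responses, owned_domain):
    """Average number of citations per response that point at the brand's own domain."""
    total = 0
    for citations in responses:  # each response is a list of cited URLs
        total += sum(
            1 for url in citations
            if urlparse(url).netloc.endswith(owned_domain)
        )
    return total / len(responses)

# Hypothetical citations captured from three LLM responses
captured = [
    ["https://www.commbank.com.au/home-loans", "https://www.choice.com.au/banking"],
    ["https://www.commbank.com.au/about-us", "https://www.commbank.com.au/rates"],
    ["https://7news.com.au/business/banking"],
]
owned_citation_rate(captured, "commbank.com.au")  # → 1.0 citations per response
```

The same loop extends naturally to any domain of interest, which is how earned-versus-owned citation splits can be tracked over a capture window.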
Historic issues also persist in LLM responses: models surfaced older regulatory, legal, and ethical narratives alongside current information.
This "long memory" means historic regulatory, legal, or ethical issues can remain a foundational part of a brand's LLM reputation. For communicators, it requires a long-term, multi-layered issues management approach, one that goes beyond the standard 48-hour crisis cycle.
While more research is needed in this area, our pilot showed that only 22% of financial-sector LLM responses cited traditional media. Most references came from corporate websites, financial blogs, product review sites like choice.com.au, or overseas outlets. Only three Australian media brands - 7News, news.com.au, and The New Daily - appeared at all.
The nature of the prompts could influence the sources used by LLMs; however, the absence of major outlets such as The Australian or The Sydney Morning Herald was notable, particularly given their prominence in recent banking coverage. This reflects ongoing content-licensing gaps and text/data-mining restrictions still in place across some Australian publishers.
This reliance on owned content reinforces the long memory insight. LLMs consistently draw on corporate or industry sources, and topics can resurface months after initial publication, impacting brand sentiment and Share of Voice.
In the automotive insurance sector, we compared Share of Voice (SoV) between LLMs and traditional media, and the divergence was striking.
Interestingly, Youi’s frequency didn’t guarantee positive sentiment. And a recent ASIC ruling against RACQ for misleading customers drove heavy unfavourable coverage in traditional media, yet did not appear in LLM responses during the same period.
This divergence in SoV highlights that LLM reputation operates by different rules than traditional media.
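SoV itself is straightforward arithmetic: each brand's share of all mentions in the capture set. A minimal sketch follows; the brand mix and mention counts are hypothetical, chosen only to show the calculation, not figures from our pilot.

```python
def share_of_voice(mention_counts):
    """Convert raw mention counts per brand into Share of Voice percentages."""
    total = sum(mention_counts.values())
    return {brand: round(100 * n / total, 1) for brand, n in mention_counts.items()}

# Hypothetical mention counts from a two-week LLM capture window
llm_mentions = {"Youi": 120, "RACQ": 30, "NRMA": 50}
share_of_voice(llm_mentions)  # → {'Youi': 60.0, 'RACQ': 15.0, 'NRMA': 25.0}
```

Running the same calculation over traditional media mention counts for the same window is what makes the LLM-versus-media divergence directly comparable.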
This divergence also reflects a systemic shift: by leaning on corporate sources and slower-moving narrative cycles, LLMs are changing how influence, visibility and accountability are expressed. These patterns affect how organisations understand reputation, and how media outlets assess their position and business models within emerging generative information environments.
Our LLM pilot sentiment analysis highlights that LLMs are an emerging stakeholder in reputation. They recall older narratives, favour corporate sources, and filter information through unseen algorithms. Traditional crisis playbooks and media monitoring tools weren’t designed for that.
Communications leaders will likely need to combine issues management, SEO, and owned content governance to manage how LLMs perceive and retain information about a brand. Owned channels, licensing deals, and even phrasing on corporate websites will influence how brands are represented in AI answers.
This pilot was only a first step. Our next phase will explore how different types of prompts influence both sentiment and citation patterns. For example, would consumer-focused prompts lead LLMs to rely more on forums and product review sites? What drives models to cite an organisation’s own website over earned media sources?
We also plan to refine a framework for LLM measurement and evaluation, aligned with AMEC's Barcelona Principles, to establish meaningful metrics for visibility, sentiment, and influence in generative environments for the Australian market.
The black box won’t stay closed for long. As this pilot shows, it’s not enough to know what LLMs say about your brand. The real opportunity lies in understanding why and how AI models shape your brand reputation.