Endpoints – Challenges from the perspective of the vfa

"Which outcomes are relevant for patients?" is the guiding question of the 2024 Fall Conference of the Benefit Assessment Platform. The question of endpoints is one of the central questions in the AMNOG procedure. But what significance do endpoints have for determining an added benefit in the AMNOG procedure? In order of priority, it has to be said soberly that the most important factors are a study design compatible with the AMNOG procedure (i.e., usually a randomized controlled trial), a sufficient study duration, and the correct implementation of the specified appropriate comparator therapy. However, once these central conditions are met, what really matters are the endpoints and the study effects demonstrated for them.
From a legal perspective, Section 2, Paragraph 3 of the AM-NutzenV stipulates that the "benefit of a medicinal product [...] is the patient-relevant therapeutic effect, particularly with regard to improving the state of health, shortening the duration of illness, prolonging survival, reducing side effects, or improving quality of life." The additional benefit is therefore determined by the effect on "patient-relevant endpoints." The status of an endpoint in the approval-relevant studies (primary or secondary) has no consequences for its relevance in the benefit assessment.
After approximately 13 years of evaluation, approximately 57 percent of new drugs have been able to demonstrate their added benefit. However, this proportion varies considerably depending on the therapeutic area. While an added benefit was demonstrated for oncology drugs in approximately 73 percent of cases, the proportion of proven added benefit for diseases of the nervous system and psychiatric illnesses was 46 percent and only 29 percent, respectively. Clearly, the combination of a compatible study design and endpoints for oncology drugs in the AMNOG achieved better results than for other indications.
Overall survival
It should be noted that the endpoint "overall survival" is of paramount importance, especially in oncology. It is undoubtedly patient-relevant and thus also crucial for demonstrating and classifying the additional benefit. Empirical evidence for oncological diseases clearly shows a correlation between the magnitude of the benefit in overall survival (in combination with benefits in other endpoint categories) and the extent of the additional benefit in the decisions of the Federal Joint Committee (G-BA).
However, the endpoint "overall survival" also presents a number of practical and methodological challenges. For example, adequate study power cannot be achieved within a reasonable timeframe in all treatment situations. This is often further exacerbated in the context of benefit assessments by the Federal Joint Committee (G-BA) dividing a study population into sub-questions or considering only a portion of the pivotal study population as usable. This occurs without any discernible consideration of the power issue.
In some situations, a permitted change of treatment is ethically unavoidable, which makes the interpretation of survival data more difficult. While methodological solutions exist for dealing with permitted changes of treatment, none of these approaches has yet been accepted in the AMNOG assessment. The requirements for recording subsequent treatments are equally challenging with regard to the interpretation of overall survival results.
If a specific treatment situation does not (yet) allow for conclusions about overall survival within a reasonable timeframe, the question also arises as to how patient-relevant treatment success can be measured. In the decision-making practice of the Federal Joint Committee (G-BA), endpoints such as EFS (event-free survival), DFS (disease-free survival), or RFS (relapse-free survival) are now generally accepted, but only to reflect the failure of a curative treatment approach.
However, the PFS (progression-free survival) endpoint remains irrelevant for evaluation, although there have been differing opinions within the G-BA regarding its patient relevance from the outset. Endpoints such as TTST (time to subsequent therapy), CR (complete response, except for the old special case of the assessment of basal cell carcinoma), or MRD (minimal residual disease) are also not considered. These endpoints, which are important or even primary for approval decisions, are consistently classified as non-patient-relevant and therefore not relevant for evaluation in the AMNOG procedure, regardless of the specifics of individual treatment situations.
Surrogate endpoints
In addition to the question of direct patient relevance, some endpoints raise the question of whether they can be used as surrogates for other endpoints in certain treatment situations. For AMNOG benefit assessments, surrogate validation follows the IQWiG requirements published in 2011 in Rapid Report A10-05 "Validity of Surrogate Endpoints in Oncology" (IQWiG 2011).
Surrogate validation ideally requires a meta-analysis of several RCTs with high certainty of results and a high degree of correlation at the study and individual level, or, failing that, the application of the concept of a surrogate threshold effect (STE) with specific thresholds. Although the G-BA's requirements do not specify explicit thresholds, they refer to the same methodology as proposed by the IQWiG. That's the claim. The reality is sobering, as these requirements have not been met for 13 years now. Whether this is due to the conservative requirements themselves or to a (possibly resulting) limited number of validation attempts cannot be conclusively determined.
Nevertheless, it should be noted that surrogate endpoints are indispensable in some cases, as otherwise key questions simply cannot be investigated to enable access to innovative treatments. Therefore, in the past, in absolutely exceptional cases, some surrogate endpoints have been accepted in the AMNOG benefit assessment without formal validation according to the above-mentioned methods and used to derive the additional benefit.
For example, the endpoints "sustained virologic response" in chronic hepatitis C, "virological response" in HIV infection, and "HbA1c" in type 1 diabetes mellitus were classified as sufficiently valid surrogate endpoints by both IQWiG and the G-BA. While the consideration of these surrogate endpoints was always comprehensible, the respective decisions remained based on a less than transparent individual assessment.
Patient-reported endpoints
Patient-reported endpoints (PROs) are becoming increasingly important overall. For many therapeutic areas, assessing morbidity and health-related quality of life using such endpoints is now standard practice in clinical trials. An analysis of EU approval decisions for oncology drugs between 2017 and 2020 shows that PROs were included in pivotal studies in approximately 78 percent of cases (Teixeira et al. 2022).
The increasing importance of PROs is also evident in the AMNOG benefit assessment. For example, in the case of non-small cell lung cancer, 95 percent of the studies considered by the G-BA provided usable data on at least one PRO instrument (Brand et al. 2022). The picture is also encouraging for health-related quality of life, a subset of patient-reported endpoints. The proportion of procedures with quality of life data has increased in recent years and has exceeded 70 percent since 2014. This proportion was particularly high in benefit assessments of oncology drugs (Kramer et al. 2024).
However, the question of fundamental relevance does not stop with the nature of an endpoint, but can also extend to its operationalization. For example, a PRO outcome that appears clearly relevant may still be disregarded in an assessment. This can be illustrated by the example of benefit assessments in the therapeutic area of moderate to severe plaque psoriasis. Here, the endpoint PASI 90, which represents a 90 percent improvement in disease symptoms and virtually symptom-free skin, has not been considered by IQWiG for years, as it cannot formally be ruled out that psoriasis symptoms may still be present and affect patients.
For this reason, IQWiG only uses analyses of PASI 100 (complete remission). From the outset, this assessment was inconsistent with the guidelines and current healthcare practice, where PASI 75 and PASI 90 responses also serve as treatment goals, since the absence of cutaneous symptoms cannot be achieved in all patients (Nast et al. 2021). The G-BA therefore also considers the corresponding results for PASI 75 and PASI 90.
There are also a number of challenges in recording PROs. For some special therapeutic situations, such as rare diseases, there are no validated and established instruments. The use of available questionnaires from other therapeutic areas is always viewed with skepticism. The potential power issue is not considered when interpreting study results. Another challenge is recording and maintaining high response rates, especially in terminal phases of life and after progression of a life-threatening disease (Böhme et al. 2022).
Until recently, there were differing opinions regarding the duration of PRO recording. There was a split between IQWiG, which advocates documentation for as long as possible until the end of the study, and the clinical community, which considers recording after progression important, but to a reasonable extent and not without limit until the end of life.
Dealing with available evidence
With regard to the general acceptance of the data, reference should be made to the existing regulation in Section 5 Paragraph 5 of the AM-NutzenV, which states: “If valid data on patient-relevant endpoints are not yet available at the time of the evaluation, the evaluation shall be based on the available evidence, taking into account the quality of the study, and shall indicate the probability of demonstrating an additional benefit. A deadline may be set by which valid data on patient-relevant endpoints must be submitted.”
On the one hand, the regulation aims to provide for the possibility of a time limit, which is common practice. On the other hand, it stipulates that assessments must be based on the available evidence. In practice, however, it has been shown that available data are generally not considered if they are not linked to patient-relevant endpoints. This raises the question of whether an assessment should be conducted taking the available evidence into account, especially in special treatment situations.
Weighting of endpoints and effects
In addition to the fundamental question of the relevance of an endpoint, a benefit assessment also raises the question of how relevant an endpoint or effect is. In its own methods, IQWiG distinguishes three hierarchical categories of outcomes: 1. All-cause mortality, 2. Serious (or severe) symptoms and adverse effects, and health-related quality of life, and 3. Non-serious (or non-severe) symptoms and adverse effects.
However, simply classifying an outcome measure as serious or non-serious is not always sufficiently transparent or trivial. For example, the previously outlined example of an indication such as moderate to severe plaque psoriasis shows that a blanket classification of the endpoint PASI 100 (complete remission) as "non-serious/non-severe symptoms" can certainly raise questions. In many cases, the hierarchical assignment of an outcome measure (e.g., from an EORTC QLQ-C30 questionnaire for oncological diseases) is based solely on its formal classification into the categories of morbidity or quality of life. This can lead to systematic bias in the classifications of endpoints in the morbidity category, since in case of doubt they are classified as "non-serious" and thus face a higher hurdle in evaluation.
In this regard, the methodology for determining the extent of an additional benefit certainly raises a number of critical questions. This special approach to determining the extent of relative effect measures has been criticized from the outset, particularly due to the blanket thresholds for upper confidence interval limits (defined in the institute's internal consensus), normative assumptions, and the assumption of two studies across all treatment situations.
Although the Federal Joint Committee (G-BA) has not relied on the IQWiG methodology for determining the extent of benefit since 2011 (this is explicitly mentioned in all supporting reasons for the decisions), it can nevertheless be assumed that it has a lasting impact on benefit assessments. The established thresholds for continuous outcomes, combined with the conservative approach of a shifted hypothesis boundary, also do not correspond to the internationally recognized criteria or standards of evidence-based medicine and thus pose an additional challenge (IQWiG 2022).
Finally, the assessment of the fundamental relevance of PRO effects is also problematic. The requirement for established and validated MID (minimal important difference) thresholds has been replaced with a rigid 15 percent formula for responder analyses. Accordingly, if responder analyses are prespecified in a study and the response criterion corresponds to at least 15 percent of the scale range of the survey instrument used, these analyses are used for the evaluation.
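The arithmetic behind the 15 percent rule described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not IQWiG's implementation: the function names are hypothetical, and the instrument scored from 0 to 100 points is an assumed example.

```python
def min_response_threshold(scale_min, scale_max, share_percent=15):
    """Smallest response criterion (in scale points) that covers
    share_percent of the instrument's scale range."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return share_percent * (scale_max - scale_min) / 100


def responder_analysis_usable(criterion, scale_min, scale_max,
                              prespecified, share_percent=15):
    """True if the responder analysis was prespecified and its response
    criterion reaches at least share_percent of the scale range."""
    threshold = min_response_threshold(scale_min, scale_max, share_percent)
    return bool(prespecified) and criterion >= threshold


# Hypothetical instrument scored from 0 to 100 points (assumption)
print(min_response_threshold(0, 100))                # 15.0
print(responder_analysis_usable(10, 0, 100, True))   # False
print(responder_analysis_usable(15, 0, 100, True))   # True
```

On such a scale, a prespecified response criterion of 10 points would fall short of the rule, regardless of whether 10 points had been validated as a minimal important difference for that instrument.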
However, this “one-size-fits-all” approach is controversial in many respects. As a special approach, it ignores the efforts of international science to improve assessment standards by means of meaningful quality criteria, and it does not sufficiently take into account known differences in patient perspectives on meaningful outcomes (Böhme et al. 2022, Schlichting et al. 2022).
Furthermore, the IQWiG methodology means that even if clinical relevance is ensured by the predefined responder criterion, a statistically significant effect for some PROs does not necessarily lead to a recognized effect. In addition to the aforementioned response criterion of 15 percent, another relevance criterion applies: the threshold for the upper confidence interval (for non-severe symptoms). This leads to a duplication of relevance criteria and an overly conservative classification of PRO effects.
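The duplication of relevance criteria can be made concrete with a small sketch. It rests on several assumptions that are not taken from the text: the effect is a relative measure scaled so that values below 1 favor the new treatment, significance is read off the confidence interval excluding 1, and the value passed as `ci_upper_threshold` is a placeholder supplied by the caller rather than an official IQWiG figure. All function names are hypothetical.

```python
def meets_scale_range_rule(criterion, scale_min, scale_max, share_percent=15):
    """Response criterion covers at least share_percent of the scale range."""
    return criterion * 100 >= share_percent * (scale_max - scale_min)


def assess_pro_effect(criterion, scale_min, scale_max,
                      ci_upper, ci_upper_threshold):
    """Apply both relevance criteria to one PRO responder analysis.

    ci_upper is the upper confidence limit of a relative effect measure
    (assumption: values below 1 favor the new treatment).
    ci_upper_threshold is a caller-supplied placeholder, not an official value.
    """
    clinically_relevant = meets_scale_range_rule(criterion, scale_min, scale_max)
    statistically_significant = ci_upper < 1.0      # interval excludes 1
    passes_ci_threshold = ci_upper < ci_upper_threshold
    return {
        "clinically_relevant": clinically_relevant,
        "statistically_significant": statistically_significant,
        "recognized": clinically_relevant
                      and statistically_significant
                      and passes_ci_threshold,
    }


# Illustrative numbers only: a 15-point criterion on a 0-100 scale,
# an upper confidence limit of 0.95 and a placeholder threshold of 0.90.
result = assess_pro_effect(15, 0, 100, ci_upper=0.95, ci_upper_threshold=0.90)
print(result["clinically_relevant"])        # True
print(result["statistically_significant"])  # True
print(result["recognized"])                 # False
```

In this constructed case the effect is statistically significant and meets the predefined responder criterion, yet it is not recognized because the second criterion is applied on top of the first.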
Overall assessment
Ultimately, a decision on added benefit focuses on an overall assessment of endpoints and treatment effects. This decision is made by the Federal Joint Committee (G-BA) on behalf of patients and their preferences, but without a formal and sufficiently transparent weighting process. Studies measuring patient preferences have not yet been considered in the AMNOG procedure.
Some classifications raise questions here, for example, in the case of classifying the therapeutic benefits as a minor additional benefit. According to the AM-NutzenV, such an additional benefit exists when a "previously unattained moderate and not merely minor improvement in the therapy-relevant benefit [...] is achieved, in particular a reduction of non-serious symptoms of the disease or a relevant avoidance of side effects."
However, in the G-BA's assessment practice, this category also includes extensions of overall survival, the prevention of recurrences in cancer, more frequent complete remissions of severe plaque psoriasis in children and adolescents, or even multiple benefits in patients with moderately to severely active Crohn's disease. A balancing of effects has recently become even more crucial with the introduction of the so-called "guard rails" in the Statutory Health Insurance Financing Act (GKV-FinStG), since even a small change in the classification of the extent of additional benefit (including by reflecting residual methodological uncertainties) can determine the scope of these guard rails in negotiations.
European perspective
Another prospective yet imminent challenge is the European HTA process. This will begin in January 2025 with the evaluation of advanced therapy medicinal products (ATMPs) and oncology medicinal products. Evaluations of orphan drugs are to follow in 2028, and other medications in 2030. A key uncertainty is the number of national PICO questions, which include requested and available endpoints, as well as possible operationalizations and evaluations of the endpoints.
This also raises the challenge of how the national "Delta dossier" for the AMNOG process will be designed. It remains equally exciting to see whether the desired harmonization of methodological requirements will be achieved in the future and what interactions will arise from the different approaches to endpoints in European HTA and the AMNOG benefit assessment. One example is the entrenched approach of "patient-centered endpoints" (including, for example, preferences or needs) pursued in European HTA.
Conclusion
In conclusion, key challenges in dealing with endpoints in the context of benefit assessment can be summarized as follows:
a stronger focus on accepted and established methods that meet international standards of evidence-based medicine,
greater transparency in the classification and consideration of endpoints,
taking into account the specific characteristics of therapeutic situations in the context of the benefit assessment.
Dr. PH Andrej Rasch has been working as Senior Manager Benefit Assessment/HTA Coordination at the German Association of Research-Based Pharmaceutical Companies (vfa) since 2013. Prior to that, he worked as Head of Pharmaceutical Research at the Scientific Institute of the AOK (WIdO), a methodologist at the Federal Joint Committee (G-BA), and as a research associate at the Faculty of Health Sciences at Bielefeld University.
1 Böhme et al. New approaches to quality of life data are needed, in Monitor Versorgungsforschung (01/22), pp. 43-47
2 Brand et al. Value in Health, Volume 25, Issue 12S (December 2022), https://go.sn.pub/y32ial.
3 IQWiG. Validity of Surrogate Endpoints in Oncology. Rapid Report. IQWiG Reports – Year: 2011 No. 80
4 IQWiG. Documentation and evaluation of the consultation on the draft General Methods 6.1, Version 1.0 of 22 March 2022
5 Kramer et al. Health-related quality of life (HRQoL) in German early benefit assessment: The importance of disease-specific instruments, in: ZEFQ (2024), https://doi.org/10.1016/j.zefq.2024.02.003
6 Nast et al. German S3 guideline for the treatment of psoriasis vulgaris, adapted from EuroGuiDerm – Part 1: Treatment goals and treatment recommendations. J Dtsch Dermatol Ges 2021 Jun;19(6):934–951.
7 Schlichting et al. Is IQWiG's 15% Threshold Universally Applicable in Assessing the Clinical Relevance of Patient-Reported Outcomes Changes? An ISPOR Special Interest Group Report. Value Health 2022 Sep;25(9):1463-1468.
8 Teixeira et al. (2022) A review of patient-reported outcomes used for regulatory approval of oncology medicinal products in the European Union between 2017 and 2020. Front. Med. 9:968272.
Ärzte Zeitung