A Single Licence And A Shared Royalty Pool: Examining India’s Proposed AI Licensing Model

Written by Hetal Desai

6 min read

India’s conversation on generative AI has reached a stage where questions about innovation, creator remuneration and regulatory certainty can no longer be parked for later. The DPIIT committee’s recent working paper arrives at a moment when creators feel increasingly sidelined by the scale of unlicensed data use while AI developers face a patchwork of legal risks that cannot be fully mitigated through bilateral agreements or existing copyright exceptions. The proposal attempts to reconcile these concerns by constructing a national framework that gives developers predictable access to training data while ensuring that creators receive compensation. Understanding why this model has been floated, and what it might actually achieve, requires stepping back to the structural problems that triggered this exercise.

India has encountered the same tension that several jurisdictions are now grappling with. Creators have raised concerns that their works are being absorbed into large models without consent and without clarity on how those uses differ from conventional copying. A photographer who posts work online for clients, for instance, cannot practically verify whether thousands of copies have been ingested into training datasets. Developers, on the other hand, argue that restricting access to diverse datasets will skew model performance and slow progress, especially for Indian languages and domain-specific applications.

Existing provisions in the Copyright Act, 1957 (‘Copyright Act’) were never drafted with non-expressive computational uses in mind, which has resulted in an uncomfortable reliance on stretched interpretations of fair dealing, implicit licences in terms of service, or untested assumptions about the legality of scraping publicly available content. These doctrinal gaps have created uncertainty for both sides. Creators worry that enforcement is practically impossible, while developers operate under the constant risk of liability if courts interpret large-scale training as infringement.

Existing ‘Fair Dealing’ and ‘Fair Use’ Divide

A significant part of the licensing debate stems from the structural limits of India’s fair dealing framework. Section 52 of the Copyright Act contains an exhaustive list of situations where copyrighted works may be used without permission. The list is purpose-bound and tightly framed. The permitted purposes include private or personal use, including research, under Section 52(1)(a)(i) of the Copyright Act, transient or incidental storage in the technical process of electronic transmission under Section 52(1)(b), criticism or review under Section 52(1)(a)(ii), and the reporting of current events under Section 52(1)(a)(iii). Nothing in this list expressly anticipates the computational demands of large-scale text and data mining for machine learning systems, which typically involve making temporary copies, creating processed representations and analysing large volumes of protected works in a non-expressive manner. Developers often rely on the research limb or on transient storage to position training as a non-communicative, analytical operation, yet the purely commercial nature of many AI models complicates any straightforward reading of these provisions. Rights holders, in turn, argue that model training involves reproductions that fall outside the enumerated purposes and therefore cannot be sheltered by Section 52 unless the legislature expands it.

Fair use jurisdictions take a different interpretive path. For example, Section 107 of the US Copyright Act does not prescribe a closed list. Instead, it applies a four-factor test that asks whether the use is transformative, whether the work is creative or factual, how much of it is used and whether the use harms the market for the original. Courts have historically treated non-expressive computational uses, such as search indexing or digital scanning for analysis, as potentially transformative. This approach gives developers more latitude to argue that model training creates something fundamentally new rather than substituting for the original. But even in fair use jurisdictions, the legal position for generative AI is far from settled. The emerging litigation seeks to determine whether training on protected works, and subsequently generating outputs that imitate but do not store the underlying data, can still be categorised as transformative.

This divergence creates a practical mismatch for India. A closed, purpose-specific fair dealing provision restricts judicial flexibility and leaves little room for courts to recognise novel technological uses unless they fit within the existing statutory wording. This uncertainty fuels the argument for legislative solutions such as a statutory licensing mechanism that operates independently of Section 52. The proposed licensing model is an attempt to bridge this doctrinal gap by providing clarity at the licensing layer rather than relying on exceptions that were never designed with modern AI systems in mind.

What the proposal actually does

The working paper positions the proposed mandatory AI licensing scheme as a structural response to these problems. Rather than forcing creators and developers into bilateral negotiations at impossible scale, it sketches a system where developers can train on any content they have lawfully accessed, as long as they pay statutory royalties to a designated collective. By introducing a centralised rights management entity for AI training, the model aims to convert a fragmented environment into a single predictable workflow. The lawful access requirement is a significant pivot because it separates copyright permission from access conditions. In practice this means that buying, subscribing to or otherwise legitimately accessing content would satisfy the threshold for training use, while unauthorised scraping or circumvention of access controls would not. A developer who trains on articles from a paid digital news subscription would qualify, while a developer who bypasses a paywall to download the same content would not.
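
To make the lawful access pivot concrete, here is a minimal sketch of how a compliance pipeline might gate corpus items on their access basis. Everything in it is an assumption rather than anything the working paper specifies: the record fields, the category names and the set of qualifying bases are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceRecord:
    """Hypothetical provenance entry for one item in a training corpus."""
    url: str
    access_basis: str          # e.g. "subscription", "purchase", "open_licence"
    paywall_bypassed: bool     # whether access controls were circumvented
    licence_ref: Optional[str] = None  # subscription or purchase identifier

# Assumed set of access bases that would satisfy a lawful-access threshold.
LAWFUL_BASES = {"subscription", "purchase", "open_licence"}

def passes_lawful_access(record: SourceRecord) -> bool:
    """Illustrative gate: legitimate access and no circumvention."""
    if record.paywall_bypassed:
        return False  # bypassing a paywall would fail the threshold
    return record.access_basis in LAWFUL_BASES

# The proposal's own contrast: a paid news subscription qualifies,
# while bypassing the same paywall does not.
subscribed = SourceRecord("https://news.example/article", "subscription", False, "SUB-042")
bypassed = SourceRecord("https://news.example/article", "scrape", True)
assert passes_lawful_access(subscribed) and not passes_lawful_access(bypassed)
```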

The shift is significant because it implicitly acknowledges that the existing exceptions for transient or incidental storage, or fair dealing for research, cannot bear the weight of modern machine learning practices. The paper hints that amendments may be required to introduce a statutory licence specific to training, or to modify certain sections of the Act to recognise computational uses as distinct from expressive uses. It does not, however, prescribe specific clause language. This leaves room for legislative debate on definitions, thresholds, obligations of developers and scope of coverage.

How this could reshape industry practice

If adopted, the model would create several practical changes for different stakeholders. Developers would need to document lawful access trails, maintain provenance logs, and calculate royalties based on prescribed formulas. For example, a startup building a specialised model for legal research may need to record where each segment of its training corpus came from, instead of relying on a single large scraped archive. This would raise compliance costs, particularly for teams that do not already maintain structured data workflows. At the same time, the proposal envisages thresholds or exemptions for startups, which could soften the financial impact on early-stage players.
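
A minimal sketch of what such a provenance trail and royalty computation could look like in practice. The log schema and the per-token rate are pure assumptions: the working paper prescribes neither a format nor a formula, and the eventual statute might instead use flat fees, revenue shares or per-work tariffs.

```python
import csv
from datetime import datetime, timezone

# Hypothetical provenance log schema; the proposal prescribes no format.
LOG_FIELDS = ["work_id", "source_url", "access_basis", "licence_ref", "ingested_at", "tokens"]

def log_ingestion(path: str, work_id: str, source_url: str,
                  access_basis: str, licence_ref: str, tokens: int) -> None:
    """Append one corpus item to a CSV provenance log."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([work_id, source_url, access_basis, licence_ref,
                                datetime.now(timezone.utc).isoformat(), tokens])

def royalty_due(total_tokens: int, rate_per_million_tokens: float) -> float:
    """Toy usage-based royalty; the real tariff structure is an open question."""
    return total_tokens / 1_000_000 * rate_per_million_tokens

# e.g. 3.2 billion licensed tokens at a hypothetical Rs 40 per million tokens
print(f"Royalty due: Rs {royalty_due(3_200_000_000, 40.0):,.2f}")  # Rs 128,000.00
```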

Creators would be encouraged to register their works to claim royalties, which may lead to a gradual formalisation of sectors where creators currently operate informally. However, registration-linked compensation also risks excluding many creators who do not have the resources or awareness to register. For consumers and downstream users, the likely impact is more complex. A predictable training environment may accelerate domestic model development and reduce the legal overhang on AI products, but the costs of compliance may be passed on through higher product prices or reduced competition.

How India’s approach compares globally

When compared with international approaches, India’s proposal occupies a unique space. Europe’s framework relies heavily on text and data mining exceptions with opt-outs for rights holders, which gives creators a voice but also creates operational complexity for developers, who must track opt-outs across millions of works. Japan and Singapore have leaned toward permissive data mining exceptions, placing innovation at the centre of their policy choices. The Indian proposal differs from both positions by blending wide access with mandatory remuneration. The model resembles elements of compulsory licences used historically in broadcasting and public performance, but applies them to the computational context of AI training. It attempts to avoid the fragmentation seen in some jurisdictions where multiple collecting societies complicate compliance.
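
The operational weight of the European opt-out route is easier to see in a sketch. The aggregated registry below is hypothetical; no single such registry exists, and real pipelines must reconcile reservations expressed in several machine-readable forms across millions of works.

```python
def load_opt_out_registry() -> set[str]:
    """Stub standing in for an aggregation of rights reservations
    (robots directives, TDM reservation headers, publisher lists)."""
    return {
        "https://publisher.example/novels/",
        "https://photos.example/collections/9/",
    }

def is_opted_out(url: str, registry: set[str]) -> bool:
    """Assume a reservation on a prefix covers everything beneath it."""
    return any(url.startswith(prefix) for prefix in registry)

def filter_corpus(candidate_urls: list[str], registry: set[str]) -> list[str]:
    """Drop works whose rights holders have reserved text and data mining."""
    return [u for u in candidate_urls if not is_opted_out(u, registry)]

registry = load_opt_out_registry()
corpus = ["https://publisher.example/novels/ch1.txt",
          "https://openarchive.example/paper.pdf"]
print(filter_corpus(corpus, registry))  # only the open-archive item survives
```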

Open questions that remain

The lawful access test will almost certainly generate disputes because courts will need to interpret whether public availability constitutes lawful access, particularly when terms of use prohibit scraping. A publicly accessible blog may be free to read, yet the site owner may restrict automated crawling in their terms of use, creating a tension the proposal does not fully resolve. Developers may also struggle to prove lawful access for legacy datasets built before any regulatory framework existed. The decision not to impose mandatory dataset disclosure protects competitive interests but makes verification difficult for rights holders. The governance structure of the proposed collective will influence fairness because royalty distribution formulas and audit mechanisms determine whether creators truly benefit. Cross-border datasets add another difficult layer, since many models rely on global corpora and it is unclear how foreign works or reciprocal royalties will be treated.
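
One concrete instance of the tension: a page can be publicly readable while the site's robots.txt forbids automated collection, and terms of use (which no parser can read for you) add a further contractual layer. A small standard-library sketch against a hypothetical site:

```python
from urllib import robotparser

# blog.example is a placeholder domain. robots.txt is only one signal,
# and the proposal does not say which signals would count toward
# "lawful access"; this only shows how "publicly readable" and
# "open to automated crawling" can diverge.
rp = robotparser.RobotFileParser()
rp.set_url("https://blog.example/robots.txt")
rp.read()  # fetches and parses the file

page = "https://blog.example/posts/ai-licensing"
if rp.can_fetch("TrainingCorpusBot", page):
    print("robots.txt permits crawling; terms of use still need review")
else:
    print("page is public to readers but closed to automated crawlers")
```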

Despite these uncertainties, the proposal is valuable because it reframes the AI copyright debate as a question of institutional design rather than ideological conflict. It acknowledges that creators deserve remuneration and that developers need predictable access. It also recognises that clarity is essential for innovation. If designed well, the licensing architecture could reduce friction, encourage formalisation, and give India a coherent position in global AI regulation. The consultation phase will be critical because the model’s success will depend on definitions, governance rules, enforcement mechanisms and economic thresholds that the current draft leaves open.
