HARMONIZING AI MODEL TRAINING WITH THE DIGITAL PERSONAL DATA PROTECTION ACT, 2023 AND THE EUROPEAN UNION AI REGULATORY FRAMEWORK: A LEGAL MODEL FOR CROSS-BORDER DATA SOURCING AND INTELLECTUAL PROPERTY LIABILITY

 ABSTRACT

The rapid development of artificial intelligence (AI) has greatly increased dependence on large-scale data for model training and optimization, data that is often obtained across multiple jurisdictions through automated collection. This practice raises intricate legal issues at the intersection of data protection law, cross-border data governance, and intellectual property (IP) law.[1] The Digital Personal Data Protection Act, 2023 in India and the regulatory framework of the European Union, comprising the General Data Protection Regulation[2] and the Artificial Intelligence Act, represent two different but highly influential approaches to regulating data processing and AI systems.[3]

This paper examines the extent to which current legal regulations govern the training of AI models, particularly with respect to lawful data processing, consent, and purpose limitation. It also examines the regulatory problems of cross-border data flows, where AI developers must operate within conflicting and possibly overlapping legal regimes. The paper further addresses intellectual property issues, such as the use of copyrighted content in training datasets and the absence of clear rules on attribution and liability.[4]

Using a qualitative, doctrinal, and comparative approach, the paper identifies the essential gaps in existing legal regimes, namely their failure to account for the realities of large-scale AI training.[5] It then proposes a unified legal framework built on a three-layer model comprising (i) data legitimacy evaluation, (ii) cross-border compliance architecture, and (iii) proportional distribution of IP liability. The research contributes to the emerging debate on global AI regulation by offering a systematic and practical way to reconcile innovation with legal responsibility.[6]

KEYWORDS

Artificial Intelligence; Data Protection Law; Cross-Border Data Transfer; Intellectual Property Liability; AI Regulation; Data Governance

INTRODUCTION

Artificial intelligence has become a transformative force in the digital economy, fundamentally altering how data is gathered, processed, and used. AI systems, especially machine learning and generative models, require extensive datasets to be trained, optimized, and deployed. These data are often obtained from a variety of sources, such as publicly available websites, commercial databases, and user-generated content.[7]

While such practices have accelerated technological innovation, they have also created major legal challenges. The crux of the problem is a structural conflict between the way AI systems operate and the principles of data protection law. AI development depends on the extensive aggregation and reuse of data, whereas legal frameworks restrict such practices through doctrines such as purpose limitation, data minimization, and lawful processing.

The Digital Personal Data Protection Act, 2023[8] in India establishes a consent-based structure for the processing of digital personal data. Although the Act does not specifically mention AI systems, its scope extends to AI training activities involving personal data. Nevertheless, the lack of AI-specific guidance raises questions about practices such as large-scale data scraping and the secondary use of data.[9]

By contrast, the European Union uses a multi-layered regulatory model that combines the General Data Protection Regulation (GDPR) with the Artificial Intelligence Act. While the GDPR sets out the principles of transparency, accountability, and lawful processing, the AI Act establishes a risk-based framework for AI systems.[10] Despite this detailed architecture, serious ambiguities remain, especially regarding training datasets and publicly accessible information.

Compliance is further complicated by the transnational nature of AI development. AI models are frequently trained on datasets drawn from multiple jurisdictions, so that the same model may be subject to conflicting and overlapping legal requirements.[11] This raises difficult questions of jurisdiction, applicable law, and enforcement.

Moreover, intellectual property law leaves unresolved issues around the training of AI models. Training datasets often contain copyrighted material, and existing laws are silent on whether such use amounts to infringement and on how liability is to be distributed. These uncertainties extend to the outputs of AI, where questions of authorship and ownership arise.[12]

Although various regulatory regimes exist, there is no unified legal framework that covers the intersection of data protection, cross-border regulation, and intellectual property in AI training. This paper aims to address that gap by presenting a harmonized legal model grounded in the technological reality of AI systems.[13]

RESEARCH METHODOLOGY

This research employs a qualitative, doctrinal, and comparative approach to examine the legal issues that arise in AI model training under data protection and intellectual property law.

Under the doctrinal approach, statutory provisions, regulatory instruments, and legal principles are analysed with reference to the Digital Personal Data Protection Act, 2023 of India[14] and the General Data Protection Regulation and Artificial Intelligence Act of the European Union. This entails a thorough review of legislative documents, policy statements, and regulatory provisions in order to determine the scope of these provisions and their relevance to the training of AI models.[15]

A comparative legal analysis is conducted to identify the areas of convergence and divergence in the Indian and European approaches to data protection and AI governance. The comparison focuses on key principles such as consent, lawfulness of processing, purpose limitation, accountability, and risk-based regulation. This comparative lens highlights the regulatory gaps and inconsistencies that arise in cross-border data ecosystems.[16]

The paper also has a conceptual and analytical element, examining the overlap between data protection law and intellectual property law in relation to AI training datasets. Since binding jurisprudence that specifically addresses AI training practices is scarce, the study relies on secondary sources such as scholarly literature, policy briefs, and recent regulatory commentary.[17]

The paper further uses hypothetical case modelling to illustrate real-life scenarios such as cross-border data sourcing, data scraping, and AI-generated output. These hypothetical models are used to test the adequacy of current legal frameworks and to show the practical implications of regulatory gaps.[18]

The methodology aims not merely to interpret the existing law but also to produce a normative structure that addresses the challenges identified. In this respect, the study takes a problem-solution approach, culminating in the proposal of a harmonized legal framework for AI training compliance and liability distribution.[19]

LITERATURE REVIEW

The regulation of artificial intelligence (AI) model training has become a challenging area of legal study, especially where it touches upon data protection law, cross-border data regulation, and intellectual property (IP) law. The current literature highlights the major conflicts between legal systems and the technological realities of AI systems, but it remains rather fragmented.[20]

A significant body of literature questions the compatibility of AI with the principles of data protection. Sandra Wachter and Brent Mittelstadt argue that frameworks like the General Data Protection Regulation (GDPR) are structurally ill-suited to AI, especially because of their reliance on principles such as purpose limitation and data minimization. These principles are difficult to reconcile with machine learning systems, which rely on massive data aggregation and reuse.[21] In the same vein, Paul Voigt and Axel von dem Bussche observe that although the GDPR offers a comprehensive compliance framework, it is unclear about AI training data, particularly publicly available data and automated data scraping.[22]

Scholarship on the Digital Personal Data Protection Act, 2023 raises the same concerns in the Indian context. Graham Greenleaf and Pavan Duggal[23] note that the consent-centric system is inadequate for AI-specific practices, including large-scale data collection and secondary use of data, which leaves a regulatory gap.[24]

A second important line of literature concerns cross-border data governance. Anupam Chander[25] and Uyen Le point to the difficulties posed by multi-jurisdiction datasets and argue that current mechanisms, including adequacy decisions, are poorly suited to AI systems. Christopher Kuner also highlights the problem of regulatory fragmentation, in which conflicting legal regimes impose diverse obligations on AI developers, making compliance difficult in a global data ecosystem.[26]

Intellectual property law is another field in which AI and the law interact. Mark Lemley argues that using copyrighted content in AI training could be permissible under doctrines such as fair use, but such interpretations are jurisdiction-specific.[27] Likewise, Pamela Samuelson observes that current copyright systems were not designed to address machine learning activities, which leaves uncertainty over infringement and attribution.[28]

Accountability and risk-based regulation feature in policy frameworks such as those of the Organisation for Economic Co-operation and Development and the Artificial Intelligence Act. Nevertheless, researchers note that these instruments are mostly concerned with the deployment of AI rather than with upstream operations, i.e., data sourcing and model training. Despite these contributions, the literature reveals three key gaps: (i) there is no clarity about how AI training data is classified in legal terms, (ii) there is no consistent framework for cross-border compliance in multi-origin datasets, and (iii) there are no adequate models for assigning IP liability across the AI lifecycle. To fill these gaps, this paper proposes a three-layer model that harmonizes data legitimacy, cross-border compliance, and proportional IP liability.[29]

SUGGESTIONS

In view of the gaps identified in the current legal regulations, this paper proposes a three-layer harmonized regulatory framework to govern the training of AI models in a way that supports technological advancement while preserving legal accountability. The proposed framework combines the concepts of data protection, cross-border governance, and intellectual property law in a single comprehensive and systematic system.

Layer I Data Legitimacy Framework

The first layer addresses the underlying problem of clarifying the legality of the information used in AI training. Existing legislation, such as the Digital Personal Data Protection Act, 2023 of India[30] and the General Data Protection Regulation (GDPR)[31] of the European Union, focuses on lawful processing and consent. These principles are not, however, fully adapted to the realities of large-scale AI training, where data can be aggregated, repurposed, and processed outside its original context.[32]

To address this weakness, the paper proposes a hierarchical taxonomy that sorts datasets into four categories:

  • personal data,
  • publicly available data,
  • anonymized data, and
  • copyrighted data.

Personal data remains subject to the consent and lawful-basis provisions of Article 6 of the GDPR[33], whereas the use of publicly available data calls for a contextual assessment rather than a presumption of free use. Anonymized data may be used subject to safeguards against re-identification, in line with Recital 26 of the GDPR[34]. Copyrighted data, however, must be governed by licensing arrangements or statutory exceptions such as fair use or the text-and-data mining provisions of EU law[35].
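The four-category taxonomy can be illustrated with a minimal Python sketch. It is an illustration only, not part of the proposed framework: the names `DataCategory`, `REQUIRED_CHECK`, and `required_checks` are hypothetical, the check descriptions paraphrase this paper rather than any statutory text, and the sketch assumes that a single dataset may fall into more than one category at once.

```python
from enum import Enum


class DataCategory(Enum):
    PERSONAL = "personal data"
    PUBLICLY_AVAILABLE = "publicly available data"
    ANONYMIZED = "anonymized data"
    COPYRIGHTED = "copyrighted data"


# Condition each category must meet before use in training.
# Paraphrased from the paper's description; not statutory language.
REQUIRED_CHECK = {
    DataCategory.PERSONAL: "consent or another lawful basis (GDPR Art. 6)",
    DataCategory.PUBLICLY_AVAILABLE: "contextual assessment; no presumption of free use",
    DataCategory.ANONYMIZED: "re-identification safeguards (GDPR Recital 26)",
    DataCategory.COPYRIGHTED: "licence or statutory exception (fair use / TDM)",
}


def required_checks(categories):
    """Return the checks owed by a dataset that carries the given categories.

    Iterating over DataCategory keeps the output in a stable order.
    """
    return [REQUIRED_CHECK[c] for c in DataCategory if c in categories]


# A scraped news archive might be both personal and copyrighted.
checks = required_checks({DataCategory.COPYRIGHTED, DataCategory.PERSONAL})
```

Treating the categories as cumulative rather than exclusive is a design assumption of the sketch: a dataset that is both personal and copyrighted owes both checks.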

In this regard, the paper introduces a Contextual Consent Standard, under which the legality of data use is evaluated in terms of (a) the context of the initial disclosure, (b) the reasonable expectations of the data subject, and (c) the scale and purpose of the AI processing. This approach is consistent with evolving interpretations of purpose limitation and legitimate interest in data protection law[36] and offers a more workable alternative to stricter consent-based models.

Layer II Cross-Border Compliance Architecture

The second layer deals with the regulatory issues that arise from cross-border flows of data in AI systems. AI models are usually trained on data drawn from several jurisdictions, which gives rise to conflicting legal requirements. Existing mechanisms such as adequacy decisions and standard contractual clauses under the GDPR fall short of the requirements of multi-origin AI datasets.

The paper fills this gap by proposing a Cross-Border AI Compliance Architecture made up of four components. First, data origin mapping requires developers to keep verifiable records of dataset provenance. Second, jurisdiction tagging assigns legal regimes to the different parts of a dataset, enabling compliance to be tracked. Third, a dual compliance mechanism ensures compliance with several regulatory frameworks at once, such as consent-based compliance under Indian law and risk-based compliance under the EU Artificial Intelligence Act[37]. Fourth, compliance-by-design systems embed legal safeguards into AI development processes, in line with the principle of data protection by design and by default in Article 25 of the GDPR[38]. Such an architecture would promote transparency, reduce regulatory conflicts, and establish accountability in cross-border AI operations.
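The first two components, data origin mapping and jurisdiction tagging, can be sketched in Python. This is a minimal sketch under stated assumptions: `ProvenanceRecord`, `REGIMES`, and `regimes_for` are hypothetical names, and the table mapping jurisdiction tags to regimes is a placeholder. Whether a given regime actually applies depends on legal analysis, not on a lookup by data origin.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder mapping from jurisdiction tag to the regimes named in this paper.
# Real applicability depends on legal analysis, not on a lookup table.
REGIMES = {
    "IN": ["DPDP Act 2023 (consent-based)"],
    "EU": ["GDPR", "AI Act (risk-based)"],
}

UNMAPPED = "UNMAPPED: needs legal review"


@dataclass(frozen=True)
class ProvenanceRecord:
    source_id: str      # hypothetical identifier for one dataset segment
    origin: str         # jurisdiction tag, e.g. "IN" or "EU"
    collected_via: str  # e.g. "licensed", "scraped", "user-submitted"


def regimes_for(records):
    """Group dataset segments under every regime their origin tag maps to."""
    obligations = defaultdict(list)
    for rec in records:
        # Unknown origins are flagged instead of being treated as unregulated.
        for regime in REGIMES.get(rec.origin, [UNMAPPED]):
            obligations[regime].append(rec.source_id)
    return dict(obligations)


records = [
    ProvenanceRecord("forum-dump-01", "EU", "scraped"),
    ProvenanceRecord("survey-02", "IN", "user-submitted"),
    ProvenanceRecord("corpus-03", "BR", "licensed"),
]
obligations = regimes_for(records)
```

In this sketch a segment with an unrecognized origin tag is routed to a review bucket instead of being dropped, which reflects the dual compliance idea that no part of a multi-origin dataset should escape assessment.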

Layer III Intellectual Property Liability Framework

The third layer addresses the unresolved problem of intellectual property liability in the training of AI models. Current copyright laws lack clear guidance on whether the use of protected works in training datasets amounts to infringement. Although some jurisdictions, like the European Union, have created text-and-data mining exceptions[39], these are narrow and conditional.

To overcome this ambiguity, the paper proposes a Training Data Attribution Doctrine, which requires that training datasets be documented, data sources be classified, and audit trails be kept. This improves transparency and traceability in AI systems. The doctrine is complemented by a Proportional Liability Model that distributes responsibility across the AI lifecycle. Under this model, data collectors are held responsible for unlawful acquisition, AI developers for unethical training practices, and deployers for harmful or infringing outputs. This approach is in line with emerging principles of risk allocation in AI governance and would ensure an equitable sharing of liability.[40]
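The allocation rule of the Proportional Liability Model can be sketched in Python as a mapping from lifecycle stage to responsible actor, applied to audit-trail entries. This is a simplified illustration: `Stage`, `AuditEntry`, and `allocate` are hypothetical names, and the one-actor-per-stage mapping is an assumption made for brevity. It does not model how liability would be apportioned when several actors contribute to the same harm.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    COLLECTION = "collection"
    TRAINING = "training"
    DEPLOYMENT = "deployment"


# Actor answerable at each lifecycle stage, as described in this paper.
RESPONSIBLE = {
    Stage.COLLECTION: "data collector",
    Stage.TRAINING: "AI developer",
    Stage.DEPLOYMENT: "deployer",
}


@dataclass(frozen=True)
class AuditEntry:
    dataset_id: str  # hypothetical identifier from the audit trail
    stage: Stage     # lifecycle stage at which the issue arose
    issue: str       # short description of the issue


def allocate(entries):
    """Map each responsible actor to the issues logged at its stage."""
    result = {}
    for entry in entries:
        result.setdefault(RESPONSIBLE[entry.stage], []).append(entry.issue)
    return result


trail = [
    AuditEntry("corpus-03", Stage.COLLECTION, "unlicensed acquisition"),
    AuditEntry("corpus-03", Stage.DEPLOYMENT, "infringing output"),
]
allocation = allocate(trail)
```

The sketch depends on the audit trail recording the stage at which each issue arose, which is why the attribution doctrine's documentation duties come before the liability model.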

Additional Recommendations

In addition to the three-layer framework, the paper recommends the creation of AI-specific regulatory provisions under Indian law, especially under the Digital Personal Data Protection Act, 2023.[41] Although the Act introduces a framework for personal data protection, it is silent on the complex issues raised by AI model training, such as automated decision-making, large-scale reuse of data, and algorithmic opacity. Accordingly, Indian regulators need to issue additional guidance or delegated legislation clarifying how data protection principles apply to AI systems, particularly with regard to the secondary use of data and automated operations.

Moreover, the paper highlights the need to strengthen mechanisms of international cooperation in AI governance. Given the global character of AI development, unilateral regulatory approaches are insufficient. Coordination between jurisdictions, especially between India and the European Union, is required so that legal regimes like the General Data Protection Regulation and the Artificial Intelligence Act can be made interoperable. This may be achieved through bilateral or multilateral agreements on data sharing standards, compliance recognition, and enforcement cooperation.[42]

Another valuable recommendation is the standardization of documentation practices and dataset governance. AI developers should be required to keep detailed records of data sources, categories, and processing methods. Standardized documentation templates (similar to model cards or datasheets for datasets) can improve transparency and facilitate regulatory audits. This would not only enhance accountability but also allow risks to be assessed and mitigated more effectively.

The paper also suggests creating specialized regulatory bodies or AI oversight agencies in India to monitor compliance, provide guidance, and resolve complaints about AI systems. These bodies should coordinate with existing data protection authorities to maintain a unified regulatory strategy.[43]

These measures are consistent with international policy frameworks, such as the OECD AI Principles, which emphasize transparency, accountability, and human-centred AI. Together, these recommendations would help in the design of a harmonized, flexible, and globally compatible regulatory ecosystem capable of responding to the emerging challenges of AI model training.[44]

CONCLUSION

Artificial intelligence has proved to be a transformative force in the digital era, changing the very way data is gathered, processed, and used. However, the rapid development of AI technologies, especially in relation to model training, has revealed considerable flaws in the current legal frameworks on data protection and intellectual property.

This paper has shown that existing regulatory frameworks, such as the Digital Personal Data Protection Act, 2023 of India[45] and the European system, including the General Data Protection Regulation and the Artificial Intelligence Act,[46] are not fully equipped to handle the challenges of AI model training. Although these frameworks offer significant protection, they operate in separate legal spheres and do not offer a coherent way of dealing with problems like international data transfers and intellectual property liability.

The issues identified in this paper, especially those involving consent, reuse of data, conflicts of jurisdiction, and copyright infringement, demonstrate the need for a more integrated and flexible legal system. Traditional legal principles remain crucial, but they must be adapted to the scale, automation, and transnational nature of AI systems. To this end, the paper has recommended a harmonized legal framework based on three major dimensions, namely data legitimacy, cross-border compliance, and proportional sharing of liability.[47]

Ultimately, the future of AI governance will depend on the capacity of the law to adapt to technological change. This will require not only reform of existing laws but also more coordinated efforts between jurisdictions and regulatory authorities. There is an urgent need for a unified approach to AI regulation to ensure that technological advancement is realized in a legally and socially responsible way.[48]

For India, this is an opportunity to develop a progressive regulatory framework that reconciles innovation with the protection of fundamental rights. At the global level, greater collaboration and uniformity will play a significant role in resolving the issues raised by cross-border AI systems. Only through such coordinated reform can the law respond effectively to the transformative effects of artificial intelligence.

ARSHI KHAN

PRESTIGE INSTITUTE OF MANAGEMENT DEPARTMENT OF LAW INDORE


[1] Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 27–30 (4th ed. 2020).

[2] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation), 2016 O.J. (L 119) 1.

[3] Anupam Chander & Uyên P. Lê, Data Nationalism, 64 Emory L.J. 677, 690–95 (2015).

[4] Digital Personal Data Protection Act, 2023, No. 22 of 2023, § 4 (India).

[5] Christopher Kuner, Transborder Data Flows and Data Privacy Law 45–52 (Oxford Univ. Press 2013).

[6] Mark A. Lemley & Bryan Casey, Fair Learning, 99 Tex. L. Rev. 743, 748–60 (2021).

[7] Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 27–32 (4th ed. 2020).

[8] Digital Personal Data Protection Act, 2023, No. 22 of 2023, §§ 4, 6 (India).

[9] Ian Goodfellow, Yoshua Bengio & Aaron Courville, Deep Learning 98–105 (MIT Press 2016).

[10] Mark A. Lemley & Bryan Casey, Fair Learning, 99 Tex. L. Rev. 743, 748–52 (2021).

[11] Mireille Hildebrandt, Law for Computer Scientists and Other Folk 215–20 (Oxford Univ. Press 2020).

[12] Jane C. Ginsburg & Luke Ali Budiardjo, Authors and Machines, 34 Berkeley Tech. L.J. 343, 350–60 (2019).

[13] OECD, Artificial Intelligence, Machine Learning and Big Data in Finance 14–18 (2021).

[14] Digital Personal Data Protection Act, 2023, No. 22 of 2023, §§ 4–6 (India).

[15] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation), 2016 O.J. (L 119) 1.

[16] Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), 2024 O.J. (

[17] Mark A. Lemley & Bryan Casey, Fair Learning, 99 Tex. L. Rev. 743, 748–55 (2021).

[18] Konrad Zweigert & Hein Kötz, An Introduction to Comparative Law 34–38 (Tony Weir trans., Oxford Univ. Press 3d ed. 1998).

[19] H.L.A. Hart, The Concept of Law 123–30 (3d ed. 2012).

[20] Sandra Wachter and Brent Mittelstadt, ‘A Right to Reasonable Inferences’ (2019) 2 Columbia Business Law Review 494.

[21] Brent Mittelstadt et al, ‘The Ethics of Algorithms’ (2016) 3 Big Data & Society.

[22] Paul Voigt and Axel von dem Bussche, The EU General Data Protection Regulation (GDPR): A Practical Guide (Springer 2017).

[23] Pavan Duggal, ‘India’s Data Protection Framework and AI Challenges’ (2023).

[24] Graham Greenleaf, Asian Data Privacy Laws (OUP 2014).

[25] Anupam Chander, How Law Made Silicon Valley (OUP 2018).

[26] Christopher Kuner, Transborder Data Flows and Data Privacy Law (OUP 2013).

[27] Mark Lemley, ‘Fair Learning’ (2021) 99 Texas Law Review 743.

[28] Pamela Samuelson, ‘Implications of Machine Learning for Copyright Law’ (2020).

[29] OECD, AI Principles (2019).

[30] Paul Voigt and Axel von dem Bussche, The EU General Data Protection Regulation (GDPR): A Practical Guide (Springer 2017).

[31] Anupam Chander, How Law Made Silicon Valley (Oxford University Press 2018).

[32] Mark Lemley, ‘Fair Learning’ (2021) 99 Texas Law Review 743.

[33] Digital Personal Data Protection Act, 2023.

[34] General Data Protection Regulation (Regulation (EU) 2016/679).

[35] Directive (EU) 2019/790 on Copyright in the Digital Single Market, arts 3–4.

[36] Artificial Intelligence Act (Regulation (EU) 2024/1689).

[37] Sandra Wachter and Brent Mittelstadt, ‘A Right to Reasonable Inferences: Re-thinking Data Protection Law in the Age of Big Data and AI’ (2019) 2 Columbia Business Law Review 494.

[38] Paul Voigt and Axel von dem Bussche, The EU General Data Protection Regulation (GDPR): A Practical Guide (Springer 2017).

[39] Directive (EU) 2019/790 on Copyright in the Digital Single Market (DSM Directive), arts 3–4 (Text and Data Mining exceptions).

[40] Mark Lemley, ‘Fair Learning’ (2021) 99 Texas Law Review 743.

[41] Digital Personal Data Protection Act, 2023.

[42] General Data Protection Regulation (Regulation (EU) 2016/679).

[43] Artificial Intelligence Act (Regulation (EU) 2024/1689).

[44] Organisation for Economic Co-operation and Development, OECD Principles on Artificial Intelligence (2019).

[45] Digital Personal Data Protection Act, 2023, No. 22 of 2023, INDIA CODE (2023).

[46] General Data Protection Regulation, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016, 2016 O.J. (L 119) 1.

[47] Artificial Intelligence Act, Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024, 2024 O.J.

[48] Graham Greenleaf, Asian Data Privacy Laws (Oxford Univ. Press 2014).
