Công Nghệ Thông Tin - Information Technology

Generative AI training data sets are now trackable – and often legally complicated

hoidabunko
Last updated: 2023/11/06 at 10:27 AM
Published November 6, 2023

A new online tool allows users to identify, track and learn about the legal status of training data sets for generative AI, and a quick glance shows that many may have licensing issues.

The tool, dubbed the Data Provenance Explorer, is the result of a joint effort between machine learning and legal experts from MIT, generative AI API provider Cohere, and 11 other organizations, with Harvard Law School, Carnegie Mellon University and Apple among the contributors. The Data Provenance Explorer lets researchers, journalists and anyone else search through thousands of AI training databases and trace the “lineage” of widely used data sets.

The idea is to provide a way to explore the sometimes murky world of training data used to develop generative AI. In an official statement announcing the Data Provenance Explorer, the team behind it described a “data transparency crisis” that could complicate the development and commercial use of generative AI systems.

Crowdsourced data sets lack licenses

“Crowdsourced aggregators like GitHub, Papers with Code, and many of the open source LLMs [large language models] trained from data on these aggregators, have an extremely high proportion of missing data licenses … ranging from 72% to 83%,” the group said. “In addition, the licenses that are assigned by crowdsourced aggregators frequently allow broader use than the original intent expressed by the authors of a data set.”

The industry appears to be well aware of the need for responsibly developed AI, according to Kathy Lange, a research director for IDC. The headlong rush to deploy generative AI has created a public focus on the safe and legal use of data, she said.

“Understanding the provenance of the data; how it was collected, processed, and transformed can impact the trust in AI model results,” Lange said. “AI vendors prioritizing data provenance will have a leg-up in the market for customers requiring transparency, accountability, and compliance initiatives.”

In certain respects, AI data has become nothing less than a battleground. Lange highlighted the recent introduction of the Nightshade tool, which subtly changes digital art in such a way as to confuse AI creators attempting to use copyrighted works for training data. Moreover, authors and other copyright holders have begun to take legal action against the use of their works in generative AI training – comedian and author Sarah Silverman is among those suing OpenAI for this reason. However, the legal landscape for those claims remains murky in many respects.
