OpenAI wants to work with organizations to build new AI training datasets

OpenAI is rolling out a new partnership program to collect datasets from third parties, which it intends to use to train its AI models. The initiative, OpenAI Data Partnerships, will seek large-scale private and public datasets that the company says are “not already easily accessible online to the public.” The company says the data doesn't necessarily have to be quantitative or in text format; the program will also accept images, audio or video.

Notably, the company says it's on the lookout for data on “any topic” and in “any language” so long as it “expresses human intention,” which it likens to long-form essays or transcribed conversations. Human-centric data collected by OpenAI is expected to help the company improve tools like its automatic speech recognition system, which is used to transcribe spoken words. The initiative also lines up with ChatGPT’s recent expansion to engage with users in a conversational manner. Exposing its AI models to more data that teaches them how to hold up a conversation will only further improve this feature and the tools that follow.

Announcing OpenAI Data Partnerships — help steer the future of AI by collaborating on public and private datasets with us. https://t.co/4tbi5SZ6sS

— OpenAI (@OpenAI) November 9, 2023

The model testing conducted throughout the data partnership program will also naturally expand the capabilities of OpenAI’s consumer-facing ChatGPT, which has been updated to provide users with more complex and meaningful responses. OpenAI says it has already started working with interested organizations, including authoritative bodies like the Icelandic government. Through curated datasets, OpenAI says it's working to improve GPT-4’s ability to comprehend queries made in the Icelandic language.

If a private or public organization wants to participate in the program, a representative can apply on the company’s website and share information on the type and size of the data they intend to share. There are two pathways for datasets. The first is the Open-Source archive, which is ideal for datasets relevant to training language models. However, submissions made to it will be public for anyone to use. Alternatively, OpenAI says a company can submit information through its private dataset pathway. That data will be funneled into training proprietary AI models, which the company says include its “foundation models” and “fine-tuned and custom models.” This pathway is recommended for companies or institutions that want to keep their data confidential. Even so, OpenAI says it is not looking for datasets that contain sensitive or personal information.

ChatGPT has already set records for its soaring user base. It has users around the world, meaning data privacy will only continue to be a focal point for the tool. Previously, Samsung employees were put in the hot seat for leaking sensitive information to the AI model. While OpenAI does not use data generated by its API to train its models unless a user explicitly submits information through an opt-in form, all eyes will be on how the company handles the data collected through this initiative, especially the private datasets.
