To help you get started, we've compiled a variety of datasets and APIs to draw inspiration from. Many of these datasets have already been cleaned and normalized, so they are ready to explore with AI tools. Many are also licensed for research use only. If you want to use the data in your startup, read the associated license agreements to check for commercial restrictions. You are not limited to the datasets below: you may discover other open datasets that spark an idea, or you may bring your own proprietary data.
If there's a dataset you think we should add to the list, please send it to us.
Public media data available on the internet falls into several broad categories. These categories overlap and vary with the sources and platforms that provide the data. Common categories include:
1. Social Media Data: Data from various social media platforms, including posts, comments, likes, shares, followers, and user profiles. This data can provide insights into trends, user behavior, and audience engagement.
2. Web Analytics Data: Data related to website traffic, user behavior, and audience demographics. Web analytics data can help understand website performance and user interactions.
3. News and Articles: Textual data from news articles, blogs, and online publications covering a wide range of topics. This category can include both current and historical media content.
4. Broadcast and Streaming Data: Data related to television and radio broadcasts, streaming services, and video-sharing platforms. This category may include TV ratings, program schedules, and streaming metrics.
5. Advertising Data: Data related to advertising spending, ad impressions, click-through rates, and ad performance metrics. This data can help understand advertising trends and effectiveness.
6. Market Research Data: Data from market research reports and surveys related to media consumption, consumer preferences, and industry trends.
7. Public Opinion and Polling Data: Data from public opinion polls and surveys on media-related topics, including media trust, political coverage, and social issues.
8. Government Data: Data published by government agencies related to media, communication, broadcasting, and regulatory information.
9. Academic Research Data: Data from academic studies and research papers related to media studies, journalism, media effects, and media content analysis.
10. Historical Archives: Digital archives of historical media content, including newspapers, magazines, photographs, and audio/video recordings.
11. Social Sentiment Analysis: Data that measures and quantifies public sentiment and opinion expressed on social media and other online platforms.
12. Media Ratings and Rankings: Data related to media performance, audience ratings, and rankings for TV shows, movies, music, and other media content.
13. Metadata and API Data: Data accessible through APIs (Application Programming Interfaces) that provide structured information from various media platforms and databases.
14. Financial Data: Data related to media companies' financial performance, stock prices, mergers, and acquisitions.
15. Copyright and Licensing Data: Information about copyright licenses, usage rights, and permissions for media content.
16. Media Bias and Fact-Checking Data: Data related to media bias assessments and fact-checking of news articles and sources.
17. Public Records and Legal Data: Data from legal proceedings, court cases, and public records related to media and journalism.
18. Cultural and Entertainment Data: Data on cultural trends, entertainment events, celebrity news, and music charts.
19. Geospatial Media Data: Data that incorporates geographical information, such as media coverage by location or regional media consumption patterns.
20. User-Generated Content: Data generated by users on platforms like forums, review sites, and community discussion boards.
1. Statista: Description: Statista is a leading provider of market and consumer data. They offer a wide range of statistics and studies on various media-related topics, including advertising, social media, TV, radio, print media, and more. - Website: https://www.statista.com/
2. Pew Research Center: Description: Pew Research Center is a nonpartisan fact tank that conducts public opinion polling, demographic research, content analysis, and other data-driven social science research. They often publish reports and data related to media, journalism, and social media usage. - Website: https://www.pewresearch.org/
3. Data.gov: Description: Data.gov is the home of the U.S. government's open data. It provides access to a wide range of datasets, including media and communication-related data, from various federal agencies. - Website: https://www.data.gov/
4. European Data Portal: Description: The European Data Portal, now merged into data.europa.eu, offers access to open data published by countries and organizations across Europe. It covers various media-related datasets, such as broadcasting, media usage, and more. - Website: https://data.europa.eu/
5. Kaggle: Description: Kaggle is a platform for data science and machine learning enthusiasts. It hosts various datasets, and you can find media-related data by searching for relevant keywords. - Website: https://www.kaggle.com/datasets
6. World Bank Open Data: Description: The World Bank provides free and open access to a comprehensive set of data about development in countries around the globe. You can find data on media usage, telecommunications, and more. - Website: https://data.worldbank.org/
7. United Nations Data: Description: The United Nations offers access to a wide range of data related to media, information, and communication, as well as other global indicators. - Website: https://data.un.org/
8. Internet Archive: Description: The Internet Archive is a digital library that provides access to archived versions of websites, audio, video, and other digital materials. It can be a valuable resource for historical media-related data. - Website: https://archive.org/
9. Media Cloud: Description: Media Cloud is an open-source platform that collects and analyzes news articles from around the web, providing insights into media coverage and trends on various topics. - Website: https://mediacloud.org/
10. Socialbakers: Description: Socialbakers, rebranded as Emplifi in 2021, is a social media analytics and publishing platform that provides data and insights on social media performance, content, and audience engagement for various platforms. - Website: https://emplifi.io/
11. Nielsen Media Research: Description: Nielsen is a global measurement and data analytics company that offers audience measurement and insights for television, radio, and digital media. - Website: https://www.nielsen.com/us/en/solutions/measurement/
12. Google Trends: Description: Google Trends allows you to explore search trends and popularity of specific keywords over time, which can provide insights into media and user interests. - Website: https://trends.google.com/trends/
13. Ad Age Datacenter: Description: Ad Age Datacenter provides advertising and marketing data, including ad spending, media agency rankings, and industry reports. - Website: https://adage.com/datacenter
14. comScore: Description: comScore is a leading cross-platform measurement company that provides data on audience behavior, advertising effectiveness, and media consumption across various digital platforms. - Website: https://www.comscore.com/
15. Kantar Media: Description: Kantar Media offers media intelligence, monitoring, and analytics services, providing data on media consumption, advertising, and brand performance. - Website: https://www.kantarmedia.com/
16. Alexa Internet: Description: Alexa Internet, an Amazon subsidiary, offered website traffic data and rankings that were helpful for understanding online media usage. Amazon retired the service on May 1, 2022, so it is no longer a live data source. - Website: https://www.alexa.com/
17. Social Mention: Description: Social Mention is a real-time social media search and analysis tool that aggregates user-generated content across various platforms, helping to monitor media mentions and sentiment. - Website: https://socialmention.com/
18. Reddit API: Description: The Reddit API allows developers to access and retrieve data from Reddit, which can include discussions, posts, and comments related to media topics. - Website: https://www.reddit.com/dev/api/
19. Twitter API: Description: The Twitter API (the platform is now called X) provides access to public posts and user data, which can be useful for analyzing media trends and sentiment on the platform. Since 2023, most read access has required a paid tier. - Website: https://developer.twitter.com/en/docs/twitter-api
20. YouTube Data API: Description: The YouTube Data API enables developers to retrieve information about YouTube videos, channels, and user activities, providing valuable media-related data from the platform. - Website: https://developers.google.com/youtube/v3
21. Facebook Graph API: Description: The Facebook Graph API allows developers to access public data from Facebook, including posts, comments, and user information, which can provide insights into media trends and interactions on the platform. - Website: https://developers.facebook.com/docs/graph-api
22. Instagram Graph API: Description: The Instagram Graph API provides access to public data from Instagram, such as posts, comments, and user profiles, enabling analysis of media content and engagement on the platform. - Website: https://developers.facebook.com/docs/instagram-api
23. SimilarWeb: Description: SimilarWeb is a digital market intelligence platform that offers data on website traffic, user behavior, and app performance, providing insights into online media consumption. - Website: https://www.similarweb.com/
24. GDELT Project: Description: The Global Database of Events, Language, and Tone (GDELT) is an open-data initiative that monitors global media and events, providing a comprehensive database of news articles and events worldwide. - Website: https://www.gdeltproject.org/
25. Global Web Index: Description: Global Web Index is a market research company that provides data on digital consumer behavior, including media consumption, social media usage, and online trends. - Website: https://www.globalwebindex.com/
26. OpenSecrets: Description: OpenSecrets is a website that tracks money in U.S. politics, including campaign contributions and lobbying data, which can be relevant for analyzing media and political connections. - Website: https://www.opensecrets.org/
27. Media Bias/Fact Check: Description: Media Bias/Fact Check is a website that evaluates the bias and reliability of media sources, offering assessments of various news outlets' credibility. - Website: https://mediabiasfactcheck.com/
28. Freedom of the Press Index by Freedom House: Description: Freedom House published an annual Freedom of the Press report, which assessed media freedom around the world, through its 2017 edition. Its archived scores remain a source of data on press independence and censorship. - Website: https://freedomhouse.org/report-types/freedom-press
29. Reuters Institute Digital News Report: Description: The Digital News Report by the Reuters Institute for the Study of Journalism offers data and insights into news consumption, digital media, and changing media trends globally. - Website: https://reutersinstitute.politics.ox.ac.uk/digital-news-report
30. YouTube Trends Dashboard: Description: The YouTube Trends Dashboard highlights trending and popular videos on YouTube, providing data and insights into the platform's most-watched media content. - Website: https://trends.google.com/trends/youtube
31. Vimeo API: Description: The Vimeo API enables developers to access data from the Vimeo video platform, including video information, user data, and interactions, which can be valuable for media analytics. - Website: https://developer.vimeo.com/api
32. Crunchbase: Description: Crunchbase is a platform that provides data on companies, investors, and startup activity, including media-related ventures and investments. - Website: https://www.crunchbase.com/
33. MediaCloud API: Description: MediaCloud API offers access to the Media Cloud platform, allowing developers to analyze and explore media coverage on various topics and issues. - Website: https://mediacloud.org/tools/
34. OpenDataSoft: Description: OpenDataSoft is a data visualization and analysis platform that aggregates and shares public datasets, including media-related data from various sources. - Website: https://www.opendatasoft.com/
35. Radian6 (Now part of Salesforce): Description: Radian6, now part of Salesforce Marketing Cloud, offers social media monitoring and analytics tools, providing insights into media mentions and sentiment. - Website: (Radian6 has been integrated into Salesforce Marketing Cloud)
36. TVEyes: Description: TVEyes is a media monitoring platform that provides TV and radio broadcast data and analytics, enabling tracking and analysis of media coverage. - Website: https://www.tveyes.com/
37. NewsWhip: Description: NewsWhip offers tools for media monitoring, content discovery, and trend analysis, providing data on news and social media engagement. - Website: https://www.newswhip.com/
38. Shareablee: Description: Shareablee is a social media analytics platform that focuses on social media performance, content optimization, and audience engagement for media and brands. - Website: https://www.shareablee.com/
39. LexisNexis Newsdesk: Description: LexisNexis Newsdesk is a media monitoring and analysis tool that offers access to a vast collection of news articles and publications, including media coverage. - Website: https://www.lexisnexis.com/en-us/products/newsdesk.page
40. ProQuest: Description: ProQuest is a digital library of academic and professional content, including news articles, magazines, and other media publications, relevant for research and analysis. - Website: https://www.proquest.com/
41. Factiva: Description: Factiva, by Dow Jones, is a business news and information platform that provides access to a vast collection of global news and media sources, including newspapers, magazines, and newswires. - Website: https://professional.dowjones.com/factiva/
42. LexisNexis Academic: Description: LexisNexis Academic, since replaced by Nexis Uni (listed below), offered access to a wide range of news and media sources, legal documents, and business information for media research and analysis. - Website: https://www.lexisnexis.com/en-us/products/lexisnexis-academic.page
43. SnapStream: Description: SnapStream is a TV monitoring and clipping service that allows users to record, search, and analyze television broadcasts, providing media data from a wide range of channels and programs. - Website: https://www.snapstream.com/
44. Critical Mention: Description: Critical Mention is a media monitoring and analytics platform that offers real-time broadcast data and TV clip analysis, valuable for tracking media mentions and coverage. - Website: https://www.criticalmention.com/
45. Nexis Uni: Description: Nexis Uni, by LexisNexis, provides access to a comprehensive collection of news articles, legal documents, and business information, including media-related content. - Website: https://www.lexisnexis.com/en-us/gateway.page
46. FactSet: Description: FactSet is a financial data and analytics platform that offers market data and intelligence, including media and entertainment industry financials and analytics. - Website: https://www.factset.com/
47. Bloomberg Terminal: Description: Bloomberg Terminal is a professional financial information and trading platform that offers real-time data and news, including media industry-related data and financials. - Website: https://www.bloomberg.com/professional/product/terminal/
48. Refinitiv Eikon: Description: Refinitiv Eikon (Refinitiv is now part of LSEG) is a financial information and data analytics platform that provides news, data, and analysis on media companies and related industries. - Website: https://www.refinitiv.com/en/products/eikon-trading-software
49. Thinknum Media: Description: Thinknum Media is a platform that provides financial and alternative data, including media-related metrics and insights, useful for investors and researchers. - Website: https://media.thinknum.com/
50. Media Research Center: Description: The Media Research Center is a conservative media analysis organization that examines media bias and content, offering a different perspective on media-related data and analysis. - Website: https://www.mrc.org/
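Many of the open data portals above can be queried over plain HTTP. As one illustration, here is a minimal Python sketch against the World Bank Indicators API (v2), assuming its documented `country/{code}/indicator/{id}` path and its two-element JSON response (pagination metadata first, then the records). The helper names `build_indicator_url`, `parse_indicator`, and `fetch_indicator` are our own, and `IT.NET.USER.ZS` (individuals using the Internet, % of population) is only an example indicator.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the World Bank Indicators API, version 2.
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country: str, indicator: str, start: int, end: int) -> str:
    """Build a URL requesting one indicator for one country over a year range, as JSON."""
    query = urllib.parse.urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 100}
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload: list) -> list[tuple[int, float]]:
    """Return (year, value) pairs sorted by year, skipping years with no value.

    A successful response is a two-element list: pagination metadata, then records.
    Error responses contain only a message object, and an empty result may be null.
    """
    if len(payload) < 2 or payload[1] is None:
        return []
    points = [
        (int(row["date"]), float(row["value"]))
        for row in payload[1]
        if row.get("value") is not None
    ]
    return sorted(points)


def fetch_indicator(country: str, indicator: str, start: int, end: int) -> list[tuple[int, float]]:
    """Download and parse one indicator series (requires network access)."""
    url = build_indicator_url(country, indicator, start, end)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_indicator(json.load(resp))
```

For example, `fetch_indicator("US", "IT.NET.USER.ZS", 2015, 2020)` would return whatever (year, value) pairs the API has for the United States in that range. Keeping parsing separate from fetching makes it easy to test against a saved response without network access.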
1. Twitter API: - Data Offered: The Twitter API provides access to tweets, user profiles, timelines, and other social media interactions on the Twitter platform. - Use Cases: Developers can use this API to analyze trends, sentiment, and engagement on Twitter, and create applications that interact with Twitter data. - Website: https://developer.twitter.com/en/docs/twitter-api
2. YouTube Data API: - Data Offered: The YouTube Data API offers access to information about YouTube videos, channels, playlists, and user activities. - Use Cases: Developers can use this API to integrate YouTube data into applications, analyze video trends, and retrieve video metadata. - Website: https://developers.google.com/youtube/v3
3. Instagram Graph API: - Data Offered: The Instagram Graph API provides access to public media content, including posts, comments, and user profiles. - Use Cases: Developers can use this API to display Instagram content on websites, analyze user interactions, and track social media trends. - Website: https://developers.facebook.com/docs/instagram-api
4. Facebook Graph API: - Data Offered: The Facebook Graph API allows access to public posts, comments, likes, and user information on Facebook. - Use Cases: Developers can use this API to create Facebook integrations, analyze post engagement, and retrieve user data. - Website: https://developers.facebook.com/docs/graph-api
5. OMDb API (Open Movie Database): - Data Offered: The OMDb API offers information about movies and TV shows, including titles, release dates, ratings, and plot summaries. - Use Cases: Developers can use this API to build movie-related applications, retrieve film details, and create movie databases. - Website: https://www.omdbapi.com/
6. New York Times API: - Data Offered: The New York Times API provides access to articles, reviews, multimedia content, and metadata from The New York Times newspaper. - Use Cases: Developers can use this API to display news articles, analyze trending topics, and retrieve historical news data. - Website: https://developer.nytimes.com/
7. Spotify Web API: - Data Offered: The Spotify Web API allows access to music metadata, playlists, tracks, and user data on the Spotify music streaming platform. - Use Cases: Developers can use this API to integrate music content into applications, analyze music preferences, and create personalized playlists. - Website: https://developer.spotify.com/documentation/web-api/
8. TMDb API (The Movie Database): - Data Offered: The TMDb API offers information on movies and TV shows, including titles, posters, cast, crew, and user ratings. - Use Cases: Developers can use this API to build movie-related applications, create movie databases, and analyze user ratings. - Website: https://developers.themoviedb.org/3/getting-started/introduction
9. Google Trends API: - Data Offered: Google Trends data indicates the popularity of specific keywords over time and by location. Note that Google has not historically offered a generally available public Trends API, so check the current access terms before building on it. - Use Cases: Developers can use this data to analyze search trends, identify rising topics, and understand user interests. - Website: https://developers.google.com/trends/
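Most of the APIs in this list follow the same pattern: an HTTP GET with an API key and a few query parameters, returning JSON. The sketch below applies that pattern to the YouTube Data API v3 `videos` endpoint (`videos.list`), assuming a valid API key from the Google Cloud console and the documented `snippet` and `statistics` parts. The function names are hypothetical. The API returns counts as strings and omits counts the owner has hidden, so the sketch converts present counts to integers and leaves missing ones as `None`.

```python
import json
import urllib.parse
import urllib.request

VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_ids: list[str], api_key: str) -> str:
    """Build a videos.list request for the snippet and statistics of several videos."""
    query = urllib.parse.urlencode(
        {"part": "snippet,statistics", "id": ",".join(video_ids), "key": api_key}
    )
    return f"{VIDEOS_ENDPOINT}?{query}"


def _count(stats: dict, key: str):
    """Convert a string count to int, or return None if the field is absent."""
    value = stats.get(key)
    return int(value) if value is not None else None


def summarize_videos(payload: dict) -> list[dict]:
    """Flatten each returned item into its id, title, and integer counts."""
    summaries = []
    for item in payload.get("items", []):
        stats = item.get("statistics", {})
        summaries.append({
            "id": item["id"],
            "title": item.get("snippet", {}).get("title", ""),
            "views": _count(stats, "viewCount"),
            "likes": _count(stats, "likeCount"),
            "comments": _count(stats, "commentCount"),
        })
    return summaries


def fetch_videos(video_ids: list[str], api_key: str) -> list[dict]:
    """Call the API and summarize the response (requires network access and a key)."""
    with urllib.request.urlopen(build_videos_url(video_ids, api_key), timeout=30) as resp:
        return summarize_videos(json.load(resp))
```

Calling `fetch_videos(["VIDEO_ID"], "YOUR_API_KEY")`, with both placeholders replaced, returns one dictionary per video found. Each request draws on your project's daily quota, so passing several IDs in one call is cheaper than one call per video.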
1. TV News Archive by Internet Archive: Description: The TV News Archive is a collection of televised news programs from various U.S. and international sources, providing a historical record of TV news content and allowing researchers to explore media coverage over time. - Website: https://archive.org/details/tv
2. GDELT's Television Explorer: Description: GDELT's Television Explorer is a tool that allows users to explore global television coverage, track media mentions, and visualize patterns in news broadcasting worldwide. - Website: https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/
3. Reddit Public API: Description: The Reddit Public API offers access to Reddit's vast collection of user-generated content, including posts, comments, and discussions from various subreddits. It provides valuable insights into online conversations and trending topics. - Website: https://www.reddit.com/dev/api/
4. Media Cloud by the Berkman Klein Center for Internet & Society: Description: Media Cloud is a research platform that collects, processes, and analyzes news articles and online media content, offering data on media coverage and public discourse around the world. - Website: https://mediacloud.org/
5. TVSmiles API: Description: TVSmiles is a platform that rewards users for engaging with TV advertisements. The API offers access to data on user interactions with TV ads and their preferences. - Website: https://developers.tvsmiles.com/
6. CrowdTangle API: Description: CrowdTangle was a Meta-owned social media analytics tool that provided data on trending content and audience interactions across social media platforms, offering insights into viral media content. Meta shut it down in August 2024 and now directs researchers to the Meta Content Library. - Website: https://www.crowdtangle.com/
7. Media Bias Chart by Ad Fontes Media: Description: Ad Fontes Media publishes a media bias chart that assesses the bias and reliability of various news sources. Their dataset contains detailed assessments of media outlets and articles. - Website: https://www.adfontesmedia.com/
8. NewsAPI.org: Description: NewsAPI.org offers access to a wide range of news articles and headlines from various sources, making it a valuable resource for media content analysis and research. - Website: https://newsapi.org/
9. Reddit-Tracker: Description: Reddit-Tracker is a data project that collects and analyzes Reddit submissions and comments in real-time, providing insights into trending topics and discussions on the platform. - Website: https://reddit-tracker.firebaseapp.com/
10. Global Database of Events, Language, and Tone (GDELT): Description: GDELT not only monitors news but also tracks television and radio broadcasts worldwide, providing a comprehensive database of media content from various sources. - Website: https://www.gdeltproject.org/
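As a final example, a quick way to compare how different outlets cover a topic is to count articles by source. This sketch uses the NewsAPI.org `/v2/everything` endpoint, assuming an API key sent in the `X-Api-Key` header and the documented response fields `status`, `articles`, and `source.name`. The helper names are our own, and the free developer plan carries restrictions (for example on how far back searches reach), so check the current terms.

```python
import collections
import json
import urllib.parse
import urllib.request

EVERYTHING_ENDPOINT = "https://newsapi.org/v2/everything"


def build_request(query: str, api_key: str, page_size: int = 50) -> urllib.request.Request:
    """Build a search request; the key travels in a header rather than the URL."""
    params = urllib.parse.urlencode({"q": query, "language": "en", "pageSize": page_size})
    return urllib.request.Request(
        f"{EVERYTHING_ENDPOINT}?{params}", headers={"X-Api-Key": api_key}
    )


def count_by_source(payload: dict) -> list[tuple[str, int]]:
    """Return (source name, article count) pairs, most frequent first.

    Raises RuntimeError if the JSON body reports a non-"ok" status.
    """
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "NewsAPI request failed"))
    counts = collections.Counter(
        (article.get("source") or {}).get("name") or "unknown"
        for article in payload.get("articles", [])
    )
    return counts.most_common()


def fetch_source_counts(query: str, api_key: str) -> list[tuple[str, int]]:
    """Run one search and count results by source (requires network access and a key).

    HTTP-level errors (bad key, rate limit) surface as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(build_request(query, api_key), timeout=30) as resp:
        return count_by_source(json.load(resp))
```

One page of results is only a sample of coverage, so treat the counts as a rough signal rather than a full measure of each outlet's attention to the topic.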