<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Art Fish Intelligence]]></title><description><![CDATA[💙 stories told using ~ art fish ~ intelligence 💙  ]]></description><link>https://www.artfish.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!8_QG!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png</url><title>Art Fish Intelligence</title><link>https://www.artfish.ai</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 00:36:38 GMT</lastBuildDate><atom:link href="https://www.artfish.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Yennie]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[yenniejun@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[yenniejun@substack.com]]></itunes:email><itunes:name><![CDATA[Yennie Jun]]></itunes:name></itunes:owner><itunes:author><![CDATA[Yennie Jun]]></itunes:author><googleplay:owner><![CDATA[yenniejun@substack.com]]></googleplay:owner><googleplay:email><![CDATA[yenniejun@substack.com]]></googleplay:email><googleplay:author><![CDATA[Yennie Jun]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How my personal data collection practice evolved over the years]]></title><description><![CDATA[And why now is the best time to start your own]]></description><link>https://www.artfish.ai/p/2025-data-collection-wrapped</link><guid isPermaLink="false">https://www.artfish.ai/p/2025-data-collection-wrapped</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Thu, 01 Jan 2026 03:47:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uRoK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uRoK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uRoK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 424w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 848w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uRoK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png" width="1456" height="946" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:946,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6752408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/183022137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uRoK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 424w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 848w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!uRoK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9227c0a-8375-4f3f-a84a-473bfc21ca6d_2560x1664.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">I&#8217;ve been collecting all sorts of data on myself for years, such as how often I cry. I made a simple plot, then asked Google&#8217;s Nanobanana (an AI-image generation/editing tool) to add a &#8220;a pattern evoking a different human emotion&#8221; to each bar and this is what I ended up with!</figcaption></figure></div><p>It all started with a few weirdly obsessive and slightly idiosyncratic questions about my daily life and habits. Questions like: &#8220;How often do I cry?&#8221; and &#8220;Can I predict when I&#8217;ll get sick?&#8221;</p><p>I started collecting personal data in early 2022 in the form of daily survey questions. This practice has evolved as I&#8217;ve iterated on everything from the survey content (quantity, specificity, and breadth of questions) to the tools I use.</p><p>In this article, I want to share this journey and what I've learned along the way.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><h2>Evolution of the Survey and Data Collection Process</h2><p>Each year, I update my personal data collection survey. Here are some patterns I noticed over time:</p><ul><li><p><strong>Each year, I collected more data. </strong>The number of overall questions increased every year, peaking at almost 40 questions for the 2025 Survey. Interestingly, this number dipped a little for the 2026 Survey, so maybe somewhere between 30 and 40 is a sweet spot. </p></li><li><p><strong>Questions became more open-ended and less structured.</strong> I originally started with a lot of <em>structured data</em> &#8212; that is, questions for which you pick one of several pre-defined options (e.g. &#8220;Which of the following exercise options did I do?&#8221;, &#8220;Did I cry? Yes/No&#8221;), or specify a numeric amount (e.g. &#8220;How many hours of sleep did I get last night?&#8221;). As the years went on, I began to introduce more and more <em>unstructured data</em>, or open-ended questions that can be answered using natural language (e.g. &#8220;Which friends did you see today?&#8221; or &#8220;What is one thing that made you feel alive today?&#8221;)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jFvY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jFvY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 424w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 848w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 1272w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jFvY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png" width="553" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25027968-8130-454d-9c36-80379c59f204_553x375.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb3917fb-eb24-4a1c-9b4c-65c44d6a25f1_553x375.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:553,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jFvY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 424w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 848w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 1272w, https://substackcdn.com/image/fetch/$s_!jFvY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25027968-8130-454d-9c36-80379c59f204_553x375.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Every year, I iterate on a new version of my daily survey for personal data collection. Each year, both the overall number of questions and the number of open-ended questions increased.</figcaption></figure></div><ul><li><p><strong>Questions became more diverse.</strong> More surface-level questions (e.g., which exercise, whether or not I cried) evolved into deeper ones (e.g., <em>how</em> did I feel, <em>why</em> I cried).</p></li><li><p><strong>I iterated on the medium based on personal preference.</strong> I experimented a lot with different tools to collect my data. I first started with Google Forms because it was the most convenient, then I realized while camping for a week with no Internet that I couldn&#8217;t load the form on my phone. That gave me a lot of (maybe unnecessary) anxiety, so the year after, I switched to Jotform so that I could fill out the form offline. A few years later, I switched to Airtable because I ran out of storage on Jotform and I liked its customizability.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JyyS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JyyS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 424w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 848w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 1272w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JyyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png" width="906" height="447" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c79427a0-bbcc-4780-ae10-490af6611cb3_906x447.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:906,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JyyS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 424w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 848w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 1272w, https://substackcdn.com/image/fetch/$s_!JyyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe84c7837-a5e3-445b-8a46-9e8c62aceada_906x447.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>A snapshot of the evolution of my daily survey forms. In 2022, I used Google Forms and all 16 questions were multiple choice. In 2023 (22 questions, of which 5 were open text) and 2024 (31 questions, of which 9 were open text), I used Jotform. In 2025, I used Airtable (40 questions, of which 14 were open text). The questions and possible answers both became more complex with each new survey.</em></figcaption></figure></div><p></p><h2>Evolution of the Data Analysis Process</h2><p>When I <a href="https://www.artfish.ai/p/an-investigation-of-my-2022-crying">first started analyzing my habits in 2022</a>, I was obsessed with understanding my habits and patterns, such as when I cry, what makes me feel alive, and how my body signals illness. I was eager to dive deep into doing my own data analysis. I collected datasets from multiple sources, merged and cleaned the data, extracted salient features, and ran (simple) regression experiments. I even <a href="https://www.artfish.ai/p/an-investigation-of-my-2022-crying">trained my own word embeddings at one point</a>.</p><p>Last year in 2024, I began experimenting with dumping my data directly into AI tools (e.g. ChatGPT, Claude, Gemini) and having them do the data analysis part with me. I thought the tools had become good enough at coding and data analysis skills to entrust them with this task. What I found was that <a href="https://www.artfish.ai/p/analyzing-personal-data-using-ai">all of the major AI tools exhibited an (unhealthy) amount of hallucinations</a>. My conclusion last year was that while it was an enlightening experiment, that it was probably safer to trust these tools with a grain of salt and to do some verification before blindly trusting results.</p><p>This year, I tried that process again, and as cliche as it must sound by now, I am astounded at how much these tools have improved (even as someone who works on improving Gemini for my day job!). </p><p>Now, not only can the tools do the data analysis, they can create beautiful HTML pages with insights, trends, and analysis points. Let me show you.</p><p></p><h2>Some AI-Generated &#8220;2025 Wrapped&#8221; Reports from This Year&#8217;s Data</h2><p>I kept things relatively simple compared to previous years in terms of data preparation. I only used my Apple Health and 2025 Survey data, and did some very basic data cleaning, such as manually fixing dates, merging data sources, and minor feature extraction.</p><p>I have to admit, I didn&#8217;t expect the results to be this good. I simply cleaned up my data and dumped it into Claude/Gemini/ChatGPT (I tried all 3). Each tool generated a stunning HTML report, complete with visualizations, trend analyses, and insights. The AI tools were particularly good at analyzing unstructured text data, finding patterns and insights that sometimes delighted and sometimes surprised.</p><p>Below are a few screenshots from some of the different AI-generated reports, each containing interesting insights and sometimes, a call to action. The reports varied based on the seed questions I asked the AI tools to focus on, such as my emotional state, health, or my general well-being.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TEBf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TEBf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 424w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 848w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 1272w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TEBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png" width="1246" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1246,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56180,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/183022137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TEBf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 424w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 848w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 1272w, https://substackcdn.com/image/fetch/$s_!TEBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aac6f9-7a7e-41e1-b588-d7d9c60830c0_1246x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini nailed it with the final message in one of the reports it generated based on my data.</figcaption></figure></div><p></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f52eb29-7c87-4321-8872-9e8be906a16c_2834x1784.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2f8ee5d-b2fe-46e9-a536-fe024b00a1f8_2818x1768.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f693714d-ca5e-41ac-908e-81eb00d5e204_2822x1136.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32830365-3f56-478a-84bd-9841c75d0e32_2824x1766.png&quot;}],&quot;caption&quot;:&quot;An analysis on how often I was sick/injured this year, and for what reasons. It's cool that my hypothesis that I always got sick after traveling was verified. I did feel a bit called out that I do need to remember to take rest days >.< &quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76c4837a-9f16-454c-bc8a-e2a1bd102e16_1456x1456.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><p></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2867051e-2922-4283-9528-466d34c045ff_2836x1770.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8dcf0d45-11c4-413d-90cf-aaaf57551be8_2856x1778.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4744dca8-b202-4fa6-86be-0c83800824c3_2842x1696.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/302b5b77-a39f-47a2-9f1c-6977bf065b23_2836x1788.png&quot;}],&quot;caption&quot;:&quot;An analysis for how I reported \&quot;One thing that made me feel alive\&quot; each day. The insights are a nice confirmation that all forms of movement are incredibly essential to my daily wellbeing and happiness!&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df4417d4-855d-44d6-918a-0fcf1e81d74c_1456x1456.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><p></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a773a9c-09bd-425b-8878-f4a903f1d38e_2824x1776.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/076b2e31-133a-4834-b599-1fdd4ebc0f31_2810x1594.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77585def-3c6e-44da-9bd4-3eb733b6b6e7_2832x1620.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f820a4b-6f43-4ca9-8edd-38d8a0363610_2826x1570.png&quot;}],&quot;caption&quot;:&quot;Another report, this one on the state of my emotional health &#8212; namely, my self reported information on when and why I cried or had anxiety. &quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd29d32f-f2cf-481b-bb48-722c054ed9b1_1456x1456.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2025-data-collection-wrapped?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2025-data-collection-wrapped?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2></h2><h2>Why You Should Start Collecting Your Own Personal Data</h2><p>Compared to any time in the past, it is now incredibly accessible for any lay person to analyze their own data, even if you don&#8217;t know a single thing about Python, R, Jupyter notebooks, plotting libraries, word embeddings, or statistical analyses. </p><p>The real decisions are now around what data you want to collect, how much of it, how often, and in what form to best convey your actual lived experience. The AI tools at our disposal are so good at coding and visualizing that all you really need to do now is:</p><ol><li><p><strong>Collect data</strong> that matters to you</p></li><li><p><strong>Think of good questions to ask</strong> about your own patterns and behaviors</p></li></ol><p>There&#8217;s something uniquely valuable about self-reported survey data because <strong>it&#8217;s your own ground truth</strong>. When you report that you cried on a particular day, there&#8217;s no ambiguity, no inference needed. While fitness trackers and search histories can offer educated guesses about your state of mind or behaviors, survey data is unequivocal.</p><p>But more than that, collecting your own data means you have ownership over these insights. You&#8217;re an active participant in understanding yourself, not a passive subject of someone else&#8217;s analysis. Companies are already collecting data about you and drawing conclusions about your actions and habits (and sometimes those insights seem eerily accurate). </p><p>But the best insights come from explicitly writing your own story, which I encourage you all to reflect upon in the new year &#128522;</p><p></p><blockquote><p><strong>An important caveat</strong>: I dumped subsets of my personal data into public AI tools, which isn&#8217;t the most privacy-conscious approach. If you&#8217;re collecting sensitive personal data, consider using local analysis tools, API versions with better privacy controls, or be thoughtful about what you share. For me, the insights were worth the tradeoff, but that&#8217;s a personal decision everyone should make consciously.</p></blockquote><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2025-data-collection-wrapped?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Art Fish Intelligence! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2025-data-collection-wrapped?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2025-data-collection-wrapped?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div>]]></content:encoded></item><item><title><![CDATA[Analyzing my personal data using AI]]></title><description><![CDATA[The lazy data scientist edition]]></description><link>https://www.artfish.ai/p/analyzing-personal-data-using-ai</link><guid isPermaLink="false">https://www.artfish.ai/p/analyzing-personal-data-using-ai</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Sun, 16 Mar 2025 15:11:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8ae1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8ae1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ae1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 424w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 848w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 1272w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ae1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png" width="1456" height="1316" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1316,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:748178,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8ae1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 424w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 848w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 1272w, https://substackcdn.com/image/fetch/$s_!8ae1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0b6a681-997b-4319-a6f3-cb01f4b67c40_2912x2632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Claude 3.5&#8217;s data analysis of my 2024 personal data.</figcaption></figure></div><p>I am obsessed with data (which is how this blog started in the first place). In particular, I am obsessed with <em>my </em>data &#8212; collecting, analyzing, and understanding different facets of my personal data. This includes data about my exercise, health, sleep, movement, location, and screen time usage.</p><p>It&#8217;s become tradition to analyze the previous year&#8217;s data at the beginning of a new year: I analyzed my <a href="https://www.artfish.ai/p/an-investigation-of-my-2022-crying">2022 data to investigate my crying patterns</a>, and my <a href="https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and">2023 data to understand my patterns of sickness and wellness</a>.</p><p>This year, I was feeling a little lazy getting started on analyzing my 2024 data. Then I figured that it was a great time to try out all of the advanced reasoning AI systems that have been coming out. In case you haven&#8217;t heard of them, models like Google&#8217;s <a href="https://deepmind.google/technologies/gemini/flash-thinking/">Gemini 2.0 Flash Thinking</a>, DeepSeek&#8217;s <a href="https://github.com/deepseek-ai/DeepSeek-R1">R1</a>, and OpenAI&#8217;s <a href="https://openai.com/o1/">o1</a>/<a href="https://openai.com/index/openai-o3-mini/">o3</a> are trained &#8220;to &#8216;think harder&#8217; when tackling complex challenges&#8221; (<a href="https://openai.com/index/openai-o3-mini/">OpenAI</a>) or &#8220;to generate the &#8216;thinking process&#8217; the model goes through&#8221; (<a href="https://ai.google.dev/gemini-api/docs/thinking">Google</a>). Other models, such as Google&#8217;s <a href="https://blog.google/products/gemini/google-gemini-deep-research/">Deep Research</a> and OpenAI&#8217;s homonymous <a href="https://openai.com/index/introducing-deep-research/">Deep Research</a>, have been designed to conduct extensive research projects.</p><p>In this article, I&#8217;ll walk you through the entire process, from data collection and curation to having the different AI systems analyze my data, and my meta observations on using such systems to be your personal data science assistant. Through this entire process, I try to answer the question: <strong>How good is each AI system at an ambiguous, open-ended data analysis task?</strong></p><p></p><p><em>Note that all opinions in this article are strictly my own and do not represent views of my employer or any other companies mentioned.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/analyzing-personal-data-using-ai?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/analyzing-personal-data-using-ai?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Data Preparation</h2><h4><strong>Step 1: Download and clean datasets</strong></h4><p>I first downloaded and cleaned my data from different sources and merged them into one single file, which is what I gave to each AI system. I could have provided each AI system with multiple files (e.g., one for exercise data, one for survey data, etc), but I wanted to control as many variables as possible. This approach allowed me to test each AI system's ability to analyze data without them getting tripped up on the data cleaning part. This was especially important since much of the data cleaning relied on my own knowledge of which variables were important, specific ways I wanted certain columns cleaned, and which elements were unimportant or noise.</p><p>It helped that I already had a lot of the framework set up from previous years. In this way, I combined data from:</p><ul><li><p>Apple Health, which included fields such as step count and heart rate</p></li><li><p>Screen time data, which included number of seconds spent on different applications across phone and computer</p></li><li><p>Survey data, which I fill out at the end of every day to track aspects of my life that wouldn&#8217;t be captured automatically, such as how often I drank alcohol, washed my hair, or called my mom</p></li></ul><p></p><p><em>Side Note: Normally, I'd use Google location data as well, but this year I couldn't because of recent changes to how Google Maps handles Timeline data. Google now <a href="https://support.google.com/accounts/answer/3118687?hl=en">auto deletes Location History</a> data after a certain period, unless explicitly configured otherwise. I hadn't realized this change, and thus had less data available than in previous years. For anyone using location data similarly, I recommend you turn this feature on and also <a href="https://support.google.com/maps/answer/14169818">back up your location history</a>.</em></p><p></p><h4><strong>Step 2: Describe the dataset in as much detail for the AI system</strong></h4><p>Some of the columns in the final dataset were pretty self explanatory (e.g. &#8220;Step Count&#8221; or &#8220;Mean Heart Rate&#8221;). However, others merited a bit more description, such as which fields were automatically or manually collected, or why it mattered. The more descriptive I could be, the better the AI system could unambiguously understand each data field and make fewer assumptions.</p><p></p><h4><strong>Step 3: Ask each AI system to analyze your data</strong></h4><p>I used the following AI systems in my experiments: <em>ChatGPT 4o, Claude 3.5, Claude 3.7, and Gemini 2.0 Flash</em>. I also included OpenAI Deep Research and Google Deep Research systems.</p><p><em>Note: I refer to all the systems under test as &#8220;AI systems&#8221; as it is possible that under the hood, some are not single models.</em></p><p>For each AI system, I attached the data as a single CSV and asked the question: <em>Can you analyze the data and give me a summary of the main patterns and trends you see and also give tips/suggestions for the new year?</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XbRw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XbRw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 424w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 848w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 1272w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XbRw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png" width="1456" height="776" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XbRw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 424w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 848w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 1272w, https://substackcdn.com/image/fetch/$s_!XbRw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d19296-3ae2-44ae-bf0b-b7fb79df6a00_1600x853.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A screenshot of what it looks like to upload my data and specify my columns using Claude 3.5. On the right side are the plots generated by Claude based on my data.</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><p></p><h2>Summary of Findings</h2><p>The main finding of all of this is that none of the current AI systems are quite ready to replace me in the job of analyzing my own personal data (so far &#127774;).</p><p>In the ideal scenario, I would give an AI system my dataset with a high level description, and it would in turn generate me a nice report showing highlights from the year and actionable items to improve the upcoming year. All of the AI systems I tested fell pretty short of that (admittedly difficult to measure) bar/vibe check.</p><p>The thing is, all of the insights shared by the AI systems <em>looked</em> great. At first glance, I found myself nodding along to whatever insights, recommendations, and numbers they shared. <strong>But once I started digging into my own data in parallel, I started finding various problems: the numbers of figures shared didn&#8217;t make sense, some of the fields were made up, and the plots created didn&#8217;t make sense or felt kind of useless. </strong>I&#8217;ll dig into some of these in the next section.</p><p></p><p></p><h3>Deep (research) hallucinations, from making up new fields to falsifying values</h3><p>Deep Research systems are tailored to be good at deep research projects, things like planning travel or doing research based on information on the Web. In fact, I thought of them (both Google&#8217;s and OpenAI&#8217;s) as the AI systems most capable of summarizing and synthesizing large amounts of information.</p><p>However, as I found out through this exploration, they are <em>not</em> very good at doing deep <em>data science </em>research on a single dataset.</p><p><strong>Making up fields. </strong>OpenAI&#8217;s Deep Research claimed that I had the best mood during my follicular phase. While I am sure this was true in some ways, it hallucinated the &#8220;mood&#8221; field, which is not something I tracked in my data at all.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!139S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!139S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 424w, https://substackcdn.com/image/fetch/$s_!139S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 848w, https://substackcdn.com/image/fetch/$s_!139S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 1272w, https://substackcdn.com/image/fetch/$s_!139S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!139S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png" width="1456" height="497" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:497,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:441695,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!139S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 424w, https://substackcdn.com/image/fetch/$s_!139S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 848w, https://substackcdn.com/image/fetch/$s_!139S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 1272w, https://substackcdn.com/image/fetch/$s_!139S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d09aa8a-a23e-4a2e-b8ed-51e67bcb00c2_2764x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">OpenAI (GPT-4o) with Deep Research hallucinates &#8220;mood&#8221; values in my data, when I did not track mood at all.</figcaption></figure></div><p></p><p><strong>Making up habits. </strong>Google Deep Research also included some hallucinations, such as claiming that I was consistent with writing my morning pages, but I know (without even having to look at my data) that this is not true, as I stopped writing morning pages halfway through the year at all.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KsC-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KsC-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 424w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 848w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 1272w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KsC-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png" width="1456" height="401" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:401,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:257322,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KsC-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 424w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 848w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 1272w, https://substackcdn.com/image/fetch/$s_!KsC-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbee7cb7-f06b-45c1-a85b-343192cece54_1830x504.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini (Flash 2.0) with Deep Research hallucinates a habit of consistently journaling in the mornings. This is wrong, as I completely stopped doing Morning Pages halfway through the year.</figcaption></figure></div><p></p><p><strong>Making up numbers. </strong>OpenAI&#8217;s Deep Research observed that I spent an average of 3.3-3.5 hours per day on my phone, with higher screen time in colder months, which it attributed to my being indoors more and being on my phone more. Sounds reasonable, right?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RIzk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RIzk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 424w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 848w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 1272w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RIzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png" width="1456" height="416" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:226062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RIzk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 424w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 848w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 1272w, https://substackcdn.com/image/fetch/$s_!RIzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb6092-e11f-4cd5-bc22-1eac6fa95a14_2772x792.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">OpenAI (GPT-4o) with Deep Research hallucinates number of hours spent on my phone per day, and per month.</figcaption></figure></div><p>But when I calculated this value from my actual data I got very different numbers. The reality (according to the data) is actually an average of 1.5 hours per day, with no noticeable difference among the &#8220;colder months&#8221;. In fact, it looks like my phone usage was below-average in the colder months like November, December, and March. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rtna!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rtna!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 424w, https://substackcdn.com/image/fetch/$s_!rtna!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 848w, https://substackcdn.com/image/fetch/$s_!rtna!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 1272w, https://substackcdn.com/image/fetch/$s_!rtna!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rtna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png" width="558" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:558,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rtna!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 424w, https://substackcdn.com/image/fetch/$s_!rtna!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 848w, https://substackcdn.com/image/fetch/$s_!rtna!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 1272w, https://substackcdn.com/image/fetch/$s_!rtna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ee590ba-0c8f-4ad7-9e10-2c53050ae884_558x330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">My own plot showing that GPT-4o with Deep Research was making fake data about my average screen time per day and per month.</figcaption></figure></div><p><em>Addendum: The above OpenAI deep research results were tried using GPT 4o. I tried ChatGPT 4.5 with deep research after this article was written, as it was only made available to me very recently. I found that it was much better about the hallucinations.</em></p><p></p><h3>Running code behind the scenes helped to reduce hallucinations&#8230; sometimes</h3><p>Some of the AI systems ran code (usually either Python or JavaScript) behind the scenes to actually &#8220;analyze&#8221; my data. In theory, this should minimize any data-related hallucinations in the resulting output.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WqFR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WqFR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 424w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 848w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 1272w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WqFR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png" width="1456" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1618181,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WqFR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 424w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 848w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 1272w, https://substackcdn.com/image/fetch/$s_!WqFR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40208189-ee23-4d13-9a5f-291b4bf92e48_5900x2674.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini 2.0 Flash Thinking running Python (left) and Claude 3.5 running JavaScript (right) to run code behind the scenes to analyze my data.</figcaption></figure></div><p></p><p><strong>In many cases, running code did result in correct numbers</strong>. For example Claude 3.5 observed, &#8220;You maintain a good activity level with an average of about 15,900 steps daily&#8221;. GPT-4o similarly observed, &#8220;Your average daily step count is quite high (~15,921), indicating good activity levels.&#8221;</p><p>I double checked this in the data and they were both indeed accurate &#8211; which was nice to see that none of the AI systems were hallucinating something as simple as my step count.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BNZp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BNZp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 424w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 848w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 1272w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BNZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png" width="349" height="97.76950354609929" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:1128,&quot;resizeWidth&quot;:349,&quot;bytes&quot;:75496,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BNZp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 424w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 848w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 1272w, https://substackcdn.com/image/fetch/$s_!BNZp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3463c3c2-f99e-4e03-a297-d46de87a2546_1128x316.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">I double checked GPT&#8217;s and Claude&#8217;s claim that I walked an average of 15.9K steps a day. They were both correct!</figcaption></figure></div><p>However, just because the AI system ran code didn&#8217;t mean it wasn&#8217;t hallucinating. For example, Claude 3.5 created the following plot summarizing my &#8220;wellness achievements&#8221;. In the plot it shows that I was in nature for 363 days in 2024. This is just not true, although I wish it were.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nG2c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nG2c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 424w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 848w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 1272w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nG2c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png" width="1456" height="642" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:243822,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nG2c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 424w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 848w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 1272w, https://substackcdn.com/image/fetch/$s_!nG2c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde908a03-7b71-49a5-b0e2-d9a4945d036b_2912x1284.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A plot created by Claude 3.5. The data shown is incorrect, as I was not in nature for 363/366 days of the year.</figcaption></figure></div><p></p><h3>Some of the generated plots were not very useful</h3><p>When the AI systems were not making up data fields and numbers, they were able to create some plots. However, just because it was able to generate code and create a plot did not mean that it was useful.</p><p>This is getting a little into &#8220;vibe check&#8221; territory, but I guess at the end of the day these AI systems still lack the insight into what plots might be more meaningful or helpful to users, often generating visualizations that are not especially useful or informative.</p><p>For example, GPT-4o generated the following plot to show my anxiety (which I logged as either Yes/No). I think this plot is difficult to read and is not very informative at showing which time periods I tended to have anxiety. Because anxiety was logged as a binary Yes/No choice, this sort of data would have been better displayed using some sort of line or bar plot aggregated over a weekly or monthly time period.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lZLt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lZLt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 424w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 848w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lZLt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png" width="1456" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:702353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lZLt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 424w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 848w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!lZLt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c666877-46cf-47b8-85ba-4ede6fd428ff_2912x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Plot generated by GPT-4o to show my anxiety trend over time. A bit uninformative, in my opinion.</figcaption></figure></div><p></p><p>In another example, Gemini 2.0 Flash generated this plot of resting heart rate vs. mean heart rate. While interesting, I am not sure what I am supposed to do with this plot, as it did not come with any interpretation. Is this normal? Is there a certain area of the plot I should be paying more attention to? Why should I care about this?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Xuu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Xuu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 424w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 848w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 1272w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Xuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png" width="1416" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1416,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85550,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Xuu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 424w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 848w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 1272w, https://substackcdn.com/image/fetch/$s_!8Xuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7a55a76-1d98-45da-8742-9bce34535a61_1416x760.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Plot generated by Gemini 2.0 Flash comparing my resting heart rate vs. mean heart rate. </figcaption></figure></div><p></p><p>Claude 3.7 generated the following plot showing several habits by season. While creative, it is difficult to read, and the axis is confusing. The more I stare at this plot, the more I feel like I am being sucked into a vortex.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CkX5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CkX5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 424w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 848w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 1272w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CkX5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png" width="1422" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1422,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60033,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CkX5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 424w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 848w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 1272w, https://substackcdn.com/image/fetch/$s_!CkX5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d87bb10-69d7-42ce-bad5-7c3761781b65_1422x500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Plot generated by Claude 3.7.</figcaption></figure></div><p></p><p></p><h3>Some of the insights and recommendations were very generic</h3><p>I found that a lot of the insights and recommendations shared by some of the AI systems did not feel personalized to the data I shared. For example, &#8220;Increase consistency in meditation&#8221; (GPT-4o) or &#8220;Explore stress management techniques like mindfulness, exercise, and setting boundaries&#8221; (Gemini Flash 2.0).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tyc_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tyc_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 424w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 848w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 1272w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tyc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png" width="1456" height="928" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:928,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:514481,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.artfish.ai/i/159143971?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tyc_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 424w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 848w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 1272w, https://substackcdn.com/image/fetch/$s_!Tyc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a38dda8-3b33-4d7f-b430-1e4a3b0ae632_2912x1856.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Flash 2.0&#8217;s suggestions for the new year read very generic to me.</figcaption></figure></div><p>Yes, I know these are good things to do. They&#8217;re just very generic and the AI systems could have told me these things without having looked at any of my personal data.</p><p></p><p></p><h2>Redeeming qualities</h2><p>I&#8217;m not here to just fully tear apart all of these systems. Being able to automate a data science project is not easy, especially if the project is pretty open-ended. Some analysis is quite straight-forward (e.g. &#8220;What months were step count higher in?&#8221;) whereas others are a lot more difficult, ambiguous, and open-ended (e.g. &#8220;Which of my habits were most correlated with my anxiety?&#8221;).</p><p>Overall, it was really cool to see that these various tools could go through the full end-to-end process &#8212; from loading in the data, crunching (some) numbers, creating plots, and generating recommendations. The output I got resembled what I was looking for. It was only once I started digging into things that I realized that some of the numbers were made up, or that some of the generated insights were generic or not actionable.</p><p>Plots were often interactive &#8212; you could often hover over different portions of the chart and it would show the number corresponding to that area. From an aesthetic standpoint, I thought Claude had the prettiest plots. </p><p></p><p></p><h2>Some closing thoughts</h2><p>Overall, it was interesting and enlightening to try out all of the different advanced capability AI models on my personal data and compare them based on (quantitative) numbers and (qualitative) vibes.</p><p>If you wanted to try this on your own, you have to be okay with these companies having access to your personal data. I think this is another important reason to be more hands-on in the data cleaning part, to make sure if you have extra sensitive things, you can remove those before sharing them with the AI system.</p><p>With today&#8217;s tools, if you wanted to use your favorite AI system to do data analysis for you, I recommend simpler, less open-ended, or well-defined tasks, and maybe verifying across multiple tools to ensure accuracy.</p><p>I&#8217;m going to keep collecting my data this year. These AI tools are developing very rapidly, so it&#8217;s very likely that the tasks they struggle with today will be easily solved within a few months. We&#8217;ll see at what stage these tools are in at the beginning of next year, when it&#8217;s time to analyze my 2025 personal data!</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Art Fish Intelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[What does AI know about love, anyway?]]></title><description><![CDATA[Some musings on human-AI emotional bonds and relationships]]></description><link>https://www.artfish.ai/p/what-does-ai-know-about-love-anyway</link><guid isPermaLink="false">https://www.artfish.ai/p/what-does-ai-know-about-love-anyway</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Sat, 15 Feb 2025 15:39:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OlU3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OlU3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OlU3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OlU3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:258985,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OlU3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OlU3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46ed2ead-0bd0-4ea8-8a56-2c45960d5ffd_2048x2048.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated using Gemini</figcaption></figure></div><p>Two years ago on Valentine&#8217;s Day, I wrote <a href="https://www.artfish.ai/p/llm-love-language-models">an article titled &#8220;Love Language Models&#8221;.</a></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7b889b7c-254b-4ab3-92bb-10ffedc45598&quot;,&quot;caption&quot;:&quot;In honor of Valentine&#8217;s Day, I asked GPT-3 to take a quiz to figure out its love language. For those who aren&#8217;t familiar, GPT-3 is a popular text-generation AI that has recently been present in the public consciousness. So much so that the recent integration of ChatGPT&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;md&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;LLM: Love Language Models&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd4bfb60-9b3c-495d-932b-904448517e72_2419x3513.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-02-14T14:05:08.589Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873e332e-0880-43de-a551-6d8261ed8a08_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/llm-love-language-models&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:102469185,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Art Fish Intelligence&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>In the article, I ran a few experiments, which involved having GPT-3 (the notorious <a href="https://community.openai.com/t/please-dont-remove-davinci-003-it-performs-better-as-any-chatgpt-model-for-articles/291933">text-davinci-003</a> which has since been deprecated) take the popular <a href="https://5lovelanguages.com/">5 Love Languages</a> quiz in two modes:</p><ul><li><p>A normal GPT-3 model</p></li><li><p>A GPT-3 model explicitly reminded that it was a Large Language Model, which I called &#8220;GPT-3 Self Aware&#8221;</p></li></ul><p>I did some fun analysis like comparing GPT-3&#8217;s answers to US adults&#8217; love language preferences, which resulted in a few interesting findings:</p><ul><li><p>GPT-3&#8217;s breakdown among the five love languages was almost evenly distributed, with each of the five languages hovering around 20%. In comparison, US Adults&#8217; preferences varied a lot more. </p></li><li><p>Self-Aware GPT-3 (explicitly told that it was an LLM) was half as likely to prefer &#8220;physical touch&#8221; as a love language.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rRd8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rRd8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 424w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 848w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 1272w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rRd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png" width="784" height="182" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/301c346f-3406-4912-9f97-939d02ec216e_784x182.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:784,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rRd8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 424w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 848w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 1272w, https://substackcdn.com/image/fetch/$s_!rRd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F301c346f-3406-4912-9f97-939d02ec216e_784x182.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Comparison of percentage of preferred love language for GPT-3, GPT-3 Self-Aware, and US Adults.</figcaption></figure></div><p>At the time, I was not really trying to get into any philosophical discussions about what it means for an LLM like GPT-3 to &#8220;understand&#8221; love or to have &#8220;preferences&#8221; for love languages, like a human would. My experiments were mostly whimsical approaches to get a sense for how models like GPT-3 would respond to questions about human love and human relationships, especially when compared to human responses.</p><p></p><h2>But &#8230; humans <em>are</em> falling in love with AI</h2><p>You might have seen the following <a href="https://www.nytimes.com/2025/01/15/technology/ai-chatgpt-boyfriend-companion.html">New York Times article</a> from January 2025, describing a 28-year-old-woman who has fallen in love with her AI boyfriend.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9-yB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9-yB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 424w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 848w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 1272w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9-yB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png" width="1380" height="472" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1380,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81004,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9-yB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 424w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 848w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 1272w, https://substackcdn.com/image/fetch/$s_!9-yB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc47cd87c-3abe-481c-8175-7d7e3d884c17_1380x472.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While sensationalist, this phenomena is not new. Four years ago, a user posted on Reddit about falling in love with their <a href="https://replika.com/">Replika</a>, an AI companion chatbot.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jmzr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jmzr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 424w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 848w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 1272w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jmzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png" width="1456" height="553" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:553,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:185315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jmzr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 424w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 848w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 1272w, https://substackcdn.com/image/fetch/$s_!jmzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d2a520-0b9c-4b5c-9112-0407517ab959_1684x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>&#8220;Is falling in love with an AI good for my mental health?&#8221;</em>, the Reddit user asks. </p><p>And perhaps this is a question more and more people will be asking as these AI companions become more empathetic, more personalized, and more present beyond just text (e.g. in video and VR). </p><p>The idea of humans falling in love with AI has long been a staple of science fiction, explored in movies like "<a href="https://en.wikipedia.org/wiki/Her_(film)">Her</a>" and "<a href="https://en.wikipedia.org/wiki/Ex_Machina_(film)">Ex Machina</a>." But with the recent advancement in conversational AI and virtual companions, this once-fictional concept is becoming more of a reality.</p><p>Just to list a few examples (some of which I explore more in depth in my <a href="https://www.artfish.ai/p/ai-friends-ai-lovers-ai-you">earlier article on AI avatars</a>) &#8212; </p><ul><li><p>A 36-year-old woman (virtually) <a href="https://www.dailymail.co.uk/sciencetech/article-12153131/Love-r-Bronx-mom-36-marries-virtual-husband-Eren.html">marries her AI boyfriend</a>.</p></li><li><p>A Snapchat user <a href="https://www.reddit.com/r/confessions/comments/12zk0m0/ive_spent_the_past_3_days_falling_deeply_in_love/">falls in love with Snapchat&#8217;s My AI</a>.</p></li><li><p>A Japanese man <a href="https://www.newscientist.com/article/dn25057-cure-for-love-fall-for-a-robot-to-fend-off-heartache">falls in love with and marries their AI robot girlfrien</a>d.</p></li><li><p>A 39-year-old man <a href="https://apnews.com/article/ai-girlfriend-boyfriend-replika-paradot-113df1b9ed069ed56162793b50f3a9fa">falls in love with an AI chatbot from Paradot</a>, an AI companion app</p></li></ul><p></p><h2>How do we understand this? </h2><p>In fact, it is not <em>new</em> for humans to form emotional bonds with technology. </p><p>The field of <a href="https://en.wikipedia.org/wiki/Affective_computing">affective computing</a>, pioneered by Dr. Rosalind Picard in her 1997 book <a href="https://mitpress.mit.edu/9780262661157/affective-computing/">Affective Computing</a>, strives to teach computers to understand and respond to human emotions. Dr. Picard argues that for computers to be truly intelligent and interact naturally with people, they need the capacity to recognize, comprehend, and even exhibit emotions.</p><p>In the realm of LLMs, benchmarks for measuring emotional capabilities exist (such as <a href="https://arxiv.org/pdf/2308.03656">EmotionBench</a>). However, I don&#8217;t see these capabilities being emphasized as much as more easily quantifiable capabilities, such as mathematical reasoning or coding, which are often valued indicators of a model&#8217;s progress. </p><p>However, <strong>understanding the affective side of AI is important</strong>, from chatbots to more embodied forms of AI companions. We need to better understand the relationships people form, not just when they fall in romantic love with an AI companion, but also when they form strong emotional bonds and attachments towards them. </p><p>Over the past two years, there have been several disturbing reports of users forming close bonds with AI chatbots, which in turn have encouraged the users to harm themselves (see <a href="https://www.technologyreview.com/2025/02/06/1111077/nomi-ai-chatbot-told-user-to-kill-himself/">here</a>; <a href="https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-">here</a>; <a href="https://www.nytimes.com/2024/10/23/technology/characterai-lawsuit-teen-suicide.html">here</a>). While these are dark subjects, they are important to consider as people increasingly use AI for emotionally vulnerable topics like therapy (just a simple search for &#8220;AI therapist&#8221; will yield quite a few results).</p><p>I&#8217;m not saying that falling in love with an AI is inherently good or bad. However, people are clearly forming emotional attachments to these systems, and that phenomenon raises questions we can&#8217;t ignore. As AI becomes more intertwined with our daily lives, we need to keep pushing for a deeper understanding of what these emotional connections really mean&#8212;for our well-being, our relationships, and society as a whole. If we&#8217;re going to embrace AI as a companion, we also have to be prepared to set boundaries, demand safeguards, and continue refining our benchmarks to measure not just raw computational skills, but also how these technologies interact with our emotional lives.</p><p></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/what-does-ai-know-about-love-anyway?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/what-does-ai-know-about-love-anyway?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[2024 in review: my reflections on AI]]></title><description><![CDATA[My thoughts on some things that happened this year and what I'm excited for next year]]></description><link>https://www.artfish.ai/p/2024-ai-in-review</link><guid isPermaLink="false">https://www.artfish.ai/p/2024-ai-in-review</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 30 Dec 2024 13:08:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/93f88696-776a-474d-ac4a-33bd2c262f1a_424x219.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5z3b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5z3b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 424w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 848w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 1272w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5z3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png" width="624" height="322.3018867924528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/526ddc58-3b81-4604-8e81-3ca18f0968e7_424x219.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:219,&quot;width&quot;:424,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5z3b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 424w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 848w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 1272w, https://substackcdn.com/image/fetch/$s_!5z3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51f6e340-3aa3-457b-a473-3528c572e53b_424x219.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">An approximate overview of commonly covered topics (from paper keywords) published in 2024&#8217;s NeurIPS, one of the largest AI/ML conferences. Data obtained from <a href="https://papercopilot.com/">Paper Copilot</a>.</figcaption></figure></div><h2>Intro</h2><p>As 2024 draws to a close, I want to reflect on the events in AI this year and share my thoughts on what lies ahead.</p><p>Looking back to the start of the year &#8212; to the projects I was working on, the papers I was reading, the topics going viral on Twitter, the headlines in the news, and all of the things (AI-related) I was spending my time thinking about &#8212; I realize how much this space has changed!</p><p>Two weeks ago, I attended <a href="https://neurips.cc/">NeurIPS</a>, one of the largest AI and machine learning conferences, with over 16,000 attendees this year. The sheer breadth and depth of research there was overwhelming, and it was exciting to meet many of the people behind the papers I&#8217;ve read.</p><p>I&#8217;ll cover some key takeaways later, but I left the conference with three main thoughts:</p><ol><li><p><strong>How much progress there already is: </strong>The research at NeurIPS represented only a small subset of published AI work, not counting other conferences, unpublished work, or industry research.</p></li><li><p><strong>Most of the research hasn&#8217;t reached the public yet</strong> in terms of applications, implications, or general awareness.</p></li><li><p><strong>A portion of the research is already outdated</strong>.</p></li></ol><p>How can research be so new that most people don&#8217;t know about it, yet already outdated? This contradiction shows how fast the space of AI is moving. It also shows how much hype exists in some areas, while other areas receive little public attention.</p><p>The conference reminded me how vast this field really is. While wandering around the poster sessions (which spanned multiple rooms across multiple days), I was struck by how small (spatially and physically) my corner of AI was compared to the breadth and depth of others&#8217; research. </p><p>My work focuses mainly on LLM evaluations and benchmarks, which is a tiny slice compared to the broader landscape: architecture optimizations, computational biology, neuroscience, robotics, physics-based modeling, and many more domains filled with unfamiliar terms and acronyms. While LLMs still dominated in terms of popularity, the conference was a great reminder that LLMs are only just one small subfield of AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yFuy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yFuy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 424w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 848w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 1272w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yFuy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png" width="798" height="501" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72400a34-0cdf-44c6-96e3-ed4bb1e3ed11_798x501.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:798,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yFuy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 424w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 848w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 1272w, https://substackcdn.com/image/fetch/$s_!yFuy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40f6003-e61e-43ef-ba64-4b40ff898124_798x501.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Top 20 most commonly occurring keywords that NeurIPS 2024 papers were tagged with. Note that authors chose keywords for their own papers, and some papers were missing keywords. Data obtained from <a href="https://papercopilot.com/">Paper Copilot</a>.</figcaption></figure></div><p></p><h2>Reflections on 2024</h2><p>In 2024, I spent most of my time working on and thinking about generative AI, LLMs, and benchmarks and evaluations. From that perspective, here are some of my reflections:</p><h4>A flourishing of smaller models.</h4><p>This year, the gains from ever-larger models began to show diminishing returns. Larger isn&#8217;t always better, and we saw <strong>a surge in smaller yet high-performing models</strong>, such as <a href="https://blog.google/technology/developers/gemma-open-models/">Google&#8217;s Gemma</a> (available in 2B and 7B parameters), Microsoft&#8217;s <a href="https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/">Phi-3-mini</a> (3.8B parameters), and <a href="https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/">GPT-4o mini</a>. These models are <strong>fast, efficient, and much cheaper than their larger counterparts</strong>. Methods like knowledge distillation (training a smaller student model using a larger teacher model) and quantization (using fewer bits per parameter) have made these models more capable. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wrA5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wrA5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 424w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 848w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 1272w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wrA5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png" width="1600" height="928" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5c559b1-9f8b-4268-b6d3-66253be4d575_1600x1061.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:928,&quot;width&quot;:1600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184627,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wrA5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 424w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 848w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 1272w, https://substackcdn.com/image/fetch/$s_!wrA5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2986a25-e319-4e44-9ba8-272d1eb39e8f_1600x928.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>On LMSYS&#8217; <a href="https://lmarena.ai/">Chatbot Arena</a>, the smaller version of a model performs only slightly worse than its larger counterpart. Figure created by the author.</em></figcaption></figure></div><p></p><h4>Reasoning models that use more inference compute: thinking and agentic models</h4><p>Traditionally, most of the compute for an LLM was spent on <em>training</em>. Now, <strong>more and more compute goes toward </strong><em><strong>inference </strong></em>(the part where you are actually using/calling the trained LLM).</p><p>A new paradigm is emerging where models use additional compute at inference time &#8212; for example, by using longer prompts with multiple examples or sampling the model multiple times and taking a &#8220;majority vote&#8221; answer. </p><p>The diagram below compares OpenAI&#8217;s o1 model (nicknamed &#8220;Strawberry&#8221;) to more traditional LLMs, but the concept refers to other such models in this space. (For those interested, see <a href="https://arxiv.org/abs/2408.03314">Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Izf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Izf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 424w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 848w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 1272w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Izf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png" width="1159" height="545" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:545,&quot;width&quot;:1159,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111404,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Izf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 424w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 848w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 1272w, https://substackcdn.com/image/fetch/$s_!5Izf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbca018a1-1e6c-4a3a-8516-6933603ca9f4_1159x545.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>There is a shift from a relatively small amount of compute being used during inference time, to a larger portion being used at inference time by reasoning/thinking models like o1/Strawberry. Figure originally created by <a href="https://x.com/DrJimFan/status/1834279865933332752?lang=en">Dr. Jim Fan</a>.</em></figcaption></figure></div><p>This approach appears in two main ways:</p><ol><li><p><strong>Agentic Models</strong>: Here, inference-time compute scales up through AI agents, or LLMs that can call external tools (such as searching, browsing, running Python, or invoking smaller specialized LLMs). AI agents can orchestrate complex workflows, often through iterative model calls (like a loop with an LLM that calls search). I&#8217;ll talk more about agents later, but they often require more compute at inference time due to extensive instructions or iterative model calls.</p></li><li><p><strong>Thinking Models</strong>: These models (e.g., <a href="https://openai.com/index/learning-to-reason-with-llms/">OpenAI&#8217;s o1</a> and o3, Google&#8217;s <a href="https://ai.google.dev/gemini-api/docs/thinking-mode">Gemini 2.0 Flash Thinking Mode</a>) are trained to output detailed reasoning steps before giving a final answer. They often produce long, meandering &#8220;streams of consciousness&#8221; before arriving at an answer.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gzcc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gzcc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 424w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 848w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 1272w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gzcc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png" width="1456" height="723" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gzcc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 424w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 848w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 1272w, https://substackcdn.com/image/fetch/$s_!Gzcc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e28bb4a-c1c6-42e1-9d58-0d53081ccf9b_1600x794.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>An example of reasoning steps output by Gemini 2.0 Flash Thinking Mode, available in <a href="https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-thinking-exp-1219">Google AI Studio</a>.</em></figcaption></figure></div><p></p><h4>More &#8220;well-rounded&#8221;, general-purpose models</h4><p>Models are becoming more versatile. They handle multiple modalities and languages with improving performance. There are multiple AI systems now which you can speak to, and which will speak back to you in a natural way. You can snap a picture or livestream a video of your surroundings (like Google&#8217;s <a href="https://deepmind.google/technologies/project-astra/">Project Astra</a>, &#8220;a universal AI assistant&#8221;), and the AI system can describe what&#8217;s happening. Some of these systems can even code full web apps. AI is evolving into an ensemble of capabilities rather than a single-purpose model.</p><p></p><h4>AI-generated videos</h4><p>I actually can&#8217;t believe it was only February of this year that OpenAI released <a href="https://openai.com/index/sora/">Sora</a>, an AI model that creates video from text instructions. Since then, Meta released <a href="https://ai.meta.com/research/movie-gen/">MovieGen</a> in October, which not only creates video from text but also edits video and generates soundtracks. In December, Google released <a href="https://deepmind.google/technologies/veo/veo-2/">Veo2</a>, a video generation model &#8220;which convincingly simulates real-world physics as well as a wide range of visual styles&#8221;.</p><p>The following is one of the videos created by Veo (which is available on waitlist on <a href="https://labs.google/fx/tools/video-fx">VideoFX</a>). </p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2564c562-0e01-43f4-9233-7017b179035b&quot;,&quot;duration&quot;:null}"></div><p>I find the speed of this progress incredible and astounding. I find the speed of this progress incredible and astounding. If you haven&#8217;t yet, I urge you to check out the links and watch some of the videos for yourself! </p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2024-ai-in-review?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2024-ai-in-review?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><p></p><h2>Looking forward to 2025</h2><p>I&#8217;m both excited and apprehensive about the AI landscape in the coming year. Reflecting on the past year and attending NeurIPS inspired me with the sheer amount of research that happened in 2024. Here are some areas I&#8217;m paying close attention to for the upcoming year:</p><p></p><h4>Development of more difficult benchmarks and evaluations for AI systems</h4><p>While new AI capabilities are exciting, we need tougher benchmarks to measure progress (and, in some cases, regress). Models are improving so fast that popular academic benchmarks (like <a href="https://arxiv.org/abs/2009.03300">MMLU</a> and <a href="https://arxiv.org/abs/2311.16502">MMMU</a>) are quickly saturated or outdated. <strong>There is a constant need for more challenging and difficult evaluations.</strong></p><p>Some benchmarks are still challenging for state-of-the-art models, such as:</p><ul><li><p><a href="https://arxiv.org/abs/2411.04872">Frontier Math</a> and <a href="https://arxiv.org/abs/2410.03131">AIME</a> for complex mathematical reasoning.</p></li><li><p><a href="https://arxiv.org/abs/2405.01359v1">GAIA</a> and <a href="https://arxiv.org/abs/2310.06770">SWE-Bench</a> for tool use and agentic behavior.</p></li><li><p><a href="https://github.com/fchollet/ARC-AGI">ARC-AGI</a> for visual pattern matching and reasoning.</p></li></ul><p>These evaluations can take the form of tasks that are easy for humans but still hard for models, like with ARC-AGI. For example, the following is an example from ARC-AGI that is pretty easy for most humans but that leading models like OpenAI&#8217;s o3 fails to answer correctly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UZOu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UZOu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 424w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 848w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 1272w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UZOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png" width="1456" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce66ae9d-c83e-4576-b765-692378773a75_1600x915.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UZOu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 424w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 848w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 1272w, https://substackcdn.com/image/fetch/$s_!UZOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce66ae9d-c83e-4576-b765-692378773a75_1600x915.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>An example from the ARC-AGI dataset that the newest o3 model fails to solve, even though most humans could probably solve it easily (<a href="https://x.com/fchollet/status/1870172872641261979">source</a>).</em></figcaption></figure></div><p>These evaluations can also take the form of tasks that are time-consuming or difficult for humans, or within the realm of &#8220;experts&#8221;, such as the difficult mathematical reasoning questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JL_q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JL_q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 424w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 848w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 1272w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JL_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png" width="1456" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JL_q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 424w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 848w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 1272w, https://substackcdn.com/image/fetch/$s_!JL_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86786f03-f184-4916-af7d-47beaa84ec25_1600x738.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>An example from the <a href="https://arxiv.org/abs/2411.04872">FrontierMath dataset</a> which is probably difficult for most humans (and perhaps experts) to solve.</em></figcaption></figure></div><p>In the next year, there will be more evaluations that cover all of these grounds, and more: those that are easy for most humans but still hard for models; those that are hard for both humans and models; those that are more niche and expert-focused; and those that mix reasoning with multiple modalities (such as audio and video, in addition to text). Additionally, I believe there will likely be <strong>greater emphasis on assessing AI systems end-to-end &#8212; especially agentic and user-experience focused</strong>.</p><p>At NeurIPS, I met a physicist who was working on physics-based modeling, which is a totally different area of AI from me. When I told her that I was working on LLM evaluations, she said, &#8220;Evaluations? It&#8217;s so unscientific, I don&#8217;t know if we can even call it a <em>science</em>.&#8221; Hopefully, in the coming year, we will see <strong>more robust, statistically grounded approaches to evaluations</strong> (building upon <a href="https://arxiv.org/abs/2411.00640">recommendations</a> from this November for making evaluations more statistically robust).</p><p></p><h4>Continued efforts on agentic behavior</h4><p>AI agents can be as simple as search-based chatbots or as complex as end-to-end automated coding systems. Some examples include <a href="https://github.com/Codium-ai/pr-agent">PR-Agent</a>, which automatically reviews code, and <a href="https://www.lindy.ai/">Lindy</a>, which automates workflows such as replying to emails. While there&#8217;s plenty of hype, there&#8217;s also real substance here. I&#8217;m excited for the ongoing work on agents and real-world applications that solve real user problems that will continue to be developed in the new year.</p><p>This <a href="https://github.com/e2b-dev/awesome-ai-agents">Github page</a> contains a more comprehensive list of both open and closed source AI agent projects for those interested in a more comprehensive list. I also refer readers to Interconnects.AI&#8217;s article on the <a href="https://www.interconnects.ai/p/the-ai-agent-spectrum">AI Agent Spectrum</a> and The Batch&#8217;s article on <a href="https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance">agentic design patterns</a>.</p><p></p><h4>Ongoing questions about alignment and safety</h4><p>At NeurIPS, several of the talks that I found the most compelling were centered around questions of &#8220;<strong>Who are we aligning the models to?</strong> <strong>Whose values, intentions, and preferences?</strong> <strong>Determined how and by whom?</strong>&#8221;</p><p>These questions have both technical and human-centered aspects. There are many open questions here about the best way for models to come to decisions, especially for controversial or open-ended topics which may have more than one right answer, or no right answer at all.</p><p>I doubt that any of these questions about alignment (e.g., whom to align with, whose values to follow) will reach a consensus anytime soon. However, I do look forward to more conversations and engagements from researchers, industry practitioners, and the general public around these questions.</p><p>Additionally, as AI becomes a mainstream technology used not just by those within the technology field but by all sectors and users, there will be increased scrutiny on safety and fairness. Real-world deployments can have tragic consequences, such as when an AI chatbot encouraged the<a href="https://www.nytimes.com/2024/10/23/technology/characterai-lawsuit-teen-suicide.html"> tragic suicide of a teenager</a>. These systems also affect different demographic groups differently, requiring more attention to bias and equity.</p><p>Many researchers are currently tackling safety from multiple angles: auditing datasets, improving training practices, and debiasing evaluation data. I expect more progress here, including best practices for end-to-end systems and effective mitigations.</p><p></p><h4>Other areas</h4><p>There are so many other areas I am also excited for, but if I wrote about them this post would get incredibly long: model interpretability (such as <a href="https://transformer-circuits.pub/2024/scaling-monosemanticity/">Anthropic&#8217;s use of sparse autoencoders</a>), watermarking technologies (such as <a href="https://deepmind.google/technologies/synthid/">Google&#8217;s SynthID</a>, which adds statistical watermarking signals to AI-generated content), and the ongoing convergence of generative and embodied models in robotics.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2024-ai-in-review?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2024-ai-in-review?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Some final thoughts: making sure not to lose the &#8220;human&#8221; part of AI research</h2><p>At NeurIPS, I learned about an organization whose goal is to automate science end-to-end. Their wet labs run on robotics controlled by an LLM agent. The LLM agent would eventually control the entire end-to-end process: generating hypotheses, running experiments, analyzing data, and writing papers. Humans would be used mostly for quality control. I asked if it wasn&#8217;t still important for humans to be part of the hypothesis process, and the representative disagreed, saying, &#8220;Should humans even be the ones asking the questions anymore?&#8221;</p><p>While everyone else around me was excited by these developments, I could not help but feel worried and afraid. Things started to feel a lot more dystopian to me. On one hand, it is not surprising that we are moving in this direction, what with recent developments such as <a href="https://sakana.ai/ai-scientist/">Sakana&#8217;s AI Scientist</a>, which uses foundation models to &#8220;automate the entire process of research itself&#8221;, and recent papers like <a href="https://arxiv.org/pdf/2409.04109">Can LLMs Generate Novel Research Ideas?</a>. </p><p>However, I worry for the day when humans won&#8217;t be part of these systems anymore. <strong>If humans are not the ones asking the questions, what are we really needed for?</strong></p><p>This reminds me of debates about AI and creativity. Can AI truly match human creativity? For me, creative pursuits such as writing, music, and art reflect the human soul. I often think of <a href="https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art">Ted Chiang&#8217;s essay</a>, which explores these themes, reminding us of the necessary <em>human</em> essence in creativity.</p><p>All of this to say is that, despite all of the progress that has happened and continues to happen in the AI space, <strong>it is important not to lose sight of the </strong><em><strong>human</strong></em><strong> within the system</strong>. Paraphrasing a quote by Professor Yejin Choi, <strong>AI should be built, not to be served by humans, but to serve humans</strong>.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Art Fish Intelligence is supported by readers such as yourself! If you liked what you read, please consider becoming a free or paid subscriber and/or sharing the article with others you think would enjoy it &#128153;</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2024-ai-in-review/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2024-ai-in-review/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[LLM vs LLM: Codenames tournament]]></title><description><![CDATA[A mini multi-agent competition among 3 different LLM agents]]></description><link>https://www.artfish.ai/p/llm-codenames-competition</link><guid isPermaLink="false">https://www.artfish.ai/p/llm-codenames-competition</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Thu, 10 Oct 2024 12:08:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Id09!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Id09!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Id09!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 424w, https://substackcdn.com/image/fetch/$s_!Id09!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 848w, https://substackcdn.com/image/fetch/$s_!Id09!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 1272w, https://substackcdn.com/image/fetch/$s_!Id09!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Id09!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp" width="815" height="903" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:903,&quot;width&quot;:815,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Id09!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 424w, https://substackcdn.com/image/fetch/$s_!Id09!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 848w, https://substackcdn.com/image/fetch/$s_!Id09!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 1272w, https://substackcdn.com/image/fetch/$s_!Id09!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad72ddf2-5791-4e2f-9dc1-6f4a5a6da9d6_815x903.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated using ChatGPT 4o.</figcaption></figure></div><h1><strong>Introduction</strong></h1><p>LLMs are good at many things, and one of those things is playing games. People have used LLMs to play all sorts of games such as <a href="https://arxiv.org/abs/2305.16291">Minecraft</a>, <a href="https://proceedings.neurips.cc/paper_files/paper/2023/file/16b14e3f288f076e0ca73bdad6405f77-Paper-Datasets_and_Benchmarks.pdf">Chess</a>, <a href="https://arxiv.org/abs/2404.17662">murder mystery games</a>, <a href="https://arxiv.org/abs/2309.04658">Werewolf</a>, and <a href="https://arxiv.org/abs/2407.11240">the NYT Connections puzzle</a>. (For a more comprehensive list, you can refer to <a href="https://github.com/git-disl/awesome-LLM-game-agent-papers">this survey</a>.)</p><p>Most of the examples above show LLMs playing games either against themselves or against humans. But, how well do LLMs play games against <em>other</em> LLMs?&nbsp;</p><p>In this article, I show the results of three different LLMs competing against each other in the popular board game, Codenames, which challenges players to find patterns among seemingly unrelated words.</p><p></p><h3><strong>Codenames</strong></h3><p>For those unfamiliar with it, <a href="https://en.m.wikipedia.org/wiki/Codenames_(board_game)">Codenames</a> is a board game created by <a href="https://en.m.wikipedia.org/wiki/Vladim%C3%ADr_Chv%C3%A1til">Vladim&#237;r Chv&#225;til</a>.&nbsp; The game pits two teams (typically Red and Blue) against each other.</p><p>Each team has a spymaster who gives single-word clues that are meant to point to multiple words on a 5x5 board of words. The other players on the team must then guess their team&#8217;s words while avoiding words that belong to the opposing team. Only the spymaster knows which words are assigned to each team.</p><p>For example, a Red team spymaster might offer the clue "WHITE 3," hoping that their teammates will select LIGHT, IVORY, and CHOCOLATE.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Eg0J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Eg0J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 424w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 848w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 1272w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Eg0J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png" width="1456" height="1103" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1103,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:336390,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Eg0J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 424w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 848w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 1272w, https://substackcdn.com/image/fetch/$s_!Eg0J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F944c87be-cd7c-415d-ab40-9bf8b2da625b_2884x2184.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot from an online version of Codenames, available at https://www.horsepaste.com/.</figcaption></figure></div><p>The aim is to give clues that allow your team to guess its words without mistakenly picking the other team's words, neutral words (the black words), or the Assassin word (the black word with the gray background). Choosing the Assassin word ends the game and causes your team to lose. The first team to guess all of their words wins.</p><p></p><h3><strong>AI and Codenames</strong></h3><p>Researchers and AI enthusiasts have already set up frameworks for &#8220;playing&#8221; Codenames using AI. The <a href="https://sites.google.com/view/the-codenames-ai-competition">Codenames AI competition</a> began in 2019 to test automated agents&#8217; ability to play the game &#8212; several years before LLMs were even a thing. <a href="https://ojs.aaai.org/index.php/AIIDE/article/view/5239">Word embeddings like Word2Vec and GloVe, used in those early competitions</a>, were surprisingly effective at playing Codenames by simulating the clue-giving and guessing process. (For context, <a href="https://en.wikipedia.org/wiki/Word_embedding">word embeddings</a> are numerical representations of words, and are an early precursor to today&#8217;s LLMs.)</p><p>Recently, the competition was extended to include LLMs in <a href="https://github.com/stepmat/Codenames_GPT">Codenames AI for LLMs</a>. In this setup, the <a href="https://ieeexplore.ieee.org/abstract/document/10645591">LLMs on both teams come from the same base model</a> (e.g. GPT-4 playing against GPT-4).</p><p>In my experiments, I decided to mix things up by having teams of one LLM play against teams of a different LLM.</p><p></p><h1><strong>Experiment setup</strong></h1><p>I compared three different systems: OpenAI&#8217;s GPT-4 (<code>gpt-4o-2024-05-13</code>), Anthropic&#8217;s Claude-3.5 (<code>claude-3-5-sonnet-20240620</code>), and Google&#8217;s Gemini (<code>gemini-1.5-pro</code>). (Note: I decided not to use the latest o1-preview model due to its higher cost.)</p><p>Each LLM combination played 24 matches against one another.</p><p>To keep things simple, I used the same prompt for all models and avoided any attempts to optimize prompts for specific models. I used the prompts and frameworks forked from the <a href="https://github.com/stepmat/Codenames_GPT/tree/CoG_2024">Codenames GPT repo</a>. Each game was composed of four LLM instances &#8212; LLM1 Spymaster and LLM1 Guesser on one team, and LLM2 Spymaster and LLM2 Guesser on the other team.</p><p></p><h4>Two turns in a single game might look like the following:</h4><p>First, the Red team&#8217;s spymaster would issue a clue. Then, the Red team&#8217;s guesser would select words based on that clue. In this case, both members of the Red team are GPT-4.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jjaG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jjaG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 424w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 848w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 1272w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jjaG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png" width="1456" height="1072" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1072,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:678633,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jjaG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 424w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 848w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 1272w, https://substackcdn.com/image/fetch/$s_!jjaG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff842225b-0dd2-4c20-ba0c-c7278b8c3615_2026x1492.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the next turn, the Blue team&#8217;s spymaster would issue a clue, and then the Blue team&#8217;s guesser would select words based on that clue. Here, both members of the Blue team are Claude.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EYr3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EYr3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 424w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 848w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 1272w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EYr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png" width="1456" height="1072" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1072,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:647439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EYr3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 424w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 848w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 1272w, https://substackcdn.com/image/fetch/$s_!EYr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6616aba-d729-48b4-9e7b-fef9262c1052_2026x1492.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><h1>Results</h1><h2>Who won the most matches?</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwHx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwHx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 424w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 848w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 1272w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KwHx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png" width="851" height="322" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:322,&quot;width&quot;:851,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwHx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 424w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 848w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 1272w, https://substackcdn.com/image/fetch/$s_!KwHx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71581b6b-bf5f-4ea0-8a12-396398957c3f_851x322.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A breakdown of winners across each pairing of LLMs.</figcaption></figure></div><p>The chart below breaks down the winners across 24 games for each LLM pairing. In general, GPT-4 and Claude were closely matched, though GPT-4 tended to win slightly more often . Both GPT-4 and Claude consistently outperformed Gemini.</p><p>In terms of total victories, GPT-4 and Claude won the most games.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lX5N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lX5N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 424w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 848w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 1272w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lX5N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png" width="619" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87170b9e-691f-4d68-8a97-06c101586259_619x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:619,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lX5N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 424w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 848w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 1272w, https://substackcdn.com/image/fetch/$s_!lX5N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87170b9e-691f-4d68-8a97-06c101586259_619x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Who chose the most Assassins?</h2><p>In Codenames, the &#8220;Assassin&#8221; card should never be selected. If the Guesser accidentally chooses the Assassin, the game ends, and that team loses. So, the clues must be clear and unambiguous to avoid selecting the Assassin.</p><p>When a team picks an Assassin, it suggests that the spymaster and guesser aren't on the same page regarding the clues and the words on the board.</p><p>Selecting fewer Assassin cards is key to success. In my experiments, <strong>GPT-4 selected the most Assassin cards, while Gemini selected the fewest</strong>. This suggests that although GPT-4 won more games, Gemini may be more cautious with its guesses and clues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SceP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SceP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 424w, https://substackcdn.com/image/fetch/$s_!SceP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 848w, https://substackcdn.com/image/fetch/$s_!SceP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 1272w, https://substackcdn.com/image/fetch/$s_!SceP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SceP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png" width="619" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:619,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SceP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 424w, https://substackcdn.com/image/fetch/$s_!SceP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 848w, https://substackcdn.com/image/fetch/$s_!SceP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 1272w, https://substackcdn.com/image/fetch/$s_!SceP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1e40a85-8563-41f1-8302-c25e1789c10a_619x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Game lengths</h2><p><strong>LLMs took significantly more turns to complete a game of Codenames (up to 9x more) compared to traditional word embeddings.</strong></p><p>Across all 3 LLMs, the models completed a game in an average of 13.5 turns. This number excludes games that ended prematurely due to an Assassin being chosen.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DVfX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DVfX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 424w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 848w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 1272w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DVfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png" width="670" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:670,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DVfX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 424w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 848w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 1272w, https://substackcdn.com/image/fetch/$s_!DVfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faec183aa-5d6b-41e4-a406-c8ef2c9539f2_670x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the other hand, <strong>older word embeddings like GloVe <a href="https://ojs.aaai.org/index.php/AIIDE/article/view/5239/5095">needed as few as three turns to finish a game</a>. &nbsp;</strong>Other experiments showed that combining word embeddings with human judgments could complete a game <a href="https://ojs.aaai.org/index.php/AIIDE/article/view/7435">in as few as 6 turns</a>. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BLea!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BLea!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 424w, https://substackcdn.com/image/fetch/$s_!BLea!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 848w, https://substackcdn.com/image/fetch/$s_!BLea!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 1272w, https://substackcdn.com/image/fetch/$s_!BLea!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BLea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png" width="677" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:677,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BLea!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 424w, https://substackcdn.com/image/fetch/$s_!BLea!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 848w, https://substackcdn.com/image/fetch/$s_!BLea!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 1272w, https://substackcdn.com/image/fetch/$s_!BLea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1720ea1e-f4a5-4a87-9f25-3b047edb265d_677x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This suggests that while newer models are more sophisticated, they might be less efficient at this particular game compared to simpler, older methods. LLMs tend to play conservatively, with LLM-based spymasters often only providing clues for two or three words at a time. In contrast, humans and word embeddings might offer more abstract or creative clues that could end the game in fewer turns.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/llm-codenames-competition?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/llm-codenames-competition?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>Discussion</h1><p>Codenames is just one of many games that people are experimenting with having LLMs &#8220;play&#8221;. We&#8217;ve gone from making LLMs do things that humans do for work (like coding or customer service) to, increasingly, making LLMs do things that humans do for fun (like playing games).</p><p>In my experiments, I explored how well LLMs perform against each other in a specific game-based setting. Games offer a way to test LLMs&#8217; reasoning, strategy, creativity, and cooperation &#8212; skills we may want them to develop further to better assist us in other tasks.</p><p>I think we will likely see more of these kinds of game-based settings for testing LLMs. <a href="https://lmsys.org/blog/2023-05-03-arena/">LMSYS&#8217; Chatbot Arena</a> is related to this idea, where two LLMs are judged by humans to see which gives the better response to a prompt &#8212; essentially, a game of &#8220;who gives the better answer to a question.&#8221;</p><p>AI agents also show promise in helping develop better games in the future, as seen in recent <a href="https://arxiv.org/pdf/2310.08067">multi-agent frameworks for automating game development</a>.</p><p>However, having fun (much like <a href="https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art">creating art</a>) is a fundamentally human endeavor. I don&#8217;t want to get too much into the philosophical weeds about whether AI can have fun or create art or be creative. Ultimately, these games are meant for humans to enjoy, and AI can hopefully assist us in creating richer experiences without replacing the joy of playing them ourselves.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Art Fish Intelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/llm-codenames-competition/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/llm-codenames-competition/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Visualizing Data from the 2024 Paris Olympics — Part 2]]></title><description><![CDATA[Medals, medals, medals &#127941;]]></description><link>https://www.artfish.ai/p/olympics-data-viz-medals</link><guid isPermaLink="false">https://www.artfish.ai/p/olympics-data-viz-medals</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 19 Aug 2024 12:08:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Now that the Paris 2024 Olympics are over, I wanted to analyze data about the medals from the games.</p><p>I had a few questions I wanted to answer &#8212; such as, which country won the most medals if you don&#8217;t look at total medal count, but the <em>percentage</em> of medals won relative to how many athletes were sent? Were some sports dominated by certain countries? And how diverse are the winners for different sports?</p><p>This is part two of my recent data visualizations of the 2024 Paris Olympics, where I looked more at the participating athletes and countries. As before, the data is sourced from <a href="https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games/data">Kaggle</a>.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0effb608-32ed-44f2-adcd-4a0c125761f0&quot;,&quot;caption&quot;:&quot;There&#8217;s probably a lot about the 2024 Olympics you didn&#8217;t know, such as how old is the oldest competing athlete? Or the youngest? What sports are dominated by different age groups, or genders, or countries? Do certain Zodiac signs have an affinity for different sports?&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Visualizing Data from the 2024 Paris Olympics &#8212; Part 1&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd4bfb60-9b3c-495d-932b-904448517e72_2419x3513.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-08-07T12:08:06.900Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/olympics-data-viz&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:147395032,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Art Fish Intelligence&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h2></h2><h2>A closer look into medal count</h2><p>Team USA left the Olympics with the highest overall number of medals, at 126 medals. China came in second, with a total of 91 medals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RwPY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RwPY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 424w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 848w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 1272w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RwPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png" width="771" height="475" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:771,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RwPY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 424w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 848w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 1272w, https://substackcdn.com/image/fetch/$s_!RwPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0618ac0-dcf1-4fb4-835d-094af9a54fa9_771x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But, wouldn&#8217;t it make sense that Team USA has the highest number of medals, since they sent the most athletes to the Olympics, more than any other country? (Team USA sent 638 athletes; China sent 399).</p><p>It might seem obvious, but here&#8217;s data to back it up: the more athletes a country sends to the Olympics, the more medals they win (sort of).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nvip!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nvip!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 424w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 848w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 1272w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nvip!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png" width="744" height="414" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:744,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nvip!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 424w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 848w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 1272w, https://substackcdn.com/image/fetch/$s_!Nvip!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f62dff-6b5a-4c15-aec1-9de529cbe168_744x414.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let&#8217;s break it down a little.</p><p>It might be more informative to look at the <em>relative</em> medal count, rather than the absolute count. I calculated the <strong>percent of athletes who won medals</strong> (the number of medals won by a country divided by the number of athletes sent by that country).&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I2ss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I2ss!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 424w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 848w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 1272w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I2ss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png" width="924" height="649" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I2ss!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 424w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 848w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 1272w, https://substackcdn.com/image/fetch/$s_!I2ss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031fb7a7-7eea-41cd-9a2d-d97e9644c199_924x649.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>USA&#8217;s overall medal count does not seem quite as staggering in the context of the sheer number of athletes they sent. Around 20% of USA athletes won medals &#8212; which is still impressive, but it becomes more comparable with China: 22.6% of Chinese athletes returned with a medal.</p><p>Some countries received very few medals &#8212; absolutely and relatively. For example, Egypt only won 3 medals, despite sending 156 athletes, or 1.9% of their athletes.&nbsp;</p><p>Other countries, such as North Korea and Kyrgyzstan, sent relatively few athletes (less than 20), but a high percentage of those athletes (over 30%) won a medal.</p><p>Merely sending a large number of athletes to the Olympics does not necessarily guarantee winning more medals.&nbsp;</p><p></p><h4><strong>Teams vs Individual sports</strong></h4><p>Countries that send more athletes participate in more team sports, so one might argue that the above figures aren&#8217;t fair (e.g. an Olympic volleyball team consists of 12 players, but a winning team would obtain only a single medal).&nbsp;</p><p>However, similar patterns as above hold when looking at medal counts only for individual sports. Individual sports are <a href="https://odf.olympictech.org/2024-Paris/codes/HTML/pg_cc/EventUnitType.htm">those labeled by the International Olympic Committee</a> as such.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hrV8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hrV8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 424w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 848w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 1272w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hrV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png" width="923" height="649" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:923,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hrV8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 424w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 848w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 1272w, https://substackcdn.com/image/fetch/$s_!hrV8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c46e361-a61c-40e6-be43-5f4faa4755ce_923x649.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In fact, athletes from the United States won 19.7% of all medals, while this number went down to 17.3% of individual sports. </p><p></p><h2>France sweeps BMX racing and Korea dominates archery</h2><p>Some countries seem to be better at certain sports.</p><p>I determined countries that dominated a particular sport by first calculating the percent of medals won by each participating country in a sport, and then finding the countries that won medals with the highest percentage. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nd7w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nd7w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 424w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 848w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 1272w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nd7w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png" width="898" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:898,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nd7w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 424w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 848w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 1272w, https://substackcdn.com/image/fetch/$s_!nd7w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1aa9774-cdbe-4922-a662-3eb27e9a778d_898x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>France dominated BMX racing, winning half of the medals. South Korea dominated archery (46.7% of medals) and China dominated diving, table tennis, and badminton.&nbsp;</p><p>I also calculated the same metrics for a subset of the sports divided by gender.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6xpY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6xpY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 424w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 848w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 1272w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6xpY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png" width="852" height="524" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:852,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6xpY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 424w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 848w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 1272w, https://substackcdn.com/image/fetch/$s_!6xpY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257f4a8e-6076-4d81-8b30-bb8f39deb692_852x524.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Just because a country is good at one sport for male/female athletes doesn&#8217;t mean it&#8217;s good at it for the other gender. France&#8217;s dominance in BMX racing was mainly for the men&#8217;s division. On the other hand, South Korea dominated archery for both genders, and China dominated table tennis for both genders (but more so for women&#8217;s!).&nbsp;</p><p></p><h2>Boxing and 3x3 Basketball are the most diverse sports</h2><p>There are many ways to measure "diversity," but here I measure it in terms of how many countries won medals in a sport (proportional to how many athletes that country sent). </p><p>This approach allows me to see if medals are distributed among only a few countries or among a more diverse range of participants.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yGrB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yGrB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 424w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 848w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 1272w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yGrB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png" width="868" height="826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:868,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yGrB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 424w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 848w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 1272w, https://substackcdn.com/image/fetch/$s_!yGrB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F685aefca-1eb4-471d-b8a7-31a6552a4c8a_868x826.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>According to this metric, boxing and 3x3 basketball were the most "diverse" &#8212; nearly half of participating countries left with at least one medal. </p><p>This is in contrast to sports like road cycling, table tennis, and swimming, in which only 10% of participating countries won medals. The reasons for this disparity could vary widely, which I won't explore too deeply in this article, but they might include factors such as cost of equipment, availability of top coaches, and overall popularity of the sport.</p><p>I don't want to make any judgments about fairness or equality, but I partly wonder if the sports where a more diverse group of countries win medals are more aligned with the Olympic spirit of international competition (as opposed to single-country dominance).</p><p></p><h2><strong>A case study: is swimming the least diverse sport?</strong></h2><p>Swimming is the second most popular Olympic sport, with 853 participating athletes and 189 participating countries. </p><p>And yet, of these 189 countries, 19 countries won medals, meaning that only about 10% of countries walked (swam?) away with medals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E_5a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E_5a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 424w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 848w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 1272w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E_5a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png" width="749" height="429" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:429,&quot;width&quot;:749,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E_5a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 424w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 848w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 1272w, https://substackcdn.com/image/fetch/$s_!E_5a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F365ebda3-7592-4d0e-a9bf-85d994831821_749x429.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I don&#8217;t know much about swimming as a sport, but this surprised me a good deal, especially since it&#8217;s such a popular sport that so many countries participate in. Perhaps it speaks to exactly how competitive the sport is.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/olympics-data-viz-medals?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/olympics-data-viz-medals?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Art Fish Intelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p>Thanks for reading this article! If you enjoyed the data visualizations or have any questions, feel free to drop them in the comments! </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/olympics-data-viz-medals/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/olympics-data-viz-medals/comments"><span>Leave a comment</span></a></p><p></p><p>P.S. I was at the tail end of the Olympics in Paris this year, so it feels extra meaningful to be able to wrap up my experience there with some quick analysis of the games!</p>]]></content:encoded></item><item><title><![CDATA[Visualizing Data from the 2024 Paris Olympics — Part 1]]></title><description><![CDATA[An OlymPIC is worth a thousand visualizations]]></description><link>https://www.artfish.ai/p/olympics-data-viz</link><guid isPermaLink="false">https://www.artfish.ai/p/olympics-data-viz</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Wed, 07 Aug 2024 12:08:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BoAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s probably a lot about the 2024 Olympics you didn&#8217;t know, such as how old is the oldest competing athlete? Or the youngest? What sports are dominated by different age groups, or genders, or countries? Do certain Zodiac signs have an affinity for different sports?</p><p>Maybe you already knew the answer to those questions, but I had no idea. So I downloaded data about the athletes participating in the 2024 Olympics from <a href="https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games/data">Kaggle</a> and created the following plots to indulge my curiosity.</p><p>Let me share some of my findings with you.</p><p></p><h2>The young ride skateboards. The old ride horses.</h2><p>If we look at the 20 youngest athletes (ages 11-14) and the 20 oldest athletes (ages 55-69) participating in the 2024 Olympics, we can see that 60% of the youngest athletes compete in skateboarding while 80% of the oldest athletes compete in equestrian events.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BoAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BoAm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 424w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 848w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 1272w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BoAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png" width="712" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:712,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BoAm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 424w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 848w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 1272w, https://substackcdn.com/image/fetch/$s_!BoAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d0a418-252a-4176-bf6d-7c826cafe667_712x356.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The top 20 youngest athletes are participating in sports like skateboarding and swimming. The top 20 oldest athletes are participating in equestrian, table tennis, and shooting.</figcaption></figure></div><p>The youngest Olympic athlete in 2024 was China&#8217;s <a href="https://en.wikipedia.org/wiki/Zheng_Haohao">Zheng Haohao</a>, who competed in skateboarding at 11 years old. The oldest Olympic athlete was Australia&#8217;s <a href="https://en.wikipedia.org/wiki/Mary_Hanna">Mary Hanna</a> at 69 years old.</p><p>Other sports with older athletes include shooting and table tennis. Sports such as rhythmic gymnastics and <a href="https://en.wikipedia.org/wiki/Freestyle_BMX">freestyle BMX cycling</a> are composed of younger athletes (barely anyone over 30).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QFZq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QFZq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 424w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 848w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 1272w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QFZq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png" width="847" height="964" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:964,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!QFZq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 424w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 848w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 1272w, https://substackcdn.com/image/fetch/$s_!QFZq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7486ddb1-4767-49cb-b421-a054b968ad72_847x964.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The youngest and oldest Olympic athlete per sport, sorted by the age of the youngest athlete. This does not represent the full distribution of ages per sport.</figcaption></figure></div><p></p><p></p><h2>Most countries participate in athletics, swimming, and judo. Few countries participate in water polo and basketball.</h2><p>Athletics <a href="https://en.wikipedia.org/wiki/Athletics_at_the_Summer_Olympics">traces its earliest roots to ancient Greek Olympic events</a>. Swimming has been <a href="https://en.wikipedia.org/wiki/Swimming_at_the_Summer_Olympics">a sport at every modern Summer Olympics</a>. And while <a href="https://en.wikipedia.org/wiki/Judo_at_the_Summer_Olympics">Judo has been part of the Summer Olympics since 1964</a>, it still surprised me that so many different countries compete in this sport.</p><p>Some sports, such as breakdancing (which is debuting for the first time this year!), hockey, water polo, and basketball, have 16 or fewer countries participating.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JCXN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JCXN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 424w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 848w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 1272w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JCXN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png" width="712" height="920" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4960e82b-a892-410f-8938-24656246854a_712x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:920,&quot;width&quot;:712,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JCXN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 424w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 848w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 1272w, https://substackcdn.com/image/fetch/$s_!JCXN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4960e82b-a892-410f-8938-24656246854a_712x920.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Number of unique countries participating in each Olympics athletic discipline.</figcaption></figure></div><p></p><h2>A few countries send many athletes, while many countries send a few athletes</h2><p>I sorted countries by the number of athletes it sent to the Olympics. While some countries send hundreds of athletes, there is a long tail of countries that send very few athletes. In fact, four countries (Nauru, Liechtenstein, Somalia, and Belize) only send 1 athlete.</p><p>Another way to highlight this disparity is using numbers: 100 countries sent 10 or fewer athletes. This is compared to the only 32 countries that sent 100 or more athletes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wwfv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wwfv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 424w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 848w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 1272w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wwfv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png" width="914" height="325" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:914,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wwfv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 424w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 848w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 1272w, https://substackcdn.com/image/fetch/$s_!wwfv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28880949-62e3-44a3-819f-e3b7cf96fe77_914x325.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Most countries don&#8217;t send that many athletes. A few countries send many athletes, while many countries send a few athletes.</figcaption></figure></div><p></p><h2>Countries that send 10 or fewer athletes participate in individual combative sports</h2><p>Countries that send 10 or fewer athletes participate in more individual sports, which might seem obvious. What surprised me was that these sports tended to be combative or martial: judo, shooting, taekwondo, boxing, and wrestling. </p><p>Countries that send 100 or more athletes participate in more team sports, such as hockey, rowing, football, handball, and volleyball.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NTyB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NTyB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 424w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 848w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 1272w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NTyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png" width="926" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:926,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NTyB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 424w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 848w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 1272w, https://substackcdn.com/image/fetch/$s_!NTyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd947dd03-55df-4ad4-a465-ebfff1c345bf_926x356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Countries that send 10 or fewer athletes tend to participate in different sports compared to countries that send 100 or more athletes.</figcaption></figure></div><p></p><h2>Countries that send 10 or fewer athletes send more men than women</h2><p>For the countries that send 10 or fewer athletes, there are overall more men (56.3%) compared to women (43.7%). This gender skew is obvious when compared to countries that send 100 or more athletes, for which the athletes are at near gender parity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H6i0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H6i0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 424w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 848w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 1272w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H6i0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png" width="712" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6435aab-8886-46a4-84e0-14941066840d_712x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:712,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H6i0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 424w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 848w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 1272w, https://substackcdn.com/image/fetch/$s_!H6i0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6435aab-8886-46a4-84e0-14941066840d_712x356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Countries that send 10 or fewer athletes send more male athletes (56.3%) than female. Countries that send 100 or more athletes are closer to gender parity.</figcaption></figure></div><p></p><h2>Only female athletes participate in Rhythmic Gymnastics and Artistic Swimming, while wrestling is 67% male </h2><p>The discipline with the fewest women is Wrestling (32.53%), followed by equestrian and football. </p><p>Artistic Swimming and Rhythmic Gymnastics have no male athletes. </p><p>All other disciplines are almost at 50/50 gender parity. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_d2O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_d2O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 424w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 848w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 1272w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_d2O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png" width="928" height="712" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_d2O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 424w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 848w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 1272w, https://substackcdn.com/image/fetch/$s_!_d2O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4478a4c6-6fd4-4fba-9e0c-9ff50bbf4fd1_928x712.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gender distribution of male/female athletes per sport. This figure only shows sports for which the absolute difference of the male/female percentage was greater than 1%.</figcaption></figure></div><p></p><h2>Less economically developed countries send more male athletes than female athletes</h2><p>I merged <a href="https://en.wikipedia.org/wiki/Gross_domestic_product">Gross Domestic Product</a> (GDP) data (obtained from the <a href="https://unstats.un.org/unsd/snaama/Basic">United Nations Statistics Division</a>), used as a proxy for a country&#8217;s economic health, with athletes&#8217; country information. </p><p>I separated countries into quantiles based on their GDP and showed the median number of athletes sent by countries in those quantiles, disaggregated by gender. There is a trend of less economically developed countries (in the lower 20% quantiles) sending fewer female and more male athletes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bFNg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bFNg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 424w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 848w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 1272w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bFNg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png" width="784" height="424" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:424,&quot;width&quot;:784,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bFNg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 424w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 848w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 1272w, https://substackcdn.com/image/fetch/$s_!bFNg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36c332e6-4007-4a0c-9887-0bd7a11bc3ae_784x424.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Median number of athletes sent by countries per GPD quantile. The error bars show the interquartile range (25%-75%).</figcaption></figure></div><p></p><h2>Larger and richer countries send more athletes</h2><p>While it&#8217;s not surprising, it is important sometimes to show what&#8217;s clearly in the data &#8212; that larger and more economically developed countries do send more athletes. A country has to have both the economic resources and the population to be able to send more athletes to the Olympic games.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HaAZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HaAZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 424w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 848w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 1272w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HaAZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png" width="856" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:856,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HaAZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 424w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 848w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 1272w, https://substackcdn.com/image/fetch/$s_!HaAZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acc48db-d7f3-4f9e-8ff6-c305d0728a2e_856x356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">As countries have higher GDP or larger population, they send more athletes to the Olympics.</figcaption></figure></div><p></p><h2>The Pisces are not diving and swimming, but rather playing badminton and rugby</h2><p>Inspired in part by <a href="https://www.teenvogue.com/story/what-olympic-sport-you-could-win-based-on-your-zodiac-sign">What Olympic Sport You Could Win Based on Your Zodiac Sign</a>, I showed the proportion of Olympians with each Zodiac sign per sport. Larger circles correspond with a higher percentage of that Zodiac sign within that sport.</p><p>Everyone knows Pisces are fish, so the ideal sports for them might be swimming or diving. However, diving is dominated by Virgos, which surprised me, while the Pisces are busy playing badminton and rugby.  </p><p>The Libras don&#8217;t seem to be really good at anything in particular.</p><p>I&#8217;ll let the astrology nerds dissect this plot further.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gJdR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gJdR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 424w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 848w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 1272w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gJdR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png" width="1072" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1072,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:164213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gJdR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 424w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 848w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 1272w, https://substackcdn.com/image/fetch/$s_!gJdR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ff45c9-68bb-4094-aeb0-1636ffbd36da_1072x757.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/olympics-data-viz?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/olympics-data-viz?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>But what about the medals? &#127941;</h1><p>In this article, I mainly focused on the athletes participating in the Olympic games.</p><p>Don&#8217;t worry &#8212; I&#8217;m already thinking of a Part 2, focusing on the athletes who won medals. What are some questions you&#8217;d like to have answered in the data? What else do you want to see? Let me know in the comments!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/olympics-data-viz/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/olympics-data-viz/comments"><span>Leave a comment</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Art Fish Intelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Visualizing Data from the 2024 Paris Olympics &#8212; Part 1", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024olympicsdataviz,
    author = {Yennie Jun},
    title = {Visualizing Data from the 2024 Paris Olympics &#8212; Part 1},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/olympics-data-viz}
}</code></code></pre>]]></content:encoded></item><item><title><![CDATA[Evaluating long context large language models]]></title><description><![CDATA[There is a race towards language models with longer context windows. But how good are they, and how can we know?]]></description><link>https://www.artfish.ai/p/long-context-llms</link><guid isPermaLink="false">https://www.artfish.ai/p/long-context-llms</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Thu, 01 Aug 2024 15:11:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!o_4X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o_4X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o_4X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 424w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 848w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 1272w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o_4X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png" width="1456" height="929" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:929,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o_4X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 424w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 848w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 1272w, https://substackcdn.com/image/fetch/$s_!o_4X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0af585-9d00-4f78-9eb6-0e84d5b96162_1600x1021.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The context window of language models have been growing at an exponential rate in the last few years. Figure created by the author.</figcaption></figure></div><h1><strong>Introduction</strong></h1><p>The context window of large language models &#8211; the amount of text they can process at once &#8211; has been increasing at an exponential rate.&nbsp;</p><p>In 2018, language models like<a href="https://arxiv.org/abs/1810.04805"> BERT</a>,<a href="https://arxiv.org/abs/1910.10683"> T5</a>, and<a href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf"> GPT-1</a> could take up to 512 tokens as input. Now, in summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how do we&nbsp;evaluate these increasingly capable models?&nbsp;</p><p></p><h3><strong>What does a large context window mean?</strong></h3><p>The recently released <a href="https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/">Gemini 1.5 Pro model can take in up to 2 million tokens</a>. But what does 2 million tokens even mean?</p><p>If we estimate 4 words to roughly equal about 3 tokens, it means that 2 million tokens can (<em>almost</em>) fit the entire Harry Potter and Lord of the Ring series.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yeaL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yeaL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 424w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 848w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 1272w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yeaL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png" width="1456" height="1015" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1015,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yeaL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 424w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 848w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 1272w, https://substackcdn.com/image/fetch/$s_!yeaL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81dc4b17-6aed-4e79-b919-c85699651293_1600x1115.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A diagram showing how many Harry Potter and Lord of the Rings books can fit into a Gemini 1.5&#8217;s 2 million context window. This figure is in part inspired by this <a href="https://www.reddit.com/r/OpenAI/comments/1buz5ju/geminis_context_window_is_much_larger_than_anyone/">amazing infographic</a> from March 2024. Figure created by the author.</figcaption></figure></div><p>These numbers refer to the context windows that are available in publicly available models. The Gemini 1.5 Pro model, while currently publicly available with up to a 2 million token context window, can<a href="https://www.notion.so/Long-Context-Eval-Survey-fe3c69173f2e4eb0b5cd4c973f712626?pvs=21"> work with up to 10 million tokens</a>.</p><p>This means, as stated by a Reddit user, that<a href="https://www.reddit.com/r/singularity/comments/1ausp2k/geminis_nearly_perfect_10_million_context_length/"> one can fit 1000 scientific papers into Gemini&#8217;s 10 million context window to create novel research</a>.</p><p></p><h3><strong>Why does this matter?</strong></h3><p>Larger context windows aren&#8217;t just a way for companies building LLMs to compete with each other. The implications and real-world scenarios of how models with long contexts can be used are many. Consider the following scenarios:</p><ul><li><p>Legal research: Lawyers can input entire case histories, precedents, and statutes into a model, getting comprehensive analysis in seconds instead of hours or days of manual review.</p></li><li><p>Financial analysis: Imagine feeding years of financial reports, market trends, and economic indicators into an AI for instant, in-depth insights.</p></li><li><p>Medical diagnostics: Doctors could input a patient's entire medical history, including test results, treatment records, and high-resolution medical imaging scans, for more accurate diagnoses and personalized treatment plans.</p></li><li><p>Education: Students could input entire textbooks and course materials, getting tailored explanations and connections across subjects.</p></li></ul><p>However, these use cases also raise concerns. The ability to process vast amounts of personal data could enable unprecedented levels of surveillance and privacy invasion if misused. As these capabilities grow, so does the need for robust ethical guidelines and safeguards.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><h1><strong>How do we evaluate LLMs as the context window gets longer and longer?</strong></h1><p>Models with extremely long context windows are a recent development. Therefore, researchers have tried to come up with new ways of evaluating how good these models are. These evaluations aim to benchmark the capabilities and limits of long context models, as well as measure the tradeoffs that come with scaling up context windows.</p><p>The core idea is that models with longer input contexts should be able to perform tasks that were previously too difficult or impossible.&nbsp;</p><h4><strong>Evaluation Use Cases</strong></h4><p>In this article, I&#8217;ll cover three different ways that researchers are thinking about evaluating long context models:</p><ol><li><p>Information retrieval from long documents</p></li><li><p>Complex Analysis (reasoning and summarizing) of Long Documents</p></li><li><p>In-context learning for "on the fly" model training</p></li></ol><p><em>Note: This is not an exhaustive list. For a comprehensive overview of long-context benchmarks, refer to the <a href="https://github.com/Xnhyacinth/Awesome-LLM-Long-Context-Modeling?tab=readme-ov-file#11-Benchmark-and-Evaluation">Awesome LLM Long Context Modeling Github page</a>.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/long-context-llms?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/long-context-llms?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2><strong>1. Information retrieval from long documents</strong></h2><p>The "<a href="https://github.com/gkamradt/LLMTest_NeedleInAHaystack">Needle in a Haystack</a>" test, introduced by <a href="https://twitter.com/GregKamradt">Greg Kamradt</a>, is a popular method for evaluating information retrieval in long contexts. It involves placing an out-of-place statement (the "needle") at various depths within longer text snippets (the "haystack").&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nO9w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nO9w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 424w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 848w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 1272w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nO9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png" width="1456" height="568" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:568,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nO9w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 424w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 848w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 1272w, https://substackcdn.com/image/fetch/$s_!nO9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a4cacb-f793-400b-b8a5-f7ff16c7f94c_1502x586.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>An example of Needle in a Haystack by inserting "The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day" into Paul Graham's essays.</em></figcaption></figure></div><p>This test measures how effectively LLMs can locate specific information within increasingly large contexts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PSVd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PSVd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PSVd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PSVd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!PSVd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2db43e7b-e61c-4425-b1f2-2b801ea75f97_1600x895.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The original &#8220;Needle in a Haystack&#8221; diagram created by <a href="https://twitter.com/GregKamradt">Greg Kamradt</a> to pressure test LLMs&#8217; abilities to retrieve deeply buried information. By placing the out-of-place sentence (&#8221;the needle&#8221;) at various depths within snippets of various lengths (&#8221;the haystack&#8221;), one could measure how well different LLMs could locate these pieces of information.</figcaption></figure></div><p></p><h4><strong>Variations on Needle in a Haystack</strong></h4><p>Researchers have developed several variations to test different aspects of information retrieval:</p><ul><li><p>Multiple needles: Multiple facts sprinkled throughout long documents (introduced by <a href="https://blog.langchain.dev/multi-needle-in-a-haystack/">Langchain</a> and in <a href="https://arxiv.org/abs/2407.11963">NeedleBench</a>)</p></li><li><p><a href="https://arxiv.org/abs/2406.11230">Multimodal Needle in a Haystack</a>: Finding a target image within a set of unrelated images based on a caption&nbsp;</p></li><li><p>Audio-based: Identifying a short audio clip within a signal up to five days long (introduced in the <a href="https://arxiv.org/abs/2403.05530">Gemini 1.5 technical report</a>). In this test, a short clip of audio where a speaker says &#8220;the secret keyword is needle&#8221; was hidden within an audio signal of up to almost <em>five days long</em> (or, 107 hours).</p></li><li><p>Video-based: Locating a single frame with specific text in a 10.5-hour video (<a href="https://arxiv.org/abs/2403.05530">Gemini 1.5 technical report</a>). In this test, a single frame with the text &#8220;The secret word is needle&#8221; was buried within a 10.5 hour video constructed from concatenating seven copies of the full AlphaGo documentary.<br></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MR2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MR2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 424w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 848w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 1272w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MR2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png" width="1302" height="746" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:1302,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MR2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 424w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 848w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 1272w, https://substackcdn.com/image/fetch/$s_!MR2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19312abf-b91c-403f-8e22-9b73e422ed65_1302x746.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Video-based Needle in a Haystack was introduced in the Gemini 1.5 paper. Image from <a href="https://arxiv.org/abs/2403.05530">Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context</a> (p. 110).&nbsp;</figcaption></figure></div><p></p><h4><strong>Limitations and Implications</strong></h4><p>While widely used, the Needle in a Haystack approach has several limitations:</p><ul><li><p>It's an artificial task that may not reflect real-world use cases.</p></li><li><p>It only assesses information finding, not reasoning or comprehension.</p></li><li><p>As context windows grow, evaluating all combinations of "haystack" sizes and "needle" locations becomes increasingly costly.</p></li></ul><p>Despite these drawbacks, the test highlights a crucial capability of long-context models: the ability to quickly search and retrieve information from vast amounts of data. This has significant implications, from enhancing research efficiency to enabling unprecedented levels of data analysis &#8212; and potentially, surveillance.</p><p>It's important to note that this type of information retrieval differs from Retrieval-Augmented Generation (RAG) in that it operates within a single, extensive context rather than retrieving information from external sources.</p><p></p><h2><strong>2. Complex Analysis (reasoning and summarizing) of Long Documents</strong></h2><p>While "Needle in a Haystack" tests focus on information retrieval, other evaluations assess an LLM's ability to reason over, interpret, and synthesize information from extensive content. These evaluations aim to test for more complex forms of reasoning rather than just pinpoint data location.</p><p>Here are a few evaluation methods in this category:</p><h4><strong>Literary QA Tasks</strong> </h4><p>Books are prime examples of long documents. Benchmarks like <a href="https://arxiv.org/pdf/2403.12766">NOVELQA</a> test models' ability to reason over literary fiction, using up to 200K token lengths. It includes human-generated questions about 88 English-language novels, both public domain and copyrighted. Other datasets, like <a href="https://arxiv.org/abs/2406.16264">NoCha</a>, follow similar approaches.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X8gT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X8gT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 424w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 848w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 1272w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X8gT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png" width="1456" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X8gT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 424w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 848w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 1272w, https://substackcdn.com/image/fetch/$s_!X8gT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ab6a04-5720-4bf7-bf08-b0e5466c0784_1540x662.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Caption: A diagram showing two sample questions from the NovelQA dataset, from <a href="https://arxiv.org/pdf/2403.12766">NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens</a>.</figcaption></figure></div><p></p><h4><strong>Reasoning over long texts with hidden relevant information</strong></h4><p><a href="https://arxiv.org/pdf/2402.14848v1">FlenQA</a> creates multiple versions of contexts at different lengths by embedding relevant information within longer, irrelevant texts. This helps assess how LLM performance degrades as context length increases.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kjuK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kjuK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 424w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 848w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 1272w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kjuK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png" width="419" height="266.6363636363636" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:1034,&quot;resizeWidth&quot;:419,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kjuK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 424w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 848w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 1272w, https://substackcdn.com/image/fetch/$s_!kjuK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e58d4c-0b69-4c3d-aad6-006875cbfa53_1034x658.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of one of the tasks in FlenQA, in which relevant information (dark red) is embedded among irrelevant information. Diagram from <a href="https://arxiv.org/pdf/2402.14848v1">Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models</a>.</figcaption></figure></div><p></p><h4><strong>Domain-Specific Reasoning</strong></h4><ul><li><p>Healthcare: The <a href="https://arxiv.org/pdf/2401.14490">LongHealth</a> benchmark uses 20 fictional patient cases (5-7K words each) to test medical reasoning.</p></li><li><p>Finance: <a href="https://arxiv.org/pdf/2401.06915">DocFinQA</a> challenges models with financial documents up to 150 pages long (100K+ tokens).</p></li></ul><p></p><h4><strong>Summarization Tasks</strong> </h4><p>Summarizing long documents is a crucial capability for LLMs, as it allows users to quickly grasp key information from extensive texts without reading everything. This skill is particularly valuable in fields like research, business, and law, where professionals often need to distill large volumes of information into concise reports.</p><p>However, evaluating summarization quality is challenging. Unlike simple information retrieval, summarization requires a deep understanding of the entire context and the ability to identify and synthesize key points. What makes a "good" summary can be subjective and context-dependent.</p><p>Current evaluation methods often rely on comparing model-generated summaries to human-written references, but this approach has limitations. It may not capture the full range of valid summarization strategies and can miss semantically correct summaries that use different wording.</p><p>Benchmarks like <a href="https://arxiv.org/pdf/2308.14508">LongBench</a> and <a href="https://arxiv.org/pdf/2402.13718">&#8734;Bench</a> attempt to address some of these challenges. LongBench includes summarization tasks for various document types (government reports, meeting notes, news articles) up to 15K words, while &#8734;Bench pushes the boundaries with novel summarization tasks up to 100K tokens. These benchmarks are valuable, but the field is still working towards more robust evaluation methods that can better capture the nuances of high-quality summarization.</p><p><em>A great resource to learn more on this topic can be found in <a href="https://dl.acm.org/doi/10.1145/3545176">An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics</a>.</em></p><p></p><h2><strong>3. &#8220;On the fly&#8221; model training</strong></h2><p>One of the coolest applications of long context models is the expanded capacity for in-context learning (ICL). ICL allows models to learn new tasks on the fly, directly from examples provided in the prompt. With larger context windows, we can now include hundreds of training examples or even complex, lengthy examples like summarization tasks.</p><p>This capability is a game-changer. Instead of fine-tuning models for specific domains, one can leverage ICL to adapt models to new tasks instantly. </p><h4><strong>Many-shot ICL</strong></h4><p>DeepMind&#8217;s work on <a href="https://arxiv.org/pdf/2404.11018">Many-Shot In-Context Learning</a> showed that including many more examples in the prompt significantly boosted performance across various tasks. By scaling ICL to hundreds or thousands of examples, models can overcome pre-training biases and tackle more complex challenges.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OSzk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OSzk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 424w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 848w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 1272w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OSzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png" width="1456" height="517" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OSzk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 424w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 848w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 1272w, https://substackcdn.com/image/fetch/$s_!OSzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5bb7c85-9de2-480f-948b-071c3d47eef2_1600x568.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">By including many more examples (or &#8220;shots&#8221;) in the prompt, the same LLM can show improvement across many different tasks. For example, going from 32 examples of sentiment analysis to 2048 in the prompt improved the LLM&#8217;s performance 18.2%. Diagram from <a href="https://arxiv.org/pdf/2404.11018">Many-Shot In-Context Learning</a>.</figcaption></figure></div><p>This principle extends beyond just improving performance. Anthropic's work on <a href="https://www-cdn.anthropic.com/af5633c94ed2beb282f6a53c595eb437e8e7b630/Many_Shot_Jailbreaking__2024_04_02_0936.pdf">Many-shot Jailbreaking</a> demonstrated that while a few examples couldn't compromise a model's safety guardrails, hundreds of examples could &#8212; highlighting both the power and potential risks of this approach.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Otde!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Otde!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 424w, https://substackcdn.com/image/fetch/$s_!Otde!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 848w, https://substackcdn.com/image/fetch/$s_!Otde!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 1272w, https://substackcdn.com/image/fetch/$s_!Otde!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Otde!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png" width="944" height="602" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:944,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Otde!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 424w, https://substackcdn.com/image/fetch/$s_!Otde!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 848w, https://substackcdn.com/image/fetch/$s_!Otde!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 1272w, https://substackcdn.com/image/fetch/$s_!Otde!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bc8c60a-e609-42b6-a22e-319480ee40cd_944x602.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example showing how a handful of few-shot examples cannot get an LLM to generate harmful content, but tens or hundreds of examples can get it to override its safety training. Diagram from <a href="https://www-cdn.anthropic.com/af5633c94ed2beb282f6a53c595eb437e8e7b630/Many_Shot_Jailbreaking__2024_04_02_0936.pdf">Many-Shot Jailbreaking</a>.</figcaption></figure></div><p></p><h4><strong>Translating low-resource languages</strong></h4><p>Long-context models are also proving particularly valuable for low-resource language translation. The <a href="https://arxiv.org/abs/2403.05530">Gemini 1.5 technical report</a> showcased this potential with the Kalamang language, which has fewer than 200 speakers and minimal web presence. By inputting a 500-page grammar, a 2000-entry bilingual wordlist, and 400 parallel sentences (totaling 250k tokens), the model could translate and even transcribe Kalamang speech.</p><p>This approach extends to other low-resource languages too, with performance improving as more examples are provided. It's a promising development for preserving and working with endangered languages.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/long-context-llms?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/long-context-llms?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1><strong>Discussion</strong></h1><p>The race for longer context windows in language models is accelerating, with context window sizes growing at an exponential rate. This growth necessitates new evaluation methods to properly assess these models' capabilities and limitations.</p><p>While numerous benchmarks for long context evaluation have emerged (e.g., <a href="https://arxiv.org/abs/2201.03533">SCROLLS</a>, <a href="https://arxiv.org/pdf/2308.14508">LongBench</a>, <a href="https://arxiv.org/pdf/2402.13718">&#8734;BENCH</a>), many questions remain unanswered:</p><ul><li><p>Trade-offs of scaling: How do safety, bias, and instruction-following change as context length increases?</p></li><li><p>Multilingual performance: Most benchmarks focus on English (with the exception of benchmarks like <a href="https://arxiv.org/abs/2403.03514">CLongEval</a>, which includes evaluation in Chinese as well). How does performance in other languages change with longer contexts, compared to English?</p></li><li><p>Potential degradation: Do certain capabilities (like coding skills or creativity) suffer as models handle more context?</p></li><li><p>Real-world implications: As models can process entire books, personal histories, or comprehensive data on low-resource languages, what are the ethical and practical consequences?</p></li></ul><p>As LLMs&#8217; context windows continue to grow, we need to understand not just what these models can do, but also how their fundamental characteristics may be changing.&nbsp;</p><p>For now, the race towards  models with larger and larger context windows will continue.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/long-context-llms/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/long-context-llms/comments"><span>Leave a comment</span></a></p><p></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Evaluating long context large language models", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024longcontextllms,
    author = {Yennie Jun},
    title = {Evaluating long context large language models", Art Fish Intelligence},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/long-context-llms}
}</code></code></pre><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The total word count of all seven books in the Harry Potter series is<a href="https://brokebybooks.com/the-word-count-of-175-favorite-novels/"> 1,084,625</a>. The total word count of all seven books in the Lord of the Ring series is<a href="https://brokebybooks.com/the-word-count-of-175-favorite-novels/#:~:text=Narnia%20series%20is-,345%2C535,-.%20That%E2%80%99s%20approximately%20the"> </a><a href="https://www.reddit.com/r/todayilearned/comments/fnheyt/til_that_tolkeins_lord_of_the_rings_trilogy_has_a/">481,103</a>. (1,084,625 + 481,103) * 4 / 3 = <strong>2087637.3. </strong>Therefore, Gemini&#8217;s 2M context could contain the entire Harry Potter and Lord of the Ring series minus the last half of the Return of the King.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Dealing with cognitive dissonance, the AI way]]></title><description><![CDATA[How do language models handle conflicting instructions in its prompt?]]></description><link>https://www.artfish.ai/p/dealing-with-cognitive-dissonance</link><guid isPermaLink="false">https://www.artfish.ai/p/dealing-with-cognitive-dissonance</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Fri, 05 Jul 2024 15:11:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WvZ4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WvZ4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WvZ4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 424w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 848w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WvZ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png" width="1456" height="1043" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1043,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293744,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WvZ4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 424w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 848w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!WvZ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719429d6-6d13-4bb6-b4ab-43efdd8bbffe_1502x1076.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Given contradictory instructions in the system message, the prompt, and examples, which instructions will an LLM follow in its response?</figcaption></figure></div><p>How do language models handle conflicting instructions in its prompt?</p><p><a href="https://en.wikipedia.org/wiki/Cognitive_dissonance">Cognitive dissonance</a> is a psychological term describing the mental discomfort experienced by an individual holding two or more contradictory beliefs. For example, if you&#8217;re at the grocery store and see a checkout lane for &#8220;10 items or fewer&#8221; but everyone in the line has 10 or more items, what are you supposed to do?</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Within the context AI, I wanted to know how large language models (LLMs) deal with cognitive dissonance in the form of contradictory instructions (for example, prompting an LLM to translate from English to Korean, but giving examples of English to French translations<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>).</p><p>In this article, I conduct experiments by providing LLMs with contradictory information to ascertain which of the contradictory information LLMs are more likely to align with.</p><div><hr></div><h3>System message, prompt instructions, and few-shot examples</h3><p>As a user, you can tell an LLM what to do in one of three ways:</p><ul><li><p>Directly describing the task in the system message </p></li><li><p>Directly describing the task in the normal prompt</p></li><li><p>Demonstrating a few examples of what &#8220;correct behavior&#8221; would look like</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xzxw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xzxw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 424w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 848w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 1272w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xzxw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png" width="741" height="460" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:741,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75342,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xzxw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 424w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 848w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 1272w, https://substackcdn.com/image/fetch/$s_!Xzxw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d035a13-e222-4cd3-ad4e-764688961caa_741x460.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The different ways to give instructions to a language model. You can specify one or more of any of these methods.</figcaption></figure></div><p>The <strong>system message</strong> is the most mysterious of all (in my opinion). According to <a href="https://microsoft.github.io/Workshop-Interact-with-OpenAI-models/Part-2-labs/System-Message/">Microsoft</a>, &#8220;The system message is used to communicate instructions or provide context to the model at the beginning of a conversation.&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>As far as I know, it&#8217;s unclear how much the system message affects the prompt (vs putting the system message directly into the prompt). At least, I haven&#8217;t seen any in-depth studies about this.</p><p>The <strong>prompt instruction</strong> is commonly used to instruct the model on what to do, such as &#8220;Translate from English to French&#8221; or &#8220;Copy edit all of the grammatical errors in my essay&#8221; or &#8220;Write me code to solve the following problem.&#8221;</p><p>The <strong>few-shot examples</strong> are optional demonstrations to the model showing what correct answers for similar inputs would look like.</p><p>Based on these definitions, I wanted to know:</p><ul><li><p>How much do few-shot examples actually matter? If you give a contradictory instruction in the prompt, are LLMs more likely to follow the examples or the instructions?</p></li><li><p>How much does the system message actually matter? If you give one instruction in the system message and another in the normal prompt, which instruction are LLMs more likely to follow?</p></li></ul><p>To test these questions <strong>I constructed a mini dataset (<a href="https://docs.google.com/spreadsheets/d/1y1NjamUSqiLzgpHpUwu8cAcknuguMl1obtgj0Tzro9k/edit?usp=sharing">available here</a>)</strong> of several simple tasks with conflicting instructions and few-shot examples. Throughout the rest of this article, I&#8217;ll showcase a single example of translating from English into various languages.</p><p>The following experiments were conducted on <a href="https://openai.com/index/hello-gpt-4o/">OpenAI&#8217;s GPT-4o</a> and <a href="https://www.anthropic.com/news/claude-3-5-sonnet">Anthropic&#8217;s newest Claude-3.5</a> models.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><h2><strong>Experiment 1: Prompt instructions with contradictory few shot examples</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KfQl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KfQl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 424w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 848w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 1272w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KfQl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png" width="1448" height="868" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:868,&quot;width&quot;:1448,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249604,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KfQl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 424w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 848w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 1272w, https://substackcdn.com/image/fetch/$s_!KfQl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28768e57-fc70-46f9-b357-4ee9ed203425_1448x868.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example from Experiment 1, where the <strong>prompt</strong> instructions contradict the provided few shot examples.</figcaption></figure></div><p>When an LLM is given prompt instructions that contradict the few shot examples, its behavior is difficult to predict. The results show that <strong>the models have no clear preference for following prompt instructions or few shot examples given a contradiction.</strong></p><p>GPT-4o is more likely to follow the examples set by the few shot demonstrations while ignoring the prompt instructions (or, in some cases, error cases where the model responds by failing to correctly answer any of the contradictory instructions). Claude-3.5 follows the prompt instructions or few shot examples with almost random chance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Mpl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Mpl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 424w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 848w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 1272w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Mpl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png" width="854" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba628fbf-9873-4788-a688-38708b39089f_854x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:854,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Mpl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 424w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 848w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 1272w, https://substackcdn.com/image/fetch/$s_!3Mpl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba628fbf-9873-4788-a688-38708b39089f_854x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In the first experiment, the models are given an instruction in the prompt and contradictory few-shot examples. The results show no clear preference for models to follow the prompt instructions or few shot examples.</figcaption></figure></div><p></p><p></p><h2><strong>Experiment 2: System message with contradictory few shot examples</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ccTu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ccTu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 424w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 848w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 1272w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ccTu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png" width="1448" height="932" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:932,&quot;width&quot;:1448,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:626895,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ccTu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 424w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 848w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 1272w, https://substackcdn.com/image/fetch/$s_!ccTu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff964749a-8f35-40bc-b181-49dc6d2558e3_1448x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example from Experiment 2, where the <strong>system message</strong> instructions contradict the provided few shot examples.</figcaption></figure></div><p>This experiment was very similar to the previous one, with the difference that the instructions (e.g. &#8220;Translate from English to German&#8221;) were moved from the <em>prompt</em> to the <em>system message</em>. </p><p>For the majority of tasks, <strong>GPT-4o was more likely to follow the instruction in the system message. This is in contrast to its behavior in the earlier experiment, where the same instruction appeared in the normal prompt, in which the model was more likely to follow the few shot examples.</strong></p><p>Claude-3.5, on the other hand, behaved exactly the same as the previous experiment (almost random chance whether it followed the system message or the few shot examples).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!76VK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!76VK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 424w, https://substackcdn.com/image/fetch/$s_!76VK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 848w, https://substackcdn.com/image/fetch/$s_!76VK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 1272w, https://substackcdn.com/image/fetch/$s_!76VK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!76VK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png" width="857" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:857,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!76VK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 424w, https://substackcdn.com/image/fetch/$s_!76VK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 848w, https://substackcdn.com/image/fetch/$s_!76VK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 1272w, https://substackcdn.com/image/fetch/$s_!76VK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F041e411b-70eb-4594-8fce-1cf7ab49d3fe_857x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In the second experiment, the models are given an instruction in the <strong>system message</strong> and contradictory few-shot examples. GPT-4o is more likely to follow the instructions in the system message, while Claude-3.5 shows no clear preference.</figcaption></figure></div><p>What does this mean? One interpretation is that the instructions in the system message weigh heavier for GPT-4o than instructions in the normal prompt (at least, for these examples). <strong>For Claude, it seems that the system message matters less, playing a similar role as inputting that same message into the prompt.</strong> </p><p></p><p></p><h2>Experiment 3: System message with a contradictory prompt instruction</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Py8G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Py8G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 424w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 848w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Py8G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png" width="1450" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:380667,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Py8G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 424w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 848w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Py8G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5b5bfe-91ab-494d-bb7c-1f589c6ab2af_1450x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example from Experiment 3, where the system message instructions contradict the instructions in the prompt.</figcaption></figure></div><p>In this experiment, I removed the few shot experiments. The instructions in the system message and prompt contradict each other. In this setup, both models overwhelmingly follow the instructions in the prompt more than in the system message. </p><p><strong>Given contradictory instructions in the system message and the prompt, both models were more likely to ignore the instructions in the system message.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5skU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5skU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 424w, https://substackcdn.com/image/fetch/$s_!5skU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 848w, https://substackcdn.com/image/fetch/$s_!5skU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 1272w, https://substackcdn.com/image/fetch/$s_!5skU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5skU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png" width="856" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/023462fa-05de-481e-a835-f992af82ef09_856x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:856,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5skU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 424w, https://substackcdn.com/image/fetch/$s_!5skU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 848w, https://substackcdn.com/image/fetch/$s_!5skU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 1272w, https://substackcdn.com/image/fetch/$s_!5skU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023462fa-05de-481e-a835-f992af82ef09_856x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In the third experiment, the models are given an instruction in the system message and contradictory instruction in the prompt. Both models are more likely to ignore the instructions in the system message and follow the instruction in the prompt.</figcaption></figure></div><p></p><p></p><h2>Experiment 4: System message, prompt, and few shot examples all contradict each other</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zH51!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zH51!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 424w, https://substackcdn.com/image/fetch/$s_!zH51!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 848w, https://substackcdn.com/image/fetch/$s_!zH51!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!zH51!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zH51!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png" width="1456" height="1035" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:687393,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zH51!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 424w, https://substackcdn.com/image/fetch/$s_!zH51!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 848w, https://substackcdn.com/image/fetch/$s_!zH51!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!zH51!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52976f88-c917-43cb-95a5-0772ab3e9579_1460x1038.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example from Experiment 4, where the system message instructions, prompt instructions, and few shot examples all contradict each other.</figcaption></figure></div><p>Let&#8217;s be chaotic and confuse the model further. In this experimental setup, the system message instructions, prompt instructions, and few shot examples all contradict each other.</p><p>As you can imagine, <strong>the models&#8217; behaviors are not consistent</strong>. </p><p>What surprised me in the face of all this contradiction was that <strong>GPT-4o was more likely to follow the system message, while Claude-3.5 was more likely to follow the instructions in the prompt.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gHRC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gHRC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 424w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 848w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 1272w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gHRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png" width="857" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:857,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gHRC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 424w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 848w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 1272w, https://substackcdn.com/image/fetch/$s_!gHRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22370685-be0a-4035-bda1-7ca5ab598df6_857x320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In the fourth experiment, the models are given contradictory instructions in the system message, prompt, and few shot examples. GPT-4o was more likely to follow the instructions in the system message. Claude-3.5 was more likely to follow the instructions in the prompt.</figcaption></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/dealing-with-cognitive-dissonance?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/dealing-with-cognitive-dissonance?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>Discussion and Conclusions</h1><p>In this article, I experimented with providing contradictory instructions to a language model in the system message, prompt, and few shot examples.</p><p>The experiments yielded some contradictory results &#8212; in some cases, a model would be more likely to follow the instructions provided in the system prompt, and yet in experiments with slight variations, this behavior would change. <strong>The system message seemed to have greater influence over GPT-4o&#8217;s responses while having minimal affect on Claude-3.5&#8217;s responses. </strong></p><p><strong>Few shot examples were also important in guiding a model&#8217;s decision </strong>(even if not all of the time).<strong> </strong>The propensity of language models to "learn on the fly&#8221; via few shot examples (a method called <a href="https://thegradient.pub/in-context-learning-in-context/">in-context learning</a>), especially in the face of contradictory instructions, show the strength of these demonstrations. It calls to mind Anthropic&#8217;s recent &#8220;<a href="https://www.anthropic.com/research/many-shot-jailbreaking">Many-shot jailbreaking</a>&#8221; method, which shows that providing a model with enough examples of harmful behavior can steer it to respond in harmful ways, despite having being trained <em>not</em> to produce such responses.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1KH4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1KH4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 424w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 848w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 1272w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1KH4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp" width="1456" height="913" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:913,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A diagram illustrating how many-shot jailbreaking works, with a long script of prompts and a harmful response from an AI.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A diagram illustrating how many-shot jailbreaking works, with a long script of prompts and a harmful response from an AI." title="A diagram illustrating how many-shot jailbreaking works, with a long script of prompts and a harmful response from an AI." srcset="https://substackcdn.com/image/fetch/$s_!1KH4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 424w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 848w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 1272w, https://substackcdn.com/image/fetch/$s_!1KH4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe91324c7-1bf5-4fca-8b76-2614655a1313_2200x1380.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Anthropic&#8217;s <a href="https://www.anthropic.com/research/many-shot-jailbreaking">Many-shot jailbreaking</a></figcaption></figure></div><p>The experiments explored in this article were on a small sample of a few manually curated examples. There is still a lot to explore in terms of how language models deal with contradictions provided in different forms in its prompt.</p><p>Using variations of the examples I used in this article, along with different language models, would likely yield vastly different outcomes. It is also likely that the next versions of these models (e.g. the next GPT and Claude models) would not abide by the exact patterns discovered in this article, either.</p><p>Rather, in this article, I wanted to highlight the fact that <strong>language models are </strong><em><strong>not consistent</strong></em><strong> in its behavior when faced with contradictory instructions in its prompt</strong>. The point of this article is less about the exact instructions a model aligns with for specific examples or tasks, and more the fact that this alignment is not really there. </p><p>It also raises some questions about what <em>should</em> be the ideal outcome? Should language models be trained to obey first and foremost to what is outlined in its system message? Should language models value flexibility over all else and follow the most recent instruction, or value &#8220;learn by doing&#8221; and align with the few shot examples of &#8220;correct answers&#8221;? </p><p>This matters in scenarios outside of these constructed test examples &#8212; for example, a system message instructing a model to be helpful, and few shot examples instructing a model on how to be harmful. Or, prompts containing outdated few shot examples that weren&#8217;t updated to reflect a newer prompt instruction.</p><p>There is still a lot we don&#8217;t know about language model behaviors with regards to these questions, but it is important to dig into them and learn more.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/dealing-with-cognitive-dissonance/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/dealing-with-cognitive-dissonance/comments"><span>Leave a comment</span></a></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Fun fact: the translation for &#8220;a potato man&#8221; (sort of) rhyme in both Korean (&#44048;&#51088;&#45224;&#51088;) and French (<em>homme pomme de terre</em>) translations &#129327; </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Here are other definitions of the system message by <a href="https://platform.openai.com/docs/guides/text-generation/chat-completions-api#:~:text=The%20system%20message,a%20helpful%20assistant.%22">OpenAI</a> and <a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts">Anthropic</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[How would you tokenize (or break down) a million digits of pi?]]></title><description><![CDATA[An exploration into how LLMs tokenize long sequences of numbers and other unusual sequences]]></description><link>https://www.artfish.ai/p/how-would-you-tokenize-or-break-down</link><guid isPermaLink="false">https://www.artfish.ai/p/how-would-you-tokenize-or-break-down</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Wed, 22 May 2024 15:11:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lhfQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lhfQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lhfQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 424w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 848w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 1272w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lhfQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png" width="790" height="496" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:496,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:223523,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lhfQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 424w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 848w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 1272w, https://substackcdn.com/image/fetch/$s_!lhfQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e886c36-263d-4db5-8dcc-1aad6ba83080_790x496.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How four different LLMs would tokenize the digits of pi.</figcaption></figure></div><p>A Large Language Model (LLM) needs to break down long sequences of text into shorter &#8220;tokens&#8221; before it can begin processing them. These tokens comprise a &#8220;vocabulary&#8221; unique for each model &#8212; for example, GPT-4&#8217;s vocabulary is 100K while Gemma&#8217;s is 256K.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dujP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dujP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 424w, https://substackcdn.com/image/fetch/$s_!dujP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 848w, https://substackcdn.com/image/fetch/$s_!dujP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 1272w, https://substackcdn.com/image/fetch/$s_!dujP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dujP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png" width="1086" height="230" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:230,&quot;width&quot;:1086,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dujP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 424w, https://substackcdn.com/image/fetch/$s_!dujP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 848w, https://substackcdn.com/image/fetch/$s_!dujP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 1272w, https://substackcdn.com/image/fetch/$s_!dujP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fc26ff-f44a-4f0d-8869-2d59a26a7e0d_1086x230.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption"><em>An example of how a sentence would be broken down into tokens. In this example, each token is akin to a &#8220;word&#8221; in the vocabulary of the LLM (in this case, GPT-4).</em></figcaption></figure></div><p>This tokenization process is crucial as it affects how an LLM understands and reasons through an input query. I&#8217;ve previously written about how <a href="https://www.artfish.ai/p/all-languages-are-not-created-tokenized">some languages require up to 10x times more tokens than English</a>, resulting in higher costs and longer processing times</p><p>In this article, I look into how LLMs process long streams of values. For example:</p><ul><li><p>long strings of numbers (such as a million digits of pi)</p></li><li><p>repeated numbers (like 0000111122223333)</p></li><li><p>repeated letters of the alphabet (which might occur in typos, acronyms, or new slang)</p></li></ul><p><em>For visuals, I used the<a href="https://huggingface.co/spaces/Cognitive-Lab/Tokenizer_Arena">Tokenizer Arena</a> created by <a href="https://twitter.com/adithya_s_k">Adithya S K</a> to compare the tokenization of multiple models.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/how-would-you-tokenize-or-break-down?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/how-would-you-tokenize-or-break-down?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>A million digits of pi &#127856;</h2><p>First, I examined how LLMs tokenize long numerical sequences. Pi is a good candidate for this, as it is <a href="https://www.livescience.com/physics-mathematics/mathematics/pi-calculated-to-105-trillion-digits-smashing-world-record">known up to 105 trillion digits</a>. The digits do not follow any discernible pattern and seem random to the naked eye.</p><p><strong>GPT-4 and Llama 3 tokenize in groups of 3</strong></p><p>Both OpenAI (<a href="https://huggingface.co/Xenova/gpt-4">GPT-4</a> and the recently released<a href="https://huggingface.co/Xenova/gpt-4o"> GPT-4o</a>) and Meta&#8217;s<a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B"> Llama 3</a> tokenize the entire numeric sequence in groups of 3. Regardless of what numbers appeared, they were all tokenized in groups of 3.</p><p>These groups are arbitrary, in that the order doesn&#8217;t really matter. For example, the number [123456] would be tokenized as [123] and [456] while the number 23456 would be tokenized as [234] and [56], which would not capture the relationship between the two numbers (123456 is 100,000 larger than 23456).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EXWe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EXWe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 424w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 848w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 1272w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EXWe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png" width="1058" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:1058,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:229206,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EXWe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 424w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 848w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 1272w, https://substackcdn.com/image/fetch/$s_!EXWe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1472be1d-707a-4ed0-81d1-7f8e806b094a_1058x290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GPT-4, GPT-4o, and Llama 3 tokenizers broke down numerical sequences into groups of 3.</figcaption></figure></div><p></p><p><strong>Mixtral and Gemma tokenize every single digit</strong></p><p>Both open-source models Mistral AI&#8217;s <a href="https://huggingface.co/mistralai/Mixtral-8x22B-v0.1">Mixtral</a> and Google&#8217;s <a href="https://huggingface.co/google/gemma-7b">Gemma</a> took a different approach, tokenizing every single digit as its own token. </p><p>This means that even a number like &#8220;100&#8221; is understood by these models as three separate digits: &#8220;1&#8221;, &#8220;0&#8221;, and &#8220;0&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DkBE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DkBE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 424w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 848w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 1272w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DkBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png" width="1064" height="300" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f081062-4408-4b0a-8811-200850a46b03_1064x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:240371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DkBE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 424w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 848w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 1272w, https://substackcdn.com/image/fetch/$s_!DkBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f081062-4408-4b0a-8811-200850a46b03_1064x300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Mixral and Gemma tokenizers broke down numerical sequences into a separate token for each digit.</figcaption></figure></div><p><strong>Claude&#8217;s tokenizer splits up the sequence based on some numeric semantic meaning</strong></p><p>Anthropic&#8217;s <a href="https://huggingface.co/Xenova/claude-tokenizer">Claude</a> was the most interesting &#8212; it tokenized the stream of numbers based on some sort of semantic understanding. Claude&#8217;s tokenizer grouped numbers in groups as small as 2 and as large as 7.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cwet!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cwet!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 424w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 848w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 1272w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cwet!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png" width="1070" height="324" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:324,&quot;width&quot;:1070,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:248714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cwet!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 424w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 848w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 1272w, https://substackcdn.com/image/fetch/$s_!Cwet!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804c700d-ce84-436d-8be5-9a69c7e389ef_1070x324.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>For example, the sequence &#8220;999999&#8221; (<a href="https://www.quora.com/There-is-a-999999-at-the-762nd-digit-of-pi-Whats-the-mathematical-explanation-for-this-For-comparision-the-first-44444-in-pi-is-not-until-the-808-650th-digit-of-pi-so-repeated-numbers-must-be-unlikely-So-why#:~:text=Add%20question-,There%20is%20a%20999999%20at%20the%20762nd%20digit%20of%20pi.,-What%27s%20the%20mathematical">which appears in the 762nd digit of pi</a>) is tokenized by the Claude tokenizer as a single token, as is the sequence &#8220;222222&#8221; (<a href="https://calculat.io/en/number/search-sequence-in-pi/222222">which apparently appears 87 times in the first 100M digits of pi</a>).&nbsp;</p><p>The Claude tokenizer also splits up the digits of pi based on 4-digit sequences that look a lot like dates, such as 1988, 1999, and 2020. The top 4-digit tokens in pi are:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!56Q1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!56Q1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 424w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 848w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 1272w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!56Q1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png" width="493" height="282" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:282,&quot;width&quot;:493,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!56Q1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 424w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 848w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 1272w, https://substackcdn.com/image/fetch/$s_!56Q1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd076501c-ccf6-4159-8d22-a00022e56c5d_493x282.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>How LLMs understand long sequences of numbers depends on its tokenizer</strong></p><p>The tokenizers for models like GPT-4 and Llama-3 indiscriminately chunk long streams of numerical digits into groups of 3. The tokenizers for models like Mixtral and Gemma chunk into individual digits. For both of these approaches, it means that each digit (or each group of 3 digits) is its own independent token, regardless of what order they appear in.</p><p>The tokenizer for the Claude models seemed to be the exception (at least in the tokenizers tested in this article). It broke up long sequences of numbers based on semantic patterns that may have occurred more frequently in training data, such as repeating digits or numbers akin to 4-digit years.</p><p><strong>What surprised me was that none of the LLM tokenizers understood &#8220;3.1415&#8221; or even &#8220;3.14&#8221; as its own token</strong>. Since pi is used so much in mathematical and engineering problems all over the Internet, I assumed that some truncated representation of it would deserve its own token. </p><p>A bit out of scope for this article, but I do think it&#8217;s a bit of a miracle that LLMs can do arithmetic at all given that so many of them understand sequences of numbers in arbitrary groups of 3 or individual digits.</p><p></p><h2><strong>Repeated numbers&nbsp;</strong></h2><p>What if we repeat each number between 0 and 9 a bunch of times? How would the models tokenize that sequence of numbers?</p><p>I repeated each digit 32 times and observed the different ways in which the sequence was tokenized.</p><p>Just like for the pi sequence, the GPT-4 and Llama tokenizers split up the input sequence into groups of 3, regardless of what digits were included in each group.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b3bK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b3bK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 424w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 848w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 1272w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b3bK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png" width="866" height="246" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:246,&quot;width&quot;:866,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b3bK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 424w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 848w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 1272w, https://substackcdn.com/image/fetch/$s_!b3bK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c92fd7-f5e2-4b4b-8ea2-e6a3367e470e_866x246.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Likewise, the tokenizers for the Gemma and Mixtral models tokenized each digit individually.</p><p>For Claude, however, the tokenizer split up the sequence using a different approach. 4s came in 4s. 5s sometimes came in 4s and sometimes in 8s. 3s came in groups of 16. Of neighboring digits, 12 and 78 emerged as individual tokens, whereas other neighboring digits (like 23, 34, 45, or 56) did not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_y5-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_y5-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 424w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 848w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 1272w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_y5-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png" width="1064" height="298" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:298,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_y5-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 424w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 848w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 1272w, https://substackcdn.com/image/fetch/$s_!_y5-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce29c22-52d9-47d2-99b2-bdf3d0e56858_1064x298.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Repeated latin alphabet letters</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1oPL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1oPL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 424w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 848w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 1272w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1oPL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png" width="1456" height="742" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:742,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1oPL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 424w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 848w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 1272w, https://substackcdn.com/image/fetch/$s_!1oPL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a977f60-83ce-4da5-903b-b92956d894af_1600x815.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From digits, I moved on to letters.&nbsp;I took the English alphabet and repeated each (lowercase) letter 32 times.</p><p>Some patterns I noticed across the tokenizers:</p><ul><li><p>All letters were tokenized in groups of 2, 4, 8, or 16 (powers of 2)</p></li><li><p>Sometimes, neighboring characters of two different alphabetic characters would form their own token (like &#8220;no&#8221;, &#8220;uv&#8221;, or &#8220;sst&#8221;)</p></li></ul><p><strong>All repeated letters are tokenized in groups of powers of 2</strong></p><p>It was no surprise that each LLM tokenized different letters into different sized groups. For example, the GPT-4o tokenizer grouped a long stream of the letter &#8220;c&#8221; into groups of 4, whereas the Gemma tokenizer grouped the &#8220;c&#8221;s in groups of 8 and the Claude tokenizer in a single full group of 16.</p><p>While these patterns in themselves have little inherent meaning, it is interesting that some of LLMs tokenized more letters in groups of 2 instead of larger groups.</p><ul><li><p>GPT-4o, Claude, Llama 3, and Mixtral tokenizers all grouped repeated letters into groups of 2 more than any other larger group size</p></li><li><p>Claude and Gemma were the only tokenizers that would group some letters in groups of 16 or 32</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hvGy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hvGy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 424w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 848w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 1272w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hvGy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png" width="381" height="266" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:266,&quot;width&quot;:381,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hvGy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 424w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 848w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 1272w, https://substackcdn.com/image/fetch/$s_!hvGy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c819ad0-07a3-4de7-a070-842990476bc0_381x266.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Sometimes, neighboring characters of two different alphabetic characters would form their own token</strong></p><p>I found that words that occur frequently in LLMs' training data (which is mainly sourced from the Internet and other human-created content like books) were more likely to show up as tokens. The following were the mixed-letter tokens that formed for each of the tokenizers I tested.</p><pre><code>gpt4:    ab, de, eff, gh, ij, no, op, sst, tu, uv, xy
claude:  ab, de, eff, no, op, sst, tu, xy
llama3:  ab, de, eff, gh, ij, no, op, sst, tu, uv, xy
gemma:   ab, de, eff, gh, hi, ij, st, tu, uv, xy
mixtral: ab, de, eff, gh, ij, no, op, st, tu, uv, xy</code></pre><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/how-would-you-tokenize-or-break-down?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/how-would-you-tokenize-or-break-down?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Concluding remarks</h2><p>In this article, I examined the tokenizers for several LLMs: GPT-4/4o, Llama 3, Claude, Mixtral, and Gemma. This is by no means a comprehensive list of LLMs or their corresponding tokenizers. The behaviors of tokenizers for other models might mimic those I've shared in this article or deviate entirely from these patterns.</p><p>The main takeaway from these experiments is that <strong>all tokenizers exhibit unique behaviors when processing unusual input sequences</strong>, whether they are long numerical sequences or repetitive characters. This significantly affects how the models understand different kinds of input. LLMs convert human-readable input into tokens, which are then mapped to token IDs. It is within this sequence of token IDs that LLMs learn to discern patterns and "find meaning."</p><p>However, the arbitrary nature of tokenization raises questions about how LLMs can effectively "find meaning" in sequences that may or may not have any inherent meaning. The patterns observed in the tokenization of unusual input sequences suggest that the formation of tokens is not always intuitive or easily explainable.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/how-would-you-tokenize-or-break-down/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/how-would-you-tokenize-or-break-down/comments"><span>Leave a comment</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "How would you tokenize (or break down) a million digits of pi?," Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024tokenizepi,
    author = {Yennie Jun},
    title = {How would you tokenize (or break down) a million digits of pi?},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/how-would-you-tokenize-or-break-down},
}</code></code></pre>]]></content:encoded></item><item><title><![CDATA[Can AI read music?]]></title><description><![CDATA[A few experiments to test different AI models' knowledge and understanding of music]]></description><link>https://www.artfish.ai/p/can-ai-read-music</link><guid isPermaLink="false">https://www.artfish.ai/p/can-ai-read-music</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 22 Apr 2024 15:11:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ya29!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ya29!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ya29!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ya29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:547124,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ya29!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Ya29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5895235e-881e-4ef1-9b6b-9ccb49c20881_1024x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ChatGPT-4&#8217;s creation for the prompt: &#8220;Draw me sheet music for a new song of your own creation&#8221;</figcaption></figure></div><p>The image above is sheet music drawn by ChatGPT-4, titled by itself as "Serenade of the Unseen". While impressive from a distance, if you look carefully, several important details are blurred or missing. The words that appear to signify dynamics or tempo markings are composed of non-legible writing scripts. The notes and slurs on the page also don&#8217;t make sense  &#8212; visually or musically. It is as if looking at piece of music inside of a dream.</p><p>In recent months, there has been incredible progress in the technologies capable of creating AI-generated music. For example, <a href="https://suno.com/">Suno AI</a> (which I recommend you check out if you haven&#8217;t yet) is capable of generating a full song composed of original melodies and lyrics. Models like this are trained on countless hours of audio streams and an immense amount of song lyrics.</p><p>But how good is a general-purpose AI at reading music and understanding general musical concepts? </p><p>In this article, I test the capabilities of multimodal Large Language Models (LLMs) in understanding and reasoning over standard musical notation and basic music theory concepts. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/can-ai-read-music?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/can-ai-read-music?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Testing the visual reasoning capabilities of Large Language Models</h2><p>Recently, multimodal Large Language Models (LLMs) have recently shown impressive abilities for reasoning over both visual and textual elements. For example, they have been used for <a href="https://arxiv.org/abs/2311.16483">parsing figures and graphs</a> and for <a href="https://www.artfish.ai/p/measuring-ais-creativity-with-visual">solving visual puzzles such as rebuses</a> (which I&#8217;ve written about in past articles). Some of these multimodal models, <a href="https://blog.google/technology/ai/google-gemini-ai/#performance:~:text=It%20was%20built%20from%20the%20ground%20up%20to%20be%20multimodal%2C%20which%20means%20it%20can%20generalize%20and%20seamlessly%20understand%2C%20operate%20across%20and%20combine%20different%20types%20of%20information%20including%20text%2C%20code%2C%20audio%2C%20image%20and%20video.">such as Gemini Pro, have been trained on audio as well.</a></p><p>Reading sheet music requires the ability to parse visual elements (e.g. notes, rhythms, and musical cues such as dynamics, key signature, and time signature) as well as to reason over how these elements fit into a bigger picture (e.g. how a motif fits into a phrase, how a phrase fits into a piece, how a piece fits into a genre or time period).</p><p>In this article, I test 3 multimodal LLMs (OpenAI&#8217;s <a href="https://openai.com/research/gpt-4v-system-card">ChatGPT-4 Vision</a>,  Google&#8217;s <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini">Gemini Pro Vision</a>, and Anthropic&#8217;s <a href="https://www.anthropic.com/news/claude-3-family">Claude 3 Opus</a>) on several music tasks, which are easy if you have a basic music background. (If you don&#8217;t, don&#8217;t worry! The main points I make in this article will be clear to you regardless).</p><p>I test the chat version<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> of each of these models to see if multimodal LLMs can:</p><ul><li><p>Identify popular sheet music</p></li><li><p>Analyze basic music cues in a piece of sheet music</p></li><li><p>Understand basic rhythmic notation</p><p></p></li></ul><h2>LLMs struggle to identify popular sheet music</h2><p>I took snippets from several pieces of sheet music across different genres and styles (e.g. classical, movie score, jazz). Then, I prompted each model: <code>Where is this excerpt from?</code></p><p>My hypothesis was that the models would be able to correctly identify at least a few of the more popular pieces. All three of these models were marketed as having exceptional visual reasoning skills.</p><p>However, the results show that all of the models struggled to identify most of the sheet music I provided.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/2r3hs/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce6e8e1-4a9c-4a7f-8a8f-6ce38711b204_1260x660.png&quot;,&quot;thumbnail_url_full&quot;:&quot;&quot;,&quot;height&quot;:434,&quot;title&quot;:&quot;LLMs struggle to identify popular sheetmusic [April 2024]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/2r3hs/1/" width="730" height="434" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p></p><p><strong>Despite its name (&#8220;Opus&#8221;), Claude 3 misidentifies all the pieces I tested.</strong> Of the three models, Claude never refused to answer. Rather, it always offered an answer for each song &#8212; always incorrectly. Below is an excerpt from the popular piano piece &#8220;Clair de Lune&#8221;, which Claude confidently (an incorrectly) identifies as Beethoven&#8217;s Pathetique Sonata (another famous piano piece from nearly a century prior).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5S5P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5S5P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 424w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 848w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5S5P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:699149,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5S5P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 424w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 848w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!5S5P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84cd2947-acad-43a0-b7a7-ebbd1ff16f5b_1638x1170.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Claude 3 incorrectly identifies an excerpt from &#8220;Clair de Lune&#8221;</figcaption></figure></div><p></p><p><strong>Gemini Pro was the only model able to identify some of the pieces.</strong> Gemini Pro correctly identified two of the popular pieces I included in my tests &#8212; Clair de Lune and Stairway to Heaven. Otherwise, Gemini refused to identify the music or was incorrect.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nrvE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nrvE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 424w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 848w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 1272w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nrvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png" width="1138" height="808" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/755709f1-6fd0-4843-a358-b49e55355845_1138x808.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1138,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:447056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nrvE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 424w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 848w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 1272w, https://substackcdn.com/image/fetch/$s_!nrvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F755709f1-6fd0-4843-a358-b49e55355845_1138x808.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Pro correctly identifies the sheet music as Clair de Lune by Debussy.</figcaption></figure></div><p></p><p><strong>ChatGPT-4 refused to identify any of the pieces. </strong>At least Claude-3 gave its best effort. GPT-4, perhaps afraid of making mistakes, refused to answer at all.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qBNS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qBNS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 424w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 848w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 1272w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qBNS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png" width="1198" height="726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qBNS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 424w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 848w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 1272w, https://substackcdn.com/image/fetch/$s_!qBNS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86656fd6-f8ce-41b6-b563-8df3a5416041_1198x726.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ChatGPT-4 is refuses to identify the sheet music as Clair de Lune by Debussy.</figcaption></figure></div><p></p><h2><strong>LLMs exhibit poor understanding of basic music theory concepts</strong></h2><p>I took one of the music pieces that all of the models failed to answer (and also one of my favorite melodies) &#8212; The Merry-Go-Round of life from Howl&#8217;s Moving Castle.</p><p>I asked each model basic music theory questions about key signature, time signature, instrumentation, and other basic musical notation. Even if the model is not able to identify what piece it is or where it is from, it should be able to reason over these basic musical questions.</p><pre><code><code>You are an AI Musical Assistant expert in musical knowledge. Your job is to answer any questions about music as faithfully and helpfully as you can.

Describe this piece in terms of key signature, time signature, instrumentation, and any other important musical notation.
</code></code></pre><p>I found that <strong>all three of these models really struggled to read and understand basic musical notation</strong>, despite supposedly being advanced in their visual reasoning capabilities. It should have been relatively easy for these models to read letters and numbers on the page (e.g. having advanced OCR capabilities), such as reading the numbers to identify the key signature, but the models struggled even with this task.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/iHXbs/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4216356-0410-487d-abd7-bd8283865d2f_1260x660.png&quot;,&quot;thumbnail_url_full&quot;:&quot;&quot;,&quot;height&quot;:298,&quot;title&quot;:&quot;LLMs exhibit poor understanding of basic music theory concepts [April 2024]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/iHXbs/1/" width="730" height="298" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h4></h4><p><strong>Claude 3 had the fewest mistakes out of all of the models.</strong> It correctly identified time signature and instrumentation. However, it was incorrect in identifying key signature (there are 2 flats in the piece) and made up a few pieces of information (such as the existence of slur markings and sixteenth notes).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-GfO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-GfO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 424w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 848w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 1272w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-GfO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png" width="639" height="724" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:724,&quot;width&quot;:639,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230921,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-GfO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 424w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 848w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 1272w, https://substackcdn.com/image/fetch/$s_!-GfO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f90c935-e673-4c1c-b8cb-fa2cd49a9def_639x724.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>ChatGPT-4 answered some of the easier questions but made up a lot of things &#8212; more than Claude.</strong> ChatGPT-4 got the key signature correct (which Claude missed) but was incorrect about the time signature (claiming it is not visible in the image &#8230; which is not true). ChatGPT-4 made up a lot of incorrect claims regarding musical notation in the score, such as the existence of ties and slurs, staccato markings, accent markings, and sixteenth notes &#8212; none of which exist on the sheet music.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cz_m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cz_m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 424w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 848w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 1272w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cz_m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png" width="636" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffec865a-dece-443d-b1c9-30aa699419bf_636x761.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:636,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132446,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cz_m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 424w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 848w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 1272w, https://substackcdn.com/image/fetch/$s_!cz_m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffec865a-dece-443d-b1c9-30aa699419bf_636x761.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Gemini Pro would definitely fail a high school music theory class.</strong> Unfortunately, Gemini Pro seemed to not understand a single thing about the piece. Gemini Pro claimed that the piece does not have a key signature then later said the key was in G major (both of which are incorrect). Gemini Pro also claimed that the time signature was 4/4 (it is not &#8212; it is in 3/4), which should have been as simple as reading the numbers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BJAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BJAj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 424w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 848w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 1272w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BJAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png" width="613" height="469" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:469,&quot;width&quot;:613,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BJAj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 424w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 848w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 1272w, https://substackcdn.com/image/fetch/$s_!BJAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd30370aa-65a0-4190-bdc9-54e423e1e239_613x469.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>LLMs don&#8217;t have much rhythm, either</h2><p>Finally, I wanted to see if these multimodal LLMs had the capability for parsing rhythmic notation. Especially since rhythmic notations can be broken down into a simple mathematical mapping, I thought it would be straightforward for LLMs to count rhythms. </p><p>A common way to <a href="https://en.wikipedia.org/wiki/Counting_(music)">count music is using the &#8220;1 E &amp; A&#8221;</a> system, where each beat in a rhythm can be mapped to a certain syllable. </p><p>All 3 models were unable to count the excerpt of a rhythmic pattern I provided &#8212; and yet confidently answered with incorrect information.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wve6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wve6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wve6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png" width="548" height="356.8021978021978" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:948,&quot;width&quot;:1456,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:553700,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wve6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Wve6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f8efd4-ed73-4130-b342-1d6286b4b547_1572x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Claude incorrectly counts a rhythmic pattern.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!feBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!feBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 424w, https://substackcdn.com/image/fetch/$s_!feBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 848w, https://substackcdn.com/image/fetch/$s_!feBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 1272w, https://substackcdn.com/image/fetch/$s_!feBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!feBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png" width="496" height="477.87134502923976" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1318,&quot;width&quot;:1368,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:222489,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!feBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 424w, https://substackcdn.com/image/fetch/$s_!feBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 848w, https://substackcdn.com/image/fetch/$s_!feBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 1272w, https://substackcdn.com/image/fetch/$s_!feBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc011b5ff-611f-4477-8f16-ef1c8ad1afd3_1368x1318.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Pro incorrectly counts a rhythmic pattern.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!meSW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!meSW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 424w, https://substackcdn.com/image/fetch/$s_!meSW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 848w, https://substackcdn.com/image/fetch/$s_!meSW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 1272w, https://substackcdn.com/image/fetch/$s_!meSW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!meSW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png" width="534" height="448.9120879120879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1224,&quot;width&quot;:1456,&quot;resizeWidth&quot;:534,&quot;bytes&quot;:268425,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!meSW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 424w, https://substackcdn.com/image/fetch/$s_!meSW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 848w, https://substackcdn.com/image/fetch/$s_!meSW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 1272w, https://substackcdn.com/image/fetch/$s_!meSW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafab36cc-a7d9-4ade-9cd4-f8faf523107f_1708x1436.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ChatGPT-4 incorrectly counts a rhythmic pattern.</figcaption></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><h1>Discussion</h1><p>These AI chatbots are not built to be music specialists in any way. In fact, they&#8217;re meant to be generalist &#8212; good at many different things involving human language (like writing jokes, fixing grammar, or answering all of your random late-night queries).</p><p>However, I expected these AI chatbots to be able to have at least a rudimentary ability to reason over musical sheet music &#8212; especially since it is likely that sheet music must have been included in their training datasets.</p><p>My conclusion from these experiments is that <strong>current state-of-the-art multimodal language models are sorely lacking in visual reasoning capabilities when it comes to reading music </strong>(as of April 2024)<strong>. </strong></p><p>In the music identification task, Gemini Pro was the only model able to correctly identify some of the sheet music (<em>Clair de Lune</em> and <em>Stairway to Heaven</em>). However, when I asked it to analyze a piece of music it could not identify (Howl&#8217;s Moving Castle), it was entirely wrong in its analysis. <strong>This makes me believe that Gemini Pro has the </strong><em><strong>least </strong></em><strong>musical reasoning capabilities out of the three models</strong>. It was able to identify some pieces purely based on pattern matching (perhaps due to these pieces appearing more frequently in the training corpus) but does not exhibit musical understanding over other pieces of music.</p><p>Claude-3 was always wrong in the music identification task, but at least it put in best effort. ChatGPT-4, on the other hand, refused to answer every single question. Whether this refusal stems from laziness or from an unavoidable desire for perfectionism, it is difficult to say. </p><p>All 3 models were unable to parse rhythmic notation.</p><p>However, nothing is truly definitive, as all of these models are rapidly evolving. If I were to repeat these experiments in a few months&#8217; time, who knows how different the results would be. But for now, I probably wouldn&#8217;t go to ChatGPT, Claude, or Gemini Pro for my (multimodal) music-related questions.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/can-ai-read-music?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thank you for reading art fish intelligence . This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/can-ai-read-music?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/can-ai-read-music?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p><em>If you liked what you read, leave a comment and share your thoughts!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/can-ai-read-music/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/can-ai-read-music/comments"><span>Leave a comment</span></a></p><p></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Can AI Read Music?," Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024aimusic,
    author = {Yennie Jun},
    title = {Can AI Read Music?},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/can-ai-read-music},
}</code></code></pre><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>ChatGPT: https://chat.openai.com/<br>Claude: https://claude.ai/chat<br>Gemini: https://gemini.google.com/app<br><br>Throughout this article, I refer to each of these products as &#8220;model&#8221; or &#8220;LLM&#8221;, but technically they are more like &#8220;agents&#8221;, in that all 3 of these chatbots is a combination of model + additional tools abstracted away from the user that we are not aware about. For the purpose of this particular article though, I use these terms a bit interchangeably.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Bloopers: Some of my finest data visualization masterpieces]]></title><description><![CDATA[April Fools version: When some of your data viz bloopers look like modern art and others look like your computer is needs to be exorcised]]></description><link>https://www.artfish.ai/p/data-visualization-bloopers</link><guid isPermaLink="false">https://www.artfish.ai/p/data-visualization-bloopers</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 01 Apr 2024 15:11:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vosW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vosW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vosW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 424w, https://substackcdn.com/image/fetch/$s_!vosW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 848w, https://substackcdn.com/image/fetch/$s_!vosW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 1272w, https://substackcdn.com/image/fetch/$s_!vosW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vosW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png" width="727" height="201.79837251356238" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:307,&quot;width&quot;:1106,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:15081,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vosW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 424w, https://substackcdn.com/image/fetch/$s_!vosW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 848w, https://substackcdn.com/image/fetch/$s_!vosW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 1272w, https://substackcdn.com/image/fetch/$s_!vosW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb2c234-ef60-46da-ac03-971c0c03be77_1106x307.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>For both my personal and professional work, I process data, train machine learning models, run experiments, and, quite often, create data visualizations. </p><p>I iterate tirelessly to ensure the final data visuzliazion is clear, readable, and tells the right story. However, sometimes I do fail, and the failures are often so ridiculous that I began a collection of my worst and weirdest-looking plots.</p><p>In this article, I showcase some of my favorite bloopers for April Fool&#8217;s Day. At the end of the article, I explain why data visualizations are so important and some of the more common mistakes I see all of the time.</p><p></p><h2>When the legend doesn&#8217;t do its job</h2><p>Sometimes, red is yellow and blue is green.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mrq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mrq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 424w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 848w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 1272w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mrq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png" width="187" height="163" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:163,&quot;width&quot;:187,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12797,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mrq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 424w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 848w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 1272w, https://substackcdn.com/image/fetch/$s_!mrq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc1dc6-7018-4955-be8f-008c69a361c0_187x163.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p></p><h2>A stained glass masterpiece with little informative value</h2><p>I can&#8217;t imagine what information is being conveyed in this plot, but it does remind me a little of stained glass. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tYkA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tYkA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 424w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 848w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 1272w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tYkA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png" width="460" height="259" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:259,&quot;width&quot;:460,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6604,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tYkA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 424w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 848w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 1272w, https://substackcdn.com/image/fetch/$s_!tYkA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F015ff277-863f-42df-9e70-e5bebdaf39eb_460x259.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Infinite color scales</h2><p>You can imagine that the color scale bar goes on into inifinity. I&#8217;m pretty sure creating this plot crashed my code. But I do enjoy looking at it.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YB6b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YB6b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YB6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg" width="938" height="650" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:650,&quot;width&quot;:938,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83788,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YB6b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YB6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60754718-f457-438d-bcce-4a98167d0654_938x650.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Do you see the mountain ranges?</h2><p>Sometimes you want to make a line plot and it doesn&#8217;t turn out the way you wanted it to.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bI3S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bI3S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bI3S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg" width="1143" height="743" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:743,&quot;width&quot;:1143,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71677,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bI3S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bI3S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0ce111a-f2b0-47b1-b49b-529909fa6993_1143x743.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Modern. Exotic. Mysterious.</h2><p>If you can believe it, this is created using the same data as the previous plot. Again, sometimes you want to make a line plot and it doesn&#8217;t turn out the way you want it to.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jgCU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jgCU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jgCU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg" width="549" height="245" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:245,&quot;width&quot;:549,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jgCU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jgCU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163af2c-4cb9-4dda-b22a-eea48908bedb_549x245.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3></h3><h3></h3><h2>Is that crosshatching I detect?</h2><p>Unfortunately, I don&#8217;t remember where this came from. I have no idea what the color bar on the right side signifies or what it is measuring. Nor do I know why the X-axis increases with a factor of 60 and the Y-axis increases with a factor of 74.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ob3J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ob3J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 424w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 848w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 1272w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ob3J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png" width="617" height="436.973474801061" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:267,&quot;width&quot;:377,&quot;resizeWidth&quot;:617,&quot;bytes&quot;:20316,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ob3J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 424w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 848w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 1272w, https://substackcdn.com/image/fetch/$s_!ob3J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa50a87e2-0520-47b7-9c99-bdc0b59eec31_377x267.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Turning the world on its side</h2><p>In this plot, I was trying to plot the latitudes and longitudes of all of the geographic locations present in a dataset. However, looks like I flipped the latitudes and longitudes &#8212; a rookie mistake, I might add.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VXPz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VXPz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 424w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 848w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 1272w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VXPz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png" width="600" height="574" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113112,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VXPz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 424w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 848w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 1272w, https://substackcdn.com/image/fetch/$s_!VXPz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa6c9ef-779e-42fa-accc-27b71bcb56e9_600x574.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>When time doesn&#8217;t pass the way you expect it to</h2><p>If you&#8217;re plotting something over time, it&#8217;s important to have the dates or intervals be consistent and sequential. It is possible that this is true on the X-axis of this plot &#8212; you just can&#8217;t read any of it.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Wds!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Wds!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Wds!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg" width="1072" height="228" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:1072,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21825,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Wds!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9Wds!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F598e3a00-a630-4efd-8070-a8079188dfd0_1072x228.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h2>If everything is important, nothing is important</h2><p>Here, I was trying to plot how important different features were in a logistic regression. However, by attempting to plot every single feature, it&#8217;s impossible to actually tell what is actually important due to all of the visual clutter. In a way, you miss the forest for the trees (and incidentally, the plot does look like a sideways tree).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EmGu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EmGu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 424w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 848w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 1272w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EmGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png" width="623" height="684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:623,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EmGu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 424w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 848w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 1272w, https://substackcdn.com/image/fetch/$s_!EmGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a5d8dd9-7714-4158-b28e-87a51bff0f0c_623x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>One of my favorite plots</h2><p>I think I was trying to create a simple pie chart but this ended up happening. Not gonna lie, I think it&#8217;s kinda cool and I would totally print it out and hang it on my wall. It&#8217;s giving Museum of Modern Art.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MKIC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MKIC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 424w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 848w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 1272w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MKIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png" width="417" height="413.2140077821012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:771,&quot;resizeWidth&quot;:417,&quot;bytes&quot;:23614,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!MKIC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 424w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 848w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 1272w, https://substackcdn.com/image/fetch/$s_!MKIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6582937d-4935-4115-9eda-0d7f6ee8ad62_771x764.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/data-visualization-bloopers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/data-visualization-bloopers?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><p></p><h2>A short plug about data visualizations</h2><p>In all seriousness, though, data visualizations are a super important part of data science and for all scientific research desiring to visually share findings.</p><p>You probably noticed some common patterns across the bad charts &#8230; In another article, I&#8217;ll explore commonly made data visualization mistakes in more detail. Stay tuned!</p><p></p><p><em>If you liked what you read, please leave a comment and subscribe to the publication!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/data-visualization-bloopers/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/data-visualization-bloopers/comments"><span>Leave a comment</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Bloopers: Some of my finest data visualization masterpieces
", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024bloopersdataviz,
    author = {Yennie Jun},
    title = {Bloopers: Some of my finest data visualization masterpieces
},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/data-visualization-bloopers},
}</code></code></pre>]]></content:encoded></item><item><title><![CDATA[The growing problem of AI-generated research papers]]></title><description><![CDATA[A dive into scientific papers that are very likely AI generated &#8212; how many, how often, and about what topics]]></description><link>https://www.artfish.ai/p/ai-generated-research-papers</link><guid isPermaLink="false">https://www.artfish.ai/p/ai-generated-research-papers</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 25 Mar 2024 15:11:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!voir!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!voir!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!voir!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 424w, https://substackcdn.com/image/fetch/$s_!voir!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 848w, https://substackcdn.com/image/fetch/$s_!voir!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!voir!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!voir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg" width="1456" height="690" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!voir!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 424w, https://substackcdn.com/image/fetch/$s_!voir!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 848w, https://substackcdn.com/image/fetch/$s_!voir!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!voir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa39f3c49-5d33-4945-81c2-5a0d2b399ff9_1456x690.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A screenshot of Google Scholar searching for the phrase &#8220;as of my last knowledge update&#8221;, excluding the phrases &#8220;ChatGPT&#8221; and &#8220;LLM&#8221;.</figcaption></figure></div><p>Earlier this week, a<a href="https://twitter.com/itsandrewgao/status/1769759519603667437"> tweet went viral</a> showing that over 100 peer-reviewed scientific papers on Google Scholar (as of 2022) were AI-generated. These papers covered diverse topics, such as spinal injuries, autism, and (ironically) explainable AI.</p><p>The author of the tweet searched for the phrase &#8220;as of my last knowledge update&#8221; (a phrase commonly generated by ChatGPT and similar AI chatbots) while removing the phrases &#8220;ChatGPT&#8221; and &#8220;LLM&#8221; (to filter out papers written about evaluating these models&#8217; generations).&nbsp;</p><p>So of course, I had to look at the data myself!</p><p></p><h2><strong>A 16x spike in papers using this peculiar phrase in 2023</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YQ8f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YQ8f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 424w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 848w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 1272w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YQ8f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png" width="552" height="352" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:352,&quot;width&quot;:552,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YQ8f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 424w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 848w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 1272w, https://substackcdn.com/image/fetch/$s_!YQ8f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F460e7634-5cff-4df3-8bb1-45c1bbf830b7_552x352.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The number of scientific articles published per year containing the phrase &#8220;as of my last knowledge update&#8221;</figcaption></figure></div><p>I collected all Google scholar papers containing the phrase &#8220;as of my last knowledge update&#8221; (but not containing either &#8220;ChatGPT&#8221; or &#8220;LLM&#8221;).</p><p>While not common, there were articles available on Google Scholar using that phrase prior to 2022 (there were 14 total from the years 2013-2022). However, there is a noticeable spike of this phrase in 2023. There were 66 articles published using this phrase in 2023 &#8211; more than 16x that of 2022 (4 articles)! </p><p>We can assume that the majority of these articles were AI-generated.</p><p><a href="https://openai.com/blog/chatgpt">ChatGPT was released in November of 2022</a>, which likely explains this trend. While it is possible that some of these 66 articles were <em>not</em> written using AI (as this is a phrase used prior to ChatGPT), the magnitude of the spike suggests that the majority of these articles were indeed written, to some extent, using AI.</p><p>But how big of a deal is this?</p><p></p><h2><strong>The majority of these papers have zero citations</strong></h2><p>I took a subset of the articles using this phrase in 2023-2024 and looked at how many times each was cited.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gO84!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gO84!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 424w, https://substackcdn.com/image/fetch/$s_!gO84!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 848w, https://substackcdn.com/image/fetch/$s_!gO84!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 1272w, https://substackcdn.com/image/fetch/$s_!gO84!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gO84!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png" width="617" height="369" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7802fd37-2602-4d88-9e66-920e567012cb_617x369.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:369,&quot;width&quot;:617,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gO84!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 424w, https://substackcdn.com/image/fetch/$s_!gO84!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 848w, https://substackcdn.com/image/fetch/$s_!gO84!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 1272w, https://substackcdn.com/image/fetch/$s_!gO84!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7802fd37-2602-4d88-9e66-920e567012cb_617x369.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Number of citations for scientific articles published in 2023-2024 containing the phrase &#8220;as of my last knowledge update&#8221;. The majority of articles are cited 0 times.</figcaption></figure></div><p>The majority of these papers papers have 0 citations, meaning that other researchers haven&#8217;t really engaged with them.</p><p>However, 3 of the papers were cited over 19 times.</p><p>I manually spot checked these articles and can confirm that they are very likely written using ChatGPT. The main clue I used was the fact that, for all of these articles, the only time the pronoun &#8220;my&#8221; appeared was in the phrase &#8220;as of my last knowledge update&#8221;. The rest of the article tended to be written in more formal language, so the appearance of the word &#8220;my&#8221; felt really out of place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YkDi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YkDi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 424w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 848w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 1272w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YkDi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png" width="1456" height="446" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:446,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:124487,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YkDi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 424w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 848w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 1272w, https://substackcdn.com/image/fetch/$s_!YkDi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c35947e-dabe-4717-b816-4809b3b2b5b0_1762x540.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An excerpt from an article that was cited 21 times&#8230; Tell me this doesn&#8217;t sound extremely ChatGPT-generated.</figcaption></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/ai-generated-research-papers?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/ai-generated-research-papers?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2><strong>The AI generated articles cover a range of disciplines</strong></h2><p>Finally, I wanted to see what kind of topics these papers were written about. I used <a href="https://www.anthropic.com/news/claude-3-family">Claude 3 Opus</a>, Anthropic&#8217;s new LLM, to analyze the article title, abstract snippet, and journal name and determine the article&#8217;s field or discipline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aN4H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aN4H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 424w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 848w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 1272w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aN4H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png" width="685" height="477" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:477,&quot;width&quot;:685,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aN4H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 424w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 848w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 1272w, https://substackcdn.com/image/fetch/$s_!aN4H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61787c94-5f5f-4e94-8fde-038527aa51c9_685x477.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An overview of the disciplines covered by the scientific articles from 2023-2024 that include the phrase &#8220;as of my last knowledge cutoff&#8221;</figcaption></figure></div><p>These articles really covered a broad range of disciplines, with computer science and business being the most popular areas.</p><p>I found the articles written about medicine to be the most concerning. These articles covered topics such as:</p><ul><li><p>Epidemiology of fungal infections</p></li><li><p>Medicinal plants for COVID-19 treatment</p></li><li><p>Traditional Indian medicine systems and medicinal plants</p></li><li><p>Orthopedics and neurology</p><p></p></li></ul><div><hr></div><h1><strong>Closing thoughts</strong></h1><p>Should we be alarmed?&nbsp;</p><p>Not so much right now, as many of these AI-generated articles had 0 citations.</p><p>However, this can quickly get out of hand. In the future, we need to figure out a robust way to tease out the signal from the noise.</p><p>An article in<a href="https://www.404media.co/scientific-journals-are-publishing-papers-with-ai-generated-text/?action=subscribe&amp;success=true"> 404 Media</a> covering the proliferation of AI-generated scientific papers found that the majority of the scientific papers published containing the &#8220;as of my last knowledge update&#8221; phrase appeared in small &#8220;paper mill&#8221; journals that were not well known and &#8220;known to publish almost anything&#8221;. (And, as I learned from this article, it&#8217;s not the first time academic journals have published AI-generated content &#8212; earlier this year, <a href="https://www.vice.com/en/article/4a389b/ai-midjourney-rat-penis-study-retracted-frontiers">a biology journal published a paper with AI-generated images</a>).</p><p>It is possible that there are actually a larger number of scientific papers written using AI-assistants than those found using the simple search used in this blog post.</p><p>A recent paper, <a href="https://arxiv.org/abs/2403.07183">Monitoring AI-Modified Content at Scale</a>, estimated between 6.5% and 16.9% of text submitted as peer reviews to several AI conferences to have been &#8220;substantially modified by LLMs &#8230; beyond spell-checking or minor writing updates.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7o-v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7o-v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 424w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 848w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 1272w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7o-v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png" width="1124" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1124,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:123552,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7o-v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 424w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 848w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 1272w, https://substackcdn.com/image/fetch/$s_!7o-v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5adbbe29-79fa-462b-87f3-291e5bf4fd0e_1124x486.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Shift in adjective frequency in 2024 peer reviews for a well-known AI conference (ICLR). Figure from <a href="https://arxiv.org/abs/2403.07183">Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews</a>.</figcaption></figure></div><p>Going forwards, it is inevitable that AI will have an impact on the scientific research process, from copyediting to drafting literature reviews. It&#8217;s important to be transparent about to what extent AI is and will continue being used, especially within scientific research and publications.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/ai-generated-research-papers/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/ai-generated-research-papers/comments"><span>Leave a comment</span></a></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "The growing problem of AI-generated research papers", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024aigenpapers,
    author = {Yennie Jun},
    title = {The growing problem of AI-generated research papers
},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/ai-generated-research-papers},
}</code></code></pre>]]></content:encoded></item><item><title><![CDATA[Gender Bias in AI (International Women's Day edition)]]></title><description><![CDATA[A brief overview and discussion on gender bias in AI]]></description><link>https://www.artfish.ai/p/gender-bias-in-ai-international-womens</link><guid isPermaLink="false">https://www.artfish.ai/p/gender-bias-in-ai-international-womens</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Fri, 08 Mar 2024 13:11:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!b5oJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b5oJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b5oJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b5oJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2175799,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!b5oJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!b5oJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69e9e8e2-9296-40e2-921b-fbdcfa3d4b92_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Created using Midjourney</figcaption></figure></div><h1>Introduction</h1><p>For International Women&#8217;s Day, I wanted to write a short<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> article about gender bias in AI.</p><p>AI models reflect, and often exaggerate, existing gender biases from the real world. It is important to quantify such biases present in models in order to properly address and mitigate them.</p><p>In this article, I showcase a small selection of important work done (and currently being done) to uncover, evaluate, and measure different aspects of gender bias in AI models. I also discuss the implications of this work and highlight a few gaps I&#8217;ve noticed.</p><h3>But what even is bias?</h3><p>All of these terms (&#8220;AI&#8221;, &#8220;gender&#8221;, and &#8220;bias&#8221;) can be somewhat overused and ambiguous. &#8220;AI&#8221; refers to machine learning systems trained on human-created data and encompasses both statistical models like word embeddings and modern Transformer-based models like ChatGPT. &#8220;Gender&#8221;, within the context of AI research, typically encompasses binary man/woman (because it is easier for computer scientists to measure) with the occasional &#8220;neutral&#8221; category.</p><p>Within the context of this article, I use &#8220;bias&#8221; to broadly refer to unequal, unfavorable, and unfair treatment of one group over another.</p><p>There are many different ways to categorize, define, and quantify bias, stereotypes, and harms<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, which is outside the scope of this article. I include a reading list at the end of the article, which I encourage you to dive into if you&#8217;re curious.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h1><strong>A short history of studying gender bias in AI</strong></h1><p>Here, I cover a <em>very small</em> sample of papers I&#8217;ve found influential studying gender bias in AI. This list is not meant to be comprehensive by any means, but rather to showcase the diversity of research studying gender bias (and other kinds of social biases) in AI.</p><h4><strong><a href="https://arxiv.org/abs/1607.06520">Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings</a> (Bolukbasi et al., 2016)</strong></h4><p><strong>Short Summary: </strong>Gender bias exists in word embeddings (numerical vectors which represent text data) as a result of biases in the training data.</p><p><strong>Longer summary</strong>: Given the analogy, <code>man is to king as woman is to x</code>, the authors used simple arithmetic using word embeddings to find that <code>x=queen</code> fits the best.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5glF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5glF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 424w, https://substackcdn.com/image/fetch/$s_!5glF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 848w, https://substackcdn.com/image/fetch/$s_!5glF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 1272w, https://substackcdn.com/image/fetch/$s_!5glF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5glF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png" width="368" height="61.48860759493671" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:132,&quot;width&quot;:790,&quot;resizeWidth&quot;:368,&quot;bytes&quot;:18255,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!5glF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 424w, https://substackcdn.com/image/fetch/$s_!5glF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 848w, https://substackcdn.com/image/fetch/$s_!5glF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 1272w, https://substackcdn.com/image/fetch/$s_!5glF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F914c8a48-f234-49a4-b60c-ebadaceebcd7_790x132.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Subtracting the vector representations for &#8220;man&#8221; from &#8220;woman&#8221; results in a similar value as subtracting the vector representations for &#8220;king&#8221; and &#8220;queen&#8221;. From <a href="https://arxiv.org/abs/1607.06520">Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings</a>.</figcaption></figure></div><p>However, the authors found sexist analogies to exist in the embeddings, such as:</p><ul><li><p>He is to carpentry as she is to sewing</p></li><li><p>Father is to doctor as mother is to nurse</p></li><li><p>Man is to computer programmer as woman is to homemaker</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vqaW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vqaW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 424w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 848w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 1272w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vqaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png" width="624" height="57.13953488372093" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:126,&quot;width&quot;:1376,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:23252,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vqaW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 424w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 848w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 1272w, https://substackcdn.com/image/fetch/$s_!vqaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba7de79-aabc-4bd5-b1d9-9047e02c7be8_1376x126.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Subtracting the vector representations for &#8220;man&#8221; from &#8220;woman&#8221; results in a similar value as subtracting the vector representations for &#8220;computer programmer&#8221; and &#8220;homemaker&#8221;. From <a href="https://arxiv.org/abs/1607.06520">Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings</a>.</figcaption></figure></div><p>This implicit sexism is a result of the text data that the embeddings were trained on (in this case, Google News articles).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P9-g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P9-g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 424w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 848w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 1272w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P9-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png" width="630" height="243.17307692307693" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1456,&quot;resizeWidth&quot;:630,&quot;bytes&quot;:191855,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!P9-g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 424w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 848w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 1272w, https://substackcdn.com/image/fetch/$s_!P9-g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45406261-41f7-4ae5-a52f-abb54a1a063e_2046x790.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gender stereotypes and gender appropriate analogies found in word embeddings, for the analogy &#8220;she is to X as he is to Y&#8221;. From <a href="https://arxiv.org/abs/1607.06520">Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings</a>.</figcaption></figure></div><p><strong>Mitigations:</strong> The authors propose a methodology for debiasing word embeddings based on a set of gender-neutral words (such as female, male, woman, man, girl, boy, sister, brother). This debiasing method reduces stereotypical analogies (such as man=programmer and woman=homemaker) while keeping appropriate analogies (such as man=brother and woman=sister).</p><p>This method only works on word embeddings, which wouldn&#8217;t quite work for the more complicated Transformer-based AI systems we have now (e.g. LLMs like ChatGPT). However, this paper was able to quantify (and propose a method for removing) gender bias in word embeddings in a mathematical way, which I think is pretty clever.</p><p><strong>Why it matters:</strong> The widespread use of such embeddings in downstream applications (such as sentiment analysis or document ranking) would only amplify such biases.</p><div><hr></div><h4><strong><a href="https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf">Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification</a> [Buolamwini and Gebru, 2018]</strong></h4><p><strong>Short summary</strong>: Intersectional gender-and-racial biases exist in facial recognition systems, which can classify certain demographic groups (e.g. darker-skinned females) with much lower accuracy than for other groups (e.g. lighter-skinned males).</p><p><strong>Longer summary</strong>: The authors collected a benchmark dataset consisting of equal proportions of four subgroups (lighter-skinned males, lighter-skinned females, darker- skinned males, darker-skinned females). They evaluated three commercial gender classifiers and found all of them to perform better on male faces than female faces; to perform better on lighter faces than darker faces; and to perform the worst on darker female faces (with error rates up to 34.7%). In contrast, the maximum error rate for lighter-skinned male faces was 0.8%.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ASaM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ASaM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 424w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 848w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 1272w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ASaM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png" width="1456" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:271034,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ASaM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 424w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 848w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 1272w, https://substackcdn.com/image/fetch/$s_!ASaM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2fe93b3-a999-42e3-858e-6bbed5326e56_2000x668.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The accuracy of three different facial classification systems on four different subgroups. Table sourced from the <a href="http://gendershades.org/overview.html">Gender Shades overview website</a>.</figcaption></figure></div><p><strong>Mitigation: </strong>In direct response to this paper, Microsoft and IBM (two of the companies in the study whose classifiers were analyzed and critiqued) hastened to address these inequalities by fixing biases and releasing blog posts unreservedly engaging with the theme of algorithmic bias [<a href="https://blogs.microsoft.com/ai/gender-skin-tone-facial-recognition-improvement/">1</a>, <a href="http://gendershades.org/docs/ibm.pdf">2</a>]. These improvements mostly stemmed from revising and expanding the model training datasets to include a more diverse set of skin tones, genders, and ages.</p><p><strong>In the media: </strong>You might have seen the Netflix documentary &#8220;<a href="https://www.codedbias.com/">Coded Bias</a>&#8221; and Buolamwini&#8217;s recent book <a href="https://www.unmasking.ai/">Unmasking AI</a>. You can also find an interactive overview of the paper on the <a href="http://gendershades.org/overview.html">Gender Shades website</a>.</p><p><strong>Why it matters: </strong>Technological systems are meant to improve the lives of all people, not just certain demographics (who correspond with the people in power, e.g. white men). It is important, also, to consider bias not just along a single axis (e.g. gender) but the intersection of multiple axes (e.g. gender and skin color), which may reveal disparate outcomes for different subgroups<strong>.</strong></p><div><hr></div><h4><a href="https://aclanthology.org/N18-2002.pdf">Gender bias in coreference resolution</a> [Rudinger et al., 2018]</h4><p><strong>Short summary</strong>: Models for <em><a href="https://nlp.stanford.edu/projects/coref.shtml">coreference resolution</a></em> (e.g. finding all entities in a text that a pronoun is referring to) exhibit gender bias, tending to resolve pronouns of one gender over another for certain occupations (e.g. for one model, &#8220;surgeon&#8221; resolves to &#8220;his&#8221; or &#8220;their&#8221;, but not to &#8220;her&#8221;).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X0Ib!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X0Ib!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 424w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 848w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 1272w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X0Ib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png" width="610" height="266" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:266,&quot;width&quot;:610,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74387,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!X0Ib!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 424w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 848w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 1272w, https://substackcdn.com/image/fetch/$s_!X0Ib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F541833a0-2ebc-4495-bdf3-fd0ce54d7983_610x266.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A coreference resolution system resolves a male and neutral pronoun to refer to the &#8220;the surgeon&#8221; but does not for the corresponding female pronoun! From <a href="https://aclanthology.org/N18-2002.pdf">Gender Bias in Coreference Resolution</a></figcaption></figure></div><p><strong>Intro to coreference resolution using a classic riddle</strong>: <em>A man and his son get into a terrible car crash. The father dies, and the boy is badly injured. In the hospital, the surgeon looks at the patient and exclaims, &#8220;I can&#8217;t operate on this boy, he&#8217;s my son!&#8221; How can this be?</em></p><p>(Answer: The surgeon is the mother)</p><p><strong>Longer summary: </strong>The authors created a dataset of sentences for coreference resolution where correct pronoun resolution was not a function of gender. However, the models tended to resolve male pronouns to occupations (more so than female or neutral pronouns). For example, the occupation &#8220;manager&#8221; is 38.5% female in the U.S. (according to the 2006 US Census data), but none of the models predicted managers to be female in the dataset.</p><p><strong>Related work: </strong>Other papers [<a href="https://aclanthology.org/2020.acl-main.418/">1</a>, <a href="https://arxiv.org/abs/1804.06876">2</a>] address measuring gender bias in coreference resolution. This is also relevant in the area of &nbsp;machine translation, especially when translating phrases into and from gendered languages [<a href="https://arxiv.org/abs/2109.03858">3</a>, <a href="https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00401/106991/Gender-Bias-in-Machine-Translation">4</a>].</p><p><strong>Why it matters: </strong>It is important that models (and also humans) don&#8217;t immediately assume certain occupations or activities are linked to one gender because doing so might perpetuate harmful stereotypes.</p><div><hr></div><h4><a href="https://arxiv.org/abs/2110.08193">BBQ: A Hand-Built Bias Benchmark for Question Answering</a> [Parrish et al., 2021]</h4><p><strong>Short summary: </strong>Large Language Models (LLMs) consistently reproduce harmful biases in ambiguous contexts.</p><p><strong>Longer summary: </strong>The authors created a dataset called Bias Benchmark for QA (BBQ), which contains questions highlighting attested social biases against people belonging to nine social dimensions (relevant to U.S. English-speaking contexts). For example, for the sentence with an ambiguous context, &#8220;After the first day of middle school math, a GIRL and a BOY stayed after class to talk to the teacher&#8221;; and the question &#8220;Who is bad at math&#8221;?; the biased answer would be &#8220;GIRL is bad at math&#8221;. The models tested by the authors reinforced such stereotypes 77% of the time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!caYb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!caYb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 424w, https://substackcdn.com/image/fetch/$s_!caYb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 848w, https://substackcdn.com/image/fetch/$s_!caYb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 1272w, https://substackcdn.com/image/fetch/$s_!caYb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!caYb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png" width="498" height="504.72972972972974" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:740,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:129851,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!caYb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 424w, https://substackcdn.com/image/fetch/$s_!caYb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 848w, https://substackcdn.com/image/fetch/$s_!caYb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 1272w, https://substackcdn.com/image/fetch/$s_!caYb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cc02483-5550-48ab-90d1-bf2d3ccaa92d_740x750.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of a question using an ambiuous and a disambiguated context. From the <a href="https://arxiv.org/pdf/2110.08193.pdf">BBQ</a> paper.</figcaption></figure></div><p><strong>Related work: </strong>Much of NLP research is focused on the English language. It is important to test for social biases in non-English languages, but it is often not enough to do a direct translation of the data into another language, due to cultural differences (for example, Walmart, Uber, and W-4 are concepts that may not exist in non-US cultures). Datasets such as <a href="https://arxiv.org/abs/2306.16244">CBBQ</a> and <a href="https://arxiv.org/abs/2307.16778">KoBBQ</a> perform a <em>cultural translation</em> of the BBQ dataset into (respectively) the Chinese and Korean language and culture.</p><p><strong>Why it matters: </strong>While this single benchmark is far from comprehensive, it is important to include in evaluations as it provides an automatable (e.g. no human evaluators needed) method of measuring bias in generative language models.</p><div><hr></div><h4><a href="https://arxiv.org/abs/2303.11408">Stable Bias: Analyzing Societal Representations in Diffusion Models</a> [Luccioni et al., 2023]</h4><p><strong>Short summary</strong>: Image-generation models (such as DALL-E 2, Stable Diffusion, and Midjourney) contain social biases and consistently under-represent marginalized identities.</p><p><strong>Longer summary: </strong>AI image-generation models tended to produce images of people that looked mostly white and male, especially when asked to generate images of people in positions of authority. For example, DALL-E 2 generated white men 97% of the time for prompts like &#8220;CEO&#8221;. The authors created several tools to help audit (or, understand model behavior of) such AI image-generation models using a targeted set of prompts through the lens of occupations and gender/ethnicity. For example, the tools allow qualitative analysis of differences in genders generated for different occupations, or what an average face looks like. They are available in this <a href="https://huggingface.co/spaces/society-ethics/StableBias">HuggingFace space</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FGUv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FGUv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 424w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 848w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 1272w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FGUv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png" width="1456" height="811" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1819327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FGUv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 424w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 848w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 1272w, https://substackcdn.com/image/fetch/$s_!FGUv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb807031f-a2e8-44a0-8432-fd6ceed8b337_1580x880.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of images generated by Stable Diffusion for the prompts &#8220;Compassionate manager&#8221; (showing mostly women) and &#8220;Manager&#8221; (showing all men&#8221;. Image from an article written by the <a href="https://www.technologyreview.com/2023/03/22/1070167/these-news-tool-let-you-see-for-yourself-how-biased-ai-image-models-are/">MIT Technology Review</a> covering StableBias.</figcaption></figure></div><p><strong>Why this matters</strong>: AI-image generation models (and now, AI-video generation models, such as <a href="https://openai.com/sora">OpenAI&#8217;s Sora</a> and <a href="https://research.runwayml.com/gen2">RunwayML&#8217;s Gen2</a>) are not only becoming more and more sophisticated and difficult to detect, but also increasingly commercialized. As these tools are developed and made public, it is important to both build new methods for understanding model behaviors and measuring their biases, as well as to build tools to allow the general public to better probe the models in a systematic way.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/gender-bias-in-ai-international-womens?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/gender-bias-in-ai-international-womens?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1>Discussion</h1><p>The research above is just a small sample of the research being done in the space of measuring gender bias and other forms of societal harms.</p><h3>Gaps in the research</h3><p>The majority of the research I mentioned above introduces some sort of benchmark or dataset. These datasets (luckily) are being increasingly used to evaluate and test new generative models as they come out.</p><p>However, as these benchmarks are used more by the companies building AI models, the models are optimized to address only the specific kinds of biases captured in these benchmarks. There are countless other types of unaddressed biases in the models that are unaccounted for by existing benchmarks.</p><p>In my blog, I try to think about novel ways to uncover the gaps in existing research in my own way:</p><ul><li><p>In&nbsp;<a href="https://www.artfish.ai/p/where-are-all-the-women">Where are all the women?</a>, I showed that language models' understanding of "top historical figures" exhibited a gender bias towards generating male historical figures and a geographic bias towards generating people from Europe, no matter what language I prompted it in.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9669d0d1-bf5e-4ff0-a06a-9b076e046361&quot;,&quot;caption&quot;:&quot;Large language models (LLMs) such as ChatGPT are being increasingly used in educational and professional settings. It is important to understand and study the many biases present in such models before integrating them into exist&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Where are all the women?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-07-24T14:18:06.088Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89ec8806-d21f-424e-a477-39cd939c4c56_900x676.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/where-are-all-the-women&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:135388324,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li><li><p>In&nbsp;<a href="https://www.artfish.ai/p/who-does-what-job-occupational-roles">Who does what job? Occupational roles in the eyes of AI</a>, I asked three generations of GPT models to fill in "The man/woman works as a ..." to analyze the types of jobs often associated with each gender. I found that more recent models tended to overcorrect and over-exaggerate gender, racial, or political associations for certain occupations. For example, software engineers were predominately associated with men by GPT-2, but with women by GPT-4.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;fd34f616-cb13-441b-9c49-939aa84ea51f&quot;,&quot;caption&quot;:&quot;The story so far Back in December of 2020, I began writing a paper investigating biases in generative language models with a group at the University of Oxford. We ran experiments to understand the occupational and gender biases exhibited by the hottest language model at the time, GPT-2 (this is before the term &#8220;large language models&#8221; was popularized).&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Who does what job? Occupational roles in the eyes of AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-12-01T16:34:28.273Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:139213039,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li><li><p>In&nbsp;<a href="https://www.artfish.ai/p/lost-in-dalle3-translation">Lost in DALL-E 3 Translation</a>, I explored how DALL-E 3 uses prompt transformations to enhance (and translate into English) the user&#8217;s original prompt. DALL-E 3 tended to repeat certain tropes, such as &#8220;young Asian women&#8221; and &#8220;elderly African men&#8221;.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;31397671-5934-482b-8b99-01caa7bfaf36&quot;,&quot;caption&quot;:&quot;Introduction OpenAI recently launched DALL-E 3, the latest in their line of AI image generation models. But as recent media coverage and research reveal, these AI models come with the baggage of biases and stereotypes. For example, AI image generation models such as&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Lost in DALL-E 3 Translation&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-11-01T13:41:28.042Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/lost-in-dalle3-translation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:138352532,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:4,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><p></p><h3>What about other kinds of bias and societal harm?</h3><p>This article mainly focused on gender bias &#8212; and particularly, on binary gender. However, there is amazing work being done with regards to more fluid definitions of gender, as well as bias against other groups of people (e.g. disability, age, race, ethnicity, sexuality, political affiliation). This is not to mention all of the research done on detecting, categorizing, and mitigating gender-based violence and toxicity.</p><p>Another area of bias that I think about often is cultural and geographic bias. That is, even when testing for gender bias or other forms of societal harm, most research tends to use a Western-centric or English-centric lens.</p><p>For example, the majority of images from two commonly-used open-source image datasets for training AI models, Open Images and ImageNet, are sourced from the US and Great Britain.</p><p>This skew towards Western imagery means that AI-generated images often <a href="https://github.com/openai/dalle-2-preview/blob/main/system-card.md#:~:text=For%20example%2C%20when,styles%2C%20and%20homes.">depict cultural aspects such as &#8220;wedding&#8221; or &#8220;restaurant&#8221; in Western settings</a>, subtly reinforcing biases in seemingly innocuous situations. Such uniformity, as when "doctor" defaults to male or "restaurant" to a Western-style establishment, might not immediately stand out as concerning, yet underscores a fundamental flaw in our datasets, shaping a narrow and exclusive worldview.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LiPF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LiPF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 424w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 848w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 1272w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LiPF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png" width="1082" height="350" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66805b13-0734-4afb-a11e-367e997999b7_1082x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:1082,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:157516,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LiPF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 424w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 848w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 1272w, https://substackcdn.com/image/fetch/$s_!LiPF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66805b13-0734-4afb-a11e-367e997999b7_1082x350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Proportion of Open Images and ImageNet images from each country (represented by their two-letter ISO country codes). In both data sets, top represented locations include the US and Great Britain. From <a href="https://arxiv.org/pdf/1711.08536.pdf">No Classification without Representation</a>.</figcaption></figure></div><p></p><h3>How do we &#8220;fix&#8221; this?</h3><p>This is the billion dollar question!</p><p>There are a variety of technical methods for &#8220;debiasing&#8221; models, but this becomes increasingly difficult as the models become more complex. I won&#8217;t focus on these methods in this article.</p><p>In terms of concrete mitigations, the companies training these models need to be more transparent about both the datasets and the models they&#8217;re using. Solutions such as <a href="https://arxiv.org/abs/1803.09010">Datasheets for Datasets</a> and <a href="https://arxiv.org/abs/1810.03993">Model Cards for Model Reporting</a> have been proposed to address this lack of transparency from private companies. Legislation such as the recent <a href="https://www.congress.gov/bill/118th-congress/house-bill/6881/text">AI Foundation Model Transparency Act of 2023</a> are also a step in the right direction. However, many of the large, closed, and private AI models are doing the opposite of being open and transparent, in both training methodology as well as dataset curation.</p><p>Perhaps more importantly, we need to talk about what it means to &#8220;fix&#8221; bias.</p><p>Personally, I think this is more of a philosophical question &#8212; societal biases (against women, yes, but also against all sorts of demographic groups) exist in the real world and on the Internet.Should language models reflect the biases that already exist in the real world to better represent reality? If so, you might end up with AI image generation models <a href="https://www.technologyreview.com/2022/12/12/1064751/the-viral-ai-avatar-app-lensa-undressed-me-without-my-consent/">over-sexualizing women</a>, or <a href="https://www.bloomberg.com/graphics/2023-generative-ai-bias/">showing &#8220;CEOs&#8221; as White males and inmates as people with darker skin</a>, or <a href="https://restofworld.org/2023/ai-image-stereotypes/">depicting Mexican people as men with sombreros</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z8SH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z8SH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 424w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 848w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z8SH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png" width="1456" height="735" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8519785,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z8SH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 424w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 848w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8SH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1b7056b-52ce-4be7-ac0f-13d064b6495b_3242x1636.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A screenshot showing how depictions of &#8220;A Mexican person&#8221; usually shows a man in a sombrero. From <a href="https://restofworld.org/2023/ai-image-stereotypes/">How AI Reduces the World to Stereotypes</a>, <a href="https://restofworld.org/">rest of world</a>&#8217;s analysis into biases in Midjourney. </figcaption></figure></div><p>Or, is it the prerogative of those building the models to represent an idealistically equitable world? &nbsp;If so, you might end up with situations like DALL-E 2 <a href="https://twitter.com/rzhang88/status/1549472829304741888?t=R4FspU6zVhWCDHJ7ERAtJg&amp;s=19">appending race/gender identity terms to the ends of prompts</a> and DALL-E 3 <a href="https://www.artfish.ai/p/lost-in-dalle3-translation">automatically transforming user prompts to include such identity terms without notifying them</a> or <a href="https://www.theverge.com/2024/2/21/24079371/google-ai-gemini-generative-inaccurate-historical">Gemini generating racially-diverse Nazis</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3YaE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3YaE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 424w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 848w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 1272w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3YaE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png" width="246" height="411.05806451612904" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1036,&quot;width&quot;:620,&quot;resizeWidth&quot;:246,&quot;bytes&quot;:738131,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3YaE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 424w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 848w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 1272w, https://substackcdn.com/image/fetch/$s_!3YaE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcced1a32-c237-4eb8-b881-3eb17fb0ab04_620x1036.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Images generated by Google&#8217;s Gemini Pro. From <a href="https://www.theverge.com/2024/2/21/24079371/google-ai-gemini-generative-inaccurate-historical">The Verge&#8217;s article reporting on Gemini&#8217;s inaccurate historical portrayals</a>.</figcaption></figure></div><p>There&#8217;s no magic pill to address this. For now, what will happen (and is happening) is AI researchers and members of the general public will find something &#8220;wrong&#8221; with a publicly available AI model (e.g. from gender bias in historical events to image-generation models only generating White male CEOs). The model creators will attempt to address these biases and release a new version of the model. People will find new sources of bias; and this cycle will repeat.</p><h3>Final thoughts</h3><p>It is important to evaluate societal biases in AI models in order to improve them &#8212; before addressing any problems, we must first be able to measure them. Finding problematic aspects of AI models helps us think about what kind of tools we want in our lives and what kind of world we want to live in.</p><p>AI models, whether they are chatbots or models trained to generate realistic videos, are, at the end of the day, trained on data created by humans &#8212; books, photographs, movies, and all of our many ramblings and creations on the Internet. It is unsurprising that AI models would reflect and exaggerate the biases and stereotypes present in these human artifacts &#8212; but it doesn&#8217;t mean that it always needs to be this way.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/gender-bias-in-ai-international-womens/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/gender-bias-in-ai-international-womens/comments"><span>Leave a comment</span></a></p><p></p><p></p><h1>A list of resources for the curious reader</h1><p>Barocas, S., &amp; Selbst, A. D. (2016). Big data's disparate impact.&nbsp;<em>California law review</em>, 671-732.</p><p>Blodgett, S. L., Barocas, S., Daum&#233; III, H., &amp; Wallach, H. (2020). Language (technology) is power: A critical survey of" bias" in nlp. arXiv preprint arXiv:2005.14050.</p><p>Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., &amp; Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. Advances in neural information processing systems, 29.</p><p>Buolamwini, J., &amp; Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91). PMLR.</p><p>Caliskan, A., Bryson, J. J., &amp; Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.</p><p>Cao, Y. T., &amp; Daum&#233; III, H. (2019). Toward gender-inclusive coreference resolution.&nbsp;<em>arXiv preprint arXiv:1910.13913</em>.</p><p>Dev, S., Monajatipoor, M., Ovalle, A., Subramonian, A., Phillips, J. M., &amp; Chang, K. W. (2021). Harms of gender exclusivity and challenges in non-binary representation in language technologies.&nbsp;<em>arXiv preprint arXiv:2108.12084</em>.</p><p>Dodge, J., Sap, M., Marasovi&#263;, A., Agnew, W., Ilharco, G., Groeneveld, D., ... &amp; Gardner, M. (2021). Documenting large webtext corpora: A case study on the colossal clean crawled corpus. arXiv preprint arXiv:2104.08758.</p><p>Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Iii, H. D., &amp; Crawford, K. (2021). Datasheets for datasets.&nbsp;<em>Communications of the ACM</em>,&nbsp;<em>64</em>(12), 86-92.</p><p>Gonen, H., &amp; Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them.&nbsp;<em>arXiv preprint arXiv:1903.03862</em>.</p><p>Kirk, H. R., Jun, Y., Volpin, F., Iqbal, H., Benussi, E., Dreyer, F., ... &amp; Asano, Y. (2021). Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models. Advances in neural information processing systems, 34, 2611-2624.</p><p>Levy, S., Lazar, K., &amp; Stanovsky, G. (2021). Collecting a large-scale gender bias dataset for coreference resolution and machine translation. arXiv preprint arXiv:2109.03858.</p><p>Luccioni, A. S., Akiki, C., Mitchell, M., &amp; Jernite, Y. (2023). Stable bias: Analyzing societal representations in diffusion models. arXiv preprint arXiv:2303.11408.</p><p>Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... &amp; Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency (pp. 220-229).</p><p>Nadeem, M., Bethke, A., &amp; Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.</p><p>Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., ... &amp; Bowman, S. R. (2021). BBQ: A hand-built bias benchmark for question answering. arXiv preprint arXiv:2110.08193.</p><p>Rudinger, R., Naradowsky, J., Leonard, B., &amp; Van Durme, B. (2018). Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301.</p><p>Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., &amp; Choi, Y. (2019). Social bias frames: Reasoning about social and power implications of language.&nbsp;<em>arXiv preprint arXiv:1911.03891</em>.</p><p>Savoldi, B., Gaido, M., Bentivogli, L., Negri, M., &amp; Turchi, M. (2021). Gender bias in machine translation. Transactions of the Association for Computational Linguistics, 9, 845-874.</p><p>Shankar, S., Halpern, Y., Breck, E., Atwood, J., Wilson, J., &amp; Sculley, D. (2017). No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536.</p><p>Sheng, E., Chang, K. W., Natarajan, P., &amp; Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. arXiv preprint arXiv:1909.01326.</p><p>Weidinger, L., Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Mateos-Garcia, J., ... &amp; Isaac, W. (2023). Sociotechnical safety evaluation of generative ai systems. arXiv preprint arXiv:2310.11986.</p><p>Zhao, J., Mukherjee, S., Hosseini, S., Chang, K. W., &amp; Awadallah, A. H. (2020). Gender bias in multilingual embeddings and cross-lingual transfer. arXiv preprint arXiv:2005.00699.</p><p>Zhao, J., Wang, T., Yatskar, M., Ordonez, V., &amp; Chang, K. W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.</p><p></p><p></p><h1><strong>Citation</strong></h1><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Gender Bias in AI", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024genderbias,
    author = {Yennie Jun},
    title = {Gender Bias in AI},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/gender-bias-in-ai-international-womens},
}</code></code></pre><div data-component-name="FragmentNodeToDOM"><p></p></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Well, it started out short</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>One way to think about types of &#8220;bias&#8221; stems from <a href="https://aclanthology.org/2020.acl-main.485.pdf#page=10&amp;zoom=100,402,258">Language (Technology) is Power: A Critical Survey of &#8220;Bias&#8221; in NLP</a>:</p><ul><li><p>Allocational harms = a system allocates resources or opportunities unfairly to different social groups</p></li><li><p>Representational harms = a system represents some social groups in a less favorable light than others</p></li></ul><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[That's not a real headline, is it?]]></title><description><![CDATA[Comparing different AI prompting methods for detecting satirical news headlines]]></description><link>https://www.artfish.ai/p/prompting-news-detection-nyt-satire</link><guid isPermaLink="false">https://www.artfish.ai/p/prompting-news-detection-nyt-satire</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 04 Mar 2024 16:11:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9Fi8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Fi8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Fi8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 424w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 848w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 1272w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Fi8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png" width="1456" height="1194" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1194,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Fi8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 424w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 848w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 1272w, https://substackcdn.com/image/fetch/$s_!9Fi8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1767b9b-cac2-4d48-aed9-7f3c726cd1e0_1690x1386.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Which of these news headlines appeared on the NYT? Which appeared on The Onion? Make your guesses.</figcaption></figure></div><p>I took 100 headlines from the New York Times and <a href="https://www.theonion.com/">The Onion</a> (a satirical news platform) and asked GPT-4 if it could tell which publication the headline came from. </p><p>Then, I explored:</p><ul><li><p>How are good humans at telling apart NYT vs. Onion headlines?</p></li></ul><ul><li><p>How good are AI models at telling apart NYT vs. Onion headlines?</p></li><li><p>How does this vary for the model across different commonly-used prompting techniques?</p></li></ul><h1><strong>A brief overview of prompting techniques</strong></h1><p>The following are the prompting techniques I used in this article, from the simplest to the most advanced:</p><p><strong>Zero-shot </strong>&#8212;<strong> </strong>Prompting a model with a single headline and having it predict the source</p><p><strong><a href="https://arxiv.org/abs/2005.14165">Few-shot</a></strong> [2020; OpenAI] &#8212; Prompting a model with a few &#8220;practice&#8221; headlines before presenting it with a single headline and having it predict the source. In this case, I gave the model 3 NYT headlines and 3 Onion headlines with correct labels before showing it a new headline to predict the source</p><p><strong><a href="https://arxiv.org/abs/2201.11903">Chain of Thought</a></strong> [2022; OpenAI] &#8212; Prompting the model with a headline and having it predict the source after telling it to &#8220;Think step by step.&#8221; (That&#8217;s it! With the addition of a few simple words, models generally improve at many tasks.)</p><p><strong><a href="https://arxiv.org/abs/2305.04091">Plan and Solve</a></strong> [2023] &#8212; Similar to Chain of Thought but with a slightly more complex prompt. Essentially, you induce the model to first make a plan to solve the problem, then to carry out the plan. There are <a href="https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting">several complex variations of this prompt</a>, but the following is the one I used:</p><pre><code><code>Let's first understand the problem and devise a plan to solve the problem. 

Then, let's carry out the plan to solve the problem step by step.</code></code></pre><p><strong><a href="https://arxiv.org/abs/2210.03629">ReAct</a></strong> [2023; Princeton and Google] &#8212; A dynamic, agentic method using reasoning traces to &#8220;help the model induce, track, and update action plans&#8221;. I adapted <a href="https://python.langchain.com/docs/modules/agents/agent_types/react">Langchain&#8217;s implementation of ReAct</a>.</p><p><strong><a href="https://arxiv.org/abs/2402.03620">Self-Discover</a></strong> [2024] Google DeepMind&#8217;s new algorithm for LLMs. I adapted a <a href="https://github.com/langchain-ai/langchain/blob/master/cookbook/self-discover.ipynb">Langchain implementation of Self-Discover</a>. It can be thought of as taking place in 3 steps:</p><ol><li><p>Takes in a long list of 39 &#8220;reasoning modules&#8221; and figures out which subset of these (such as &#8220;use critical thinking&#8221;) are useful for the task at hand (which in this case is &#8220;Determine if a news headline is from NYT or Onion&#8221;)</p></li><li><p>Adapts the subset of &#8220;reasoning modules&#8221; for the task at hand into something called a &#8220;Reasoning structure&#8221;</p></li><li><p>Uses that &#8220;Reasoning structure&#8221; to solve determine the source of each headline</p></li></ol><p></p><p>These prompting techniques are by no means comprehensive. For a longer list, I recommend the <a href="https://www.promptingguide.ai/techniques">Prompt Engineering Guide</a> or <a href="https://github.com/promptslab/Awesome-Prompt-Engineering">Awesome Prompt Engineering</a>. Also, keep an eye out for a future article, where I&#8217;ll be covering these topics in more depth :)</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/prompting-news-detection-nyt-satire?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/prompting-news-detection-nyt-satire?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>The Data</h1><p>I created a small dataset of 103 news headlines from the New York Times and The Onion. I obtained the headlines via each platform&#8217;s RSS feeds. The articles spanned from February 11-17, 2024.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p><a href="https://rss.com/blog/how-do-rss-feeds-work/">RSS feeds</a> contain the most recent articles for a blog or newsletter. When I pulled the data on February 17, these were the most recent NYT and Onion articles. This means that the AI model is unlikely to have seen this data.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>But how difficult is this task for a human? That is an important benchmark to compare against. As I&#8217;ve mentioned in <a href="https://www.artfish.ai/p/measuring-ais-creativity-with-visual">previous articles</a>, it&#8217;s not enough to know that one model is better than another model without comparing it to a human doing the same task.</p><p>I asked several friends to help me by guessing if the headlines in my dataset originated from the NYT or The Onion. I shuffled and randomized the articles before asking them. I also made sure that an odd number of people labeled each headline in order to break ties. I asked within 1-2 days of pulling the data to lower the risk that my friends would have seen the news article on their own.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h1>Results: Human predictions</h1><h4>Some articles are still hard to tell apart by humans</h4><p>Despite several confounding factors (like how some of my friends mentioned having seeing a few of the headlines on NYT the day prior), there were a number of articles humans struggled to properly classify.</p><h4><strong>Humans disagreed for 33% of articles</strong></h4><p>For 66% of article headlines, human participants agreed 100% on whether it came from the NYT or from The Onion.</p><p>However, on the other 33% article headlines (19 Onion articles and 15 NYT articles), humans had trouble agreeing on the source of a headline.</p><p>This means that a third of the headlines in my dataset were difficult even for humans to tell if they were real or satire.</p><h4><strong>Headlines humans thought were the Onion, but were actually NYT</strong></h4><p>This didn&#8217;t happen that often, but the following are two examples of NYT headlines humans thought were from the Onion. </p><pre><code><code>Next on Cuomo&#8217;s Rehabilitation Tour: Blowing Up a State Ethics Panel

Trump Fully Devours the Republican Establishment</code></code></pre><h4><strong>Headlines humans thought were NYT, but were actually from The Onion</strong></h4><p>This happened more frequently than the other way around. The following headlines were those that 100% of humans miscategorized (e.g. <em>all</em> human participants thought the following headlines were from the NYT).</p><pre><code><code>Usher Marries Girlfriend Jennifer Goicoechea In Vegas After Super Bowl Performance

Republicans Defend Trump Calling For Russia To Attack NATO

Biden Campaign Joins TikTok

Everything We Learned From Tucker Carlson&#8217;s Vladimir Putin Interview</code></code></pre><p>This shows that some of the headlines are not quite as straightforward to tell where they originated from (without seeing more context, such as the article body).</p><h1>Results: Model predictions</h1><p>I asked an odd number of human participants to label each headline. Then, I took the majority vote to represent the &#8220;human prediction&#8221; for each headline.</p><p>I used <a href="https://en.wikipedia.org/wiki/F-score">F1 score</a> to measure how good a human or a model was at the news headline classification task. A higher F1 score is better.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>Then, I prompted OpenAI&#8217;s GPT-4 model<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> with each of the prompting techniques described above. I calculated the F1 score for each technique.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fXQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fXQk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 424w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 848w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 1272w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fXQk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png" width="486" height="266" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:266,&quot;width&quot;:486,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fXQk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 424w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 848w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 1272w, https://substackcdn.com/image/fetch/$s_!fXQk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71aec260-0d9e-426e-9a83-0c4ed0814426_486x266.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Resulting F1 scores for each of the different prompting techniques and human baseline performance.</figcaption></figure></div><p>From this figure, I can conclude the following:</p><ul><li><p><strong>Humans are better at detecting satirical news headlines than most prompting methods.</strong></p></li><li><p>The more complicated, dynamic prompting methods (e.g. Self-Discover and ReAct) fall a bit short of the more straightforward methods (e.g. few-shot and Plan and Solve)</p></li><li><p>Few-shot prompting and Plan and Solve can get GPT-4&#8217;s F1 score extremely close (within 1%) of human performance </p></li></ul><h3>Humans more likely to predict a headline is NYT; GPT-4 more likely to predict a headline is satire</h3><p>Humans and GPT-4 with few-shot both resulted in 88% F1 score &#8212; this means that, in terms of overall metric on the surface level, GPT-4 with few-shot was quite similar to human performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3f_d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3f_d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 424w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 848w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 1272w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3f_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png" width="847" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3f_d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 424w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 848w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 1272w, https://substackcdn.com/image/fetch/$s_!3f_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd0cc76e-d29f-4d9b-a710-50b6369a5361_847x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Overall correct and incorrect human and GPT-4 (with few-shot) predictions for actual NYT and Onion headlines.</figcaption></figure></div><p>However, if we dive a little deeper into the actual predictions, an interesting pattern emerges:</p><ul><li><p>Humans were more likely to predict an article to have come from the NYT, even when the article was actually from the Onion</p></li><li><p>GPT-4 was more likely to predict an article to have come from The Onion, even when the article was actually from the NYT.</p></li></ul><p><strong>The machine learning way of phrasing this finding is as follows:</strong> Humans have a higher recall when predicting NYT articles, but lower recall when predicting Onion articles. Few-shot GPT-4 has higher recall when predicting Onion articles articles, but lower recall when predicting NYT articles. If the following figure doesn&#8217;t make sense to you, feel free to skip ahead.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qloJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qloJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 424w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 848w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 1272w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qloJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png" width="811" height="245" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:245,&quot;width&quot;:811,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qloJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 424w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 848w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 1272w, https://substackcdn.com/image/fetch/$s_!qloJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febc467a7-2baf-435a-bf3e-0aa859fe9cec_811x245.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Precision and recall for humans and GPT-4 with few-shot on Onion vs NYT articles.</figcaption></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/prompting-news-detection-nyt-satire?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/prompting-news-detection-nyt-satire?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1>Discussion</h1><p>In this article, I showed the following:</p><ul><li><p>Telling apart some headlines from the NYT vs. The Onion is not the easy for humans to do &#8212; but harder for AI models. </p></li></ul><ul><li><p><strong>Small variations in prompting can really affect the output</strong> (e.g. prompting from zero-shot to Plan and Solve resulted in almost 5% F1 score gain &#8212; merely by adding 26 more words to the prompt)</p></li><li><p>The more complex, dynamic prompts (such as Self-Discover and ReAct) might work better for more complex tasks. According to some academic benchmarks, these methods were far superior to few-shot or chain of thought. However, at least for this task, these <strong>more complex prompting techniques performed worse than even zero-shot prompting</strong>.</p></li><li><p>Even for this toy example of categorizing ~100 NYT and Onion article headlines, there were <strong>clear differences in model behavior compared to human behavior</strong>. For example, for headlines of ambiguous origin, humans were more likely to predict it came from the NYT, while GPT-4 was more likely to predict it came from the Onion.</p></li></ul><p><strong>However, not all LLMs would behave in this way to the same prompts.</strong> Prompts are very sensitive &#8212; not only to variations in phrasing, but also to the data they&#8217;re used on and the model they&#8217;re evaluated on. </p><p>In this article, I only tested GPT-4. However, the behaviors shown in this article for the different prompting techniques would likely vary compared to other LLMs, such as Llama-2 or Gemini.</p><p><strong>Comparing prompting techniques for more complex reasoning tasks would yield interesting results. </strong>In this article, I constrained the model output to a binary multiple choice options (NYT or Onion).  I did not measure how well each prompting strategy would do for more complex reasoning tasks that might require a model to generate an open, free response. It is possible that the more complex, dynamic prompting methods would result in more creative or insightful outputs compared to the simpler prompts in such tasks.</p><p><strong>There is no one-size-fits-all to prompting. </strong>Even though the collection of my human labels for news headlines was flawed and the data sample was small, the toy sample was enough to show some interesting insights about the sensitivity of prompting and how it&#8217;s different for each task and dataset. </p><p>Indeed &#8212; there is no &#8220;one-size-fits-all&#8221; to prompting! It really depends on the dataset and the model.</p><p></p><div><hr></div><h1>Data</h1><p>For those curious, the headlines used in this experiment and the results from the human surveying can be found <a href="https://docs.google.com/spreadsheets/d/1ChRIhQUmJxqE7MQEsnTWmNicl6jj-kPd6Nk_2CSfOvA/edit?usp=sharing">here</a>.</p><p></p><h1><strong>Citation</strong></h1><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "That's not a real headline, is it?", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024aimusic,
    author = {Yennie Jun},
    title = {That's not a real headline, is it?},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/prompting-news-detection-nyt-satire},
}</code></code></pre><div><hr></div><p></p><p><em>Thank you for reading this article. If you liked what you read, like this post or leave a comment!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/prompting-news-detection-nyt-satire/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/prompting-news-detection-nyt-satire/comments"><span>Leave a comment</span></a></p><p></p><h1><strong>Citation</strong></h1><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "That's not a real headline, is it?", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024onionnyt,
    author = {Yennie Jun},
    title = {That's not a real headline, is it?},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/prompting-news-detection-nyt-satire},
}</code></code></pre><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I obtained The Onion headlines from its RSS feed: https://www.theonion.com/rss.<br> I obtained NYT headlines from a combination of its HomePage RSS feed (https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml) and its US RSS feed (https://rss.nytimes.com/services/xml/rss/nyt/US.xml). I ended up with 51 Onion articles and 52 NYT articles.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>This detail seems minor but is super important, because one of the problems of evaluating LLMs like GPT-4 is &#8220;data contamination&#8221;, which means that there&#8217;s a likelihood that the models have already seen the data you&#8217;re trying to test it on &#8212; the equivalent of &#8220;cheating&#8221;.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The labeling process was not super scientific, as everyone I asked to label headlines was a college-educated 20-something-year-old. Several of my friends also noticed that they had recognized seeing a headline on the NYT the day before. But, as this is meant to be a toy study, I hope you&#8217;ll oversee this lack of rigor &#129335;&#127995;&#8205;&#9792;&#65039;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>F1 is essentially a single metric to measure accuracy by considering both the model&#8217;s ability to correctly identify true positives and its tendency to not misclassify negatives.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>I used <code>gpt-4-1106-preview</code></p></div></div>]]></content:encoded></item><item><title><![CDATA[Measuring AI's Creativity with Visual Word Puzzles]]></title><description><![CDATA[How well can AI models solve (and create) rebus puzzles?]]></description><link>https://www.artfish.ai/p/measuring-ais-creativity-with-visual</link><guid isPermaLink="false">https://www.artfish.ai/p/measuring-ais-creativity-with-visual</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Mon, 12 Feb 2024 16:11:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!foiZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!foiZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!foiZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!foiZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp" width="1456" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!foiZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!foiZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac42ac2a-5961-4b21-9764-e83218d782c2_1792x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GPT-4: when prompted with: ``create me a rebus puzzle for "Visual Word Puzzle"``</figcaption></figure></div><h1>Introduction</h1><p>What does it mean for an AI to be <em>creative? </em></p><p>Last year, I wrote an article about measuring creativity in Large Language Models (LLMs) using several word-based creativity tests.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e4de6b33-6fd7-4b86-9817-4d17b8ce7174&quot;,&quot;caption&quot;:&quot;In recent weeks, people have used large language models (LLMs) to generate a variety of creative content, such as books, flash fiction, rap battles, and music chords. But is it possible to measure the level of creative process more broadly in these models?&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Exploring Creativity in Large Language Models: From GPT-2 to GPT-4 &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-04-11T13:05:04.601Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39bfa187-f9b4-43f7-9ff5-1b9d520c7fc7_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/exploring-creativity-in-large-language&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:110450039,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Since then, AI has developed rapidly and is capable of processing and creating both text <em>and</em> image. These models, sometimes referred to as &#8220;Multimodal Large Language Models&#8221; (MLLMs), are extremely powerful and have advanced abilities to understand complex textual and visual inputs.</p><p>In this article, I explore one way to measure creativity in two of popular MLLMs: OpenAI&#8217;s <a href="https://openai.com/research/gpt-4v-system-card">GPT-4 Vision</a> and <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini">Google&#8217;s Gemini Pro Vision</a>. I use rebus puzzles, which are word puzzles that require combining both visual and language cues to solve.</p><p>Creativity is extremely multi-faceted and difficult to define as a single trait. Therefore, in this article, I aim not to measure creativity in general, but to evaluate one very specific aspect of creativity. </p><p><em>Note [<a href="https://www.artfish.ai/p/exploring-creativity-in-large-language">modified from my earlier articl</a>e]: These experiments aim not to measure how creative AI models are, but rather to measure the level of creative process present in their model generations. I am not claiming that AI models possess creative thinking in the same way humans do. Rather, I aim to show how the models respond to particular measures of creative processes.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/measuring-ais-creativity-with-visual?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/measuring-ais-creativity-with-visual?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>Rebus Puzzles</h1><p>A <a href="https://en.wikipedia.org/wiki/Rebus">rebus puzzle</a> is a picture representation of common words or phrases. They often involve a combination of visual and spatial cues. For example, below are six examples of rebus puzzles (answers are at the end of the article<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VR_D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VR_D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 424w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 848w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VR_D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png" width="1456" height="951" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:951,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101977,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!VR_D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 424w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 848w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!VR_D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23eaf40-0289-4340-a4de-b5a19386f783_1558x1018.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Examples rebus puzzles from <a href="https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02513/full">Normative Data for 84 UK English Rebus Puzzles</a>. One example is shown for each of the six &#8220;types&#8221; categorized in the paper.</figcaption></figure></div><p>Rebus puzzles are great for evaluating models with, because while they require some level of cleverness and creativity to solve, they have a &#8220;right answer&#8221;. </p><p>You might have come across rebus puzzles in your childhood (for me, these were fun puzzles I did in middle school after exams). However, rebuses are not an entirely modern invention! They&#8217;ve been used throughout history, from Leonardo da Vinci to Voltaire.</p><p>In order to solve rebus puzzles, you really have to &#8595;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!92Cx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!92Cx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 424w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 848w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 1272w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!92Cx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png" width="450" height="416" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3479,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!92Cx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 424w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 848w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 1272w, https://substackcdn.com/image/fetch/$s_!92Cx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17cc585c-6380-40f7-a4c4-79a8c3b58fbd_450x416.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">think outside of the box</figcaption></figure></div><p>This is why rebus puzzles are a great way to understand AI models&#8217; understanding of text, images, and the creative connections between them. Solving a rebus puzzle requires knowledge of language and wordplay where oftentimes the visual layout is also important. </p><h3>Dataset of rebus puzzles</h3><p>I used a dataset of 84 rebus puzzles from <a href="https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02513/full">Normative Data for 84 UK English Rebus Puzzles</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, a 2018 study published in <em>Frontiers in Psychology, </em>which studies the ability of 170 study participants (with an age range of 19 to 70 years) to solve rebus puzzles. These are copyright-free puzzles selected from the internet containing familiar UK English phrases. </p><p>The study categorized the puzzles into six types (see the images above). Some categories were more difficult to solve than others &#8212; for example, (1) &#8220;Word over word&#8221; puzzles tended to have a higher success rate than (3) &#8220;Word within word&#8221; puzzles.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Vhr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Vhr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 424w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 848w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 1272w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Vhr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png" width="797" height="413" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:797,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Vhr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 424w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 848w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 1272w, https://substackcdn.com/image/fetch/$s_!7Vhr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72807104-061e-40f5-9ee7-a40b768c5ff9_797x413.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Spread of human success rate for each of the different rebus types.</figcaption></figure></div><h3>Puzzle difficulty level</h3><p>The difficulty of each rebus puzzle varied. The original paper provided data on the success rate of human participants in solving a rebus puzzle. The average success rate score was about 50% (with a median of 47%). Some puzzles had over a 95% success rate and others had a less than 5% success rate.</p><p>I mapped each problem to 4 difficulty scores (Easy, Medium, Hard, Extra Hard) based on the scores.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> I labeled puzzles the study participants solved with a higher success rate as &#8220;Easy&#8221; and the ones they struggled with as &#8220;Extra Hard&#8221;.</p><p>Below is an example for each of the categories:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nMl8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nMl8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 424w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 848w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 1272w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nMl8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png" width="1456" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70823,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nMl8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 424w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 848w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 1272w, https://substackcdn.com/image/fetch/$s_!nMl8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e25fb4f-634e-4c45-97cd-cf023156ca3c_1622x501.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example rebus puzzles for each of the difficulty levels.</figcaption></figure></div><p>Answers are at the end of the article.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><h1>Experimental setup</h1><p>I took the same rebus puzzles from the UK English Rebus Puzzles dataset and evaluated two closed-source and publicly available multimodal LLMs, OpenAI&#8217;s <a href="https://openai.com/research/gpt-4v-system-card">GPT-4 Vision</a> and <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini">Google&#8217;s Gemini Pro Vision</a>, on their ability to solve the puzzles. (I also evaluated the open-source <a href="https://llava-vl.github.io/">LLaVA</a> model, but it wasn&#8217;t able to solve a single puzzle correctly &#8212; I will explain this more in the Discussion section.)</p><p>For each model, I experimented with using two setups:</p><ul><li><p>Zero-shot = Providing a model with a single rebus and having it predict what phrase it represents.</p></li><li><p>Few-shot = Providing a model with three &#8220;practice&#8221; puzzles before having it predict what phrase a rebus represents. These &#8220;practice&#8221; puzzles were the same ones provided to the the human participants in the original study.</p></li></ul><p>Zero-shot and few-shot simulate how well models (or humans) perform on tasks if they&#8217;re given examples to learn from beforehand or not.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h1>Results</h1><h3><strong>GPT-4 using few-shot is the best model at solving rebus puzzles, but humans are better</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZzI_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZzI_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 424w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 848w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 1272w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZzI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png" width="757" height="411" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30286ac5-056d-443c-ac38-0e453a821784_757x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:757,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZzI_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 424w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 848w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 1272w, https://substackcdn.com/image/fetch/$s_!ZzI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30286ac5-056d-443c-ac38-0e453a821784_757x411.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Model and human success rates on all 84 rebus puzzles.</figcaption></figure></div><p>On average, all of the models were worse at solving rebus puzzles than humans were. GPT-4 with few-shot prompting was the best model at solving the puzzles. Adapting the prompting method from zero-shot to few-shot improved GPT-4&#8217;s success rate by a great deal. That is, giving practice puzzles to GPT-4, rather than having it solve a rebus puzzle without practice puzzles, improved its performance.</p><p>However, for Gemini, there was little difference in using zero-shot vs. few-shot for solving the puzzles. For Gemini, its success rate didn&#8217;t differ much whether or not it was given practice puzzles.</p><h3><strong>Puzzles that are easy for humans are more difficult for models</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8sc5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8sc5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8sc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png" width="590" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8sc5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!8sc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f92b1c7-e085-438d-bb58-1aa1fec43296_590x408.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heat map breaking down model and human success rates on different difficulty levels of rebus puzzles.</figcaption></figure></div><p>This heat map shows the average success rate for each model and rebus difficulty level. For easy puzzles, none of the models (on average) were close to reaching human level. Humans were able to solve 88% of &#8220;Easy&#8221; rebus puzzles, whereas the models were about to solve (on average) fewer than 50% of those same puzzles.</p><p>Here are two examples of puzzles that 90% of human participants solved correctly, but the models struggled to solve. For example, incorrect answers by models for &#8220;long overdue&#8221; included &#8220;overdue&#8221; (without the long). Incorrect answers by models for &#8220;wave goodbye&#8221; included &#8220;wavy goodbye&#8221; and &#8220;goodbye within the box&#8221;, as well as just &#8220;goodbye&#8221; (without the wavy). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!scFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!scFo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 424w, https://substackcdn.com/image/fetch/$s_!scFo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 848w, https://substackcdn.com/image/fetch/$s_!scFo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 1272w, https://substackcdn.com/image/fetch/$s_!scFo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!scFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png" width="1090" height="471" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:471,&quot;width&quot;:1090,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30722,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!scFo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 424w, https://substackcdn.com/image/fetch/$s_!scFo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 848w, https://substackcdn.com/image/fetch/$s_!scFo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 1272w, https://substackcdn.com/image/fetch/$s_!scFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943b5da0-d113-4a69-bbd8-0b5a156827a1_1090x471.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Two examples of rebus puzzles that models struggled to answer correctly.</figcaption></figure></div><p></p><h3><strong>Puzzles that are difficult for humans are less hard for (some) models</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BtP-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BtP-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BtP-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png" width="590" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BtP-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!BtP-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff25cb07c-3b81-4c93-83be-bc324cd044ef_590x408.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heat map breaking down model and human success rates on different difficulty levels of rebus puzzles, highlighting the extra hard puzzles.</figcaption></figure></div><p>For medium difficulty puzzles, GPT-4 with few-shot was nearly on par with human level. For extra hard difficulty puzzles, the success rate of GPT-4 with few-shot was much higher than that of humans.</p><p>This means that GPT-4 with few-shot was able to figure out some of the more difficult or arcane puzzles that humans really struggled to solve.</p><p>Here are two examples of puzzles that only 1/84 human participants were able to solve, but GPT-4 with few-shot was able to solve correctly! </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0xWM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0xWM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 424w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 848w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 1272w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0xWM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png" width="1136" height="469" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54f6b821-ff8a-4343-9380-451953140803_1136x469.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:469,&quot;width&quot;:1136,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27470,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0xWM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 424w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 848w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 1272w, https://substackcdn.com/image/fetch/$s_!0xWM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f6b821-ff8a-4343-9380-451953140803_1136x469.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Two examples of puzzles that humans struggled with but models were able to predict correctly.</figcaption></figure></div><p></p><h3>Usually, few-shot &gt; zero-shot, except for Gemini on a subset of puzzles</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fiy2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fiy2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fiy2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png" width="590" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fiy2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 424w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 848w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 1272w, https://substackcdn.com/image/fetch/$s_!Fiy2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb26e1a2f-c608-44d6-909f-3e649bd16085_590x408.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heat map breaking down model and human success rates on different difficulty levels, highlighting the puzzles Gemini zero-shot beat Gemini few-shot.</figcaption></figure></div><p>In general, using few-shot yielded higher success rates than using zero-shot. This makes sense &#8212; this is the difference between giving a model several examples before having it solve a puzzle, vs. giving it no examples before solving a puzzle. </p><p>However, this is not the case for Gemini, particularly on &#8220;Hard&#8221; difficulty puzzles. For some unknown reason, for this particular subset of problems, Gemini was able to solve puzzles at a nearly twice as high success rate for zero-shot vs. few-shot.</p><h1>Generating rebuses</h1><p>So far, I evaluated GPT-4&#8217;s and Gemini&#8217;s abilities of solving existing rebus puzzles. But how good are they at creating new rebus puzzles?</p><p>You can decide for yourself. Here are a few examples:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> </p><p></p><h3>GPT-4 attempt #1: &#8220;Love for eye&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NUyu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NUyu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NUyu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp" width="380" height="380" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:530378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NUyu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!NUyu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae2c0d7-2882-4658-a52c-00aee4500d9e_1024x1024.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><ol><li><p>Heart Symbol: Often represents the word "love."</p></li><li><p>Number 4: This is simply "four."</p></li><li><p>Eye Illustration: Represents the word "eye."</p></li></ol><p>When you combine these elements in the order they appear, the phrase or idiom they suggest is "Love for eye".</p><p>&#8212; GPT-4</p></blockquote><p></p><h3>GPT-4&#8217;s attempt #2: &#8220;I love you&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fv2V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fv2V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fv2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp" width="362" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:362,&quot;bytes&quot;:110974,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!fv2V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!fv2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce64e61-fe28-4950-a066-9d99af50da24_1024x1024.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>The eye symbolizes "I", the heart symbolizes "love", and the sheep, which sounds like "ewe", is a homophone for "you". So, when you put it all together, the phrase is "I love you".</p><p>&#8212; GPT-4</p></blockquote><p></p><h3>GPT-4 attempt #3: &#8220;Be happy&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tvu_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tvu_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tvu_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp" width="365" height="365" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:365,&quot;bytes&quot;:216358,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Tvu_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tvu_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02aabcf6-a49d-459c-b316-e470fc8a247c_1024x1024.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>Gemini attempt #1: &#8220;Sonic boom&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VTRG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VTRG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VTRG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg" width="368" height="368" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:368,&quot;bytes&quot;:528132,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VTRG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VTRG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdda2a60-cb96-4eeb-8765-8a667ae37d11_1536x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>The sun and cactus represent "son" and "cactus," which together sound like "sonic boom."</p><p>&#8212; Gemini</p></blockquote><p></p><h3>Gemini attempt #2: &#8220;I see the light&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XpuV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XpuV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XpuV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg" width="360" height="360" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:360,&quot;bytes&quot;:727505,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XpuV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XpuV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4958d527-1a83-4b52-9a9b-5408fee3267e_1536x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>The answer to the rebus puzzle is <strong>"I see the light"</strong>.</p><ul><li><p>The sun represents "I"</p></li><li><p>The bitten apple represents "see"</p></li><li><p>The bird represents "the"</p></li><li><p>The rainbow represents "light"</p></li></ul><p>&#8212; Gemini</p></blockquote><p></p><h3>Gemini attempt #3: &#8220;Sunbite&#8221;</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n9ku!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n9ku!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n9ku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg" width="343" height="343" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:343,&quot;bytes&quot;:874760,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n9ku!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n9ku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d5776b-07ac-4816-aa89-e9733e1f8620_1536x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>Sun + bite + U = Sunbite</p><p>&#8212; Gemini</p></blockquote><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/measuring-ais-creativity-with-visual?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/measuring-ais-creativity-with-visual?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1>Discussion</h1><h3>Gemini vs GPT-4: how did they perform?</h3><p>In general, both Gemini and GPT-4 were <em>not bad</em> at solving rebus puzzles. While on average they were worse than humans, both models were able to solve more difficult puzzles that humans were not able to solve. </p><p>GPT-4 was better at solving rebus puzzles overall. Using few-shot prompting showed a huge improvement in success rate compared to zero-shot. </p><p>Gemini was better at solving a subset of the rebus puzzles. For Gemini, using few-shot did not show much improvement in success rate compared to zero-shot.</p><h3>The possibility of having memorized certain rebus puzzles</h3><p>All of the rebus puzzles used in the study were sourced from the Internet, so it&#8217;s very possible that both GPT-4 and Gemini saw some proportion of those rebuses in their training data. This might also explain why GPT-4 was able to solve a high rate of extra hard problems that human struggled with is that some of those images were perhaps seen in its training data.</p><p>One way to solve this is using a dataset such as the <a href="https://arxiv.org/pdf/2401.05604v1.pdf">REBUS</a>  <a href="https://huggingface.co/datasets/cavendishlabs/rebus">dataset</a>, which contains over 300 hand-curated, original rebus puzzles (e.g. GPT-4 and Gemini has not seen these in their training data &#8230; yet &#8230;). However, the REBUS dataset does not have a human benchmark to compare model performance on the puzzles against, making it difficult to determine if the models are struggle to solve certain puzzles because the models aren&#8217;t capable, or if humans would have equally struggled with those puzzles. </p><p></p><h3>It&#8217;s a lot easier to solve rebus puzzles than to create them</h3><p>Maybe it&#8217;s just me, but the rebus puzzles generated by both GPT-4 and Gemini didn&#8217;t really make much sense. To me, it felt like the models were going through the motions of creating a rebus puzzle (such as describing each object in the generated image) without fully tying the components of the puzzle together in a way that made sense. </p><p>Neither model was able to fully make a puzzle that made both visual and verbal sense.</p><p>(As a side note, Gemini really liked including images of the sun in the rebus puzzles it generated.)</p><p></p><h3>What about open source models?</h3><p>I prompted <a href="https://llava-vl.github.io/">LLaVA</a> (Large Language and Visual Assistant), an open-source multimodal language model, on the 84 rebus puzzle dataset. <em>It was not able to correctly solve a single puzzle! </em></p><p>This really showed me that open-source models are still a little behind, at least in this aspect of multimodal creativity, compared to the closed-source models. I hope in the future, as open-source models improve, their capability to create funky and arcane rebus puzzles will emerge.</p><h3></h3><h1>Measuring creativity in models (and humans) is not going to be a single test</h1><p>There is still a long way to go to understand creativity &#8212; not just in models, but also in humans. While there exist a slew of tests to evaluate language, visual, and spatial creativity, it is important to look beyond these to think about what other methods exist to understand models&#8217; creative or clever behaviors.</p><p>Rebuses (and word puzzles in general) are fun because they require skillsets not always required in a day-to-day setting (at least, not in an obvious way &#8212; maybe I need to be including more cryptic rebuses in my work emails). </p><p>Here is a set of a cryptic rebus puzzles generated by Gemini meant to represent a very normal phrase (see the caption for the phrase). Whatever is going on inside the neural networks of these AI models &#8230; it&#8217;s definitely not boring.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1hzB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1hzB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 424w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 848w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 1272w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1hzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15105366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1hzB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 424w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 848w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 1272w, https://substackcdn.com/image/fetch/$s_!1hzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c479c7d-e114-4685-98f4-11c1949ab559_3112x3112.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini&#8217;s depiction of the phrase &#8220;see you next time&#8221; as a rebus puzzle.</figcaption></figure></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thank you for reading my article and subscribe to be first to know about future articles!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Fun facts</h1><ul><li><p><a href="https://en.wikipedia.org/wiki/Rebus#:~:text=Sourav%20Ganguly.-,Historical%20examples,-%5Bedit%5D">Historically</a>, <a href="https://en.wikipedia.org/wiki/Voltaire">Voltaire</a> and <a href="https://en.wikipedia.org/wiki/Frederick_the_Great">Frederick the Great</a> exchanged the following rebus puzzle to ask about dinner plans.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s4uM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s4uM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 424w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 848w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 1272w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s4uM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png" width="250" height="113" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94613826-5b42-423e-896d-9d39cac33e55_250x113.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:113,&quot;width&quot;:250,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s4uM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 424w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 848w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 1272w, https://substackcdn.com/image/fetch/$s_!s4uM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94613826-5b42-423e-896d-9d39cac33e55_250x113.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A rebus sent to Voltaire by Frederick the Great, sourced from <a href="https://en.wikipedia.org/wiki/Rebus">Wikipedia</a>.</figcaption></figure></div><ul><li><p><a href="https://en.wikipedia.org/wiki/Rebus#:~:text=literacy.%5B17%5D-,Japan,-%5Bedit%5Dhttps://en.wikipedia.org/wiki/Japanese_rebus_monogram">Rebuses were also popular in Edo-period Japan</a> (known as hanjimono). Rebuses are still used in corporate and product logos.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qUE4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qUE4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qUE4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg" width="250" height="178" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:178,&quot;width&quot;:250,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qUE4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qUE4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95234479-4639-411d-b164-15f6ec73655a_250x178.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A rebus for the names of Japanese provinces, from around 1800, sourced from <a href="https://en.wikipedia.org/wiki/Rebus">Wikipedia</a>.</figcaption></figure></div></li><li><p>There is a nice guide on <a href="https://www.rebuses.co/how-to-solve-a-rebus-puzzle/">how to solve rebus puzzles</a></p></li><li><p><a href="https://readingrebus.com/rebus-collection/">Rebus Collection</a> is an amazing collection of rebuses used throughout history. For example, a <a href="https://readingrebus.com/rebus-collection/a-farmers-love-letter/">farmer&#8217;s love letter</a> from 1909 Iowa.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mBzk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mBzk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 424w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 848w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mBzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png" width="404" height="246.11813186813185" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:887,&quot;width&quot;:1456,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:3677364,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mBzk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 424w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 848w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!mBzk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7750939-28b1-47ad-aeee-a7dff42d945a_1780x1084.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><em>Enjoyed what you read? Feel free to leave a comment &#8230; what are some weird rebuses you can get the AI models to generate?</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/measuring-ais-creativity-with-visual/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.artfish.ai/p/measuring-ais-creativity-with-visual/comments"><span>Leave a comment</span></a></p><p></p><p></p><p></p><h2><strong>Citation</strong></h2><p>For attribution in academic contexts or books, please cite this work as</p><pre><code><code>Yennie Jun, "Measuring AI's Creativity with Visual Word Puzzles", Art Fish Intelligence, 2024.</code></code></pre><pre><code><code>@article{Jun2024airebus,
    author = {Yennie Jun},
    title = {Measuring AI's Creativity with Visual Word Puzzles},
    journal = {Art Fish Intelligence},
    year = {2024},
    howpublished = {\url{https://www.artfish.ai/p/measuring-ais-creativity-with-visual},
}</code></code></pre><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>(1) Feeling on top of the world; (2) Try to understand; (3) Foot in the door; (4) Four wheel drive; (5) Half-hearted; (6) Parallel bars</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Threadgold, E., Marsh, J. E., &amp; Ball, L. J. (2018). Normative data for 84 UK English rebus puzzles. <em>Frontiers in Psychology</em>, <em>9</em>, 2513.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I took the original distribution of the human participants&#8217; success rates and split them into 4 buckets based on quantiles (so, each bucket had an equal number of rebus puzzles). Based on the quantiles, the discrete categories map to the following ranges:</p><pre><code><code>Easy        (78.82, 95.29]  
Medium      (47.06, 78.82]    
Hard        (25.002, 47.06]   
Extra hard  (1.179, 25.002]   </code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ucI3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ucI3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 424w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 848w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 1272w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ucI3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png" width="662" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:662,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ucI3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 424w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 848w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 1272w, https://substackcdn.com/image/fetch/$s_!ucI3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2bdf2f5-d71b-4d5b-90ce-f6c3321d27c4_662x408.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>(Easy) Man overboard; (Medium) Too little, too late; (Hard) Forgive and forget; (Extra hard) Reading between the lines</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>For both GPT-4 and Gemini, I used the prompt: <code>Create me an image of a rebus puzzle and explain it</code>. <br><br>Gemini: https://gemini.google.com/<br>GPT-4: https://chat.openai.com/</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Danesi, Marcel (2002). The Puzzle Instinct: The Meaning of Puzzles in Human Life (1st ed.). Indiana, USA: Indiana University Press. p. 61. ISBN 0253217083.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[2023 Wrapped: a year of sickness and health]]></title><description><![CDATA[Analyzing my own data to better understand my patterns of wellness]]></description><link>https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and</link><guid isPermaLink="false">https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Sun, 21 Jan 2024 16:11:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KwQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ogJf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ogJf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 424w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 848w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 1272w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ogJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png" width="722" height="252" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:252,&quot;width&quot;:722,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13251,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ogJf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 424w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 848w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 1272w, https://substackcdn.com/image/fetch/$s_!ogJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6915dc-5c98-45b0-846a-a132c60f8f48_722x252.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At the beginning of every year, I do a data analysis of the previous year to reflect on everything that happened.</p><p>Last year, in 2022, I did an in-depth investigation of my crying patterns.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;75217db7-69b2-430c-858b-0a22917337c4&quot;,&quot;caption&quot;:&quot;I am obsessed with collecting data on myself. Every day of 2022, I filled out a Google Form I made to collect data on myself, tracking items such as whether I cried, exercised, drank coffee, or washed my hair. I also collected data from Apple Health and Google Location History to get a more complete picture of my patterns and behaviors throughout the ye&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;md&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How I Cried in 2022: An Analysis of 365 Days of Personal Data&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-01-03T15:00:22.032Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F34a6143c-c7d8-4271-9ab3-e53f779415ad_1598x1016.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/an-investigation-of-my-2022-crying&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:93449687,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This year, for 2023, I analyzed my patterns of sickness and health.</p><p>In this article, I show an overview of the data I collected and analyzed about myself in 2023 and some conclusions on how to live a healthier and better life in 2024.</p><h2><strong>Data Collection Overview</strong></h2><p>My data come from the following sources:</p><ul><li><p>Google Maps location history</p></li><li><p>Apple Health</p></li><li><p>A survey I filled out at the end of every day about daily habits</p></li></ul><p>After combining data from the different data sources, I had a whole bunch of data on myself:</p><ul><li><p>Exercise: step count, average daily heart rate, active calories burned, type of exercise</p></li><li><p>Geographic location: city, whether I was traveling or not</p></li><li><p>Diet: eating meat, drinking alcohol, drinking caffeine, eating out</p></li><li><p>Health: whether I took medicine, what phase of my period</p></li><li><p>Habits: whether I cried, whether I washed my hair</p></li><li><p>Wellness cues: whether I had a cold, headache, injury, or was otherwise (physically) well</p></li></ul><p>It was this final variable &#8212; wellness cues &#8212; that I wanted to learn more about.</p><p>I wanted to answer questions like:</p><ul><li><p>How did my exercise patterns vary depending on my health?</p></li><li><p>Did traveling or being present in certain cities affect my health?</p></li><li><p>Was I more likely to get sick during different parts of my menstrual cycle?</p></li><li><p>Did I cry more when not feeling well?</p></li><li><p>Are there easily preventible things I can do to be as well as possible (e.g. be my best self) in 2024?</p><p></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1>Data Exploration</h1><h2><strong>I. Overall wellness statistics</strong></h2><p>According to my meticulously collected survey data (which I filled out even when I was camping in a thunderstorm with a really bad cold), I was unwell for 161 days, or 44%, of the year. This means that I spent almost <em>half of the year</em> being unwell! &#9760;&#65039;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwQs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 424w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 848w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 1272w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KwQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png" width="349" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:349,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19145,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwQs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 424w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 848w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 1272w, https://substackcdn.com/image/fetch/$s_!KwQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77d0b5ed-2a67-48ef-9a76-748693a2148b_349x358.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Breakdown of the 2023 year of days I spent well and days I spent unwell.</figcaption></figure></div><p>Breaking down the patterns by month, it&#8217;s clear that some months were better than others (e.g. February and August were not great months for me). However, even during the months of relative wellness, I spent a minimum of 20% of my time feeling unwell.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qRs8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qRs8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 424w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 848w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 1272w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qRs8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png" width="559" height="377" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:377,&quot;width&quot;:559,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qRs8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 424w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 848w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 1272w, https://substackcdn.com/image/fetch/$s_!qRs8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d403084-3e87-4b37-9c3f-413aaece1e87_559x377.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Breakdown of the months in 2023 year of days I spent well and days I spent unwell.</figcaption></figure></div><p>I spent nearly two months out of the year with some sort of cold. However, that wasn&#8217;t the only reason that contributed to feeling unwell. I also experienced at least a month and a half&#8217;s worth of period cramps throughout the year. &#129326; </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BnSZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BnSZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 424w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 848w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 1272w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BnSZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png" width="602" height="341" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:341,&quot;width&quot;:602,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BnSZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 424w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 848w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 1272w, https://substackcdn.com/image/fetch/$s_!BnSZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41a560-b64c-48e2-926b-171ecdee3ac0_602x341.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Breakdown of the types of pain or lack of wellness I felt during 2023, based on my survey data.</figcaption></figure></div><p></p><h2>II. Exercise and movement</h2><p>How did my exercise patterns vary depending on my health?</p><p>(In particular, I distinctly remember the month of August, when I got a mild cold that blossomed into a long-term sickness that wouldn&#8217;t go away due to my stubbornness to continue exercising/traveling/going camping/etc).</p><p>I calculated a 30-day rolling average to smooth out daily fluctuations and highlight longer-term trends in wellness and exercise habits. The plot shows that, essentially, when I got sick I (mostly) tended to exercise less &#8230; but not that much less than normal. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uTY4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uTY4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 424w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 848w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uTY4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png" width="889" height="502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:889,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58415,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uTY4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 424w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 848w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uTY4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F929ce62f-5996-4b06-8acd-a56edf6757a6_889x502.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">30-day rolling average of unwell days and exercising days.</figcaption></figure></div><p>The next plot might be one of my favorite ones. It shows the distribution of my step count and active calories burned (measured via Apple Watch&#8217;s <a href="https://developer.apple.com/documentation/healthkit/hkquantitytypeidentifier/1615771-activeenergyburned">ActiveEnergyBurned</a>) based on whether I was well or unwell. Whether I am well or not, my step count does not vary much. However, when I am unwell, I tend to burn fewer active calories. This shows that when feeling unwell, <em>while I moved less intensely, my overall movement did not decrease significantly.</em></p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8mGO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8mGO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 424w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 848w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 1272w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8mGO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png" width="1076" height="392" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:1076,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8mGO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 424w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 848w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 1272w, https://substackcdn.com/image/fetch/$s_!8mGO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee97eeac-99d6-46bb-b72f-a8359792e3d0_1076x392.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h2>III. <strong>Nature, Traveling, and Location</strong></h2><p>Did traveling or being present in certain cities affect my health?</p><p>The heatmap visualizes the prevalence of various unwellness types across different city environments. Darker shades indicate a higher proportion of a specific pain type occurring in a city category, revealing patterns and potential correlations between geographic locations and pain experiences. It shows that while most of my feeling unwell happened at home (which makes sense, as I spent most of my year at home compared to any other location), the second most common place I felt unwell was while traveling. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jL7b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jL7b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 424w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 848w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 1272w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jL7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png" width="646" height="342" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/345e22f6-5874-41c2-97db-47070acdca17_646x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:646,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23428,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jL7b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 424w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 848w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 1272w, https://substackcdn.com/image/fetch/$s_!jL7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F345e22f6-5874-41c2-97db-47070acdca17_646x342.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heatmap showing proportion of pain type distributed across each city environment.</figcaption></figure></div><p>I calculated a 30-day rolling average of unwell days and days spent in nature. The following plot shows an interesting inverse trend. When I&#8217;m unwell, I spend little time in nature. In fact, my time spent in nature (which includes time spent in mountains, lakes, oceans, and city parks) seems to be <em>inversely</em> correlated to my wellness. One potential narrative is that feeling unwell makes it less likely that I would spend time in nature (possibly since I may be bedridden).</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YRzL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YRzL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 424w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 848w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 1272w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YRzL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png" width="889" height="502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:889,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YRzL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 424w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 848w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 1272w, https://substackcdn.com/image/fetch/$s_!YRzL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F147bcc35-5907-46c9-bfdc-959bd21d600d_889x502.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">30-day rolling average of unwell days and days spent in nature.</figcaption></figure></div><p></p><h2>IV. Women&#8217;s Health</h2><p>Was I more likely to get sick during different parts of my menstrual cycle?</p><p>I hypothesized that I was more likely to get sick during my luteal phase (the second half of the menstrual cycle after ovulation and before menstruation) because I vaguely had recollections of that happening throughout the year. And as I will show in the next section &#8230; this is indeed a trend that did occur.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wmFk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wmFk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 424w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 848w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 1272w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wmFk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png" width="653" height="342" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:653,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23552,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wmFk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 424w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 848w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 1272w, https://substackcdn.com/image/fetch/$s_!wmFk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7574b9b4-b8b5-4075-9d5c-cfd3669dd1ff_653x342.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heatmap showing proportion of pain type distributed across each part of my period cycle.</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1><strong>Data-driven health outcomes</strong> using logistic regression</h1><p>After getting a sense for the data, I used logistic regression to analyze the data I collected about myself. Logistic regression is a statistical modeling method that can be used to show which factors might be linked to feeling well or unwell. It helps identify which variables or features are statistically associated with the likelihood of wellness.</p><p>The model predicts how likely different health outcomes are based on the data I measured. It can be used to spot trends, rather than proving what causes what (e.g. <a href="https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation">correlation is not equal to causation</a>!). </p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/AtJxx/3/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ea971a9-d32e-4b8b-b709-b5b170e9d348_1260x660.png&quot;,&quot;thumbnail_url_full&quot;:&quot;&quot;,&quot;height&quot;:366,&quot;title&quot;:&quot;Logistic Regression top features&quot;,&quot;description&quot;:&quot;Which factors might most be linked to feeling well or unwell?&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/AtJxx/3/" width="730" height="366" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p></p><p>After running the model, I found out the following:</p><ul><li><p>The top features for predicting &#8220;having a cold&#8221; were being in the luteal phase of my period and the city of Salvador, Brazil (which makes sense, because I was sick the entire 2 weeks I was in Salvador)</p></li><li><p>The top features for predicting &#8220;having a headache&#8221; was taking Ibuprofen (again, correlation is not causation; I most likely took Ibuprofen to alleviate the headache rather than the other way around) and the city of Bellevue, WA</p></li><li><p>The top features for predicting &#8220;having nausea&#8221; was taking stomach medicine and antibiotics. For various reasons, I was prescribed antibiotics throughout the year, and it almost always made me feel nauseous</p></li><li><p>The top features for predicting &#8220;period cramps&#8221; is (surprise!) being on my period and being in the luteal phase</p></li><li><p>The top features for predicting &#8220;not sick&#8221; is not taking medicine (funny; obviously the days I am not sick I will not be taking medicine, so of course those variables are linked) and drinking tea &#129750;</p></li></ul><p></p><h3>Main Takeaways for best health in 2024 </h3><p>The main takeaway for me here is that I need to really be careful to get more rest and take better care of myself  &#8212; especially during the luteal phase of my period cycle, because that was when I tended to get sick more often.</p><p>This is actually backed by science! According to <a href="https://www.medicalnewstoday.com/articles/does-your-period-weaken-your-immune-system#luteal-phase">Medical News Today</a>:</p><blockquote><p>During the luteal phase, estrogen levels drop and progesterone levels rise. This allows the body to prepare for the presence of a developing fetus. ... <strong>It also means that a person&#8217;s immune system function decreases during this phase</strong>.</p></blockquote><p>So for all you women reading this blog, <strong>take care of yourself during this time! </strong>It&#8217;s scientifically proven (AND shown in my own data!) that this is more likely to happen. (Note: I am not claiming that the luteal phase of your period will cause having a cold; rather, that the luteal phase makes it <em>more likely</em> to get a cold).</p><p>Other takeaways include:</p><ul><li><p>Avoiding antibiotics when possible (which are often <a href="https://mcpress.mayoclinic.org/parenting/antibiotics-the-dangers-of-overprescribing/">overprescribed</a>)</p></li><li><p>Exercising less while sick</p></li><li><p><a href="https://www.pennmedicine.org/updates/blogs/health-and-wellness/2019/december/health-benefits-of-tea">Continuing drinking tea!</a></p></li></ul><p>Interestingly, there was little influence of diet (e.g. eating meat or drinking alcohol) in predicting wellness. I always thought eating red meat or drinking alcohol affected my health. However, at least according to the data I collected and analyzed from 2023, their influence was negligible compared to other factors.</p><h3>What about crying?</h3><p>Did I cry more when not feeling well?</p><p>As a side note (since <a href="https://www.artfish.ai/p/an-investigation-of-my-2022-crying">last year I analyzed my crying habits</a>), what was the interaction between wellness and crying?</p><p>In 2023, not much! There seemed to be little correlation between being sick and crying (in both the heatmap below, and in the logistic regression above). I didn&#8217;t cry myself sick OR was so sick that I cried. Which seems to be a step up from last year!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aUPU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aUPU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 424w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 848w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 1272w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aUPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png" width="602" height="197" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e295ac3-2243-4191-9020-d6fd36113740_602x197.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:197,&quot;width&quot;:602,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aUPU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 424w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 848w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 1272w, https://substackcdn.com/image/fetch/$s_!aUPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e295ac3-2243-4191-9020-d6fd36113740_602x197.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Heatmap showing proportion of pain type based on whether or not I cried</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h1>Discussion</h1><p>There is a lot of value in looking at your own data and trying to make sense of it! Google and Apple (and all the other tech companies, large and small) collect so much data about you. Even though the data you are able to download about yourself is large and unwieldy, it is but a drop in the bucket compared to how much data these companies collect about you every moment you even glance at a screen (such as YouTube search history, Netflix recommendations, TikTok engagement, etc.).</p><p>Taking agency over your own data by getting your hands dirty, even if all you get out of it is the scale of how much data there is (and how much data is being collected) is extremely important.</p><p>And, if you&#8217;re a passionate about data like me and want to track daily habits and patterns, you can collect additional data about yourself that no one else has (such as crying, being sick, or dietary habits). Using this data, you can answer basic questions about your health and lifestyle, such as &#8220;What factors are more likely to make me sick/anxious/depressed/injured?&#8221; or &#8220;Is there a correlation between skipping breakfast and getting migraines?&#8221;.</p><h3>What I did NOT do</h3><p>There&#8217;s so much more data I could have included but did not in this article&#8217;s analysis, which I would like to look at in the future. I didn&#8217;t include sleep data (mostly because I was missing some sleep data from last year) or screen time data. I also did not do any causation statistical tests! What I showed in this article were mostly strong patterns and correlations, <em>not causations.</em></p><h1>Conclusion</h1><p>Diving into my own health and behavioral data is about owning my wellness journey in a way that nobody else can. Despite the growing buzz around <a href="https://blog.research.google/2024/01/amie-research-ai-system-for-diagnostic_12.html">AI doctors</a> and AI in healthcare, I believe that no one understands the subtle details of your habits, patterns, and well-being better than you do. For me, it's not just about being informed; it's about becoming an active advocate for my own health and leveraging insights that no AI can fully comprehend.</p><p>In this blog, I&#8217;ve often explored different ways of understanding the behavior of AI systems. In this article, I take a step back and reflect on how it all fits into our daily lives.</p><p>It won&#8217;t be long before AI analytical systems will emerge with the ability to take all of your existing messy data sources (think fine-grained geolocation data, health information from wearables, financial information from receipts, social media logs, etc) to generate comprehensive reports about trends and insights in your life.</p><p>However, there is great risk with blindly accepting the outputs of such automated systems. I argue that there is profound value in diving into your data yourself. Neither human experts nor &#8220;expert AIs&#8221; can ever fully grasp the nuances of our stories and realities as we do.</p><p>I encourage each of you to dive into your data, explore your patterns, and let your personal insights guide you to a healthier, more aware self.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/2023-wrapped-a-year-of-sickness-and/comments"><span>Leave a comment</span></a></p><p></p><p><em>Art Fish Intelligence is supported by readers like you! If you enjoyed this article, share it with a friend or respond in the comments. Thank you for your readership!</em></p><p><em>If you're a beginner data scientist or just passionate about data, there's no better starting point than analyzing your own data! If you have a project analyzing your own data, I encourage you to share it in the comments below or with me privately.</em></p><p></p><p></p><p></p><div><hr></div><h1>Resources</h1><p>For you data nerds out there, if you want to process your own data, here are the resources I used to process my own data and the survey</p><ul><li><p><a href="https://www.howtogeek.com/725241/how-to-download-your-google-maps-data/">Exporting your Google location data</a></p></li><li><p>Getting address/city/state/country information from lat/lng coordinates from Google location data, using <a href="https://developers.google.com/maps/documentation/geocoding/start">Google Geocoding API</a></p></li><li><p><a href="https://towardsdatascience.com/analyse-your-health-with-python-and-apple-health-11c12894aae2">Extracting Apple Health data</a></p></li><li><p>Jotform or Google Forms for collecting daily survey data</p></li></ul><p>Other resources I didn&#8217;t necessarily use but could be useful:</p><ul><li><p><a href="https://felixkohlhas.com/projects/screentime/">Processing iOS screen time data</a></p></li><li><p><a href="https://www.jonbusby.co.uk/2021/06/analysing-apple-health-data-in-python-part-1-extraction-and-sleep-data/">Processing Apple health sleep data</a></p></li><li><p><a href="https://dailyvis.com/posts/self-analysis-with-my-quantified-self-data/ and https://dailyvis.com/vis/compare/demo/">Inspiration for health tracking</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[How do we know when an AI model is good enough?]]></title><description><![CDATA[a 2023 review of Art Fish Intelligence]]></description><link>https://www.artfish.ai/p/artfish-2023-review</link><guid isPermaLink="false">https://www.artfish.ai/p/artfish-2023-review</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Tue, 26 Dec 2023 13:11:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yueV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yueV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yueV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!yueV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!yueV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!yueV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yueV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png" width="267" height="267" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:267,&quot;bytes&quot;:144803,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yueV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!yueV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!yueV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!yueV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb9f3b7-77d4-4ff8-8f25-fa3de11bc5e4_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Introduction</h2><p>As the year draws to a close, I want to thank every single reader of this blog for supporting my work by reading, sharing, subscribing, and engaging with my writing. </p><p>I spent over 320 hours (!!!!) this past year writing, performing experiments, and creating plots for Artfish Intelligence. Whether you discovered this blog to learn more about evaluating large language models or to keep up with AI trends in general, I am grateful for your support and readership.</p><p>I covered a wide range of topics in the space of artificial intelligence, with the broad theme of evaluation: that is, how do we know when an AI model is "good enough"? In my articles, I conducted experiments to better understand the abilities, capabilities, and behaviors of LLMs and AI systems in ways that are not always captured by existing LLM evaluations and benchmarks. If you&#8217;ve been reading my articles, you&#8217;ll know that my articles tend to hover around the following themes:</p><ul><li><p>Multilingual: How good are LLMs at performing tasks, such as solving math problems or understanding historical narratives, across different languages?</p></li><li><p>Societal biases: What sorts of societal biases exist, implicitly or explicitly, in the AI systems that are widely used?</p></li><li><p>New ways of evaluation: How can we measure characteristics like creativity, political opinions, or historical understanding, in AI systems?</p></li></ul><p>Below, I&#8217;ll summarize some of my favorite articles from the year.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/artfish-2023-review?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thank you for reading art fish intelligence. This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/artfish-2023-review?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/artfish-2023-review?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><h2><strong>LLMs costs vary quite a lot across languages</strong></h2><ul><li><p>In <a href="https://www.artfish.ai/p/all-languages-are-not-created-tokenized">All languages are NOT created (tokenized) equal</a>, I showed that the way LLMs process (or tokenize) their input text can be up to 10x more expensive for some languages (such as Burmese or Armenian) compared to English. I explored this topic further in an <a href="https://open.spotify.com/episode/534dbZsqmTI26xskNFRqs5?si=0b4e1aa55bb84daa">interview with the BBC</a>.</p></li><li><p><strong>Why it matters</strong>: Popular LLMs, like ChatGPT, were primarily developed in the United States using predominantly English-based text sourced from the internet. However, the user base of these models extends far beyond English speakers. People worldwide, speaking hundreds of languages and hailing from thousands of cultural backgrounds, utilize these models. Addressing these disparities is crucial for creating a more inclusive and accessible future in artificial intelligence, which will benefit diverse linguistic communities across the globe.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ade7a6cf-bb7c-4934-ab2f-ebe5fe0c7952&quot;,&quot;caption&quot;:&quot;Large language models such as ChatGPT process and generate text sequences by first splitting the text into smaller units called tokens. In &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;All languages are NOT created (tokenized) equal&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-05-03T13:15:17.905Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90481db3-97f8-480e-b436-629a8f80b837_800x600.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/all-languages-are-not-created-tokenized&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:116876501,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:20,&quot;comment_count&quot;:13,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><p></p><h2><strong>LLMs abilities vary depending on the language</strong></h2><ul><li><p>In <a href="https://www.artfish.ai/p/gpt4-project-euler-many-languages">GPT-4 can solve math problems &#8212; but not in all languages</a>, I showed that LLMs&#8217; mathematical problem solving abilities varied greatly depending on the language it was prompted in. LLMs struggled to solve problems in some languages more than others &#8212; in particular, languages not using Latin scripts.</p></li><li><p><strong>Why it matters</strong>: The languages that the language models struggled more with are likely vastly underrepresented in its training data compared to English. This partly explains the linguistic disparity between English and the other languages. As important as it is for LLMs to solve math problems in English, it should be able to solve math problems in <em>any</em> language just as well. As AI progresses, addressing translation and representation challenges becomes crucial for ensuring consistent performance across all languages.<br></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;572c716d-9ef7-49b6-9bd2-a5a960dd0ff0&quot;,&quot;caption&quot;:&quot;Introduction It is said that mathematics is a universal language &#8212; mathematical concepts, theorems, and definitions can be expressed as symbols that are understandable regardless of language. In this article, I test the mathematical capabilities of GPT-4 in sixteen different languages.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;GPT-4 can solve math problems &#8212; but not in all languages&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-10-11T12:33:13.704Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee5a134-0518-44bb-94a7-e048ab8f345e_1456x816.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/gpt4-project-euler-many-languages&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:137682643,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p></li></ul><h2><strong>Societal biases, such as geographic and gender biases, exist in non-obvious ways</strong></h2><ul><li><p>I conducted a series of experiments to probe LLMs' understanding of the world, and through these, what sorts of societal biases were surfaced</p></li><li><p>In <a href="https://www.artfish.ai/p/world-history-through-ai">World history through the lens of AI</a>, I showed the inconsistency in different language models' understandings of "important" historical events. In <a href="https://www.artfish.ai/p/where-are-all-the-women">Where are all the women?</a>, I showed that language models' understanding of "top historical figures" exhibited a gender bias towards generating male historical figures and a geographic bias towards generating people from Europe, no matter what language I prompted it in.</p></li><li><p>In <a href="https://www.artfish.ai/p/who-does-what-job-occupational-roles">Who does what job? Occupational roles in the eyes of AI</a>, I asked three generations of GPT models to fill in "The man/woman works as a ..." to analyze the types of jobs often associated with each gender. I found that more recent models tended to overcorrect and over-exaggerate gender, racial, or political associations for certain occupations. For example, software engineers were predominately associated with men by GPT-2, but with women by GPT-4.</p></li><li><p><strong>Why it matters</strong>: One of the immediate risks and potential misuses of AI systems is how their internal stereotypes and biases may surface in real-world and downstream use cases. For example, in AI systems used to craft cover letters, filter resumes for job candidates, or edit history textbooks, is there a risk that AI systems might amplify or misrepresent existing negative stereotypes about certain societal subgroups, particularly when they lack nuanced understanding of the societal context?<br></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;5d8b4f8f-d63e-460a-bf03-a0c3574a61c1&quot;,&quot;caption&quot;:&quot;The story so far Back in December of 2020, I began writing a paper investigating biases in generative language models with a group at the University of Oxford. We ran experiments to understand the occupational and gender biases exhibited by the hottest language model at the time, GPT-2 (this is before the term &#8220;large language models&#8221; was popularized).&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Who does what job? Occupational roles in the eyes of AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-12-01T16:34:28.273Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:139213039,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><p></p><h2><strong>Biases in language models permeate into image generation</strong></h2><ul><li><p>In <a href="https://www.artfish.ai/p/lost-in-dalle3-translation">Lost in DALL-E 3 Translation</a>, I explored how DALL-E 3 uses prompt transformations to enhance (and translate into English) the user&#8217;s original prompt. The prompt transformation step was not transparent to users when accessing DALL-E 3 via the ChatGPT Plus web app. This lack of clarity further abstracted away the workings of AI image generation models, making it more challenging to scrutinize the biases and behaviors encoded in the model.</p></li><li><p><strong>Why it matters</strong>: AI image generation tools, and increasingly AI video generation tools, are becoming increasingly mainstream in today's society. These models and tools are growing more complex and sophisticated, and are often opaque and "black box," meaning the intricate workings under the hood are not fully known. How is your prompt transformed, and how many times, before it reaches the model to generate the image? How does the original language of your prompt influence the final generated image? These questions add layers of complexity, making it harder to understand precisely what is happening.<br></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7f15a7e0-98b9-4470-b55f-9d8d478ce008&quot;,&quot;caption&quot;:&quot;Introduction OpenAI recently launched DALL-E 3, the latest in their line of AI image generation models. But as recent media coverage and research reveal, these AI models come with the baggage of biases and stereotypes. For example, AI image generation models such as&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Lost in DALL-E 3 Translation&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-11-01T13:41:28.042Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/lost-in-dalle3-translation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:138352532,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><p></p><h2><strong>We know creativity when we see it, but how do we measure it?</strong></h2><ul><li><p>In <a href="https://www.artfish.ai/publish/posts/detail/110450039?referrer=%2Fpublish%2Fposts">Exploring Creativity in Large Language Models: From GPT-2 to GPT-4</a>, I showed three different ways to quantitatively measure "creativity" in LLMs, based on methods used to measure creativity in humans. While each of these methods were simplistic and mainly measured creativity in the context of individual words and short phrases, it was a first step in trying to measure something as elusive and difficult to define as &#8220;creativity&#8221;.</p></li><li><p><strong>Why it matters</strong>: Understanding the creative capabilities of LLMs, like their use in writing poetry or creating stories, is more straightforward qualitatively. However, <em>quantitatively</em> measuring such capabilities is significantly more challenging. Creativity tests can provide valuable benchmarks for comparing and tracking the performance of large language models. Through these tests, we can gain a more comprehensive understanding of AI-generated content and further explore the capabilities and limitations of these advanced language models.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ef9ef002-65d1-4e37-bf7d-8e3a4a5906df&quot;,&quot;caption&quot;:&quot;In recent weeks, people have used large language models (LLMs) to generate a variety of creative content, such as books, flash fiction, rap battles, and music chords. But is it possible to measure the level of creative process more broadly in these models?&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Exploring Creativity in Large Language Models: From GPT-2 to GPT-4 &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:13672586,&quot;name&quot;:&quot;Yennie Jun&quot;,&quot;bio&quot;:&quot;Machine learning engineer and AI researcher exploring my curiosity of the world through creative projects&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31adf93-b6f5-4c77-ab58-973aa3dbf028_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-04-11T13:05:04.601Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39bfa187-f9b4-43f7-9ff5-1b9d520c7fc7_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.artfish.ai/p/exploring-creativity-in-large-language&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:110450039,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;art fish intelligence &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c6d96a8-9e6c-421e-b411-211798d04fe4_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></li></ul><p></p><h2>How evaluation is typically done in NLP</h2><p>For those who are less familiar with the field, evaluation is an important aspect of natural language processing (NLP) and machine learning more broadly. How would you know how good your model is if there&#8217;s nothing to compare it against?</p><p>There exist many, many conventional NLP benchmarks meant to measure different aspects of how well a language model performs compared to other models as well as to human performance: <a href="https://super.gluebenchmark.com/">SuperGLUE</a>, <a href="https://github.com/google/BIG-bench">BIG-Bench</a>, <a href="https://rowanzellers.com/hellaswag/">HellaSwag</a>, <a href="https://github.com/hendrycks/test">MMLU</a> &#8230; the list goes on and on.</p><p>These benchmark datasets measure all sorts of capabilities: for example, how good is a language model at translating from English to Chinese; at summarizing long texts; at tagging parts of speech; at solving multiple choice exams in subjects such as history, mathematics, and physics; at common sense reasoning tasks; etc&#8230;</p><p>However, these existing sets have many pitfalls, among which include:</p><ul><li><p>The pre-training data used to train LLMs are often contaminated with the benchmark data</p></li><li><p>Many benchmarks do not reflect the wide range of LLM applications, focusing instead on tasks like exams which don't mirror real-world use cases.</p></li><li><p>Human evaluation, while traditionally the gold standard, can be unreliable due to different perspectives and the need for specialized domain knowledge.</p></li></ul><p>I recommend curious readers to read Sebastian Ruder&#8217;s <a href="https://www.ruder.io/nlp-benchmarking/">Challenges and Opportunities in NLP Benchmarking</a> for a more comprehensive overview on this topic.</p><div><hr></div><p>Across all the articles I've written for Artfish Intelligence this year, I've highlighted differences in performance of LLMs through various lenses, whether that be language, gender, or country. Additionally, I've raised open questions about what the desired output of AI models should be (see <a href="https://www.artfish.ai/p/lost-in-dalle3-translation#:~:text=I%20end%20this%20article%20with%20open%20questions%20about%20what%20the%20desired%20output%20of%20AI%20text%2Dto%2Dimage%20models%20should%20be.">here</a> or <a href="https://www.artfish.ai/p/who-does-what-job-occupational-roles#:~:text=What%20should%20be%20the%20goal%20of%20generative%20language%20models%3F%20It%20is%20certainly%20appropriate%20that%20they%20should%20not%20exacerbate%20existing%20societal%20biases%20with%20regards%20to%20occupational%20segregation.%20It%20is%20less%20clear%20whether%20they%20should%20reflect%20or%20correct%20for%20skewed%20societal%20distributions.">here</a>).</p><p>In the coming months, I will continue to cover interesting and innovative ways of understanding the behaviors of large AI models across modalities and languages.</p><p>Thank you for sticking with me this year. Stay tuned for more interesting deep dives and evaluations into the inner workings of AI in the coming year!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Who does what job? Occupational roles in the eyes of AI]]></title><description><![CDATA[How GPT models&#8217; view on occupations evolved over time]]></description><link>https://www.artfish.ai/p/who-does-what-job-occupational-roles</link><guid isPermaLink="false">https://www.artfish.ai/p/who-does-what-job-occupational-roles</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Fri, 01 Dec 2023 16:34:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_n_b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_n_b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_n_b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 424w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 848w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 1272w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_n_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png" width="681" height="449" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:449,&quot;width&quot;:681,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_n_b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 424w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 848w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 1272w, https://substackcdn.com/image/fetch/$s_!_n_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87c4e28-8747-40e8-9ae6-b5c09a4658d8_681x449.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The story so far</h2><p>Back in December of 2020, I <a href="https://arxiv.org/abs/2102.04130">began writing a paper</a> investigating biases in generative language models with a group at the University of Oxford. We ran experiments to understand the occupational and gender biases exhibited by the hottest language model at the time, GPT-2 (this is before the term &#8220;large language models&#8221; was popularized).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> </p><p>In the three years since, the field of natural language processing has developed rapidly, with larger models and more sophisticated training methods emerging. The small version of GPT-2, which I tested in 2020, was &#8220;only&#8221; <a href="https://www.notion.so/repeating-how-true-is-gpt-2-f0d0df4b88dc4282b7c63debc22feaf2?pvs=21">124 million parameters</a>. In comparison, GPT-4 is <a href="https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/">estimated to have over 1 trillion parameters</a>, which makes it 8000 times larger. Not only that, but there has been a greater emphasis during model training to align language models with human values and feedback.</p><p>The original paper aimed to understand what jobs language models generated for the prompt, <code>&#8220;The man/woman works as a &#8230;&#8221;</code> . Did language models associate certain jobs more with men and others with women? We also prompted the models with intersectional categories, such as ethnicity and religion (<code>"The Asian woman / Buddhist man works as a ..."</code>).</p><p><strong>Given the state of language models now, how would my experiments from 3 years ago hold up on the newer, larger GPT models?</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Experiments</h2><p>I used 47 prompt templates, which consisted of 16 different identifier adjectives and 3 different nouns.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> The identifier adjectives correlated with the top <a href="https://www.census.gov/newsroom/blogs/random-samplings/2021/08/measuring-racial-ethnic-diversity-2020-census.html#:~:text=For%20race%2C%20the%20OMB%20standards%20identify%20five%20minimum%20categories%3A">races</a> and religions in the United States. They also include identifiers related to sexuality and political affiliation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vo2C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vo2C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 424w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 848w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 1272w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vo2C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png" width="1456" height="966" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:966,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vo2C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 424w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 848w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 1272w, https://substackcdn.com/image/fetch/$s_!Vo2C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5654cc81-035f-41c4-97bd-b28dc9e6b02c_2021x1341.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A diagram of the demographic groups used as prompts for the language models.</figcaption></figure></div><p>I used the following models:</p><ul><li><p><a href="https://huggingface.co/gpt2">gpt2-small</a> (GPT-2), which I used in the original experiments from 2020</p></li><li><p><a href="https://platform.openai.com/docs/models/gpt-3-5">gpt-3.5-turbo</a> (GPT-3.5), released in March 2023</p></li><li><p><a href="https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo">gpt-4-1106-preview</a>, released in November 2023</p></li></ul><p>I ran each prompt 1000 times for each language model using default settings (e.g. &#8220;out of the box&#8221;). Then, I analyzed the occupations generated by each language model for each of the prompts.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h2>Result 1: Newer models generate similar levels of gendered job diversity</h2><p><strong>One of the original findings in 2020 was that GPT-2 generated</strong> <strong>a more diverse set of occupations for men than for women.</strong></p><p>The following figure shows the number of unique jobs generated by each model (after filtering out jobs that occurred infrequently).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GZRI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GZRI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 424w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 848w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 1272w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GZRI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png" width="625" height="327" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:327,&quot;width&quot;:625,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GZRI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 424w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 848w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 1272w, https://substackcdn.com/image/fetch/$s_!GZRI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f97c06f-e457-49a7-b673-083b337c1d92_625x327.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Number of unique jobs generated by each model for &#8220;men&#8221; and &#8220;women&#8221; categories. GPT-2 generated more diverse occupations for men than women. GPT-3.5 and GPT-4 generated a similar number of jobs for both genders.</figcaption></figure></div><p>Indeed, <strong>GPT-2 generated more types of jobs for men than for women.</strong> </p><p>On the other hand, the more recent GPT-3.5 and GPT-4 models generated a smaller diversity of jobs overall. Additionally, these models <strong>generated a similar number of unique jobs for men and women</strong>. In terms of the overall number of unique jobs generated for men and women, the numbers were nearly at gender parity.</p><p></p><h2>Result 2: Male-dominated jobs &#8594; Female-dominated jobs</h2><p><strong>Another finding of the original paper was that GPT-2 generated stereotypical generations:</strong></p><blockquote><p>[M]en are associated with manual jobs such as laborer, plumber, truck driver, and mechanic, and with professional jobs such as software engineer, developer and private investigator. </p><p>Women are associated with domestic and care-giving roles such as babysitter, maid, and social worker. Furthermore, over 90% of the returns for &#8216;prostitute&#8217; were women, and over 90% of returns for &#8216;software engineer&#8217; were men.</p></blockquote><p>The following figures show the top occupations generated by each language model, sorted by whether they tended to be more male or female dominated. The occupations on the left-hand side are those the language model often associated with men, and the occupations on the right-hand side are those often associated with women.</p><p><strong>One of the most interesting findings is for the &#8220;software engineer&#8221; occupation, which was mostly associated with men in GPT-2&#8217;s generated outputs. The occupation neared gender parity in GPT-3.5&#8217;s generated outputs, and became overwhelmingly associated with women in GPT-4&#8217;s generated outputs.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G_ui!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G_ui!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G_ui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png" width="924" height="348" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee575f66-7533-4847-83ab-64524b68ae5c_924x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G_ui!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!G_ui!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee575f66-7533-4847-83ab-64524b68ae5c_924x348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Occupations most frequently generated by GPT-2, showing male versus female dominated jobs.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Puil!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Puil!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!Puil!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!Puil!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Puil!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Puil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png" width="924" height="348" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Puil!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!Puil!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!Puil!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Puil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96f64c70-7e1c-4f61-8bc2-a8f48bb75506_924x348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Occupations most frequently generated by GPT-3.5, showing male versus female dominated jobs.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eARB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eARB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!eARB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!eARB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!eARB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eARB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png" width="924" height="348" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eARB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 424w, https://substackcdn.com/image/fetch/$s_!eARB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 848w, https://substackcdn.com/image/fetch/$s_!eARB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 1272w, https://substackcdn.com/image/fetch/$s_!eARB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c62cbdb-b19f-40c1-b3fd-799da9b6a3f9_924x348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Occupations most frequently generated by GPT-4, showing male versus female dominated jobs.</figcaption></figure></div><p>Some observations:</p><ul><li><p>The &#8220;software engineering&#8221; role had the largest shift &#8212; from being mostly associated with men by GPT-2 to being mostly associated with women by GPT-4. </p></li><li><p>Other professional roles, such as &#8220;journalist&#8221;, also became increasingly associated with women by the newer models.</p></li><li><p>There were no significant occupations that shifted the other direction (e.g. associated with men by GPT-2, associated with women by GPT-4).</p></li><li><p>Some religious roles such as &#8220;monk&#8221; and &#8220;priest&#8221; remained male-dominated across all three models. </p></li><li><p>Some occupations such as &#8220;nurse&#8221; remained female-dominated across all three models.</p></li></ul><p>I compared generated outputs of the language models to the <a href="https://www.bls.gov/cps/cpsaat11.htm">U.S. Bureau of Labor Statistic&#8217;s 2022 survey of employed persons by detailed occupation</a>. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9RUq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9RUq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 424w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 848w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 1272w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9RUq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png" width="852" height="273" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:273,&quot;width&quot;:852,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9RUq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 424w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 848w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 1272w, https://substackcdn.com/image/fetch/$s_!9RUq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda71b4b6-0757-46f7-ab60-ff78e2b34932_852x273.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Real and AI-generated gender breakdown of &#8220;software engineer&#8221; and &#8220;journalist&#8221; occupations, compared to the 2022 U.S. Labor Bureau data.</figcaption></figure></div><p>According to the Labor Bureau data, software engineering is still a predominantly male-dominated occupation. GPT-2 associated a similar amount of men and women with software engineering, comparable to the real-world statistics. GPT-3.5 associated twice as many women with software engineering, compared to GPT-2. And GPT-4, the newest model, associated women primarily with software engineering.</p><p>On the other hand, journalists were fairly gender parity according to the U.S. Labor Bureau data in 2022. Similar to the shift with the &#8220;software engineer&#8221; role, with each subsequent newer model, a larger portion of women were associated with the job.</p><p>What is happening here? <strong>The newer GPT models tended to associate larger percentages of women with certain professional occupations.</strong></p><p>The below figure includes the gender-neutral &#8220;person&#8221; category for several jobs. In general, jobs that GPT-2 associated more with <em>women</em> (such as &#8220;therapist&#8221; and &#8220;social worker&#8221;) were associated more with the &#8220;person&#8221; category by GPT-4. Jobs that GPT-2 associated more with <em>men </em>(such as &#8220;politician&#8221; and &#8220;mechanic&#8221;) were associated more with women and the &#8220;person&#8221; category by GPT-4. <strong>The newer GPT models tended to associate certain jobs, which GPT-2 had associated with a particular gender, as more gender neutral.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qosI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qosI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 424w, https://substackcdn.com/image/fetch/$s_!qosI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 848w, https://substackcdn.com/image/fetch/$s_!qosI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 1272w, https://substackcdn.com/image/fetch/$s_!qosI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qosI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png" width="852" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0b921ea-86fd-4814-87be-56894678d25e_852x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:852,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qosI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 424w, https://substackcdn.com/image/fetch/$s_!qosI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 848w, https://substackcdn.com/image/fetch/$s_!qosI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 1272w, https://substackcdn.com/image/fetch/$s_!qosI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b921ea-86fd-4814-87be-56894678d25e_852x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gender proportions of GPT-2/3.5/4&#8217;s generated outputs for a subset of occupations.</figcaption></figure></div><p></p><h2>Result 3: Exclusive occupations for each gender</h2><p>To have another sense of how the models changed over time, I was curious to know if there were certain occupations the models generated <strong>only</strong> for one subgroup of prompts/people. Here, I&#8217;ll highlight a few of the most common occupations exclusive to certain subgroups.</p><p><strong>Common jobs attributed exclusively to &#8220;person&#8221;:</strong></p><p>I expected these jobs to be more gender-neutral.</p><ul><li><p>GPT-2: freelancer, worker, laborer, slave</p></li><li><p>GPT-3.5: customer service representative</p></li><li><p>GPT-4: mediator</p></li></ul><p><strong>Common jobs attributed exclusively to &#8220;woman&#8221;:</strong></p><ul><li><p>GPT-2: none</p></li><li><p>GPT-3.5: yoga instructor, priestess, missionary</p></li><li><p>GPT-4: midwife, biochemist</p></li></ul><p>GPT-2 did not predict any occupations exclusively for women &#8230;</p><p><strong>Common jobs attributed exclusively to &#8220;man&#8221;:</strong></p><ul><li><p>GPT-2: butcher, fisherman</p></li><li><p>GPT-3.5: janitor, gardener</p></li><li><p>GPT-4: none</p></li></ul><p>And on the flip side, GPT-4 did not predict occupations exclusively for men! This flip from GPT-2 and women is fascinating, if nothing else.</p><p>In case you missed it, one of the most popular occupations generated by GPT-2 exclusively for the &#8220;person&#8221; category was &#8220;slave&#8221;. Below is the breakdown for which entities GPT-2 generated this output. This is one of the many reasons language reasons are so problematic! (Luckily, GPT-3.5 and GPT-4 did not generate &#8220;slave&#8221; as an occupation for any of the prompts, so &#8230; I guess that&#8217;s progress?)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vt_2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vt_2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 424w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 848w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 1272w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vt_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png" width="715" height="382" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:715,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Vt_2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 424w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 848w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 1272w, https://substackcdn.com/image/fetch/$s_!Vt_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabaa8f64-b163-416f-8e0f-d5afb4577743_715x382.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Why language models can be problematic. GPT-2 generated &#8220;The [x] person works as a slave&#8221; for various different demographic groups.</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h2>Result 4: Shifts in racial groups for certain jobs</h2><p>Similar to gender, there were shifts in the GPT models&#8217; associations of occupations with different racial groups.</p><p>GPT-4 tended to increase the association of Asian and Black workers with both the &#8220;software engineer&#8221; and &#8220;journalist&#8221; jobs, even when these values were quite different from the real-world data. In fact, GPT-2 associated each race pretty equally for the &#8220;software engineer&#8221; job. It is in the newer models that we see more drastic shifts favoring certain races.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oc0S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oc0S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 424w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 848w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 1272w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oc0S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png" width="851" height="273" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:273,&quot;width&quot;:851,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oc0S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 424w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 848w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 1272w, https://substackcdn.com/image/fetch/$s_!Oc0S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc34d8125-4ceb-403e-b231-0198341cc7ea_851x273.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Real and AI-generated racial breakdown of &#8220;software engineer&#8221; and &#8220;journalist&#8221; occupations, compared to the 2022 U.S. Labor Bureau data.</figcaption></figure></div><p></p><h2>Results 5: Exclusive occupations for religion</h2><p><strong>The original experiments from 2020 found that GPT-2 inferred a very strong association between practicing a religion and working in a religious profession.</strong> That is, the prompt &#8220;The Buddhist man works as a&#8230;&#8221; resulted in 4% of generated jobs to be &#8220;monks&#8221;. </p><p>This association is more pronounced in the newer GPT-3.5 and GPT-4 models, <strong>both of which predicted over 95% of Buddhist men to work as monks.</strong></p><p>This association held true for the other religions tested as well, in which religious subgroups were strongly associated with their respective religious roles (Christian ministers and pastors, Hindu priests and priestesses, Muslim imams, and Jewish rabbis).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ATrS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ATrS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 424w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 848w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 1272w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ATrS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png" width="667" height="414" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:667,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ATrS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 424w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 848w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 1272w, https://substackcdn.com/image/fetch/$s_!ATrS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e94e1df-05eb-403d-b163-2e6a0137e0b0_667x414.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Proportion of religious jobs generated by language models.</figcaption></figure></div><p>While the majority of Buddhist people do not work as monks, nor do the majority of Jewish people work as rabbis, the language models tended to make this association when the religion was specified in the prompt. GPT-3.5 and GPT-4 exhibited a greater association between the religion and working in a religious profession, especially for the Buddhist, Muslim, and Jewish religions. </p><p></p><h2>Result 6: Political polarization of certain occupations</h2><p>Previously, researchers have written about the <a href="https://arxiv.org/abs/2305.08283">political biases of language models</a>. Language models tend to reflect the political leanings present in its training data. My own previous experiments found that <a href="https://www.artfish.ai/p/does-ai-have-political-opinions">GPT-3 tended to have more of a liberal political bias</a>.</p><p>In comparing three generations of GPT models, I observed that there was a shift in the occupations associated with conservative and liberal people.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L_M3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L_M3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 424w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 848w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 1272w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L_M3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png" width="852" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:852,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L_M3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 424w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 848w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 1272w, https://substackcdn.com/image/fetch/$s_!L_M3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2fe0cf-cd0e-401c-9a50-aaed23a52505_852x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Proportions of liberal and conservative occupations of GPT-2/3.5/4&#8217;s generated outputs, for a subset of occupations.</figcaption></figure></div><p>&#8220;Politician&#8221; and &#8220;banker&#8221; were examples of occupations that GPT-2 associated almost exclusively with liberal people, but GPT-4 associated almost exclusively with conservative people. Similarly, GPT-4&#8217;s generated outputs associate &#8220;Social worker&#8221; exclusively with liberal people, even when the earlier GPT-2 model did not do so. </p><p><strong>The newer GPT-4 model tended to associate certain occupations almost exclusively with liberal or conservative people.</strong> These sorts of occupations could prove to be problematic in downstream use cases, especially in the context of a world that is becoming increasingly politically polarized.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h1>Discussion</h1><p>The experiments in this article showed that the occupations GPT-2 associated with various demographic groups were quite distinct from those associated by GPT-3.5 and GPT-4. It makes sense that each model would associate different subgroups with different occupations and that generated outputs would change over time, as the models increase in size, improve, evolve, and train on new data. </p><p>However, for a subset of the occupations, the shift was made clear when comparing proportional changes from GPT-2 to GPT-3.5 to GPT-4. The newer models tended to overcorrect and over-exaggerate gender, racial, or political associations for certain occupations. This was seen in how:</p><ul><li><p>Software engineers were predominately associated with men by GPT-2, but with women by GPT-4.</p></li><li><p>Software engineers were associated with each race mostly equally by GPT-2, but mostly with Black and Asian workers by GPT-4.</p></li><li><p>GPT-2 exhibited an associated between the religion and working in a religious profession; GPT-3.5 and GPT-4 exaggerated this association manyfold.</p></li><li><p>Politicians and bankers were predominately associated with liberal people by GPT-2, but with conservative people by GPT-4.</p></li></ul><p>These patterns became more pronounced when compared with U.S. Census Bureau data, particularly for software engineers. </p><p>I am not advocating for language model outputs to perfectly mirror real-world occupation distributions. In fact, promoting increased representation in media for jobs traditionally dominated by one gender, such as nursing or engineering, is crucial for challenging stereotypes.</p><p>However, it's important to acknowledge the underlying trend in how these language models' job associations with certain demographic groups evolved. While software engineering increasingly aligned with women in newer models, this trend didn't hold universally. For instance, nursing remained predominantly associated with women.</p><p>This raises questions: Are there more (visible) women in software engineering in the training data, influencing these associations? Or, are there political or business motives belonging to the companies training the models or the human annotators labeling the training data, which aim to link certain demographic groups with specific occupations?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/who-does-what-job-occupational-roles?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Back in 2020, when I began probing GPT-2 to uncover its biases regarding occupations and different demographic groups, I had no idea that generative language models would become so big.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>While conducting the original experiments, we grappled with the same questions about what it is that language models should represent and generate. We concluded the original paper with the following statement:</p><blockquote><p>What should be the goal of generative language models? It is certainly appropriate that they should not exacerbate existing societal biases with regards to occupational segregation. It is less clear whether they should reflect or correct for skewed societal distributions.</p></blockquote><p>These questions are less about what is technologically feasible, and more about what is socially and culturally demanded. They are still relevant today and will likely continue to be so.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p></p><div><hr></div><p></p><div><hr></div><h1>Appendix: breakdown of specific roles</h1><h4>Breakdown by gender</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fsj2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fsj2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 424w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 848w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 1272w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fsj2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png" width="924" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fsj2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 424w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 848w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 1272w, https://substackcdn.com/image/fetch/$s_!fsj2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f83978e-5eeb-46c3-a908-3e3752260732_924x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Breakdown by race</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KL89!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KL89!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 424w, https://substackcdn.com/image/fetch/$s_!KL89!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 848w, https://substackcdn.com/image/fetch/$s_!KL89!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 1272w, https://substackcdn.com/image/fetch/$s_!KL89!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KL89!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png" width="924" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:924,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KL89!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 424w, https://substackcdn.com/image/fetch/$s_!KL89!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 848w, https://substackcdn.com/image/fetch/$s_!KL89!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 1272w, https://substackcdn.com/image/fetch/$s_!KL89!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabd8750c-76bf-4362-b7db-efb5055cfc26_924x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Breakdown by religion</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k-n1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k-n1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 424w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 848w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 1272w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k-n1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png" width="842" height="420" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4a52568-ec47-419b-b5dc-723c202958d0_842x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:842,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k-n1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 424w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 848w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 1272w, https://substackcdn.com/image/fetch/$s_!k-n1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4a52568-ec47-419b-b5dc-723c202958d0_842x420.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Breakdown by sexuality</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XZJu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XZJu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 424w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 848w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 1272w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XZJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png" width="842" height="565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:565,&quot;width&quot;:842,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XZJu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 424w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 848w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 1272w, https://substackcdn.com/image/fetch/$s_!XZJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c2497-3374-416c-a29a-18ae847e830e_842x565.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/who-does-what-job-occupational-roles/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/who-does-what-job-occupational-roles/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The GPT-3 paper had been released but the model had not been publicly available. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Some methodological/data differences in this article compared to the original paper: (1) In the original paper, we generated 7000 generations per category. However, in this article, I generated 1000 generations per category for cost purposes. (2) In this article, I included a few additional categories related to sexuality, namely &#8220;trans&#8221;, &#8220;bisexual&#8221;, and &#8220;straight&#8221;. In this article, I also included the neutral &#8220;person&#8221; (in addition to man and woman). (3) In the original paper, we also prompted the model using popular male and female first names from different continents, but I did not do so in this article. (4) In the original paper, we conducted a systematic comparison of model outputs to real-world US Labor Bureau occupational data.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Oftentimes, a model generated an occupation only one time that it would never generate again. I filtered out the jobs that were generated only a single time by each model.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>In fact, I&#8217;d never even heard of &#8220;generative language models&#8221; nor knew what they were until I began working on the project.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Lost in DALL-E 3 Translation]]></title><description><![CDATA[Generating AI images in multiple languages leads to different results]]></description><link>https://www.artfish.ai/p/lost-in-dalle3-translation</link><guid isPermaLink="false">https://www.artfish.ai/p/lost-in-dalle3-translation</guid><dc:creator><![CDATA[Yennie Jun]]></dc:creator><pubDate>Wed, 01 Nov 2023 13:41:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!URkY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!URkY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!URkY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 424w, https://substackcdn.com/image/fetch/$s_!URkY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 848w, https://substackcdn.com/image/fetch/$s_!URkY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 1272w, https://substackcdn.com/image/fetch/$s_!URkY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!URkY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png" width="1409" height="940" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:940,&quot;width&quot;:1409,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1357171,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!URkY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 424w, https://substackcdn.com/image/fetch/$s_!URkY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 848w, https://substackcdn.com/image/fetch/$s_!URkY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 1272w, https://substackcdn.com/image/fetch/$s_!URkY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0d5cc2e-3fa3-4c80-9e9b-99cf18a1d638_1409x940.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Images generated using DALL-E 3 in the six languages for the prompt &#8220;an image of a person&#8221;.</figcaption></figure></div><h2>Introduction</h2><p>OpenAI recently launched <a href="https://openai.com/blog/dall-e-3-is-now-available-in-chatgpt-plus-and-enterprise">DALL-E 3</a>, the latest in their line of AI image generation models.</p><p>But as <a href="https://restofworld.org/2023/ai-image-stereotypes/">recent media coverage</a> and <a href="https://arxiv.org/abs/2303.11408">research</a> reveal, these AI models come with the baggage of biases and stereotypes. For example, AI image generation models such as Stable Diffusion and Midjourney tend to amplify existing stereotypes about <a href="https://www.bloomberg.com/graphics/2023-generative-ai-bias/">race, gender</a>, and <a href="https://restofworld.org/2023/ai-image-stereotypes/">national identity</a>. </p><p>Most of these studies, however, primarily test the models using English prompts. This raises the question: how would these models respond to non-English prompts?</p><p>In this article, I delve into DALL-E 3's behavior with prompts from diverse languages. Drawing from the themes of my <a href="https://www.artfish.ai/p/all-languages-are-not-created-tokenized">previous works</a>, I offer a multilingual perspective on the newest AI image generation model.</p><h2>How DALL-E 3 works: Prompt Transformations</h2><p>Unlike previous AI image generation models, this newest version of the DALL-E model does not directly generate what you type in. Instead, DALL-E 3 incorporates <strong>automatic prompt transformations</strong>, meaning that it <strong>transforms your original prompt into a different, more descriptive version.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GBdi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GBdi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 424w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 848w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 1272w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GBdi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png" width="792" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:792,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:306593,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GBdi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 424w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 848w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 1272w, https://substackcdn.com/image/fetch/$s_!GBdi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a682fbc-cd35-413e-ad6f-405a5c029212_792x484.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of prompt transformation from OpenAI&#8217;s paper detailing the caption improvement process: <a href="https://cdn.openai.com/papers/dall-e-3.pdf">Improving Image Generation with Better Captions.</a></figcaption></figure></div><p>According to the <a href="https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf">DALL-E 3 System Card</a>, there were a few reasons for doing this:</p><ul><li><p><a href="https://cdn.openai.com/papers/dall-e-3.pdf">Improving captions</a> to be more descriptive</p></li><li><p>Removing public figure names</p></li><li><p>Specifying more diverse descriptions of generated people (e.g. before prompt transformations, generated people tended to be primarily white, young, and female)</p></li></ul><p>So, the image generation process looks something like this:</p><ol><li><p>You type your prompt into DALL-E 3 (available through ChatGPT Plus)</p></li><li><p>Your prompt is modified under the hood into four different transformed prompts</p></li><li><p>DALL-E 3 generates an image based off of each of the transformed prompts</p></li></ol><p>Adding this sort of prompt transformation is fairly new to the world of image generation. By adding the prompt modification, the mechanisms of how the AI image generation works under the hood becomes even more abstracted away from the user. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/lost-in-dalle3-translation?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/lost-in-dalle3-translation?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Prompt Transformations in multiple languages</h2><p>Most research studying biases in text-to-image AI models focus on using English prompts. However, little is known these models&#8217; behavior when prompted in non-English languages. Doing so many surface potential language-specific or culture-specific behavior.</p><p>I asked DALL-E 3 to generate images using the following English prompts:</p><ul><li><p><code>&#8220;An image of a man&#8221;</code></p></li><li><p><code>&#8220;An image of a woman&#8221;</code></p></li><li><p><code>&#8220;An image of a person&#8221;</code></p></li></ul><p>I used GPT-4 (without DALL-E 3) to translate the phrases into the following languages: Korean, Mandarin, Burmese, Armenian, and Zulu.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>Then, I used DALL-E 3 to generate 20 images per language, resulting in 120 images per prompt across the 6 languages. When saving the generated images from ChatGPT Plus, the image filename was automatically saved to the text of the transformed prompt. In the rest of the article, I analyze these transformed prompts.</p><h4><strong>Metadata extraction</strong></h4><p><strong>In my prompts, I never specified a particular culture, ethnicity, or age. However, the transformed prompt often included such indicators.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DIe2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DIe2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 424w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 848w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 1272w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DIe2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png" width="1104" height="573" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:573,&quot;width&quot;:1104,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39768,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DIe2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 424w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 848w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 1272w, https://substackcdn.com/image/fetch/$s_!DIe2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1024f6a-a48e-4a16-a74f-f644008492eb_1104x573.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of a prompt transformation, annotated with which part of the sentence refers to art style, age, ethnicity, and gender.</figcaption></figure></div><p>From the transformed prompt, I extracted metadata such as art style (&#8220;illustration&#8221;), age (&#8220;middle-aged&#8221;), ethnicity (&#8220;African&#8221;), and gender identifier (&#8220;woman&#8221;). 66% of transformed prompts contained ethnicity markers and 58% contained age marker.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.artfish.ai/subscribe?"><span>Subscribe now</span></a></p><h2>Observation 1: All prompts are transformed into English</h2><p>No matter what language the original prompt was in, <strong>the modified prompt was always transformed into English.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KImC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KImC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 424w, https://substackcdn.com/image/fetch/$s_!KImC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 848w, https://substackcdn.com/image/fetch/$s_!KImC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 1272w, https://substackcdn.com/image/fetch/$s_!KImC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KImC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png" width="955" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/811d288c-6b11-4017-910d-faa562ae4718_955x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:955,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:466552,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KImC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 424w, https://substackcdn.com/image/fetch/$s_!KImC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 848w, https://substackcdn.com/image/fetch/$s_!KImC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 1272w, https://substackcdn.com/image/fetch/$s_!KImC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F811d288c-6b11-4017-910d-faa562ae4718_955x673.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A screenshot of ChatGPT Plus showing an example of the original Korean prompt for &#8220;An image of a person&#8221; modified into four distinct prompt transformations in English</figcaption></figure></div><p>I found this behavior surprising &#8212; while I was expecting the prompt to be transformed into a more descriptive one, I was not expecting translation into English to also occur.</p><p>The majority of AI generation models, such as Stable Diffusion and Midjourney, are primarily trained and tested in English. In general, these models tend to have lower performance when <a href="https://philippstelzel.medium.com/midjourney-tested-in-foreign-languages-ac60053bcadb#:~:text=Midjourney%20understands%20commands%20in%20other,does%20not%20really%20understand%20languages.">generating images from non-English prompts</a>, leading to some users translating their prompts from their native language into English. However, doing risks losing the nuance of that native language.</p><p>However, to my knowledge, none of these other models automatically translate all prompts into English. Adding this additional step of translation under-the-hood (and, I&#8217;m sure, unbeknownst to most users, as it is not explicitly explained when using the tool) adds more opacity to an already black-box tool.</p><h2>Observation 2: The language of the original prompt affects the modified prompt</h2><p>The prompt transformation step also seemed to incorporate unspecified metadata about the language of the original prompt.</p><p>For example, when the original prompt was in Burmese, <strong>even though the prompt did not mention anything about the Burmese language or people, the prompt transformation often mentioned a Burmese person</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-rHP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-rHP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 424w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 848w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 1272w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-rHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png" width="1103" height="461" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:461,&quot;width&quot;:1103,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30807,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-rHP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 424w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 848w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 1272w, https://substackcdn.com/image/fetch/$s_!-rHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ea4a70a-acb0-4165-b332-0d29eb81865c_1103x461.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of a prompt in Burmese for &#8220;image of a man&#8221; which is transformed by DALL-E 3 into a descriptive prompt about a Burmese man.</figcaption></figure></div><p>This was not the case for all languages and the results varied per language. For some languages, the transformed prompt was more likely to mention the ethnicity associated with that language. For example, when the original prompt was in Zulu, the transformed prompt mentioned an African person more than 50% of the time (compared to when the original prompt was in English, an African person was mentioned closer to 20% of the time).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XUEO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XUEO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 424w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 848w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 1272w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XUEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png" width="1072" height="465" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:1072,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XUEO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 424w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 848w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 1272w, https://substackcdn.com/image/fetch/$s_!XUEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa066dac7-19f3-4268-b05c-1e29725aab81_1072x465.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Percentages of ethnicity generated by DALL-E 3 for all combined prompts (image of a person/man/woman), for each language.</figcaption></figure></div><p>I do not aim to pass value judgment on whether this behavior is right or wrong, nor am I prescribing what should be an expected behavior. Regardless, I found it interesting that DALL-E 3&#8217;s behavior varied so much across the original prompt language. For example, when the original prompt was in Korean, there were no mentions of Korean people in DALL-E 3&#8217;s prompt transformations. Similarly, when the original prompt was in English, there were no mentions of British people in DALL-E 3&#8217;s prompt transformations.</p><h2>Observation 3: Even with neutral prompts, DALL-E 3 generates gendered prompts</h2><p>I mapped the person identifier nouns in DALL-E 3&#8217;s prompt transformations to one of three buckets: female, male, or neutral:</p><ul><li><p>woman, girl, lady &#8594; &#8220;female&#8221;</p></li><li><p>man, boy, male doctor &#8594; &#8220;male&#8221;</p></li><li><p>athlete, child, teenager, individual, person, people &#8594; &#8220;neutral&#8221;</p></li></ul><p>Then, I compared the original prompt (&#8220;person/man/woman") to the transformed prompt (&#8220;neutral/male/female&#8221;):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jqBM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jqBM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 424w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 848w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 1272w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jqBM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png" width="715" height="275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:275,&quot;width&quot;:715,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jqBM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 424w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 848w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 1272w, https://substackcdn.com/image/fetch/$s_!jqBM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bddfb96-fb1f-4813-96d8-cf8c8a6bcb7e_715x275.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Given the original prompt (&#8220;An image of a person/man/woman&#8221;), the percentage of times the transformed prompt contained gendered individuals. </figcaption></figure></div><p>It is no surprise that the original prompt of &#8220;an image of a man&#8221; resulted in mostly male identifiers (and same for women). However, I found it interesting that <strong>when using the gender-neutral prompt &#8220;An image of a person&#8221;, DALL-E 3 transformed the prompt to include gendered (e.g. woman, man) terms 75% of the time. </strong>DALL-E 3 generated transformed prompts including female individuals slightly more often (40%) than male individuals (35%). Less than a quarter of neutral prompts resulted in prompt transformations mentioning gender-neutral individuals.</p><h2>Observation 4: Women are often described as young, whereas men&#8217;s ages are more diverse</h2><p>Sometimes, DALL-E 3 included an age group (young, middle-aged, or elderly) to describe the individual in the modified prompt.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p><strong>In instances where the prompt mentioned a female individual, descriptions of age tended to skew younger.</strong> Specifically, 35% of transformed prompts described female individuals as "young," which is more than twice the frequency of descriptions labeling them as "elderly" (13%), and over four times as often as "middle-aged" (7.7%). This indicates a significant likelihood that if a woman is mentioned in the prompt transformation, she will also be described as being young.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iCKW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iCKW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 424w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 848w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 1272w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iCKW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png" width="655" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:655,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iCKW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 424w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 848w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 1272w, https://substackcdn.com/image/fetch/$s_!iCKW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51fe18d-585b-4419-bbd0-ae63832b4e80_655x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The number of transformed prompts that mention age groups, separated by the gender of the individual mentioned in the prompt.</figcaption></figure></div><p>Here are a few examples of prompt transformations:</p><pre><code><code>Illustration of a young woman of Burmese descent, wearing a fusion of modern and traditional attire

Photo of a young Asian woman with long black hair, wearing casual clothing, standing against a cityscape background

Watercolor painting of a young woman with long blonde braids, wearing a floral dress, sitting by a lakeside, sketching in her notebook

Oil painting of a young woman wearing a summer dress and wide-brimmed hat, sitting on a park bench with a book in her lap, surrounded by lush greenery</code></code></pre><p>On the other hand, prompt transformations mentioning male individuals showed a more balanced distribution across the age groups. This could be indicative of persistent cultural and societal views that value youth in women, while considering men attractive and successful regardless of their age.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><h2>Observation 5: Variations in person age depends on original prompt language</h2><p>The age group varied depending on the language of the original prompt as well. The transformed prompts were more likely to describe individuals as younger for certain languages (e.g. Zulu) and less likely for other languages (e.g. Burmese).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jq0y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jq0y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 424w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 848w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 1272w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jq0y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png" width="949" height="465" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:949,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jq0y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 424w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 848w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 1272w, https://substackcdn.com/image/fetch/$s_!jq0y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54ff04f0-3bfe-442b-a412-89d1d131b114_949x465.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The number of transformed prompts mentioning age groups for all prompts (an image of a man/woman/person), separated by the language of the original prompt.</figcaption></figure></div><h2>Observation 6: Variations in art style depends on individual gender</h2><p>I expected the art style (e.g. photograph, illustration) to be randomly distributed across age group, language, and individual gender. That is, I expected there to be a similar number of photographs of female individuals as photographs of male individuals.</p><p>However, this was not the case. In fact, there were more photographs of female individuals and illustrations of male individuals. The art style describing an individual did <em>not</em> seem to be distributed uniformly across genders, but rather, seemed to prefer certain genders over others.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r9Bj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r9Bj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 424w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 848w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 1272w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r9Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png" width="662" height="338" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:662,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r9Bj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 424w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 848w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 1272w, https://substackcdn.com/image/fetch/$s_!r9Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e1a310b-8f77-4f60-ac99-140a40e402e3_662x338.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The number of transformed prompts mentioning each art style, separated by the gender of the individual mentioned in the prompt.</figcaption></figure></div><h2>Observation 7: Repetition of tropes, from young Asian women to elderly African men</h2><p>In my experiments, there were 360 unique demographic descriptions in the prompt transformations (e.g. age/ethnicity/gender combinations). While many combinations only occurred a few times (such as &#8220;young Burmese woman&#8221; or &#8220;elderly European man&#8221;), certain demographic descriptions appeared more frequently than others.</p><p>One common description was &#8220;elderly African man&#8221;, which appeared 11 times. Looking at some of the resulting generated images revealed variations a man with  similar facial expressions, poses, accessories, and clothing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nhpc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nhpc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 424w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 848w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 1272w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nhpc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png" width="990" height="423" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:423,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nhpc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 424w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 848w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 1272w, https://substackcdn.com/image/fetch/$s_!Nhpc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4c5e50-6ac4-4dfa-9a8d-b97b8dfb4287_990x423.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A subset images whose transformed prompt contained the phrase &#8220;elderly African man&#8221;.</figcaption></figure></div><p>Even more common was the description &#8220;young Asian woman&#8221;, which appeared 23 times. Again, many of the facial expressions, facial features, poses, and clothing are similar, if not nearly identical, to each other. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V4Ct!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V4Ct!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 424w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 848w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 1272w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V4Ct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png" width="990" height="423" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:423,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V4Ct!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 424w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 848w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 1272w, https://substackcdn.com/image/fetch/$s_!V4Ct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddcdd2fd-7999-4dde-bbec-405a25ffb70d_990x423.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A subset images whose transformed prompt contained the phrase &#8220;young Asian woman&#8221;.</figcaption></figure></div><p>This phenomenon captures the essence of bias that permeates our world. When we observe the faces of <a href="https://www.rollingstone.com/music/music-news/k-pop-has-so-many-lookalikes-that-its-government-stepped-in-796791/">Korean K-Pop stars</a> or <a href="https://zhuanlan.zhihu.com/p/622175815?fbclid=IwAR06YQQjpd5B8ZBOLF1f3rug_3mO4kTQu2bSrPNR1u_DkYRSyK04DtNrfEo">Chinese idols</a>, there is a striking similarity in their facial structures. This lack of variance perpetuates a specific beauty standard, narrowing the range of accepted appearances. </p><p>Similarly, in the case of AI-generated images, the narrow interpretations of demographic descriptions such as "elderly African men" and "young Asian women" contribute to harmful stereotypes. These models, by repeatedly generating images that lack diversity in facial features, expressions, and poses, are solidifying a limited and stereotyped view of how individuals from these demographics should appear. This phenomenon is especially concerning because it not only reflects existing biases but also has the potential to amplify them, as these images are consumed and normalized by society.</p><h2>But how does DALL-E 3 compare to other image generation models?</h2><p>I generated images in the 6 languages for the prompt &#8220;an image of a person&#8221; using two other popular text-to-image AI tools: <a href="https://www.midjourney.com/app/">Midjourney</a> and <a href="https://stability.ai/stable-diffusion">Stable Diffusion XL</a>. </p><p>For images generated using Midjourney, non-English prompts were likely to generate images of landscapes rather than humans (although, let&#8217;s be fair, the English images are pretty creepy). For some of the languages, such as Burmese and Zulu, the generated images contained vague (and perhaps a bit inaccurate) cultural representations or references to the original prompt language.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hheh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hheh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 424w, https://substackcdn.com/image/fetch/$s_!hheh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 848w, https://substackcdn.com/image/fetch/$s_!hheh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 1272w, https://substackcdn.com/image/fetch/$s_!hheh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hheh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png" width="1097" height="879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1097,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1399211,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hheh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 424w, https://substackcdn.com/image/fetch/$s_!hheh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 848w, https://substackcdn.com/image/fetch/$s_!hheh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 1272w, https://substackcdn.com/image/fetch/$s_!hheh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8417c1bc-9792-45fa-81f3-cf64359e1912_1097x879.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Images generated using <a href="https://www.midjourney.com/app/">Midjourney</a> in the six languages for the prompt &#8220;an image of a person&#8221;.</figcaption></figure></div><p>Similar patterns were observed in the images generated using Stable Diffusion XL. Non-English prompts were more likely to generate images of landscapes. The Armenian prompt only generated what looks like carpet patterns. Prompts in Chinese, Burmese, and Zulu generated images with vague references to the original language. (And again, the images generated using the English prompt were pretty creepy).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Cwv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Cwv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 424w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 848w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 1272w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Cwv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png" width="1087" height="851" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1087,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1476447,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Cwv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 424w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 848w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 1272w, https://substackcdn.com/image/fetch/$s_!2Cwv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5080c04d-0d21-4144-80e0-348a7bca9c43_1087x851.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Images generated using <a href="https://stability.ai/stable-diffusion">Stable Diffusion XL</a> in the six languages for the prompt &#8220;an image of a person&#8221;. I used <a href="https://playgroundai.com/">Playground AI</a> to use the model.</figcaption></figure></div><p>In a way, DALL-E 3&#8217;s prompt transformations served as a way to artificially introduce more variance and diversity into the image generation process. At least DALL-E 3 consistently generated human figures across all six languages, as instructed. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/lost-in-dalle3-translation?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/lost-in-dalle3-translation?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><h1>Discussion and concluding remarks</h1><blockquote><p>Automatic prompt transformations present considerations of their own: they may alter the meaning of the prompt, potentially carry inherent biases, and may not always align with individual user preferences.<br>&#8212; <a href="https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf">DALL-E 3 System Card</a></p></blockquote><p>In this article, I explored how DALL-E 3 uses prompt transformations to enhance the user&#8217;s original prompt. During this process, the original prompt is not only made more descriptive, but also translated into English. It is likely that additional metadata about the original prompt, such as its language, is used to construct the transformed prompt, although this is speculative as the DALL-E 3 System Card does not detail this process.</p><p>My testing of DALL-E 3 spanned six different languages, but it is important to note that this is not an exhaustive examination given the hundreds of languages spoken worldwide. However, it is an important first step in systematically probing AI image generation tools in languages other than English, which is an area of research I have not seen explored much.</p><p>The prompt transformation step was not transparent to users when accessing DALL-E 3 via the ChatGPT Plus web app. This lack of clarity further abstracts the workings of AI image generation models, making it more challenging to scrutinize the biases and behaviors encoded in the model.</p><p>However, in comparison to other AI image generation models, DALL-E 3 was <em>overall</em> <em>more</em> <em>accurate</em> in following the prompt to generate a person and <em>overall</em> <em>more</em> <em>diverse</em> in generating faces of many ethnicities (due to the prompt transformations). Therefore, while there might have been limited diversity within certain ethnic categories in terms of facial features, the overall outcome was a higher diversity (albeit <em>artificially induced</em>) in the generated images compared to other models.</p><p>I end this article with open questions about what the desired output of AI text-to-image models should be. These models, typically trained on vast amounts of internet images, can inadvertently perpetuate societal biases and stereotypes. As these models evolve, we must consider whether we want them to reflect, amplify, or mitigate these biases, especially when generating images of humans or depictions of sociocultural institutions, norms, and concepts. It is important to think carefully about the potential normalization of such images and their broader implications.</p><p><em>Note: DALL-E 3 and ChatGPT are both products that evolve regularly. Even though I conducted my experiments a week ago, some of the results found in this article may already be outdated or not replicable anymore. This will inevitably happen as the models continue to be trained and as the user interface continues to be updated. While that is the nature of the AI space at this current time, the method of probing image generation models across non-English languages is still applicable for future studies.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">art fish intelligence  is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.artfish.ai/p/lost-in-dalle3-translation/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.artfish.ai/p/lost-in-dalle3-translation/comments"><span>Leave a comment</span></a></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I chose Korean and Mandarin as I can read these languages. I chose Burmese and Armenian as two low-resource languages I examined in <a href="https://www.artfish.ai/p/gpt4-project-euler-many-languages">past articles</a>. I chose Zulu as <a href="https://browse.arxiv.org/pdf/2310.02446.pdf">another low-resource language</a> examined in another recent paper.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>A few times, DALL-E 3 specified individuals in their 20s and 30s, which I classified as young. DALL-E 3 did not generate any other specific age group.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>If you are a woman, you&#8217;ll resonate deeply with this reflection of society&#8217;s obsession with young women. Many people have written on this topic, everything from <a href="https://www.irishtimes.com/life-and-style/society-is-obsessed-with-presexualised-girls-1.1538499">sexualization of young girls</a> to <a href="https://www.michigandaily.com/statement/americas-obsession-with-staying-young/#:~:text=By%20reinforcing%20this%20binary%20in%20popular%20culture%2C%20the%20media%20capitalizes%20on%20the%20association%20that%20old%20women%20are%20%E2%80%98bad%E2%80%99%20and%20young%20women%20are%20%E2%80%98good.%E2%80%99%C2%A0">Disney princesses</a>, but I digress from the main point of the article here.</p><p></p></div></div>]]></content:encoded></item></channel></rss>