Thursday, April 11, 2024

Man Creatively Sneaks Onto Delta Flight, But Gets Caught

https://onemileatatime.com/news/man-creatively-sneaks-onto-delta-flight/

The man reportedly intended to fly with Southwest to Austin. He had a “buddy pass,” which was presumably given to him by a Southwest employee, to be able to fly with the airline on a space available basis. So he cleared security with that standby pass, though unfortunately the Southwest flight was full, meaning he wouldn’t be able to take the flight.

Desperate to get to Austin, the man then used a different strategy. He went to the gate for the Delta flight to Austin, and used his phone to take pictures of the boarding passes of other passengers without their knowledge. No one noticed this at the time, but rather this was only uncovered using security camera footage after the fact.

He then boarded the aircraft using the barcode on another passenger’s boarding pass. Now, he did sort of think this through — he didn’t simply take the seat associated with that boarding pass, since he knew it would be occupied. Instead, he boarded and then tried to hide in the lavatory. The goal was to let everyone else board, and then take whatever empty seat was left.

The problem is, there were no empty seats on the flight. So when he emerged from the lavatory and the plane began taxiing, flight attendants realized something was wrong, and the plane returned to the gate. The aircraft was met by police, where the traveler “admitted he had made a mistake and was only trying get home.” Police say the man is being held on a federal detainer at the Salt Lake County Metro Jail. The flight ended up departing around 30 minutes late.

Tuesday, March 12, 2024

Inside North Korea's Forced-Labor Program

https://www.newyorker.com/magazine/2024/03/04/inside-north-koreas-forced-labor-program-in-china

In late 2023, an investigator hired by my team visited a Chinese plant called Donggang Xinxin Foodstuff. He found hundreds of North Korean women working under a red banner that read, in Korean, “Let’s carry out the resolution of the 8th Congress of the Workers’ Party.” Soon afterward, the investigator visited a nearby plant called Donggang Haimeng Foodstuff, and found a North Korean manager sitting at a wooden desk with two miniature flags, one Chinese and one North Korean. The walls around the desk were mostly bare except for two portraits of the past North Korean leaders Kim Il Sung and Kim Jong Il. The manager took our investigator to the workers’ cafeteria to eat a North Korean cold-noodle dish called naengmyeon, and then gave him a tour of the processing floor. Several hundred North Korean women dressed in red uniforms, plastic aprons, and white rubber boots stood shoulder to shoulder at long metal tables under harsh lights, hunched over plastic baskets of seafood, slicing and sorting products by hand. “They work hard,” the manager said. The factory has exported thousands of tons of fish to companies that supply major U.S. retailers, including Walmart and ShopRite.

Thursday, January 18, 2024

Let´s play money making game.

I've recently been playing through the NES Legend of Zelda; a game I played a lot as a kid.  In the preinternet era I somehow spent enough time in this game to know every secret, every hidden door, and could beat the game in a single play through.

 



However, one aspect never really spent much time on was the "money making game" aka, the basic gambling game where you choose one of three options and randomly either won or lost rupees.  I didn't spend much time playing it as a kid, because I had the distinct memory that it wasn't a good choice, ie, it had a negative expected value (I'm not sure I would have phrased it quite like that as an 8 year old, but I digress).  Another side note, while I didn't play it much, the idea of a random gambling minigame inside an unrelated game always stuck with me, and directly sparked the inclusion of a similar concept in the game pirate2, which you're undoubtedly familiar with.

Anyway, this is turning into a long post which is just a couple links, but I was wondering if the MMG was in fact a bad value or if there was any secrets there I didn't know about.  So, I started search for the answer, and came across these two very labor intensive analyses of the game, which come from totally different angles.

The first is a look at the assembly code of the game itself to see how the RNG worked, and what exactly the distribution of negative and positive payoffs was.

The second was just a guy who played the game 500 times and kept track of the outcomes.

They both reached the same conclusion: The right rupee is the worst choice and the middle rupee is the best choice.  In fact, the middle rupee does have a positive expected value, and so you can expect to make money playing it, in the long run.

Friday, December 22, 2023

Credit card debt collection

https://www.bitsaboutmoney.com/archive/the-waste-stream-of-consumer-finance/

This was one of those articles I had bookmarked for months, and when I finally started reading it seemed like a lot of stuff I already knew.  But, it got pretty interesting towards the end.  I'd recommend reading the whole thing, but I'll quote some longer parts I found interesting here.

The rights of debtors are observed by both primary lenders and eventual debt buyers mostly in the breach. One of those rights is to a written “debt verification”, with specified information in it, and (surprisingly, if you haven’t worked in this field) despite that being the law many debts are sold in such a fashion that the buyer couldn’t produce a responsive verification even if they wanted to. That isn’t even a political claim; it’s just the engineering reality of which columns are in their CSV file.

The former advocate in me will observe that the single most effective method for resolving debts is carefully sending a series of letters invoking one’s rights under the FDCPA (and other legislation) to a debt collector who is operationally incapable of respecting those rights, then threatening them with legal or regulatory action when they inevitably infringe upon them in writing, leading to them abandoning further attempts at collection.

This effectively makes paying consumer debts basically optional in the United States, contingent on one being sufficiently organized and informed. That is likely a surprising result to many people. Is the financial industry unaware of this? Oh no. Issuing consumer debt is an enormously profitable business. The vast majority of consumers, including those with the socioeconomic wherewithal to walk away from their debts, feel themselves morally bound and pay as agreed.

Why are debt collectors so bad at debt collection? Partially it is because credit card issuers are large national institutions with large, automated processes sitting atop a legacy of corporate acquisitions, IT migrations, and similar that makes availability of non-critical information extremely fragmentary. They then want to dump that complexity through a very small pipe (CSV files) onto the debt collection industry.

And this

The FDCPA and state legislation provides for automatic damages for illegal behavior from collectors, the incidence of illegal behavior is extremely high, and a debt collector with a high school education and three months of experience will frequently commit three federal torts in a few minutes of talking to a debtor then follow up with a confirmation of the same in writing. (You think I am exaggerating. Reader, I am not. “If you don’t pay me I will sue you and then Immigration will take notice of that and yank your green card” contains three separate causes of action: (frequently) a false threat to file a suit where that is not actually a business practice of the firm, a false alleged affiliation with a government agency, and a false alleged consequence for debt nonpayment not provided for in law.)

As a result, private companies compiled databases of (public in the U.S.) court filings and organized them by Social Security number, address, and similar to allow debt collectors to identify which debtors are aware of their legal rights. In principle, a debt collector could do anything they wanted with that fact, like being extra careful to follow the law in contacting them. But the economics of debt collection do not counsel careful, individualized consideration of credit card debt.

I will bet you that, in practice, they simply avoid collecting against anyone who demonstrates ability and financial resources to enforce their rights. This is one for the history books of borked equilibriums. We devoted substantial efforts to pro-consumer legislation to address abuse of (mostly) poor people. We gated redress behind labor that is abundantly available in the professional managerial class and scarce outside of it, like writing letters and counting to 30 days. (People telling me they were incapable of doing these two things is why I started ghostwriting letters for debtors.) We now have literal computer programs exempting heuristically identified professional managerial class members from debt collection, inclusive of their legitimate debts, so that debt collectors can more profitably conserve their time to do abusive and frequently illegal shakedowns of the people the legislation was meant to benefit.

 

 

Friday, December 15, 2023

The Business of Theme Parks, How Much Money Do They Make?

Despite our extended rambling, we’d like to think that you can leave with some concrete implications.  In summary, here are some of the major ones.

  • Mega theme parks such as Disney and Universal find it difficult to be profitable in the first few years after opening, and have a very low project ROI.  But it doesn’t matter.
  • Don’t seek to compete with Disney or Universal, unless you are them, or you can line up tremendous sources of financing.  Look for creative sources of financing.
  • For investors, indoor parks are promising because they generate enough in sales productivity to be appropriate for urban retail environments.
  • Regional and superregional parks are appropriate when land prices are extremely low, and the market does not have similar types of developments.  They can be built relatively cheaply.  And once built, these parks can become the landmark attraction of a market for decades.
  • Waterparks are often a more efficient alternative to regional parks, in that they occupy less space but have a similar return profile.
  • It’s important to define the return measure for attractions.  Very often, the partner/developer/government cares less about financial returns and more about employment, tourism, and GDP impacts.  Governments can be a good partner.  On the other hand, don’t benchmark these kinds of developments if you’re a purely private operation.


https://www.theparkdb.com/blog/the-business-of-theme-parks-part-i-how-much-money-do-they-make/


https://www.theparkdb.com/blog/the-business-of-theme-parks-part-ii-how-much-do-they-cost-and-earn/

 

Tuesday, November 28, 2023

The McDonald's theory of why everyone thinks the economy sucks

https://www.natesilver.net/p/the-mcdonalds-theory-of-why-everyone

The point is simply this: it’s very easy to spend a lot more these days on fast food in ways that don’t necessarily show up in inflation data. Three years ago, I might have walked down the block and ordered a Super Duper Double Cheeseburger, fries and a Diet Coke for $9.67 before tax. Now, because Uber Eats and the restaurant have correctly determined that I’m lazy and they can price-discriminate against me and I fell for their viral marketing campaign, I’ll have them deliver me a Triple Super Duper Cheeseburger Deluxe with grilled onions, plus fries and a Diet Coke — at a price of $24.25 before tax.

And don’t think this experience — or the Idaho man’s experience — is atypical. Getting fast food delivered is pretty damned expensive. In-store, fast food can be expensive too if you start messing with upgrades, add-ons and specialty items. This is why McDonald’s revenues are still rocketing up even as inflation has slowed down.

Thursday, September 28, 2023

An analysis of radio song play frequency for 102.9 WMGK, in 2023

Background

The other day I happened to be reading through my post about how often the classic rock station in Philly, WMGK, plays different songs.  I noticed that that post was made 10 years ago, on August 26, 2013.  That made me wonder how things may have changed since then and how much work it would be to scrape their recently played page again to collect the data.  I was sure my old code wouldn't work, but figured I'd take a look at the page and see how hard it'd be to write something new.

It didn't take long to realize this was the one gift every scraper lives in hope of: Their front end was just hitting an open API to get the data.  The network tools showed the request:
https://nowplaying.bbgi.com/WMGKFM/list?limit=200&offset=0

Which returns a nicely formatted JSON list of the 200 most recently played songs.  That would make it trivial to collect this data, but it got even better.  If you noticed the offset parameter there, this was pretty much just letting me get data directly out of their database.  I tested it, and sure enough, I could just page through the data to go back as far as I tested (a few months).  This meant I could get the full 60 days worth of data, for the exact same time period as I used in 2013, instantly.

I whipped up some code real quick to do the scraping, and store the data in a basic SQLite database.  I want to mention here that I'm increasingly a fan of using SQLite for storing data for these types of quick projects.  Don't get me wrong, I still wished I had a real database a dozen times when writing queries in this project, and wouldn't use SQLite for an actual application, but the alternative here is CSV files, not a full Postgres database.  And when compared to CSV files, there's no contest.  Being able to write SQL, and being able to decide to expand the scope and add a few more tables later on, while still having those relations represented is very nice.  And from the other side, using SQLite means you still have a file you can easily share like a CSV file.  With the expected audience of this post, I probably don't have to spend any more time arguing that databases are good, so I'll stop here.

I also want to say that while I was doing the scraping, I was very paranoid they would realize what I was doing and cut me off at any moment.  That is, until I realized there was no way anyone in the world cared about this data besides me, and there was no one watching a dashboard live seeing the spike in queries and immediately launching into action to prevent the scraping of their priceless WMGK recently played data.


So what does WMGK play?

With that all out of the way, how does WMGK in 2023 compare with WMGK in 2013?  I was able to look at the exact same date range as in 2013, June 26th to August 24th, or 60 days.

In my previous post I compared what WMGK played with what the most popular songs by those bands were on Last.fm a lot.  I'm not doing any of that here.  You can refer to the graphs from that post if you want to see what the most popular songs by bands are, but here I'm focused on how WMGK has changed in 10 years.

I would assume that they are playing more recent bands, and perhaps have retired some of the older bands they used to play.  Philly still has a Rock station in WMMR though, so I would think their might be a more older bias still on the Classic Rock station.  Also WMGK (and WMMR) were bought in 2016 by the Beasley Media Group, which I assume is ran by Pam from The Office.  Might the new owners prefer more variety?  Conglomerates buying and consolidating industries usually results in a better product, right?

Coverage

To review, in 2013 WMGK played 14,286 songs during the 60 day period, of which 924 were unique.  But of those unique songs, there was a clear trend where a few dozen got played daily or every other day, and then they occasionally threw in a "deep" cut (which likely wasn't that deep).  The best summary of that fact, was that just 172 songs represented half of those 14,286 plays over 60 days.

In 2023 WMGK played 15,190 songs during the same 60 day period (6/26/2023 to 8/24/2023), of which 1060 were unique.  So far so good, that is better than last time, albeit not much.  Really the question is how distributed are those songs.  Is it the same as last time, where the same handful of songs get played constantly?  In a word, yes.

Here's the coverage breakdown.

2013
2023
Coverage Songs
Coverage Songs
10.04% 28
10.30% 31
25.27% 77
25.02% 79
50.23% 172
50.03% 168
75.13% 279
75.10% 275
90.03% 373
90.01% 381
100.00% 924
100.00% 1060

To review what I mean by "coverage" this is the number of songs you'd need to represent x% of plays.  In other words, there's a 25% chance that any random song you heard on WMGK was one of the top 79 songs they play.  If anything, it's striking how similar those numbers are.  The only noticeable difference is a slightly longer tail of songs they played once.

Top Songs

So, things aren't looking good for 2023 WMGK, but let's look closer.  What were the top songs played in 2023 vs 2013?  The first table is sorted by top songs in 2013, and the second is top songs in 2023.

Plays/30 days (2013) Band Song Plays/30 days (2023)
27.5 Warren Zevon Werewolves Of London 0
27 Cars Just What I Needed 26.5
27 Blue Oyster Cult Burnin' For You 22
27 Steve Miller Band Rock 'n Me 26
26.5 Supertramp The Logical Song 23.5
26.5 David Bowie Changes 18
26.5 Pink Floyd Another Brick In The Wall 21.5
26.5 Electric Light Orchestra Do Ya 0
26 J. Geils Band Centerfold 25.5
26 War Low Rider 25.5




Plays/30 days (2013) Band Song Plays/30 days (2023)
0 Survivor Eye Of The Tiger 27.5
0 Outfield Your Love 27.5
4.5 Allman Brothers Band Ramblin' Man 26.5
27 Cars Just What I Needed 26.5
1 Eddie Money Take Me Home Tonight (Be My Baby) 26.5
14 Pete Townshend Let My Love Open The Door 26
27 Steve Miller Band Rock 'n Me 26
16.5 Fleetwood Mac Go Your Own Way 25.5
22 Guess Who American Woman 25.5
26 J. Geils Band Centerfold 25.5

Obviously, I expected changes, but it is surprising how many songs went from 0 plays in 2013 to top 10 song in 2023 or vice versa.  While I don't have anything against Eye of the Tiger, it doesn't strike me as a particularly "Classic Rock" song.  Admittedly, there's some association with Philly due to Rocky, but still, it's an odd top song.

However, it's time to discuss the biggest travesty I discovered while looking through the 2023 data.  You may have noticed that Warren Zevon's Werewolves of London went from the #1 song, with 27.5 average plays in 30 days, down to 0 plays across 60 days.  That is bad enough, however, one of the new bands to be played in 2023 was Kid Rock.  He had a single play for a song called "All Summer Long", which I decided to listen to to see if it qualified as Classic Rock.  Well I'm guessing most people reading this are ahead of me on the punchline, but that song is the one were he "samples" the Werewolves of London intro.

So, to review: If you were listening to 102.9 WMGK, the Classic Rock station from June 26th 2023 to August 24th 2023, and you heard the Werewolves of London intro, we can say with absolute certainty that what you were actually hearing was Kid Rock.  Our ancestors weep.

I'm hesitant to even link to the Kid Rock song, but if you aren't familiar with it, here it is.  After you watch 5 seconds of that, please switch to the Werewolves of London video, and tell me that isn't an objectively better video.

Top Bands

Well, I don't even know if there's any point in going on, but for what it's worth, here's the top bands.

Plays (2013) Band Plays (2023) Change
356.5 Rolling Stones 251.5 -105
334.5 Led Zeppelin 252 -82.5
283 Beatles 69.5 -213.5
183 Pink Floyd 147.5 -35.5
177.5 Who 82 -95.5
169.5 Van Halen 130 -39.5
164 Queen 135.5 -28.5
154.5 Journey 161.5 7
143 Cars 71.5 -71.5
138.5 Billy Joel 178 39.5








Plays (2013) Band Plays (2023) Change
334.5 Led Zeppelin 252 -82.5
356.5 Rolling Stones 251.5 -105
138.5 Billy Joel 178 39.5
105 Elton John 168.5 63.5
105.5 Aerosmith 167 61.5
154.5 Journey 161.5 7
135.5 Tom Petty & The Heartbreakers 160 24.5
25.5 U2 148 122.5
183 Pink Floyd 147.5 -35.5
77.5 Fleetwood Mac 147 69.5

More similar than the top songs.  Biggest changes are the dropping of Beatles and the rise of U2.  Sorry, I can't think straight after that Kid Rock/Werewolves of London debacle.  But let's perservere

Here's a graph of the top 50 bands.

And here's the 2013 version, for comparison.

Chart is pretty similar, although the top bands in 2023 are only getting 250 plays/month vs 350 for in 2013.  The rest of the top tier bands got 150 plays/month in both 2013 and 2023.

Plays per hour of day

Here's a quick plays per hour chart.

And then the 2023 version.

Similar, but less of a clear pattern.

Top songs by band

This is the part of the post where I looked at a bunch of individual bands last time.  I'm not really going to do that here, we already know MGK doesn't play a variety from any band, they just pick 2 or 3 hits and only play those.  But I will post one comparison for one band.

That band is Yes, who admittedly are maybe not the best radio band, but they absolutely have plenty of Classic Rock hits.  In 2013 MGK played I've Seen All Good People 22 times/month, and then occasional plays from "deep cuts" like Roundabout and a few others.

In 2023 MGK played only 2 Yes songs, one was Roundabout with 1.5 plays per month (down from 3.5 in 2013).  The other got 25.5 average plays in 30 days, and that song was Owner of a Lonely Heart.

Now I like Owner of a Lonely Heart, just like I like every Yes song, but it is not the first song that comes to mind when I think of Classic Rock.  At this point, I just want to say that I did go to WMGK's site and confirm that they still refer to themselves as a "Classic Rock" station, which they do.  I just can't fathom the decision making here.  I'd like to present this dramatization of how that decision may have been made.

"Ok, we need our single song for our Yes allotment for our Classic Rock station that we can play every single day"

"What if instead of playing the same Yes song every day we just choose from a handful of the best Yes songs?"

"You're fired, who's got my pick?"

"What era do people most associate with both 'classic rock' and the band Yes?  The 80s right? So let's play the most 80s Yes song, Owner of a Lonely Heart"

"Perfect, let's go to lunch" 

Changes in band makeup

Another thing I wanted to look at was what bands from 2013 were no longer played in 2023, and what bands were new in 2023.  I don't think I ever gave this stat in my prior post, but in 2013 there were 172 distinct bands played.  In 2023 that number has gone up to 293.  Looking at a diff of the two, there aren't many that disappeared; a few that look like they did are just due to formatting changes, like replacing & with and, which I could clean up, but I'm not.

Skimming through the diff, the only bands I notice going away in 2023 are: Jefferson Airplane, The Animals, Yardbirds, and Zombies.  Which are are all very 60s era bands, so I guess that makes sense.  There's a ton of new 90s bands, which I can't say I agree with for a Classic Rock station, but that is far from the top problem.

"One hit" wonders

Finally, I attempted to answer the question, how many bands do they just play one song (or nearly one song) from?  So to start with some high level stats.  There were 293 distinct bands, and of those 78 were only played 1 time, leaving 215 bands that were played multiple times.  Of those, there were 45 bands with at least 3 plays, yet only 1 song ever played.  In addition to those 45 bands with only 1 song played multiple times, there were 33 more bands with multiple songs played, but where the top song was at least 80% of those plays.  Finally, there are only 68 bands with at least 3 plays where the top song wasn't most (> 50%) of those plays.

I eyeballed the list of some of these groups, and two stood out to me from the list of bands where MGK played just 1 song by a band: Alice Cooper (School's Out), and Thin Lizzy (The Boys Are Back In Town).  If we expand that list out to include bands with multiple songs, but where the top song was at least 80% of the plays then we get a lot more, including:

Alice In Chains Man In The Box
Neil Young Rockin' In The Free World
Yes Owner Of A Lonely Heart
Allman Brothers Band Ramblin' Man
Electric Light Orchestra Evil Woman
Van Morrison Moondance
Rush Tom Sawyer

Keep in mind these were just the bands that stood out to me as bands where I know they have a bunch of hits that could be played besides just one song.


Closing

I think the data speaks for itself (with the help of my multiple page essay you just read through): WMGK has not gotten better from 2013 to 2023, and in fact, has gotten much worse in some key areas.  I'm curious what 2033 holds.

Also, for the record, I don't think I've listened to WMGK, or any radio station since my 2013 post.  So maybe I have no room to talk, maybe the people listening to the radio in 2023 want to hear the same 80 songs on repeat every day forever.

 

Further Reading

As I mentioned above, I did a similar analysis in 2013, which you probably should have read before this post.

In the extraordinarily unlikely scenario that you want to scrape some WGMK recently played songs for yourself, the code is up on Github.

In the slightly less unlikely scenario you just want to play around with the data here is an unorganized spreadsheet with 10 tabs worth of data I used for this post.  And here is the full SQLite database which includes some additional time than just the 60 days I focused on in this post.

Wednesday, August 30, 2023

See Random Wikipedia Articles

https://wetzel.dev/tools/wikipedia.html

I made this page that shows random pages from Wikipedia.  You may know Wikipedia has a random article link already, which I've always liked.  But my version has two advantages:

  1. It shows quick summaries of articles and an image for each article so you can quickly skim through them.
  2. It allows you to set a popularity filter.  This uses the number of inbound links to that page as a proxy for how popular it is, so you can filter out huge pool of random-town-in-Wales articles.

The second point is the real reason I made it.  The actual random article link just gives so much uninteresting stubs.  I feel like using number of inbound links works well for only seeing somewhat meaningful articles.

Monday, July 31, 2023

Enshittification

https://pluralistic.net/2023/01/21/potemkin-ai/

When a platform starts, it needs users, so it makes itself valuable to users. Think of Amazon: for many years, it operated at a loss, using its access to the capital markets to subsidize everything you bought. It sold goods below cost and shipped them below cost. It operated a clean and useful search. If you searched for a product, Amazon tried its damndest to put it at the top of the search results.

This was a hell of a good deal for Amazon's customers. Lots of us piled in, and lots of brick-and-mortar retailers withered and died, making it hard to go elsewhere. Amazon sold us ebooks and audiobooks that were permanently locked to its platform with DRM, so that every dollar we spent on media was a dollar we'd have to give up if we deleted Amazon and its apps. And Amazon sold us Prime, getting us to pre-pay for a year's worth of shipping. Prime customers start their shopping on Amazon, and 90% of the time, they don't search anywhere else.

That tempted in lots of business customers – Marketplace sellers who turned Amazon into the "everything store" it had promised from the beginning. As these sellers piled in, Amazon shifted to subsidizing suppliers. Kindle and Audible creators got generous packages. Marketplace sellers reached huge audiences and Amazon took low commissions from them.

This strategy meant that it became progressively harder for shoppers to find things anywhere except Amazon, which meant that they only searched on Amazon, which meant that sellers had to sell on Amazon.

That's when Amazon started to harvest the surplus from its business customers and send it to Amazon's shareholders. Today, Marketplace sellers are handing 45%+ of the sale price to Amazon in junk fees. The company's $31b "advertising" program is really a payola scheme that pits sellers against each other, forcing them to bid on the chance to be at the top of your search.

Searching Amazon doesn't produce a list of the products that most closely match your search, it brings up a list of products whose sellers have paid the most to be at the top of that search. Those fees are built into the cost you pay for the product, and Amazon's "Most Favored Nation" requirement sellers means that they can't sell more cheaply elsewhere, so Amazon has driven prices at every retailer.

 

Wednesday, June 14, 2023

The time adding a limit to a Postgres query made it slower

This is something I ran into at work.  I wouldn't usually post stuff like this here, but I thought it was pretty interesting.  It involves optimizations to SQL queries, so best to stop reading here if that's not something you're interested in.  I've cleaned up the example below and made it generic.


Out database is Postgres 13 and we have two tables.  One is named offers which has a primary key of offer_id and a foreign key column named plan_id.  The other table is named plans and has a plan_id primary key and then a column named plan_name.  Many offers can have the same one plan_id.  There are indexes on the primary and foreign key columns.  We want to find all the offers that have a particular plan_name.


This was the starting query:
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id desc
limit 5


It took about 200 seconds to run.

If I removed the order by and ran it again, it only took 14 ms for me.  So far so good, sorting can be slow, although offer_id is the primary key, and has an index, so sorting by it shouldn't be that slow.

Here's where things get interesting though.  If I removed the limit, and keep the order by:
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id desc

That only takes 323 ms.

To be clear: Adding a limit to this query causes it to go from less than a second to 3 minutes!

It turns out this is a thing.  If you google "Postgres query slower with limit" you'll find lots of confused Stack Overflow posts.  The problem is when Postgres's statistics are wrong and the query planner assumes the limit will make one approach faster, but actually there are far less rows meeting your criteria than it thought, and it takes much longer.

In this case, Postgres thinks "oh I have an index on offer_id, let me just sort by that column and scan through all those rows until I find 5 rows that match that plan_id filter, that'll be fast cause there's plenty of rows that match that filter and I need to sort them anyway, so I'm accomplishing 2 things with one sort" (Postgres has a problem with run on sentences).  However, Postgres only thinks that's a good plan because it misestimated how many rows match the filter, thinking there are thousands when there are only a dozen or so.  So it wastes a lot of time scanning through rows in reverse chronological order until it finds 5.  A better plan is to use the index on the plan_id to find all the rows that match that, and then sort those and take the newest 5.  But that is only a better plan because we know how few rows actually match the filter, and Postgres has a bad estimate for that.

The "solution" is to force Postgres to not use that plan.  Postgres does not have query plan hints, so we have to use hacks to force Postgres to not take the slow approach.  There are various suggestions to use subqueries or CTEs and put the order by in the subquery, and the limit on the outer query.  So something like this
with all_results as (
  select * from offers
  join plans using (plan_id)
  where plan_name = 'Example Plan'
  order by offer_id desc
)
select * from all_results
limit 5

But it seems Postgres 13 is smarter than that, and still optimizes the combined query.  It does seem a bit faster than the first query, but still took 2 minutes to run.

However, a much hackier solution, which also seems to be fastest and simplest to understand is to "modify" the column you are ordering on so that it can't use the index.  So bizarrely, this query
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id + 0 desc
limit 5

Ran in 272 ms.  Removing the + 0 causes it to go back to 3+ minutes.

This article goes into more detail, and is where I got the + 0 idea came from.

I should also mention that the "real" solution to this is to increase the amount of statistics on the columns Postgres is having a hard time estimating, but I tested this and it didn't seem to fix this case.  Removing the index on offer_id would also work, although that's not an option here with it being the primary key, it also feels like a bad idea to remove the index entirely rather than just forcing Postgres not to use it on a particular query where we know it's a bad idea.