Some companies do outsource their “AI” to India, but automated checkout tech is actually good enough to be used in production now. A plain white background with separated fruits like this is exactly the environment where it works best.
> automated checkout tech is actually good enough to be used in production now
not really.
amazon’s just walk out is the leader in this area, and it came out recently that the bulk of transactions, 7 in 10, are offloaded for manual review in india
amazon of course denied the claim, but did so in vague corporate speak, and failed to provide figures to counter the 7-in-10. they also did confirm that they’re scaling back just walk out. i don’t think those things would be the case if this technology worked as they were hoping.
Just because Amazon, king of scams, is doing an AI scam, that doesn’t mean that the underlying technology is impossible to use with minimal errors (it’s AI, it’s made of statistics, there will always be some errors).
Anyways, “just walk out” works in a different way than the fruit recognition in the OP or the checkout machines I was talking about. Image recognition of a discrete item over a white background (or a checkered background) is like, the literal ideal case for image recognition accuracy. This is as opposed to blurry store cameras looking at an entire aisle from 20 feet away and trying to guess what item the customer is taking off the shelf. It’s an entirely different problem space in every way that matters.
Anyways, even ignoring theoretical arguments, I know it’s production-ready because it’s currently being used in production. There are dozens of stores in California right now that use checkout machines with a camera that points down towards a plain background “pad”. You place the item on the pad and it selects the most likely item in the store based on what it sees. I’ve seen a live demo of these machines where you take ~10-15 pictures of an item from different angles/rotations/positions and add it to the list of recognizable items, and the machine was able to distinguish between that item and others accurately. This was in a very candid and scam-unlikely environment (OpenSauce) and by my evaluation this is easily consistent with other known-good image recognition applications.
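To make the enroll-then-match flow concrete, here’s a minimal Python sketch. It assumes each photo has already been turned into a feature vector (an embedding) by some image model, which isn’t shown. `Catalog`, the toy vectors, and the nearest-centroid matching by cosine similarity are all illustrative choices, not how any particular kiosk vendor actually does it.

```python
import math

# Hypothetical: each photo is assumed to already be a fixed-length feature
# vector ("embedding") from some image model; no real model is used here.
Embedding = list[float]


def centroid(vectors: list[Embedding]) -> Embedding:
    """Average the embeddings of an item's enrollment photos."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Catalog:
    """Hypothetical store catalog: one averaged embedding per item."""

    def __init__(self) -> None:
        self.items: dict[str, Embedding] = {}

    def enroll(self, name: str, photos: list[Embedding]) -> None:
        """Make a new item recognizable from a handful of photos."""
        self.items[name] = centroid(photos)

    def classify(self, query: Embedding) -> tuple[str, float]:
        """Return the most similar enrolled item and its similarity score."""
        scores = {name: cosine(query, c) for name, c in self.items.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]


catalog = Catalog()
catalog.enroll("apple", [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
catalog.enroll("banana", [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]])
print(catalog.classify([0.85, 0.15, 0.05]))
```

Enrolling a new item is just averaging a few vectors, which is roughly why a handful of photos can be enough when the background is controlled. A real system would presumably also reject low-scoring matches instead of always picking something.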
> it’s AI, it’s made of statistics, there will always be some errors
7 in 10 required manual review
> This is as opposed to blurry store cameras looking at an entire aisle from 20 feet away and trying to guess what item the customer is taking off the shelf. It’s an entirely different problem space in every way that matters.
which is why that wasn’t the setup of just walk out
every location was quite literally purpose built with the express goal of making the just walk out technology as accurate as it possibly could be
> You place the item on the pad and it selects the most likely item in the store based on what it sees
this is a completely different problem
nobody’s placing the berry or berries they decide to eat or not eat in a separate area before placing them in their mouth
Yes, that’s what I’ve been trying to explain. And no, JWO was not built to be accurate, it was built to be convenient. That’s a very different incentive that will lead to skipping alternatives that are less convenient but more accurate-- like the checkout kiosks I’ve been talking about. I’m not defending JWO and it’s obviously both a harder problem and one that’s not managed well, focusing on optics over accuracy.
> nobody’s placing the berry or berries they decide to eat or not eat in a separate area before placing them in their mouth
That’s not necessary, they’re already placed in a nearly ideal environment by the person setting up the berry bowl. Notice how the “bowl” is a white square with each fruit placed in a way where they’re separated by the whitespace. You wouldn’t even need to train a model on the whole bowl, you could just do an image region detection --> object recognition pipeline. The hardest part about the berry bowl would by far be determining the person taking the fruit! (In fact, I wouldn’t be surprised if that was manually reviewed, with that few instances to look at.)
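A rough sketch of that region detection, then object recognition idea, assuming a grayscale image where anything at or above a brightness threshold counts as the white background. The flood-fill region finder is a standard connected-components approach; the `classify` callback stands in for a real per-item recognizer and is hypothetical, as are the threshold and the toy image.

```python
from collections import deque
from typing import Callable

Grid = list[list[int]]   # grayscale pixels: 0 = black, 255 = white
Pixel = tuple[int, int]  # (row, col)


def find_regions(img: Grid, threshold: int = 200) -> list[list[Pixel]]:
    """Stage 1: group non-background pixels into 4-connected blobs.

    Anything at or above `threshold` is treated as the white background.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]

    def is_object(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and img[y][x] < threshold

    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or not is_object(r, c):
                continue
            seen[r][c] = True
            blob, queue = [], deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if is_object(ny, nx) and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(blob)
    return regions


def recognize(img: Grid, classify: Callable[[Grid], str]) -> list[str]:
    """Stage 2: crop each blob's bounding box and classify it on its own."""
    labels = []
    for blob in find_regions(img):
        ys = [y for y, _ in blob]
        xs = [x for _, x in blob]
        crop = [row[min(xs):max(xs) + 1] for row in img[min(ys):max(ys) + 1]]
        labels.append(classify(crop))
    return labels


# Toy 4x5 "photo": two dark blobs on a white background.
img = [
    [255, 255, 255, 255, 255],
    [255,  20, 255, 255, 255],
    [255,  20, 255,  30,  30],
    [255, 255, 255,  30,  30],
]


def by_size(crop: Grid) -> str:
    """Hypothetical stand-in classifier: small crops are blueberries."""
    return "blueberry" if len(crop) * len(crop[0]) <= 2 else "strawberry"


print(recognize(img, by_size))  # ['blueberry', 'strawberry']
```

Two fruits touching each other come out of stage 1 as a single blob, which is one of the ways this breaks down once the “separated by whitespace” assumption fails.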
I’m not deliberately misinterpreting you, but I think I found where the disconnect is:
> jwo is a different problem than the separate checkout kiosk you’re describing
> jwo is the same problem as is in the image
I don’t think this is the case. The berry box is somewhere between JWO and checkout kiosks, in that the density of items is small, and the background is clear, but there are multiple items at the same time. I’m seeing the items as discrete enough that it’s more similar to checkout kiosks, but you’re seeing it as more similar to JWO (now I understand why you keep bringing JWO up).
> it was built to be accurate within the boundary of “no checkout step”
Was it? I mean in some sense yes, but I feel like it was primarily built for Amazon’s image, to give the appearance of it working well. That’s why they’re secretly hiring people and claiming it’s AI, after all. If they weren’t doing it for their image they wouldn’t even need to pretend that it was AI.
> unless somebody moves or jostles them while taking some fruit
> you’re essentially making the exact same naive assumptions about the operating environment that led to jwo’s failures
I suppose I am, but it appears that the person in the image is also making that same assumption (to the extent that the image is real-- it is satire after all). Having multiple items in the box would decrease the accuracy not only because of items touching, but also because the person could cover the box while jostling all the items’ positions. You’d have to count every item before and after their interaction, and they could take 0, 1, or more items. It’s definitely not as simple as I was thinking, you’re right. Still easier than JWO imo but not as easy as the kiosks.
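The “count every item before and after” approach could be sketched like this, assuming some detector produces a list of item labels for each snapshot (the labels and `diff_counts` are hypothetical). Comparing counts rather than positions means jostled fruit isn’t mistaken for taken fruit, and it naturally handles someone taking zero, one, or several items.

```python
from collections import Counter


def diff_counts(before: list[str], after: list[str]) -> tuple[Counter, Counter]:
    """Compare item labels detected before and after someone reaches in.

    Returns (taken, added). Only label counts are compared, so fruit that
    merely moved is not reported as taken.
    """
    b, a = Counter(before), Counter(after)
    return b - a, a - b  # Counter subtraction drops zero/negative counts


before = ["strawberry", "strawberry", "blueberry", "blueberry", "blueberry"]
after = ["blueberry", "strawberry", "blueberry", "blueberry"]
taken, added = diff_counts(before, after)
print(dict(taken), dict(added))  # {'strawberry': 1} {}
```

This still depends on the detector getting every snapshot right: if a hand or arm hides a berry in the “after” frame, it gets counted as taken.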
> facial recognition is a thoroughly solved problem, at least in terms of the accuracy that we’re aiming for here
It’s not clear to me whether it’s easy to take fruit from the bowl without showing your face. It’s certainly possible, but whether it’s likely depends on which direction people approach from.
That was for automated checkout. Video people counters have been around for years. I’ve worked for companies that used them to count customers by department.
this isn’t counting people. this is working out which item or items people pick up from a shelf and decide to keep, if any. that isn’t just similar to the automated checkout problem: it’s the same exact problem. if anything, this iteration of it is more challenging because a blueberry is a fair amount smaller than a tin of beans.
amazon spent a lot of money on trying to do this and then found out the technology doesn’t exist and outsourced it to india
But that’s not a bowl. It’s more like a box. No, it’s ok. I’ll get on the call at 10pm.
The Amazon shop is a lot more complicated than a few berries on a white shelf.
not in the ways that matter, and small, organic items like individual berries are far harder to account for than standardized product packaging
Could be or could be the berries are put in the same arrangement each day and it’s just tracking which black blob disappears.
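For comparison, the “track which blob disappears” idea might look something like this minimal sketch, assuming a detector already reduces each berry to a centroid and the layout is identical every day. The names, coordinates, and 5-pixel tolerance are all hypothetical.

```python
import math

Point = tuple[float, float]  # blob centroid (x, y) in pixels


def missing_blobs(reference: list[Point], current: list[Point],
                  tolerance: float = 5.0) -> list[Point]:
    """Reference positions with no current blob within `tolerance` pixels.

    Greedy one-to-one matching: each current blob can explain at most one
    reference position.
    """
    unmatched = list(current)
    missing = []
    for ref in reference:
        nearest = min(unmatched, key=lambda p: math.dist(p, ref), default=None)
        if nearest is not None and math.dist(nearest, ref) <= tolerance:
            unmatched.remove(nearest)
        else:
            missing.append(ref)
    return missing


layout = [(10.0, 10.0), (30.0, 10.0), (50.0, 10.0)]  # the fixed arrangement
print(missing_blobs(layout, [(10.0, 11.0), (50.0, 9.0)]))  # [(30.0, 10.0)]
print(missing_blobs(layout, [(10.0, 11.0), (38.0, 10.0), (50.0, 9.0)]))  # [(30.0, 10.0)]
```

The second call shows the jostling problem: the middle berry was only nudged 8 pixels and nothing was taken, yet it is still reported as missing.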
pretty sure items on a shop shelf are in the same arrangement each day
That’s not necessarily true-- in fact, two similarly packaged items that are otherwise different might actually be harder to tell apart when packaged.
which is why just walk out also had rfid tokens on all their products
you can’t do that with a strawberry unless you like your fruit crunchy
They had RFID? Yeah that seems like a superior option in most cases (some produce being an obvious exception)
at this point it feels like you’re deliberately misinterpreting me
if “just track which one disappeared” was a valid solution to the problem, jwo wouldn’t have failed
No, facial recognition works (unfortunately), it’s just not good enough to look at an entire shopping cart and know what’s in it lol
Good point.