Duplicate entries when using pagination

Simon · April 4, 2022, 7:01am

I’m getting duplicate entries after retrieving all the records for a given contract address using the /NFT/{address}/owners or /NFT/{address} endpoints. Both endpoints return 104399 entries, but after I pull and query them I’m finding roughly 5000 dupes. Can someone verify? I think the moralis sync may need to be run again or something. I’m expecting only unique token_ids.

The contract address is: 0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a

Here are a few example token_ids: ‘13554’, ‘154410’, ‘32997’, etc.

Thanks

cryptokid · January 10, 2022, 9:02pm

it looks like I got 104396 unique token ids

we know that duplicates show up sometimes, and that occasionally a few token_ids may be missing

we are working on fixing this problem

Simon · January 10, 2022, 10:10pm

That’s interesting. The web admin interface is showing 104399 entries (https://admin.moralis.io/web3Api) which is also what I’m getting from the api endpoint. I wonder why you would be seeing 104396? I’m also wondering why you’re seeing all unique token_ids. I’ve definitely pulled all the entries a few times and I’m getting the dupes each time. Something is off. Did you query for some of those specific tokens? (‘13554’, ‘154410’, ‘32997’). Do you know if that field is set as unique in your table?

cryptokid · January 10, 2022, 10:14pm

I did a second test earlier and I got only 2 duplicates: 143583, 48312 and 104397 unique token_ids

I used this endpoint: https://deep-index.moralis.io/api/v2/nft/0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a/owners?chain=eth&format=decimal&order=

Simon · January 11, 2022, 1:25am

The web interface is now showing a total of 104390 entries for 0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a on the /nft/{address}/owners. That’s less than what both of us got earlier. Any ideas on what is going on?

I’ve refactored my script multiple times but still get duplicates. They may not exist in your DB but I think the API is feeding them out somehow. I suspect the offset could be the issue. I’d be willing to send you my script to take a look at. It essentially calls the endpoint and decrements the offset (starting from the total number of entries) by 500 until the local db count is equal to the remote count.
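For anyone following along, the offset walk described above might look roughly like the sketch below. It only generates the (offset, limit) pairs and makes no API calls; the function name and the clamping of the last page to offset 0 are assumptions, not the actual script.

```python
from typing import Iterator, Tuple


def descending_offsets(total: int, page_size: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs from the highest offset down to 0."""
    upper = total  # exclusive upper bound of the rows not yet requested
    while upper > 0:
        offset = max(upper - page_size, 0)
        yield offset, upper - offset  # the final page may be shorter than page_size
        upper = offset


# 104399 is the total reported earlier in this thread.
pages = list(descending_offsets(104399))
first_page, last_page = pages[0], pages[-1]
```

Each pair would be passed as the offset and limit of one request. The walk only covers every row exactly once if the server-side order and the total stay fixed for the whole run.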

cryptokid · January 11, 2022, 8:30am

I usually increment the offset and not decrement it.

We reran a sync for that specific contract in the meantime, which may explain the lower number this time.

Simon · January 16, 2022, 2:20pm

I’ve done some further investigation. I wrote a script that calls the /nft/{address}/owners endpoint recursively, decrementing the offset by 500 until it reaches 0, and stores each record locally along with the offset position for that record.

On the first run through, I found approx 153 duplicate entries out of the 105347 entries for the given contract.

I truncated my local db and tried again and found different duplicates than the first time. Here’s an example of the tokens and the corresponding offset position the entry was found at.

TokenID Offset
29297 55847
29297 55347

42063 54347
42063 53847

31523 54347
31523 53847

I then ran a single api call on those specific offset positions. Here was the result:

29297 55847 -> not found
29297 55347 -> found

42063 54347 -> not found
42063 53847 -> found

31523 54347 -> not found
31523 53847 -> found

My conclusion, consistent with your earlier check, is that these duplicates do not exist in your table but are possibly being introduced by whatever mechanism you have in place serving the api results. I believe my server received responses with those token ids at those offset positions.
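The bookkeeping behind this (each record stored together with the offset of the page it arrived in) can be sketched as below. The helper name is hypothetical, the first six rows are the token/offset pairs listed above, and the last row is a made-up non-duplicate added only for contrast.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def offsets_by_duplicate_token(rows: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each token_id seen more than once to the offsets it was seen at."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for token_id, offset in rows:
        seen[token_id].append(offset)
    return {token: offsets for token, offsets in seen.items() if len(offsets) > 1}


rows = [
    ("29297", 55847), ("29297", 55347),
    ("42063", 54347), ("42063", 53847),
    ("31523", 54347), ("31523", 53847),
    ("1", 0),  # hypothetical token seen once, not from the thread
]
duplicates = offsets_by_duplicate_token(rows)
```

In all three cases the two offsets are exactly 500 apart, i.e. the same token came back in two consecutive pages.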

If the data processing error (call or insert) was on my side, I would expect a more uniform set of errors. These duplicates appear to occur randomly and to come from the server (moralis).

I’ll continue to investigate on my side and keep you updated on any new tests or findings. An ‘order by’ param on this endpoint would make debugging much easier.

Also, I decrement the offset because it appears that the higher the offset number, the older the result set. Pulling the oldest first allows for a cleaner delta going forward after the initial sync.

Simon · January 16, 2022, 2:26pm

Another thought occurred to me. It’s possible that the total count of entries increases on the moralis side while my sync is running and that my offset needs to adjust for this. I would think this shouldn’t be a factor if the default order by for the endpoint is by block_number_minted.DESC.
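To reason about this, here is a toy model of it. It assumes a newest-first order, that a new token always lands at offset 0, and that the client computes its offsets from the total captured at the start. None of that is confirmed for the real endpoint, and the list below is just an in-memory stand-in for the server.

```python
from typing import List


def sync_descending(server: List[str], page_size: int, mint_between_pages: bool) -> List[str]:
    """Walk offsets from high to low, optionally minting a token between requests."""
    collected: List[str] = []
    upper = len(server)  # total captured once, before the first request
    while upper > 0:
        offset = max(upper - page_size, 0)
        collected.extend(server[offset:upper])
        upper = offset
        if mint_between_pages and upper > 0:
            server.insert(0, f"new-{upper}")  # newest row goes to the front
    return collected


tokens = [f"t{i}" for i in range(9, -1, -1)]  # t9 is the newest, t0 the oldest
quiet = sync_descending(list(tokens), 5, mint_between_pages=False)
busy = sync_descending(list(tokens), 5, mint_between_pages=True)
missed = sorted(set(tokens) - set(busy))
```

Under these assumptions a growing total makes a descending walk skip a row (here `t5`) rather than repeat one; in the same toy model an ascending walk would see a repeat at the page boundary instead. So a changing count could matter even with a DESC order, though it would not by itself explain repeats on a descending walk.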

Again, an order by param on the endpoint would allow an easier debug.

cryptokid · January 16, 2022, 2:38pm

Hi, it looks like there is a problem with the pagination mechanism. The data in the database is correct, but when paginating I can get random duplicates from one run to another. We will investigate. There is an order, but it seems that it is not on a unique element.
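A small simulation of how ordering on a non-unique element can produce this. The rows, the block numbers, and the way ties flip between requests are all invented for illustration; this is not a description of the actual backend.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (token_id, block_number)

ROWS: List[Row] = [(1, 100), (2, 100), (3, 100), (4, 100), (5, 101), (6, 101)]


def page(rows: List[Row], offset: int, limit: int, request_no: int, tiebreak: bool) -> List[Row]:
    """Return one page, newest block first."""
    if tiebreak:
        # Unique sort key: block number, then token_id.
        ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    else:
        # Sort key is the block only; rows sharing a block come back in a
        # different order on odd-numbered requests (simulated instability).
        flip = request_no % 2 == 1
        ordered = sorted(rows, key=lambda r: (-r[1], -r[0] if flip else r[0]))
    return ordered[offset:offset + limit]


def fetch_all(rows: List[Row], limit: int, tiebreak: bool) -> List[int]:
    token_ids: List[int] = []
    for request_no, offset in enumerate(range(0, len(rows), limit)):
        token_ids.extend(t for t, _ in page(rows, offset, limit, request_no, tiebreak))
    return token_ids


unstable = fetch_all(ROWS, limit=3, tiebreak=False)
stable = fetch_all(ROWS, limit=3, tiebreak=True)
```

Without the tiebreaker, token 1 is returned twice and token 4 never appears, even though the underlying rows are all unique. Adding a unique column to the sort key makes the pages line up.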

Simon · January 20, 2022, 12:57am

Any updates on this?

cryptokid · January 20, 2022, 8:04am

we are working on fixing it

NomadDev · January 24, 2022, 3:27am

Hi, I think the issue I just posted is likely related. I am mentioning it here so both can be solved at the same time.

YosephKS · January 24, 2022, 3:30am

Yeah, looks like the same issue. I just saw your thread recently; we’re working on fixing it

Simon · March 22, 2022, 10:46pm

It’s been almost 2 months. Any update on this? You’d think a company focused on data would prioritize such a glaring issue as a broken pagination system. You can imagine how many people are getting duplicates or inaccurate data and have no idea why and how many other projects are at a standstill until this issue is addressed.

cryptokid · March 23, 2022, 11:02am

we are working on a new backend infrastructure now

cryptokid · March 23, 2022, 12:07pm

did you try to use the cursor?
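For reference, a cursor-based loop generally has the shape sketched below. The `result` and `cursor` field names are assumptions about the response body, and `fake_fetch_page` is an in-memory stand-in so the sketch runs without any network call. A real cursor is an opaque string, not an index.

```python
from typing import Callable, List, Optional


def fetch_all_with_cursor(fetch_page: Callable[[Optional[str]], dict]) -> List[str]:
    """Keep requesting pages until the response carries no further cursor."""
    results: List[str] = []
    cursor: Optional[str] = None
    while True:
        body = fetch_page(cursor)
        results.extend(body["result"])
        cursor = body.get("cursor")
        if not cursor:
            return results


DATA = [str(i) for i in range(7)]  # hypothetical token ids


def fake_fetch_page(cursor: Optional[str], limit: int = 3) -> dict:
    """Stand-in for the HTTP call; here the cursor is just the next start index."""
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(DATA) else None
    return {"result": DATA[start:end], "cursor": next_cursor}


token_ids = fetch_all_with_cursor(fake_fetch_page)
```

The point of a cursor is that the server decides where the next page starts, so the client does not depend on offsets staying stable between requests.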

Simon · April 3, 2022, 11:53pm

Yeah, I tried the “cursor” that you guys put in after taking away the “sort by” option and it didn’t work. The system for pulling data via API is fundamentally broken and it doesn’t seem to be much of a priority for you guys. As a data service provider, I would think this would gain much more attention. Virtually every project being built utilizing your API service will not work correctly or give false or missing data. Are you aware of how this will impact new devs jumping into your ecosystem? There’s no disclaimer, no email notice, nothing. Plenty of moralis emails and youtube videos on using the javascript calls though. Could this be because your model relies on monetizing devs per call? I mean, an API call would allow people to actually retrieve, store and manipulate data on their servers so I could see how this is directly opposed to your pay. Sorry, I guess I’m just really at a loss for how such a gaping flaw in your system just sits in place, frustrating people that don’t spend the 3 weeks (as I did) writing tests (remotely) to pinpoint how your system is failing.

Please forward this to your highest dev or manager. I feel like your responses are just stopgap replies. …did you try the cursor? Seriously?

alex · April 4, 2022, 12:49am

Is this only an issue with this endpoint or contract? I haven’t noticed any duplicates in my queries but I’m not working with anywhere near the same amount of data.

Simon · April 4, 2022, 2:03am

Start at the top of this thread

cryptokid · April 4, 2022, 6:58am

I forwarded this problem to the team, with that particular contract with ~100k token ids