Duplicate entries when using pagination

Iā€™ve done some further investigation. I wrote a script that calls the /nft/{address}/owners endpoint repeatedly, decrementing the offset by 500 until it reaches 0, and stores each record locally along with the offset position at which it was found.
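For reference, hereā€™s a rough sketch of what the script does. Itā€™s not the exact code I ran; the offset/limit query parameter names and the save_record helper are placeholders standing in for my local db insert:

import time
import requests

API_URL = "https://deep-index.moralis.io/api/v2/nft/{address}/owners?chain=eth&format=decimal"
HEADERS = {"Content-Type": "application/json", "X-API-Key": "API_KEY"}

def save_record(token_id, offset):
    # placeholder: insert (token_id, offset) into the local db
    pass

def sync_owners(address, total, page_size=500):
    # start at the highest offset and walk down so the oldest entries come first
    offset = ((total - 1) // page_size) * page_size
    while offset >= 0:
        url = API_URL.format(address=address) + "&offset=%d&limit=%d" % (offset, page_size)
        data = requests.get(url, headers=HEADERS).json()
        for row in data.get("result", []):
            save_record(row["token_id"], offset)
        offset -= page_size
        time.sleep(1.1)  # stay under the rate limit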

On the first run through, I found approx 153 duplicate entries out of the 105347 entries for the given contract.
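The duplicate count comes from a check along these lines over the locally stored (token_id, offset) records (a sketch, not my exact code):

from collections import Counter

def find_duplicates(records):
    # records: list of (token_id, offset) pairs pulled from the local db
    counts = Counter(token_id for token_id, _ in records)
    dupes = {tid: n for tid, n in counts.items() if n > 1}
    for tid in dupes:
        offsets = [off for t, off in records if t == tid]
        print(tid, offsets)   # e.g. 29297 [55847, 55347]
    return dupes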

I truncated my local db and tried again, and found different duplicates from the first run. Hereā€™s an example of the tokens and the corresponding offset positions at which each entry was found.

TokenID Offset
29297 55847
29297 55347

42063 54347
42063 53847

31523 54347
31523 53847

I then ran single API calls at those specific offset positions. Here were the results:

29297 55847 -> not found
29297 55347 -> found

42063 54347 -> not found
42063 53847 -> found

31523 54347 -> not found
31523 53847 -> found

My conclusion, along with your checking from earlier, is that these duplicates do not exist in your table but are possibly being introduced by whatever mechanism feeds the API results. I believe my server received responses with those token IDs at those offset positions.

If the data processing error (call or insert) were on my side, I would expect a more uniform set of errors. These duplicates appear to occur randomly and to come from the server (Moralis).

Iā€™ll continue to investigate on my side and keep you updated on any new tests or findings. An ā€˜order byā€™ param on this endpoint would make debugging much easier.

Also, I decrement the offset because it appears that the higher the offset number, the older the result set. Pulling the oldest first allows for a cleaner delta going forward after the initial sync.

Another thought occurred to me. Itā€™s possible that the total count of entries increases on the Moralis side while my sync is running and that my offset needs to adjust for this. I would think this shouldnā€™t be a factor if the default order for the endpoint is block_number_minted DESC.
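If growth during the sync does shift earlier records toward higher offsets (which would be the case if new entries appear at the front of the DESC-ordered results; that part is an assumption), the compensation would look roughly like this:

def adjusted_offset(intended_offset, total_at_start, total_now):
    # new entries inserted at the front push everything not yet fetched
    # toward higher offsets, so shift the fetch window by the growth
    return intended_offset + (total_now - total_at_start)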

Again, an order by param on the endpoint would allow an easier debug.

Hi, it looks like there is a problem with the pagination mechanism. The right data is in the database, but when paginating you can get random duplicates from one run to another. We will investigate. There is an order, but it seems that it is not on a unique element.
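To illustrate why ordering on a non-unique column can cause this (a self-contained toy example, not the actual backend query): rows that share the same sort key can come back in a different relative order on each request, so a row can land on two different pages while another is skipped.

import random

# toy rows: (block_number, token_id); block_number is the non-unique sort key
rows = [(100, i) for i in range(10)]

def page(offset, limit):
    # each request may order rows with equal sort keys differently
    shuffled = sorted(rows, key=lambda r: (r[0], random.random()))
    return shuffled[offset:offset + limit]

seen = page(0, 5) + page(5, 5)                    # two consecutive "page" requests
print(sorted(token_id for _, token_id in seen))   # duplicates and gaps can appear
# adding a unique tie-breaker, e.g. key=lambda r: (r[0], r[1]), makes paging stable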

Any updates on this?

we are working on fixing it

Hi, I think the issue I just posted is likely related. Iā€™m putting it here so both can be solved at the same time.

Yeah, looks like the same issue; I just saw your thread recently. Weā€™re working on fixing it :raised_hands:


Itā€™s been almost 2 months. Any update on this? Youā€™d think a company focused on data would prioritize such a glaring issue as a broken pagination system. You can imagine how many people are getting duplicate or inaccurate data with no idea why, and how many other projects are at a standstill until this issue is addressed.

we are working on a new backend infrastructure now

did you try to use the cursor?

Yeah, I tried the ā€œcursorā€ that you guys put in after taking away the ā€œsort byā€ option, and it didnā€™t work. The system for pulling data via the API is fundamentally broken, and it doesnā€™t seem to be much of a priority for you guys. As a data service provider, I would think this would get much more attention. Virtually every project built on your API service will either not work correctly or end up with false or missing data. Are you aware of how this will impact new devs jumping into your ecosystem? Thereā€™s no disclaimer, no email notice, nothing. Plenty of Moralis emails and YouTube videos on using the JavaScript calls, though. Could this be because your model relies on monetizing devs per call? I mean, an API call would allow people to actually retrieve, store and manipulate data on their own servers, so I can see how this is directly opposed to your pay-per-call model. Sorry, I guess Iā€™m just really at a loss for how such a gaping flaw in your system just sits in place, frustrating people that donā€™t spend the 3 weeks (as I did) writing tests (remotely) to pinpoint how your system is failing.

Please forward this to your highest dev or manager. I feel like your responses are just stopgap replies. ā€¦did you try the cursor? Seriously?

Is this only an issue with this endpoint or contract? I havenā€™t noticed any duplicates in my queries but Iā€™m not working with anywhere near the same amount of data.

Start at the top of this thread

I forwarded this problem to the team, along with that particular contract with ~100k token IDs

I tested with the cursor now and it seems to work fine:

import requests
import time

# collect unique token ids across all pages
ids = {}

def get_nft_owners(offset, cursor):
    # the offset is only printed for logging; pagination itself is driven by the cursor
    print("offset", offset)
    url = 'https://deep-index.moralis.io/api/v2/nft/0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a?chain=eth&format=decimal'
    if cursor:
        # pass the cursor returned by the previous call to get the next page
        url = url + "&cursor=%s" % cursor

    print("api_url", url)
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "API_KEY"
    }
    statusResponse = requests.request("GET", url, headers=headers)
    data = statusResponse.json()
    try:
        print("nr results", len(data['result']))
        for x in data['result']:
            ids[int(x['token_id'])] = 1
    except (KeyError, TypeError, ValueError):
        # no usable 'result' in the response (e.g. an error payload): dump it and stop
        print(repr(data))
        print("exiting")
        raise SystemExit

    cursor = data['cursor']
    print(data['page'], data['total'])
    return cursor


cursor = None
for j in range(0, 211):  # 211 pages of 500 cover the ~105k token ids
    print("nr unique token_ids at offset", j * 500, "=>", len(ids))
    cursor = get_nft_owners(j * 500, cursor)
    print()
    time.sleep(1.1)  # stay under the rate limit


print("nr unique token_ids", len(ids))

There still seems to be an issue with pagination (or general data queries). I wrote a script to call the ā€œ{address}/nft/transfersā€ endpoint, store the results, store the cursor, and repeat until my local db has the same ā€œtotalā€ as the Moralis endpoint reports. This works (most of the time), but when I call the endpoint hourly to look for new entries, they donā€™t show up in the results.
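The sync loop is roughly the following (a sketch rather than my exact code; the endpoint path is written the way itā€™s referenced above, and the in-memory transfers list and last_cursor variable stand in for my local db and the stored cursor):

import requests
import time

ADDRESS = "0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a"
BASE_URL = "https://deep-index.moralis.io/api/v2/%s/nft/transfers?chain=eth&format=decimal" % ADDRESS
HEADERS = {"Content-Type": "application/json", "X-API-Key": "API_KEY"}

transfers = []        # stand-in for the local db table
last_cursor = None    # stand-in for the persisted cursor

def sync_transfers(cursor=None):
    global last_cursor
    while True:
        url = BASE_URL + ("&cursor=%s" % cursor if cursor else "")
        data = requests.get(url, headers=HEADERS).json()
        transfers.extend(data.get("result", []))   # store the page locally
        cursor = data.get("cursor")
        last_cursor = cursor                       # persist so the hourly cron can resume
        # stop when the local count matches the remote total or the cursor runs out
        if not cursor or len(transfers) >= data.get("total", 0):
            break
        time.sleep(1.1)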

When I initially ran the script, it showed a final total of 333050 entries. I stored the cursor and set up the hourly cron. Within a day, the remote total was higher than my local total, but the API results are empty when using the last given cursor. You can test this yourself using the cursor and endpoint below:

https://deep-index.moralis.io/api-docs/#/account/getNFTTransfers

The last cursor received:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJvcmRlciI6IkRFU0MiLCJvZmZzZXQiOjMzMzUwMCwibGltaXQiOjUwMCwidG9rZW5fYWRkcmVzcyI6IjB4NTBmNTQ3NDcyNGUwZWU0MmQ5YTRlNzExY2NmYjI3NTgwOWZkNmQ0YSIsInBhZ2UiOjY2Nywid2hlcmUiOnt9LCJrZXkiOiI5MDQ4OTM2LjEzNS4xMzUuMCIsImlhdCI6MTY1MTE2NDQxOX0.oJ4qlZ4vejZPF9gfuT58Z1vjH1jy_jfGhCjm_apn0fg

The Contract Address:
0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a

array:6 [
  "total" => 334981
  "page" => 667
  "page_size" => 500
  "cursor" => ""
  "result" => []
  "block_exists" => true
]

You can see that it says page 667. We know that the max result set size is 500, so 667 x 500 = 333500, but the total shown above is 334981.
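As an aside, the cursor itself can be inspected while debugging: it appears to be a standard JWT, so the middle dot-separated segment is base64url-encoded JSON (this assumes the cursor format stays a plain JWT):

import base64
import json

def decode_cursor(cursor):
    # the payload is the middle of the three dot-separated JWT segments
    payload = cursor.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# for the cursor quoted above this returns order DESC, offset 333500,
# limit 500, page 667 and the contract address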

Something appears to be wrong (still) with the moralis api cursor system.

You donā€™t have to store the cursor; you will get a new cursor every time you start the query again without one.

So when I get to the end and there are no more responses, I can call the endpoint again without a cursor and it will reply with any new results?

Cursor is used to get the next page of results. Omitting it will start your query from the beginning or the first page.

It wonā€™t give you new results compared to your previous queries; you are just calling it again, which may or may not return different results or data.

Right. Thatā€™s what I thought. Iā€™m trying to get the newest entries. So, I get all the entries, an hour goes by, new entries are added (on the Moralis side), and I want to get those new entries.
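Since the default order is newest-first and omitting the cursor restarts at the first page, one way to pick up only the new entries is to page from the start each hour and stop as soon as an already-stored row shows up. A sketch, assuming each transfer can be identified locally by its transaction_hash and log_index (that identity is an assumption):

import requests
import time

ADDRESS = "0x50f5474724e0ee42d9a4e711ccfb275809fd6d4a"
BASE_URL = "https://deep-index.moralis.io/api/v2/%s/nft/transfers?chain=eth&format=decimal" % ADDRESS
HEADERS = {"Content-Type": "application/json", "X-API-Key": "API_KEY"}

def fetch_new_transfers(seen_keys):
    # seen_keys: set of (transaction_hash, log_index) pairs already stored locally
    cursor = None
    new_rows = []
    while True:
        url = BASE_URL + ("&cursor=%s" % cursor if cursor else "")
        data = requests.get(url, headers=HEADERS).json()
        for row in data.get("result", []):
            key = (row.get("transaction_hash"), row.get("log_index"))
            if key in seen_keys:
                return new_rows            # reached rows that are already stored
            new_rows.append(row)
        cursor = data.get("cursor")
        if not cursor:
            return new_rows                # ran out of pages
        time.sleep(1.1)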