4

Paging and sorting large ammount of data ?

asked 2013-08-01 07:24:38 +0800

sitansu gravatar image sitansu
2254 13
http://java91.blogspot.in...

updated 2013-11-06 15:10:04 +0800

I would like some advice on how to tackle paging and sorting through large amounts of data.

At the moment, I retrieve all our user data and stick it in a listbox. This is great for now, but when we have 100 000 users, I don't want to populate a listbox with 100 000 users, nor have a collection on the server with 100 000 users on it.

What would you recommend is a good approach to do this?

Should I keep a ConnectedRecordset and pass that to the Listbox. Does ZK have the capability to manage the data and the connection?

~ OR ~

Would it be better to manually look for page events, and repopulate the listbox with the second set of data, only pulling say 10 rows at a time?

Also, when I want to allow a user to sort by a column, then surely one needs to requery the data from the database.

What is the correct way to challenge this? Please offer advice or links to articles.

Thanks,

delete flag offensive retag edit

2 Answers

Sort by ยป oldest newest most voted
1

answered 2013-11-07 15:27:05 +0800

chillworld gravatar image chillworld flag of Belgium
5367 4 9
https://github.com/chillw...

updated 2013-11-21 06:48:05 +0800

I'm working on a easy to use abstractBigListModel where you can sort and page at the same time. The stuff works great but the problem is that the listbox is populated with empty objects (difference totalsize and size page). This is still bad implementation so I don't post the code yet here. If you are working with mvvm you can user paging and mvvm like this in zk :

< paging pageSize="@load(vm.pageSize)" totalSize="@load
(vm.totalSize)"activePage="@save(vm.activePage)" detailed="true"/>
< listbox id="loggingslb" model="@load(vm.logging)" width="100%" multiple="true" >

and in your vm :

private int pageSize = 25; (with the getters and setters)   
private int activePage = 0;
private String sortField="date";
private Sort.Direction sortDirection = Sort.Direction.DESC;

public long getTotalSize() {
    return fooDAO.count(QueryObject.getLoggingPredicate());
}

public List<UrlEntry> getLogging() {
    return Lists.newArrayList(fooDAO.findAll(QueryObject.getPredicate(), new PageRequest(activePage, pageSize, new Sort(sortDirection,sortField))));
}

the queryObject is a simple class that creates for me a predicate of what people want to search (the variablen are binded in mine vm, but you don't need the predicate if you do a findAll(PageRequest) ):

 private String account="";
 private Date beginDate;
 private Date endDate;
 private String description="";

 public Predicate getPredicate() {
    BooleanBuilder booleanBuilder = new BooleanBuilder();
    if(StringUtils.hasText(account)) {
         booleanBuilder.and(QUrlEntry.urlEntry.account.toUpperCase().like("%"+account.toUpperCase()+"%"));
    }
    if (beginDate!=null) {
        booleanBuilder.and(QUrlEntry.urlEntry.date.after(new DateMidnight(beginDate).toDate()));
    }
    if (endDate!=null) {
        booleanBuilder.and(QUrlEntry.urlEntry.date.before(new DateMidnight(endDate).plusDays(1).toDate()));
    }
    if(StringUtils.hasLength(description)) {
        booleanBuilder.and(QUrlEntry.urlEntry.url.toUpperCase().like("%"+description.toUpperCase()+"%"));
    }
    return booleanBuilder.getValue();
}

I hope this helps you.

Greetz chill.

edit :

I have changed the code a littl ebit for more performance (query of totalsize can take a while and that info is also in the paging, and take note of the feedback):

private int totalSize = 0;
private ListModelList listModelList = new ListModelList();

public ListModelList<UrlEntry> getLogging() {
    logger.debug("getting new page");
    listModelList.clear();
    Page page = urlEntryDAO.findAll(loggingVmQueryObject.getLoggingPredicate(), new PageRequest(activePage, pageSize, new Sort(sortDirection,sortField)));
    totalSize = (int)page.getTotalElements();
    listModelList.addAll(Lists.newArrayList(page.iterator()));
    BindUtils.postNotifyChange(null, null, this, "totalSize");
    return listModelList;
}

public long getTotalSize() {
    return totalSize;
}
link publish delete flag offensive edit

Comments

good idea, only be aware, that even if you are returning a list, it will still create a ListModelList internally. to avoid this small overhead, you could implement your own ListModel. Luckily the selections from a previous "page" are copied over to the new list model, whenever you assign a new page.

cor3000 ( 2013-11-20 11:37:59 +0800 )edit

thx cor3000

chillworld ( 2013-11-20 12:12:49 +0800 )edit
1

answered 2013-11-12 23:55:38 +0800

msrednsi gravatar image msrednsi
106 2

updated 2013-11-20 08:50:41 +0800

cor3000 gravatar image cor3000
6280 2 7

What i'm testing is a list that responds to get(index) by fetching a group of records from the database. I initialize the list with the size of the results (only count), a batch size (virtual page) and the query to perform. Then on get(i) i translate the index to a virtual page and the index inside that page, i fetch the records for that virtual page: a sql fetch with a offset and limit clauses; and i return the item at index i. The zk model knows nothing about the list, it is only a list of items; its clear that you have to use pagination on zk and the list model HAS NOT TO COPY the original list, but reuse it, then you can paginate big, big, big, big data from db; zk has to do nothing.

For example, i set ZK' page to 25 items, and the list fetches 200 items at a time (internal pages or batch) from database, by default it also stores 5 pages=1000 items max. in memory. So every 4 zk pages a fetch to db. is launched.

May happen that somebody delete some records, then a get(i) return null, be prepared!.

Note: I work with an ORM so: - i do a count distinct(pk) - i fetch first the pks - then the objects

It's important to do distinct for my case.

Here is my source i hope you can use it in some way:

package com.smartse.cayenne;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.NoSuchElementException;

import org.apache.cayenne.CayenneException;
import org.apache.cayenne.CayenneRuntimeException;
import org.apache.cayenne.access.DataContext;
import org.apache.cayenne.access.QueryLogger;
import org.apache.cayenne.access.ResultIterator;
import org.apache.cayenne.exp.ExpressionFactory;
import org.apache.cayenne.map.DbAttribute;
import org.apache.cayenne.map.ObjEntity;
import org.apache.cayenne.query.QueryMetadata;
import org.apache.cayenne.query.SelectQuery;
import org.apache.cayenne.util.Util;

/**
 * <p>A paginated list that holds pages of objects in memory and retrieves those not loaded on
 * demand. The loaded pages are cached in a linked hash map so it will hold a certain
 * number of pages, the minimum is 1 page, the max can be -1 so it becomes unlimited (old are discarded)</p>
 * 
 * <p>Things get complicated when two factors arise:</p>
 * <ul>
 * <li>A distinct clause is mandatory to avoid duplicated rows and so the number objects returned and rows retrieved be equal.</li>
 * <li>But some databases (oracle) does not like doing distinct on blob fields. For this reason, CAY do not use it and them the LIMIT, OFFSET sqls
 * clauses are not used. We can not paginate</li>
 * </ul>
 * 
 * <p>The solution that i will apply is to count distinct( pk_key ), and the retrive the data in two steps: first the pk keys, then the objects themselves included
 * in that pks: adding a in qualifier to the one on the query</p>
 * 
 * <p>Another solution would have been trying to override some db adaptors (posgres, h2, sqlserver) that supports distinct is blobs. We oracle would have been
 * taken out a the override is not easy (to much proctected/package visibility stuff involved)
 * 
 * <p>The list is readonly, also any method that implies load all the query in memory will fail, like indexOf, sublist, etc. 
 * Some of them could be implemented, it will do on demand</p>
 * 
 * <p>Note: Consider the number of pages kept and the index being accessed when iterating this list in parallel</p>  
 * 
 * @author manuel
 *
 * @param <E> the item to hold
 */
public class PaginatedList<E> implements List<E> {

    /**
     * Default number of pages to store
     */
    protected static final int DEFAULT_PAGES= 5;

    protected int pageSize;
    protected int pages;
    protected int size;
    protected Map<Integer, List<E>> elements;
    protected CAYObjectContext oc;
    protected ObjEntity rootEntity;
    protected SelectQuery query;

    protected DbAttribute pk;

    /**
     * Returns the page size, it was read from the query
     * 
     * @return the page size
     */
    public int getPageSize() {
        return pageSize;
    }



    /**
     * 
     * @return the number of pages this paginated list could have
     */
    public int getPages() {
        return pages;
    }

    /**
     * You can change the number of stored pages
     * @param p
     */
    public void setPages(int p){
        this.pages= p;
    }

    /**
     * 
     * @return used o.c.
     */
    public CAYObjectContext getOc() {
        return oc;
    }

    /**
     * 
     * @return the query we want to paginate and will use to fetch real objects
     */
    public SelectQuery getQuery() {
        return query;
    }



    /**
     * Creates a new list using a given ObjectxContext and query. Uses default number of pages. The query
     * is modified and used internally. Forget it.
     * 
     * @param oc ObjectxContext used by the list to fetch missing pages
     * @param query Main query used to retrieve data. Must have "pageSize" property set to
     *            a value greater than zero. This query is retained here and used during operation
     */
    public PaginatedList(CAYObjectContext oc, SelectQuery query) {
        this(oc, query, DEFAULT_PAGES);
    }

    /**
     * Creates a new list using a given ObjectxContext and query. The query is executed to compute its max size. Later only
     * on demand actions will query the database. The query
     * is modified and used internally. Forget it.
     * 
     * @param oc ObjectxContext used by the list to fetch missing pages
     * @param query Main query used to retrieve data. Must have "pageSize" property set to
     *            a value greater than zero. This query is retained here and used during operation
     * @param number of pages to store, older ones will be discarded. Valid from 0, store nothing to
     * -1, retain all pages in memory. A null value will use default value
     */
    public PaginatedList(CAYObjectContext oc, SelectQuery query, Integer pages) {

        QueryMetadata metadata = query.getMetaData(oc.getEntityResolver());
        if (metadata.getPageSize() <= 0) {
            throw new CayenneRuntimeException("Not a paginated query; page size: "
                    + metadata.getPageSize());
        }

        this.rootEntity= metadata.getObjEntity();

        Collection<DbAttribute> pks = rootEntity.getDbEntity().getPrimaryKeys();
        if (pks.size() != 1) {
            throw new IllegalArgumentException(
                    "Expected a single column primary key, instead got "
                            + pks.size()
                            + ". ObjEntity: "
                            + rootEntity.getName());
        }

        this.pk = pks.iterator().next();

        this.oc = oc;
        this.pageSize = metadata.getPageSize();

        this.query = query;
        //this.query.setPageSize(0);//noops must have it, or it will issue a select all columns and it will not do select of ids (pks)
        this.query.setFetchLimit(0);
        this.query.setFetchOffset(0);

        this.pages= pages == null ? DEFAULT_PAGES : pages;

        //compute sizes 
        this.size= oc.countPk(query);

        this.elements = Collections.synchronizedMap(new LinkedHashMap<Integer, List<E>>(this.pages));
    }


    /**
     * Resolves a virtual page
     */
    protected List<E> resolveInterval(int pageIndex) {

        List<Object> pks= this.fetchPks(pageIndex);
        //now fetch the real objects

        SelectQuery query = new SelectQuery( this.rootEntity, ExpressionFactory.inDbExp( this.pk.getName(), pks));

        @SuppressWarnings("unchecked")
        List<E> page= this.getObjectContext().getInternalObjectContext().performQuery(query);

        //store it, free old ones 
        this.elements.put( pageIndex, page );
        if( this.pages >= 0  ){
            //delete up to the desired cached pages, as the number can be changed, remove till we are inside limits
            Iterator<Entry<Integer, List<E>>> it=  this.elements.entrySet().iterator();
            while( it.hasNext() && this.elements.size() > this.pages ){
                it.next();
                it.remove();
            }
        }

        return page;
    }

    /**
     * Fetches the page of pks (only the integer value for example). Query fetchOffset and fetch limit are modified to reach the page
     * @param pageIndex
     * @return
     */
    private List<Object> fetchPks(int pageIndex) {

        List<Object> pks= new ArrayList<Object>( this.getPageSize() );

        int pageStartIndex = pageIndex * pageSize;
        this.getQuery().setFetchOffset(pageStartIndex);
        this.getQuery().setFetchLimit( this.pageSize ); //always page size
        this.getQuery().setDistinct(true);

        try { 
            long t1 = System.currentTimeMillis();
            //this cast may fail, but for now is ok: I could write a plain select query i think
            //doing this we bypass the paginated query interceptor
            //but the query must preserve the pageSize so we get pks not objects: but it will not alter the fetch (SelectTranslator:286)
            ResultIterator it = ((DataContext)this.getObjectContext().getInternalObjectContext()).performIteratedQuery( this.getQuery());
            try {

                while (it.hasNextRow()) {
                    pks.add(it.nextRow());
                }

                QueryLogger.logSelectCount(pks.size(), System
                        .currentTimeMillis()
                        - t1);
            }
            finally {
                it.close();
            }
        }
        catch (CayenneException e) {
            throw new CayenneRuntimeException("Error performing query.", Util
                    .unwindException(e));
        }    
        return pks;
    }



    /**
     * Returns zero-based index of the virtual "page" for a given array element index.
     */
    public int pageIndex(int elementIndex) {
        if (elementIndex < 0 || elementIndex > size()) {
            throw new IndexOutOfBoundsException("Index: " + elementIndex);
        }

        if (pageSize <= 0 || elementIndex < 0) {
            return -1;
        }

        return elementIndex / pageSize;
    }

    /**
     * Returns the object context.
     * 
     * @return ObjectContext
     */
    public CAYObjectContext getObjectContext() {
        return this.oc;
    }

    @Override
    public ListIterator<E> listIterator() {
        throw new UnsupportedOperationException();
    }

    @Override
    public ListIterator<E> listIterator(int index) {
        throw new UnsupportedOperationException();
    }

    /**
     * Return an iterator for this list. DataObjects are resolved a page (according to
     * getPageSize()) at a time as necessary - when retrieved with next().
     */
    @Override
    public Iterator<E> iterator() {
        // by virtue of get(index)'s implementation, resolution of ids into
        // objects will occur on pageSize boundaries as necessary.
        return new Iterator<E>() {

            int listIndex = 0;

            public boolean hasNext() {
                return (listIndex < elements.size());
            }

            public E next() {
                if (listIndex >= elements.size())
                    throw new NoSuchElementException("no more elements");

                return get(listIndex++);
            }

            public void remove() {
                throw new UnsupportedOperationException("remove not supported.");
            }
        };
    }

    /**
     * Sorry readonly
     */
    @Override
    public void add(int index, Object element) {

        throw new UnsupportedOperationException();
    }

    /**
     * Sorry readonly
     */
    @Override
    public boolean add(Object o) {
        throw new UnsupportedOperationException();
    }

    /**
     * Sorry readonly
     */
    public boolean addAll(Collection<? extends E> c) {
        throw new UnsupportedOperationException();
    }

    /**
     * Sorry readonly
     */
    public boolean addAll(int index, Collection<? extends E> c) {
        throw new UnsupportedOperationException();
    }

    /**
     * Sorry readonly
     */
    public void clear() {
        throw new UnsupportedOperationException();
    }

    /**
     * Not supported, used other means this will break the usefulness of this object
     */
    @Override
    public boolean contains(Object o) {
        throw new UnsupportedOperationException();
    }

    /**
     * Not supported, used other means this will break the usefullnes of this object
     */
    @Override
    public boolean containsAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    /**
     * Main purpose is to return the requested object by loading the virtual page if not already loaded. 
     * Note that if the size of pages is greater that pages, the first one will be discarded
     * @param index
     * @return
     */
    @Override
    public E get(int index) {
        synchronized (elements) {

            int pageIndex= this.pageIndex(index);
            int pageStartIndex = pageIndex * pageSize;
            int virtualIndex= index- pageStartIndex;

            if( this.elements.containsKey(pageIndex)){
                //return the object
                List<E> page= this.elements.get(pageIndex);

                return this.get(virtualIndex, page);

            }
            //load the page from db
            List<E> page= resolveInterval(pageIndex);
            //return value
            return this.get(virtualIndex, page);
        }
    }

    /**
     * <p>Gets the item by virtual index on the page</p>
     * 
     * <p>Important: if the virtual index is outside page bound we
     * silently return null instead of failing. Objects using this must
     * take this into account because, paginating objects may disappear from
     * d.b. an get less objects that we expect</p> 
     * 
     * @param virtualIndex relative to the page
     * @param page
     * @return object at virtual index
     */
    private E get(int virtualIndex, List<E> page ){

        if(virtualIndex < 0 || virtualIndex > page.size() -1  ){
            return null;
        }

        return page.get( virtualIndex );
    }

    /**
     * Not supported
     */
    @Override
    public int indexOf(Object o) {
        throw new UnsupportedOperationException();
    }

    /**
     * Caution! We will launch a first fetch to answer this
     */
    public boolean isEmpty() {

        return this.size() == 0;
    }

    @Override
    public int lastIndexOf(Object o) {
        throw new UnsupportedOperationException();
    }

    @Override
    public E remove(int index) {
       throw new UnsupportedOperationException();
    }

    @Override
    public boolean remove(Object o) {
        throw new UnsupportedOperationException();
    }

    @Override
    public boolean removeAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    public E set(int index, Object element) {
        throw new UnsupportedOperationException();
    }

    /**
     * @see java.util.Collection#size()
     */
    @Override
    public int size() {
        return this.size;
    }

    /**
     * (non-Javadoc)
     * @see java.util.List#subList(int, int)
     */
    @Override
    public List<E> subList(int fromIndex, int toIndex) { 

        //return a standard list
        List<E> l= new ArrayList<E>( toIndex- fromIndex );
        for( int i= fromIndex; i< toIndex; i++){ //toIndex is exclusive
            l.add( this.get(i) );
        }

        return l;

    }

    public Object[] toArray() {
        throw new UnsupportedOperationException();
    }

    public <T> T[] toArray(T[] a) {
        throw new UnsupportedOperationException();
    }

}
link publish delete flag offensive edit
Your answer
Please start posting your answer anonymously - your answer will be saved within the current session and published after you log in or create a new account. Please try to give a substantial answer, for discussions, please use comments and please do remember to vote (after you log in)!

[hide preview]

Question tools

Follow
1 follower

RSS

Stats

Asked: 2013-08-01 07:24:38 +0800

Seen: 100 times

Last updated: Nov 21 '13

Support Options
  • Email Support
  • Training
  • Consulting
  • Outsourcing
Learn More