Search code examples
coldfusioncfloopcoldfusion-2016

Using ColdFusion (2016) to loop over spreadsheet data and group the output


I apologize in advance for the length of my question! I am trying to automate building a schedule of papers for our event. Paper and author data is provided in a spreadsheet (which my poor colleague currently uses to manually cut and paste, line by line, into a Word doc). This spreadsheet contains all the info I need to build the schedule, in consistently-named columns, but it can be in any order. Sort of like this (but the real paper titles won't be conveniently numbered):

Jack Doe - Co-Author - Penn State University - Aerodynamics - Aerodynamics Paper I    
John Doe - Co-Author - Penn State University - Acoustics - Acoustics Paper I
John Smith - Co-Author - University of VA - Acoustics - Acoustics Paper I
Jane Doe - Main Author - Penn State University - Acoustics - Acoustics Paper I
Bob Smith - Main Author - GA Tech - Acoustics - Acoustics Paper II
Jack Smith - Main Author - University of MD - Acoustics - Acoustics Paper III
Jill Smith - Co-Author - University of MD - Acoustics - Acoustics Paper III
Bob Doe - Main Author - Penn State University - Aerodynamics - Aerodynamics Paper I

My goal is to transmogrify the spreadsheet data such that papers are grouped and ordered by session (i.e. Acoustics, Aerodynamics), then by paper title (i.e. Acoustics Paper I, Acoustics Paper II), then by authors from each university. The catch is that the "main author" for any given paper must be listed first, along with co-authors (if any) from that same school coming next, followed by co-authors from other universities. The other co-authors can be in any order but must also be grouped by university.

So taking the original example, it should come out like this:

ACOUSTICS
Acoustics Paper I
Jane Doe, John Doe, Penn State University; John Smith, University of VA

Acoustics Paper II
Bob Smith, GA Tech

Acoustics Paper III
Jack Smith, Jill Smith, University of MD

AERODYNAMICS
Aerodynamics Paper I
Bob Doe, Jack Doe, Penn State University

I am almost there, but I can only get it to

ACOUSTICS
Acoustics Paper I
Jane Doe, Penn State University; John Doe, Penn State University; John Smith, University of VA;

Acoustics Paper II
Bob Smith, GA Tech;

Acoustics Paper III
Jack Smith, University of MD; Jill Smith, University of MD;

AERODYNAMICS
Aerodynamics Paper I
Bob Doe, Penn State University; Jack Doe, Penn State University;

We are using ACF 2016. What I am doing (my code is below) is reading the spreadsheet into a query object with cfspreadsheet. Then I group the output by session and then by title with nested cfoutputs.

Then, because I could not think of any other way to identify the main author for each paper and put them first, I loop over all authors for that paper and add a flag to identify them, and sort on that with arraySort. Note that I cannot simply sort by author type DESC, because there is another type, "presenting author," which I omitted for brevity (ha). And sometimes the main author can also be the presenting author, so that type would be "main author presenting author."

At any rate, I then loop over the sorted array.

Below is what I have tried so far. I am stuck on getting the university to only show once for each list of authors. I have tried putting another loop in my authorArray loop, but I do not know what to index or loop on, so it just ends up outputting the university name after every author name. I have tried using multidimensional arrays, and even using query of query to try and build a nice, ordered data structure. But I am evidently doing it wrong, because I keep ending up getting stumped by grouping the authors by their university.

I sure would appreciate any tips or hints! Please note that I cannot change the requirement to initially work with this spreadsheet. However, once I get it, I can do anything with the info that I need to get the desired output. So I am entirely open to making any changes or rethinking my entire approach. My code below is the closest I have gotten.

Thank you all very much in advance! Here is what I am using so far:

<cfoutput query="queryPapers" group="PrimarySession">
    #PrimarySession#
    <cfoutput group="Title">
       <p>#Title#</p>
        <cfset authorArray = arrayNew(1)>
        <cfoutput>
            <cfset authorStruct = structNew()>
            <cfset authorStruct.firstName = AuthorFirstName>
            <cfset authorStruct.lastName = AuthorLastName>
            <cfset authorStruct.institution = AuthorInstitution>
            <cfset authorStruct.authorType = AuthorType>

            <cfif findNoCase("Main", AuthorType)>
                <cfset authorStruct.authorMain = "A">
            <cfelse>
                <cfset authorStruct.authorMain = "B">
            </cfif>

            <cfset arrayAppend(authorArray, authorStruct)>

            <cfscript>
                arraySort(
                    authorArray,
                    function (e1, e2) {
                        return compare(e1.authorMain, e2.authorMain);
                    }
                );
            </cfscript>
        </cfoutput>

        <cfloop index="i" from="1" to="#arrayLen(authorArray)#">
            #authorArray[i].firstName# #authorArray[i].lastName#,
            #authorArray[i].institution#;&nbsp;
        </cfloop>
    </cfoutput>
</cfoutput>

Here is some actual output of the above code:

Dynamic Stall Investigations
  Sergey Smith,* University of Maryland;  Tobias Lersdorf, German University;  Pascal Marceau, University of Maryland;  

And I am trying to get to

Dynamic Stall Investigations
  Sergey Smith,* Pascal Marceau, University of Maryland;  Tobias Lersdorf, German University

Thanks very much for reading!


Solution

  • You're on the right track with your code but I think you're overcomplicating it a bit. You can simplify your nested <cfoutput> processing with the snippet below.

    <!--- Nested output loop for displaying required result --->
    <cfoutput query="queryPapers" group="PrimarySession">
        <strong>#Ucase(PrimarySession)#</strong><br />
        <cfoutput group="Title">
            <i>#Title#</i><br />
            <cfoutput group="AuthorInstitution">
                <cfoutput>
                    #AuthorFirstName# #AuthorLastName#,
                </cfoutput>
                #AuthorInstitution#; <!--- display institution once per group --->
            </cfoutput>
            <br /><br /> <!--- double-space after each title group --->
        </cfoutput>
    </cfoutput>
    
    1. The issue you had with displaying the university only once can be accomplished by adding another level of group nesting and displaying it in the footer of the extra nested group.

    2. The issue you have with making sure the Main Author is always first should be handled in your preprocessing. To do that, use your existing if/else logic ("A" for Main Author, "B" otherwise) and add it as an extra column to your query. This way you can resort by using it in your order by clause prior to your output loop.

    UPDATE

    So I realized after posting my first revision that there is a minor logic flaw. The reason it didn't surface is because when using the sample data above, the Main Author conveniently always belonged to University that was alphabetically first in the "Title" group. I realized this after seeing the additional sample output, I added these rows to my code sample and it also incorrectly displays as below.

    Dynamic Stall Investigations
    Sergey Smith, University of Maryland; Tobias Lersdorf, German University;  Pascal Marceau, University of Maryland;
    

    The solution is to use the existing authorMain column (A for main authors, B otherwise) and add another value A2 for non-main authors belonging to the same university as the main author. The tricky part is you have to inspect the value in another row to determine when to set A2. The best solution I could think of is to add the two blocks of code right after adding the initially populated authorMain column.

    <!--- Sort query so "Main Author" is first within PrimarySession and Title --->
    <cfquery name="queryPapers" dbtype="query">
        select * from queryPapers
        order by
            PrimarySession, 
            Title, 
            AuthorMain
    </cfquery>
    
    <!--- Loop through above and update NON "Main-Author" rows "A2" if they have same University as "A" rows  --->
    <cfset MainInstitution = "">
    <cfloop query="queryPapers">
        <cfif queryPapers.authorMain eq "A">
            <cfset MainInstitution = queryPapers.AuthorInstitution>
        <cfelse>
            <cfif MainInstitution eq queryPapers.AuthorInstitution>
                 <cfset QuerySetCell(queryPapers, "authorMain", "A2", queryPapers.currentRow)>
            </cfif>
        </cfif>
    </cfloop>
    

    First sort by PrimarySession, Title and AuthorMain and then loop through the recordset and update rows with A2 if the non-main author is in the same university as the main author by keeping track using the MainInstitution variable. This solution generates the proper result while allowing all other code to remain untouched.

    You can see the difference between the first revision and the second revision of my code which simulates the OP's scenario.