I'm trying to do web scraping with power bi where I'm using the data from the following site:
https://pt.wikipedia.org/wiki/Jogo_do_bicho
After passing the site URL, the data came organized in the following format:
![Screenshot 1][1] [1]: https://i.sstatic.net/HPjE7.png
where the number is an index related to the animal that has its specific thousand, how do I put everything organized in a column with all indices?
I have an example attached:
![Screenshot 2][2] [2]: https://i.sstatic.net/cxWbU.png
I'll try to add detail later but I think this will work:
let
Source = Web.Page(Web.Contents("https://pt.wikipedia.org/wiki/Jogo_do_bicho")){0}[Data],
ToLists = List.Skip(Table.ToColumns(Source),1),
#"Converted to Table" = Table.FromList(ToLists, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
#"Added Custom" = Table.AddColumn(#"Expanded Column1", "Pivot", each if Text.Length([Column1]) = 2 then "Group" else "Animal"),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
#"Integer-Divided Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Integer-Divided Column", List.Distinct(#"Integer-Divided Column"[Pivot]), "Pivot", "Column1"),
#"Split Column by Delimiter" = Table.SplitColumn(#"Pivoted Column", "Animal", Splitter.SplitTextByDelimiter("#(lf)#(cr)", QuoteStyle.Csv), {"Animal", "Values"}),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Animal", Text.Trim, type text}, {"Values", Text.Trim, type text}}),
#"Changed Type" = Table.TransformColumnTypes(#"Trimmed Text",{{"Group", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Group", Order.Ascending}})
in
#"Sorted Rows"
Edit: The key here is to convert the table into a list of columns using Table.ToColumns
. This turns it into a list of lists that we can convert into a table and expand into one long column.
Once all of the columns are stacked into one single column, we want to separate the group id from the details, which we can do in this case by checking the length of the text and defining a custom column that labels each row with a different data category.
With that categorization of the rows in place, we want to pivot that new custom column but we want an index column so it knows what stays together. Add an index column and integer divide by two so you get 0,0,1,1,2,2,3,3,...
so that each pair gets its own unique ID. Now we can finally pivot.
Once pivoted, do any cleanup you feel like, e.g., splitting columns, trimming whitespace, changing column types, removing unneeded columns, and sorting.