Add function to go from json --> hp compatible df? #37

Open
mathidachuk opened this issue Jul 15, 2020 · 8 comments
Labels
question Further information is requested

Comments

@mathidachuk
Member

mathidachuk commented Jul 15, 2020

Based on the org data, I built the following function that converts the JSON data into a hierplane-compatible data frame. I think it may be useful if, say, someone wants to modify JSON tree data as a data frame and then convert it back to JSON + hierplane.

Let me know what you think about including it in the package.

p.s. tree.json = the org data

library(jsonlite)
library(dplyr)
library(hierplane)

source_json <- "tree.json" %>%
  read_json()


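# Recursively flatten one JSON node and its descendants into a data frame
# with one row per node. `head_word` is the word of the node's parent.
parse_json_node <- function(x, head_word) {

  # Rows for all descendants (an empty data frame when there are no children)
  children <- bind_rows(
    lapply(x$children, parse_json_node, head_word = x$word)
  )

  out <- data.frame(
    parent_id = head_word,
    child_id = x$word,
    child = x$word,
    link = x$link,
    node_type = x$nodeType,
    stringsAsFactors = FALSE
  )

  # read_json() returns attributes as a list, so spread the values over
  # attribute1, attribute2, ... instead of assigning the list to one column
  attrs <- unlist(x$attributes)
  for (i in seq_along(attrs)) {
    out[[paste0("attribute", i)]] <- attrs[[i]]
  }

  bind_rows(children, out)

}

# Convert a parsed hierplane JSON tree into a data frame plus its styles
parse_json_tree <- function(x) {

  root_word <- x$root$word

  # The root row points at itself and gets the "ROOT" node type and link
  root_df <- data.frame(
    parent_id = root_word,
    child_id = root_word,
    child = root_word,
    node_type = "ROOT",
    link = "ROOT",
    attribute2 = NA, # maybe attributes should be defaulted to two columns....
    stringsAsFactors = FALSE
  )

  # Same attribute handling as in parse_json_node()
  root_attrs <- unlist(x$root$attributes)
  for (i in seq_along(root_attrs)) {
    root_df[[paste0("attribute", i)]] <- root_attrs[[i]]
  }

  children_df <- lapply(x$root$children, parse_json_node, head_word = root_word) %>%
    bind_rows()

  styles <- list(
    node_type_to_style = x$nodeTypeToStyle,
    link_to_positions = x$linkToPosition,
    link_name_to_label = x$linkNameToLabel
  )

  # Rename the root style entry (matched case-insensitively) to "ROOT"
  names(styles$node_type_to_style)[
    grep("root", names(styles$node_type_to_style), ignore.case = TRUE)
    ] <- "ROOT"

  list(df = bind_rows(root_df, children_df),
       styles = styles,
       title = x$text)

}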

parsed_data <- parse_json_tree(source_json)
hierplane(hp_dataframe(.data = parsed_data$df,
                       title = parsed_data$title,
                       styles = parsed_data$styles))

@mathidachuk mathidachuk added the question Further information is requested label Jul 15, 2020
@tylerlittlefield
Member

I do think a function for translating multiple types of files to hierplane-ready data is important. Please take a look at data.tree and let me know what you think. It's a very popular package for this type of data and we can take advantage of it. For example, it can parse multiple file formats:

csv -> data.frame in table format (?read.csv) -> data.tree (?as.Node.data.frame)
Newick -> ape phylo (?ape::read.tree) -> data.tree (?as.Node.phylo)
csv -> data.frame in network format (?read.csv) -> data.tree (c.f. ?FromDataFrameNetwork)
yaml -> list of lists (?yaml::yaml.load) -> data.tree (?as.Node.list)
json -> list of lists (e.g. ?jsonlite::fromJSON) -> data.tree (?as.Node.list)

If we have an hp_datatree function, we can automatically take advantage of csv/newick/yaml/json all at once!
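
For illustration, the json -> list of lists -> data.tree route listed above might look roughly like this. It is only a sketch: the toy JSON is made up (it is not the org data, and attributes are left out to keep it small), and it assumes each node's name lives in a `word` element with its children in `children`.

```r
library(jsonlite)
library(data.tree)

# Toy hierplane-style JSON (hypothetical data, not the org data from this thread)
json_txt <- '{
  "word": "CEO", "nodeType": "ROOT", "link": "ROOT",
  "children": [
    {"word": "CTO", "nodeType": "executive", "link": "reports_to"},
    {"word": "CFO", "nodeType": "executive", "link": "reports_to"}
  ]
}'

# simplifyVector = FALSE keeps the nested list-of-lists structure
lol <- fromJSON(json_txt, simplifyVector = FALSE)

# mode = "explicit": children are stored in a "children" element
# nameName = "word": use each node's `word` as its data.tree name
tree <- as.Node(lol, mode = "explicit", nameName = "word")

# Print the tree together with the two custom fields
print(tree, "nodeType", "link")
```

An hp_datatree function would then only need to walk `tree`, whatever file format it originally came from.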

@tylerlittlefield
Member

Regarding this comment:

attribute2 = NA # maybe attributes should be defaulted to two columns....

I was also thinking about this... attributes feels like it should just be a single parameter and the user passes n column names to it.
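
Something like the following is what I have in mind, as a rough sketch only. `collect_attributes` is a hypothetical helper name, not an existing hierplane function: the user passes any number of column names through one `attributes` argument and gets back one character vector of attributes per row, with missing values dropped.

```r
# Hypothetical helper (not part of hierplane): gather n attribute columns
# into one character vector per row, skipping NA values.
collect_attributes <- function(.data, attributes = character(0)) {
  missing_cols <- setdiff(attributes, names(.data))
  if (length(missing_cols) > 0) {
    stop("Unknown attribute column(s): ", paste(missing_cols, collapse = ", "))
  }
  lapply(seq_len(nrow(.data)), function(i) {
    values <- vapply(
      attributes,
      function(col) as.character(.data[[col]][i]),
      character(1)
    )
    unname(values[!is.na(values)])
  })
}

# Toy data frame (made-up values)
df <- data.frame(
  child = c("CEO", "CTO"),
  attribute1 = c("founder", "engineering"),
  attribute2 = c(NA, "remote"),
  stringsAsFactors = FALSE
)

attrs <- collect_attributes(df, attributes = c("attribute1", "attribute2"))
```

With this shape there is no need to default to a fixed number of attribute columns, since rows simply end up with shorter or longer vectors.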

@mathidachuk
Member Author

I played with data.tree a little bit when I was building the build_tree and build_node functions, but it was not doing what I needed it to do at all unfortunately. Maybe you will have better luck getting it to work for our purposes.

For example, I thought as.Node.data.frame would work for converting spacyr output to the list that we need, but it didn't work for me. I was probably doing something wrong.

@tylerlittlefield
Member

Well, we will still need build_tree/build_node since hp_dataframe calls those, and that's fine IMO. And spacyr just works, so I don't think we need to touch that. I am saying that we might need to consider removing hp_dataframe as an export and making it an internal function. The data.tree package can just be something we make sure hierplane is compatible with, namely that a user can pass a data.tree object to hierplane and it's able to render.

@mathidachuk
Member Author

Ok gotcha. I think it's a great idea to make sure we have a data.tree compatible function. I have some reservations about removing hp_dataframe tho. See #39

@mathidachuk
Member Author

Also, a reminder that hp_spacyr also relies on the build_ functions. The build_ functions allow us to go from df --> hierarchical list structure.
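
To make the df --> hierarchical list step concrete, here is a simplified sketch of the idea. It is not the actual build_tree/build_node code: `sketch_build_node` is a made-up name, the toy data is invented, and it assumes `child_id` is unique and the root row points at itself.

```r
# Hypothetical, simplified stand-in for the df -> nested list step.
# Assumes unique child_id values and a self-referencing root row.
sketch_build_node <- function(df, id) {
  row <- df[df$child_id == id, , drop = FALSE]

  # Direct children: rows whose parent is `id`, excluding the root's own row
  child_ids <- df$child_id[df$parent_id == id & df$child_id != id]

  node <- list(word = row$child, nodeType = row$node_type, link = row$link)
  if (length(child_ids) > 0) {
    node$children <- lapply(child_ids, function(cid) sketch_build_node(df, cid))
  }
  node
}

# Toy parent/child data frame (made-up values)
df <- data.frame(
  parent_id = c("CEO", "CEO", "CEO", "CTO"),
  child_id  = c("CEO", "CTO", "CFO", "Developer"),
  child     = c("CEO", "CTO", "CFO", "Developer"),
  link      = c("ROOT", "reports_to", "reports_to", "reports_to"),
  node_type = c("ROOT", "executive", "executive", "staff"),
  stringsAsFactors = FALSE
)

tree <- sketch_build_node(df, "CEO")
```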

@tylerlittlefield
Member

I 100% agree which is why I don't think build functions should be touched, they just work. It's hp_dataframe that I am on the fence about because of the work required.

@mathidachuk
Member Author

Maybe we show users how to construct a dataframe from data.tree and how to add link and node and attribute columns?? That can be an option. And then they can just use hp_dataframe????
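
A rough sketch of what that could look like, assuming the toy tree and the `link` / `nodeType` field names below (all made up for illustration). The output column names follow the parent_id / child_id / child / link / node_type layout from my function above. I have not run the result through hp_dataframe.

```r
library(data.tree)

# Toy org chart built directly as a data.tree object (hypothetical data)
ceo <- Node$new("CEO")
cto <- ceo$AddChild("CTO")
cfo <- ceo$AddChild("CFO")
dev <- cto$AddChild("Developer")

# Custom fields holding the link and node type of each non-root node
cto$link <- "reports_to"; cto$nodeType <- "executive"
cfo$link <- "reports_to"; cfo$nodeType <- "executive"
dev$link <- "reports_to"; dev$nodeType <- "staff"

# One row per edge, with `from`/`to` columns plus the requested fields
edges <- ToDataFrameNetwork(ceo, "link", "nodeType")

hp_df <- data.frame(
  parent_id = as.character(edges$from),
  child_id  = as.character(edges$to),
  child     = as.character(edges$to),
  link      = as.character(edges$link),
  node_type = as.character(edges$nodeType),
  stringsAsFactors = FALSE
)

# The edge list has no row for the root, so add a self-referencing one
root_row <- data.frame(
  parent_id = ceo$name, child_id = ceo$name, child = ceo$name,
  link = "ROOT", node_type = "ROOT",
  stringsAsFactors = FALSE
)
hp_df <- rbind(root_row, hp_df)
```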

2 participants