micone.main package

Submodules

micone.main.lineage module

Module that implements the Lineage class and methods to work with taxonomy data

micone.main.lineage.BaseLineage

alias of Lineage

class micone.main.lineage.Lineage(Kingdom: str = '', Phylum: str = '', Class: str = '', Order: str = '', Family: str = '', Genus: str = '', Species: str = '')[source]

Bases: Lineage

NamedTuple that stores the lineage of a taxon and methods to interact with it

Kingdom
Type:

str

Phylum
Type:

str

Class
Type:

str

Order
Type:

str

Family
Type:

str

Genus
Type:

str

Species
Type:

str

classmethod from_str(lineage_str: str, style: str = 'gg') Lineage[source]

Create Lineage instance from a lineage string

Parameters:
  • lineage_str (str) – Lineage in the form of a string

  • style ({'gg', 'silva'}, optional) – The style of the lineage string. Default is ‘gg’.

Returns:

Instance of the Lineage class

Return type:

Lineage
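The parsing step behind from_str can be pictured with a small standalone sketch. It assumes the common Greengenes prefix convention (k__, p__, ..., s__) and the QIIME-formatted SILVA convention (D_0__ to D_6__); the helper name parse_lineage and both prefix tables are hypothetical and are not taken from micone, whose actual parser may behave differently.

```python
from typing import Dict

# Hypothetical prefix tables; the parser shipped with micone may differ.
GG_PREFIXES = {
    "k": "Kingdom", "p": "Phylum", "c": "Class", "o": "Order",
    "f": "Family", "g": "Genus", "s": "Species",
}
SILVA_PREFIXES = {f"D_{i}": field for i, field in enumerate(GG_PREFIXES.values())}


def parse_lineage(lineage_str: str, style: str = "gg") -> Dict[str, str]:
    """Split a prefixed lineage string into a {rank: name} mapping."""
    if style == "gg":
        prefixes = GG_PREFIXES
    elif style == "silva":
        prefixes = SILVA_PREFIXES
    else:
        raise ValueError(f"unsupported style: {style!r}")
    fields: Dict[str, str] = {}
    for token in lineage_str.split(";"):
        token = token.strip()
        if "__" not in token:
            continue  # skip empty or malformed tokens
        prefix, name = token.split("__", 1)
        if prefix in prefixes and name:
            fields[prefixes[prefix]] = name
    return fields
```

The resulting mapping has the same keys as the Lineage fields, so in principle it could be unpacked into the constructor as keyword arguments.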

classmethod from_taxid(taxid: int) Lineage[source]

Create Lineage instance from taxid

Parameters:

taxid (int) – A valid NCBI taxonomy id

Returns:

Instance of the Lineage class

Return type:

Lineage

get_superset(level: str) Lineage[source]

Return a superset of the current lineage for the requested level

Parameters:

level (str) – The lowest Lineage field to be used to calculate the superset

Returns:

Lineage instance that is a superset of current instance

Return type:

Lineage

property name: Tuple[str, str]

Get the lowest populated level and name of the taxon

Returns:

Tuple containing (level, name)

Return type:

Tuple[str, str]
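To make the behaviour of get_superset and name concrete, here is a minimal standalone sketch built on typing.NamedTuple. The class SketchLineage is a hypothetical stand-in written for illustration; it mirrors the documented fields but is not the micone implementation, and the handling of an empty lineage is an assumption.

```python
from typing import NamedTuple, Tuple


class SketchLineage(NamedTuple):
    """Hypothetical stand-in with the same seven fields as Lineage."""

    Kingdom: str = ""
    Phylum: str = ""
    Class: str = ""
    Order: str = ""
    Family: str = ""
    Genus: str = ""
    Species: str = ""

    def get_superset(self, level: str) -> "SketchLineage":
        """Keep the fields down to `level` and blank out everything below it."""
        if level not in self._fields:
            raise ValueError(f"unknown level: {level!r}")
        cutoff = self._fields.index(level)
        return SketchLineage(*self[: cutoff + 1])

    @property
    def name(self) -> Tuple[str, str]:
        """Return (level, name) for the lowest populated field."""
        for level, value in zip(reversed(self._fields), reversed(self)):
            if value:
                return level, value
        return "", ""  # assumption: an empty lineage yields empty strings
```

For example, SketchLineage("Bacteria", "Firmicutes", "Bacilli").get_superset("Phylum") drops the Class entry and keeps the two higher ranks.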

property taxid: Tuple[str, int]

Get the NCBI taxonomy id of the Lineage

Returns:

A tuple containing (taxonomy level, NCBI taxonomy id)

Return type:

Tuple[str, int]

to_dict(level: str) Dict[str, str][source]

Get the lineage in the form of a dictionary

Parameters:

level (str) – The lowest Lineage field to be used to populate the dictionary

to_str(style: str, level: str) str[source]

Return the string Lineage of the instance in requested ‘style’

Parameters:
  • style ({'gg', 'silva'}) – The style of the lineage string

  • level (str) – The lowest Lineage field that is to be populated

Return type:

str

micone.main.network module

Module that defines the Network object and methods to read, write and manipulate it

class micone.main.network.Network(nodes: List[str], links: List[Tuple[str, str, Dict[str, float]]], metadata: dict, cmetadata: dict, obs_metadata: DataFrame, children_map: Optional[dict] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False)[source]

Bases: object

Class that represents a network object

Parameters:
  • nodes (List[str]) – The list of nodes in the network.

  • links (List[LinkDType]) – The list of links in the network. Each link is a dict and must contain ‘source’, ‘target’, ‘weight’ and ‘pvalue’ as keys.

  • metadata (dict) – The metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’, ‘authors’.

  • cmetadata (dict) – The computational metadata for the whole network. Must contain information as to how the network was generated.

  • obs_metadata (pd.DataFrame) – The DataFrame containing taxonomy information for the nodes of the network. If this contains an ‘Abundance’ column then it is incorporated into the network.

  • children_map (dict, optional) – The dictionary that contains the mapping {obs_id => [children]}.

  • interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.

  • interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.

  • pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.

  • pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.

  • directed (bool, optional) – True if network is directed. Default value is False.

graph

The networkx graph representation of the network

Type:

Union[nx.Graph, nx.DiGraph]

Examples

>>> network = Network.load_data()

filter(pvalue_filter: bool, interaction_filter: bool) Network[source]

Filter network using pvalue and interaction thresholds

Parameters:
  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering

  • interaction_filter (bool) – If True will use interaction_threshold for filtering

Returns:

The filtered Network object

Return type:

Network
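The two filters can be illustrated on a plain list of link dictionaries. This is only a sketch under stated assumptions: the helper filter_links is hypothetical, links are assumed to carry the ‘weight’ and ‘pvalue’ keys described for the constructor, and the treatment of values that sit exactly on a threshold is a guess rather than documented micone behaviour.

```python
from typing import Any, Dict, List


def filter_links(
    links: List[Dict[str, Any]],
    pvalue_filter: bool,
    interaction_filter: bool,
    interaction_threshold: float = 0.3,
    pvalue_threshold: float = 0.05,
) -> List[Dict[str, Any]]:
    """Return the links that pass the enabled filters (hypothetical helper)."""
    kept = []
    for link in links:
        # Interaction filter: drop links whose absolute weight is too small.
        if interaction_filter and abs(link["weight"]) < interaction_threshold:
            continue
        # P-value filter: drop links that are not significant at alpha.
        if pvalue_filter and link["pvalue"] > pvalue_threshold:
            continue
        kept.append(link)
    return kept
```

With both flags set to False the input list is returned unchanged, which matches the idea that each flag only switches its own threshold on.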

get_adjacency_table(key: str) DataFrame[source]

Returns the adjacency table representation for the requested key. This method does not support Graph

Parameters:

key (str) – The edge property to be used to construct the table

Returns:

The adjacency table

Return type:

pd.DataFrame
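As a rough illustration of what an adjacency table keyed on an edge property looks like, the sketch below builds a nested dictionary from an edge list. The function adjacency_table is hypothetical, it uses plain dictionaries instead of a pd.DataFrame to stay dependency-free, and filling missing pairs with 0.0 is an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, Dict[str, float]]


def adjacency_table(
    nodes: List[str], links: List[Edge], key: str, directed: bool = False
) -> Dict[str, Dict[str, float]]:
    """Build a node-by-node table of the edge property `key`."""
    # Start from an all-zero square table (assumed fill value).
    table = {src: {tgt: 0.0 for tgt in nodes} for src in nodes}
    for source, target, props in links:
        table[source][target] = props[key]
        if not directed:
            # Undirected networks get a symmetric table.
            table[target][source] = props[key]
    return table
```

A nested dictionary of this shape can be handed to pandas.DataFrame directly if a DataFrame is needed.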

json(pvalue_filter: bool = False, interaction_filter: bool = False) str[source]

Returns the network as a JSON string

Parameters:
  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering. Default value is False.

  • interaction_filter (bool) – If True will use interaction_threshold for filtering. Default value is False.

Returns:

The JSON string representation of the network

Return type:

str

property links

The list of links in the network and their corresponding properties

classmethod load_data(interaction_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, pvalue_file: Optional[str] = None, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) Network[source]

Create a Network object from files (interaction tables and other metadata)

Parameters:
  • interaction_file (str) – The tsv file containing the matrix of interactions.

  • meta_file (str) – The json file containing the metadata for the whole network (general and experiment).

  • cmeta_file (str) – The json file containing the computational metadata for the whole network.

  • obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network.

  • pvalue_file (str, optional) – The tsv file containing the matrix of pvalues. Default is None.

  • children_file (str, optional) – The json file containing the mapping between observations and their children.

  • interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.

  • interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.

  • pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.

  • pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.

  • directed (bool, optional) – True if network is directed. Default value is False.

Returns:

The instance of the Network class

Return type:

Network

classmethod load_elist(elist_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) Network[source]

Create Network instance from an edge list and associated metadata

Parameters:
  • elist_file (str) – The csv file containing the list of edges and their associated metadata.

  • meta_file (str) – The file containing metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’, ‘authors’.

  • cmeta_file (str) – The file containing the computational metadata for the whole network. Must contain information as to how the network was generated.

  • obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network. If this contains an ‘Abundance’ column then it is incorporated into the network.

  • children_file (str, optional) – The json file that describes the mapping {obs_id => [children]}.

  • interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.

  • interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.

  • pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.

  • pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction.

  • directed (bool, optional) – True if network is directed. Default value is False.

Returns:

The instance of the Network class

Return type:

Network

classmethod load_graph(graph: Union[Graph, DiGraph]) Network[source]

Load Network object from a networkx graph

Parameters:

graph (Union[nx.Graph, nx.DiGraph]) – The networkx graph of the network

Returns:

The instance of the Network class

Return type:

Network

classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None) Network[source]

Create a Network object from a network JSON file. Either fpath or raw_data must be specified

Parameters:
  • fpath (str, optional) – The path to the network JSON file

  • raw_data (dict, optional) – The raw data stored in the network JSON file

Returns:

The instance of the Network class

Return type:

Network
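The "either fpath or raw_data" contract can be sketched as a small loader that validates its arguments before reading anything. The function read_network_json is hypothetical, and letting raw_data take precedence when both arguments are given is an assumption; the source does not say which one wins.

```python
import json
from typing import Optional


def read_network_json(
    fpath: Optional[str] = None, raw_data: Optional[dict] = None
) -> dict:
    """Return the raw network dictionary from memory or from a JSON file."""
    if fpath is None and raw_data is None:
        raise ValueError("Either fpath or raw_data must be specified")
    if raw_data is not None:
        # Assumption: in-memory data takes precedence over the file path.
        return raw_data
    with open(fpath) as fid:
        return json.load(fid)
```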

property metadata: Dict[str, Any]

The metadata for the network

property nodes: List[Dict[str, Any]]

The list of nodes in the network and their corresponding properties

property pcorr_methods: List[str]

Returns the list of supported pvalue correction methods
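The default correction name ‘fdr_bh’ conventionally refers to the Benjamini-Hochberg false discovery rate procedure. The sketch below is a dependency-free version of that procedure for illustration; the function benjamini_hochberg is hypothetical and is not how micone itself computes the correction, which is not shown in this reference.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A link would then be kept when its adjusted value is at or below pvalue_threshold, under the assumption that the threshold is applied after correction.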

write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False) None[source]

Write network to file as JSON

Parameters:
  • fpath (str) – The path to the JSON file

  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering. Default value is False.

  • interaction_filter (bool) – If True will use interaction_threshold for filtering. Default value is False.

micone.main.network_group module

Module that defines the NetworkGroup object and methods to read, write and manipulate it

class micone.main.network_group.NetworkGroup(networks: List[Network], id_field: str = 'taxid')[source]

Bases: Collection

Class that represents a group of network objects. These network objects are intended to be visualized together

Parameters:
  • networks (List[Network]) – The collection of networks to be grouped (key = context-id, value = Network).

  • id_field (str) – The field to use while combining nodes. Default value is “taxid”.

graph

The networkx multi-graph representation of the network

Type:

Union[nx.MultiGraph, nx.MultiDiGraph]

combine_pvalues(cids: List[int]) NetworkGroup[source]

Combine pvalues of links in the cids using Brown’s p-value merging method

Parameters:

cids (List[int]) – The list of context ids that are to be used in the merger

Returns:

The NetworkGroup that contains the merged pvalues

Return type:

NetworkGroup
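Brown's method extends Fisher's method to p-values that are not independent. As a simplified point of reference only, the sketch below implements plain Fisher's method, which is the independent-case baseline and not Brown's method itself; the function fisher_combine is hypothetical and says nothing about how micone estimates the dependence correction.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic follows a chi-square distribution with 2k dof.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # Closed-form chi-square survival function for an even number of dof:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the closed form.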

property contexts: List[Dict[str, Any]]

The contexts for the group of networks

filter(pvalue_filter: bool, interaction_filter: bool) NetworkGroup[source]

Filter network using pvalue and interaction thresholds

Parameters:
  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering

  • interaction_filter (bool) – If True will use interaction_threshold for filtering

Returns:

The filtered NetworkGroup object

Return type:

NetworkGroup

get_adjacency_vectors(key: str) DataFrame[source]

Returns the adjacency matrix for each context as a pd.DataFrame

Parameters:

key (str) – The edge property to be used to construct the vectors

Returns:

The DataFrame containing adjacency vectors as columns

Return type:

pd.DataFrame

get_consensus_network(cids: Optional[List[int]] = None, method: str = 'simple_voting', parameter: float = 0.0) NetworkGroup[source]

Get consensus network for the network defined by the cids

Parameters:
  • cids (List[int], optional) – The list of context ids that are to be used in the merger. Default is None.

  • method ({'simple_voting', 'scaled_sum'}, optional) – The consensus method. Default value is ‘simple_voting’.

  • parameter (float, optional) – Default value is 0.0 (which is the union of all the links).

Returns:

The NetworkGroup that represents the consensus network

Return type:

NetworkGroup
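One plausible reading of ‘simple_voting’ is that each network votes for the links it contains and a link survives when its share of votes exceeds parameter. The sketch below follows that reading so that a parameter of 0.0 gives the union of all links, as stated above; the function simple_voting, the strict comparison, and the representation of links as frozensets are all assumptions and not confirmed micone behaviour.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Edge = FrozenSet[str]  # undirected link stored as an unordered node pair


def simple_voting(edge_sets: List[Set[Edge]], parameter: float = 0.0) -> Set[Edge]:
    """Keep links whose fraction of supporting networks exceeds `parameter`."""
    n_networks = len(edge_sets)
    # Count how many networks contain each link.
    votes = Counter(edge for edges in edge_sets for edge in edges)
    return {
        edge for edge, count in votes.items() if count / n_networks > parameter
    }
```

Under this reading a parameter of 0.5 would keep only links found in a strict majority of the selected networks.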

json(pvalue_filter: bool = False, interaction_filter: bool = False) str[source]

Returns the network as a JSON string

Parameters:
  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering. Default value is False.

  • interaction_filter (bool) – If True will use interaction_threshold for filtering. Default value is False.

Returns:

The JSON string representation of the network

Return type:

str

property links

The list of links in the NetworkGroup and their corresponding properties

classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None, id_field: str = 'taxid') NetworkGroup[source]

Create a NetworkGroup object from a network JSON file. Either fpath or raw_data must be specified

Parameters:
  • fpath (str, optional) – The path to the network JSON file

  • raw_data (dict, optional) – The raw data stored in the network JSON file

Returns:

The instance of the NetworkGroup class

Return type:

NetworkGroup

property nodes: List[Dict[str, Any]]

The list of nodes in the NetworkGroup and their corresponding properties

to_network(method: str = 'mean') Network[source]

update_thresholds(interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05) None[source]

Update the thresholds on the networks

Parameters:
  • interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.

  • pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.

write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False, split_files: bool = False) None[source]

Write network to file as JSON

Parameters:
  • fpath (str) – The path to the JSON file

  • pvalue_filter (bool) – If True will use pvalue_threshold for filtering. Default value is False.

  • interaction_filter (bool) – If True will use interaction_threshold for filtering. Default value is False.

  • split_files (bool) – If True will write networks into separate files. Default value is False.

micone.main.otu module

Module that defines the Otu object and methods to manipulate it

class micone.main.otu.Otu(otu_data: Table, sample_metadata: Optional[DataFrame] = None, obs_metadata: Optional[DataFrame] = None)[source]

Bases: object

An object that represents the OTU counts table

Parameters:
  • otu_data (Table) – biom.Table object containing OTU data

  • sample_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the samples

  • obs_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the observations (OTUs)

otu_data

OTU counts table in the biom.Table format

Type:

biom.Table

Notes

All methods that manipulate the Otu object return new objects

collapse_taxa(level: str) Tuple[Otu, Dict[str, List[str]]][source]

Collapse Otu instance based on taxa

Parameters:

level (str) – The tax level of the collapsed table. This will also be used as the prefix for the unique ids

Returns:

Collapsed Otu instance

Return type:

Tuple[Otu, dict]
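Collapsing by taxon amounts to summing the count rows of all OTUs that share the same name at the chosen level, while remembering which OTUs went into each group. The sketch below shows that idea on plain dictionaries; collapse_counts is a hypothetical helper, and the real method works on biom.Table data and generates prefixed unique ids, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def collapse_counts(
    counts: Dict[str, List[int]], taxa: Dict[str, str]
) -> Tuple[Dict[str, List[int]], Dict[str, List[str]]]:
    """Sum count rows per taxon and record the children of each taxon."""
    collapsed: Dict[str, List[int]] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    for otu_id, row in counts.items():
        taxon = taxa[otu_id]
        children[taxon].append(otu_id)
        if taxon in collapsed:
            # Element-wise sum across samples.
            collapsed[taxon] = [a + b for a, b in zip(collapsed[taxon], row)]
        else:
            collapsed[taxon] = list(row)
    return collapsed, dict(children)
```

The second return value plays the role of the children mapping that the Network constructor accepts as children_map.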

filter(ids: Optional[Iterable[str]] = None, func: Optional[Callable[[ndarray, str, dict], bool]] = None, axis: str = 'observation') Otu[source]

Filter Otu instance based on ids or func

Parameters:
  • ids (Iterable[str], optional) – An iterable of ids to keep. If ids are not supplied then func must be supplied

  • func (Callable[[np.ndarray, str, dict], bool], optional) – A function that takes in (values, id_ind, md) and returns a bool. If func is not supplied then ids must be supplied. If both ids and func are supplied then ids are used.

  • axis ({'sample', 'observation'}, optional) – The axis along which to filter the Otu instance. Default value is ‘observation’.

Returns:

Filtered Otu instance

Return type:

Otu

is_norm(axis: str = 'sample') bool[source]

Returns True if the Otu instance has been normalized

classmethod load_data(otu_file: str, meta_file: Optional[str] = None, tax_file: Optional[str] = None, dtype: str = 'biom', ext: Optional[str] = None) Otu[source]

Load data from files into the Otu class instance

Parameters:
  • otu_file (str) – The path to the OTU counts file

  • meta_file (str, optional) – The path to the sample metadata file

  • tax_file (str, optional) – The path to the taxonomy file

  • dtype ({'biom', 'tsv'}) – The type of OTU file that is input

  • ext (str, optional) – The extension of the file if other than the supported extensions. Supported extensions: ‘tsv’, ‘txt’, ‘counts’ for the ‘tsv’ dtype; ‘biom’, ‘hdf5’ for the ‘biom’ dtype.

Returns:

An instance of the Otu class

Return type:

Otu

normalize(axis: str = 'sample', method: str = 'norm') Otu[source]

Normalize the OTU table along the provided axis

Parameters:
  • axis ({'sample', 'observation'}, optional) – Axis along which to normalize the OTU table. Default is ‘sample’.

  • method ({'norm', 'rarefy', 'css'}) – Normalization method to use

Returns:

Otu instance which is normalized along the given axis

Return type:

Otu
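A total-sum normalization along the sample axis, which is one common meaning of a method called ‘norm’, can be sketched as follows. The function normalize_samples is hypothetical, the table is assumed to be a list of observation rows with one column per sample, and whether micone's ‘norm’ method does exactly this is not stated in this reference.

```python
from typing import List


def normalize_samples(table: List[List[float]]) -> List[List[float]]:
    """Scale every sample (column) so that its counts sum to 1."""
    n_samples = len(table[0]) if table else 0
    # Total count of each sample across all observations.
    totals = [sum(row[j] for row in table) for j in range(n_samples)]
    return [
        [value / totals[j] if totals[j] else 0.0 for j, value in enumerate(row)]
        for row in table
    ]
```

The function returns a new table and leaves its input untouched, in line with the note that methods manipulating the Otu object return new objects.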

property obs_metadata: DataFrame

Lineage data for the observations (OTUs)

partition(axis: str, func: Callable[[str, dict], Hashable]) Iterable[Tuple[str, Otu]][source]

Partition the Otu instance based on the func and axis

Parameters:
  • axis (str) – The axis on which to partition

  • func (Callable[[str, dict], Hashable]) – The function that takes in (id, metadata) and returns a hashable

Returns:

An iterable of tuples - (‘label’, Otu)

Return type:

Iterable[Tuple[str, Otu]]

Notes

  1. To group by lineage “level” use:

    func = lambda id_ind, md: Lineage(**md).get_superset(level)
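The partitioning idea can be shown without biom by grouping ids on the value returned by func. The helper partition_ids below is a hypothetical sketch that returns id groups rather than Otu instances, and the metadata dictionaries in the usage note are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def partition_ids(
    metadata: Dict[str, dict], func: Callable[[str, dict], Hashable]
) -> Dict[Hashable, List[str]]:
    """Group ids by the hashable label that `func` returns for each one."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for id_, md in metadata.items():
        groups[func(id_, md)].append(id_)
    return dict(groups)
```

Calling partition_ids(md, lambda id_, m: m["Phylum"]) on a mapping of OTU ids to lineage dictionaries would yield one group of ids per phylum.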

rm_sparse_obs(prevalence_thres: float = 0.05, abundance_thres: float = 0.01, obssum_thres: int = 100) Otu[source]

Remove observations with prevalence < prevalence_thres and abundance < abundance_thres

Parameters:
  • prevalence_thres (float) – Minimum fraction of samples in which the observation must be present in order to be accepted

  • abundance_thres (float) – Minimum observation count fraction in a sample needed in order to be accepted

  • obssum_thres (int) – The threshold applied to the sum of observations for each row

Returns:

Otu instance with bad observations removed

Return type:

Otu
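The prevalence and row-sum criteria can be sketched on a dictionary of count rows. This is a partial illustration under stated assumptions: the helpers prevalence and keep_observations are hypothetical, the abundance criterion is left out, and the way micone combines the thresholds may differ from the simple conjunction used here.

```python
from typing import Dict, List, Sequence


def prevalence(row: Sequence[float]) -> float:
    """Fraction of samples in which the observation has a non-zero count."""
    return sum(1 for value in row if value > 0) / len(row)


def keep_observations(
    table: Dict[str, List[float]],
    prevalence_thres: float = 0.05,
    obssum_thres: int = 100,
) -> Dict[str, List[float]]:
    """Keep rows that are prevalent enough and have a large enough total."""
    return {
        obs: row
        for obs, row in table.items()
        if row
        and prevalence(row) >= prevalence_thres  # present in enough samples
        and sum(row) >= obssum_thres  # enough reads overall
    }
```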

rm_sparse_samples(count_thres: int = 500) Otu[source]

Remove samples with read counts less than count_thres

Parameters:

count_thres (int, optional) – Counts threshold below which samples are rejected. Default value is 500

Returns:

Otu instance with low count samples removed

Return type:

Otu

Raises:

ValueError – If Otu instance is normalized

property sample_metadata: DataFrame

Metadata for the samples

Return type:

pd.DataFrame

property tax_level: str

Returns the taxonomy level of the Otu instance

Returns:

The lowest taxonomy defined in the Otu instance

Return type:

str

write(base_name: str, fol_path: str = '', file_type: str = 'biom') None[source]

Write Otu instance object to required file_type

Parameters:
  • base_name (str) – The base name without extension to be used for the files

  • fol_path (str, optional) – The folder where the files are to be written. Default is current directory.

  • file_type ({'tsv', 'biom'}, optional) – The type of file data is to be written to. Default is ‘biom’.

Module contents