micone.main package

Submodules

micone.main.lineage module

Module that implements the Lineage class and methods to work with taxonomy data

micone.main.lineage.BaseLineage: alias of Lineage

class micone.main.lineage.Lineage(Kingdom: str = '', Phylum: str = '', Class: str = '', Order: str = '', Family: str = '', Genus: str = '', Species: str = '')[source]

Bases: Lineage

NamedTuple that stores the lineage of a taxon and methods to interact with it

Kingdom

Type:: str

Phylum

Type:: str

Class

Type:: str

Order

Type:: str

Family

Type:: str

Genus

Type:: str

Species

Type:: str

classmethod from_str(lineage_str: str, style: str = 'gg') → Lineage[source]

Create Lineage instance from a lineage string

Parameters:

lineage_str (str) – Lineage in the form of a string
style ({'gg', 'silva'}, optional) – The style of the lineage string. Default is ‘gg’.

Returns:

Instance of the Lineage class

Return type:

Lineage
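As a rough illustration of what parsing a Greengenes-style (‘gg’) lineage string involves, here is a stand-alone sketch that does not import micone. LineageSketch and parse_gg are hypothetical names, and the input format (rank prefixes such as k__ and p__ separated by semicolons) is the usual Greengenes convention. The real from_str may accept more variants.

```python
from typing import NamedTuple


class LineageSketch(NamedTuple):
    """Hypothetical stand-in for Lineage; not the micone class."""
    Kingdom: str = ""
    Phylum: str = ""
    Class: str = ""
    Order: str = ""
    Family: str = ""
    Genus: str = ""
    Species: str = ""


# Greengenes rank prefixes mapped to the field they populate
GG_PREFIXES = {
    "k": "Kingdom", "p": "Phylum", "c": "Class", "o": "Order",
    "f": "Family", "g": "Genus", "s": "Species",
}


def parse_gg(lineage_str: str) -> LineageSketch:
    """Parse 'k__Bacteria; p__Firmicutes; ...' into a LineageSketch."""
    fields = {}
    for part in lineage_str.split(";"):
        part = part.strip()
        if "__" not in part:
            continue  # skip fragments that carry no rank prefix
        prefix, name = part.split("__", 1)
        if prefix in GG_PREFIXES and name:
            fields[GG_PREFIXES[prefix]] = name
    return LineageSketch(**fields)


lin = parse_gg("k__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__")
```

Unpopulated ranks (o__, f__, and so on) are left as empty strings, matching the default values in the Lineage signature above.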

classmethod from_taxid(taxid: int) → Lineage[source]

Create Lineage instance from taxid

Parameters:: taxid (int) – A valid NCBI taxonomy id
Returns:: Instance of the Lineage class
Return type:: Lineage

get_superset(level: str) → Lineage[source]

Return a superset of the current lineage for the requested level

Parameters:: level (str) – The lowest Lineage field to be used to calculate the superset
Returns:: Lineage instance that is a superset of current instance
Return type:: Lineage

property name: Tuple[str, str]

Get the lowest populated level and name of the taxon

Returns:: Tuple containing (level, name)
Return type:: Tuple[str, str]
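The ideas behind get_superset and name can be sketched without micone by treating a lineage as a plain dict. This is only an illustration under that assumption; superset and lowest_populated are hypothetical helper names, not part of the package.

```python
from typing import Dict, Tuple

LEVELS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def superset(lineage: Dict[str, str], level: str) -> Dict[str, str]:
    """Keep fields down to `level` and blank out everything below it."""
    cutoff = LEVELS.index(level)
    return {
        lvl: (lineage.get(lvl, "") if i <= cutoff else "")
        for i, lvl in enumerate(LEVELS)
    }


def lowest_populated(lineage: Dict[str, str]) -> Tuple[str, str]:
    """Return (level, name) for the lowest level that is not empty."""
    for lvl in reversed(LEVELS):
        if lineage.get(lvl, ""):
            return lvl, lineage[lvl]
    return "", ""


ecoli = {
    "Kingdom": "Bacteria", "Phylum": "Proteobacteria",
    "Class": "Gammaproteobacteria", "Order": "Enterobacterales",
    "Family": "Enterobacteriaceae", "Genus": "Escherichia", "Species": "coli",
}
family_view = superset(ecoli, "Family")
```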

property taxid: Tuple[str, int]

Get the NCBI taxonomy id of the Lineage

Returns:: A tuple containing (taxonomy level, NCBI taxonomy id)
Return type:: Tuple[str, int]

to_dict(level: str) → Dict[str, str][source]

Get the lineage in the form of a dictionary

Parameters:: level (str) – The lowest Lineage field to be used to populate the dictionary

to_str(style: str, level: str) → str[source]

Return the lineage string of the instance in the requested ‘style’

Parameters:

style ({'gg', 'silva'}) – The style of the lineage string
level (str) – The lowest Lineage field that is to be populated

Return type:

str
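A minimal sketch of how a lineage could be rendered as a Greengenes-style string down to a given level. It does not use micone; to_gg_str is a hypothetical helper, and whether the real to_str pads empty ranks or uses the same separator is an assumption.

```python
from typing import Dict

LEVELS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def to_gg_str(lineage: Dict[str, str], level: str) -> str:
    """Join ranks from Kingdom down to `level` as 'k__...; p__...'."""
    cutoff = LEVELS.index(level)
    parts = [
        # The prefix is the lowercase first letter of the rank name
        f"{lvl[0].lower()}__{lineage.get(lvl, '')}"
        for lvl in LEVELS[: cutoff + 1]
    ]
    return "; ".join(parts)


gg = to_gg_str({"Kingdom": "Bacteria", "Phylum": "Firmicutes"}, "Class")
```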

micone.main.network module

Module that defines the Network object and methods to read, write and manipulate it

class micone.main.network.Network(nodes: List[str], links: List[Tuple[str, str, Dict[str, float]]], metadata: dict, cmetadata: dict, obs_metadata: DataFrame, children_map: Optional[dict] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False)[source]

Bases: object

Class that represents a network object

Parameters:

nodes (List[str]) – The list of nodes in the network
links (List[LinkDType]) – The list of links in the network. Each link is a dict and must contain ‘source’, ‘target’, ‘weight’, and ‘pvalue’ as keys.
metadata (dict) – The metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’, ‘authors’.
cmetadata (dict) – The computational metadata for the whole network. Must contain information as to how the network was generated.
obs_metadata (pd.DataFrame) – The DataFrame containing taxonomy information for the nodes of the network. If this contains an ‘Abundance’ column, it is incorporated into the network.
children_map (dict, optional) – The dictionary that contains the mapping {obs_id => [children]}
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.
directed (bool, optional) – True if the network is directed. Default value is False.
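The default pvalue_correction value, ‘fdr_bh’, is the name conventionally used (for example by statsmodels) for the Benjamini-Hochberg false discovery rate procedure. The sketch below shows that procedure in plain Python so the effect of the correction is easier to see. It is an illustration only; bh_adjust is a hypothetical name, and how micone applies the correction internally is not shown here.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over all ranks at or above the current one.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

A link would then be compared against pvalue_threshold using its adjusted value rather than its raw p-value.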

graph

The networkx graph representation of the network

Type:: Union[nx.Graph, nx.DiGraph]

Examples

>>> network = Network.load_data(interaction_file, meta_file, cmeta_file, obsmeta_file)

filter(pvalue_filter: bool, interaction_filter: bool) → Network[source]

Filter network using pvalue and interaction thresholds

Parameters:

pvalue_filter (bool) – If True will use pvalue_threshold for filtering
interaction_filter (bool) – If True will use interaction_threshold for filtering

Returns:

The filtered Network object

Return type:

Network
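To make the two filter flags concrete, here is a stand-alone sketch that applies both thresholds to a list of link dicts with the ‘weight’ and ‘pvalue’ keys described for the Network class. filter_links is a hypothetical helper. The comparison directions (keep a link when pvalue <= pvalue_threshold and |weight| >= interaction_threshold) are assumptions, not confirmed micone behaviour.

```python
from typing import Any, Dict, List

Link = Dict[str, Any]


def filter_links(
    links: List[Link],
    pvalue_threshold: float = 0.05,
    interaction_threshold: float = 0.3,
    pvalue_filter: bool = True,
    interaction_filter: bool = True,
) -> List[Link]:
    """Return the links that pass the enabled filters (assumed semantics)."""
    kept = []
    for link in links:
        if pvalue_filter and link["pvalue"] > pvalue_threshold:
            continue  # not significant at the chosen alpha
        if interaction_filter and abs(link["weight"]) < interaction_threshold:
            continue  # interaction too weak in absolute value
        kept.append(link)
    return kept


links = [
    {"source": "a", "target": "b", "weight": 0.8, "pvalue": 0.01},
    {"source": "a", "target": "c", "weight": -0.5, "pvalue": 0.20},
    {"source": "b", "target": "c", "weight": 0.1, "pvalue": 0.001},
]
both = filter_links(links)
pvalue_only = filter_links(links, interaction_filter=False)
```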

get_adjacency_table(key: str) → DataFrame[source]

Returns the adjacency table representation for the requested key. This method does not support Graph.

Parameters:: key (str) – The edge property to be used to construct the table
Returns:: The adjacency table
Return type:: pd.DataFrame

json(pvalue_filter: bool = False, interaction_filter: bool = False) → str[source]

Returns the network as a JSON string

Parameters:

pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.

Returns:

The JSON string representation of the network

Return type:

str

property links: List[Dict[str, Any]]: The list of links in the network and their corresponding properties

classmethod load_data(interaction_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, pvalue_file: Optional[str] = None, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) → Network[source]

Create a Network object from files (interaction tables and other metadata)

Parameters:

interaction_file (str) – The tsv file containing the matrix of interactions
meta_file (str) – The json file containing the metadata for the whole network (general and experiment)
cmeta_file (str) – The json file containing the computational metadata for the whole network
obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network
pvalue_file (str, optional) – The tsv file containing the matrix of pvalues. Default is None.
children_file (str, optional) – The json file containing the mapping between observations and their children
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.
directed (bool, optional) – True if the network is directed. Default value is False.

Returns:

The instance of the Network class

Return type:

Network

classmethod load_elist(elist_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) → Network[source]

Create Network instance from an edge list and associated metadata

Parameters:

elist_file (str) – The csv file containing the list of edges and their associated metadata
meta_file (str) – The file containing metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’, ‘authors’.
cmeta_file (str) – The file containing the computational metadata for the whole network. Must contain information as to how the network was generated.
obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network. If this contains an ‘Abundance’ column, it is incorporated into the network.
children_file (str, optional) – The json file that describes the mapping {obs_id => [children]}
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction.
directed (bool, optional) – True if the network is directed. Default value is False.

Returns:

The instance of the Network class

Return type:

Network

classmethod load_graph(graph: Union[Graph, DiGraph]) → Network[source]

Load Network object from a networkx graph

Parameters:: graph (Union[nx.Graph, nx.DiGraph]) – The networkx graph of the network
Returns:: The instance of the Network class
Return type:: Network

classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None) → Network[source]

Create a Network object from a network JSON file. Either fpath or raw_data must be specified.

Parameters:

fpath (str, optional) – The path to the network JSON file
raw_data (dict, optional) – The raw data stored in the network JSON file

Returns:

The instance of the Network class

Return type:

Network

property metadata: Dict[str, Any]: The metadata for the network

property nodes: List[Dict[str, Any]]: The list of nodes in the network and their corresponding properties

property pcorr_methods: List[str]: Returns the list of supported pvalue correction methods

write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False) → None[source]

Write network to file as JSON

Parameters:

fpath (str) – The path to the JSON file
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.

micone.main.network_group module

Module that defines the NetworkGroup object and methods to read, write and manipulate it

class micone.main.network_group.NetworkGroup(networks: List[Network], id_field: str = 'taxid')[source]

Bases: Collection

Class that represents a group of network objects. These network objects are intended to be visualized together.

Parameters:

networks (List[Network]) – The collection of networks to be grouped (key = context-id, value = Network)
id_field (str) – The field to use while combining nodes. Default value is “taxid”.

graph

The networkx multi-graph representation of the network

Type:: Union[nx.MultiGraph, nx.MultiDiGraph]

combine_pvalues(cids: List[int]) → NetworkGroup[source]

Combine pvalues of links in the cids using Brown’s p-value merging method

Parameters:: cids (List[int]) – The list of context ids that are to be used in the merger
Returns:: The NetworkGroup that contains the merged pvalues
Return type:: NetworkGroup
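Brown's method extends Fisher's method to dependent p-values by rescaling the test statistic and its degrees of freedom using estimated covariances. As a simpler point of reference, the sketch below implements plain Fisher's method, which assumes independent p-values. It is not Brown's method and not the micone implementation; fisher_combine is a hypothetical name.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    # Fisher statistic X = -2 * sum(ln p); we work with X / 2 directly
    half_stat = -sum(math.log(p) for p in pvalues)
    # Survival function of a chi-square with 2k degrees of freedom has a
    # closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


combined = fisher_combine([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed form.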

property contexts: List[Dict[str, Any]]: The contexts for the group of networks

filter(pvalue_filter: bool, interaction_filter: bool) → NetworkGroup[source]

Filter network using pvalue and interaction thresholds

Parameters:

pvalue_filter (bool) – If True will use pvalue_threshold for filtering
interaction_filter (bool) – If True will use interaction_threshold for filtering

Returns:

The filtered NetworkGroup object

Return type:

NetworkGroup

get_adjacency_vectors(key: str) → DataFrame[source]

Returns the adjacency matrix for each context as a pd.DataFrame

Parameters:: key (str) – The edge property to be used to construct the vectors
Returns:: The DataFrame containing adjacency vectors as columns
Return type:: pd.DataFrame

get_consensus_network(cids: Optional[List[int]] = None, method: str = 'simple_voting', parameter: float = 0.0) → NetworkGroup[source]

Get consensus network for the network defined by the cids

Parameters:

cids (List[int], optional) – The list of context ids that are to be used in the merger. Default is None.
method ({'simple_voting', 'scaled_sum'}, optional) – Default value is ‘simple_voting’.
parameter (float, optional) – Default value is 0.0 (which is the union of all the links).

Returns:

The NetworkGroup that represents the consensus network

Return type:

NetworkGroup
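One way to read the ‘simple_voting’ method is as a vote over networks: a link is kept when the fraction of networks containing it reaches parameter, so that 0.0 yields the union of all links (as stated above) and 1.0 yields the intersection. The sketch below illustrates that reading on plain edge sets. The fraction-based rule and the name simple_voting_sketch are assumptions; the micone implementation may differ.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Edge = FrozenSet[str]  # undirected edge stored as an unordered pair


def simple_voting_sketch(edge_sets: List[Set[Edge]], parameter: float = 0.0) -> Set[Edge]:
    """Keep edges present in at least `parameter` fraction of the networks."""
    n = len(edge_sets)
    votes = Counter(edge for edges in edge_sets for edge in edges)
    return {edge for edge, count in votes.items() if count / n >= parameter}


ab, bc, cd = frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))
networks = [{ab, bc}, {ab}, {ab, cd}]
union = simple_voting_sketch(networks, 0.0)
majority = simple_voting_sketch(networks, 0.5)
```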

json(pvalue_filter: bool = False, interaction_filter: bool = False) → str[source]

Returns the network as a JSON string

Parameters:

pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.

Returns:

The JSON string representation of the network

Return type:

str

property links: List[Dict[str, Any]]: The list of links in the NetworkGroup and their corresponding properties

classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None, id_field: str = 'taxid') → NetworkGroup[source]

Create a NetworkGroup object from a network JSON file. Either fpath or raw_data must be specified.

Parameters:

fpath (str, optional) – The path to the network JSON file
raw_data (dict, optional) – The raw data stored in the network JSON file

Returns:

The instance of the NetworkGroup class

Return type:

NetworkGroup

property nodes: List[Dict[str, Any]]: The list of nodes in the NetworkGroup and their corresponding properties

to_network(method: str = 'mean') → Network[source]

update_thresholds(interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05) → None[source]

Update the thresholds on the networks

Parameters:

interaction_threshold (float, optional) – The value at which the interactions (absolute value) are to be thresholded. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the pvalue cutoff. Default value is 0.05.

write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False, split_files: bool = False) → None[source]

Write network to file as JSON

Parameters:

fpath (str) – The path to the JSON file
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.
split_files (bool) – If True, the networks are written to separate files. Default value is False.

micone.main.otu module

Module that defines the Otu objects and methods to manipulate it

class micone.main.otu.Otu(otu_data: Table, sample_metadata: Optional[DataFrame] = None, obs_metadata: Optional[DataFrame] = None)[source]

Bases: object

An object that represents the OTU counts table

Parameters:

otu_data (Table) – biom.Table object containing OTU data
sample_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the samples
obs_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the observations (OTUs)

otu_data

OTU counts table in the biom.Table format

Type:: biom.Table

Notes

All methods that manipulate the Otu object return new objects

collapse_taxa(level: str) → Tuple[Otu, Dict[str, List[str]]][source]

Collapse Otu instance based on taxa

Parameters:: level (str) – The tax level of the collapsed table. This will also be used as the prefix for the unique ids.
Returns:: Collapsed Otu instance
Return type:: Tuple[Otu, dict]

filter(ids: Optional[Iterable[str]] = None, func: Optional[Callable[[ndarray, str, dict], bool]] = None, axis: str = 'observation') → Otu[source]

Filter Otu instance based on ids or func

Parameters:

ids (Iterable[str], optional) – An iterable of ids to keep. If ids are not supplied then func must be supplied.
func (Callable[[np.ndarray, str, dict], bool], optional) – A function that takes in (values, id_ind, md) and returns a bool. If func is not supplied then ids must be supplied. If both ids and func are supplied then ids are used.
axis ({'sample', 'observation'}, optional) – The axis along which to filter the Otu instance. Default value is ‘observation’.

Returns:

Filtered Otu instance

Return type:

Otu

is_norm(axis: str = 'sample') → bool[source]: Returns true if the Otu instance has been normalized

classmethod load_data(otu_file: str, meta_file: Optional[str] = None, tax_file: Optional[str] = None, dtype: str = 'biom', ext: Optional[str] = None) → Otu[source]

Load data from files into the Otu class instance

Parameters:

otu_file (str) – The path to the OTU counts file
meta_file (str, optional) – The path to the sample metadata file
tax_file (str, optional) – The path to the taxonomy file
dtype ({'biom', 'tsv'}) – The type of OTU file that is input
ext (str, optional) – The extension of the file if other than the supported extensions. Supported extensions:
- ‘tsv’ dtype: ‘tsv’, ‘txt’, ‘counts’
- ‘biom’ dtype: ‘biom’, ‘hdf5’

Returns:

An instance of the Otu class

Return type:

Otu

normalize(axis: str = 'sample', method: str = 'norm') → Otu[source]

Normalize the OTU table along the provided axis

Parameters:

axis ({'sample', 'observation'}, optional) – Axis along which to normalize the OTU table. Default is ‘sample’.
method ({'norm', 'rarefy', 'css'}) – Normalization method to use

Returns:

Otu instance which is normalized along the given axis

Return type:

Otu
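To illustrate normalization along the ‘sample’ axis, the sketch below converts raw counts to relative abundances so that each sample sums to 1. This assumes that the ‘norm’ method means proportion normalization, which the text above does not state. The table is a plain nested dict rather than a biom.Table, and normalize_by_sample is a hypothetical helper. Like the Otu methods, it returns a new object and leaves the input untouched.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]  # {sample_id: {otu_id: count}}


def normalize_by_sample(table: Table) -> Table:
    """Return a new table in which each sample's counts sum to 1."""
    normalized: Table = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        # Samples with a zero total are kept as zeros to avoid dividing by zero
        normalized[sample_id] = {
            otu: (count / total if total else 0.0)
            for otu, count in counts.items()
        }
    return normalized


raw = {"s1": {"otu1": 30, "otu2": 10}, "s2": {"otu1": 5, "otu2": 15}}
rel = normalize_by_sample(raw)
```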

property obs_metadata: DataFrame: Lineage data for the observations (OTUs)

partition(axis: str, func: Callable[[str, dict], Hashable]) → Iterable[Tuple[str, Otu]][source]

Partition the Otu instance based on the func and axis

Parameters:

axis (str) – The axis on which to partition
func (Callable[[str, dict], Hashable]) – The function that takes in (id, metadata) and returns a hashable

Returns:

An iterable of tuples - (‘label’, Otu)

Return type:

Iterable[Tuple[str, Otu]]

Notes

To group by lineage “level” use:
func = lambda id_ind, md: Lineage(**md).get_superset(level)
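The grouping step behind partition can be sketched with plain dicts: func is called with (id, metadata) for every entry along the axis, and entries that produce the same hashable label end up in the same group. The helper partition_ids and the sample metadata below are hypothetical; the real method yields Otu instances rather than id lists.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def partition_ids(
    metadata: Dict[str, dict],
    func: Callable[[str, dict], Hashable],
) -> Dict[Hashable, List[str]]:
    """Group ids by the label that func(id, metadata) returns."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for obs_id, md in metadata.items():
        groups[func(obs_id, md)].append(obs_id)
    return dict(groups)


obs_md = {
    "otu1": {"Phylum": "Firmicutes", "Genus": "Bacillus"},
    "otu2": {"Phylum": "Firmicutes", "Genus": "Clostridium"},
    "otu3": {"Phylum": "Proteobacteria", "Genus": "Escherichia"},
}
by_phylum = partition_ids(obs_md, lambda obs_id, md: md["Phylum"])
```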

rm_sparse_obs(prevalence_thres: float = 0.05, abundance_thres: float = 0.01, obssum_thres: int = 100) → Otu[source]

Remove observations with prevalence < prevalence_thres and abundance < abundance_thres

Parameters:

prevalence_thres (float) – Minimum fraction of samples in which the observation must be present to be accepted
abundance_thres (float) – Minimum observation count fraction in a sample needed to be accepted
obssum_thres (int) – The threshold applied to the sum of observations for each row

Returns:

Otu instance with bad observations removed

Return type:

Otu
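A minimal sketch of the prevalence and row-sum criteria, using plain lists of per-sample counts. It leaves out the abundance_thres criterion, and it combines the two remaining checks with a logical AND, which is an assumption about how the thresholds interact. keep_observation is a hypothetical helper.

```python
from typing import Dict, List


def keep_observation(
    counts: List[float],
    prevalence_thres: float = 0.05,
    obssum_thres: int = 100,
) -> bool:
    """Accept a row that is prevalent enough and has a large enough sum."""
    # Prevalence: fraction of samples in which the observation is present
    prevalence = sum(1 for c in counts if c > 0) / len(counts)
    return prevalence >= prevalence_thres and sum(counts) >= obssum_thres


table: Dict[str, List[float]] = {
    "otu1": [50, 60, 0, 40],
    "otu2": [0, 0, 0, 3],
    "otu3": [200, 0, 0, 0],
}
kept = {
    otu: row for otu, row in table.items()
    if keep_observation(row, prevalence_thres=0.5)
}
```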

rm_sparse_samples(count_thres: int = 500) → Otu[source]

Remove samples with read counts less than count_thres

Parameters:: count_thres (int, optional) – Counts threshold below which samples are rejected. Default value is 500.
Returns:: Otu instance with low count samples removed
Return type:: Otu
Raises:: ValueError – If Otu instance is normalized
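The sample filter can be pictured as follows: sum the read counts of each sample and drop those whose total is less than count_thres. The sketch uses a nested dict in place of a biom.Table and a hypothetical helper name. It does not reproduce the ValueError check for normalized tables.

```python
from typing import Dict

Table = Dict[str, Dict[str, int]]  # {sample_id: {otu_id: count}}


def drop_sparse_samples(table: Table, count_thres: int = 500) -> Table:
    """Return a new table without samples whose total count is too low."""
    return {
        sample_id: dict(counts)
        for sample_id, counts in table.items()
        if sum(counts.values()) >= count_thres
    }


counts = {
    "s1": {"otu1": 400, "otu2": 300},
    "s2": {"otu1": 100, "otu2": 50},
    "s3": {"otu1": 500, "otu2": 0},
}
dense = drop_sparse_samples(counts)
```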

property sample_metadata: DataFrame

Metadata for the samples

Return type:: pd.DataFrame

property tax_level: str

Returns the taxonomy level of the Otu instance

Returns:: The lowest taxonomy defined in the Otu instance
Return type:: str

write(base_name: str, fol_path: str = '', file_type: str = 'biom') → None[source]

Write Otu instance object to required file_type

Parameters:

base_name (str) – The base name without extension to be used for the files
fol_path (str, optional) – The folder where the files are to be written. Default is the current directory.
file_type ({'tsv', 'biom'}, optional) – The type of file the data is to be written to. Default is ‘biom’.
