micone.main package
Submodules
micone.main.lineage module
Module that implements the Lineage class and methods to work with taxonomy data
- class micone.main.lineage.Lineage(Kingdom: str = '', Phylum: str = '', Class: str = '', Order: str = '', Family: str = '', Genus: str = '', Species: str = '')
Bases:
Lineage
NamedTuple that stores the lineage of a taxon and provides methods to interact with it
- Kingdom
- Type:
str
- Phylum
- Type:
str
- Class
- Type:
str
- Order
- Type:
str
- Family
- Type:
str
- Genus
- Type:
str
- Species
- Type:
str
- classmethod from_str(lineage_str: str, style: str = 'gg') → Lineage
Create a Lineage instance from a lineage string
- Parameters:
lineage_str (str) – Lineage in the form of a string
style ({'gg', 'silva'}, optional) – The style of the lineage string. Default is ‘gg’.
- Returns:
Instance of the Lineage class
- Return type:
Lineage
- classmethod from_taxid(taxid: int) → Lineage
Create a Lineage instance from an NCBI taxonomy id
- Parameters:
taxid (int) – A valid NCBI taxonomy id
- Returns:
Instance of the Lineage class
- Return type:
“Lineage”
- get_superset(level: str) → Lineage
Return a superset of the current lineage for the requested level
- Parameters:
level (str) – The lowest Lineage field to be used to calculate the superset
- Returns:
Lineage instance that is a superset of the current instance
- Return type:
Lineage
- property name: Tuple[str, str]
Get the lowest populated level and name of the taxon
- Returns:
Tuple containing (level, name)
- Return type:
Tuple[str, str]
- property taxid: Tuple[str, int]
Get the NCBI taxonomy id of the Lineage
- Returns:
A tuple containing (taxonomy level, NCBI taxonomy id)
- Return type:
Tuple[str, int]
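To illustrate how a Greengenes-style string maps onto the seven fields, and how `get_superset` and `name` relate to them, here is a standalone sketch. It assumes strings of the form `k__Bacteria; p__Firmicutes; ...` with a three-character rank prefix; `LineageSketch` and `from_gg_str` are hypothetical stand-ins, not micone's implementation, and SILVA parsing and NCBI taxid lookups are not shown.

```python
from typing import NamedTuple, Tuple


class LineageSketch(NamedTuple):
    """Hypothetical stand-in for micone.main.lineage.Lineage (not the real class)."""
    Kingdom: str = ""
    Phylum: str = ""
    Class: str = ""
    Order: str = ""
    Family: str = ""
    Genus: str = ""
    Species: str = ""

    @classmethod
    def from_gg_str(cls, lineage_str: str) -> "LineageSketch":
        # Assumed format: "k__Bacteria; p__Firmicutes; ..."; drop the "x__" prefix.
        names = [tok.strip()[3:] for tok in lineage_str.split(";") if tok.strip()]
        return cls(*names)

    def get_superset(self, level: str) -> "LineageSketch":
        # Keep every field up to and including `level`; later fields revert to "".
        idx = self._fields.index(level)
        return type(self)(*self[: idx + 1])

    @property
    def name(self) -> Tuple[str, str]:
        # Lowest populated (level, name) pair, scanning from Species upwards.
        for level, value in reversed(list(zip(self._fields, self))):
            if value:
                return level, value
        raise ValueError("empty lineage")
```

For example, `LineageSketch.from_gg_str("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__; g__; s__").name` gives `("Order", "Lactobacillales")`, since the empty Family, Genus and Species fields are skipped.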
micone.main.network module
Module that defines the Network object and methods to read, write and manipulate it
- class micone.main.network.Network(nodes: List[str], links: List[Tuple[str, str, Dict[str, float]]], metadata: dict, cmetadata: dict, obs_metadata: DataFrame, children_map: Optional[dict] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False)
Bases:
object
Class that represents a network object
- Parameters:
nodes (List[str]) – The list of nodes in the network
links (List[LinkDType]) – The list of links in the network. Each link is a dict and must contain ‘source’, ‘target’, ‘weight’ and ‘pvalue’ as keys.
metadata (dict) – The metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’ and ‘authors’.
cmetadata (dict) – The computational metadata for the whole network. Must contain information on how the network was generated.
obs_metadata (pd.DataFrame) – The DataFrame containing taxonomy information for the nodes of the network. If it contains an ‘Abundance’ column, the abundances are incorporated into the network.
children_map (dict, optional) – The dictionary that contains the mapping {obs_id => [children]}
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The threshold applied to the absolute value of the interactions. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the p-value cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.
directed (bool, optional) – True if the network is directed. Default value is False.
- graph
The networkx graph representation of the network
- Type:
Union[nx.Graph, nx.DiGraph]
- nodes
The list of nodes in the network and their corresponding properties
- Type:
DType
- links
The list of links in the network and their corresponding properties
- Type:
DType
- metadata
The metadata for the network
- Type:
Dict[str, Any]
Examples
>>> network = Network.load_data(interaction_file, meta_file, cmeta_file, obsmeta_file)
- filter(pvalue_filter: bool, interaction_filter: bool) → Network
Filter network using pvalue and interaction thresholds
- Parameters:
pvalue_filter (bool) – If True will use pvalue_threshold for filtering
interaction_filter (bool) – If True will use interaction_threshold for filtering
- Returns:
The filtered Network object
- Return type:
“Network”
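A minimal sketch of filtering a link list by both thresholds, assuming links are dicts with ‘weight’ and ‘pvalue’ keys as described for the constructor, and using a hand-written Benjamini-Hochberg adjustment as a stand-in for `pvalue_correction='fdr_bh'`. The helper names `bh_adjust` and `filter_links`, and the boundary conventions (keep p ≤ alpha, keep |weight| ≥ threshold), are assumptions; micone's own ordering of correction and thresholding may differ.

```python
from typing import Any, Dict, List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_links(links: List[Dict[str, Any]], interaction_threshold: float = 0.3,
                 pvalue_threshold: float = 0.05, pvalue_filter: bool = True,
                 interaction_filter: bool = True) -> List[Dict[str, Any]]:
    adjusted = bh_adjust([link["pvalue"] for link in links])
    kept = []
    for link, padj in zip(links, adjusted):
        # Assumed convention: keep padj <= alpha and |weight| >= threshold.
        if pvalue_filter and padj > pvalue_threshold:
            continue
        if interaction_filter and abs(link["weight"]) < interaction_threshold:
            continue
        kept.append({**link, "pvalue": padj})
    return kept
```

Passing `interaction_threshold=0.0` keeps every weight, which matches the documented way of disabling interaction thresholding.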
- get_adjacency_table(key: str) → DataFrame
Returns the adjacency table representation for the requested key. This method does not support Graph.
- Parameters:
key (str) – The edge property to be used to construct the table
- Returns:
The adjacency table
- Return type:
pd.DataFrame
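For intuition, an adjacency table for one edge property can be built from a link list with nested dicts. This sketch uses `None` for absent links and mirrors each link when `directed` is False; the function name `adjacency_table` is hypothetical, and the real method returns a `pd.DataFrame` rather than a dict.

```python
from typing import Any, Dict, List, Optional


def adjacency_table(nodes: List[str], links: List[Dict[str, Any]], key: str,
                    directed: bool = False) -> Dict[str, Dict[str, Optional[float]]]:
    # Start with an empty row for every node so isolated nodes still appear.
    table: Dict[str, Dict[str, Optional[float]]] = {u: {v: None for v in nodes} for u in nodes}
    for link in links:
        source, target = link["source"], link["target"]
        table[source][target] = link[key]
        if not directed:
            # An undirected link fills both cells, giving a symmetric table.
            table[target][source] = link[key]
    return table
```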
- json(pvalue_filter: bool = False, interaction_filter: bool = False) → str
Returns the network as a JSON string
- Parameters:
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.
- Returns:
The JSON string representation of the network
- Return type:
str
- property links: List[Dict[str, Any]]
The list of links in the network and their corresponding properties
- classmethod load_data(interaction_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, pvalue_file: Optional[str] = None, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) → Network
Create a Network object from files (interaction tables and other metadata)
- Parameters:
interaction_file (str) – The tsv file containing the matrix of interactions
meta_file (str) – The json file containing the metadata for the whole network (general and experiment)
cmeta_file (str) – The json file containing the computational metadata for the whole network
obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network
pvalue_file (str, optional) – The tsv file containing the matrix of p-values. Default is None.
children_file (str, optional) – The json file containing the mapping between observations and their children
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The threshold applied to the absolute value of the interactions. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the p-value cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction. Use Network.pcorr_methods to get the list of supported methods.
directed (bool, optional) – True if the network is directed. Default value is False.
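Unlike `load_elist`, this constructor starts from square interaction and p-value matrices. A minimal sketch of turning such matrices into a link list, assuming tab-separated text whose first row and first column hold node ids, and reading only the upper triangle for an undirected network so each pair yields one link. The names `read_matrix` and `matrix_to_links`, and the rule that zero or NaN entries mean "no link", are assumptions rather than micone's code.

```python
import csv
import io
import math
from typing import Dict, List, Optional, Tuple


def read_matrix(text: str) -> Tuple[List[str], Dict[Tuple[str, str], float]]:
    """Parse a square TSV matrix whose first row and column hold node ids."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header = rows[0][1:]  # rows[0][0] is the empty corner cell
    values = {}
    for row in rows[1:]:
        for col, cell in zip(header, row[1:]):
            values[(row[0], col)] = float(cell)
    return header, values


def matrix_to_links(interaction_tsv: str, pvalue_tsv: Optional[str] = None,
                    directed: bool = False) -> List[dict]:
    nodes, weights = read_matrix(interaction_tsv)
    pvalues = read_matrix(pvalue_tsv)[1] if pvalue_tsv else {}
    links = []
    for i, source in enumerate(nodes):
        for j, target in enumerate(nodes):
            # Skip self-loops; an undirected network only reads the upper triangle.
            if i == j or (not directed and j < i):
                continue
            weight = weights[(source, target)]
            # Assumption: a zero or NaN entry means "no link".
            if weight == 0.0 or math.isnan(weight):
                continue
            links.append({"source": source, "target": target, "weight": weight,
                          "pvalue": pvalues.get((source, target))})
    return links
```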
- Returns:
The instance of the Network class
- Return type:
Network
- classmethod load_elist(elist_file: str, meta_file: str, cmeta_file: str, obsmeta_file: str, children_file: Optional[str] = None, interaction_type: str = 'correlation', interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05, pvalue_correction: Optional[str] = 'fdr_bh', directed: bool = False) → Network
Create Network instance from an edge list and associated metadata
- Parameters:
elist_file (str) – The csv file containing the list of edges and their associated metadata
meta_file (str) – The file containing the metadata for the whole network (general and experiment). Must contain ‘host’, ‘condition’, ‘location’, ‘experimental_metadata’, ‘pubmed_id’, ‘description’, ‘date’ and ‘authors’.
cmeta_file (str) – The file containing the computational metadata for the whole network. Must contain information on how the network was generated.
obsmeta_file (str) – The csv file containing taxonomy information for the nodes of the network. If it contains an ‘Abundance’ column, the abundances are incorporated into the network.
children_file (str, optional) – The json file that describes the mapping {obs_id => [children]}
interaction_type (str, optional) – The type of interaction encoded by the edges of the network. Default value is ‘correlation’.
interaction_threshold (float, optional) – The threshold applied to the absolute value of the interactions. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the p-value cutoff. Default value is 0.05.
pvalue_correction (str, optional) – The method to use for multiple hypothesis correction. Default value is ‘fdr_bh’. Set to None to turn off multiple hypothesis correction.
directed (bool, optional) – True if the network is directed. Default value is False.
- Returns:
The instance of the Network class
- Return type:
Network
- classmethod load_graph(graph: Union[Graph, DiGraph]) → Network
Load Network object from a networkx graph
- Parameters:
graph (Union[nx.Graph, nx.DiGraph]) – The networkx graph of the network
- Returns:
The instance of the Network class
- Return type:
Network
- classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None) → Network
Create a Network object from a network JSON file. Either fpath or raw_data must be specified.
- Parameters:
fpath (str, optional) – The path to the network JSON file
raw_data (dict, optional) – The raw data stored in the network JSON file
- Returns:
The instance of the Network class
- Return type:
Network
- property metadata: Dict[str, Any]
The metadata for the network
- property nodes: List[Dict[str, Any]]
The list of nodes in the network and their corresponding properties
- property pcorr_methods: List[str]
Returns the list of supported p-value correction methods
- write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False) → None
Write network to file as JSON
- Parameters:
fpath (str) – The path to the JSON file
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.
micone.main.network_group module
Module that defines the NetworkGroup object and methods to read, write and manipulate it
- class micone.main.network_group.NetworkGroup(networks: List[Network], id_field: str = 'taxid')
Bases:
Collection
Class that represents a group of network objects. These network objects are intended to be visualized together.
- Parameters:
networks (List[Network]) – The collection of networks to be grouped (key = context-id, value = Network)
id_field (str, optional) – The field to use while combining nodes. Default value is ‘taxid’.
- graph
The networkx multi-graph representation of the network
- Type:
Union[nx.MultiGraph, nx.MultiDiGraph]
- nodes
The list of nodes in the network group
- Type:
DType
- links
The list of links in the network group
- Type:
DType
- contexts
The list of all contexts in the network group
- Type:
DType
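The grouping can be pictured as follows: each network becomes a context with an integer id, nodes from different networks are merged when they share the same `id_field` value, and every link keeps the id of the context it came from, which is what a multigraph representation needs. The function `group_networks` and the plain-dict layout of nodes (with an ‘id’ key), links and metadata are assumptions made for illustration.

```python
from typing import Any, Dict, List


def group_networks(networks: List[Dict[str, Any]], id_field: str = "taxid") -> Dict[str, Any]:
    nodes: Dict[Any, Dict[str, Any]] = {}
    links: List[Dict[str, Any]] = []
    contexts: List[Dict[str, Any]] = []
    for cid, net in enumerate(networks):
        contexts.append({"id": cid, **net.get("metadata", {})})
        # Map this network's local node ids to the shared id_field value.
        local_to_shared = {}
        for node in net["nodes"]:
            key = node[id_field]
            local_to_shared[node["id"]] = key
            merged = nodes.setdefault(key, {"id": key, "cids": []})
            merged["cids"].append(cid)
        for link in net["links"]:
            # Links are not merged: each keeps its context id, as in a multigraph.
            links.append({**link,
                          "source": local_to_shared[link["source"]],
                          "target": local_to_shared[link["target"]],
                          "cid": cid})
    return {"nodes": list(nodes.values()), "links": links, "contexts": contexts}
```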
- combine_pvalues(cids: List[int]) → NetworkGroup
Combine the p-values of links in the cids using Brown’s p-value merging method
- Parameters:
cids (List[int]) – The list of context ids that are to be used in the merger
- Returns:
merged_network – The NetworkGroup that contains the merged p-values
- Return type:
NetworkGroup
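Brown's method extends Fisher's method to dependent tests by rescaling the Fisher statistic using the covariance between the tests; when the tests are independent it reduces to Fisher's method. The sketch below implements only that independent special case, using the closed-form chi-squared survival function for even degrees of freedom. It is not micone's implementation, and the name `fisher_combine` is hypothetical.

```python
import math
from typing import List


def fisher_combine(pvalues: List[float]) -> float:
    """Combined p-value for k independent tests (Fisher's method); p must be in (0, 1]."""
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # Survival function of chi-squared with 2k degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

With a single p-value the function returns that p-value unchanged, and two independent p-values of 0.05 combine to roughly 0.0175.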
- property contexts: List[Dict[str, Any]]
The contexts for the group of networks
- filter(pvalue_filter: bool, interaction_filter: bool) → NetworkGroup
Filter the network group using p-value and interaction thresholds
- Parameters:
pvalue_filter (bool) – If True will use pvalue_threshold for filtering
interaction_filter (bool) – If True will use interaction_threshold for filtering
- Returns:
The filtered NetworkGroup object
- Return type:
“NetworkGroup”
- get_adjacency_vectors(key: str) → DataFrame
Returns the adjacency matrix for each context as a pd.DataFrame
- Parameters:
key (str) – The edge property to be used to construct the vectors
- Returns:
The DataFrame containing adjacency vectors as columns
- Return type:
pd.DataFrame
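One way to picture the adjacency vectors: every node pair seen in any context becomes a row, every context a column, and each cell holds that context's value of `key`. The sketch uses plain dicts instead of a `pd.DataFrame`; treating links as undirected (by sorting each pair), filling absent links with 0.0 and storing the context id under a ‘cid’ key are all assumptions, and `adjacency_vectors` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def adjacency_vectors(links: List[Dict[str, Any]], cids: List[int],
                      key: str) -> Dict[Tuple[str, ...], List[float]]:
    vectors: Dict[Tuple[str, ...], List[float]] = {}
    for link in links:
        if link["cid"] not in cids:
            continue  # ignore contexts that were not requested
        pair = tuple(sorted((link["source"], link["target"])))
        # One column per context; absent links default to 0.0.
        row = vectors.setdefault(pair, [0.0] * len(cids))
        row[cids.index(link["cid"])] = link[key]
    return vectors
```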
- get_consensus_network(cids: Optional[List[int]] = None, method: str = 'simple_voting', parameter: float = 0.0) → NetworkGroup
Get the consensus network for the networks defined by the cids
- Parameters:
cids (Optional[List[int]]) – The list of context ids that are to be used in the merger. Default is None.
method (str, {‘simple_voting’, ‘scaled_sum’}) – The consensus method. Default value is ‘simple_voting’.
parameter (float) – Default value is 0.0 (which corresponds to the union of all the links).
- Returns:
consensus_network – The NetworkGroup that represents the consensus network
- Return type:
NetworkGroup
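A sketch of the `simple_voting` idea, under the assumption (suggested by the note that 0.0 yields the union) that a link is kept when the fraction of selected contexts containing it exceeds `parameter`. Links are treated as undirected, signs and weights are ignored, and `scaled_sum` is not shown; `simple_voting` here is a hypothetical helper, not micone's method.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


def simple_voting(links: List[Dict[str, Any]], cids: List[int],
                  parameter: float = 0.0) -> Set[Tuple[str, ...]]:
    votes: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for link in links:
        if link["cid"] in cids:
            pair = tuple(sorted((link["source"], link["target"])))
            votes[pair].add(link["cid"])
    # Keep a link if the fraction of contexts containing it exceeds `parameter`;
    # parameter=0.0 therefore keeps every link seen at least once (the union).
    return {pair for pair, seen in votes.items() if len(seen) / len(cids) > parameter}
```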
- json(pvalue_filter: bool = False, interaction_filter: bool = False) → str
Returns the network group as a JSON string
- Parameters:
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.
- Returns:
The JSON string representation of the network
- Return type:
str
- property links: List[Dict[str, Any]]
The list of links in the NetworkGroup and their corresponding properties
- classmethod load_json(fpath: Optional[str] = None, raw_data: Optional[dict] = None, id_field: str = 'taxid') → NetworkGroup
Create a NetworkGroup object from a network JSON file. Either fpath or raw_data must be specified.
- Parameters:
fpath (str, optional) – The path to the network JSON file
raw_data (dict, optional) – The raw data stored in the network JSON file
- Returns:
The instance of the NetworkGroup class
- Return type:
NetworkGroup
- property nodes: List[Dict[str, Any]]
The list of nodes in the NetworkGroup and their corresponding properties
- update_thresholds(interaction_threshold: float = 0.3, pvalue_threshold: float = 0.05) → None
Update the thresholds on the networks
- Parameters:
interaction_threshold (float, optional) – The threshold applied to the absolute value of the interactions. To disable thresholding based on interaction value, pass in 0.0. Default value is 0.3.
pvalue_threshold (float, optional) – The alpha value for the p-value cutoff. Default value is 0.05.
- write(fpath: str, pvalue_filter: bool = False, interaction_filter: bool = False, split_files: bool = False) → None
Write the network group to file as JSON
- Parameters:
fpath (str) – The path to the JSON file
pvalue_filter (bool) – If True, pvalue_threshold is used for filtering. Default value is False.
interaction_filter (bool) – If True, interaction_threshold is used for filtering. Default value is False.
split_files (bool) – If True, the networks are written to separate files. Default value is False.
micone.main.otu module
Module that defines the Otu object and methods to manipulate it
- class micone.main.otu.Otu(otu_data: Table, sample_metadata: Optional[DataFrame] = None, obs_metadata: Optional[DataFrame] = None)
Bases:
object
An object that represents the OTU counts table
- Parameters:
otu_data (Table) – biom.Table object containing OTU data
sample_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the samples
obs_metadata (pd.DataFrame, optional) – pd.DataFrame containing metadata for the observations (OTUs)
- otu_data
OTU counts table in the biom.Table format
- Type:
biom.Table
- sample_metadata
Metadata for the samples
- Type:
pd.DataFrame
- obs_metadata
Lineage data for the observations (OTUs)
- Type:
pd.DataFrame
- tax_level
The taxonomy level of the current Otu instance
- Type:
str
Notes
All methods that manipulate the Otu object return new objects
- collapse_taxa(level: str) → Tuple[Otu, Dict[str, List[str]]]
Collapse Otu instance based on taxa
- Parameters:
level (str) – The tax level of the collapsed table. This is also used as the prefix for the unique ids.
- Returns:
Collapsed Otu instance
- Return type:
Tuple[Otu, dict]
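Collapsing can be sketched with plain dictionaries: observations that share the same lineage down to `level` are summed per sample, each collapsed row gets a unique id prefixed with the level name, and a mapping from each new id to its original observation ids is returned alongside. The `Genus_0`-style id format, reading the returned dict as that children mapping, and the name `collapse_taxa_sketch` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

LEVELS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def collapse_taxa_sketch(counts: Dict[str, List[int]], lineages: Dict[str, Dict[str, str]],
                         level: str) -> Tuple[Dict[str, List[int]], Dict[str, List[str]]]:
    depth = LEVELS.index(level) + 1
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for obs_id, lineage in lineages.items():
        # Observations with identical lineages down to `level` share one group.
        key = tuple(lineage.get(lvl, "") for lvl in LEVELS[:depth])
        groups.setdefault(key, []).append(obs_id)
    collapsed: Dict[str, List[int]] = {}
    children: Dict[str, List[str]] = {}
    for n, members in enumerate(groups.values()):
        new_id = f"{level}_{n}"  # level name used as the id prefix
        collapsed[new_id] = [sum(col) for col in zip(*(counts[m] for m in members))]
        children[new_id] = members
    return collapsed, children
```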
- filter(ids: Optional[Iterable[str]] = None, func: Optional[Callable[[ndarray, str, dict], bool]] = None, axis: str = 'observation') → Otu
Filter Otu instance based on ids or func
- Parameters:
ids (Iterable[str], optional) – An iterable of ids to keep. If ids are not supplied, func must be supplied.
func (Callable[[np.ndarray, str, dict], bool], optional) – A function that takes in (values, id_ind, md) and returns a bool. If func is not supplied, ids must be supplied. If both ids and func are supplied, ids are used.
axis ({'sample', 'observation'}, optional) – The axis along which to filter the Otu instance. Default value is ‘observation’.
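The precedence rules above (one of ids or func is required, and ids win when both are given) can be sketched on a plain dict of rows. The callback receives (values, id, metadata) as documented for func, though here values is a list rather than an np.ndarray; the dict-based table and the name `filter_rows` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


def filter_rows(rows: Dict[str, List[float]], metadata: Dict[str, dict],
                ids: Optional[Iterable[str]] = None,
                func: Optional[Callable[[List[float], str, dict], bool]] = None
                ) -> Dict[str, List[float]]:
    if ids is not None:
        keep = set(ids)  # ids take precedence over func
        return {k: v for k, v in rows.items() if k in keep}
    if func is None:
        raise ValueError("either ids or func must be supplied")
    return {k: v for k, v in rows.items() if func(v, k, metadata.get(k, {}))}
```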
- Returns:
Filtered Otu instance
- Return type:
Otu
- classmethod load_data(otu_file: str, meta_file: Optional[str] = None, tax_file: Optional[str] = None, dtype: str = 'biom', ext: Optional[str] = None) → Otu
Load data from files into the Otu class instance
- Parameters:
otu_file (str) – The path to the OTU counts file
meta_file (str, optional) – The path to the sample metadata file
tax_file (str, optional) – The path to the taxonomy file
dtype ({'biom', 'tsv'}, optional) – The type of OTU file that is input. Default is ‘biom’.
ext (str, optional) – The extension of the file, if it is not one of the supported extensions. Supported extensions: ‘tsv’, ‘txt’ and ‘counts’ for the ‘tsv’ dtype; ‘biom’ and ‘hdf5’ for the ‘biom’ dtype.
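The supported-extension list amounts to a lookup from file extension to dtype. A hypothetical helper that applies it is sketched below; `load_data` itself takes `dtype` explicitly, so this only illustrates when a file falls outside the supported extensions and would need `ext`.

```python
import os
from typing import Optional

# Supported extensions per dtype, as listed for the ext parameter.
SUPPORTED_EXTENSIONS = {
    "tsv": ("tsv", "txt", "counts"),
    "biom": ("biom", "hdf5"),
}


def infer_dtype(otu_file: str) -> Optional[str]:
    """Return the dtype implied by the file extension, or None if unrecognized."""
    file_ext = os.path.splitext(otu_file)[1].lstrip(".").lower()
    for dtype, extensions in SUPPORTED_EXTENSIONS.items():
        if file_ext in extensions:
            return dtype
    return None  # e.g. "otus.dat": outside the supported list, so ext would be needed
```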
- Returns:
An instance of the Otu class
- Return type:
Otu
- normalize(axis: str = 'sample', method: str = 'norm') → Otu
Normalize the OTU table along the provided axis
- Parameters:
axis ({'sample', 'observation'}, optional) – Axis along which to normalize the OTU table Default is ‘sample’
method ({'norm', 'rarefy', 'css'}, optional) – Normalization method to use. Default is ‘norm’.
- Returns:
Otu instance which is normalized along the given axis
- Return type:
Otu
- property obs_metadata: DataFrame
Lineage data for the observations (OTUs)
- partition(axis: str, func: Callable[[str, dict], Hashable]) → Iterable[Tuple[str, Otu]]
Partition the Otu instance based on the func and axis
- Parameters:
axis (str) – The axis on which to partition
func (Callable[[str, dict], Hashable]) – The function that takes in (id, metadata) and returns a hashable
- Returns:
An iterable of tuples - (‘label’, Otu)
- Return type:
Iterable[Tuple[str, Otu]]
Notes
- To group by lineage “level” use:
func = lambda id_ind, md: Lineage(**md).get_superset(level)
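Partitioning amounts to grouping ids by the hashable label that func returns, then building one sub-table per label. A dict-based sketch that groups samples by a metadata column is shown below; the name `partition_ids` and the plain-dict metadata are hypothetical, and the real method yields Otu instances rather than id lists.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Tuple


def partition_ids(metadata: Dict[str, dict],
                  func: Callable[[str, dict], Hashable]) -> Iterator[Tuple[Hashable, List[str]]]:
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for id_, md in metadata.items():
        groups[func(id_, md)].append(id_)
    # Each label maps to the ids that would form one partitioned Otu table.
    yield from groups.items()
```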
- rm_sparse_obs(prevalence_thres: float = 0.05, abundance_thres: float = 0.01, obssum_thres: int = 100) → Otu
Remove observations with prevalence < prevalence_thres and abundance < abundance_thres
- Parameters:
prevalence_thres (float, optional) – Minimum fraction of samples in which the observation must be present in order to be accepted. Default value is 0.05.
abundance_thres (float, optional) – Minimum observation count fraction in a sample needed in order to be accepted. Default value is 0.01.
obssum_thres (int, optional) – The threshold applied to the sum of observations for each row. Default value is 100.
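The three thresholds can be read as per-observation checks. The sketch below is one plausible reading, since the description does not say exactly how the criteria combine: an observation counts as present in a sample when its fraction of that sample's total is at least abundance_thres, and it is kept when it is present in at least prevalence_thres of the samples and its total count reaches obssum_thres. These rules and the name `keep_observations` are assumptions.

```python
from typing import Dict, List


def keep_observations(counts: Dict[str, List[int]], prevalence_thres: float = 0.05,
                      abundance_thres: float = 0.01, obssum_thres: int = 100) -> List[str]:
    obs_ids = list(counts)
    n_samples = len(counts[obs_ids[0]])
    # Per-sample totals, used to turn counts into relative abundances.
    totals = [sum(counts[o][s] for o in obs_ids) for s in range(n_samples)]
    kept = []
    for o in obs_ids:
        # Assumed: "present" means relative abundance >= abundance_thres.
        present = sum(1 for s in range(n_samples)
                      if totals[s] and counts[o][s] / totals[s] >= abundance_thres)
        if present / n_samples >= prevalence_thres and sum(counts[o]) >= obssum_thres:
            kept.append(o)
    return kept
```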
- Returns:
Otu instance with bad observations removed
- Return type:
Otu
- rm_sparse_samples(count_thres: int = 500) → Otu
Remove samples with read counts less than count_thres
- Parameters:
count_thres (int, optional) – Counts threshold below which samples are rejected. Default value is 500.
- Returns:
Otu instance with low count samples removed
- Return type:
Otu
- Raises:
ValueError – If Otu instance is normalized
- property sample_metadata: DataFrame
Metadata for the samples
- Return type:
pd.DataFrame
- property tax_level: str
Returns the taxonomy level of the Otu instance
- Returns:
The lowest taxonomy level defined in the Otu instance
- Return type:
str
- write(base_name: str, fol_path: str = '', file_type: str = 'biom') → None
Write the Otu instance to the requested file_type
- Parameters:
base_name (str) – The base name without extension to be used for the files
fol_path (str, optional) – The folder where the files are to be written. Default is the current directory.
file_type ({'tsv', 'biom'}, optional) – The type of file the data is to be written to. Default is ‘biom’.