public class BiojavaSparkUtils extends Object
SparkUtils.

| Constructor and Description |
|---|
| BiojavaSparkUtils() |

| Modifier and Type | Method and Description |
|---|---|
| static org.rcsb.mmtf.api.StructureDataInterface | convertToStructDataInt(org.biojava.nbio.structure.Structure structure)<br>Get a StructureDataInterface from a Biojava Structure. |
| static org.rcsb.mmtf.spark.data.SegmentDataRDD | filterSequenceSimilar(org.rcsb.mmtf.spark.data.SegmentDataRDD segmentDataRDD, String inputSequence, double minSimilarity)<br>Filter the SegmentDataRDD based on minimum sequence similarity to a reference sequence. |
| static AtomData | findAtoms(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD)<br>Find all the atoms in the RDD. |
| static AtomData | findAtoms(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne)<br>Find the given type of atoms for each structure in the PDB. |
| static AtomContactRDD | findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectTwo, double cutoff)<br>Find the contacts for each structure in the PDB. |
| static AtomContactRDD | findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne, double cutoff)<br>Find the contacts for each structure in the PDB. |
| static AtomContactRDD | findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, double cutoff)<br>Find the contacts for each structure in the PDB. |
| static org.biojava.nbio.structure.contact.AtomContactSet | getAtomContacts(List<org.biojava.nbio.structure.Atom> atoms, double cutoff)<br>Get all the atom contacts in a list of atoms. |
| static org.biojava.nbio.structure.contact.AtomContactSet | getAtomContacts(List<org.biojava.nbio.structure.Atom> atomListOne, List<org.biojava.nbio.structure.Atom> atomListTwo, double cutoff)<br>Get the contacts between two lists of atoms. |
| static org.biojava.nbio.structure.contact.AtomContactSet | getAtomContactsSlow(List<org.biojava.nbio.structure.Atom> atomListOne, List<org.biojava.nbio.structure.Atom> atomListTwo, double cutoff)<br>Get the contacts between two lists of atoms using iteration and not grids. |
| static List<org.biojava.nbio.structure.Atom> | getAtoms(org.rcsb.mmtf.api.StructureDataInterface structure)<br>Get all the atoms in the structure using a StructureDataInterface. |
| static List<org.biojava.nbio.structure.Atom> | getAtoms(org.rcsb.mmtf.api.StructureDataInterface structure, org.rcsb.mmtf.spark.data.AtomSelectObject atomSelectObject)<br>Get all the atoms of a given name or in a given group in the structure using a StructureDataInterface. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Structure> | getBiojavaRdd(String filePath) |
| static org.biojava.nbio.structure.Atom[] | getCaAtoms(org.rcsb.mmtf.spark.data.Segment segment)<br>Gets the C-alpha Atom for the given input Segment. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> | getChainRDD(List<String> pdbIdList)<br>Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> | getChainRDD(List<String> pdbIdList, int minLength)<br>Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> | getChainRDD(String filePath, int minLength, double sample)<br>Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> | getChainRDD(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, int minLength)<br>Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates. |
| static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Structure> | getFromList(File[] pdbIdList)<br>Generate a JavaPairRDD of String Structure from a list of PDB files. |
| static String | getGroupAtomName(org.biojava.nbio.structure.Atom atom)<br>Get a conjoined group atom name from an atom. |
| static org.rcsb.mmtf.spark.data.StructureDataRDD | getStructureRDDFromMmcif(String filePath)<br>Function (for benchmarking) to get a StructureDataRDD from a Hadoop file of mmCIF data. |
| static String | getTypeFromChainId(org.rcsb.mmtf.api.StructureDataInterface structureDataInterface, int chainInd)<br>Get the type of a given chain index - SHOULD BE MOVED INTO ENCODER UTILS |
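
The summary describes getAtomContactsSlow as finding contacts between two lists of atoms "using iteration and not grids". The sketch below illustrates what such a pairwise iteration might look like. It is not the library's implementation: the class ContactSketch, the method findContactsBruteForce, the use of plain {x, y, z} arrays in place of BioJava Atom objects, and the inclusive treatment of the cutoff are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in, not part of BiojavaSparkUtils.
 * Atoms are plain {x, y, z} arrays rather than BioJava Atom objects.
 */
class ContactSketch {

    /**
     * Returns index pairs {i, j} for which listOne[i] and listTwo[j]
     * lie within the cutoff distance (inclusive, by assumption).
     */
    static List<int[]> findContactsBruteForce(List<double[]> listOne,
                                              List<double[]> listTwo,
                                              double cutoff) {
        // Compare squared distances to avoid one square root per pair.
        double cutoffSq = cutoff * cutoff;
        List<int[]> contacts = new ArrayList<>();
        for (int i = 0; i < listOne.size(); i++) {
            double[] a = listOne.get(i);
            for (int j = 0; j < listTwo.size(); j++) {
                double[] b = listTwo.get(j);
                double dx = a[0] - b[0];
                double dy = a[1] - b[1];
                double dz = a[2] - b[2];
                if (dx * dx + dy * dy + dz * dz <= cutoffSq) {
                    contacts.add(new int[] {i, j});
                }
            }
        }
        return contacts;
    }

    public static void main(String[] args) {
        List<double[]> one = Arrays.asList(new double[] {0, 0, 0}, new double[] {10, 0, 0});
        List<double[]> two = Arrays.asList(new double[] {0, 3, 0}, new double[] {10, 0, 6});
        // With a cutoff of 4.0 only the first atom of each list is in contact.
        for (int[] c : findContactsBruteForce(one, two, 4.0)) {
            System.out.println(c[0] + " - " + c[1]); // prints "0 - 0"
        }
    }
}
```

Every pair is visited, so the cost grows with the product of the two list sizes. That is presumably why the grid-based getAtomContacts variants exist alongside the slow one.
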

public static org.biojava.nbio.structure.Atom[] getCaAtoms(org.rcsb.mmtf.spark.data.Segment segment)
Gets the C-alpha Atom for the given input Segment.
Parameters:
segment - the input Segment object
Returns:
Atom objects

public static AtomContactRDD findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectTwo, double cutoff)
Find the contacts for each structure in the PDB.
Parameters:
selectObjectOne - the first type of atoms
selectObjectTwo - the second type of atoms
cutoff - the cutoff distance (max) in Angstrom
Returns:
JavaPairRDD of AtomContact objects

public static AtomContactRDD findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne, double cutoff)
Find the contacts for each structure in the PDB.
Parameters:
selectObjectOne - the type of atoms
cutoff - the cutoff distance (max) in Angstrom
Returns:
JavaPairRDD of AtomContact objects

public static AtomContactRDD findContacts(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, double cutoff)
Find the contacts for each structure in the PDB.
Parameters:
cutoff - the cutoff distance (max) in Angstrom
Returns:
JavaPairRDD of AtomContact objects

public static AtomData findAtoms(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, org.rcsb.mmtf.spark.data.AtomSelectObject selectObjectOne)
Find the given type of atoms for each structure in the PDB.
Parameters:
selectObjectOne - the type of atom to find
Returns:
JavaRDD of Atom objects

public static AtomData findAtoms(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD)
Find all the atoms in the RDD.
Returns:
JavaRDD of Atom objects

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Structure> getBiojavaRdd(String filePath)
Parameters:
filePath - the input path to the Hadoop sequence file
Returns:
JavaPairRDD of String Structure

public static List<org.biojava.nbio.structure.Atom> getAtoms(org.rcsb.mmtf.api.StructureDataInterface structure, org.rcsb.mmtf.spark.data.AtomSelectObject atomSelectObject)
Get all the atoms of a given name or in a given group in the structure using a StructureDataInterface.
Parameters:
structure - the input StructureDataInterface

public static org.biojava.nbio.structure.contact.AtomContactSet getAtomContacts(List<org.biojava.nbio.structure.Atom> atoms, double cutoff)
Get all the atom contacts in a list of atoms.
Parameters:
atoms - the list of Atoms
cutoff - the cutoff distance
Returns:
AtomContactSet of the contacts

public static org.biojava.nbio.structure.contact.AtomContactSet getAtomContacts(List<org.biojava.nbio.structure.Atom> atomListOne, List<org.biojava.nbio.structure.Atom> atomListTwo, double cutoff)
Get the contacts between two lists of atoms.
Parameters:
atomListOne - the first list of Atoms
atomListTwo - the second list of Atoms
cutoff - the cutoff to define a contact
Returns:
AtomContactSet of the contacts

public static org.biojava.nbio.structure.contact.AtomContactSet getAtomContactsSlow(List<org.biojava.nbio.structure.Atom> atomListOne, List<org.biojava.nbio.structure.Atom> atomListTwo, double cutoff)
Get the contacts between two lists of atoms using iteration and not grids.
Parameters:
atomListOne - the first list of Atoms
atomListTwo - the second list of Atoms
cutoff - the cutoff to define a contact
Returns:
AtomContactSet of the contacts

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> getChainRDD(List<String> pdbIdList, int minLength) throws IOException
Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates.
Parameters:
pdbIdList - the input list of PDB ids
minLength - the minimum length of each chain
Returns:
JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates
Throws:
IOException - due to an error reading the input file

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> getChainRDD(String filePath, int minLength, double sample) throws IOException
Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates.
Parameters:
filePath - the Hadoop file to read from
minLength - the minimum length of each chain
sample - the sample of this file to take
Returns:
JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates
Throws:
IOException - due to an error reading the input file

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> getChainRDD(List<String> pdbIdList) throws IOException
Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates.
Parameters:
pdbIdList - the input list of PDB ids
Returns:
JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates
Throws:
IOException - due to an error reading the input file

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Atom[]> getChainRDD(org.rcsb.mmtf.spark.data.StructureDataRDD structureDataRDD, int minLength) throws IOException
Get the JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates.
Parameters:
structureDataRDD - the input StructureDataRDD
minLength - the minimum length of each chain
Returns:
JavaPairRDD of Key: PDBID.CHAINID and Value: Atom array of the C-alpha coordinates
Throws:
IOException - due to an error reading the input file

public static List<org.biojava.nbio.structure.Atom> getAtoms(org.rcsb.mmtf.api.StructureDataInterface structure)
Get all the atoms in the structure using a StructureDataInterface.
Parameters:
structure - the input StructureDataInterface

public static org.rcsb.mmtf.spark.data.SegmentDataRDD filterSequenceSimilar(org.rcsb.mmtf.spark.data.SegmentDataRDD segmentDataRDD, String inputSequence, double minSimilarity) throws org.biojava.nbio.core.exceptions.CompoundNotFoundException
Filter the SegmentDataRDD based on minimum sequence similarity to a reference sequence.
Parameters:
inputSequence - the reference sequence to compare
minSimilarity - the minimum similarity (as a double between 0.00 and 1.00)
Returns:
SegmentDataRDD after being filtered
Throws:
org.biojava.nbio.core.exceptions.CompoundNotFoundException - if Biojava cannot accurately convert the String sequence to a ProteinSequence

public static String getGroupAtomName(org.biojava.nbio.structure.Atom atom)
Get a conjoined group atom name from an atom.
Parameters:
atom - the input atom

public static org.rcsb.mmtf.spark.data.StructureDataRDD getStructureRDDFromMmcif(String filePath)
Function (for benchmarking) to get a StructureDataRDD from a Hadoop file of mmCIF data.
Parameters:
filePath - the path of the Hadoop sequence file
Returns:
StructureDataRDD generated

public static org.rcsb.mmtf.api.StructureDataInterface convertToStructDataInt(org.biojava.nbio.structure.Structure structure)
Get a StructureDataInterface from a Biojava Structure.
Parameters:
structure - the input structure to convert
Returns:
StructureDataInterface of the Biojava Structure

public static org.apache.spark.api.java.JavaPairRDD<String,org.biojava.nbio.structure.Structure> getFromList(File[] pdbIdList)
Generate a JavaPairRDD of String Structure from a list of PDB files.
Parameters:
pdbIdList - the input list of PDB files
Returns:
JavaPairRDD of String Structure

public static String getTypeFromChainId(org.rcsb.mmtf.api.StructureDataInterface structureDataInterface, int chainInd)
Get the type of a given chain index - SHOULD BE MOVED INTO ENCODER UTILS
Parameters:
structureDataInterface - the input StructureDataInterface
chainInd - the index of the relevant chain
Returns:
String describing the chain

Copyright © 2016 Biojava. All Rights Reserved.